Requirements
At least 3 Kafka brokers and 3 ZooKeeper nodes, spread across availability zones to ensure high availability and replication of data.
You MUST ensure that all topics are replicated across all availability zones of the cluster; otherwise, you risk making Vault vulnerable to downtime in the event of an availability zone failure.
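For example, with 3 brokers spread across 3 availability zones, a replication factor of 3 places one replica in each zone. A sketch of topic-level settings that achieve this (the values are illustrative, not Vault-mandated):

```properties
# Illustrative topic settings for a 3-broker, 3-AZ cluster.
# replication.factor=3 keeps one replica per availability zone;
# min.insync.replicas=2 lets the topic keep accepting acks=all
# writes after the loss of a single zone.
replication.factor=3
min.insync.replicas=2
```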
Vault requires the appropriate permissions and Admin API access to create, update and delete Kafka topics and, optionally, Kafka ACLs.
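As a sketch, granting a principal used by Vault the topic permissions described above might look like the following `kafka-acls` invocation. The principal name, bootstrap address, and exact operation list are placeholders to adapt to your cluster and security setup:

```sh
# Hypothetical principal and bootstrap address; adjust for your cluster.
kafka-acls.sh --bootstrap-server kafka-0.example.internal:9092 \
  --add --allow-principal User:vault \
  --operation Create --operation Alter --operation Delete \
  --operation Describe \
  --topic '*'
```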
Vault requires access to the __consumer_offsets topic to monitor the consumer lag of the Kafka processors. The Kubernetes HPA extension will use the consumer lag metric for scaling Vault Kafka processors (this does not impact scaling based on CPU and memory utilisation).
Possible workarounds include:
Manually adding a Kubernetes custom metric named max_service_consumer_group_lag that the Kubernetes HPA extension will use.
Pinning all Vault services to the max replicas specified in the Kubernetes Horizontal Pod Autoscaler resources.
Scaling based on memory and CPU utilisation only.
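As an illustration of the first workaround, a HorizontalPodAutoscaler consuming such a custom metric might look like the following. All names and target values here are placeholders, not configuration shipped with Vault:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-vault-processor      # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-vault-processor    # placeholder scale target
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: max_service_consumer_group_lag
      target:
        type: AverageValue
        averageValue: "1000"         # illustrative lag threshold
```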
All Vault services MUST have network connectivity to every broker in the Kafka cluster. In practice, this means all Kubernetes nodes in the cluster that Vault is running on must have connectivity to every broker. Proxies, firewalls and network policies can also affect this.
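A quick way to verify this connectivity from a given node is a plain TCP check against each broker's advertised listener. A minimal stdlib sketch (the broker hostnames are placeholders):

```python
import socket

def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder broker addresses; substitute your cluster's advertised listeners.
brokers = [("kafka-0.example.internal", 9092),
           ("kafka-1.example.internal", 9092),
           ("kafka-2.example.internal", 9092)]

unreachable = [f"{host}:{port}" for host, port in brokers
               if not broker_reachable(host, port)]
if unreachable:
    print("Cannot reach:", ", ".join(unreachable))
```

Note that a raw TCP check only confirms network reachability; it does not validate TLS handshakes or authentication, which can still fail even when the port is open.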
Kafka brokers and ZooKeeper nodes that are sized appropriately in terms of resources, including CPU, RAM, disk size, disk IOPS, number of file descriptors, and network bandwidth (this list is not exhaustive). These requirements are tightly correlated with the size of the Vault instance and whether the Kafka cluster is dedicated to Vault only. You MUST have the ability to scale the cluster accordingly when the load increases.
The minimum recommended requirements per broker for a dedicated, Vault-only Kafka cluster are:
CPU: 8 (prioritise number of vcores/cores over the speed of each)
RAM: 16GB
Disk: 1TiB SSD
IOPS: 4 per GB of disk
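Taken together, the minimum disk and IOPS figures imply roughly 4,096 provisioned IOPS per broker. A quick back-of-the-envelope check (treating the "per GB" guideline as per GiB for simplicity):

```python
# Minimum per-broker figures from the list above.
disk_tib = 1
gib_per_tib = 1024
iops_per_gib = 4            # the "4 IOPS per GB" guideline, read as per GiB

disk_gib = disk_tib * gib_per_tib
required_iops = disk_gib * iops_per_gib
print(required_iops)        # 4096
```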
Vault is not distributed with a solution for monitoring Kafka brokers. It is your responsibility to monitor the Kafka cluster continuously and to ensure its uptime.