Dimensioning with Kafka

Debashish Pattnaik
2 min read · Apr 23, 2021

If you are new to Kafka and looking for guidance on how to dimension a Kafka cluster, this short blog might help you get started.

There are various factors that come into the picture when dimensioning a Kafka cluster.

The goal of dimensioning is essentially to answer the following:

  1. How many brokers are needed to handle the number of messages generated by the applications (Producers).
  2. What the configuration of each broker should be, i.e. CPU, memory, I/O threads, etc.
  3. How many Consumer Groups should be defined.
  4. How many consumers per Consumer Group should be defined.
  5. How many partitions per topic should be configured.

We need the following inputs:

  1. The consumers must together be able to consume a target of, say, 100,000 messages/sec.
  2. A single producer can generate, say, 10,000 messages/sec.

Here is the approach to be followed:

  1. Check the size of each message produced by the Producer. Each application generates messages and sends them to the Kafka brokers. Message size and content vary between applications, and this is one of the most important factors for dimensioning (a quick way to check it is shown in the first sketch after this list).
  2. ‘One size doesn’t fit all’, i.e. it is not a good approach to simply copy the cluster architecture of another vendor or company, because the messages generated by your applications may be completely different. Instead, benchmark your own applications against a minimal cluster configuration: for example, create a Kafka cluster of 2 brokers and try to produce and consume as many messages as possible (see the benchmarking sketch after this list). This tells you how many messages a single Producer and a single Consumer can deal with comfortably. For example, we found a producer can generate ~10,000 messages/sec and a consumer can consume ~8,000 messages/sec.
  3. Now calculate the number of Producers required to generate the desired message rate: total producers = 100,000 / 10,000 = 10.
  4. Similarly, the total number of consumers needed to consume the desired 100,000 messages/sec is 100,000 / 8,000 ≈ 13.
  5. Take the maximum of the producer count (10) and the consumer count (13), which is 13.
  6. Add a safety margin of 2, which gives 13 + 2 = 15.
  7. Therefore: 15 partitions, 12 producers (10 + 2 margin) and 16 consumers (15 + 1 standby). The sizing sketch after this list walks through this arithmetic.
  8. To fully parallelize consumption, the consumer group should have at least as many consumers as there are partitions; consumers beyond the partition count sit idle as standbys (hence the extra one above).
  9. The number of brokers for the cluster is therefore proposed as 15.
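To get a feel for step 1, here is a minimal sketch of how you might measure the serialized size of a message before it is sent. The payload shown is purely hypothetical; substitute a message your own application actually produces, using whatever serialization format (JSON, Avro, Protobuf, etc.) you use in practice.

```python
import json

# Hypothetical example payload -- replace with a real message from your
# application and with your actual serialization format.
sample_event = {
    "order_id": "ORD-12345",
    "customer_id": 98765,
    "items": [{"sku": "ABC-1", "qty": 2, "price": 19.99}],
    "timestamp": "2021-04-23T10:15:30Z",
}

# Size of the serialized value in bytes: roughly what the broker stores
# per record, before compression and record/batch overhead.
payload = json.dumps(sample_event).encode("utf-8")
print(f"Serialized message size: {len(payload)} bytes")
```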
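For step 2, the sketch below shows one rough way to benchmark from Python using the kafka-python client. The bootstrap address, topic name, payload size and message count are all placeholders for your own test setup, and the measured rates are only indicative; the perf-test tools that ship with Kafka (kafka-producer-perf-test and kafka-consumer-perf-test) are the more standard option.

```python
import time
from kafka import KafkaProducer, KafkaConsumer

BOOTSTRAP = "localhost:9092"   # assumed test cluster address
TOPIC = "bench-topic"          # assumed pre-created benchmark topic
NUM_MESSAGES = 100_000
PAYLOAD = b"x" * 512           # fixed-size dummy payload for the test

# --- Producer side: measure messages produced per second ---
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
start = time.time()
for _ in range(NUM_MESSAGES):
    producer.send(TOPIC, PAYLOAD)
producer.flush()               # wait until everything is delivered
produce_rate = NUM_MESSAGES / (time.time() - start)
print(f"Producer rate: {produce_rate:,.0f} messages/sec")

# --- Consumer side: measure messages consumed per second ---
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    group_id="bench-group",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating once the topic is drained
)
count = 0
start = time.time()
for _ in consumer:
    count += 1
consume_rate = count / (time.time() - start)
print(f"Consumer rate: {consume_rate:,.0f} messages/sec")
```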
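Steps 3 to 9 are simple arithmetic, captured below as a small helper so the numbers are easy to re-run with your own benchmark results. The function name and the safety margin of 2 are just the conventions used in this post.

```python
import math

def size_cluster(target_msgs_per_sec, producer_rate, consumer_rate, safety_margin=2):
    """Back-of-the-envelope sizing following steps 3-9 above."""
    producers = math.ceil(target_msgs_per_sec / producer_rate)   # step 3
    consumers = math.ceil(target_msgs_per_sec / consumer_rate)   # step 4
    base = max(producers, consumers)                             # step 5
    partitions = base + safety_margin                            # step 6
    return {
        "partitions": partitions,                 # 13 + 2 = 15
        "producers": producers + safety_margin,   # 10 + 2 = 12
        "consumers": partitions + 1,              # one idle standby -> 16
        "brokers": partitions,                    # proposed as 15 in step 9
    }

# Example figures from this post: 100,000 msg/sec target,
# ~10,000 msg/sec per producer, ~8,000 msg/sec per consumer.
print(size_cluster(100_000, 10_000, 8_000))
# {'partitions': 15, 'producers': 12, 'consumers': 16, 'brokers': 15}
```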
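Once the partition count is decided, the topic can be created with that many partitions. A minimal sketch using kafka-python's admin client is below; the topic name, bootstrap address and replication factor of 3 are assumptions for illustration, not something prescribed by the calculation above.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Assumed bootstrap address and topic name; replication_factor=3 is a
# common choice but pick what suits your durability requirements.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="orders", num_partitions=15, replication_factor=3)
])
admin.close()
```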

The above is based on my own study and experience. In case you find anything wrong, feel free to comment. Thank you :).
