Step-by-Step Guide: How to Set Up a Kafka Cluster for High-Performance Distributed Data Processing
This blog is best suited for Ubuntu 22.04; some commands may vary on other Linux distributions, so please check the equivalent commands for your distribution.
Kafka & Zookeeper Installation Steps:-
In this blog, we are setting up a 3-node cluster on Ubuntu 22.04.
Install Kafka on all nodes of the cluster. You can download Kafka from the Apache Kafka website. ( kafka.apache.org/downloads )
Java needs to be installed before installing Kafka:- (skip if already installed)
# update the apt repository
sudo apt-get update
# install the default JDK (OpenJDK 11 on Ubuntu 22.04)
sudo apt install default-jdk -y
# check the java version
java --version
- Create & export the Java profile:-
# append JAVA_HOME to the system-wide profile (requires root, hence tee)
echo 'JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64/"' | sudo tee -a /etc/profile
# export JAVA_HOME as an env variable in the current shell
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64/"
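As a quick optional check, you can confirm the profile entry works and that the JDK really lives at the path used above (java-11-openjdk-amd64 is the path the default-jdk package installs on Ubuntu 22.04; adjust it if your system reports something different):
# reload the profile and print JAVA_HOME
source /etc/profile
echo "$JAVA_HOME"
# show where the java binary actually resolves to
readlink -f "$(which java)"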
- Create folders for Kafka & Zookeeper:- (/opt/data/ is our installation dir.)
# create the kafka & zookeeper folders under /opt/data
sudo mkdir -p /opt/data/kafka /opt/data/zookeeper
# navigate to the kafka folder
cd /opt/data/kafka
# download Kafka from the website link (Zookeeper ships inside the Kafka archive)
sudo wget https://downloads.apache.org/kafka/3.6.1/kafka_2.12-3.6.1.tgz
# extract the downloaded .tgz file
sudo tar xzf kafka_2.12-3.6.1.tgz
# move the extracted contents into the kafka dir => /opt/data/kafka
sudo mv kafka_2.12-3.6.1/* /opt/data/kafka
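A quick optional sanity check that the scripts and configs referenced in the next steps landed under /opt/data/kafka:
# the start scripts used by the systemd units below
ls /opt/data/kafka/bin/zookeeper-server-start.sh /opt/data/kafka/bin/kafka-server-start.sh
# the config files we will edit later
ls /opt/data/kafka/config/zookeeper.properties /opt/data/kafka/config/server.properties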
- Create a zookeeper.service file in systemd:-
# create a file named "zookeeper.service" in the dir "/etc/systemd/system/"
sudo nano /etc/systemd/system/zookeeper.service
# copy the content below into the file created above.
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/opt/data/kafka/bin/zookeeper-server-start.sh /opt/data/kafka/config/zookeeper.properties
ExecStop=/opt/data/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
- Create a kafka.service file in systemd:-
# create a file named "kafka.service" in the dir "/etc/systemd/system/"
sudo nano /etc/systemd/system/kafka.service
# copy the content below into the file created above.
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
ExecStart=/opt/data/kafka/bin/kafka-server-start.sh /opt/data/kafka/config/server.properties
ExecStop=/opt/data/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
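Optionally, you can let systemd lint both unit files before loading them; systemd-analyze only reports problems and changes nothing:
# check both unit files for syntax and ordering issues
systemd-analyze verify /etc/systemd/system/zookeeper.service /etc/systemd/system/kafka.service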
- Daemon reload to load Kafka and Zookeeper into systemctl:-
# reload the daemon so that systemd picks up the new kafka & zookeeper unit files
sudo systemctl daemon-reload
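If you also want both services to come back automatically after a reboot, you can enable them now (optional; starting them is covered at the end of this guide):
# enable zookeeper & kafka to start on boot
sudo systemctl enable zookeeper.service kafka.service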
- Navigate to /opt/data/zookeeper and create a file named myid.
# the myid value must be unique per node and must match the server.N entries configured below
cd /opt/data/zookeeper
# set myid to 1 (on server 1)
echo '1' | sudo tee myid
# set myid to 2 (on server 2)
echo '2' | sudo tee myid
# set myid to 3 (on server 3)
echo '3' | sudo tee myid
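A quick optional check on each node to confirm the id written there is the one intended for that server:
# should print 1, 2 or 3 depending on the node
cat /opt/data/zookeeper/myid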
- Update the Zookeeper configuration file zookeeper.properties.
# navigate to the folder and edit the file
sudo nano /opt/data/kafka/config/zookeeper.properties
# in the zookeeper.properties file edit the below fields
################### CONFIG_START ################
dataDir=/opt/data/zookeeper
clientPort=2181
admin.enableServer=false
maxClientCnxns=300
tickTime=2000
# server ip addresses of node-1, node-2 and node-3
server.1=10.103.5.7:2888:3888
server.2=10.103.5.8:2888:3888
server.3=10.103.5.9:2888:3888
initLimit=40
syncLimit=20
################### CONFIG_END ################
Note:- Add the same Zookeeper configuration on all nodes; nothing here is node-specific.
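If a firewall such as ufw is active on these nodes (an assumption; skip this if it is not), the ports used in this guide must be reachable between the nodes: 2181 (Zookeeper clients), 2888/3888 (Zookeeper quorum & leader election) and 9092 (Kafka listener):
# allow the zookeeper and kafka ports (only needed if ufw is enabled)
sudo ufw allow 2181/tcp
sudo ufw allow 2888/tcp
sudo ufw allow 3888/tcp
sudo ufw allow 9092/tcp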
- Update the Kafka configuration file server.properties.
# navigate to the folder and edit the file
sudo nano /opt/data/kafka/config/server.properties
# in the server.properties file edit the below fields (node-1 values shown)
################### CONFIG_START ################
# in the Server Basics block update the broker id; it must be unique per node:
#   node-1 -> broker.id=0, node-2 -> broker.id=1, node-3 -> broker.id=2
broker.id=0
# in the Socket Server Settings block set listeners to the node's own ip:
#   node-1 -> 10.103.5.7, node-2 -> 10.103.5.8, node-3 -> 10.103.5.9
listeners=PLAINTEXT://10.103.5.7:9092
# in the Log Retention Policy block simply uncomment the below
log.segment.bytes=1073741824
# in the Zookeeper block comment out the default connection string
#zookeeper.connect=localhost:2181
# and add all three node ips instead
zookeeper.connect=10.103.5.7:2181,10.103.5.8:2181,10.103.5.9:2181
################### CONFIG_END ################
Note:- Some Kafka settings are node-specific, e.g. broker.id and listeners; everything else is identical across nodes.
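Before starting anything, a quick optional way to double-check the node-specific values on each server:
# print the settings that must be correct on this particular node
grep -E '^(broker.id|listeners|zookeeper.connect)' /opt/data/kafka/config/server.properties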
- Start the Zookeeper and Kafka services on all three nodes.
# start the zookeeper and kafka services
sudo systemctl daemon-reload
sudo systemctl start zookeeper.service
sudo systemctl start kafka.service
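To see whether the Zookeeper quorum actually formed, you can ask each node for its mode; this assumes netcat (nc) is installed, and relies on the srvr four-letter command, which is whitelisted by default in the Zookeeper bundled with Kafka:
# one node should report "Mode: leader", the other two "Mode: follower"
echo srvr | nc localhost 2181 | grep Mode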
- Execute all the above steps & commands on all three nodes.
After successful configuration and starting of the services, your Kafka cluster is up and running. You can check with:- sudo systemctl status kafka.service --no-pager
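As a final optional end-to-end check, you can create a topic replicated across all three brokers and describe it from any node; the topic name cluster-test is just an example, and the ip is one of the nodes used throughout this guide:
# create a topic with 3 partitions replicated across the 3 brokers
/opt/data/kafka/bin/kafka-topics.sh --create --topic cluster-test --partitions 3 --replication-factor 3 --bootstrap-server 10.103.5.7:9092
# describe it; each partition should show a leader plus replicas/ISRs spread over broker ids 0, 1 and 2
/opt/data/kafka/bin/kafka-topics.sh --describe --topic cluster-test --bootstrap-server 10.103.5.7:9092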
Feel free to ask queries related to this topic; I will be happy to help you.
Connect with me:- utkarshsri0701@gmail.com / serv-ar-tistry Studio