Apache Kafka for Noobs

Lakshan Gunarathna
Published in Analytics Vidhya
6 min read · May 23, 2021


This article introduces Apache Kafka and walks through setting up and configuring Apache Kafka in a Windows environment. It also covers creating a Kafka Producer and Consumer.

What is Apache Kafka?

Apache Kafka and Key Terminologies

Apache Kafka is a distributed streaming platform that lets applications publish (push) records to streams of data and subscribe to those streams. Apache Kafka is fast, scalable, durable, and fault-tolerant. It can also handle large volumes of data and can be used for real-time data processing.

Key Features of Apache Kafka

Key Terminologies

  • Kafka maintains feeds of messages in categories called topics.
  • Processes that publish messages into a Kafka topic are called producers.
  • Processes that subscribe to topics and that process the feed of published messages are called consumers.
  • Kafka runs as a cluster of one or more servers, each of which is called a broker.
  • All components communicate over TCP (Transmission Control Protocol) using a simple, high-performance binary protocol.
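To make the terminology concrete, here is a minimal Python sketch that models these roles in memory. A broker keeps each topic as an append-only list of records. A producer appends to that list. Each consumer tracks its own offset (its position in the list). The class names `ToyBroker`, `ToyProducer` and `ToyConsumer` are hypothetical. This is not Kafka's client API, and it ignores partitions, replication and networking entirely.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for one broker (illustration only, not Kafka's API)."""

    def __init__(self):
        # Each topic maps to an append-only list of records (its log).
        self.topics = defaultdict(list)

    def append(self, topic, value):
        """Append a record to the topic's log and return its offset."""
        log = self.topics[topic]
        log.append(value)
        return len(log) - 1

    def read(self, topic, offset):
        """Return every record at or after the given offset."""
        return self.topics[topic][offset:]


class ToyProducer:
    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, value):
        # Producers only write; reading never removes records from the log.
        return self.broker.append(topic, value)


class ToyConsumer:
    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # position of the next record to read

    def poll(self):
        records = self.broker.read(self.topic, self.offset)
        self.offset += len(records)  # move past the records just returned
        return records


broker = ToyBroker()
producer = ToyProducer(broker)
consumer_a = ToyConsumer(broker, "noobtopic")
consumer_b = ToyConsumer(broker, "noobtopic")

producer.send("noobtopic", "hello")
producer.send("noobtopic", "kafka")
print(consumer_a.poll())  # ['hello', 'kafka']
print(consumer_a.poll())  # []
print(consumer_b.poll())  # ['hello', 'kafka']
```

Reading does not delete records. A second consumer with its own offset therefore still receives both messages.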

Kafka Set-up and Configuration

1. Install Java JDK 8 for Windows from the link below:

2. Download Apache Kafka from the link below:

3. Unzip the downloaded ‘kafka_2.13-2.8.0.tgz’ archive in the Windows environment.

Unzipped Kafka Folder

4. Modify the Log Path in the ‘server.properties’ file

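Open ‘C:\kafka_2.13-2.8.0\config\server.properties’ and search for the ‘log.dirs’ key. Update its value so that the logs are stored under the unzipped folder location ‘C:\kafka_2.13-2.8.0’. Write the path with forward slashes, because a backslash is an escape character in .properties files.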
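If you prefer to script this edit, the minimal Python sketch below rewrites the active `log.dirs` line of `server.properties`. It assumes the file uses the `key=value` form, as the shipped file does. It converts the path to forward slashes for the escaping reason mentioned above. The function name `set_log_dirs` is hypothetical.

```python
import re
from pathlib import Path

# Matches an active (uncommented) log.dirs entry, allowing spaces around '=' or ':'.
LOG_DIRS_LINE = re.compile(r"\s*log\.dirs\s*[=:]")


def set_log_dirs(properties_file, log_dir):
    """Point log.dirs in a server.properties file at log_dir; return the value written."""
    path = Path(properties_file)
    # Backslashes are escape characters in .properties files, so use forward slashes.
    value = str(log_dir).replace("\\", "/")
    lines = path.read_text(encoding="utf-8").splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if LOG_DIRS_LINE.match(line):
            lines[i] = f"log.dirs={value}"
            replaced = True
    if not replaced:
        # No active entry found: append one at the end of the file.
        lines.append(f"log.dirs={value}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return value
```

For example, you could call `set_log_dirs(r"C:\kafka_2.13-2.8.0\config\server.properties", r"C:\kafka_2.13-2.8.0\kafka-logs")`. The `kafka-logs` subfolder name is our own illustrative choice; the setup does not require it.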

Now, let us look into the following commands to start up the Apache Kafka server along with Consumers and Producers for a topic.

3. Running Apache Zookeeper Server

Apache Zookeeper is a centralized service that maintains naming and configuration data and provides synchronization within distributed systems. For Kafka, Zookeeper keeps track of the status of the cluster's nodes (brokers). It also tracks Kafka topics, partitions, and similar metadata. Zookeeper allows multiple clients to read and write simultaneously and acts as a shared configuration service within the system. At its core is the ZAB (Zookeeper Atomic Broadcast) protocol, which makes Zookeeper an atomic broadcast system that applies updates in a consistent order.

Let us start the Apache Zookeeper Server in our Windows environment.

Browse to the ‘C:\kafka_2.13-2.8.0\bin\windows’ folder.

Zookeeper Folder Location ‘C:\kafka_2.13-2.8.0\bin\windows’

Open a Command Prompt in ‘C:\kafka_2.13-2.8.0\bin\windows’ and run the following:

C:\kafka_2.13-2.8.0\bin\windows>zookeeper-server-start.bat ../../config/zookeeper.properties

This command starts the Apache Zookeeper Server in our Windows environment. It uses the configuration settings defined in ‘C:\kafka_2.13-2.8.0\config\zookeeper.properties’.

Remember: the Apache Zookeeper Server runs on port 2181 (set by ‘clientPort’ in zookeeper.properties).

Command Prompt Log for Apache Zookeeper Server Start-up

Now, let us start our Apache Kafka Server.

4. Running Apache Kafka Server

To start running the Apache Kafka Server, Apache Zookeeper should be running in the background.

Open another Command Prompt in ‘C:\kafka_2.13-2.8.0\bin\windows’ and run the following:

C:\kafka_2.13-2.8.0\bin\windows>kafka-server-start.bat ../../config/server.properties

This command starts the Apache Kafka Server in our Windows environment. It uses the configuration settings defined in ‘C:\kafka_2.13-2.8.0\config\server.properties’.

Remember: the Apache Kafka Server listens on port 9092.
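Kafka needs a running Zookeeper, so the start-up order matters. The minimal Python sketch below is Windows only and assumes Kafka is unzipped to `C:\kafka_2.13-2.8.0`. It launches the two start scripts in that order, each in its own console window like the separate Command Prompts above. Before moving on, it waits until each server accepts TCP connections on its port: first 2181, then 9092. The helper names are hypothetical. An open port only shows that something is listening, not that the server is fully healthy.

```python
import socket
import subprocess
import time
from pathlib import PureWindowsPath


def start_commands(kafka_home):
    """Return (command, port) pairs for Zookeeper and Kafka, in the order they must start."""
    home = PureWindowsPath(kafka_home)
    scripts = home / "bin" / "windows"
    config = home / "config"
    return [
        ([str(scripts / "zookeeper-server-start.bat"), str(config / "zookeeper.properties")], 2181),
        ([str(scripts / "kafka-server-start.bat"), str(config / "server.properties")], 9092),
    ]


def port_open(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_all(kafka_home, wait_seconds=60.0):
    """Windows only: start each server in a new console and wait for its port to open."""
    processes = []
    for command, port in start_commands(kafka_home):
        processes.append(
            subprocess.Popen(command, creationflags=subprocess.CREATE_NEW_CONSOLE)
        )
        deadline = time.monotonic() + wait_seconds
        while not port_open("localhost", port):
            if time.monotonic() > deadline:
                raise RuntimeError(f"nothing listening on port {port} after {wait_seconds} s")
            time.sleep(1)
    return processes
```

Usage would be `start_all(r"C:\kafka_2.13-2.8.0")`. The scripts are passed as absolute paths, so the launcher does not depend on the current working directory.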

5. Creating Kafka Topics

A Kafka topic is the category, or feed name, to which messages are published and in which they are stored. Kafka's messaging architecture is organized around topics: every record sent and received in Kafka belongs to a specific topic. A producer writes records to a specific topic, and the consumers subscribed to that topic read them.

Since every message belongs to a specific topic, we first need to create a topic before creating the Producer or Consumer.

Open another Command Prompt in ‘C:\kafka_2.13-2.8.0\bin\windows’ and run the following:

kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic noobtopic
Successful Kafka Topic Creation

We are running a single Kafka broker, so we set ‘replication-factor’ to 1, because a topic cannot have more replicas than there are brokers. We also create a single partition (‘partitions’ set to 1). Let’s list the created topics using the command below.

kafka-topics.bat --list --zookeeper localhost:2181
Listing the Created Kafka Topics
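The single-broker settings used above (replication factor 1, one partition) follow from a simple rule: a topic cannot have more replicas than there are brokers. The minimal Python sketch below encodes that check. For illustration, it also maps record keys to partitions. The function names are hypothetical. The partitioner is a simplified stand-in that uses CRC32 (`zlib.crc32`); Kafka's default partitioner hashes keys with murmur2 instead.

```python
import zlib


def check_topic_config(replication_factor, partitions, broker_count):
    """Reject settings a cluster of broker_count brokers cannot satisfy."""
    if partitions < 1:
        raise ValueError("a topic needs at least one partition")
    if not 1 <= replication_factor <= broker_count:
        raise ValueError(
            f"replication factor {replication_factor} needs at least that many brokers, "
            f"but only {broker_count} are available"
        )


def partition_for(key, partitions):
    """Simplified key-to-partition mapping: the same key always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % partitions


check_topic_config(replication_factor=1, partitions=1, broker_count=1)  # our 'noobtopic'
```

With a single partition, every key maps to partition 0. All records in ‘noobtopic’ therefore share one ordered log.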

6. Creating Kafka Producers

Kafka Data Producer and Consumer

We created the Kafka topic ‘noobtopic’ in the previous step. Now we will create a Producer that feeds data into the topic's stream. The Producer sends records to the Kafka server, which stores them under the specified topic.

Open another Command Prompt in ‘C:\kafka_2.13-2.8.0\bin\windows’ and run the following:

kafka-console-producer.bat --broker-list localhost:9092 --topic noobtopic

This command creates a Producer for the topic ‘noobtopic’. Each line typed into this Command Prompt is sent to the topic as a separate record.
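The console producer turns each input line into one record, so you can also feed it from a script by writing lines to its standard input. The minimal Python sketch below does that; `produce_lines` is Windows only, and both function names are hypothetical. It refuses values that contain line breaks, because the console producer would split such a value into several records.

```python
import subprocess


def console_payload(values):
    """Join values into console-producer input: one line per record."""
    for value in values:
        if "\n" in value or "\r" in value:
            raise ValueError(f"{value!r} contains a line break and would become several records")
    return "".join(f"{value}\n" for value in values)


def produce_lines(producer_bat, topic, values, broker="localhost:9092"):
    """Windows only: pipe values into kafka-console-producer.bat, which should exit at end of input."""
    subprocess.run(
        [producer_bat, "--broker-list", broker, "--topic", topic],
        input=console_payload(values),
        text=True,
        check=True,
    )
```

For example, `produce_lines(r"C:\kafka_2.13-2.8.0\bin\windows\kafka-console-producer.bat", "noobtopic", ["hello", "kafka"])` would send two records.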

7. Creating Kafka Consumer

Relationship between Consumers and Producers

Records added by a Producer can be consumed by any Consumer subscribed to the same topic. Let us now create a Consumer to consume the records from the topic ‘noobtopic’.

Open another Command Prompt in ‘C:\kafka_2.13-2.8.0\bin\windows’ and run the following:

kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic noobtopic

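With the Consumer running, we can consume the incoming stream of records from the Producer. We could also create multiple Consumers for the topic. Each console Consumer joins its own consumer group unless ‘--group’ is given, so each one receives every record the Producer adds to the topic. By default, the console consumer only shows records produced after it starts; add ‘--from-beginning’ to also read earlier records.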
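How many Consumers see each record depends on consumer groups. Within one group, each partition is read by only one Consumer, while every group reads the whole topic independently. The toy Python sketch below makes this concrete for our one-partition topic. The function names are hypothetical. It uses a simple round-robin assignment, whereas Kafka's partition assignment strategy is configurable.

```python
def assign_partitions(partition_count, consumers):
    """Toy round-robin assignment of partitions to the consumers of ONE group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(partition_count):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


def deliveries(records_by_partition, groups):
    """Map each consumer to the records it receives; every group reads the whole topic.

    Consumer names are assumed to be unique across groups.
    """
    received = {}
    for consumers in groups.values():
        assignment = assign_partitions(len(records_by_partition), consumers)
        for consumer, partitions in assignment.items():
            received[consumer] = [r for p in partitions for r in records_by_partition[p]]
    return received


# 'noobtopic' has a single partition holding two records.
topic = [["hello", "kafka"]]
print(deliveries(topic, {"group-a": ["c1"], "group-b": ["c2"]}))
# {'c1': ['hello', 'kafka'], 'c2': ['hello', 'kafka']}
print(deliveries(topic, {"shared": ["c1", "c2"]}))
# {'c1': ['hello', 'kafka'], 'c2': []}
```

In the second case, the extra Consumer in the same group sits idle. With one partition, only one member of a group can be assigned data.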

Now let’s look into the process of adding a record.

Pushing a record to the topic from the Producer and consuming it

Conclusion

I hope you now have a basic understanding of Apache Kafka and of how to set up and configure a Kafka server in a Windows environment.

Thanks for Reading! 😊 You are not a noob anymore😉

Stay connected for more.

Experienced Software Engineer | Data Science Enthusiast