Getting Started with Kafka: Your First Event Streaming Project
Introduction
In today’s world, real-time systems are everywhere — ride-sharing, stock trading, and live dashboards. These systems rely on processing streams of data instantly and reliably. That’s where Apache Kafka shines.
In this post, I’ll walk you through how I built a real-time event streaming system using Kafka — simulating a basic live event stream. This project is simple and beginner-friendly, ideal for those who want to take their first step into the Kafka ecosystem by building something fun and practical.
The Use Case: Simulated Real-Time Bidding
Imagine a scenario where:
- Users place bids in real time.
- Bids need to be validated.
- The highest bid is tracked live.
- Users get instant notifications.
- All bids are stored for future reference.
Kafka will act as the central event hub for all of this, using a topic called live-bids for incoming bids and a second topic, valid-bids, for bids that pass validation.
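To make the event contract concrete, here is the shape of a single bid as it travels through live-bids. The field names are the ones used throughout this post; Kafka itself only carries bytes, so we serialize to JSON on the way in and deserialize on the way out:

```python
import json

# One bid event, matching the producer we build in Step 2.
bid = {
    "auction_id": "A1",        # which auction this bid belongs to
    "user_id": 101,            # who placed the bid
    "amount": 450,             # bid amount
    "timestamp": 1700000000.0  # seconds since the epoch
}

# Serialize to bytes for Kafka...
payload = json.dumps(bid).encode("utf-8")

# ...and deserialize on the consumer side.
decoded = json.loads(payload.decode("utf-8"))
print(decoded == bid)  # True — the round trip preserves the event
```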
Architecture
[ Bidder App ] → [ Topic: live-bids ] → [ Bid Validator ] → [ Topic: valid-bids ]
                                                                    │
                                               ┌────────────────────┼────────────────────┐
                                               ↓                    ↓                    ↓
                                        [ Highest Bid ]      [ Notification ]     [ Persist to DB ]
Kafka decouples the producer from multiple consumers, enabling independent scaling and fault tolerance.
Step 1: Kafka Setup with Docker
Let’s keep setup simple using Docker. Create a docker-compose.yml:
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
Start Kafka:
docker-compose up -d
Once running, Kafka will be ready to accept messages (bids).
Step 2: Simulating a Bidder (Producer)
Let’s simulate a bidder that sends random bids.
producer_bidder.py
from kafka import KafkaProducer
import json, random, time

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

auction_ids = ['A1', 'A2']
users = [101, 102, 103]

while True:
    bid = {
        "auction_id": random.choice(auction_ids),
        "user_id": random.choice(users),
        "amount": random.randint(100, 1000),
        "timestamp": time.time()
    }
    producer.send('live-bids', bid)
    print("Sent:", bid)
    time.sleep(1)
Step 3: Bid Validator Consumer
consumer_validator.py
from kafka import KafkaConsumer, KafkaProducer
import json

consumer = KafkaConsumer(
    'live-bids',
    bootstrap_servers='localhost:9092',
    group_id='validator',
    auto_offset_reset='earliest',
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

for msg in consumer:
    bid = msg.value
    if bid['amount'] > 0:
        print("Valid:", bid)
        producer.send('valid-bids', bid)
    else:
        print("Invalid:", bid)
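The validation rule itself is plain Python, so it can be pulled out into a helper and unit-tested without a running broker. This is a sketch — the function name and the extra required-field check are my own additions, not part of the consumer above:

```python
REQUIRED_FIELDS = {"auction_id", "user_id", "amount", "timestamp"}

def is_valid_bid(bid):
    """Return True if the bid has all expected fields and a positive amount."""
    if not REQUIRED_FIELDS.issubset(bid):
        return False
    return isinstance(bid["amount"], (int, float)) and bid["amount"] > 0

# Quick checks, no Kafka needed:
print(is_valid_bid({"auction_id": "A1", "user_id": 101, "amount": 450, "timestamp": 0.0}))  # True
print(is_valid_bid({"auction_id": "A1", "user_id": 101, "amount": -5, "timestamp": 0.0}))   # False
```

Keeping the rule in one small function means the consumer loop stays a thin shell around testable logic.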
Step 4: Track Highest Bid
consumer_highest_bid.py
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'valid-bids',
    bootstrap_servers='localhost:9092',
    group_id='highest-bid',
    auto_offset_reset='earliest',
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)

highest_bids = {}

for msg in consumer:
    bid = msg.value
    auction = bid['auction_id']
    current = highest_bids.get(auction, {"amount": 0})
    if bid['amount'] > current['amount']:
        highest_bids[auction] = bid
        print(f"New high for {auction}: {bid['amount']} by User {bid['user_id']}")
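The core of this consumer is a small reduction over the stream: keep the maximum bid per auction. Factored out (the function name is mine), it is easy to exercise offline:

```python
def update_highest(highest_bids, bid):
    """Update the per-auction high-bid table in place; return True if this is a new high."""
    auction = bid["auction_id"]
    current = highest_bids.get(auction, {"amount": 0})
    if bid["amount"] > current["amount"]:
        highest_bids[auction] = bid
        return True
    return False

highs = {}
update_highest(highs, {"auction_id": "A1", "user_id": 101, "amount": 300})
update_highest(highs, {"auction_id": "A1", "user_id": 102, "amount": 250})  # not a new high
print(highs["A1"]["amount"])  # 300
```

Note that this state lives only in the consumer's memory and is rebuilt from the topic on restart (thanks to auto_offset_reset='earliest'); the Enhancements section mentions Kafka Streams and Flink for fault-tolerant state.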
Step 5: Real-Time Notifications
consumer_notifier.py
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'valid-bids',
    bootstrap_servers='localhost:9092',
    group_id='notifier',
    auto_offset_reset='earliest',
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)

for msg in consumer:
    bid = msg.value
    print(f"Notification: User {bid['user_id']} bid {bid['amount']} on {bid['auction_id']}")
Step 6: Persist Bids
consumer_persist.py
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'valid-bids',
    bootstrap_servers='localhost:9092',
    group_id='persistence',
    auto_offset_reset='earliest',
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)

with open("bids_log.txt", "a") as f:
    for msg in consumer:
        f.write(json.dumps(msg.value) + "\n")
        f.flush()  # don't let bids sit in the write buffer
        print("Saved:", msg.value)
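Because each line of bids_log.txt is one JSON document, replaying the history later takes only a few lines. A minimal sketch, using an in-memory file here so it runs without the consumer:

```python
import io
import json

# Stand-in for open("bids_log.txt") so this sketch runs standalone.
log = io.StringIO(
    '{"auction_id": "A1", "user_id": 101, "amount": 450, "timestamp": 0.0}\n'
    '{"auction_id": "A2", "user_id": 102, "amount": 720, "timestamp": 1.0}\n'
)

# One JSON object per line — parse each non-empty line back into a dict.
bids = [json.loads(line) for line in log if line.strip()]
print(len(bids), bids[1]["amount"])  # 2 720
```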
Commands to Run the System
- Start Kafka:
  docker-compose up -d
- Run producer:
  python producer_bidder.py
- Run consumers in separate terminals:
  python consumer_validator.py
  python consumer_highest_bid.py
  python consumer_notifier.py
  python consumer_persist.py
What You’ll Learn
- Kafka’s publish-subscribe model makes it easy to build decoupled systems.
- You can plug in new consumers (like analytics or fraud detection) without touching the producer.
- Kafka topics serve as a single source of truth for streaming data.
Possible Enhancements
- Use Kafka Streams or Apache Flink for stateful bid tracking.
- Store bids in a real database like PostgreSQL or MongoDB.
- Integrate WebSockets to stream live bids to users.
- Host Kafka on Confluent Cloud or AWS MSK for production-grade reliability.
Conclusion
This project is a great starting point for anyone curious about Kafka. It shows how Kafka can serve as the real-time backbone of an event-driven application.
This is not a production-grade auction system, but rather a hands-on example to demonstrate how Kafka works. You can use this same architecture for many real-world problems: payment processing, live leaderboards, and notification systems.
If you’re new to Kafka, give this project a shot. It’s simple, practical, and gets you up and running with real-time data pipelines.