Streaming data is often too big for any one machine. A streaming platform helps organize our pipelines.
A common pattern for managing streaming pipelines is publish-subscribe, similar to how Twitter operates:
- Producers publish streaming information.
- Consumers subscribe to specific "topics" to process, analyze, and generate alerts based on detected conditions.
In this project, we use Apache Kafka, a popular, open-source streaming platform. We write producers that send data to topics and consumers that read from topics.
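Once Kafka is running (setup steps below), the publish-subscribe pattern looks roughly like this minimal sketch using the kafka-python client. The client library, the broker address localhost:9092, and the topic name demo_topic are assumptions for illustration, not the project's actual settings:

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer role: publish a message to a topic (assumes a broker at localhost:9092).
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("demo_topic", value=b"hello, stream")
producer.flush()

# Consumer role: subscribe to the same topic and process whatever arrives.
consumer = KafkaConsumer(
    "demo_topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # start from the beginning of the topic
    consumer_timeout_ms=5000,       # stop iterating if no new messages arrive
)
for message in consumer:
    print(f"received: {message.value.decode('utf-8')}")
```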
Kafka is a sizable install and runs best in a Linux environment. On Windows machines, we'll run it inside the Windows Subsystem for Linux (WSL).
Before starting, ensure you have completed the setup tasks in https://github.com/denisecase/buzzline-01-case first. Python 3.11 is required.
In this task, we will download, install, configure, and start a local Kafka service.
- Install Windows Subsystem for Linux (Windows machines only)
- Install Kafka Streaming Platform
- Start the Zookeeper service (leave the terminal open).
- Start the Kafka service (leave the terminal open).
For detailed instructions, see:
- SETUP-KAFKA (all machines)
Copy/fork this project into your GitHub account to create your own version to run and experiment with.
Name it buzzline-02-yourname, where yourname is something unique to you.
Follow the instructions in FORK-THIS-REPO.md.
Follow the instructions in MANAGE-VENV.md to:
- Create your .venv
- Activate .venv
- Install the required dependencies using requirements.txt.
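Before running the producer and consumer, you can optionally confirm that the broker is reachable from your activated virtual environment. This is a quick check, sketched with the kafka-python client (the client library and the default broker address localhost:9092 are assumptions; your requirements.txt may specify a different client):

```python
from kafka import KafkaConsumer

# Connect to the local broker and list the topics it currently knows about.
# If this raises NoBrokersAvailable, Kafka (and Zookeeper) are not running yet.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
print("Topics visible on the broker:", consumer.topics())
consumer.close()
```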
Producers generate streaming data for our topics.
In VS Code, open a terminal. Use the commands below to activate .venv and start the producer.
Windows:
.venv\Scripts\activate
py -m producers.kafka_producer_case
Mac/Linux:
source .venv/bin/activate
python3 -m producers.kafka_producer_case
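If you want a sense of what a streaming producer does under the hood, here is a simplified sketch. This is not the actual kafka_producer_case module; the topic name, message format, and use of kafka-python are illustrative assumptions:

```python
import json
import time
from kafka import KafkaProducer

# Serialize Python dicts to JSON bytes before sending (an illustrative choice).
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish a small "buzz" message every second to a hypothetical topic.
for count in range(10):
    message = {"id": count, "text": f"buzz message {count}"}
    producer.send("buzzline_topic", value=message)
    print(f"sent: {message}")
    time.sleep(1)

producer.flush()
```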
Consumers process data from topics or logs in real time.
In VS Code, open a NEW terminal in your root project folder. Use the commands below to activate .venv and start the consumer.
Windows:
.venv\Scripts\activate
py -m consumers.kafka_consumer_case
Mac/Linux:
source .venv/bin/activate
python3 -m consumers.kafka_consumer_case
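For a sense of what a consumer loop looks like, here is a simplified sketch. This is not the actual kafka_consumer_case module; the topic name, alert condition, and kafka-python client are illustrative assumptions:

```python
import json
from kafka import KafkaConsumer

# Subscribe to the same hypothetical topic the producer sketch writes to.
consumer = KafkaConsumer(
    "buzzline_topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

# Process each message as it arrives; press Ctrl+C to stop.
# As an example of condition-based alerting, flag every fifth message.
for record in consumer:
    message = record.value
    print(f"received: {message}")
    if message.get("id", 0) % 5 == 0:
        print(f"ALERT: condition detected in message {message['id']}")
```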
When resuming work on this project:
- Open the folder in VS Code.
- Start the Zookeeper service.
- Start the Kafka service.
- Activate your local project virtual environment (.venv).
To save disk space, you can delete the .venv folder when not actively working on this project. You can always recreate it, activate it, and reinstall the necessary packages later. Managing Python virtual environments is a valuable skill.
This project is licensed under the MIT License as an example project. You are encouraged to fork, copy, explore, and modify the code as you like. See the LICENSE file for more.