Nearly every streaming analytics system stores processed data somewhere for further analysis, historical reference, or integration with BI tools.
In this example project, we incorporate a relational data store. We use SQLite, but the example could be altered to work with another relational database such as MySQL or PostgreSQL, or even a NoSQL store such as MongoDB.
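The storage pattern is small enough to sketch. Below is a minimal, illustrative pair of functions for initializing a SQLite table and inserting one processed message; the file path, table name, and column names are assumptions for this sketch, not the exact ones used in this project's sqlite script.

```python
import sqlite3
from pathlib import Path

# Illustrative path and schema -- check the project's sqlite script for the real names.
DB_PATH = Path("data") / "buzz.sqlite"


def init_db(db_path: Path = DB_PATH) -> None:
    """Create the messages table if it does not already exist."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS streamed_messages (
                   id        INTEGER PRIMARY KEY AUTOINCREMENT,
                   message   TEXT,
                   author    TEXT,
                   timestamp TEXT
               )"""
        )
        conn.commit()
    finally:
        conn.close()


def insert_message(message: dict, db_path: Path = DB_PATH) -> None:
    """Insert one processed message (a dict) into the messages table."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "INSERT INTO streamed_messages (message, author, timestamp) VALUES (?, ?, ?)",
            (message.get("message"), message.get("author"), message.get("timestamp")),
        )
        conn.commit()
    finally:
        conn.close()
```

Swapping in a different store mostly means replacing these two functions, which is one reason the project keeps them in a separate sqlite script rather than inside the consumers.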
Recommended VS Code extensions:

- Black Formatter by Microsoft
- Markdown All in One by Yu Zhang
- PowerShell by Microsoft (on Windows Machines)
- Pylance by Microsoft
- Python by Microsoft
- Python Debugger by Microsoft
- Ruff by Astral Software (Linter)
- SQLite Viewer by Florian Klampfer
- WSL by Microsoft (on Windows Machines)
Before starting, ensure you have completed the setup tasks in https://github.com/denisecase/buzzline-01-case and https://github.com/denisecase/buzzline-02-case.
Versions matter. Python 3.11 is required. See the instructions for the required Java JDK and more.
Once the tools are installed, fork or copy this project into your GitHub account to create your own version that you can run and experiment with. Follow the instructions in FORK-THIS-REPO.md.
OR: For more practice, add these example scripts or features to your earlier project. You'll want to check requirements.txt, .env, and the consumers, producers, and utils folders. Use your README.md to record your workflow and commands.
Follow the instructions in MANAGE-VENV.md to:
- Create your .venv
- Activate .venv
- Install the required dependencies using requirements.txt.
If Zookeeper and Kafka are not already running, you'll need to restart them. See the instructions in SETUP-KAFKA.md to start the Zookeeper service and then the Kafka service, each in its own terminal.
This will take two more terminals:
- One to run the producer which writes messages.
- Another to run the consumer which reads messages, processes them, and writes them to a data store.
Start the producer to generate the messages. The existing producer writes messages to a live data file in the data folder. If Zookeeper and Kafka services are running, it will try to write them to a Kafka topic as well. For configuration details, see the .env file.
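Conceptually, the dual-write behavior looks like the sketch below. It assumes the kafka-python package from requirements.txt; the file path, topic name, and server address are placeholders, since the real values come from the .env file and the code in producers/producer_case.py.

```python
import json
from pathlib import Path

from kafka import KafkaProducer
from kafka.errors import KafkaError

# Placeholder settings -- the real values are read from .env by the project code.
LIVE_DATA_FILE = Path("data") / "project_live.json"
KAFKA_TOPIC = "buzzline"
KAFKA_SERVER = "localhost:9092"


def get_kafka_producer() -> KafkaProducer | None:
    """Return a Kafka producer, or None if the broker is unreachable."""
    try:
        return KafkaProducer(
            bootstrap_servers=KAFKA_SERVER,
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )
    except KafkaError:
        return None


def emit_message(message: dict, producer: KafkaProducer | None) -> None:
    """Always append to the live data file; also send to Kafka when it is available."""
    LIVE_DATA_FILE.parent.mkdir(exist_ok=True)
    with LIVE_DATA_FILE.open("a") as f:
        f.write(json.dumps(message) + "\n")
    if producer is not None:
        producer.send(KAFKA_TOPIC, value=message)
```

This is why the producer keeps working when Kafka is down: the file write happens regardless, and the Kafka send is simply skipped.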
In VS Code, open a NEW terminal. Use the commands below to activate .venv and start the producer.
Windows:

```shell
.venv\Scripts\activate
py -m producers.producer_case
```

Mac/Linux:

```shell
source .venv/bin/activate
python3 -m producers.producer_case
```
The producer will still work if Kafka is not available.
Start an associated consumer. You have two options.
- Start the consumer that reads from the live data file.
- OR Start the consumer that reads from the Kafka topic.
In VS Code, open a NEW terminal in your root project folder. Use the commands below to activate .venv and start the consumer.
Windows:

```shell
.venv\Scripts\activate
py -m consumers.kafka_consumer_case
```

OR

```shell
py -m consumers.file_consumer_case
```

Mac/Linux:

```shell
source .venv/bin/activate
python3 -m consumers.kafka_consumer_case
```

OR

```shell
python3 -m consumers.file_consumer_case
```
Review the requirements.txt file.
- What new requirements, if any, do we need for this project?
- Note that requirements.txt now lists both kafka-python and six.
- What are some common dependencies as we incorporate data stores into our streaming pipelines?
Review the .env file, which holds the environment variables (a loading sketch follows this list).
- Why is it helpful to put some settings in a text file?
- As we add database access and passwords, we start to keep two versions:
  - .env
  - .env.example
- Read the notes in those files. Which one is typically NOT added to source control?
- How do we ignore a file so it doesn't get published to GitHub? (Hint: .gitignore)
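To make the .env questions concrete, here is a minimal sketch of how a producer or consumer can load those settings at startup, assuming the python-dotenv package is listed in requirements.txt. The variable names are placeholders; check .env.example for the keys this project actually uses.

```python
import os

from dotenv import load_dotenv

# Read key=value pairs from the .env file into the process environment.
load_dotenv()

# Placeholder keys -- see .env.example for the real ones.
KAFKA_TOPIC = os.getenv("KAFKA_TOPIC", "buzzline")
KAFKA_SERVER = os.getenv("KAFKA_SERVER", "localhost:9092")
SQLITE_PATH = os.getenv("SQLITE_PATH", "data/buzz.sqlite")
```

Because the code reads only the key names, real values (including any passwords) stay in the untracked .env file, while .env.example documents the expected keys without secrets.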
Review the .gitignore file.
- What new entry has been added?
Review the code for the producer and the two consumers.
- Understand how the information is generated by the producer.
- Understand how the different consumers read, process, and store information in a data store.
Compare the consumer that reads from a live data file and the consumer that reads from a Kafka topic (a side-by-side sketch follows this list).
- Which functions are the same for both?
- Which parts are different?
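In broad strokes, only the message-acquisition loop differs between the two consumers; the processing and storage step can be shared. Here is a hedged sketch of that split, with illustrative function names rather than the project's actual ones:

```python
import json
import time
from pathlib import Path

from kafka import KafkaConsumer


def process_message(message: dict) -> None:
    """Shared step: transform the message and store it (e.g., via the SQLite functions sketched earlier)."""
    print(f"Storing: {message}")


def consume_from_file(live_file: Path) -> None:
    """Tail the live data file, processing each new JSON line as it appears."""
    with live_file.open("r") as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if line.strip():
                process_message(json.loads(line))
            else:
                time.sleep(0.5)  # wait for the producer to write more


def consume_from_kafka(topic: str, server: str) -> None:
    """Read messages from a Kafka topic and process each one."""
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=server,
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for record in consumer:
        process_message(record.value)
```

The shared processing function is also the answer to the refactoring question further down: duplicated logic can be written once (for example in a utility module) and imported by both consumers.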
What files are in the utils folder?
- Why bother breaking functions out into utility modules? (A small example follows this list.)
- Would similar streaming projects be likely to take advantage of any of these files?
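As a small, hypothetical example of why utility modules pay off, a shared logger setup can be written once and imported by every producer and consumer (the module and function names here are illustrative, not necessarily the ones in this project's utils folder):

```python
# utils/utils_logger.py -- hypothetical module name, for illustration only.
import logging


def get_logger(name: str) -> logging.Logger:
    """Return a logger configured once and reusable by producers and consumers."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Any similar streaming project could reuse a module like this unchanged, which is the point of breaking such functions out.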
What files are in the producers folder?
- How do these compare to earlier projects?
- What has been changed?
- What has stayed the same?
What files are in the consumers folder?
- This is where the processing and storage take place.
- Why did we make a separate file for reading from the live data file vs. reading from the Kafka topic?
- What functions are in each?
- Are any of the functions duplicated?
- Can you refactor the project so we could write a duplicated function just once and reuse it?
- What functions are in the sqlite script?
- What functions might be needed to initialize a different kind of data store?
- What functions might be needed to insert a message into a different kind of data store? (A hypothetical PostgreSQL sketch follows this list.)
- Did you run the Kafka consumer or the live file consumer? Why?
- Can you use the examples to add a database to your own streaming applications?
- What parts are most interesting to you?
- What parts are most challenging?
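To make the "different kind of data store" questions concrete, here is a hypothetical PostgreSQL version of the initialize and insert functions, using the psycopg2 package (which is not in this project's requirements.txt); the connection settings, table, and columns are placeholders:

```python
import psycopg2  # hypothetical dependency -- not part of this project's requirements.txt

# Placeholder connection settings; in a real project these would come from .env.
PG_SETTINGS = dict(host="localhost", dbname="buzz", user="buzz_user", password="change-me")


def init_db() -> None:
    """Create the messages table in PostgreSQL if it does not already exist."""
    conn = psycopg2.connect(**PG_SETTINGS)
    try:
        with conn.cursor() as cur:
            cur.execute(
                """CREATE TABLE IF NOT EXISTS streamed_messages (
                       id        SERIAL PRIMARY KEY,
                       message   TEXT,
                       author    TEXT,
                       timestamp TEXT
                   )"""
            )
        conn.commit()
    finally:
        conn.close()


def insert_message(message: dict) -> None:
    """Insert one processed message into the PostgreSQL table."""
    conn = psycopg2.connect(**PG_SETTINGS)
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO streamed_messages (message, author, timestamp) VALUES (%s, %s, %s)",
                (message.get("message"), message.get("author"), message.get("timestamp")),
            )
        conn.commit()
    finally:
        conn.close()
```

Note that only the connection handling and SQL placeholders change; the surrounding consumer logic can stay the same.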
When resuming work on this project:
- Open the folder in VS Code.
- Open a terminal and start the Zookeeper service. On Windows, remember to start WSL first.
- Open a terminal and start the Kafka service. On Windows, remember to start WSL first.
- Open a terminal to start the producer. Remember to activate your local project virtual environment (.venv).
- Open a terminal to start the consumer. Remember to activate your local project virtual environment (.venv).
To save disk space, you can delete the .venv folder when not actively working on this project. You can always recreate it, activate it, and reinstall the necessary packages later. Managing Python virtual environments is a valuable skill.
This project is licensed under the MIT License as an example project. You are encouraged to fork, copy, explore, and modify the code as you like. See the LICENSE file for more.