This project is a PDF-based chatbot application built with Streamlit, Langchain, and Ollama. Users upload a PDF document and then ask questions about its content. The application stores embeddings of the PDF text in a Chroma vector database, enabling efficient retrieval of the passages relevant to each query.
1. Create a virtual environment:
```
python -m venv env
```
2. Activate the environment (Windows):
```
env\Scripts\activate
```
3. Clone this repository:
```
git clone https://github.com/sanjayram-a/PDF-Chatbot.git
```
4. Install the necessary libraries:
```
pip install -r requirement.txt
```
5. Install the necessary models (optional):
```
install.bat
```
6. Run the app:
```
streamlit run app.py
```
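Streamlit prints a local URL when it starts; by default the app is served at http://localhost:8501.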
- PDF Upload: Users can upload PDF files for querying.
- Question Answering: The application answers questions based on the uploaded PDF content.
- Session Management: The application manages sessions to handle multiple users and clean up temporary files after inactivity.
The application consists of two main Python files:
- `app.py`: Contains the Streamlit application logic. It handles user interaction and file uploads, renders the chatbot interface, and calls the functions defined in `backend.py` to process the PDF and answer questions.
- `backend.py`: Contains the core logic for processing the PDF, creating embeddings, and answering questions. It uses Langchain to load the PDF, split it into chunks, embed the chunks with Ollama embeddings, and store them in a Chroma vector database. Ollama also powers both the PDF-grounded question answering and a general AI fallback (see the sketches below).
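The indexing half of `backend.py` follows the standard Langchain load/split/embed/store pattern. Below is a minimal sketch of that flow using the `langchain_community` packages; the chunk sizes and the `index_pdf` function name are illustrative assumptions, not necessarily what `backend.py` uses.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

def index_pdf(pdf_path: str, persist_dir: str) -> Chroma:
    # Load the PDF into one Document per page.
    pages = PyPDFLoader(pdf_path).load()

    # Split pages into overlapping chunks so each embedding covers
    # a focused span of text. Sizes here are assumptions.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(pages)

    # Embed each chunk with Ollama and persist the vectors in Chroma.
    embeddings = OllamaEmbeddings(model="nomic-embed-text")
    return Chroma.from_documents(chunks, embeddings, persist_directory=persist_dir)
```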
The application uses Ollama for both the large language model (LLM) and the embedding model. The specific models used are "gemma2:2b" for the LLM and "nomic-embed-text" for embeddings.
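Both models must be present in the local Ollama instance before the app can answer questions; `install.bat` presumably pulls them for you. To fetch them manually instead:

```
ollama pull gemma2:2b
ollama pull nomic-embed-text
```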
The application creates temporary folders and files to store uploaded PDFs and the Chroma vector database. These are automatically deleted after a period of inactivity.
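The README does not spell out how the inactivity cleanup is implemented; one plausible standard-library sketch is shown below. The folder name and the 15-minute timeout are assumptions, not values from the code.

```python
import os
import shutil
import time

TEMP_ROOT = "temp"          # assumed root for per-session folders
INACTIVITY_LIMIT = 15 * 60  # assumed timeout: 15 minutes

def cleanup_stale_sessions() -> None:
    """Delete session folders that haven't been modified recently."""
    if not os.path.isdir(TEMP_ROOT):
        return
    now = time.time()
    for name in os.listdir(TEMP_ROOT):
        folder = os.path.join(TEMP_ROOT, name)
        if os.path.isdir(folder) and now - os.path.getmtime(folder) > INACTIVITY_LIMIT:
            shutil.rmtree(folder, ignore_errors=True)
```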
- User uploads a PDF: The PDF is saved to a temporary folder.
- PDF is processed: The PDF is loaded, split into chunks, and embeddings are generated.
- Embeddings are stored: The embeddings are stored in a Chroma vector database.
- User asks a question: The question is processed, and relevant information is retrieved from the vector database.
- Answer is generated: The LLM generates an answer based on the retrieved information.
- Answer is displayed: The answer is displayed to the user.
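Tying the last three steps together, here is a hedged sketch of the answering path, including the general AI fallback mentioned earlier. The prompt wording, the `k` value, and the `answer` function name are illustrative, not taken from `backend.py`.

```python
from langchain_community.llms import Ollama

llm = Ollama(model="gemma2:2b")

def answer(question: str, vectordb) -> str:
    # Retrieve the chunks most similar to the question.
    docs = vectordb.similarity_search(question, k=4)
    if docs:
        # Ground the answer in the retrieved PDF context.
        context = "\n\n".join(d.page_content for d in docs)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    else:
        # General AI fallback when nothing relevant is found.
        prompt = question
    return llm.invoke(prompt)
```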
- Delete temporary folders when the user exits (work in progress).
- Add support for other document formats.
- Improve the user interface.