This repository contains a Retrieval-Augmented Generation (RAG) system that indexes and queries Confluence wiki pages using Azure AI and the Qdrant vector database.
- Confluence Page Extraction: Retrieves and processes pages from a Confluence space.
- Text Chunking: Splits content into manageable chunks for efficient embedding.
- Vector Storage: Stores embeddings in Qdrant for fast retrieval.
- Azure AI Integration: Uses Azure OpenAI models for embeddings and text generation.
- Web Interface: Provides a FastAPI-based web application for querying indexed data.
- Python 3.12+
- pip and venv, or uv
- Confluence API token
- Azure AI API key
- Qdrant server
- Clone the repository:
git clone <repository-url>
cd <repository>
- Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
pip install -r requirements.txt
- Copy the sample config and update configuration:
cp config.toml.sample config.toml
nano config.toml  # Edit with your details
- Copy the sample env file and update environment variables:
cp env.sample .env
nano .env  # Edit with your details
Before running the system, create a local directory for Qdrant storage and start the database:
mkdir -p qdrant_storage
podman run -d -p 6333:6333 -v ./qdrant_storage:/qdrant/storage qdrant/qdrant
Access the Qdrant dashboard at http://localhost:6333/dashboard
Run the script to fetch and store Confluence page embeddings in Qdrant:
python cc_index.py
Launch the FastAPI-based web UI:
uvicorn cc_webapp:app --host 0.0.0.0 --port 8000
Access the web UI at http://localhost:8000
Send a POST request to retrieve answers from indexed documents:
curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" \
-d '{"query": "What is the company policy on remote work?", "model": "<model deployment name>"}'
NOTE: Replace <model deployment name> in the curl command above with the name of your Azure AI model deployment.
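The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL, header, and JSON fields). The helper names `build_ask_request` and `ask` are illustrative and not part of this repository, and because the response format is not documented here, the body is returned as raw text.

```python
import json
import urllib.request


def build_ask_request(
    query: str,
    model: str,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST request for /ask with the same JSON body as the curl example."""
    payload = json.dumps({"query": query, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(
    query: str,
    model: str,
    base_url: str = "http://localhost:8000",
    timeout: float = 60.0,
) -> str:
    """Send the request and return the raw response body as text."""
    request = build_ask_request(query, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the web app to be running; replace the placeholder first.
    print(ask("What is the company policy on remote work?", "<model deployment name>"))
```

Keeping request construction separate from sending makes it possible to inspect or test the payload without a running server.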
Edit config.toml to update parameters such as:
- Confluence credentials
- Chunking strategy
- Azure AI model settings
- Qdrant database connection
- Data Ingestion
  - Fetches Confluence pages
  - Cleans and tokenizes text
  - Splits into chunks
  - Creates embeddings for chunks via Azure AI
  - Stores in Qdrant
- Query Processing
  - Receives user query
  - Creates embeddings for query via Azure AI
  - Retrieves relevant documents from Qdrant
  - Uses Azure AI to generate a response
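The chunking and retrieval steps above can be sketched without any external services. This is not the repository's implementation: a bag-of-words counter stands in for the Azure AI embedding call, an in-memory list stands in for Qdrant, and the function names and the word-based chunking with overlap are assumptions made for illustration.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks where neighbours share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks ranked by similarity to the query."""
    query_vector = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vector, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    pages = [
        "remote work is allowed on fridays",
        "the cafeteria opens at nine",
    ]
    print(retrieve("remote work policy", pages, top_k=1))
```

In the real pipeline the retrieved chunks would then be passed, together with the user query, to an Azure AI model to generate the answer.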
MIT License
- Cybergavin