Confluence Cloud RAG System

This repository contains a Retrieval-Augmented Generation (RAG) system that indexes Confluence wiki pages and answers questions about them, using Azure AI for embeddings and text generation and the Qdrant vector database for storage.

Quick Demo

cc_qa_bot_azureai.mp4

Features

Confluence Page Extraction: Retrieves and processes pages from a Confluence space.
Text Chunking: Splits content into manageable chunks for efficient embedding.
Vector Storage: Stores embeddings in Qdrant for fast retrieval.
Azure AI Integration: Uses Azure OpenAI models for embeddings and text generation.
Web Interface: Provides a FastAPI-based web application for querying indexed data.

Installation

Prerequisites

Python 3.12+
pip with venv, or uv
Confluence API token
Azure AI API key
Qdrant server

Setup

Clone the repository:

git clone <repository-url>
cd <repository>

Create a virtual environment and install dependencies:

python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
pip install -r requirements.txt

Copy the sample config file and edit it with your settings:

cp config.toml.sample config.toml
nano config.toml  # Edit with your details

Copy the sample env file and update environment variables:
```
cp env.sample .env
nano .env  # Edit with your details
```
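How the project itself loads .env is not shown here, and the real variable names are in env.sample. As a rough illustration of how KEY=VALUE lines turn into settings, the sketch below parses such text with the standard library only and reports unset values. The names AZURE_AI_API_KEY and CONFLUENCE_API_TOKEN are hypothetical placeholders.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical variable names; check env.sample for the real ones.
sample = """
# secrets for the RAG system
AZURE_AI_API_KEY="replace-me"
CONFLUENCE_API_TOKEN=replace-me-too
"""
env = parse_env(sample)
missing = [k for k in ("AZURE_AI_API_KEY", "CONFLUENCE_API_TOKEN") if not env.get(k)]
print("missing:", missing)
```

A check like this, run before indexing, can surface a forgotten token earlier than a failed API call would.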

Launch Qdrant Database

Before running the system, create a local directory for Qdrant storage and start the database:

mkdir -p qdrant_storage
podman run -d -p 6333:6333 -v ./qdrant_storage:/qdrant/storage qdrant/qdrant

Access the Qdrant dashboard at http://localhost:6333/dashboard
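Besides opening the dashboard, the container can be checked from Python before indexing. The sketch below calls Qdrant's REST endpoint that lists collections, assuming the default port mapping used above and no API key configured on the server. The helper names are illustrative and not part of this repository.

```python
import urllib.request


def collections_url(base_url: str) -> str:
    """Build the URL of Qdrant's REST endpoint that lists collections."""
    return base_url.rstrip("/") + "/collections"


def qdrant_is_reachable(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if the server answers the collections listing with HTTP 200."""
    try:
        with urllib.request.urlopen(collections_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


if __name__ == "__main__":
    print("Qdrant reachable:", qdrant_is_reachable())
```

If this prints False right after starting the container, waiting a few seconds and retrying is usually enough.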

Usage

Index Confluence Pages

Run the script to fetch and store Confluence page embeddings in Qdrant:

python cc_parse_index.py

Start the Web Application

Launch the FastAPI-based web UI:

uvicorn cc_webapp:app --host 0.0.0.0 --port 8000

Access the web UI at http://localhost:8000

Query the System via API

Send a POST request to retrieve answers from indexed documents:

curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" \
-d '{"query": "What is the company policy on remote work?", "model": "<model deployment name>"}'

NOTE: Replace <model deployment name> in the curl command above with the name of your Azure OpenAI model deployment.
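The same request can be built from Python with the standard library. The sketch below mirrors the curl command's URL, header, and JSON body. The function name build_ask_request is an illustrative helper, and the response format is not documented here, so the sending step is left commented out and simply prints whatever comes back.

```python
import json
import urllib.request


def build_ask_request(query: str, model: str,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the /ask endpoint with a JSON body."""
    payload = json.dumps({"query": query, "model": model}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ask_request(
    "What is the company policy on remote work?",
    "<model deployment name>",  # replace with your deployment name
)
print(req.get_method(), req.full_url)

# To send it once the web application is running:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(resp.read().decode("utf-8"))
```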

Configuration

Edit config.toml to update parameters like:

Confluence credentials
Chunking strategy
Azure AI model settings
Qdrant database connection

Architecture

Data Ingestion
- Fetches Confluence pages
- Cleans and tokenizes text
- Splits into chunks
- Creates embeddings for chunks via Azure AI
- Stores in Qdrant
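
The chunking step can be pictured as a sliding window over the page text. The sketch below is one common approach (fixed-size word windows with overlap) and is not necessarily the strategy this repository uses. The function name and parameters are assumptions for illustration.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size words; neighbours share overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


page_text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(page_text, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary at least partly present in both neighbouring chunks, at the cost of embedding some words twice.
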
Query Processing
- Receives user query
- Creates embeddings for query via Azure AI
- Retrieves relevant documents from Qdrant
- Uses Azure AI to generate a response
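
The query path can be sketched without any external service. In the real system Qdrant performs the similarity search and Azure AI produces the embeddings and the answer. Here a brute-force cosine ranking over made-up three-dimensional vectors stands in for both, and the chunk texts, the prompt wording, and all function names are illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up chunks with toy vectors standing in for real embeddings.
indexed = [
    ("Remote work is allowed three days a week.", [0.9, 0.1, 0.0]),
    ("The cafeteria opens at 8 am.", [0.0, 0.2, 0.9]),
    ("Remote work requests go through your manager.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the question
hits = top_k(query_vec, indexed, k=2)
prompt = build_prompt("What is the company policy on remote work?", hits)
print(prompt)
```

The resulting prompt is what would be handed to the text generation model. Restricting it to the top-ranked chunks keeps unrelated pages, such as the cafeteria entry, out of the context.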

License

MIT License

Contributors

Cybergavin
