
Chat with RAG: Modular Tool-Assisted RAG Pipeline


A modular Python framework for building Retrieval‑Augmented Generation (RAG) systems.

What This Project Provides

Chat with RAG implements a modular architecture for building tool‑assisted Retrieval‑Augmented Generation (RAG) systems. The framework combines the components described in the architectural pillars below.


System Architecture Overview

Figure: Chat with RAG architecture overview, showing multi-LLM orchestration, prompt registry, context management, observability, and embeddable interfaces.

🏗️ Architectural Pillars

This framework separates knowledge preparation from runtime reasoning and orchestration.

1. High-Fidelity Ingestion Engine (The “Memory”)

A robust workflow that transforms unstructured data into a structured, queryable knowledge base.
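As a sketch, the chunk-and-normalize stage of this workflow might look like the following. The function name and parameters are illustrative, not the project's actual API:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split normalized text into overlapping chunks (illustrative sketch)."""
    # Normalize whitespace first so chunk boundaries are stable across sources.
    normalized = " ".join(text.split())
    chunks = []
    step = size - overlap
    for start in range(0, max(len(normalized), 1), step):
        chunk = normalized[start:start + size]
        if chunk:  # skip the empty tail produced by empty input
            chunks.append(chunk)
    return chunks
```

Each chunk shares `overlap` characters with its neighbor, so a sentence split at a chunk boundary is still retrievable from at least one chunk.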

2. Tool-Assisted Response Pipeline (The “Brain”)

A modular execution flow in which the system determines the best path to an answer in real time.
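A minimal sketch of tool-assisted routing, assuming a hypothetical tool registry. In the real pipeline the model chooses the path; a keyword heuristic stands in for that decision here:

```python
from typing import Callable

# Hypothetical tool registry; names and behaviors are illustrative.
TOOLS: dict[str, Callable[[str], str]] = {
    # Demo only: never eval untrusted input in a real system.
    "calculator": lambda q: str(eval(q, {"__builtins__": {}})),
    "retrieval": lambda q: f"[top documents for: {q}]",
}

def route(query: str) -> str:
    """Pick an execution path for the query (keyword-heuristic stand-in)."""
    tool = "calculator" if any(c in query for c in "+-*/") else "retrieval"
    return TOOLS[tool](query)
```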

3. Runtime Intelligence

Operational Observability

Designed for developers who need visibility into each stage of the system’s operation:

Real-time SSE Streams: Watch the pipeline execute stage by stage (Rewrite → Retrieve → Tool Use → Synthesis).

Per-Turn Accounting: Precise tracking of token usage and actual cost for every single interaction.

Domain Isolation: Securely serve different knowledge bases and prompt configurations to different websites from a single backend.
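For readers unfamiliar with the SSE wire format the streams use, here is a minimal event parser. The `event:`/`data:` field names follow the SSE specification; the stage payloads shown in the test are hypothetical:

```python
def parse_sse(raw: str) -> list[dict[str, str]]:
    """Parse a Server-Sent Events stream into {event, data} dicts (sketch)."""
    events, current = [], {"event": "message", "data": ""}
    for line in raw.splitlines():
        if line.startswith("event:"):
            current["event"] = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Multi-line data fields are concatenated; a fuller parser
            # would join them with newlines per the SSE spec.
            current["data"] += line[len("data:"):].strip()
        elif line == "":  # a blank line terminates an event
            if current["data"]:
                events.append(current)
            current = {"event": "message", "data": ""}
    return events
```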

High-Level Pipeline Orchestration

The system is organized around two primary pipelines: document ingestion and chat orchestration.

Pipeline Flow
Ingestion: Documents / URLs → Load Sources → Extract & Parse → Chunk & Normalize → Metadata Augmentation → Embeddings → Vector Storage

Chat: User Prompt → Query Rewrite → Retrieval → Rerank → Context Assembly → LLM Inference → Tool Execution → Response Synthesis → Post-Processing → Final Response
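The chat flow can be sketched as a deterministic sequence of stage functions that each transform a shared state. The stage bodies below are illustrative stubs, not the project's implementation:

```python
from typing import Callable

# Illustrative stage stubs; each takes and returns the pipeline state.
def rewrite(state: dict) -> dict:
    state["query"] = state["query"].strip().lower()
    return state

def retrieve(state: dict) -> dict:
    state["docs"] = [f"doc about {state['query']}"]
    return state

def synthesize(state: dict) -> dict:
    state["answer"] = f"Based on {len(state['docs'])} doc(s): {state['query']}"
    return state

STAGES: list[Callable[[dict], dict]] = [rewrite, retrieve, synthesize]

def run_pipeline(query: str) -> dict:
    """Run the stages in a fixed, deterministic order."""
    state = {"query": query}
    for stage in STAGES:
        state = stage(state)
    return state
```

Keeping the stage list explicit is what makes the execution deterministic: every turn runs the same ordered steps, which is also what allows the SSE stream to report progress stage by stage.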

🗺️ Next Up (Roadmap)

Enhancements focused on retrieval precision and identity management:

Retrieval Enhancement: Implementing Query Expansion (Multi-query generation) to capture broader semantic intent.

Hybrid Search: Augmenting vector-based retrieval with text-based search (BM25) to improve keyword accuracy.

Advanced Reranking: Integration of cross-encoders for high-precision result filtering.

Identity Management: Adding user authentication and management to enhance existing multi-user session isolation.
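One common way to combine the planned BM25 and vector retrievers is Reciprocal Rank Fusion (RRF), which merges two rankings without having to calibrate their score scales. This sketch assumes each retriever returns an ordered list of document IDs:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several rankings into one.

    Each document scores sum(1 / (k + rank)) over the rankings it
    appears in; k=60 is the conventional smoothing constant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear near the top of both rankings ("b" and "c" in the test below) outrank documents that appear in only one.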

💻 Technical Foundation

This project is built using a modern, performant stack designed for modularity:

| Component | Technology | Role |
| --- | --- | --- |
| Vector Database | Qdrant | High-performance vector storage and collection management |
| Model Adapter | vrraj-llm-adapter | Unified interface for OpenAI, Gemini, and multi-provider orchestration |
| Backend Framework | FastAPI / Python | High-performance, asynchronous API delivery and SSE streaming |
| Frontend | HTML/CSS/JavaScript | Responsive UI with real-time pipeline visualization and embeddable widget |
| Orchestration | Custom Pipeline | Deterministic multi-stage execution (Rewrite/Rerank/Response Synthesis) |

🚀 Getting Started

Launch the entire stack—including the Qdrant vector database and the web application—using the provided bootstrap script:

git clone https://github.com/vrraj/chat-with-rag.git
cd chat-with-rag
bash scripts/rag_setup.sh

Add your OpenAI or Gemini API key to the .env file and start the application.
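A `.env` file along these lines should work; the variable names below are illustrative, so check the repository's configuration reference for the exact keys:

```shell
# Hypothetical .env sketch — consult the project docs for the exact variable names
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
```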

👉 http://localhost:8000

For the complete setup and configuration steps, see Getting Started in the README.


Use Cases

Chat with RAG supports several AI application patterns.

Application Interfaces

Chat with RAG provides three primary interfaces for different use cases.

These interfaces can also be deployed on a server and accessed by multiple users, making the framework useful for experimentation and collaborative testing.

Documentation

Core Documentation: Full Documentation (README) | API Reference | Configuration Reference

Architecture & Development: Technical Overview | Development Guide | Deployment Guide

Integration & Features: Embedded Chat Guide | Server-Sent Events | Troubleshooting Guide | Attributions

Story on Medium: Chat-with-rag: A Modular Reference Architecture for RAG


© 2026 Rajkumar Velliavitil — All Rights Reserved.

Source-available for personal, educational, and evaluation purposes.