
Configuration Reference

About this document

This page provides a comprehensive configuration reference for the Chat-with-RAG system, including environment variables, model settings, and runtime parameters.

Note: If you landed here directly (for example from documentation hosting or search), start with the repository README to see how to run the system locally and try the interactive demo.

Environment Variables

Create a .env file in the project root with these variables:

Core Settings

# API Keys (required)
OPENAI_API_KEY=sk-your-openai-key-here
GEMINI_API_KEY=your-gemini-key-here

# Server Configuration
HOST=0.0.0.0
PORT=8000
ALLOWED_ORIGINS=http://localhost:8000,http://127.0.0.1:8000

# Debug Settings
DEBUG_VERBOSE=false
DEBUG_LOG_KEYS=false
DEBUG_LOG_TRUNCATE_CHARS=200
SHOW_PROCESSING_STEPS=true

Database Settings

# Qdrant Configuration
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_API_KEY=  # Leave empty for local Qdrant

# Collection Settings
DEFAULT_COLLECTION=document_index
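
For illustration, this simple KEY=VALUE format can be parsed with a few lines of stdlib Python. The real system presumably loads these values via pydantic's BaseSettings or python-dotenv; parse_env here is a hypothetical helper that also strips inline comments like the one on QDRANT_API_KEY above:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comment lines,
    and inline comments after the value."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split("#", 1)[0].strip()  # drop trailing inline comment
        env[key.strip()] = value
    return env
```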

LLM Provider Settings

# OpenAI Configuration
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_ORG_ID=  # Optional

# Gemini Configuration  
GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1

Prompt Registry

# Prompt Debugging
PROMPT_REGISTRY_LOG_FULL=0  # Set to 1 to log full resolved prompts

Backend Configuration

Main Settings Class

Located in backend/core/config.py:

from typing import List, Optional

from pydantic_settings import BaseSettings  # pydantic v2; on v1: from pydantic import BaseSettings


class Settings(BaseSettings):
    # Server Settings
    host: str = "0.0.0.0"
    port: int = 8000
    allowed_origins: List[str] = ["http://localhost:8000"]
    
    # Database
    qdrant_host: str = "localhost"
    qdrant_port: int = 6333
    qdrant_api_key: Optional[str] = None
    
    # Default Models
    embedding_model: str = "openai:embed_small"
    inference_model: str = "openai:gpt-4o"
    rewrite_model: str = "openai:gpt-4o-mini"
    summary_model: str = "openai:gpt-4o-mini"
    rerank_model: str = "openai:gpt-4o-mini"
    tools_synth_model: str = "openai:gpt-4o-mini"
    
    # Processing Settings
    top_k: int = 8
    score_threshold: float = 0.35
    max_inference_output_tokens: int = 500
    temperature: float = 0.7
    top_p: float = 0.9
    
    # Chat Settings
    raw_tail_turns: int = 10
    summarizer_max_input_tokens: int = 4000
    summarizer_max_output_tokens: int = 128
    summarizer_temperature: float = 0.3
    
    # Query Rewrite
    enable_query_rewrite: bool = True
    rewrite_confidence_threshold: float = 0.6
    rewrite_tail_turns: int = 1
    rewrite_summary_turns: int = 3
    rewrite_cache_ttl_s: int = 300
    
    # Tools
    use_tools: bool = True
    use_web_search: bool = False
    max_tool_passes: int = 2
    
    # Embedding Settings
    embedding_batch_size: int = 100
    default_chunk_size: int = 800
    default_chunk_overlap: int = 100
    max_chunks_per_doc: int = 0
    
    # Reasoning
    inference_reasoning_effort: str = "low"
    inference_reasoning_model: bool = False
    debug_thoughts: bool = True
    
    # Debug
    debug_verbose: bool = False
    debug_log_keys: bool = False
    debug_log_truncate_chars: int = 200
    show_processing_steps: bool = True
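
The env-var precedence that BaseSettings provides (environment values override the field defaults above) can be approximated with a stdlib-only sketch. MiniSettings is a hypothetical stand-in covering two of the fields, not part of the codebase:

```python
import os
from dataclasses import dataclass


@dataclass
class MiniSettings:
    # Stripped-down stand-in for the Settings class above
    port: int = 8000
    debug_verbose: bool = False

    @classmethod
    def from_env(cls) -> "MiniSettings":
        # BaseSettings reads env vars named after each field (case-insensitive)
        port = int(os.environ.get("PORT", cls.port))
        debug = os.environ.get("DEBUG_VERBOSE", str(cls.debug_verbose))
        return cls(port=port, debug_verbose=debug.lower() in ("1", "true", "yes"))
```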

Content Processing Configuration

from typing import List

from pydantic import BaseModel


class MediaWikiConfig(BaseModel):
    api_url: str = "https://en.wikipedia.org/w/api.php"
    user_agent: str = "WebsiteChatAgent/0.1 (contact@example.com)"
    max_chunks: int = 0  # 0 = no limit
    skip_sections: List[str] = [
        "References", "External links", "See also", "Further reading"
    ]
    estimate: bool = True
    force_delete: bool = False

class HTMLConfig(BaseModel):
    max_chunks: int = 0
    skip_sections: List[str] = [
        "References", "External links", "See also", "Further reading"
    ]
    estimate: bool = True
    force_delete: bool = False

class PDFConfig(BaseModel):
    max_chunks: int = 0
    skip_sections: List[str] = [
        "References", "External links", "Further reading", 
        "Notes", "See Also", "Acknowledgements"
    ]
    estimate: bool = True
    force_delete: bool = False
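
All three configs share the skip_sections idea: boilerplate sections are dropped before chunking. A hedged sketch of such a filter (filter_sections is hypothetical; the match is case-insensitive, since the configs above spell "See also" and "See Also" differently):

```python
def filter_sections(sections, skip_sections):
    """Drop boilerplate sections (References, External links, ...) before
    chunking. Matching is case-insensitive on the section title."""
    skip = {s.lower() for s in skip_sections}
    return [(title, body) for title, body in sections if title.lower() not in skip]
```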

Model Registry

The model registry defines all available LLM providers and models for the chat-with-rag system. For complete model details, pricing, and capabilities, see the full Model Registry documentation.

Available Models

OpenAI Models

Gemini Models

Model Categories

| Category           | Models                                      | Use Case                        |
|--------------------|---------------------------------------------|---------------------------------|
| Embeddings         | openai:embed_*, gemini:native-embed         | Vector search and retrieval     |
| Fast Inference     | openai:gpt-4o-mini, gemini:*-flash*         | Chat responses, query rewriting |
| Standard Inference | openai:gpt-4o                               | Complex tasks, summarization    |
| Reasoning          | openai:reasoning_*, gemini:*-reasoning*     | Complex problem solving         |
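
Model keys throughout this document follow a provider:model naming scheme. A small helper to split them might look like this (split_model_key is illustrative, not part of the codebase):

```python
def split_model_key(key: str) -> tuple[str, str]:
    """Split a registry key like "openai:gpt-4o-mini" into (provider, model)."""
    provider, _, model = key.partition(":")
    if not model:
        raise ValueError(f"expected 'provider:model', got {key!r}")
    return provider, model
```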

Default Configuration

The system uses these default models (matching the Settings defaults above):

  • Embedding: openai:embed_small
  • Inference: openai:gpt-4o
  • Rewrite, summary, rerank, and tool synthesis: openai:gpt-4o-mini

For detailed model specifications, pricing, and advanced configuration options, see the complete Model Registry documentation.


Domain Configuration

Domain-based configuration allows multiple isolated knowledge bases:

DOMAIN_EMBEDDING_CONFIG = {
    "default": {
        "collection_name": "document_index",
        "embedding_model_key": "openai:embed_small"
    },
    "mountains": {
        "collection_name": "document_index",
        "embedding_model_key": "openai:embed_small" 
    },
    "oceans": {
        "collection_name": "document_index_gemini",
        "embedding_model_key": "gemini:native-embed"
    }
}

# Active domain (change this to switch domains)
active_domain: str = "mountains"
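
Resolving the active domain to its collection and embedding model is a plain dictionary lookup. A sketch, assuming unknown domains fall back to "default":

```python
DOMAIN_EMBEDDING_CONFIG = {
    "default": {"collection_name": "document_index",
                "embedding_model_key": "openai:embed_small"},
    "oceans": {"collection_name": "document_index_gemini",
               "embedding_model_key": "gemini:native-embed"},
}


def resolve_domain(domain: str) -> dict:
    # Assumed behaviour: unknown domains fall back to the "default" entry
    return DOMAIN_EMBEDDING_CONFIG.get(domain, DOMAIN_EMBEDDING_CONFIG["default"])
```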

Using Different Domains

# In backend/core/config.py, change:
active_domain = "oceans"  # Switch to oceans domain

# Or override via environment variable
# ACTIVE_DOMAIN=oceans python start.py

Embedding Configuration

Chunking Parameters

# Text chunking settings
default_chunk_size: int = 800      # Characters per chunk
default_chunk_overlap: int = 100   # Characters of overlap between chunks
embedding_batch_size: int = 100    # Chunks per embedding API call
max_chunks_per_doc: int = 0        # 0 = no limit
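
A minimal character-based chunker consistent with these defaults (fixed window, step = size - overlap). The real splitter may be token- or sentence-aware; this is only a sketch:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size character chunking with overlap.

    Consecutive chunks share `overlap` characters; the final chunk may be
    shorter than `size`.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]
```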

Provider-Specific Limits

| Provider | Max Inputs | Max Tokens per Input | Batch API |
|----------|------------|----------------------|-----------|
| OpenAI   | 2,048      | 8,191                | Yes       |
| Gemini   | 250        | 2,048                | No        |

Recommended settings for OpenAI text-embedding-3-small:

chunk_size = 800
embedding_batch_size = 100

Recommended settings for Gemini gemini-embedding-001:

chunk_size = 600
embedding_batch_size = 50
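
Batches then need to respect both embedding_batch_size and the provider's max-inputs limit from the table above. A sketch under that assumption (batches is a hypothetical helper):

```python
PROVIDER_BATCH_LIMITS = {"openai": 2048, "gemini": 250}  # max inputs per request


def batches(chunks, provider: str, batch_size: int = 100):
    """Yield embedding batches no larger than both the configured
    embedding_batch_size and the provider's max-inputs limit."""
    limit = min(batch_size, PROVIDER_BATCH_LIMITS.get(provider, batch_size))
    for i in range(0, len(chunks), limit):
        yield chunks[i:i + limit]
```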

Chat Pipeline Configuration

Retrieval Settings

# Vector search parameters
top_k: int = 8                    # Number of documents to retrieve
score_threshold: float = 0.35     # Minimum similarity score
namespace: str = "default"        # Collection/domain isolation
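
Applying top_k and score_threshold to raw search hits amounts to filter, sort, truncate. A sketch over (document, score) pairs (the actual hit shape in the codebase is an assumption):

```python
def select_hits(hits, top_k: int = 8, score_threshold: float = 0.35):
    """Keep hits at or above the similarity threshold, best first, at most top_k."""
    kept = [h for h in hits if h[1] >= score_threshold]
    kept.sort(key=lambda h: h[1], reverse=True)
    return kept[:top_k]
```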

Inference Settings

# LLM generation parameters
temperature: float = 0.7                 # Randomness (0.0-1.0)
top_p: float = 0.9                       # Nucleus sampling
max_inference_output_tokens: int = 500   # Response length limit
inference_reasoning_effort: str = "low"  # Effort level for reasoning models
inference_reasoning_model: bool = False  # Treat the inference model as a reasoning model

Context Management

# Conversation memory
raw_tail_turns: int = 10                  # Verbatim recent turns
summarizer_max_input_tokens: int = 4000   # Summary input limit
summarizer_max_output_tokens: int = 128   # Summary output limit
summarizer_temperature: float = 0.3       # Summarization randomness
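
The raw-tail split can be sketched as: the last raw_tail_turns turns stay verbatim, everything older is handed to the summarizer (split_history is illustrative, not the project's implementation):

```python
def split_history(turns, raw_tail_turns: int = 10):
    """Return (older_turns_for_summarizer, recent_turns_kept_verbatim)."""
    to_summarize = turns[:-raw_tail_turns] if raw_tail_turns else list(turns)
    tail = turns[-raw_tail_turns:] if raw_tail_turns else []
    return to_summarize, tail
```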

Query Rewrite Configuration

enable_query_rewrite: bool = True
rewrite_confidence_threshold: float = 0.6    # Minimum confidence to accept rewrite
rewrite_tail_turns: int = 1                   # Recent turns for context
rewrite_summary_turns: int = 3                # How many summary turns to consider
rewrite_cache_ttl_s: int = 300                # Cache duration in seconds
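
rewrite_cache_ttl_s suggests a small TTL cache for rewritten queries. A sketch with an injectable clock for testability (RewriteCache is hypothetical, not the project's implementation):

```python
import time


class RewriteCache:
    """Tiny TTL cache for rewritten queries (see rewrite_cache_ttl_s)."""

    def __init__(self, ttl_s: int = 300, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._store: dict = {}

    def get(self, query):
        entry = self._store.get(query)
        if entry and self.clock() - entry[0] < self.ttl_s:
            return entry[1]
        self._store.pop(query, None)  # expired or missing
        return None

    def put(self, query, rewrite):
        self._store[query] = (self.clock(), rewrite)
```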

Tool Configuration

use_tools: bool = True
use_web_search: bool = False
max_tool_passes: int = 2                     # Maximum tool loops per turn

# Available tools
# - get_weather: Weather information
# - get_airports: Airport lookup  
# - web_search: DuckDuckGo search (if enabled)

Processing Visibility

show_processing_steps: bool = True  # Show intermediate pipeline stages
show_sources: bool = True           # Show source citations

Runtime Parameter Override

All configuration can be overridden at runtime via the params object in API calls:

Example Override

params = {
    "top_k": 12,                    # Override default top_k
    "temperature": 0.3,             # Override default temperature
    "model_keys": {                 # New format for model overrides
        "inference": "openai:gpt-4o-mini"
    },
    "enable_query_rewrite": False,  # Disable query rewrite
    "show_processing_steps": False  # Hide processing steps
}
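
Merging such a params object over the defaults is mostly a shallow merge, except that model_keys plausibly merges per stage so a partial override keeps the other stages. A sketch under that assumption (merge_params is hypothetical):

```python
def merge_params(defaults: dict, params: dict) -> dict:
    """Shallow-merge runtime params over defaults; the nested model_keys
    dict is merged key-by-key so a partial override keeps other stages."""
    merged = {**defaults, **params}
    if "model_keys" in defaults or "model_keys" in params:
        merged["model_keys"] = {**defaults.get("model_keys", {}),
                                **params.get("model_keys", {})}
    return merged
```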

Per-Stage Model Override

params = {
    "model_keys": {
        "inference": "openai:gpt-4o",              # Main inference
        "rewrite": "openai:gpt-4o-mini",           # Query rewrite  
        "summary": "openai:gpt-4o-mini",           # Summarization
        "rerank": "openai:gpt-4o-mini",            # Reranking
        "tools_synth": "gemini:gemini-2.5-flash-lite"  # Tool synthesis
    }
}

Reasoning Model Override

params = {
    "model_keys": {
        "inference": "openai:reasoning_o3-mini"  # OpenAI reasoning model
    },
    "reasoning_effort": "medium"                 # Reasoning intensity (not a model key)
}

Gemini Reasoning Model Override

params = {
    "model_keys": {
        "inference": "gemini:gemini-3-flash-preview"  # Gemini reasoning model
    },
    "thinking_level": "low"                           # Gemini reasoning parameter (not a model key)
}

Configuration Validation

Validate Configuration

from backend.core.config import settings

# Check settings
print(f"Embedding model: {settings.embedding_model}")
print(f"Collection: {settings.collection_name}")
print(f"Top K: {settings.top_k}")

Test Connectivity

# Test API connections
python scripts/api_smoke_test_openai.py
python scripts/api_smoke_test_gemini.py

# Test embedding generation
python scripts/embedding_compare.py

# Test Qdrant connection
python scripts/qdrant_scripts/qdrant_ops.py --list-collections

Best Practices

Prompt Registry

Registry file

Prompt domains (params.prompt_domain)

You can select a prompt domain per request using params.prompt_domain.

In the UI (frontend/chat.html), the Prompt Domain dropdown under Inference controls the value sent on every chat request.

Template System and Context Injection

The prompt registry uses Jinja2 templating to safely inject dynamic context into prompts:

This templating approach allows:

Debug logging (safe by default)

The backend logs:

To log the full resolved prompt/template for debugging, set:

PROMPT_REGISTRY_LOG_FULL=1

Performance Optimization

  1. Use appropriate model tiers:
    • Fast models for rewrite/rerank/summary
    • Capable models for main inference
  2. Configure batch sizes:
    • Larger batches for embedding (within provider limits)
    • Smaller chunks for better relevance
  3. Set appropriate limits:
    • top_k: 5-15 for most use cases
    • max_output_tokens: Based on expected response length

Cost Management

  1. Enable estimate mode for large indexing operations
  2. Use faster models for non-critical stages
  3. Monitor usage with conversation totals
  4. Set appropriate token limits

Security

  1. Never commit API keys to version control
  2. Use environment variables for sensitive configuration
  3. Restrict allowed origins in production
  4. Monitor API usage and costs

Troubleshooting Configuration

Common Issues

  1. Dimension mismatch: Ensure embedding model matches collection
  2. API key errors: Verify keys in .env file
  3. Connection refused: Check Qdrant is running
  4. CORS errors: Verify allowed origins configuration

Debug Configuration

# Enable verbose logging
DEBUG_VERBOSE=true
DEBUG_LOG_KEYS=true

# Log full prompts (for debugging)
PROMPT_REGISTRY_LOG_FULL=1

# Check current configuration
python -c "from backend.core.config import settings; print(settings.dict())"

Reset Configuration

Reset Environment Configuration

# Reset environment variables to defaults
cp .env.example .env
# Edit .env with your API keys and restart application

Reset Qdrant Database

# Clear data but keep collection structure
python scripts/qdrant_scripts/qdrant_ops.py truncate --collection document_index
python scripts/qdrant_scripts/qdrant_ops.py truncate --collection document_index_gemini

# Delete entire collection and re-seed
python scripts/qdrant_scripts/qdrant_ops.py delete --collection document_index
python scripts/qdrant_scripts/qdrant_ops.py delete --collection document_index_gemini
make seed

Full System Reset

# Complete reset to factory defaults
cp .env.example .env
make stop
python scripts/qdrant_scripts/qdrant_ops.py delete --collection document_index
python scripts/qdrant_scripts/qdrant_ops.py delete --collection document_index_gemini
make seed
make start