Embedded Chat Integration Guide
About this document
This page explains the embeddable chat UI for the Chat-with-RAG system, including configuration options, integration methods, and usage examples for third-party websites.
Note: If you landed here directly (for example from documentation hosting or search), start with the repository README to see how to run the system locally and try the interactive demo.
This document describes the embeddable chat UI built on top of the existing stateless /chat endpoint.
It is focused on:
- frontend/chat-embed.html: minimal chat surface intended for iframes and widgets.
- frontend/static/chat-embed.js: the JS client that talks to /chat.
- frontend/static/embed-loader.js: a small helper script for third-party sites.
The backend logic (including backend/chat/chat_manager.py and POST /chat semantics) is unchanged and shared with the main frontend/chat.html app. For the low-level API details, see:
1. Overview
The embedded chat UI provides a small, self-contained chat box that can be dropped into other websites.
Example Embedded Chat Interface
Key properties:
- Embeddable via <iframe> or a one-line <script> tag.
- No parameter sidebar or metrics bar; those controls are hidden.
- Configurable via URL query parameters and data attributes.
- Reuses the same /chat endpoint and params contract as documented in docs/api-reference.md.
The intent is to allow a host site to preconfigure behavior (retrieval, rewrite, tools, etc.) and expose a simple "chat with my docs/data" experience.
2. Files
- frontend/chat-embed.html
  - Minimal HTML shell for the embeddable chat UI.
  - Includes static/chat.css for styling (same base chat bubbles as chat.html).
  - Includes static/chat-embed.js for behavior.
- frontend/static/chat-embed.js
  - Lightweight client that:
    - Parses query parameters from the URL.
    - Builds params for /chat.
    - Renders chat bubbles and sends messages.
- frontend/static/embed-loader.js
  - Helper that host sites can load with a <script> tag.
  - Automatically injects an <iframe> pointing at chat-embed.html with the configured query parameters.
Backend files are not modified and are documented separately in README_CHAT_API.md and technical-overview.md.
3. Runtime behavior
3.1 chat-embed.html
- Renders:
  - #embed_chat_history: scrollable history region.
  - #embed_chat_input: textarea for user input.
  - #embed_send_button: send button.
  - A small footer for attribution.
- Does not render:
  - The left-hand parameter sidebar.
  - The metrics bar.
  - The models modal.
All behavior is driven by static/chat-embed.js.
3.2 chat-embed.js
chat-embed.js is responsible for:
- Parsing config from the current URL (window.location.search).
- Maintaining a conversation id (either provided by caller or locally generated).
- Building a ChatRequest payload and calling POST /chat.
- Rendering user and assistant messages in the embed UI.
It uses the exact same request shape as docs/api-reference.md describes:
{
"message": "<user text>",
"use_web_search": false,
"history": [
{ "role": "user", "content": "..." },
{ "role": "assistant", "content": "..." }
],
"params": {
"user_id": "Optional user identifier for token accounting",
"namespace": "Alternative way to provide a conversation identifier. If conversation_id is not provided, namespace will be used for params.conversation_id."
}
}
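As a rough illustration of the request shape above, the following sketch assembles the payload and posts it with `fetch`. The function names `buildChatRequest` and `sendChat` are hypothetical and are not taken from chat-embed.js. The sketch also assumes the endpoint answers with a JSON body, whereas the real client may consume a streamed response.

```javascript
// Hypothetical helper: assemble a ChatRequest-shaped payload.
// Only role/content are copied from each history entry.
function buildChatRequest(message, history, params) {
  return {
    message: message,
    use_web_search: false,
    history: history.map(({ role, content }) => ({ role, content })),
    params: { ...params },
  };
}

// Hypothetical helper: POST the payload to /chat on the given base URL.
// Assumption: the response body is JSON; adjust if your deployment streams.
async function sendChat(baseUrl, payload) {
  const response = await fetch(new URL('/chat', baseUrl), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    throw new Error('POST /chat failed with status ' + response.status);
  }
  return response.json();
}
```

A call such as `sendChat('http://localhost:8000', buildChatRequest('hello', [], { conversation_id: 'docs-help', mode: 'embed' }))` would then issue one turn against a local deployment.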
4. Configuration via URL query parameters
The embed page is configured by query parameters on the chat-embed.html URL. All fields are optional. If a field is omitted, the backend defaults apply (via Settings / run_pipeline).
4.1 Retrieval
- top_k: integer or float-like string.
- score_threshold: float between 0 and 1.
4.2 Summarizer / history
- raw_tail_turns
- summarizer_max_input_tokens
- summarizer_max_output_tokens
All should be integers if provided.
4.3 Inference
- temperature: float.
- top_p: float.
- max_output_tokens: integer.
- reasoning_effort: string ("minimal", "low", "medium", "high").
- render_html: boolean ("true"/"false").
4.4 Query rewrite
- enable_query_rewrite: true/false/1/0/yes/no.
- rewrite_confidence_threshold: float.
- rewrite_tail_turns: integer.
- rewrite_summary_turns: integer.
4.5 Tools
- use_tools: true/false/1/0/yes/no.
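The parameter types listed in sections 4.1 to 4.5 can be read from a query string roughly as follows. This is a minimal sketch, not the code in chat-embed.js: the helper names `parseBoolLike`, `parseNumber`, and `parseEmbedParams` are hypothetical, only a subset of parameters is shown, and the real client may pass raw strings through to the backend instead of coercing them.

```javascript
// Hypothetical helper: accept true/false/1/0/yes/no (case-insensitive).
// Anything else, including a missing value, yields undefined.
function parseBoolLike(value) {
  if (value == null) return undefined;
  const v = String(value).trim().toLowerCase();
  if (['true', '1', 'yes'].includes(v)) return true;
  if (['false', '0', 'no'].includes(v)) return false;
  return undefined;
}

// Hypothetical helper: parse a float, optionally truncating to an integer
// so that float-like strings such as "8.0" are accepted for integer fields.
function parseNumber(value, { integer = false } = {}) {
  if (value == null || String(value).trim() === '') return undefined;
  const n = Number(value);
  if (!Number.isFinite(n)) return undefined;
  return integer ? Math.trunc(n) : n;
}

// Build a params object from a query string such as window.location.search.
function parseEmbedParams(search) {
  const q = new URLSearchParams(search);
  const raw = {
    top_k: parseNumber(q.get('top_k'), { integer: true }),
    score_threshold: parseNumber(q.get('score_threshold')),
    temperature: parseNumber(q.get('temperature')),
    max_output_tokens: parseNumber(q.get('max_output_tokens'), { integer: true }),
    enable_query_rewrite: parseBoolLike(q.get('enable_query_rewrite')),
    use_tools: parseBoolLike(q.get('use_tools')),
  };
  // Drop omitted or unparseable fields so that backend defaults apply.
  return Object.fromEntries(
    Object.entries(raw).filter(([, value]) => value !== undefined)
  );
}
```

Leaving omitted fields out of the object, rather than sending null, keeps the "backend defaults apply" behavior described at the start of this section.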
4.6 Provider/model overrides (optional)
These map directly to the stage-spec overrides used in resolve_stage_specs:
New format (recommended):
- model_keys: JSON object with stage-specific model overrides:
  { "inference": "gemini:gemini-2.5-flash", "rewrite": "openai:gpt-4o-mini", "rerank": "openai:gpt-4o-mini", "summary": "openai:gpt-4o-mini" }
If provided, they are passed through as strings to the backend and interpreted there.
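Because model_keys is a JSON object carried in a URL, it has to be serialized and percent-encoded before it goes into the query string. The sketch below shows one way to do that with the standard `URL` API. The helper name `withModelKeys` is hypothetical, and the assumption that the value travels as a JSON string in the query parameter is inferred from this section rather than confirmed against the client code.

```javascript
// Stage-specific overrides, using example values from this section.
const modelKeys = {
  inference: 'gemini:gemini-2.5-flash',
  rewrite: 'openai:gpt-4o-mini',
};

// Hypothetical helper: attach model_keys to an existing embed URL.
// Assumption: the value is sent as a JSON string; searchParams.set
// takes care of percent-encoding the braces, quotes, and colons.
function withModelKeys(embedUrl, keys) {
  const url = new URL(embedUrl);
  url.searchParams.set('model_keys', JSON.stringify(keys));
  return url.toString();
}

const embedSrc = withModelKeys('https://your-app.com/chat-embed.html?top_k=5', modelKeys);

// Reading the value back recovers the original object.
const roundTrip = JSON.parse(new URL(embedSrc).searchParams.get('model_keys'));
```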
4.7 UX / observability
- show_processing_steps
  - Boolean-like string; defaults to false in the embed client.
  - When true, intermediate SSE processing steps will still be emitted by the backend, but chat-embed.js does not currently visualize them.
- show_sources: Boolean-like string for source citation display.
- conversation_id
  - Explicit conversation identifier to use.
  - Useful if the embedding site wants deterministic IDs.
- user_id
  - Optional user identifier for token accounting.
- namespace
  - Alternative way to provide a conversation identifier.
  - If conversation_id is not provided, namespace will be used for params.conversation_id.
- prompt_domain
  - Domain for prompt registry resolution.
- mode
  - Optional string tag, defaults to embed.
  - Sent as params.mode for logging / analytics.
- query_id: an 8-character ID per turn, generated in the browser.
- conversation_id: chosen using this logic:
  - If the conversation_id query param is present, use it.
  - Else if the namespace query param is present, use it.
  - Else use sessionStorage['conversation_id_embed'] if set.
  - Else generate a new 8-character ID and store it in sessionStorage under conversation_id_embed.
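The conversation_id fallback chain above can be sketched as a small function. The names `generateShortId` and `resolveConversationId` are hypothetical, and the ID alphabet is an assumption, since the source only specifies the 8-character length. Storage is passed in as an argument so the logic can be exercised outside a browser; in the embed page it would be `window.sessionStorage`.

```javascript
// Hypothetical helper: 8-character alphanumeric ID.
// Assumption: lowercase letters and digits; not cryptographically secure.
function generateShortId(length = 8) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let id = '';
  for (let i = 0; i < length; i++) {
    id += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return id;
}

// `storage` is any object with getItem/setItem, e.g. window.sessionStorage.
function resolveConversationId(search, storage) {
  const q = new URLSearchParams(search);
  // 1 and 2: explicit conversation_id wins, then namespace.
  // Empty values are treated as absent in this sketch.
  const explicit = q.get('conversation_id') || q.get('namespace');
  if (explicit) return explicit;
  // 3: reuse the ID stored earlier in this browser session.
  const key = 'conversation_id_embed';
  const stored = storage.getItem(key);
  if (stored) return stored;
  // 4: generate a new ID and remember it for later turns.
  const fresh = generateShortId();
  storage.setItem(key, fresh);
  return fresh;
}
```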
These align with the params contract in docs/api-reference.md.
5. Example embed URLs
5.1 Minimal default embed
/chat-embed.html
Relies entirely on backend defaults; uses a generated conversation_id.
5.2 Preset retrieval + inference
/chat-embed.html?top_k=8&score_threshold=0.35&temperature=0.4&max_output_tokens=300
5.3 With query rewrite enabled and tools disabled
/chat-embed.html?
top_k=8&
score_threshold=0.35&
temperature=0.4&
max_output_tokens=300&
enable_query_rewrite=true&
rewrite_confidence_threshold=0.67&
rewrite_tail_turns=1&
rewrite_summary_turns=3&
use_tools=false
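The URL above is split across lines for readability only; an actual URL must be a single string. A host page that assembles it programmatically could use something like the following sketch, where `buildEmbedUrl` is a hypothetical helper and the base URL assumes the local setup from section 7.

```javascript
// Hypothetical helper: build a chat-embed.html URL from an options object.
// Assumption: chat-embed.html is served from the root of baseUrl.
function buildEmbedUrl(baseUrl, options) {
  const url = new URL('/chat-embed.html', baseUrl);
  for (const [key, value] of Object.entries(options)) {
    // Skip unset options so backend defaults apply for them.
    if (value === undefined || value === null) continue;
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const src = buildEmbedUrl('http://localhost:8000', {
  top_k: 8,
  score_threshold: 0.35,
  temperature: 0.4,
  max_output_tokens: 300,
  enable_query_rewrite: true,
  rewrite_confidence_threshold: 0.67,
  rewrite_tail_turns: 1,
  rewrite_summary_turns: 3,
  use_tools: false,
});
```

The resulting `src` string can be assigned directly to an iframe's `src` attribute.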
5.4 Explicit conversation / namespace
/chat-embed.html?namespace=docs-help&top_k=5
This will use params.conversation_id = "docs-help" for all turns in that iframe.
5.5 Overriding processing steps and top_k
You can control whether processing stages are shown and how many documents are retrieved by passing show_processing_steps and top_k in the iframe URL:
<iframe
src="https://YOUR_DOMAIN/chat-embed.html
?top_k=5
&show_processing_steps=true
&max_output_tokens=256
&namespace=docs-help"
style="border:0;width:100%;height:450px;"
></iframe>
In this example:
- top_k=5 reduces the number of retrieved chunks.
- show_processing_steps=true turns on processing-step output for the embed, subject to the client behavior described in section 4.7 (you can also turn it off with show_processing_steps=false).
- Other parameters (max_output_tokens, namespace) are also overridden from their defaults.
6. Using embed-loader.js on a host site
For third-party websites that can only add a <script> tag, embed-loader.js provides a simple integration path.
6.1 Basic usage
<div id="support-chat"></div>
<script
src="https://your-app.com/static/embed-loader.js"
data-target="#support-chat"
data-top_k="8"
data-score_threshold="0.35"
data-temperature="0.4"
data-max_output_tokens="300"
data-enable_query_rewrite="true"
data-use_tools="false"
data-namespace="docs-help"
data-width="100%"
data-height="450px"
></script>
embed-loader.js will:
- Read its own data-* attributes via script.dataset.
- Use data-target as a CSS selector to find the host element.
- Treat all other data-* keys (except target, width, height) as query parameters to chat-embed.html.
- Compute the chat-embed.html URL relative to the script URL.
- Inject an <iframe> inside the target element with:
  - src = computed chat-embed.html URL + querystring.
  - style.width = data-width (default: 100%).
  - style.height = data-height (default: 400px).
  - No border.
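The loader steps above can be sketched as two functions: a pure one that turns the script's dataset into an iframe configuration, and a browser-only one that injects the iframe. This is an illustrative reconstruction, not the contents of embed-loader.js; `computeIframeConfig` and `injectIframe` are hypothetical names, and the relative path assumes the loader sits at /static/embed-loader.js with chat-embed.html one directory up, as in the URLs used elsewhere in this guide.

```javascript
// data-* keys that configure the iframe itself rather than the chat.
const RESERVED_KEYS = new Set(['target', 'width', 'height']);

// Pure step: derive the iframe settings from dataset and script URL.
// Assumption: chat-embed.html lives one directory above the loader script.
function computeIframeConfig(dataset, scriptSrc) {
  const url = new URL('../chat-embed.html', scriptSrc);
  for (const [key, value] of Object.entries(dataset)) {
    if (!RESERVED_KEYS.has(key)) url.searchParams.set(key, value);
  }
  return {
    target: dataset.target,
    src: url.toString(),
    width: dataset.width || '100%',
    height: dataset.height || '400px',
  };
}

// Browser-only step: create the iframe inside the target element.
function injectIframe(doc, config) {
  const host = doc.querySelector(config.target);
  if (!host) return null; // Target selector matched nothing.
  const iframe = doc.createElement('iframe');
  iframe.src = config.src;
  iframe.style.border = '0';
  iframe.style.width = config.width;
  iframe.style.height = config.height;
  host.appendChild(iframe);
  return iframe;
}
```

In a browser, a loader built this way could run `injectIframe(document, computeIframeConfig(document.currentScript.dataset, document.currentScript.src))` at the top level of the script. Keeping the URL computation separate from the DOM work makes the first step easy to check without a page.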
6.2 Notes for integrators
- The host page must allow loading the app's domain in an iframe.
- If you change the path to chat-embed.html, update the relative URL logic in embed-loader.js accordingly.
- Configuration parameters are documented in docs/api-reference.md.
7. Testing locally
- Start the stack (see README.md): make start
- Open the embed page directly: http://localhost:8000/chat-embed.html
- Try with custom params: http://localhost:8000/chat-embed.html?top_k=8&score_threshold=0.35&temperature=0.4
- Optionally, create a small HTML page that includes static/embed-loader.js and serve it from the same domain to simulate third-party embedding.
8. Quick testing from the browser DevTools console
For fast, one-off experiments (without editing any HTML files), you can inject the embed directly from the browser console:
- Open the page where you want to preview the widget (e.g. http://localhost:8000/search or http://localhost:8000/chat.html).
- Open DevTools > Console.
- Paste and run:
(function () {
  // Container appended to the end of the page body.
  const anchor = document.createElement('div');
  anchor.id = 'embed-test-anchor';
  anchor.style.marginTop = '24px';
  document.body.appendChild(anchor);

  // Iframe pointing at the embed page with preset parameters.
  const iframe = document.createElement('iframe');
  iframe.src = 'http://localhost:8000/chat-embed.html'
    + '?top_k=8'
    + '&score_threshold=0.35'
    + '&temperature=0.4'
    + '&max_output_tokens=300'
    + '&enable_query_rewrite=true'
    + '&rewrite_confidence_threshold=0.67'
    + '&rewrite_tail_turns=1'
    + '&rewrite_summary_turns=3'
    + '&use_tools=false'
    + '&show_processing_steps=true'
    + '&namespace=devtools-test';
  iframe.style.border = '0';
  iframe.style.width = '100%';
  iframe.style.height = '450px';
  anchor.appendChild(iframe);
})();
This appends an <iframe> pointing at chat-embed.html to the end of the page body for the current browser session only (no file changes needed).
9. Relationship to the main chat UI
- Both chat.html and chat-embed.html send requests to the same POST /chat endpoint.
- Both use the same params keys as defined in docs/api-reference.md.
- chat-embed.html is intentionally minimal and is suitable for iframes and third-party widgets.
- Any future changes to the backend /chat contract should be reflected in both docs/api-reference.md and this document, to keep embed integrators aligned.
10. Auth & Security (summary)
The embeddable chat shares the same FastAPI backend as the main UI. The
canonical description of authentication and security posture for this
project lives in the root README.md under the Auth & Security Note near the
Technical Overview.
In short:
- chat-embed.html is a thin iframe UI that issues POST /chat requests.
- There is currently no built-in authentication or rate limiting in this repository for /chat or ingestion routes.
- Any client that can reach your deployment and copy the iframe snippet can, in principle, send traffic to /chat.
When moving beyond local or controlled environments, you should apply the same
protections discussed in README.md (origin/host allowlists, user auth, rate
limiting, etc.) to the embed usage as well.