API Reference
This document provides the complete API reference for the LLM Adapter, including method signatures, parameter details, response structures, and common usage patterns.
New here? Start with the project overview on the home page: vrraj-llm-adapter docs home.
Source + releases: GitHub repo and PyPI package are linked from the home page.
Note: The package exposes a convenience singleton, llm_adapter, which is an instance of LLMAdapter. You can either use this pre-configured instance or create your own LLMAdapter instance for custom configuration.
Core Classes
LLMAdapter
The main adapter class that provides unified access to multiple LLM providers.
class LLMAdapter:
    def __init__(
        self,
        *,
        openai_api_key: Optional[str] = None,
        gemini_api_key: Optional[str] = None,
        openai_base_url: Optional[str] = None,
        gemini_base_url: Optional[str] = None,
        model_registry: Optional[Dict[str, Any]] = None,
        openai_client: Any = None,
        gemini_client: Any = None,
    )
Parameters:
- openai_api_key: OpenAI API key (defaults to OPENAI_API_KEY env var)
- gemini_api_key: Gemini API key (defaults to GEMINI_API_KEY env var)
- openai_base_url: OpenAI base URL (defaults to OPENAI_BASE_URL env var)
- gemini_base_url: Gemini base URL (defaults to GEMINI_OPENAI_BASE_URL env var)
- model_registry: Custom model registry to override/extend defaults
- openai_client: Pre-configured OpenAI client (for dependency injection)
- gemini_client: Pre-configured Gemini client (for dependency injection)
ModelSpec
Structured configuration for model parameters.
@dataclass(frozen=True)
class ModelSpec:
    provider: Provider
    model: str
    temperature: Optional[float] = None
    max_output_tokens: Optional[int] = None
    extra: Dict[str, Any] = field(default_factory=dict)
Main API Methods
create()
Generate text completions using any supported LLM provider.
def create(
    self,
    *,
    input: Any,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    spec: Optional[ModelSpec] = None,
    stream: bool = False,
    **kwargs: Any,
) -> Union[AdapterResponse, Iterator[AdapterEvent]]
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| input | str \| list[dict] | ✅ | Prompt text or structured chat messages |
| provider | str | ❌ | Provider override (openai, gemini). Inferred from model if not specified |
| model | str | ❌* | Registry model key (e.g., "openai:gpt-4o-mini"). Required if spec not provided |
| spec | ModelSpec | ❌* | Alternative structured configuration. Required if model not provided |
| stream | bool | ❌ | Enable streaming responses (default: False) |
| **kwargs | Any | ❌ | Provider-specific parameters (filtered by model’s param_policy) |
Common **kwargs (model-dependent):
- reasoning_effort: "none" | "minimal" | "low" | "medium" | "high" - Adapter-level reasoning hint
- max_output_tokens: int - Maximum output tokens
- temperature: float - Sampling temperature (0.0-2.0)
- top_p: float - Nucleus sampling (0.0-1.0)
- tools: list[dict] - Tool/function definitions
- tool_choice: str | dict - Tool choice strategy
- include_thoughts: bool - Include reasoning traces when supported (legacy, use reasoning_effort instead)
Response Fields:
- text: str - Main response text
- reasoning: str - Reasoning/thinking content from the generate.reasoning field
- usage: dict - Token usage information
Returns:
- Non-streaming: AdapterResponse
- Streaming: Iterator[AdapterEvent]
Example:
# Basic usage
response = llm_adapter.create(
    model="openai:gpt-4o-mini",
    input="Explain quantum entanglement in simple terms."
)

# With structured messages
response = llm_adapter.create(
    model="gemini:native-sdk-3-flash-preview",
    input=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_output_tokens=500
)

# Streaming
for event in llm_adapter.create(
    model="openai:gpt-4o-mini",
    input="Tell me a story",
    stream=True
):
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
create_embedding()
Generate embeddings using any supported provider.
def create_embedding(
    self,
    *,
    input: Any,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    spec: Optional[ModelSpec] = None,
    **kwargs: Any,
) -> EmbeddingResponse
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| input | str \| list[str] | ✅ | Text or list of texts to embed |
| provider | str | ❌ | Provider override. Inferred from model if not specified |
| model | str | ❌* | Registry model key. Required if spec not provided |
| spec | ModelSpec | ❌* | Alternative structured configuration. Required if model not provided |
| **kwargs | Any | ❌ | Provider-specific parameters |
Common **kwargs (model-dependent):
- dimensions: int - Output embedding dimensions (when supported)
- normalize_embedding: bool - Whether to normalize vectors (Gemini only)
- task_type: str - Task type for Gemini native embeddings
- output_dimensionality: int - Output dimensions for Gemini native embeddings
Returns: EmbeddingResponse
Example:
# Basic embedding
response = llm_adapter.create_embedding(
    model="openai:embed_small",
    input="The quick brown fox jumps over the lazy dog"
)

# Batch embeddings
response = llm_adapter.create_embedding(
    model="gemini:native-embed",
    input=["Text 1", "Text 2", "Text 3"],
    dimensions=768
)

# Normalized embeddings (Gemini)
response = llm_adapter.create_embedding(
    model="gemini:openai-embed",
    input="Text to normalize",
    normalize_embedding=True
)
normalize_adapter_response()
Convert an AdapterResponse to a standardized LLMResult format.
def normalize_adapter_response(
    self,
    resp: AdapterResponse,
    *,
    provider: Optional[str] = None,
    model_key: Optional[str] = None,
) -> LLMResult
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| resp | AdapterResponse | ✅ | Response from llm_adapter.create() |
| provider | str | ❌ | Provider override (inferred from response if not specified) |
| model_key | str | ❌ | Model key override (inferred from response if not specified) |
Returns: LLMResult
Example:
response = llm_adapter.create(model="openai:gpt-4o-mini", input="Hello")
normalized = llm_adapter.normalize_adapter_response(response)
print(f"Text: {normalized['text']}")
print(f"Reasoning: {normalized.get('reasoning')}")
print(f"Usage: {normalized['usage']}")
get_pricing_for_model()
Access pricing metadata for any model in the registry.
Note: This method is also available as get_model_pricing() for backward compatibility. get_pricing_for_model() is the canonical name.
def get_pricing_for_model(self, model: str) -> Optional[Dict[str, Any]]
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | ✅ | Registry model key or provider-native model name |
Returns: Optional[Dict[str, Any]] - Pricing metadata or None if not found
Example:
pricing = llm_adapter.get_pricing_for_model("openai:gpt-4o-mini")
if pricing:
    print(f"Input cost: ${pricing['input_per_mm']}/1M tokens")
    print(f"Output cost: ${pricing['output_per_mm']}/1M tokens")
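Pricing metadata can be combined with token usage to estimate what a call cost. The following is a minimal sketch, not part of the adapter: `estimate_cost` is a hypothetical helper, and it assumes that `input_per_mm` and `output_per_mm` are USD prices per one million tokens and that the usage dict carries `prompt_tokens` and `output_tokens`. The numbers are illustrative, not real prices.

```python
from typing import Any, Dict, Optional


def estimate_cost(usage: Dict[str, int], pricing: Optional[Dict[str, Any]]) -> Optional[float]:
    """Estimate the cost of one call in USD, or None when pricing is unknown.

    Hypothetical helper: assumes per-million-token prices under the
    input_per_mm / output_per_mm keys and token counts under
    prompt_tokens / output_tokens.
    """
    if not pricing:
        return None
    input_cost = usage.get("prompt_tokens", 0) * pricing.get("input_per_mm", 0.0) / 1_000_000
    output_cost = usage.get("output_tokens", 0) * pricing.get("output_per_mm", 0.0) / 1_000_000
    return input_cost + output_cost


# Illustrative values only.
example_pricing = {"input_per_mm": 1.0, "output_per_mm": 2.0}
example_usage = {"prompt_tokens": 500_000, "output_tokens": 250_000}
print(estimate_cost(example_usage, example_pricing))  # 1.0
```

Returning None instead of 0.0 for an unknown model keeps "free" and "unpriced" distinguishable in downstream reporting.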
Response Structures
AdapterResponse
Primary response from create() method for non-streaming calls.
class AdapterResponse:
    def __init__(
        self,
        *,
        output_text: str,
        model: str,
        usage: Optional[Dict[str, int]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        adapter_response: Any | None = None,
        model_response: Any | None = None,
        status: Optional[str] = None,
        finish_reason: Optional[str] = None,
        tool_calls: Optional[List[Dict[str, Any]]] = None,
    )
Fields:
| Field | Type | Stability | Description |
|---|---|---|---|
| output_text | str | ✅ Guaranteed | Generated text content |
| model | str | ✅ Guaranteed | Model identifier used |
| usage | Dict[str, int] | ✅ Guaranteed | Token usage information |
| status | str | ✅ Guaranteed | Completion status ("completed", "incomplete") |
| finish_reason | str | ✅ Guaranteed | Why generation stopped |
| tool_calls | List[Dict] | ✅ Guaranteed | Tool/function calls if any |
| metadata | Dict[str, Any] | ✅ Guaranteed | Provider and routing metadata |
| adapter_response | Any | 🔧 Debug/Opaque | Adapter-processed response (may vary) |
| model_response | Any | 🔧 Debug/Opaque | Original provider response (may vary) |
Legend:
- ✅ Guaranteed: Stable interface across providers and versions
- 🔧 Debug/Opaque: For debugging only, may change between providers/versions
EmbeddingResponse
Response from create_embedding() method.
class EmbeddingResponse:
    def __init__(
        self,
        data: List[List[float]],
        usage: Any,
        normalized: Optional[bool] = None,
        vector_dim: Optional[int] = None,
        metadata: Optional[Dict[str, Any]] = None,
        raw: Optional[Any] = None,
    )
Fields:
| Field | Type | Description |
|---|---|---|
| data | List[List[float]] | Direct list of embedding vectors |
| usage | EmbeddingUsage | Token usage information |
| normalized | bool | Whether vectors were normalized |
| vector_dim | int | Dimension of each vector |
| metadata | Dict[str, Any] | Additional metadata (includes provider, model, etc.) |
| raw | Any | Original response for debugging |
EmbeddingUsage
Usage information for embedding responses.
class EmbeddingUsage:
    def __init__(self, prompt_tokens: int = 0, total_tokens: int = 0)
Fields:
- prompt_tokens: int - Number of input tokens
- total_tokens: int - Total tokens processed
LLMResult
Standardized response format from normalize_adapter_response().
class LLMResult(TypedDict, total=False):
    text: str
    reasoning: Optional[str]
    role: str
    status: str
    finish_reason: Optional[str]
    usage: "LLMUsage"
    tool_calls: List["LLMToolCall"]
    metadata: Optional[Dict[str, Any]]
    raw: Any
Key Fields:
- text: str - Main response text
- reasoning: Optional[str] - Separate reasoning content (Gemini)
- usage: LLMUsage - Standardized usage metrics
- tool_calls: List[LLMToolCall] - Normalized tool calls
LLMUsage
Standardized usage metrics across all providers.
class LLMUsage(TypedDict, total=False):
    prompt_tokens: int
    cached_tokens: int
    output_tokens: int
    reasoning_tokens: int
    answer_tokens: int
    total_tokens: int
Key Relationships:
- output_tokens = answer_tokens + reasoning_tokens
- total_tokens = prompt_tokens + cached_tokens + output_tokens
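The two relationships above can be checked mechanically, for example in a test that guards against accounting drift between providers. This is a sketch only: `check_usage` is a hypothetical helper, not an adapter function, and it skips any relationship whose fields are missing because LLMUsage is declared with total=False.

```python
from typing import Dict, List


def check_usage(usage: Dict[str, int]) -> List[str]:
    """Return the violated relationships (an empty list means consistent).

    A relationship is only checked when every field it needs is present.
    """
    problems: List[str] = []
    if all(k in usage for k in ("output_tokens", "answer_tokens", "reasoning_tokens")):
        if usage["output_tokens"] != usage["answer_tokens"] + usage["reasoning_tokens"]:
            problems.append("output_tokens != answer_tokens + reasoning_tokens")
    if all(k in usage for k in ("total_tokens", "prompt_tokens", "cached_tokens", "output_tokens")):
        expected = usage["prompt_tokens"] + usage["cached_tokens"] + usage["output_tokens"]
        if usage["total_tokens"] != expected:
            problems.append("total_tokens != prompt_tokens + cached_tokens + output_tokens")
    return problems


# Made-up counts that satisfy both relationships.
consistent = {
    "prompt_tokens": 100, "cached_tokens": 20,
    "answer_tokens": 20, "reasoning_tokens": 30,
    "output_tokens": 50, "total_tokens": 170,
}
print(check_usage(consistent))  # []
```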
AdapterEvent
Streaming event from create(stream=True).
class AdapterEvent:
    def __init__(self, event_type: str, delta: Optional[str] = None)
Fields:
- type: str - Event type ("response.output_text.delta", "response.output_text.done")
- delta: Optional[str] - Text delta for delta events
LLMError
Structured error for provider or configuration failures.
class LLMError(Exception):
    def __init__(
        self,
        *,
        provider: str,
        model: Optional[str] = None,
        kind: str = "llm_error",
        code: Optional[Any] = None,
        message: str = "",
        retry_after: Optional[float] = None,
    )
Common Error Kinds:
- "config" - Configuration issues (missing API keys, invalid models)
- "rate_limit" - Rate limiting errors
- "auth" - Authentication failures
- "model_not_found" - Model not available
- "request" - Invalid request parameters
- "provider_error" - Provider-side errors
Error Handling
All methods can raise LLMError for structured error handling.
from llm_adapter import llm_adapter, LLMError

try:
    response = llm_adapter.create(
        model="openai:gpt-4o-mini",
        input="Hello world"
    )
except LLMError as e:
    print(f"Provider: {e.provider}")
    print(f"Model: {e.model}")
    print(f"Error kind: {e.kind}")
    print(f"Error code: {e.code}")
    print(f"Message: {e}")
    if e.retry_after:
        print(f"Retry after: {e.retry_after} seconds")
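The kind and retry_after fields lend themselves to a simple retry policy. The sketch below is one possible approach, not adapter behavior: `retry_delay`, `RETRYABLE_KINDS`, and `DemoError` are hypothetical names, the choice of "rate_limit" and "provider_error" as retryable is an assumption (this reference does not say which kinds are transient), and `DemoError` only stands in for LLMError so the block runs without the package installed.

```python
import random
from typing import Optional

# Assumption: only these kinds are worth retrying.
RETRYABLE_KINDS = {"rate_limit", "provider_error"}


def retry_delay(error: Exception, attempt: int, base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Return the seconds to wait before retrying, or None to give up.

    Prefers the error's own retry_after hint; otherwise falls back to
    exponential backoff with full jitter, capped at `cap` seconds.
    """
    if getattr(error, "kind", None) not in RETRYABLE_KINDS:
        return None
    retry_after = getattr(error, "retry_after", None)
    if retry_after:
        return float(retry_after)
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


class DemoError(Exception):
    """Stand-in for LLMError, carrying only the fields this sketch reads."""

    def __init__(self, kind: str, retry_after: Optional[float] = None):
        super().__init__(kind)
        self.kind = kind
        self.retry_after = retry_after


print(retry_delay(DemoError("auth"), attempt=0))                         # None
print(retry_delay(DemoError("rate_limit", retry_after=2.5), attempt=0))  # 2.5
```

In a real loop, the caller would sleep for the returned delay and re-issue the request, stopping after a fixed number of attempts.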
Common Usage Patterns
1. Basic Text Generation
from llm_adapter import llm_adapter
response = llm_adapter.create(
model="openai:gpt-4o-mini",
input="Write a haiku about programming"
)
print(response.output_text)
print(f"Usage: {response.usage}")
2. Streaming Responses
from llm_adapter import llm_adapter

def stream_response(model: str, prompt: str):
    collected_text = []
    for event in llm_adapter.create(
        model=model,
        input=prompt,
        stream=True
    ):
        if event.type == "response.output_text.delta":
            delta = event.delta or ""
            print(delta, end="", flush=True)
            collected_text.append(delta)
        elif event.type == "response.output_text.done":
            print("\n[Streaming complete]")
            break
    return "".join(collected_text)

full_text = stream_response("openai:gpt-4o-mini", "Tell me a story")
3. Tool/Function Calling
from llm_adapter import llm_adapter

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

response = llm_adapter.create(
    model="openai:gpt-4o-mini",
    input="What's the weather in Seattle?",
    tools=tools,
    tool_choice="auto"
)

if response.tool_calls:
    for tool_call in response.tool_calls:
        print(f"Function: {tool_call['name']}")
        print(f"Args: {tool_call['args']}")
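Once tool calls come back, the application still has to run them. A minimal dispatch sketch follows, assuming each tool call is a dict with the `name` and `args` keys printed above. `TOOL_HANDLERS` and `dispatch_tool_calls` are hypothetical names, `get_weather` is a placeholder, and accepting `args` as either a dict or a JSON string is an assumption, since this reference does not specify the type.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(location: str) -> str:
    """Placeholder; a real implementation would query a weather service."""
    return f"Weather lookup for {location} is not implemented"


# Hypothetical registry mapping tool names to local callables.
TOOL_HANDLERS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run each requested tool; unknown tools are reported instead of raising."""
    results: List[Dict[str, Any]] = []
    for call in tool_calls:
        name = call.get("name")
        args = call.get("args") or {}
        if isinstance(args, str):  # assumption: args may arrive as a JSON string
            args = json.loads(args)
        handler = TOOL_HANDLERS.get(name)
        if handler is None:
            results.append({"name": name, "error": "unknown tool"})
            continue
        results.append({"name": name, "result": handler(**args)})
    return results


demo_calls = [{"name": "get_weather", "args": {"location": "Seattle"}}]
print(dispatch_tool_calls(demo_calls))
```

Looking handlers up in an explicit registry, rather than by name in globals, keeps the model from invoking arbitrary functions.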
4. Reasoning with Gemini
from llm_adapter import llm_adapter

response = llm_adapter.create(
    model="gemini:native-sdk-reasoning-2.5-flash",
    input="Solve this step by step: 15 * 23 - 7",
    reasoning_effort="high",
    max_output_tokens=1000
)

normalized = llm_adapter.normalize_adapter_response(response)
if normalized.get('reasoning'):
    print(f"Reasoning: {normalized['reasoning']}")
print(f"Answer: {normalized['text']}")
5. Batch Embeddings
from llm_adapter import llm_adapter

texts = [
    "The cat sat on the mat",
    "Artificial intelligence is transforming society",
    "Machine learning models require training data"
]

response = llm_adapter.create_embedding(
    model="openai:embed_small",
    input=texts
)

print(f"Generated {len(response.data)} embeddings")
print(f"Dimensions: {response.vector_dim}")
print(f"Usage: {response.usage}")

# Access individual embeddings
for i, embedding in enumerate(response.data):
    print(f"Text {i}: {len(embedding)} dimensions")
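Embedding vectors such as those in response.data are commonly compared with cosine similarity. The sketch below uses pure Python and toy vectors rather than real embeddings. `cosine_similarity` and `most_similar` are hypothetical helpers, not part of the adapter.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float], vectors: List[Sequence[float]]) -> int:
    """Return the index of the vector with the highest cosine similarity to query."""
    return max(range(len(vectors)), key=lambda i: cosine_similarity(query, vectors[i]))


# Toy 2-dimensional vectors standing in for response.data.
toy_vectors = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
print(most_similar([1.0, 0.1], toy_vectors))  # 1
```

When vectors are already unit-normalized (see the normalized field of EmbeddingResponse), the dot product alone gives the same ranking.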
6. ModelSpec for Reusable Configuration
from llm_adapter import llm_adapter, ModelSpec

# Create reusable configuration
chat_spec = ModelSpec(
    provider="openai",
    model="gpt-4o-mini",
    temperature=0.7,
    max_output_tokens=1000
)

# Use with multiple requests
for prompt in ["Hello", "How are you?", "Goodbye"]:
    response = llm_adapter.create(spec=chat_spec, input=prompt)
    print(response.output_text)
7. Error Handling with Fallbacks
from llm_adapter import llm_adapter, LLMError

def safe_generate(prompt: str, primary_model: str, fallback_model: str):
    try:
        return llm_adapter.create(model=primary_model, input=prompt)
    except LLMError as e:
        # Fall back only for failures that a different model can plausibly avoid
        if e.kind in ["rate_limit", "model_not_found"]:
            print(f"Primary model failed: {e}. Trying fallback...")
            return llm_adapter.create(model=fallback_model, input=prompt)
        else:
            raise

response = safe_generate(
    "Explain photosynthesis",
    "openai:gpt-4o-mini",
    "gemini:native-sdk-3-flash-preview"
)
8. Access Control with Allowlist
import os

from llm_adapter import llm_adapter, LLMError

# Set allowlist (or use LLM_ADAPTER_ALLOWED_MODELS env var)
os.environ["LLM_ADAPTER_ALLOWED_MODELS"] = "openai:gpt-4o-mini,gemini:native-sdk-3-flash-preview"

try:
    # This will work
    response = llm_adapter.create(model="openai:gpt-4o-mini", input="Hello")

    # This will raise LLMError with code="model_not_allowed"
    response = llm_adapter.create(model="openai:gpt-4o", input="Hello")
except LLMError as e:
    if e.code == "model_not_allowed":
        print(f"Model not in allowlist: {e.model}")
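An application may want to validate a model key against the allowlist before making a call, for example to grey out options in a UI. The sketch below is a client-side pre-check only and does not describe how the adapter enforces the allowlist. `allowed_models` and `is_allowed` are hypothetical helpers, the comma-separated format is taken from the example above, and treating an unset or empty variable as "no restriction" is an assumption.

```python
import os
from typing import Mapping, Optional, Set


def allowed_models(env: Mapping[str, str] = os.environ) -> Optional[Set[str]]:
    """Parse LLM_ADAPTER_ALLOWED_MODELS; None means unset or empty."""
    raw = env.get("LLM_ADAPTER_ALLOWED_MODELS", "")
    keys = {part.strip() for part in raw.split(",") if part.strip()}
    return keys or None


def is_allowed(model: str, env: Mapping[str, str] = os.environ) -> bool:
    """True if the model key passes the allowlist (assumed open when unset)."""
    keys = allowed_models(env)
    return keys is None or model in keys


demo_env = {"LLM_ADAPTER_ALLOWED_MODELS": "openai:gpt-4o-mini, gemini:native-sdk-3-flash-preview"}
print(is_allowed("openai:gpt-4o-mini", demo_env))  # True
print(is_allowed("openai:gpt-4o", demo_env))       # False
```

Passing the environment as a parameter keeps the helper testable without mutating os.environ.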
Parameter Stability
Stable Parameters (guaranteed across providers)
| Parameter | Stability | Notes |
|---|---|---|
| input | ✅ Stable | Core parameter for all methods |
| model | ✅ Stable | Registry model key |
| provider | ✅ Stable | Provider override |
| spec | ✅ Stable | ModelSpec configuration |
| stream | ✅ Stable | Streaming flag |
| max_output_tokens | ✅ Stable | Canonical output limit |
| temperature | ✅ Stable | When supported by model |
| top_p | ✅ Stable | When supported by model |
Provider-Specific Parameters (passed via **kwargs)
| Parameter | Provider | Stability | Notes |
|---|---|---|---|
| reasoning_effort | OpenAI, Gemini | 🔄 Adapter-level | Normalized by adapter |
| tools, tool_choice | OpenAI, Gemini | 🔄 Adapter-level | Normalized by adapter |
| include_thoughts | Gemini | ⚠️ Legacy | Use reasoning_effort instead; reasoning available via generate.reasoning field |
| normalize_embedding | Gemini | 🔄 Provider-level | Gemini embeddings only |
| dimensions | OpenAI, Gemini | 🔄 Provider-level | When supported |
| task_type, output_dimensionality | Gemini native | 🔄 Provider-level | Native SDK only |
Legend:
- ✅ Stable: Guaranteed interface, won’t change
- 🔄 Adapter-level: Normalized by adapter for consistent behavior
- 🔄 Provider-level: Passed through to provider SDK
- ⚠️ Legacy: Still accepted, but superseded by the replacement named in the Notes column
For complete parameter policies per model, see the model registry configuration.