OpenRouter

Install

To use OpenRouterModel, you need to either install pydantic-ai, or install pydantic-ai-slim with the openrouter optional group:

With pip:

pip install "pydantic-ai-slim[openrouter]"

With uv:

uv add "pydantic-ai-slim[openrouter]"

Configuration

To use OpenRouter, first create an API key at openrouter.ai/keys.

You can set the OPENROUTER_API_KEY environment variable and use OpenRouterProvider by name:

from pydantic_ai import Agent

agent = Agent('openrouter:anthropic/claude-sonnet-4.6')
...
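Because the by-name form relies on the environment, a missing key is easier to diagnose if you check for it up front. The following is a minimal sketch using only the standard library; `require_openrouter_key` is a hypothetical helper for illustration and is not part of Pydantic AI.

```python
import os
from typing import Mapping, Optional


def require_openrouter_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenRouter API key, or raise a clear error if it is unset.

    Hypothetical helper for illustration; not part of Pydantic AI.
    """
    # Default to the real process environment; a mapping can be passed in tests.
    env = os.environ if env is None else env
    key = env.get('OPENROUTER_API_KEY', '').strip()
    if not key:
        raise RuntimeError(
            'OPENROUTER_API_KEY is not set; create a key at openrouter.ai/keys'
        )
    return key
```

Calling `require_openrouter_key()` at startup, before constructing the agent, turns a misconfigured environment into an immediate and readable error.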

Or initialize the model and provider directly:

from pydantic_ai import Agent
from pydantic_ai.models.openrouter import OpenRouterModel
from pydantic_ai.providers.openrouter import OpenRouterProvider

model = OpenRouterModel(
    'anthropic/claude-sonnet-4.6',
    provider=OpenRouterProvider(api_key='your-openrouter-api-key'),
)
agent = Agent(model)
...

App Attribution

OpenRouter's app attribution feature tracks your application in its public rankings and analytics.

To enable it, pass app_url and app_title when initializing the provider:

from pydantic_ai.providers.openrouter import OpenRouterProvider

provider = OpenRouterProvider(
    api_key='your-openrouter-api-key',
    app_url='https://your-app.com',
    app_title='Your App',
)
...

Model Settings

You can customize model behavior using OpenRouterModelSettings:

from pydantic_ai import Agent
from pydantic_ai.models.openrouter import OpenRouterModel, OpenRouterModelSettings

settings = OpenRouterModelSettings(
    openrouter_reasoning={
        'effort': 'high',
    },
    openrouter_usage={
        'include': True,
    }
)
model = OpenRouterModel('openai/gpt-5.2')
agent = Agent(model, model_settings=settings)
...

Eager Input Streaming

For Anthropic models via OpenRouter, you can enable eager input streaming to reduce latency for tool calls with large inputs. Set anthropic_eager_input_streaming in AnthropicModelSettings:

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModelSettings
from pydantic_ai.models.openrouter import OpenRouterModel

model = OpenRouterModel('anthropic/claude-sonnet-4.5')
settings = AnthropicModelSettings(anthropic_eager_input_streaming=True)
agent = Agent(model, model_settings=settings)
...

Prompt Caching

OpenRouter supports prompt caching for downstream providers that implement it. Pydantic AI's OpenRouter cache settings control explicit cache_control breakpoints for Anthropic and Gemini models:

Cache System Instructions: Set OpenRouterModelSettings.openrouter_cache_instructions to True or specify '5m' / '1h' directly
Cache the Last Message: Set OpenRouterModelSettings.openrouter_cache_messages to True to automatically cache the last message in the conversation
Cache Tool Definitions: Set OpenRouterModelSettings.openrouter_cache_tool_definitions to True or specify '5m' / '1h' directly
Fine-Grained Control with CachePoint: Insert a CachePoint marker in user messages to cache everything before it

Provider Differences

Anthropic models support prefix-based caching for both system instructions and message content. TTL values ('5m', '1h') are passed through to the provider.
Gemini models support caching for system instructions and regular message content, but for Gemini caching OpenRouter uses only the last breakpoint in message content. Use openrouter_cache_messages or CachePoint when that final message boundary is intentional, and use openrouter_cache_instructions only for fully static system context. Gemini ignores TTL values. Cached Gemini systemInstruction content is immutable, so put dynamic prompt segments in a later user message rather than after the cached system instructions.
Minimum token thresholds apply; see OpenRouter's minimum token requirements for current provider-specific values.

Caching via Model Settings

Use OpenRouterModelSettings to enable explicit caching for system instructions, the last conversation message, and tool definitions:

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openrouter import OpenRouterModel, OpenRouterModelSettings

model = OpenRouterModel('anthropic/claude-sonnet-4.6')
agent = Agent(
    model,
    instructions='You are a specialized assistant with deep domain knowledge...',
    model_settings=OpenRouterModelSettings(
        openrouter_cache_instructions=True,  # Cache system instructions (broadly supported)
        openrouter_cache_messages=True,  # Cache the last message (best with Anthropic)
        openrouter_cache_tool_definitions=True,  # Cache tool definitions (Anthropic only)
    ),
)


@agent.tool
def search_docs(ctx: RunContext, query: str) -> str:
    """Search documentation."""
    return f'Results for {query}'
...

Each setting accepts True or an explicit '5m' / '1h' TTL value. True sends Anthropic's default '5m' TTL for Anthropic models; Gemini ignores TTL values and manages cache lifetime itself. Check result.usage().cache_write_tokens on initial writes and result.usage().cache_read_tokens on reuse, including subsequent calls with message_history=result.all_messages().
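To make the write-then-read pattern concrete, here is a small sketch that classifies a run by its cache counters. `describe_cache_usage` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins with illustrative numbers so the example runs without an API call; in an application you would pass `result.usage()` instead.

```python
from types import SimpleNamespace


def describe_cache_usage(usage) -> str:
    """Summarize the cache counters on a usage object.

    `usage` is any object exposing `cache_write_tokens` and
    `cache_read_tokens`. Hypothetical helper for illustration.
    """
    written = getattr(usage, 'cache_write_tokens', 0) or 0
    read = getattr(usage, 'cache_read_tokens', 0) or 0
    # Report reads first: a non-zero read count means the cache was reused.
    if read:
        return f'cache hit: {read} tokens read'
    if written:
        return f'cache write: {written} tokens stored'
    return 'no caching reported'


# Stand-in usage objects with illustrative values, not real measurements.
first_call = SimpleNamespace(cache_write_tokens=1200, cache_read_tokens=0)
second_call = SimpleNamespace(cache_write_tokens=0, cache_read_tokens=1200)
print(describe_cache_usage(first_call))   # cache write: 1200 tokens stored
print(describe_cache_usage(second_call))  # cache hit: 1200 tokens read
```

If both counters stay at zero across repeated calls, the prompt may be below the provider's minimum token threshold.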

OpenRouter uses provider sticky routing after prompt-cached requests to improve cache locality. For cache-sensitive workflows that need stricter provider control or disabled fallbacks, also set openrouter_provider, for example with {'order': ['anthropic'], 'allow_fallbacks': False}.
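As a sketch of that combination, the settings below pair the cache flags with strict provider routing. It is written as a plain dict so that it runs without Pydantic AI installed, on the assumption that OpenRouterModelSettings is a TypedDict whose instances are ordinary dicts at runtime; in an application you would construct OpenRouterModelSettings(...) with the same keys.

```python
# Keys and values come from the text above. The plain dict stands in for
# OpenRouterModelSettings, which is assumed here to be a TypedDict.
cache_pinned_settings = {
    'openrouter_cache_instructions': True,
    'openrouter_cache_messages': True,
    'openrouter_provider': {
        'order': ['anthropic'],    # route to Anthropic first
        'allow_fallbacks': False,  # do not fall back to other providers
    },
}
```

Disabling fallbacks trades availability for cache locality: a request fails rather than moving to a provider that holds no cached prefix.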

Fine-Grained Control with CachePoint

Use CachePoint markers to control exactly where cache boundaries are placed:

from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.openrouter import OpenRouterModel

model = OpenRouterModel('anthropic/claude-sonnet-4.6')
agent = Agent(model)

prompt = [
    'Long reference document or context to cache...',
    CachePoint(),  # Cache everything before this point
    'Now answer my question about the context above',
]
...

Pass the prompt list to agent.run_sync(prompt). Everything before the CachePoint() marker is cached. You can place multiple markers for fine-grained control over cache boundaries.
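To illustrate what "everything before the marker" means when several markers are present, the sketch below computes the user-prompt parts that precede each one. It is a conceptual model only: `Marker` is a stand-in for CachePoint so the code has no dependencies, `cached_prefixes` is a hypothetical helper, and the model ignores tool definitions and system instructions, which also sit in front of the messages.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class Marker:
    """Stand-in for pydantic_ai.CachePoint so the sketch has no dependencies."""


def cached_prefixes(prompt: Sequence[Union[str, Marker]]) -> List[List[str]]:
    """Return, for each marker, the text parts that precede it."""
    prefixes: List[List[str]] = []
    seen: List[str] = []
    for part in prompt:
        if isinstance(part, Marker):
            # Snapshot everything seen so far; this is one cache boundary.
            prefixes.append(list(seen))
        else:
            seen.append(part)
    return prefixes


prompt = [
    'Long reference document...',
    Marker(),
    'Follow-up notes...',
    Marker(),
    'Now answer my question about the context above',
]
for prefix in cached_prefixes(prompt):
    print(prefix)
```

The final question appears in neither prefix, so it can change between calls without invalidating the cached content in front of it.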

Anthropic cache-breakpoint ordering

Anthropic processes cache breakpoints in a fixed order — tool definitions, then system instructions, then messages — and rejects a '1h' breakpoint that appears after a '5m' one in that sequence. When mixing TTLs across CachePoint markers or the cache settings on an Anthropic model, place the longer ('1h') breakpoints before the shorter ('5m') ones. Anthropic also allows at most four explicit breakpoints per request; excess breakpoints are dropped (oldest first) before the request is sent.
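The two rules above can be sketched as a pair of checks. This is an illustration of the constraints as described, not the library's implementation: `ttl_order_is_valid` and `keep_newest` are hypothetical helpers, and "oldest" is assumed to mean the breakpoints that come first in processing order.

```python
from typing import List

MAX_BREAKPOINTS = 4  # per-request limit stated in the text above
TTL_SECONDS = {'5m': 300, '1h': 3600}


def ttl_order_is_valid(ttls: List[str]) -> bool:
    """Return True if no longer TTL follows a shorter one.

    `ttls` lists breakpoint TTLs in processing order: tool definitions,
    then system instructions, then messages.
    """
    seconds = [TTL_SECONDS[t] for t in ttls]
    return all(earlier >= later for earlier, later in zip(seconds, seconds[1:]))


def keep_newest(ttls: List[str], limit: int = MAX_BREAKPOINTS) -> List[str]:
    """Drop breakpoints from the front so that at most `limit` remain."""
    return ttls[max(len(ttls) - limit, 0):]


print(ttl_order_is_valid(['1h', '1h', '5m']))  # True
print(ttl_order_is_valid(['5m', '1h']))        # False
print(keep_newest(['1h', '1h', '5m', '5m', '5m']))  # ['1h', '5m', '5m', '5m']
```

A check like this can run before a request is built, so a mixed-TTL configuration is caught locally rather than being rejected by the provider.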

Web Search

OpenRouter supports web search via its plugins. You can enable it using the WebSearchTool.

Web Search Parameters

You can customize the web search behavior using the search_context_size parameter on WebSearchTool:

from pydantic_ai import Agent
from pydantic_ai.capabilities import NativeTool
from pydantic_ai.models.openrouter import OpenRouterModel
from pydantic_ai.native_tools import WebSearchTool

tool = WebSearchTool(search_context_size='high')
model = OpenRouterModel('openai/gpt-4.1')
agent = Agent(
    model,
    capabilities=[NativeTool(tool)],
)
result = agent.run_sync('What is the latest news in AI?')