pydantic_ai.models.openrouter

Setup

For details on how to set up authentication with this model, see model configuration for OpenRouter.

KnownOpenRouterProviders module-attribute

KnownOpenRouterProviders = Literal[
    "z-ai",
    "cerebras",
    "venice",
    "moonshotai",
    "morph",
    "stealth",
    "wandb",
    "klusterai",
    "openai",
    "sambanova",
    "amazon-bedrock",
    "mistral",
    "nextbit",
    "atoma",
    "ai21",
    "minimax",
    "baseten",
    "anthropic",
    "featherless",
    "groq",
    "lambda",
    "azure",
    "ncompass",
    "deepseek",
    "hyperbolic",
    "crusoe",
    "cohere",
    "mancer",
    "avian",
    "perplexity",
    "novita",
    "siliconflow",
    "switchpoint",
    "xai",
    "inflection",
    "fireworks",
    "deepinfra",
    "inference-net",
    "inception",
    "atlas-cloud",
    "nvidia",
    "alibaba",
    "friendli",
    "infermatic",
    "targon",
    "ubicloud",
    "aion-labs",
    "liquid",
    "nineteen",
    "cloudflare",
    "nebius",
    "chutes",
    "enfer",
    "crofai",
    "open-inference",
    "phala",
    "gmicloud",
    "meta",
    "relace",
    "parasail",
    "together",
    "google-ai-studio",
    "google-vertex",
]

Known providers in the OpenRouter marketplace

OpenRouterProviderName module-attribute

OpenRouterProviderName = str | KnownOpenRouterProviders

Possible OpenRouter provider names.

Since OpenRouter is constantly updating its list of providers, we explicitly list some known providers but allow any name in the type hints. See the OpenRouter API for a full list.

OpenRouterTransforms module-attribute

OpenRouterTransforms = Literal['middle-out']

Available message transforms for OpenRouter models with limited token windows.

Currently only 'middle-out' is supported; more transforms may be added in the future.

OpenRouterCacheTTL module-attribute

OpenRouterCacheTTL = bool | Literal['5m', '1h']

Cache breakpoint time-to-live for OpenRouter prompt caching.

True selects the default TTL ('5m'); '5m' or '1h' may be given explicitly. The TTL is only forwarded to downstream providers that support it (Anthropic); it is omitted for Gemini.
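
The TTL resolution described above can be sketched as a small standalone function. This is an illustration rather than the library's code path: `resolve_cache_control` and its `supports_ttl` flag are hypothetical names, standing in for the capability lookup the model performs against its profile.

```python
from typing import Literal, Union

# Local mirror of the alias documented above, so the sketch runs without pydantic_ai installed.
OpenRouterCacheTTL = Union[bool, Literal['5m', '1h']]


def resolve_cache_control(ttl: OpenRouterCacheTTL, *, supports_ttl: bool) -> dict[str, str]:
    """Build a cache_control payload.

    `supports_ttl` is a hypothetical stand-in for a downstream provider capability flag.
    """
    # A boolean value means "use the default TTL".
    resolved = '5m' if isinstance(ttl, bool) else ttl
    cache_control = {'type': 'ephemeral'}
    if supports_ttl:
        cache_control['ttl'] = resolved
    return cache_control


anthropic_style = resolve_cache_control(True, supports_ttl=True)
gemini_style = resolve_cache_control('1h', supports_ttl=False)
```

In this sketch the TTL is silently dropped when the downstream provider does not accept one, which matches the behavior described for Gemini.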

OpenRouterProviderConfig

Bases: TypedDict

Represents the 'Provider' object from the OpenRouter API.

Source code in pydantic_ai_slim/pydantic_ai/models/openrouter.py
class OpenRouterProviderConfig(TypedDict, total=False):
    """Represents the 'Provider' object from the OpenRouter API."""

    order: list[OpenRouterProviderName]
    """List of provider slugs to try in order (e.g. ["anthropic", "openai"]). [See details](https://openrouter.ai/docs/features/provider-routing#ordering-specific-providers)"""

    allow_fallbacks: bool
    """Whether to allow backup providers when the primary is unavailable. [See details](https://openrouter.ai/docs/features/provider-routing#disabling-fallbacks)"""

    require_parameters: bool
    """Only use providers that support all parameters in your request."""

    data_collection: Literal['allow', 'deny']
    """Control whether to use providers that may store data. [See details](https://openrouter.ai/docs/features/provider-routing#requiring-providers-to-comply-with-data-policies)"""

    zdr: bool
    """Restrict routing to only ZDR (Zero Data Retention) endpoints. [See details](https://openrouter.ai/docs/features/provider-routing#zero-data-retention-enforcement)"""

    only: list[OpenRouterProviderName]
    """List of provider slugs to allow for this request. [See details](https://openrouter.ai/docs/features/provider-routing#allowing-only-specific-providers)"""

    ignore: list[str]
    """List of provider slugs to skip for this request. [See details](https://openrouter.ai/docs/features/provider-routing#ignoring-providers)"""

    quantizations: list[Literal['int4', 'int8', 'fp4', 'fp6', 'fp8', 'fp16', 'bf16', 'fp32', 'unknown']]
    """List of quantization levels to filter by (e.g. ["int4", "int8"]). [See details](https://openrouter.ai/docs/features/provider-routing#quantization)"""

    sort: Literal['price', 'throughput', 'latency']
    """Sort providers by price, throughput, or latency (e.g. "price" or "throughput"). [See details](https://openrouter.ai/docs/features/provider-routing#provider-sorting)"""

    max_price: _OpenRouterMaxPrice
    """The maximum pricing you want to pay for this request. [See details](https://openrouter.ai/docs/features/provider-routing#max-price)"""

order instance-attribute

order: list[OpenRouterProviderName]

List of provider slugs to try in order (e.g. ["anthropic", "openai"]). See details

allow_fallbacks instance-attribute

allow_fallbacks: bool

Whether to allow backup providers when the primary is unavailable. See details

require_parameters instance-attribute

require_parameters: bool

Only use providers that support all parameters in your request.

data_collection instance-attribute

data_collection: Literal['allow', 'deny']

Control whether to use providers that may store data. See details

zdr instance-attribute

zdr: bool

Restrict routing to only ZDR (Zero Data Retention) endpoints. See details

only instance-attribute

only: list[OpenRouterProviderName]

List of provider slugs to allow for this request. See details

ignore instance-attribute

ignore: list[str]

List of provider slugs to skip for this request. See details

quantizations instance-attribute

quantizations: list[
    Literal[
        "int4",
        "int8",
        "fp4",
        "fp6",
        "fp8",
        "fp16",
        "bf16",
        "fp32",
        "unknown",
    ]
]

List of quantization levels to filter by (e.g. ["int4", "int8"]). See details

sort instance-attribute

sort: Literal['price', 'throughput', 'latency']

Sort providers by price, throughput, or latency (e.g. "price" or "throughput"). See details

max_price instance-attribute

max_price: _OpenRouterMaxPrice

The maximum pricing you want to pay for this request. See details
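
As a rough illustration of how these fields combine, the sketch below builds a provider config as a plain typed dict. `ProviderConfigSketch` is a hypothetical local mirror of a few of the fields above, defined here only so the example runs without `pydantic_ai` installed; in real code you would presumably use `OpenRouterProviderConfig` itself.

```python
from typing import Literal, TypedDict


class ProviderConfigSketch(TypedDict, total=False):
    """Hypothetical local mirror of a few OpenRouterProviderConfig fields."""

    order: list[str]
    allow_fallbacks: bool
    data_collection: Literal['allow', 'deny']
    sort: Literal['price', 'throughput', 'latency']


# Try Anthropic first, then OpenAI, with backup providers disabled,
# and skip providers that may store data.
provider_config: ProviderConfigSketch = {
    'order': ['anthropic', 'openai'],
    'allow_fallbacks': False,
    'data_collection': 'deny',
}
```

Because the TypedDict is declared with `total=False`, any field can be left out; omitted keys are simply not present in the resulting dict.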

OpenRouterReasoning

Bases: TypedDict

Configuration for reasoning tokens in OpenRouter requests.

Reasoning tokens allow models to show their step-by-step thinking process. You can configure this using either OpenAI-style effort levels or Anthropic-style token limits, but not both simultaneously.

Source code in pydantic_ai_slim/pydantic_ai/models/openrouter.py
class OpenRouterReasoning(TypedDict, total=False):
    """Configuration for reasoning tokens in OpenRouter requests.

    Reasoning tokens allow models to show their step-by-step thinking process.
    You can configure this using either OpenAI-style effort levels or Anthropic-style
    token limits, but not both simultaneously.
    """

    effort: Literal['xhigh', 'high', 'medium', 'low', 'minimal', 'none']
    """OpenAI-style reasoning effort level. Cannot be used with max_tokens."""

    max_tokens: int
    """Anthropic-style specific token limit for reasoning. Cannot be used with effort."""

    exclude: bool
    """Whether to exclude reasoning tokens from the response. Default is False. All models support this."""

    enabled: bool
    """Whether to enable reasoning with default parameters. Default is inferred from effort or max_tokens."""

effort instance-attribute

effort: Literal[
    "xhigh", "high", "medium", "low", "minimal", "none"
]

OpenAI-style reasoning effort level. Cannot be used with max_tokens.

max_tokens instance-attribute

max_tokens: int

Anthropic-style specific token limit for reasoning. Cannot be used with effort.

exclude instance-attribute

exclude: bool

Whether to exclude reasoning tokens from the response. Default is False. All models support this.

enabled instance-attribute

enabled: bool

Whether to enable reasoning with default parameters. Default is inferred from effort or max_tokens.
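
The "either effort or max_tokens, but not both" rule can be checked on the client side before a request is sent. The following is a minimal sketch under that assumption: `check_reasoning` is a hypothetical helper, and nothing in this page states that the library performs this check itself.

```python
from typing import Any


def check_reasoning(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical client-side check: reject configs that set both effort and max_tokens."""
    if 'effort' in config and 'max_tokens' in config:
        raise ValueError("Set either 'effort' or 'max_tokens', not both.")
    return config


# OpenAI-style effort level.
effort_style = check_reasoning({'effort': 'high'})

# Anthropic-style token limit, with reasoning tokens excluded from the response.
budget_style = check_reasoning({'max_tokens': 2000, 'exclude': True})

# Setting both styles at once is rejected by this sketch.
try:
    check_reasoning({'effort': 'low', 'max_tokens': 500})
    rejected = False
except ValueError:
    rejected = True
```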

OpenRouterUsageConfig

Bases: TypedDict

Configuration for OpenRouter usage.

Source code in pydantic_ai_slim/pydantic_ai/models/openrouter.py
class OpenRouterUsageConfig(TypedDict, total=False):
    """Configuration for OpenRouter usage."""

    include: bool

OpenRouterModelSettings

Bases: ModelSettings

Settings used for an OpenRouter model request.

Source code in pydantic_ai_slim/pydantic_ai/models/openrouter.py
class OpenRouterModelSettings(ModelSettings, total=False):
    """Settings used for an OpenRouter model request."""

    # ALL FIELDS MUST BE `openrouter_` PREFIXED SO YOU CAN MERGE THEM WITH OTHER MODELS.

    openrouter_models: list[str]
    """A list of fallback models.

    These models will be tried, in order, if the main model returns an error. [See details](https://openrouter.ai/docs/features/model-routing#the-models-parameter)
    """

    openrouter_provider: OpenRouterProviderConfig
    """OpenRouter routes requests to the best available providers for your model. By default, requests are load balanced across the top providers to maximize uptime.

    You can customize how your requests are routed using the provider object. [See more](https://openrouter.ai/docs/features/provider-routing)"""

    openrouter_preset: str
    """Presets allow you to separate your LLM configuration from your code.

    Create and manage presets through the OpenRouter web application to control provider routing, model selection, system prompts, and other parameters, then reference them in OpenRouter API requests. [See more](https://openrouter.ai/docs/features/presets)"""

    openrouter_transforms: list[OpenRouterTransforms]
    """To help with prompts that exceed the maximum context size of a model.

    Transforms work by removing or truncating messages from the middle of the prompt, until the prompt fits within the model's context window. [See more](https://openrouter.ai/docs/features/message-transforms)
    """

    openrouter_reasoning: OpenRouterReasoning
    """To control the reasoning tokens in the request.

    The reasoning config object consolidates settings for controlling reasoning strength across different models. [See more](https://openrouter.ai/docs/use-cases/reasoning-tokens)
    """

    openrouter_usage: OpenRouterUsageConfig
    """To control the usage of the model.

    The usage config object consolidates settings for enabling detailed usage information. [See more](https://openrouter.ai/docs/use-cases/usage-accounting)
    """

    openrouter_cache_instructions: OpenRouterCacheTTL
    """Whether to add `cache_control` to stable system instructions.

    When enabled, supported downstream providers (Anthropic, Gemini) can cache stable
    system instructions and reduce costs. If dynamic instructions are present, the cache
    point is placed before them, matching Anthropic's static-prefix caching behavior.
    For Gemini models, this setting is ignored when dynamic instructions are present because
    OpenRouter normalizes system/developer messages into a single immutable `systemInstruction`.
    Ignored for other downstream providers.
    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
    TTL is only included for Anthropic models; Gemini does not support explicit TTL.

    See https://openrouter.ai/docs/guides/best-practices/prompt-caching for more information.
    """

    openrouter_cache_messages: OpenRouterCacheTTL
    """Convenience setting to enable caching for the last message in the conversation.

    When enabled, this automatically adds `cache_control` to the last content block
    in the final message (regardless of role), which is useful for Anthropic's prefix-based
    caching in multi-turn conversations. In tool-use flows, this may target a tool result
    message rather than a user message, which is correct for prefix caching.
    Ignored for downstream providers that do not support explicit cache control.
    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
    TTL is only included for Anthropic models; Gemini does not support explicit TTL.

    Note: OpenRouter uses only the last breakpoint across normal message content for
    Gemini caching. Use this when caching the final message boundary is intentional;
    use `openrouter_cache_instructions` for stable system context. Anthropic supports
    prefix-based caching across multi-turn conversations with this setting.

    See https://openrouter.ai/docs/guides/best-practices/prompt-caching for more information.
    """

    openrouter_cache_tool_definitions: OpenRouterCacheTTL
    """Whether to add `cache_control` to the last tool definition.

    When enabled, the last tool in the `tools` array will have `cache_control` set,
    allowing supported downstream providers to cache tool definitions and reduce costs.
    Ignored for downstream providers that do not support explicit tool definition caching.
    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
    TTL is only included for Anthropic models.

    Currently only effective for Anthropic models via OpenRouter, as tool definition
    caching is not documented for other providers.

    See https://openrouter.ai/docs/guides/best-practices/prompt-caching for more information.
    """

openrouter_models instance-attribute

openrouter_models: list[str]

A list of fallback models.

These models will be tried, in order, if the main model returns an error. See details

openrouter_provider instance-attribute

openrouter_provider: OpenRouterProviderConfig

OpenRouter routes requests to the best available providers for your model. By default, requests are load balanced across the top providers to maximize uptime.

You can customize how your requests are routed using the provider object. See more

openrouter_preset instance-attribute

openrouter_preset: str

Presets allow you to separate your LLM configuration from your code.

Create and manage presets through the OpenRouter web application to control provider routing, model selection, system prompts, and other parameters, then reference them in OpenRouter API requests. See more

openrouter_transforms instance-attribute

openrouter_transforms: list[OpenRouterTransforms]

To help with prompts that exceed the maximum context size of a model.

Transforms work by removing or truncating messages from the middle of the prompt, until the prompt fits within the model's context window. See more

openrouter_reasoning instance-attribute

openrouter_reasoning: OpenRouterReasoning

To control the reasoning tokens in the request.

The reasoning config object consolidates settings for controlling reasoning strength across different models. See more

openrouter_usage instance-attribute

openrouter_usage: OpenRouterUsageConfig

To control the usage of the model.

The usage config object consolidates settings for enabling detailed usage information. See more

openrouter_cache_instructions instance-attribute

openrouter_cache_instructions: OpenRouterCacheTTL

Whether to add cache_control to stable system instructions.

When enabled, supported downstream providers (Anthropic, Gemini) can cache stable system instructions and reduce costs. If dynamic instructions are present, the cache point is placed before them, matching Anthropic's static-prefix caching behavior. For Gemini models, this setting is ignored when dynamic instructions are present because OpenRouter normalizes system/developer messages into a single immutable systemInstruction. Ignored for other downstream providers. If True, uses TTL='5m'. You can also specify '5m' or '1h' directly. TTL is only included for Anthropic models; Gemini does not support explicit TTL.

See https://openrouter.ai/docs/guides/best-practices/prompt-caching for more information.

openrouter_cache_messages instance-attribute

openrouter_cache_messages: OpenRouterCacheTTL

Convenience setting to enable caching for the last message in the conversation.

When enabled, this automatically adds cache_control to the last content block in the final message (regardless of role), which is useful for Anthropic's prefix-based caching in multi-turn conversations. In tool-use flows, this may target a tool result message rather than a user message, which is correct for prefix caching. Ignored for downstream providers that do not support explicit cache control. If True, uses TTL='5m'. You can also specify '5m' or '1h' directly. TTL is only included for Anthropic models; Gemini does not support explicit TTL.

Note: OpenRouter uses only the last breakpoint across normal message content for Gemini caching. Use this when caching the final message boundary is intentional; use openrouter_cache_instructions for stable system context. Anthropic supports prefix-based caching across multi-turn conversations with this setting.

See https://openrouter.ai/docs/guides/best-practices/prompt-caching for more information.

openrouter_cache_tool_definitions instance-attribute

openrouter_cache_tool_definitions: OpenRouterCacheTTL

Whether to add cache_control to the last tool definition.

When enabled, the last tool in the tools array will have cache_control set, allowing supported downstream providers to cache tool definitions and reduce costs. Ignored for downstream providers that do not support explicit tool definition caching. If True, uses TTL='5m'. You can also specify '5m' or '1h' directly. TTL is only included for Anthropic models.

Currently only effective for Anthropic models via OpenRouter, as tool definition caching is not documented for other providers.

See https://openrouter.ai/docs/guides/best-practices/prompt-caching for more information.
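
To make the `openrouter_` prefix convention concrete, here is a small sketch that merges generic settings with OpenRouter-specific ones. Plain dicts stand in for the settings TypedDicts so the example needs no third-party imports, and the model slug is a made-up placeholder.

```python
# Generic settings that are not specific to OpenRouter.
base_settings = {'temperature': 0.2, 'max_tokens': 1024}

openrouter_settings = {
    'openrouter_models': ['example/fallback-model'],  # hypothetical model slug
    'openrouter_provider': {'sort': 'price', 'allow_fallbacks': True},
    'openrouter_reasoning': {'effort': 'medium'},
    'openrouter_usage': {'include': True},
    'openrouter_cache_instructions': '1h',
}

# Every OpenRouter field carries the `openrouter_` prefix, so merging
# the two dicts cannot overwrite any of the generic keys.
merged = {**base_settings, **openrouter_settings}
```

The same reasoning applies when settings for several model families are combined in one dict: prefixed keys from different families do not collide.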

OpenRouterModel

Bases: OpenAIChatModel

Extends OpenAIChatModel to capture extra metadata for OpenRouter.
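
The breakpoint-limiting strategy documented in `_limit_cache_points` (reserve tool and system/developer cache points, then keep only the newest message cache points) can be tried out in isolation. The following is a simplified, standalone sketch over plain dicts, not the method itself: `limit_cache_points` and `cached_text` are hypothetical names, `ValueError` stands in for `UserError`, and the limit is passed explicitly instead of being read from the model profile.

```python
from typing import Any


def limit_cache_points(
    messages: list[dict[str, Any]], max_points: int, has_tool_cache_point: bool = False
) -> None:
    """Keep tool and system/developer cache points, then the newest message cache points."""
    # Tool and system/developer cache points are always preserved, so count them first.
    used = int(has_tool_cache_point)
    for msg in messages:
        if msg.get('role') in ('system', 'developer') and isinstance(msg.get('content'), list):
            used += sum(1 for part in msg['content'] if 'cache_control' in part)

    remaining = max_points - used
    if remaining < 0:
        raise ValueError(f'Reserved cache points ({used}) exceed the maximum of {max_points}.')

    # Walk newest-first so the most recent breakpoints survive.
    for msg in reversed(messages):
        if msg.get('role') in ('system', 'developer') or not isinstance(msg.get('content'), list):
            continue
        for part in reversed(msg['content']):
            if 'cache_control' in part:
                if remaining > 0:
                    remaining -= 1
                else:
                    del part['cache_control']


def cached_text(text: str) -> dict[str, Any]:
    """Build a text part that carries a cache breakpoint."""
    return {'type': 'text', 'text': text, 'cache_control': {'type': 'ephemeral'}}


messages = [
    {'role': 'system', 'content': [cached_text('instructions')]},
    {'role': 'user', 'content': [cached_text('turn 1')]},
    {'role': 'user', 'content': [cached_text('turn 2')]},
    {'role': 'user', 'content': [cached_text('turn 3')]},
]
limit_cache_points(messages, max_points=3, has_tool_cache_point=True)
kept = ['cache_control' in m['content'][0] for m in messages]
```

With a limit of 3, one slot goes to the tool definitions and one to the system message, so only the newest user turn keeps its breakpoint.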

Source code in pydantic_ai_slim/pydantic_ai/models/openrouter.py
class OpenRouterModel(OpenAIChatModel):
    """Extends OpenAIChatModel to capture extra metadata for OpenRouter."""

    def __init__(
        self,
        model_name: str,
        *,
        provider: Literal['openrouter'] | Provider[AsyncOpenAI] = 'openrouter',
        profile: ModelProfileSpec | None = None,
        settings: ModelSettings | None = None,
    ):
        """Initialize an OpenRouter model.

        Args:
            model_name: The name of the model to use.
            provider: The provider to use for authentication and API access. If not provided, a new provider will be created with the default settings.
            profile: The model profile to use. Defaults to a profile picked by the provider based on the model name.
            settings: Model-specific settings that will be used as defaults for this model.
        """
        super().__init__(model_name, provider=provider or OpenRouterProvider(), profile=profile, settings=settings)

    @property
    def _resolved_profile(self) -> OpenRouterModelProfile:
        return cast(OpenRouterModelProfile, self.profile)

    def _build_cache_control(self, ttl: OpenRouterCacheTTL = '5m') -> dict[str, str]:
        """Build a `cache_control` dict for the downstream provider.

        Args:
            ttl: The cache time-to-live. `True` is treated as `'5m'`.
                Only included for providers that support it (Anthropic).
        """
        resolved_ttl: Literal['5m', '1h'] = '5m' if isinstance(ttl, bool) else ttl
        cache_control: dict[str, str] = {'type': 'ephemeral'}
        if self._resolved_profile.get('openrouter_supports_cache_ttl', False):
            cache_control['ttl'] = resolved_ttl
        return cache_control

    def _limit_cache_points(
        self,
        openai_messages: list[chat.ChatCompletionMessageParam],
        *,
        has_tool_cache_point: bool = False,
    ) -> None:
        """Limit the number of cache breakpoints to the downstream provider's maximum.

        Anthropic enforces a maximum of 4 cache breakpoints per request. When the limit
        is exceeded, excess breakpoints are removed from messages (oldest first), preserving
        tool and system/developer cache points which are typically more valuable.

        Follows the same strategy as the Anthropic and Bedrock models' `_limit_cache_points`:
        1. Reserve slots for tool cache points (known from `has_tool_cache_point`)
        2. Count cache points in system/developer messages (always preserved)
        3. Calculate remaining budget for user/assistant message cache points
        4. Traverse remaining messages newest-first, removing excess cache points

        Args:
            openai_messages: The mapped OpenAI messages to limit.
            has_tool_cache_point: Whether a tool definition cache point was added by `_get_tool_choice`.
        """
        max_points = self._resolved_profile.get('openrouter_max_cache_points')
        if max_points is None:
            return

        used = int(has_tool_cache_point)

        for msg in openai_messages:
            if msg.get('role') in ('system', 'developer'):
                content = msg.get('content')
                if isinstance(content, list):
                    used += sum(1 for part in content if 'cache_control' in cast(dict[str, Any], part))

        remaining = max_points - used
        if remaining < 0:
            raise UserError(
                f'Too many cache points for downstream provider. '
                f'Tool and system cache points already use {used}, '
                f'which exceeds the maximum of {max_points}.'
            )

        for msg in reversed(openai_messages):
            if msg.get('role') in ('system', 'developer'):
                continue
            content = msg.get('content')
            if not isinstance(content, list):
                continue
            for part in reversed(content):
                part_dict = cast(dict[str, Any], part)
                if 'cache_control' in part_dict:
                    if remaining > 0:
                        remaining -= 1
                    else:
                        del part_dict['cache_control']

    def _add_cache_control(self, params: list[ChatCompletionContentPartParam], ttl: OpenRouterCacheTTL = '5m') -> None:
        """Add `cache_control` to the last content part.

        Mirrors the Anthropic model's `_add_cache_control_to_last_param` behavior for
        OpenRouter's Anthropic and Gemini providers.

        See https://openrouter.ai/docs/guides/best-practices/prompt-caching for more information.

        Args:
            params: List of content parts to modify.
            ttl: The cache time-to-live (`True` -> `'5m'`, or `'5m'`/`'1h'`).
                Ignored for providers that don't support it.
        """
        if not self._resolved_profile.get('openrouter_supports_cache_control', False):
            return

        if not params:
            raise UserError(
                'CachePoint cannot be the first content in a user message - there must be previous content to attach the CachePoint to. '
                'To cache system instructions or tool definitions, use the `openrouter_cache_instructions` or `openrouter_cache_tool_definitions` settings instead.'
            )

        last_param = cast(dict[str, Any], params[-1])
        last_param['cache_control'] = self._build_cache_control(ttl)

    def _add_cache_control_to_message(
        self, message: chat.ChatCompletionMessageParam, ttl: OpenRouterCacheTTL = '5m'
    ) -> None:
        """Add `cache_control` to the last content block in a mapped chat message."""
        content = message.get('content')
        if isinstance(content, str):
            message['content'] = [  # type: ignore[typeddict-unknown-key]
                {'type': 'text', 'text': content, 'cache_control': self._build_cache_control(ttl)}
            ]
        elif isinstance(content, list) and content:
            last_part = cast(dict[str, Any], content[-1])
            last_part.setdefault('cache_control', self._build_cache_control(ttl))

    def _add_cache_control_to_instructions(
        self,
        openai_messages: list[chat.ChatCompletionMessageParam],
        messages: Sequence[ModelMessage],
        model_request_parameters: ModelRequestParameters,
        ttl: OpenRouterCacheTTL,
    ) -> None:
        instruction_parts = self._get_instruction_parts(messages, model_request_parameters)
        if not instruction_parts:
            for msg in reversed(openai_messages):
                if msg.get('role') in ('system', 'developer'):
                    self._add_cache_control_to_message(msg, ttl)
                    break
            return

        instruction_role = self._resolved_profile.get('openai_system_prompt_role', None) or 'system'
        if instruction_role not in ('system', 'developer'):
            return

        has_dynamic_instructions = any(part.dynamic for part in instruction_parts)
        if has_dynamic_instructions and not self._resolved_profile.get(
            'openrouter_supports_dynamic_instruction_cache', False
        ):
            # OpenRouter normalizes Google system/developer messages into Gemini's single
            # `systemInstruction`, which Gemini caches as an immutable block (explicit
            # cache_control path). Unlike Anthropic prefix caching, we can't cache only the
            # static instruction prefix and leave a dynamic tail uncached in that shape, so
            # skip instruction caching when dynamic instructions are present.
            # https://openrouter.ai/docs/guides/best-practices/prompt-caching
            # https://ai.google.dev/api/caching  ('systemInstruction': Input only. Immutable)
            return

        instruction_prefix_count = next(
            (index for index, msg in enumerate(openai_messages) if msg.get('role') != instruction_role),
            len(openai_messages),
        )
        # Instruction parts are mapped as the tail of the system/developer prefix.
        first_instruction_index = instruction_prefix_count - len(instruction_parts)

        if has_dynamic_instructions:
            static_instruction_count = sum(1 for part in instruction_parts if not part.dynamic)
            if static_instruction_count:
                cache_message_index = first_instruction_index + static_instruction_count - 1
            elif first_instruction_index > 0:
                cache_message_index = first_instruction_index - 1
            else:
                return
        else:
            cache_message_index = instruction_prefix_count - 1

        self._add_cache_control_to_message(openai_messages[cache_message_index], ttl)

    @classmethod
    @override
    def supported_native_tools(cls) -> frozenset[type[AbstractNativeTool]]:
        """Return the set of builtin tool types this model can handle.

        OpenRouter supports web search via its plugins system.
        """
        return frozenset({WebSearchTool})

    @override
    def prepare_request(
        self,
        model_settings: ModelSettings | None,
        model_request_parameters: ModelRequestParameters,
    ) -> tuple[ModelSettings | None, ModelRequestParameters]:
        merged_settings, customized_parameters = super().prepare_request(model_settings, model_request_parameters)
        new_settings = _openrouter_settings_to_openai_settings(
            cast(OpenRouterModelSettings, merged_settings or {}), customized_parameters
        )
        return new_settings, customized_parameters

    @override
    def _translate_thinking(
        self,
        model_settings: OpenAIChatModelSettings,
        model_request_parameters: ModelRequestParameters,
    ) -> ReasoningEffort | Any:
        """OpenRouter handles reasoning via extra_body['reasoning'], not the reasoning_effort parameter.

        Only pass through explicit openai_reasoning_effort if set; unified thinking
        is handled in _openrouter_settings_to_openai_settings via extra_body['reasoning'].
        """
        if effort := model_settings.get('openai_reasoning_effort'):
            return effort
        return omit

    @override
    def _get_tool_choice(
        self,
        model_settings: OpenAIChatModelSettings,
        model_request_parameters: ModelRequestParameters,
    ) -> tuple[list[chat.ChatCompletionToolParam], ChatCompletionToolChoiceOptionParam | None]:
        tools, tool_choice = super()._get_tool_choice(model_settings, model_request_parameters)

        if (
            tools
            and (cache_tool_defs := model_settings.get('openrouter_cache_tool_definitions'))
            and self._resolved_profile.get('openrouter_supports_tool_cache', False)
        ):
            last_tool = cast(dict[str, Any], tools[-1])
            last_tool['cache_control'] = self._build_cache_control(cache_tool_defs)

        return tools, tool_choice

    @override
    async def _map_messages(
        self,
        messages: Sequence[ModelMessage],
        model_request_parameters: ModelRequestParameters,
        *,
        model_settings: ModelSettings | None = None,
    ) -> list[chat.ChatCompletionMessageParam]:
        openai_messages = await super()._map_messages(messages, model_request_parameters, model_settings=model_settings)

        if (
            openai_messages
            and model_settings
            and (cache_messages := model_settings.get('openrouter_cache_messages'))
            and self._resolved_profile.get('openrouter_supports_cache_control', False)
        ):
            self._add_cache_control_to_message(openai_messages[-1], cache_messages)

        if (
            model_settings
            and (cache_instructions := model_settings.get('openrouter_cache_instructions'))
            and self._resolved_profile.get('openrouter_supports_cache_control', False)
        ):
            self._add_cache_control_to_instructions(
                openai_messages, messages, model_request_parameters, cache_instructions
            )

        has_tool_cache_point = bool(
            model_settings
            and model_settings.get('openrouter_cache_tool_definitions')
            and model_request_parameters.tool_defs
            and self._resolved_profile.get('openrouter_supports_tool_cache', False)
        )
        self._limit_cache_points(openai_messages, has_tool_cache_point=has_tool_cache_point)

        return openai_messages

    @override
    def _get_web_search_options(self, model_request_parameters: ModelRequestParameters) -> WebSearchOptions | None:
        """OpenRouter handles web search via plugins in extra_body, not via the OpenAI web_search_options parameter."""
        return None

    @override
    def _validate_completion(self, response: chat.ChatCompletion) -> _OpenRouterChatCompletion:
        response_dict = response.model_dump()

        try:
            validated = _OpenRouterChatCompletion.model_validate(response_dict)
        except ValidationError as exc:
            # OpenRouter intermittently returns responses with null standard fields (#3994).
            # Try known quirky response shapes before giving up.
            try:
                error_response = _OpenRouterErrorResponse.model_validate(response_dict)
            except ValidationError:
                pass
            else:
                raise ModelHTTPError(
                    status_code=error_response.error.code,
                    model_name=error_response.model or self.model_name,
                    body=error_response.error.message,
                )

            try:
                nested = _OpenRouterNestedProviderResponse.model_validate(response_dict)
            except ValidationError:
                raise exc

            validated = nested.provider
            if not validated.created:
                validated.created = response_dict.get('created') or 0

        if error := validated.error:
            raise ModelHTTPError(status_code=error.code, model_name=validated.model, body=error.message)

        return validated

    @override
    def _process_thinking(self, message: chat.ChatCompletionMessage) -> list[ThinkingPart] | None:
        assert isinstance(message, _OpenRouterCompletionMessage)

        if reasoning_details := message.reasoning_details:
            return [_from_reasoning_detail(detail) for detail in reasoning_details]
        else:
            return super()._process_thinking(message)

    @override
    def _process_provider_details(self, response: chat.ChatCompletion) -> dict[str, Any] | None:
        assert isinstance(response, _OpenRouterChatCompletion)

        provider_details = super()._process_provider_details(response) or {}
        provider_details.update(_map_openrouter_provider_details(response))
        return provider_details or None

    @override
    def _map_usage(self, response: chat.ChatCompletion) -> usage.RequestUsage:
        assert isinstance(response, _OpenRouterChatCompletion)
        return _map_openrouter_usage(response, self._provider.name, self._provider.base_url, self.model_name)

    @dataclass
    class _MapModelResponseContext(OpenAIChatModel._MapModelResponseContext):  # type: ignore[reportPrivateUsage]
        reasoning_details: list[dict[str, Any]] = field(default_factory=list[dict[str, Any]])

        def _into_message_param(self) -> chat.ChatCompletionAssistantMessageParam | None:
            message_param = super()._into_message_param()
            if self.reasoning_details:
                if message_param is None:
                    message_param = chat.ChatCompletionAssistantMessageParam(role='assistant', content=None)
                message_param['reasoning_details'] = self.reasoning_details  # type: ignore[reportGeneralTypeIssues]
            return message_param

        @override
        def _map_response_thinking_part(self, item: ThinkingPart) -> None:
            assert isinstance(self._model, OpenRouterModel)
            if item.provider_name == self._model.system:
                if reasoning_detail := _into_reasoning_detail(item):  # pragma: lax no cover
                    self.reasoning_details.append(reasoning_detail.model_dump())
            else:  # pragma: lax no cover
                super()._map_response_thinking_part(item)

    @property
    @override
    def _streamed_response_cls(self):
        return OpenRouterStreamedResponse

    @override
    async def _map_user_prompt_content_item(
        self, item: UserContent, content: list[ChatCompletionContentPartParam]
    ) -> None:
        if isinstance(item, CachePoint):
            self._add_cache_control(content, ttl=item.ttl)
        else:
            await super()._map_user_prompt_content_item(item, content)

    @override
    async def _map_binary_content_item(self, item: BinaryContent) -> ChatCompletionContentPartParam:
        """Map a BinaryContent item to a chat completion content part for OpenRouter."""
        if item.is_video:
            video_url: _VideoURL = {'url': item.data_uri}
            return cast(
                ChatCompletionContentPartParam,
                _ChatCompletionContentPartVideoUrlParam(video_url=video_url, type='video_url'),
            )

        return await super()._map_binary_content_item(item)

    @override
    async def _map_video_url_item(self, item: VideoUrl) -> ChatCompletionContentPartParam:
        """Map a VideoUrl to a chat completion content part for OpenRouter."""
        video_url: _VideoURL = {'url': item.url}
        if item.force_download:
            video_content = await download_item(item, data_format='base64_uri', type_format='extension')
            video_url['url'] = video_content['data']
        # OpenRouter extends OpenAI's API to support video_url, but it's not in the OpenAI client types.
        # At runtime, the OpenAI client accepts dicts that match the expected structure.
        return cast(
            ChatCompletionContentPartParam,
            _ChatCompletionContentPartVideoUrlParam(video_url=video_url, type='video_url'),
        )

    @override
    def _map_finish_reason(  # type: ignore[reportIncompatibleMethodOverride]
        self, key: Literal['stop', 'length', 'tool_calls', 'content_filter', 'error']
    ) -> FinishReason | None:
        return _CHAT_FINISH_REASON_MAP.get(key)

    @override
    def _map_tool_definition(self, f: ToolDefinition, model_settings: ModelSettings) -> chat.ChatCompletionToolParam:
        """Map a tool definition, forwarding downstream-provider tool flags through OpenRouter.

        For example, when routing to an Anthropic model with `anthropic_eager_input_streaming`
        set, the `eager_input_streaming` flag is added to the tool param so OpenRouter forwards
        it to Anthropic.
        """
        tool_def = super()._map_tool_definition(f, model_settings)
        if self.model_name.startswith('anthropic/') and model_settings.get('anthropic_eager_input_streaming'):
            tool_def['eager_input_streaming'] = True  # type: ignore[typeddict-item]
        return tool_def
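
The fallback order in `_validate_completion` above (standard shape first, then a bare error object, then a completion nested under `provider`) can be illustrated without pydantic. The following is a minimal sketch, not the library's implementation: `ShapeError`, `UpstreamError`, `parse_completion`, and `validate_with_fallbacks` are hypothetical stand-ins for `ValidationError`, `ModelHTTPError`, and the private response models, and the shape check is deliberately simplified.

```python
from typing import Any


class ShapeError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


class UpstreamError(Exception):
    """Hypothetical stand-in for ModelHTTPError."""

    def __init__(self, status_code: int, body: str) -> None:
        super().__init__(f'{status_code}: {body}')
        self.status_code = status_code
        self.body = body


def parse_completion(payload: dict[str, Any]) -> dict[str, Any]:
    """Accept a payload only if it carries the standard completion fields."""
    if not isinstance(payload.get('choices'), list) or not payload.get('id'):
        raise ShapeError('not a standard completion')
    return payload


def validate_with_fallbacks(payload: dict[str, Any]) -> dict[str, Any]:
    try:
        validated = parse_completion(payload)
    except ShapeError as exc:
        # Shape 1: a bare error object, surfaced as an HTTP-style error.
        error = payload.get('error')
        if isinstance(error, dict) and 'code' in error and 'message' in error:
            raise UpstreamError(error['code'], error['message'])
        # Shape 2: the real completion nested under a `provider` key.
        nested = payload.get('provider')
        if not isinstance(nested, dict):
            raise exc
        try:
            validated = parse_completion(nested)
        except ShapeError:
            # Re-raise the original failure, as the source does.
            raise exc
        # Backfill `created` from the outer payload, defaulting to 0.
        if not validated.get('created'):
            validated['created'] = payload.get('created') or 0
    # A well-formed completion can still carry an embedded error.
    if error := validated.get('error'):
        raise UpstreamError(error['code'], error['message'])
    return validated
```

Re-raising the original exception when no fallback shape matches keeps the error message tied to the standard schema, which is usually the most useful one to debug against.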

__init__

__init__(
    model_name: str,
    *,
    provider: (
        Literal["openrouter"] | Provider[AsyncOpenAI]
    ) = "openrouter",
    profile: ModelProfileSpec | None = None,
    settings: ModelSettings | None = None
)

Initialize an OpenRouter model.

Parameters:

model_name (str, required): The name of the model to use.

provider (Literal['openrouter'] | Provider[AsyncOpenAI], default 'openrouter'): The provider to use for authentication and API access. If not provided, a new provider will be created with the default settings.

profile (ModelProfileSpec | None, default None): The model profile to use. Defaults to a profile picked by the provider based on the model name.

settings (ModelSettings | None, default None): Model-specific settings that will be used as defaults for this model.

Source code in pydantic_ai_slim/pydantic_ai/models/openrouter.py
def __init__(
    self,
    model_name: str,
    *,
    provider: Literal['openrouter'] | Provider[AsyncOpenAI] = 'openrouter',
    profile: ModelProfileSpec | None = None,
    settings: ModelSettings | None = None,
):
    """Initialize an OpenRouter model.

    Args:
        model_name: The name of the model to use.
        provider: The provider to use for authentication and API access. If not provided, a new provider will be created with the default settings.
        profile: The model profile to use. Defaults to a profile picked by the provider based on the model name.
        settings: Model-specific settings that will be used as defaults for this model.
    """
    super().__init__(model_name, provider=provider or OpenRouterProvider(), profile=profile, settings=settings)

supported_native_tools classmethod

supported_native_tools() -> (
    frozenset[type[AbstractNativeTool]]
)

Return the set of builtin tool types this model can handle.

OpenRouter supports web search via its plugins system.

Source code in pydantic_ai_slim/pydantic_ai/models/openrouter.py
@classmethod
@override
def supported_native_tools(cls) -> frozenset[type[AbstractNativeTool]]:
    """Return the set of builtin tool types this model can handle.

    OpenRouter supports web search via its plugins system.
    """
    return frozenset({WebSearchTool})

OpenRouterStreamedResponse dataclass

Bases: OpenAIStreamedResponse

Implementation of StreamedResponse for OpenRouter models.

Source code in pydantic_ai_slim/pydantic_ai/models/openrouter.py
@dataclass
class OpenRouterStreamedResponse(OpenAIStreamedResponse):
    """Implementation of `StreamedResponse` for OpenRouter models."""

    @override
    async def _validate_response(self):
        try:
            async for chunk in self._response:
                yield _OpenRouterChatCompletionChunk.model_validate(chunk.model_dump())
        except APIError as e:
            error = _OpenRouterError.model_validate(e.body)
            raise ModelHTTPError(status_code=error.code, model_name=self._model_name, body=error.message)

    @override
    def _map_thinking_delta(self, choice: chat_completion_chunk.Choice) -> Iterable[ModelResponseStreamEvent]:
        assert isinstance(choice, _OpenRouterChunkChoice)

        if reasoning_details := choice.delta.reasoning_details:
            for i, detail in enumerate(reasoning_details):
                thinking_part = _from_reasoning_detail(detail)
                # Use unique vendor_part_id for each reasoning detail type to prevent
                # different detail types (e.g., reasoning.text, reasoning.encrypted)
                # from being incorrectly merged into a single ThinkingPart.
                # This is required for Gemini 3 Pro which returns multiple reasoning
                # detail types that must be preserved separately for thought_signature handling.
                vendor_id = f'reasoning_detail_{detail.type}_{i}'
                yield from self._parts_manager.handle_thinking_delta(
                    vendor_part_id=vendor_id,
                    id=thinking_part.id,
                    content=thinking_part.content,
                    signature=thinking_part.signature,
                    provider_name=self._provider_name,
                    provider_details=thinking_part.provider_details,
                )
        else:
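            # This method is a generator (it uses `yield from` above), so the parent's
            # events must be yielded; a bare `return` of the iterable would drop them.
            yield from super()._map_thinking_delta(choice)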

    @override
    def _map_provider_details(self, chunk: chat.ChatCompletionChunk) -> dict[str, Any] | None:
        assert isinstance(chunk, _OpenRouterChatCompletionChunk)

        provider_details = super()._map_provider_details(chunk) or {}
        provider_details.update(_map_openrouter_provider_details(chunk))
        return provider_details or None

    @override
    def _map_usage(self, response: chat.ChatCompletionChunk) -> usage.RequestUsage:
        assert isinstance(response, _OpenRouterChatCompletionChunk)
        return _map_openrouter_usage(response, self._provider_name, self._provider_url, self.model_name)

    @override
    def _map_finish_reason(  # type: ignore[reportIncompatibleMethodOverride]
        self, key: Literal['stop', 'length', 'tool_calls', 'content_filter', 'error']
    ) -> FinishReason | None:
        return _CHAT_FINISH_REASON_MAP.get(key)
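
The `vendor_part_id` scheme in `_map_thinking_delta` can be illustrated with a small stand-alone sketch. It assumes that deltas sharing a vendor id are merged into one part while different ids stay separate; `Detail` and `merge_deltas` are hypothetical names, not part of pydantic_ai.

```python
from dataclasses import dataclass


@dataclass
class Detail:
    """Hypothetical stand-in for one OpenRouter reasoning detail."""

    type: str
    content: str


def merge_deltas(chunks: list[list[Detail]]) -> dict[str, str]:
    """Accumulate content per vendor id across streamed chunks."""
    parts: dict[str, str] = {}
    for details in chunks:
        for i, detail in enumerate(details):
            # Same key format as the source: detail type plus position in the chunk.
            vendor_id = f'reasoning_detail_{detail.type}_{i}'
            parts[vendor_id] = parts.get(vendor_id, '') + detail.content
    return parts


chunks = [
    [Detail('reasoning.text', 'Let me '), Detail('reasoning.encrypted', 'abc')],
    [Detail('reasoning.text', 'think.')],
]
print(merge_deltas(chunks))
# {'reasoning_detail_reasoning.text_0': 'Let me think.', 'reasoning_detail_reasoning.encrypted_1': 'abc'}
```

The text deltas from both chunks land in one entry, while the encrypted detail keeps its own entry and is never concatenated with the text.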