---
title: Multi-provider LLM routing — which provider actually served that trace?
description: OpenRouter, AWS Bedrock, and the gateway pattern made multi-provider routing the default. Without span-level provider attribution, your eval baseline is a coin flip. The OTel GenAI semantic conventions are the answer.
author: Akshay Sarode
published: 2025-11-04
updated: 2026-04-22
cluster: c4-ai-agents
tags: [llm, ai-agents, observability, opentelemetry, multi-provider]
reading: 11 min
hero: When your gateway routes between four upstream providers, the only honest answer to "did the model regress?" comes from span-level provider attribution.
---

# Multi-provider LLM routing — which provider actually served that trace?

**TL;DR.** Most teams shipping LLM agents in 2026 route through a gateway: OpenRouter (400+ models, 60+ providers), AWS Bedrock (multi-vendor), or a custom proxy following the [AWS Multi-Provider GenAI Gateway reference architecture](https://aws.amazon.com/blogs/machine-learning/streamline-ai-operations-with-the-multi-provider-generative-ai-gateway-reference-architecture/). The architecture is sensible: failover, cost optimisation, vendor independence. The observability problem is that when your eval baseline regresses, you need to answer "did the model change, did the prompt change, or did the upstream provider get swapped under us?" Without span-level provider attribution, you're guessing. The fix starts with the OpenTelemetry GenAI semantic conventions: every span carries `gen_ai.system`, `gen_ai.request.model`, and `gen_ai.response.model`. For gateway-routed traffic you also need an attribute for the upstream provider that actually served the call, which the spec does not yet define. This post walks through the patterns, the OTel attributes, the gateway choices, and what Sutrace tags by default. The [Agenta top LLM gateways roundup](https://agenta.ai/blog/top-llm-gateways) and [TrueFoundry's multi-model routing post](https://www.truefoundry.com/blog/multi-model-routing) are the best background reads.

## Why teams are routing through gateways

Three reasons, all of them economic.

**1. Failover.** When OpenAI has a P0 incident — every quarter or two, on average — your customer-facing agent goes dark. Routing through a gateway lets you fail over to Anthropic or Bedrock. The cost is one weekend of integration work; the upside is not getting paged at 3am the next time GPT-5 returns 503s for an hour.

**2. Cost optimisation.** Different providers price the same capability differently. Claude 4 Opus is cheaper for some workloads, GPT-5 for others, and a Bedrock-hosted Llama model cheaper still. A gateway with routing rules can pick the cheapest upstream that meets latency and quality thresholds.

**3. Capacity.** Provider rate limits hit harder than they should. Routing through multiple providers gives you elastic capacity across vendor accounts.
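The cost rule in reason 2 can be sketched as a filter followed by a minimum. This is an illustration only: the `Upstream` type, the provider names, and every price, latency, and accuracy figure are hypothetical, and real gateways express such rules in their own configuration formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Upstream:
    name: str
    usd_per_1k_tokens: float  # hypothetical blended price
    p95_latency_ms: float     # measured by you, not quoted by the vendor
    eval_accuracy: float      # from your own eval suite


def pick_upstream(candidates, max_latency_ms, min_accuracy) -> Optional[Upstream]:
    """Return the cheapest candidate that meets both thresholds, or None."""
    eligible = [
        u for u in candidates
        if u.p95_latency_ms <= max_latency_ms and u.eval_accuracy >= min_accuracy
    ]
    return min(eligible, key=lambda u: u.usd_per_1k_tokens, default=None)


# Made-up numbers, purely for illustration.
CANDIDATES = [
    Upstream("provider-a", 0.015, 900.0, 0.94),
    Upstream("provider-b", 0.010, 1400.0, 0.93),
    Upstream("provider-c", 0.004, 700.0, 0.86),
]
```

Note that the cheapest candidate overall (`provider-c` here) is excluded whenever the quality threshold is above its measured accuracy, which is exactly the kind of silent swap that needs to be visible in traces.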

The most popular gateway choices, in rough order of adoption:

- **OpenRouter.** 400+ models from 60+ providers, bring-your-own-key support, and a single OpenAI-compatible endpoint. The hobbyist favourite that's quietly become a production default.
- **AWS Bedrock.** Multi-vendor (Anthropic, Meta, Cohere, Mistral, Amazon's own Nova) inside a single AWS account. The enterprise pick.
- **Custom proxy.** Typically built on LiteLLM (the open-source proxy maintained by BerriAI), deployed in-cluster, with org-specific routing rules.
- **Vendor-specific gateways.** Azure OpenAI's deployment routing, Google Vertex AI's Model Garden routing.

[TrueFoundry's multi-model routing post](https://www.truefoundry.com/blog/multi-model-routing) is the cleanest external read on the patterns. [Agenta's gateway roundup](https://agenta.ai/blog/top-llm-gateways) covers the vendor space.

## The observability problem this creates

Your code says `model="gpt-4o"`. The gateway sees the request, applies its routing rules, and sends it upstream to one of: OpenAI direct, Azure OpenAI, a Together-hosted variant, or a fallback to Anthropic Claude with a translation layer. The response comes back. Your code logs "called gpt-4o, got response, here's the latency."

Three weeks later, your eval suite shows a regression. Accuracy on a held-out test set dropped 3%. You ask: did the prompt change (no, git says clean), did the model change (no, your code still says `model="gpt-4o"`), or did the upstream change? You don't know. Your traces don't say.

The gateway might have recorded which upstream it routed to, but that record is in the gateway's logs, not in your spans. Joining them post-hoc is hard. With OpenRouter it is harder still: there are no gateway-side traces to join against, so the upstream has to be captured client-side, from the response, at the moment of the call.

This is the gap.

## The OTel GenAI semantic conventions

The [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) are the answer. They define a standard set of attributes that every LLM-related span should carry. The relevant ones for routing:

| Attribute | Meaning |
|---|---|
| `gen_ai.system` | The GenAI system your client talked to, as the instrumentation identifies it (`openai`, `anthropic`, `aws.bedrock`, `az.ai.openai`, or a custom value such as `openrouter`). Behind a gateway this is the gateway, not necessarily the upstream. Newer releases of the spec deprecate this attribute in favour of `gen_ai.provider.name`. |
| `gen_ai.request.model` | The model your code asked for |
| `gen_ai.response.model` | The model that actually generated the response (these diverge more than you'd expect) |
| `gen_ai.usage.input_tokens` | Input tokens consumed |
| `gen_ai.usage.output_tokens` | Output tokens generated |
| `gen_ai.operation.name` | `chat`, `embeddings`, `text_completion`, etc. |

Sutrace also emits two attributes that aren't in the upstream spec yet but follow the same semantic-convention shape:

| Attribute | Meaning |
|---|---|
| `gen_ai.route.gateway` | The gateway your code talked to (`openrouter`, `bedrock`, `litellm-proxy`) |
| `gen_ai.route.upstream` | The actual upstream provider behind the gateway, when known |

For a request that goes `app → OpenRouter → Anthropic Claude`, the span carries:

```
gen_ai.system = "openrouter"
gen_ai.request.model = "anthropic/claude-4-opus"
gen_ai.response.model = "claude-4-opus-20260201"
gen_ai.route.gateway = "openrouter"
gen_ai.route.upstream = "anthropic"
```

When the eval regresses, you can query "show me all traces where `gen_ai.route.upstream` changed in the last 30 days." If the answer is "OpenRouter started routing 40% of traffic to a Together-hosted variant of Claude," you have your answer in one query.
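The "did the upstream mix change?" query can be approximated offline over exported spans. The sketch below assumes each span has already been reduced to a dict with a `day` key (an ISO date string) and an `attributes` dict; that shape, the function names, and the 10-point threshold are assumptions for illustration, not a Sutrace or OTel API.

```python
from collections import Counter, defaultdict


def upstream_mix_by_day(spans):
    """Map each day to {upstream: share of that day's spans}."""
    counts = defaultdict(Counter)
    for span in spans:
        upstream = span["attributes"].get("gen_ai.route.upstream", "unknown")
        counts[span["day"]][upstream] += 1
    return {
        day: {u: n / sum(c.values()) for u, n in c.items()}
        for day, c in counts.items()
    }


def days_with_mix_shift(mix_by_day, threshold=0.10):
    """Days where any upstream's share moved more than `threshold` vs the previous day."""
    shifted = []
    days = sorted(mix_by_day)  # ISO date strings sort chronologically
    for prev, cur in zip(days, days[1:]):
        upstreams = set(mix_by_day[prev]) | set(mix_by_day[cur])
        if any(
            abs(mix_by_day[cur].get(u, 0.0) - mix_by_day[prev].get(u, 0.0)) > threshold
            for u in upstreams
        ):
            shifted.append(cur)
    return shifted
```

Comparing each day only with the previous one keeps the sketch short; a slow drift spread over many days would need a comparison against a fixed baseline window instead.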

## How to instrument

### Path 1: OTel auto-instrumentation

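If you use the OpenAI, Anthropic, or Bedrock SDKs in Python or TypeScript, auto-instrumentation packages can emit LLM spans without code changes. Check which attribute names a package writes before relying on it: the OpenInference packages (for example `openinference-instrumentation-openai`) use their own `llm.*` names such as `llm.model_name`, so they only line up with `gen_ai.*` if your backend maps them, while the OpenTelemetry contrib package `opentelemetry-instrumentation-openai-v2` emits `gen_ai.*` directly. Either way, point your OTel collector at Sutrace and the spans appear.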

### Path 2: Manual instrumentation

If you're building your own client wrapper, set the attributes yourself when you create the span. The shape is straightforward — a span around the LLM call, attributes set from the request and response.
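A minimal sketch of the attribute side of a manual wrapper, kept free of SDK imports so it runs anywhere. The two helper names are hypothetical; the attribute keys are the ones listed in the tables above, with `gen_ai.route.*` being the Sutrace extensions rather than part of the OTel spec.

```python
def genai_request_attributes(system, request_model, operation="chat"):
    """Attributes known before the call is made."""
    return {
        "gen_ai.system": system,
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": request_model,
    }


def genai_response_attributes(response_model, input_tokens, output_tokens,
                              gateway=None, upstream=None):
    """Attributes known only once the response is back."""
    attrs = {
        "gen_ai.response.model": response_model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    # Span attribute values cannot be None, so unknown route fields are omitted.
    if gateway is not None:
        attrs["gen_ai.route.gateway"] = gateway
    if upstream is not None:
        attrs["gen_ai.route.upstream"] = upstream
    return attrs
```

With the OpenTelemetry Python API, the first dict can be passed as `attributes=` to `tracer.start_as_current_span(...)` and the second applied with `span.set_attributes(...)` after the provider call returns. Splitting request and response attributes means a span for a failed call still records what was asked for.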

### Path 3: Gateway-side instrumentation

Some gateways emit OTel spans directly. LiteLLM has experimental OTel support. OpenRouter does not (as of April 2026). For OpenRouter, you have to instrument client-side and infer the upstream from response headers (which OpenRouter does include — check `openrouter-served-by`).
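Client-side inference from headers might look like the sketch below. It takes the `openrouter-served-by` header name from the paragraph above at face value; verify it against a live response before depending on it. The function name and the normalisation (trim, lowercase) are assumptions.

```python
def route_attributes_from_headers(headers, gateway="openrouter"):
    """Build gen_ai.route.* attributes from a response-header mapping."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    attrs = {"gen_ai.route.gateway": gateway}
    served_by = lowered.get("openrouter-served-by")
    if served_by:
        attrs["gen_ai.route.upstream"] = served_by.strip().lower()
    # If the header is absent, leave the upstream unset rather than guessing.
    return attrs
```

Leaving `gen_ai.route.upstream` unset when the header is missing keeps "unknown" distinguishable from a real provider name in later queries.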

For Bedrock, the model ID you pass to the Boto3 client starts with the vendor name, so the upstream provider is straightforward to derive from it.
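A rough sketch of that derivation follows. The set of geography prefixes used by cross-region inference profile IDs is an assumption and may be incomplete, and `bedrock_upstream` is a hypothetical helper, not a Boto3 function.

```python
# Assumed geography prefixes on cross-region inference profile IDs.
GEO_PREFIXES = {"us", "eu", "apac", "global"}


def bedrock_upstream(model_id):
    """Return the vendor segment of a Bedrock model ID, profile ID, or ARN."""
    # ARNs end in ".../<id>", so keep only the last path segment.
    tail = model_id.rsplit("/", 1)[-1]
    parts = tail.split(".")
    # Inference profile IDs look like "<geo>.<vendor>.<model>".
    if len(parts) > 2 and parts[0] in GEO_PREFIXES:
        parts = parts[1:]
    # Anything without a "<vendor>." prefix is treated as unknown.
    return parts[0] if len(parts) > 1 else None
```

Returning `None` for IDs that do not match the expected shape is deliberate: a wrong vendor tag is harder to spot later than a missing one.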

## What Sutrace does by default

Three things, automatically.

**1. Tag every span with `gen_ai.system` and route attributes.** If you use the Sutrace SDK middleware, this happens at SDK initialisation. If you use OTel auto-instrumentation, our collector enriches incoming spans with the route attributes when it can derive them from response metadata.

**2. Build the routing topology view.** A dashboard that shows the gateway-and-upstream graph for your traffic over the last N hours. You can see at a glance "70% of GPT-4o requests went to OpenAI direct, 30% to Azure OpenAI." Drift here is a leading indicator.

**3. Eval regression cross-tab.** When your eval suite runs against a baseline, the regression report cross-tabs by `gen_ai.route.upstream`. If accuracy regressed and the upstream mix changed in the same window, that's flagged.
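The cross-tab in point 3 is conceptually a group-by. The sketch below is not Sutrace's implementation; it assumes each eval result is a dict carrying a boolean `correct` field alongside the span's `gen_ai.route.upstream` value, which is a hypothetical record shape.

```python
from collections import defaultdict


def accuracy_by_upstream(results):
    """Return {upstream: fraction of eval cases answered correctly}."""
    totals = defaultdict(lambda: [0, 0])  # upstream -> [correct, total]
    for result in results:
        bucket = totals[result.get("gen_ai.route.upstream", "unknown")]
        bucket[0] += 1 if result["correct"] else 0
        bucket[1] += 1
    return {u: correct / total for u, (correct, total) in totals.items()}
```

A gap between upstreams is a lead, not proof: with small per-upstream sample sizes the difference can be noise, so it is worth checking the counts behind each fraction before acting on it.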

> [!NOTE]
> Diagram: Provider routing fan-out. Single app → gateway (OpenRouter or Bedrock) → 4 possible upstream providers. Sutrace span tags identify which one served each request, and the eval report cross-tabs accuracy by upstream.

## A worked example

Suppose you ship a customer-support agent using OpenRouter, with `anthropic/claude-4-sonnet` as the model. Eval baseline runs nightly.

**Day 1–14.** Eval at 0.94 accuracy. `gen_ai.route.upstream` is 100% `anthropic`.

**Day 15.** Anthropic has a regional incident. OpenRouter's routing rules fail over 60% of traffic to a Together-hosted Claude variant for 4 hours. Eval that night runs at 0.88 — a 6-point drop.

Without route attribution: you spend a day suspecting the prompt, the retrieval, the model, the eval set. Six engineers, eight hours.

With route attribution: the eval report cross-tab shows the regression cluster matches the time window where `gen_ai.route.upstream = "together"`. The Sutrace SDK emits this without you having to do anything. You confirm in 5 minutes, lock OpenRouter to Anthropic-only via routing rules, and move on.

## Pitfalls to avoid

**1. Don't treat the gateway's billing dashboard as a record of what served you.** Gateways typically attribute cost to the upstream they routed to, but the *model name* in their dashboard is the one your code asked for, not the one the upstream actually served. The two diverge for compatibility-shimmed routes.

**2. Don't assume `model="gpt-4o"` means one fixed model.** Even at OpenAI direct, `gpt-4o` is an alias that resolves to a dated snapshot (for example `gpt-4o-2024-08-06`), and the snapshot behind the alias changes over time. The resolved version comes back in `gen_ai.response.model`. Track both.
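Tracking both makes alias drift easy to detect. A small sketch, assuming spans have been flattened to attribute dicts; the function names are hypothetical.

```python
from collections import defaultdict


def response_models_by_request(spans):
    """Map each requested model name to the set of response models observed."""
    seen = defaultdict(set)
    for attrs in spans:
        requested = attrs.get("gen_ai.request.model")
        served = attrs.get("gen_ai.response.model")
        if requested and served:
            seen[requested].add(served)
    return dict(seen)


def aliases_with_drift(spans):
    """Requested names that resolved to more than one response model."""
    by_request = response_models_by_request(spans)
    return sorted(req for req, served in by_request.items() if len(served) > 1)
```

Run over an eval window, any name returned by `aliases_with_drift` means the results in that window were produced by more than one underlying model.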

**3. Don't forget Bedrock.** Bedrock model IDs include both vendor and capability — `anthropic.claude-4-sonnet-20260201-v1:0` is unambiguous. But teams often record only the friendly name. Tag both.

**4. Don't assume rate limits are per-provider.** When you route through a gateway, rate limits apply at the gateway level too. Hitting OpenRouter's per-account rate limit is just as production-impacting as hitting OpenAI's.

**5. Don't lose visibility through Bedrock cross-region inference.** Bedrock's "cross-region inference" feature transparently routes between regional model endpoints for capacity. Your span needs to record which region actually served the call — `aws.region` as a span attribute. We've seen teams chase a latency regression for a week before discovering Bedrock had silently moved 30% of their traffic from `us-east-1` to `us-west-2` and the additional propagation delay was the cause.

**6. Don't ignore the routing-rule audit trail.** Most gateways let you change routing rules via dashboard or API. Those changes need to be tracked. Without an audit trail, "the eval started regressing on Tuesday" is a mystery; with one, "the eval regressed at 14:00 Tuesday because the routing rule changed at 13:55" is the answer.
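Once rule changes are recorded with timestamps, lining them up against a regression is a simple lookup. The sketch assumes each audit entry is a dict with an `at` datetime; the record shape, the function name, and the one-hour default window are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def rule_changes_before(regression_at, rule_changes, window=timedelta(hours=1)):
    """Rule changes that landed within `window` before the regression, newest first."""
    hits = [
        change for change in rule_changes
        if timedelta(0) <= regression_at - change["at"] <= window
    ]
    return sorted(hits, key=lambda change: change["at"], reverse=True)
```

Using the numbers from the paragraph above, a change at 13:55 falls inside the default window for a regression first seen at 14:00, while a change from the previous day does not.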

## A short tour of the major gateways' span quality

If you're picking a gateway today and observability matters:

- **AWS Bedrock.** Native CloudWatch + X-Ray support. The Boto3 client emits enough information to derive the upstream cleanly. Cross-region inference is the gotcha (see above). Bedrock model IDs are the most semantically clean of any vendor.
- **OpenRouter.** Response headers (`openrouter-served-by`, `openrouter-model`) are the canonical attribution source. There's no native OTel emission, so client-side instrumentation has to read the headers and tag the spans; confirm that the instrumentation package you use actually does this.
- **LiteLLM proxy.** Experimental OTel emission. Configurable. The most flexible if you self-host.
- **Azure OpenAI deployment routing.** Each deployment is a logical alias for a specific model version in a specific region, so the deployment name is an unambiguous record. Treat the deployment name as part of `gen_ai.request.model`.
- **Custom proxies.** You write the OTel emission. Make sure to set both `gen_ai.route.gateway` and `gen_ai.route.upstream`.

## Tools and references

- [OTel GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) — the spec
- [AWS Multi-Provider GenAI Gateway reference architecture](https://aws.amazon.com/blogs/machine-learning/streamline-ai-operations-with-the-multi-provider-generative-ai-gateway-reference-architecture/) — the enterprise blueprint
- [TrueFoundry multi-model routing](https://www.truefoundry.com/blog/multi-model-routing)
- [Agenta top LLM gateways](https://agenta.ai/blog/top-llm-gateways)
- [OpenInference instrumentation packages](https://github.com/Arize-ai/openinference) — for SDK auto-instrumentation

## How this fits with the rest of the stack

Multi-provider routing visibility is one of three things [Sutrace ships that the rest of the LLM-observability category doesn't](/blog/helicone-langsmith-langfuse-phoenix-honest-comparison) by default — alongside [hard budget caps](/blog/hard-budget-caps-for-ai-agents-the-architecture-options) and [prompt-injection signals](/blog/echoleak-camoleak-prompt-injection-shipping-this-year). For the full picture see the [AI agent observability use case](/use-cases/ai-agent-observability), the [LangSmith comparison](/alternatives/langsmith), or the [Helicone comparison](/alternatives/helicone) for the proxy-vs-SDK trade-offs.

If you're already routing through a gateway and your traces don't tell you which upstream served each call — fix that this sprint. The eval-regression mystery you'll save yourself from is worth the integration work alone.
