OpenTelemetry won the protocol war. Now it needs a backend.
OTel adoption is universal — 40% YoY PR growth, 21M monthly Python SDK downloads. The backend war is fragmented. A field guide to who's OTel-native, who's bolted on, and where Sutrace fits.
TL;DR. OpenTelemetry is no longer up for debate. Grafana's 2025 OpenTelemetry report puts OTel-related GitHub PR growth at 40% year-over-year and the Python SDK alone at 21M downloads/month — a 445% YoY jump. Every serious vendor accepts OTLP now. The unsolved problem is the backend: where does the data go, what does it cost, and does the storage shape match what the SDKs emit? This post is a field guide to the OTel-backend landscape — Datadog and New Relic accepting OTel as a second-class input, SigNoz/ClickStack/Uptrace fighting for OTel-native developer love, Grafana shipping the LGTM stack, and a long tail of niche players. We'll be honest about who fits which use case, and we'll be honest about where Sutrace sits.
If you've already chosen OTel and you're now choosing a backend, this is the post.
The protocol war is over
Three data points settle it.
First, Grafana's OpenTelemetry report — the most comprehensive OTel adoption survey of 2025. Forty per cent YoY growth in OTel-related GitHub PRs. The Python SDK reaching 21 million monthly downloads, up 445% YoY. The Java SDK in the same neighbourhood. Java, Python, Node, and Go now all sit comfortably in the production-ready bucket per the OTel spec status pages.
Second, Dynatrace's 2025 OTel trends piece. Their analysis confirms what most platform teams now feel intuitively: traces and metrics are stable across the OTel SDKs, and logs are at the inflection point traces hit two years ago. If you're starting a greenfield project in 2026 and not using OTel, you're starting it wrong.
Third, ClickHouse's 2026 observability year-in-review. Mike Shi's framing: OTel won as the protocol, ClickHouse won as the storage layer, and the open question is which presentation layer (UI + query) becomes default. That's the backend war.
What "the backend war" actually means
When OTel adoption was the question, the conversation was about agents and SDKs. Now the conversation is about three things that each cost real money:
1. Storage. Where do traces, metrics, and logs land? In what schema? At what cost per GB? With what retention policy?
2. Query. How fast does p99 trace search return? Can you slice metrics by arbitrary labels? Can you full-text search logs across a quarter without your laptop fan spinning?
3. UX. Can you configure alerts without a PhD? Can a developer who doesn't know PromQL write a useful query? Does the dashboards-as-code story actually work?
Vendors win different sub-battles in each of these. Below is the honest map.
Category 1: Hyperscale incumbents
Datadog, New Relic, Dynatrace, Splunk Observability.
These backends accept OTLP. They've been forced to — every enterprise procurement deck since 2024 has had "supports OpenTelemetry" as a hard requirement. But their internal storage was built before OTel, so OTLP gets translated into a proprietary tag namespace at ingest. Your service.name becomes their service. Your deployment.environment becomes their env. Your k8s.pod.name becomes their pod_name.
This isn't a deal-breaker — the tags still work. But it means a query you wrote last year against OTel semantic conventions won't survive the translation, and if you ever leave, your saved queries don't come with you.
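To make the translation concrete, here's a minimal sketch of the kind of key remapping a pre-OTel backend applies at ingest. The mapping uses the three examples from this post; real vendors maintain far larger tables, and the function name is ours, not any vendor's API:

```python
# Illustrative only: remap OTel semantic-convention keys into a
# hypothetical proprietary tag namespace at ingest time.
OTEL_TO_VENDOR = {
    "service.name": "service",
    "deployment.environment": "env",
    "k8s.pod.name": "pod_name",
}

def translate_attributes(resource_attrs: dict) -> dict:
    """Rename known OTel keys; pass unknown keys through unchanged."""
    return {OTEL_TO_VENDOR.get(k, k): v for k, v in resource_attrs.items()}

attrs = {"service.name": "checkout", "deployment.environment": "prod"}
print(translate_attributes(attrs))  # {'service': 'checkout', 'env': 'prod'}
```

The pass-through branch is why "the tags still work" — but any saved query written against the left-hand column breaks the moment you point it at a backend that kept the right-hand names.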
The pricing problem is the bigger story. The HN thread on the $83K Datadog renewal and the Pragmatic Engineer's $65M Coinbase teardown describe what happens at scale. SigNoz's pricing teardown walks through the SKUs. Uptrace's Datadog alternatives roundup covers the same ground from a competitive lens.
When this category wins: you have an enterprise-scale budget and you value the integrations catalog and APM auto-instrumentation breadth more than predictability. The HN "what instead of Datadog" thread is full of teams arguing both sides honestly.
Category 2: OTel-native startups
SigNoz, Uptrace, ClickStack, HyperDX, Sutrace.
These backends were built around OTel. ClickHouse is the storage layer for most of them — see ClickHouse's roundup of OTel-compatible platforms for why ClickHouse won the storage sub-battle. Resource attributes survive ingest. The schema is the official OTel ClickHouse exporter schema. Migration in and out is symmetric.
Within this category there are real differences:
SigNoz. Strongest open-source option. The HN comment from srcreigh is honest about the operational cost: "Signoz is pretty sloppily built. For ex the self hosted option starts a ZK instance with 1 clickhouse host — no way to disable, 800MB ram." The product team has been responsive on these in their GitHub issues, but the OSS-first posture means self-host is the default path and managed is a smaller team's focus. If you want full self-host with a real OSS license, this is the answer. We'll be honest about the comparison in Sutrace vs SigNoz.
Uptrace. Smaller team, polished product, similar technical posture to SigNoz. Open source available. Strong on traces specifically. Their comparisons page is one of the more honest in the category.
ClickStack. ClickHouse's own managed observability product. Pricing analysis in SigNoz's ClickStack pricing piece. The pitch is "use the database the rest of us use, with first-party support." Compelling if you already have ClickHouse expertise in-house.
HyperDX. Logs and traces focus. Strong on the developer-first UI angle. The HN thread comparing them with SigNoz is worth reading.
Sutrace. Where we fit: managed, EU residency by default, with the cardinality cost attribution layer the others haven't shipped yet, and with hardware (PLC/SCADA) and AI-agent (LLM cost/latency) signals as native primitives. We are NOT open-source-first; if open-source is a hard requirement, SigNoz or Uptrace are the right answer and we'll say so on a sales call.
When this category wins: you've standardised on OTel, you don't want a translation layer, and you're either cost-sensitive or residency-sensitive (or both). PostHog's roundup of Datadog alternatives and ClickHouse's similar list are the cleanest external comparisons.
Category 3: Composable open-source stacks
Grafana Mimir / Tempo / Loki, plus Prometheus.
The LGTM stack (Loki, Grafana, Tempo, Mimir) is the composable answer. Each component is best-in-class for its data type. You stitch them together with Grafana as the UI. Grafana Cloud is the managed version.
Strengths: open-source, vendor-neutral, infinitely tunable, the community is enormous. Weaknesses: operational complexity. Cloudflare's Prometheus-at-scale post describes the kind of dedicated platform team you need to run this stack at scale. Last9 and Sysdig cover the failure modes in detail. Chronosphere's article on Prometheus scaling challenges has the line we keep coming back to: "Each time a user runs a query, they must first remember which instance to query their data from." That cognitive-load tax is real.
Grafana Cloud's managed offering removes the operational burden but introduces DPM-rate pricing surprises that a lot of teams don't model correctly. Our Grafana Cloud comparison covers this honestly.
When this category wins: you have an SRE team that wants control, you have OSS purity as a hard requirement, or you've already operationalised one of the components and want to extend it.
Category 4: Niche and adjacent
Better Stack, Honeycomb, Lightstep (now ServiceNow Cloud Observability), Last9, Coralogix, New Relic Free.
A long tail of vendors with specific wedges. Better Stack leans hard on UX and the "30x cheaper than Datadog" pitch — see our Better Stack comparison. Honeycomb is the gold standard for high-cardinality trace exploration but priced for it. Lightstep had distributed tracing first; the ServiceNow acquisition has changed the trajectory. Last9 has strong scaling content and a focused product.
When this category wins: you have a specific shape — small team for Better Stack, deep distributed-tracing needs for Honeycomb, and so on. The PagerDuty alert-fatigue research and the Catchpoint SRE report via OneUptime are independent of vendor choice but tell you which adjacencies (incident response, on-call) you'll need to choose alongside the backend.
The honest decision tree
If you're picking a backend in 2026, the questions in order are:
1. Is OSS purity a hard requirement?
Yes → SigNoz or Grafana LGTM. Done.
2. Are you on a Datadog-class budget and value integration breadth?
Yes → Stay on Datadog or move to New Relic. The HN cheaper-Datadog thread has been arguing this for two years; the verdict is "if your budget is fine, the product is fine."
3. Is EU residency mandatory by regulation or by company policy?
Yes → narrow the field to vendors that ship EU-default, not EU-as-a-paid-SKU. Sutrace fits here.
4. Is cardinality the cost driver killing your bill?
Yes → cost attribution before ingest is the wedge. We talked about this in cardinality cost attribution before the bill arrives.
5. Do you have hardware signals (PLC/SCADA) or AI agents (LLM cost/latency) you want unified with software signals?
Yes → that's specifically the Sutrace bet.
6. None of the above and you just want a managed OTel-native backend?
SigNoz Cloud, Uptrace Cloud, ClickStack, or Sutrace. Run a two-week parallel evaluation against each. The free tier on /pricing is enough.
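The decision tree above can be sketched as a toy function — the questions are evaluated in priority order, and the return strings are this post's recommendations, nothing more:

```python
def pick_backend(*, oss_required=False, datadog_budget=False,
                 eu_residency=False, cardinality_pain=False,
                 hardware_or_ai_signals=False) -> str:
    """Toy encoding of the 2026 backend decision tree, in question order."""
    if oss_required:
        return "SigNoz or Grafana LGTM"
    if datadog_budget:
        return "Datadog or New Relic"
    if eu_residency:
        return "EU-default vendors (Sutrace fits here)"
    if cardinality_pain:
        return "cost-attribution-before-ingest vendors"
    if hardware_or_ai_signals:
        return "Sutrace"
    return "parallel-evaluate SigNoz Cloud, Uptrace Cloud, ClickStack, Sutrace"

print(pick_backend(oss_required=True))   # question 1 short-circuits the rest
print(pick_backend(eu_residency=True))
```

The short-circuit ordering is the point: OSS purity and budget posture settle the question before residency or cardinality ever come up.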
Where Sutrace fits
We are an OTel-native managed backend with three differentiators that the rest of category 2 doesn't ship today:
- Cardinality cost attribution before ingest. Every metric's page shows live series count, projected monthly delta when a new label is added, and a one-click revert at the collector. No separate cost dashboard. Architecture detail here.
- EU residency as default, not a SKU. Frankfurt (europe-west3). Backups in-region. Support staff access logged and region-restricted.
- Hardware + AI agent signals as native primitives. PLC/SCADA via OPC-UA receivers, LLM cost/latency via OTel auto-instrumentation for the major framework SDKs. Same dashboard, same query layer.
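The cardinality projection behind the first differentiator is back-of-envelope arithmetic, and it's worth internalising even if you never use our product. The upper bound on active series is the product of each label's distinct-value count (assuming labels vary independently — real data is usually sparser). The label names and counts below are hypothetical:

```python
from math import prod

def series_count(label_cardinalities: dict) -> int:
    """Upper-bound active series for one metric: product of distinct
    values per label. Real series counts are usually lower, since not
    every label combination occurs."""
    return prod(label_cardinalities.values()) if label_cardinalities else 1

def added_label_delta(current: dict, new_label: str, distinct_values: int) -> int:
    """Projected extra series if a new label with N distinct values lands."""
    return series_count({**current, new_label: distinct_values}) - series_count(current)

labels = {"service": 40, "region": 3, "status_code": 5}  # hypothetical metric
print(series_count(labels))                   # 600
print(added_label_delta(labels, "pod", 200))  # 119400
```

Multiplying a 600-series metric by a 200-value pod label yields 120,000 series — a 200x jump from one innocuous-looking collector change. That multiplication, surfaced before ingest, is the whole wedge.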
We are NOT the right answer if: you need full self-host (use SigNoz), you're a 50-engineer team with $1M+/year already on Datadog and feature breadth matters more than predictability (stay on Datadog), or you've operationalised the LGTM stack and have a dedicated SRE team that loves running it (don't migrate, you'll regret the loss of control).
What to do this week
If you're picking an OTel backend, three concrete actions:
- Read the free OTel report from Grafana. It's the cleanest data on adoption.
- Run a parallel-write evaluation. Configure your OTel Collector to fan out OTLP to two backends for two weeks. Validate dashboards, validate alerts, decide on data not slides.
- Build the cost model with finance, not with engineering. Cardinality cost is the variable that breaks budgets. The architecture for budget enforcement is the part most evaluations skip.
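For the parallel-write evaluation, the fan-out lives in the OTel Collector: a pipeline can list multiple exporters, and every signal is sent to all of them. A minimal sketch, with placeholder endpoints — batching, auth, and TLS settings are omitted and will be needed in practice:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp/incumbent:
    endpoint: incumbent.example.com:4317
  otlp/candidate:
    endpoint: candidate.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/incumbent, otlp/candidate]
    metrics:
      receivers: [otlp]
      exporters: [otlp/incumbent, otlp/candidate]
    logs:
      receivers: [otlp]
      exporters: [otlp/incumbent, otlp/candidate]
```

Because the fan-out happens in the collector, application code doesn't change during the evaluation, and tearing down the losing backend is a one-line config revert.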
Closing
OTel won. The conversation moved upstream from "which agent" to "which backend, and at what cost, and where does the data live?" The honest answer changes by team, and we wrote this post to help you pick — even if the right answer is not us. If your shape matches ours (EU residency, cardinality control, mixed signal types), we'd love to run an evaluation. If it doesn't, the comparisons page names who fits instead.
The protocol war is done. The backend war is the one that matters now.