Sutrace vs Datadog: predictable observability, no cardinality bill

TL;DR. If your Datadog renewal grew faster than your headcount, the cause is almost always one of three things: custom-metric cardinality, log volume tiers, or synthetic check fan-out. Sutrace is built so none of those line items can quietly compound. We bill on ingest you can predict — not on the number of unique tag combinations a service emits — and we surface cost attribution before data hits storage. EU residency by default (Frankfurt, europe-west3). No per-tag pricing. One dashboard for hardware (PLC/SCADA), software (OTel/Prometheus), web/APIs (synthetics), and AI agents (LLM cost/latency). If you're paying Datadog more than $30k/year and the bill is climbing without a clear reason, you should at least know what the alternative looks like.

This page is the long version. It covers what Datadog still does better, where Sutrace pulls ahead, the migration playbook, and a cost-estimator-as-prose that you can run through with your finance team.

What Datadog gets right (and we won't pretend otherwise)

Datadog is the most polished commercial observability product on the market. The integrations catalog is enormous, the UI is fast, the APM auto-instrumentation works on more languages than anyone else's, and the dashboards-as-code story via Terraform is mature. If you have an unlimited budget and a team that already speaks "DDQL," you are not the audience for this page.

The thread you should read before deciding anything is the HN $83K renewal post — not because Datadog is bad, but because the comments are the most honest account of how an observability bill compounds in production. The Pragmatic Engineer's $65M Coinbase teardown is a more extreme version of the same dynamic. If your team is small, your data plane is European, and predictability matters more than feature breadth, keep reading.

The three line items that break Datadog bills

1. Custom metric cardinality

Datadog defines a "custom metric" as a unique combination of metric name and tag values. A counter named http.requests with tags for route, method, and status_code produces route × method × status unique series. Add customer_id and you have multiplied your bill by the number of customers. SigNoz's pricing teardown walks through exactly how this lands in the invoice.
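The multiplication is easy to underestimate, so here is the arithmetic with hypothetical tag cardinalities (the counts below are illustrative, not from any real invoice):

```python
# Hypothetical tag cardinalities for a counter named http.requests.
tag_cardinality = {"route": 40, "method": 4, "status_code": 8}

series = 1
for n in tag_cardinality.values():
    series *= n
print(series)          # 1,280 unique series from a single metric name

# Add a customer_id tag for 5,000 customers and every series splits again:
print(series * 5_000)  # 6,400,000 billable "custom metrics"
```

Three modest tags already produce over a thousand series; one identity-shaped tag turns that into millions.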

The architectural problem isn't the price point. It's that the pricing model turns label sprawl into surprise charges instead of warnings: an engineer adds a useful tag, the bill jumps 30 days later, and finance asks why.

2. Log volume

Logs are billed on ingest and retention tier. Indexed logs are dramatically more expensive than archived logs, and the tooling makes it easy to over-index. OneUptime's breakdown of how Datadog pricing actually works covers the multipliers.

3. Synthetics

Browser checks at $12 per 1,000, API checks at $5 per 10,000. Sounds cheap. Then Checkly does the math: 16 routes × 4 regions × every 4 minutes = $8,509/month. We covered this in detail in the synthetics calculator post.
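A back-of-envelope version of that math, assuming a flat 30-day month (Checkly's run-count assumptions differ slightly, which is why this lands a couple hundred dollars under their $8,509 figure):

```python
# 16 routes x 4 regions, a browser check every 4 minutes,
# at Datadog's list price of $12 per 1,000 browser runs.
routes, regions, interval_min = 16, 4, 4
runs_per_month = routes * regions * (60 // interval_min) * 24 * 30
cost = runs_per_month * 12 / 1_000
print(runs_per_month, round(cost))  # 691,200 runs, roughly $8,300/month
```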

Side-by-side comparison

| Dimension | Datadog | Sutrace |
| --- | --- | --- |
| Pricing on metrics | Per-host + per-custom-metric (cardinality-driven) | Flat ingest tier; cardinality tracked, not billed |
| Logs | Per-GB ingest × retention tier multiplier | Per-GB ingest, single tier, hot-by-default 30 days |
| Synthetic checks | $12 / 1k browser, $5 / 10k API | Included in plan, region multiplier capped |
| EU data residency | Available, requires EU site selection | Default, europe-west3 (Frankfurt) |
| OpenTelemetry | Supported, often re-tagged at ingest | Native; OTel is the primary protocol |
| Industrial signals (PLC/SCADA) | Not supported | Native |
| AI agent observability | Add-on (LLM Observability) | Native, included |
| Per-tag pricing | Yes (custom metrics) | No |
| Dashboards | DDQL + UI | PromQL + UI + saved views |
| Onboarding | Sales-led above 50 hosts | Self-serve up to 500 hosts |
| Vendor lock-in | DDQL, proprietary agent | OTel-native, exit by re-pointing collector |

When Datadog still wins

Be honest with yourself. Pick Datadog if:

  • You need APM for COBOL, Erlang, or any niche runtime where Datadog's agent has coverage and OTel does not.
  • Your security team has standardized on Datadog Cloud SIEM and rebuilding those rules is more expensive than the metrics bill.
  • You're a US-only shop and EU residency is irrelevant to your buyers.
  • You have an enterprise contract under $0.05 per custom metric and you've audited that you're inside it.
  • You use Watchdog and your on-call has integrated its anomaly detections into runbooks.

Sutrace will not match Datadog on integration breadth in 2026. We have ~80 first-party integrations and rely on OTel for the long tail. Datadog has 800+. If you need turnkey on day one and money is not the constraint, that's a real reason.

When Sutrace wins

  • Your bill grew >40% YoY without a proportional infrastructure change.
  • You're EU-headquartered and your DPO has flagged US data residency.
  • You instrument with OpenTelemetry and resent that Datadog re-tags your spans.
  • You run industrial hardware alongside software services. Nobody else unifies those.
  • You operate AI agents and want token cost, latency p95, and prompt-injection signals in the same place as your API metrics.
  • You want cost attribution before data hits storage, not in a monthly PDF.

Migration playbook

This is the abbreviated version. The full week-one checklist is in migrating from Datadog to OTel.

Day 1 — Inventory. Export your current Datadog metric list (datadog-cli metric list) and tag every metric with: emitter service, label cardinality estimate, business owner. About 30% will turn out to be unused.

Day 2 — Stand up an OTel collector. Run it as a sidecar or DaemonSet. Configure two exporters: your existing Datadog exporter (keep it) and the Sutrace OTLP endpoint. Dual-write for the migration window.
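A minimal sketch of that dual-write collector config. The OTLP endpoint below is a placeholder, not a real Sutrace URL; substitute the one from your account, and note the Datadog exporter shown here is the community `opentelemetry-collector-contrib` one:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  datadog:        # keep the existing Datadog path during the migration window
    api:
      key: ${env:DD_API_KEY}
  otlphttp:       # placeholder endpoint -- use the one from your Sutrace account
    endpoint: https://otlp.example-sutrace-region.invalid

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [datadog, otlphttp]
    metrics:
      receivers: [otlp]
      exporters: [datadog, otlphttp]
```

Both exporters receive every span and metric, so the two backends stay comparable for the whole migration window.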

Day 3-4 — Shadow dashboards. Recreate your top 20 dashboards in Sutrace. The PromQL is straightforward; we provide a DDQL→PromQL cheat sheet. Keep both visible to the team.
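To give a flavor of the kind of translation the cheat sheet covers (metric and label names here are illustrative, and the two queries are rough equivalents rather than exact ones):

```
# DDQL (Datadog): requests per second by route for one service
sum:http.requests{service:api} by {route}.as_rate()

# Rough PromQL equivalent
sum by (route) (rate(http_requests_total{service="api"}[5m]))
```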

Day 5 — Cutover alerts. Move alerting one service at a time. Start with the noisiest — the alerts your on-call already mutes are the cheapest to migrate.

Week 2 — Logs and traces. Logs migrate via Vector or the OTel logs receiver. Traces are already OTLP if you're on OpenTelemetry; otherwise switch from dd-trace to the OTel SDK in your highest-volume service first.

Week 3 — Synthetics and uptime. This is where the savings show up fast. See the Checkly-data calculator post.

Week 4 — Decommission. Drop the Datadog exporter, cancel the agent, downgrade the contract at renewal.

Cost estimator (prose version)

Take a hypothetical team. 80 hosts. 4,000 custom metrics across 12 services. 2 TB/month of logs, 30-day retention. 16 synthetic routes × 4 regions × every 4 minutes. APM on 200 services.

Datadog (list pricing, no enterprise discount):

  • Pro hosts: 80 × $18 = $1,440/mo
  • APM: 200 × $36 = $7,200/mo (you'll likely consolidate, but list is list)
  • Custom metrics: Pro includes 100 custom metrics per host, so 80 hosts cover the first 8,000 series. 4,000 named metrics fan out to far more series once tags multiply — call it ~12 tag combinations each, ~48,000 series — leaving ~40,000 billable × $0.05 ≈ $2,000/mo, depending on cardinality tier
  • Logs: 2,000 GB × $0.10 ingest + indexed retention multiplier ≈ $1,800/mo
  • Synthetics: $8,509/mo (the Checkly number)

That's about $21k/month, or $252k/year, before negotiation. Real-world enterprise discounts get you 20–40% off, so call it $150k–$200k/year.
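The arithmetic behind that total, reproduced line by line so finance can check it:

```python
# The hypothetical bill above (Datadog list pricing, USD per month).
line_items = {
    "pro_hosts":      80 * 18,   # $1,440
    "apm":            200 * 36,  # $7,200
    "custom_metrics": 2_000,     # cardinality-driven estimate from the bullet above
    "logs":           1_800,     # ingest plus indexed retention
    "synthetics":     8_509,     # the Checkly number
}
monthly = sum(line_items.values())
print(monthly, monthly * 12)     # 20,949/month -> 251,388/year before discounts
```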

Sutrace equivalent:

  • 80 hosts on the Scale tier
  • Metrics ingest sized by series, not by tag combination
  • Logs at one tier, 30 days hot
  • Synthetics included up to a generous regional cap
  • AI agent observability included

For the same workload we'd quote in the $4k–$6k/month range. Actual quote requires real numbers; pricing has the public tiers. The savings are not magic — they come from one design decision: cardinality is a thing we monitor, not a thing we charge for.

What "no cardinality bill" actually means

We track cardinality. We show you which metrics are exploding. We will rate-limit and warn before you blow up your ingest budget. What we don't do is multiply your invoice by the number of unique tag combinations.

The mechanism: every metric has a series budget per service. Cross it and you get a warning and an option to cap, sample, or upgrade. You make the decision before the bill arrives — not after. We wrote the long version of why this matters in cardinality cost attribution before the bill.
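In pseudocode terms, the decision point looks something like this. Everything here is an illustrative sketch — budget numbers and function names are hypothetical, not Sutrace's actual API:

```python
# Hypothetical per-service series budgets; the default applies to new services.
SERIES_BUDGETS = {"checkout": 50_000, "search": 20_000}
DEFAULT_BUDGET = 10_000

def check_series_budget(service: str, active_series: int) -> str:
    """Warn before ingest, so the owner decides before the bill arrives."""
    budget = SERIES_BUDGETS.get(service, DEFAULT_BUDGET)
    if active_series <= budget:
        return "ok"
    # Over budget: the service owner chooses to cap, sample, or upgrade.
    over = active_series - budget
    return f"warn: {over} series over budget; cap, sample, or upgrade"

print(check_series_budget("checkout", 48_000))  # ok
print(check_series_budget("search", 75_000))    # warn: 55000 series over budget; ...
```

The point of the sketch is the ordering: the check runs at ingest time, not at invoice time.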

The architectural reference is Cloudflare's sample_limit: 200 — a single-line config that prevents one bad scrape from poisoning the whole TSDB. That kind of guardrail should be the default, not a hard-won internal practice.
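For reference, here is what that guardrail looks like in a Prometheus scrape config (job name and target are placeholders). `sample_limit` rejects an entire scrape that exposes more samples than the limit, so one runaway exporter can't poison the TSDB:

```yaml
scrape_configs:
  - job_name: service-metrics
    sample_limit: 200      # a scrape exposing more than 200 samples fails outright
    static_configs:
      - targets: ["app:9100"]
```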

Security and residency

Sutrace runs in europe-west3 (Frankfurt) by default. Data does not leave the EU unless you explicitly opt in for a US replica. We sign DPAs as standard, list sub-processors, and our security page and DPA cover the controls. Datadog offers an EU site, but the default is US, and your procurement process knows the difference.

FAQ

1. Is Sutrace really cheaper, or does the cost just move somewhere? Cheaper for teams whose Datadog bill is dominated by custom metrics, log retention, and synthetics. If you're 90% APM hosts, the gap narrows. We'll tell you which case you're in before you sign.

2. Do I have to rewrite instrumentation? No, if you're on OpenTelemetry. Re-point your collector. If you're on dd-trace, you'll migrate one service at a time — most teams finish in 4–6 weeks.

3. What happens to my existing Datadog dashboards? They keep working. Dual-write during migration. We provide a DDQL→PromQL cheat sheet for the common queries.

4. How do alerts migrate? Alert definitions are exportable. We import the JSON and recreate them in PromQL. About 80% map cleanly; the rest need a manual review (anomaly-detection alerts especially).

5. Is there a managed agent or do I run OTel myself? You can run our managed collector or self-host the upstream OTel collector and point at our OTLP endpoint. Both are supported.

6. What about Watchdog / anomaly detection? We have anomaly detection on metrics and logs. We don't claim parity with Watchdog on every dimension yet — it's a multi-year product. If anomaly detection is your top use case, evaluate carefully.

7. Can I keep using Terraform for dashboards-as-code? Yes. Our Terraform provider covers dashboards, alerts, and synthetics.

8. How do you handle high-cardinality fields like customer_id? We expose them in logs and traces (where cardinality is fine) and surface aggregations in metrics. The 50M time-series budget post explains the technical pattern.

9. Do you support industrial protocols? Yes — Modbus, OPC UA, MQTT. This is what differentiates us. See the OpenTelemetry backend use case for the software side; hardware integrations are documented in the admin docs.

10. What's the catch? Smaller integrations catalog (~80 vs Datadog's 800+). Smaller team. We don't have a Watchdog equivalent. If those matter, Datadog is still the right call. If predictable billing and EU residency matter more, start here.


If you've read this far, the next move is a 30-minute call where we walk through your last Datadog invoice and tell you honestly which parts we'd reduce. No demo theater. Pricing is public and the trial runs against your real OTel data.