Sutrace vs LangSmith — framework-agnostic AI observability vs LangChain-native depth
Honest comparison. LangSmith is the LangChain-native play for teams deep in LangChain. Sutrace is framework-agnostic, on-host redaction, multi-provider. When each wins.
Sutrace vs LangSmith
TL;DR. LangSmith is LangChain Inc.'s observability product, deeply integrated with the LangChain framework and ecosystem. If your AI stack is LangChain-first and will stay that way, LangSmith optimises the depth side of the depth-versus-breadth trade-off for exactly your shape: fewer integration steps, richer LangChain-specific telemetry, the most mature debugging UX for LangChain-shaped workflows. Sutrace is framework-agnostic: we instrument via OpenTelemetry's GenAI semantic conventions, redact PII on-host, and unify AI agent observability with the rest of your software/hardware/web/API observability. If your AI stack uses multiple frameworks (LangChain plus LlamaIndex plus direct provider SDKs), or you have data-residency or redact-before-storage requirements, Sutrace fits better. The honest read below.
Side-by-side
| Dimension | LangSmith | Sutrace |
|---|---|---|
| Framework support | LangChain-native; partial others | Framework-agnostic via OTel GenAI conventions |
| Protocol | Proprietary | OpenTelemetry OTLP |
| LangChain debugging UX | Best-in-class | Good, generic |
| Multi-provider (OpenAI/Anthropic/Vertex/local) | Supported, LangChain-shaped | Native first-class |
| PII redaction | Server-side | On-host pre-egress |
| Unified with software observability | Separate product | Same dashboard |
| Industrial signals | Not in scope | Native (PLC/SCADA) |
| EU residency | Available | Default |
| Pricing | Per-trace + retention | Per-GB ingest |
| LangSmith Hub / prompts | Native | Not in scope |
| Evaluation / dataset workflows | Native, deep | Lightweight |
Where LangSmith is genuinely the right choice
LangSmith comes from the team that builds LangChain. The integration depth shows in places that matter:
1. LangChain-shaped traces are first-class. Chains, agents, retrievers, tools — all surface in the LangSmith UI as the entities a LangChain developer thinks in. Sutrace surfaces LLM calls as OTel spans with GenAI attributes, which is correct but more generic.
2. Prompt versioning and the LangSmith Hub. Prompts are first-class artifacts in LangSmith, with versioning, sharing, and a public hub. Sutrace doesn't ship prompt management — we don't try.
3. Evaluation and dataset workflows. LangSmith's dataset + evaluation tooling is mature and LangChain-aware. Side-by-side prompt comparisons, regression test runners, evaluator chains — all native. Sutrace logs evaluation runs as events; we don't ship the orchestration.
4. LangGraph integration. LangGraph (LangChain's stateful graph orchestration framework) traces beautifully in LangSmith. Sutrace can ingest the OTel telemetry from LangGraph but won't render the graph topology natively.
If your team's daily work is LangChain + LangGraph + prompt iteration, LangSmith is integrated for that workflow in ways nothing framework-agnostic can match. We will not pretend otherwise.
Where Sutrace is the right choice
Mirror cases:
1. Multi-framework AI stacks. Most production AI stacks in 2026 are not LangChain-only. They mix LangChain (orchestration) with LlamaIndex (RAG), direct provider SDKs (OpenAI, Anthropic, Vertex), and custom in-house frameworks. Sutrace instruments via OpenTelemetry's GenAI semantic conventions, a vendor-neutral spec developed in the open under the OpenTelemetry project. Each framework's instrumentation library emits the same span shape; you query across them as one. LangSmith handles non-LangChain frameworks via SDK shims that work, but the UX is LangChain-shaped.
2. PII redaction before egress. Sutrace ships an on-host redaction layer in the OTel Collector that strips configurable PII patterns from prompts and completions before the data leaves the host. This matters for regulated industries (healthcare, finance, EU regulated workloads under GDPR) where prompt content is the sensitive surface. LangSmith offers server-side redaction; the data has already left the host.
3. EU data residency by default. LangSmith offers EU regions on enterprise plans; Sutrace is EU-default. For DACH/Nordics/EU-regulated workloads the difference is meaningful.
4. AI observability unified with the rest of observability. Most production AI workloads run alongside software services, depend on software backends, and are part of one user-facing system. Sutrace puts LLM cost/latency on the same dashboard as service latency, error rates, and infrastructure metrics. LangSmith is a dedicated AI tool; you'll have a separate observability stack for the software side.
5. Cardinality and cost control. LLM workloads have distinctive cardinality dynamics: tokens, models, providers, prompt families. Sutrace's cardinality cost attribution generalises to LLM telemetry; LangSmith bills per-trace, a different model that doesn't expose the same lever.
6. You need hardware or industrial signals on the same dashboard. Sutrace's PLC/SCADA support is alongside the AI agent observability — see the OTel backend use-case page. LangSmith doesn't cover this scope.
Framework-agnostic via OTel GenAI conventions
The structural argument for OTel GenAI as the AI observability protocol is the same as the OTel argument for software observability: the protocol-versus-backend distinction. We covered this in the OTel protocol-war pillar post.
The key OTel GenAI attributes (`gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, and so on) are the same regardless of framework. An OpenAI call made through LangChain emits the same span shape as a direct OpenAI SDK call or a LlamaIndex call. That portability matters when frameworks come and go (and they will).
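To make the portability concrete, here is a minimal sketch in plain Python with invented token counts: two spans carrying GenAI semantic-convention attributes, one from a LangChain instrumentation, one from a direct SDK instrumentation, and a single query across both. (The `framework` key is an illustrative resource attribute, not part of the spec.)

```python
# Two LLM spans with OTel GenAI semantic-convention attributes. The first is
# emitted by a LangChain instrumentation library, the second by a direct
# OpenAI SDK instrumentation -- the attribute shape is identical.
spans = [
    {
        "gen_ai.system": "openai",
        "gen_ai.request.model": "gpt-4o",
        "gen_ai.usage.input_tokens": 1200,
        "gen_ai.usage.output_tokens": 340,
        "framework": "langchain",    # illustrative, not in the spec
    },
    {
        "gen_ai.system": "openai",
        "gen_ai.request.model": "gpt-4o",
        "gen_ai.usage.input_tokens": 800,
        "gen_ai.usage.output_tokens": 150,
        "framework": "openai-sdk",   # illustrative, not in the spec
    },
]

# Because the attribute names are framework-independent, one query covers both:
total_tokens = sum(
    s["gen_ai.usage.input_tokens"] + s["gen_ai.usage.output_tokens"]
    for s in spans
    if s["gen_ai.request.model"] == "gpt-4o"
)
print(total_tokens)  # 2490
```

The point is the aggregation line: it never mentions a framework, so it survives a framework swap unchanged.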
LangSmith supports OTel ingestion as well; their proprietary protocol is the richer one, but OTel works. The trade-off is depth-of-integration vs. portability-of-protocol.
On-host redaction — the regulated-industry detail
For teams under GDPR, HIPAA, PCI, or similar, where data is redacted matters. Two patterns:
Server-side redaction (LangSmith default). The full prompt + completion ships to the vendor. The vendor redacts before storage. The vendor had the unredacted data, even if briefly.
On-host redaction (Sutrace pattern). The OTel Collector runs as a local sidecar or daemonset. It strips PII patterns from the span attributes before the OTLP egress. The vendor never sees the unredacted data.
For some compliance postures the server-side pattern is acceptable (given the right contractual, SOC 2, and ISO assurances from the vendor). For others it's not. If "the vendor must never see the unredacted data" is a hard requirement, on-host redaction is the architecture.
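As a sketch of the on-host pattern: an OpenTelemetry Collector pipeline that scrubs a PII pattern from span attributes before the OTLP exporter runs. The processor shown is the contrib distribution's transform processor; the attribute names, regex, and endpoint are illustrative, not a Sutrace-supplied config.

```yaml
# Collector config sketch: redact US-SSN-shaped strings from prompt and
# completion attributes before any data leaves the host.
receivers:
  otlp:
    protocols:
      grpc:

processors:
  transform/redact_pii:
    trace_statements:
      - context: span
        statements:
          - replace_pattern(attributes["gen_ai.prompt"], "\\d{3}-\\d{2}-\\d{4}", "[REDACTED]")
          - replace_pattern(attributes["gen_ai.completion"], "\\d{3}-\\d{2}-\\d{4}", "[REDACTED]")

exporters:
  otlp:
    endpoint: collector.example.eu:4317   # placeholder backend endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform/redact_pii]
      exporters: [otlp]
```

Because the processor sits before the exporter in the pipeline, the unredacted attribute values exist only in host memory.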
Pricing — directional
LangSmith bills per-trace plus retention. Sutrace bills per-GB-of-OTLP-payload. The models are different enough that the right comparison depends entirely on your trace shape.
- Small number of large traces (long agent runs): LangSmith is often cheaper per-trace, though retention settings dominate the bill.
- High volume of small traces (high-QPS LLM endpoints): Sutrace's per-GB model is usually cheaper.
- Mixed AI + software workload: Sutrace is often cheaper overall, since no separate observability product is needed.
Run the calculators with real workload numbers. Both vendors have free tiers for evaluation.
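The shape-dependence is easy to see with invented rates (these numbers are for illustration only; use each vendor's real price sheet):

```python
# Hypothetical rates -- invented for illustration, not real pricing.
PER_TRACE = 0.0005        # $ per trace (per-trace billing model)
PER_GB = 0.30             # $ per GB of OTLP payload (per-GB billing model)

def monthly_cost(traces: int, avg_trace_kb: float) -> tuple[float, float]:
    """Return (per_trace_cost, per_gb_cost) for a month of traffic."""
    gb = traces * avg_trace_kb / 1024 / 1024
    return traces * PER_TRACE, gb * PER_GB

# Few large traces: long agent runs, ~2 MB each.
few_large = monthly_cost(traces=50_000, avg_trace_kb=2048)
# Many small traces: high-QPS endpoint, ~4 KB each.
many_small = monthly_cost(traces=20_000_000, avg_trace_kb=4)

print(few_large)    # per-trace billing wins at this shape
print(many_small)   # per-GB billing wins at this shape
```

Same two billing formulas, opposite winners, purely because the trace shape changed.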
Migration paths
LangSmith → Sutrace. Replace the LangSmith client with the opentelemetry-instrumentation-langchain package (or equivalent for other frameworks). Point your OTel Collector at our endpoint. Existing LangSmith data does not migrate — historical evaluation datasets stay in LangSmith.
Sutrace → LangSmith. Add the LangSmith client alongside our OTel exporter. Both can coexist. If you want full migration, remove the OTel exporter for the LangChain-only paths.
The migration is bidirectional and not destructive. You can run both in parallel for an evaluation period.
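A minimal sketch of the parallel-run idea, in plain Python with placeholder sinks. In the real OTel SDK the equivalent is registering two span processors (one per exporter) on a single TracerProvider; the class here only illustrates the fan-out shape.

```python
from typing import Any, Callable

class FanOutTracer:
    """Sketch: one tracer delivering every span to multiple backends."""

    def __init__(self, sinks: list[Callable[[dict[str, Any]], None]]):
        self.sinks = sinks

    def record(self, span: dict[str, Any]) -> None:
        for sink in self.sinks:   # every sink sees every span
            sink(span)

backend_a: list = []   # stand-in for the LangSmith-bound pipeline
backend_b: list = []   # stand-in for the OTLP-bound pipeline
tracer = FanOutTracer([backend_a.append, backend_b.append])
tracer.record({"gen_ai.system": "openai", "gen_ai.request.model": "gpt-4o"})
assert backend_a == backend_b  # both backends received the same span
```

Nothing in either pipeline needs to know the other exists, which is why the evaluation period can run without touching application code twice.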
What we won't do
- We will not match LangSmith's LangChain-debugging UX. We're framework-agnostic; the trade-off is depth vs. breadth.
- We will not ship prompt management. LangSmith's hub is a separate product concern.
- We will not ship evaluation orchestration at LangSmith's depth. We log eval runs; we don't run them.
What LangSmith won't do (probably)
- Framework-agnostic depth. Their incentive is LangChain-first.
- On-host redaction. Architecturally not their pattern.
- Unification with hardware/software/web observability. Different product scope.
- EU-default residency. EU is available; not default.
When the answer is "use both"
Some teams use both. LangSmith for the LangChain-deep developer-debugging workflows (prompt iteration, evaluation, the LangSmith Hub). Sutrace for production observability across all signal types — including the AI workloads — where unified dashboards and alerting matter.
This is a legitimate combination. The OTel GenAI instrumentation is parallel to the LangSmith client; nothing prevents both from running. The cost is operational complexity (two systems) vs. the benefit of best-in-class for each scope.
What to do next
- If your stack is LangChain-deep and stays that way, evaluate LangSmith first. It's the right depth for that shape.
- If your stack is multi-framework or has compliance / residency requirements, evaluate Sutrace. The OTel use-case page covers the architecture.
- If you want unified AI + software observability, Sutrace is the single-product answer.
Closing
LangSmith is excellent for LangChain-shaped workflows. Sutrace is the framework-agnostic, multi-provider, EU-default, on-host-redaction answer. The choice is depth-of-integration vs. breadth-of-scope. We won't pretend depth doesn't matter — for some teams, LangChain-native is the right bet. For others, the framework-agnostic architecture is more durable.
The Datadog comparison, Grafana Cloud comparison, SigNoz comparison, and Better Stack comparison cover the non-AI observability dimensions. The pricing page covers the SKU detail.
If your AI stack is LangChain-first today and you don't see that changing, give LangSmith the first look. If it's not, let us know your shape and we'll model the workload honestly.