Atlassian Statuspage's 21-day outage — and what it means

From 2 February to 23 February 2026, Statuspage's System Metrics feature was broken because Librato — a deprecated upstream — finally went away. Three weeks. On a paid product whose entire job is honesty about infrastructure.

By Akshay Sarode · March 4, 2026 · 10 min read · status-page, postmortem, atlassian

The 150-word answer

From 2 February 2026 to 23 February 2026 — twenty-one days — Atlassian Statuspage's "System Metrics" feature stopped working. The metrics graphs that customers display on their public status pages went blank. Customers paying $1,499/month for the Enterprise tier saw broken graphs on their own status pages, visible to their customers, for three weeks.

The root cause was Librato — a metrics platform Atlassian had built System Metrics on top of. Librato's deprecation was announced in advance. The deprecation date passed. The feature broke. There was no graceful degradation, no fallback metrics provider, no contingency.

OneUptime documented the full timeline and made the obvious point: a product whose entire value proposition is honesty about infrastructure couldn't honestly report on its own infrastructure for 21 days.

This is what it means.

The timeline

Reconstructed from Atlassian's incident communications, OneUptime's analysis, and customer reports:

  • Late 2025: Librato (the upstream metrics platform) announces its final deprecation date in early 2026
  • 2 February 2026: System Metrics feature stops rendering data on customer status pages
  • 2–5 February: Customer support tickets accumulate; Atlassian acknowledges "issues with metrics rendering"
  • 8 February: Atlassian status update: "investigating an issue with our metrics provider"
  • 14 February: Status update mentions an "ongoing dependency migration"
  • 18 February: Public acknowledgement that Librato is the cause
  • 23 February: Fix deployed, System Metrics restored
  • 25 February: OneUptime publishes its public timeline analysis

Twenty-one days. On a feature listed as a headline differentiator for the $99/$399/$1,499 paid tiers.

What System Metrics is — and why it broke

System Metrics is the feature where you embed a real-time graph (response time, error rate, custom counter) on your public status page. It's a major selling point in Atlassian's marketing — the "your customers see live data, not just incident posts" pitch.

Architecturally, the feature was built on top of Librato, a SaaS metrics platform of which Atlassian was simply a customer. When SolarWinds, which had acquired Librato and folded it into its AppOptics product, announced the final deprecation, Atlassian had to either:

  1. Build their own metrics ingest + storage + query + render layer
  2. Migrate to a different provider
  3. Accept that the feature would break on the deprecation date

What appears to have happened: a partial attempt at option 1 or 2 that wasn't finished when the deadline arrived. The deprecation date came, Librato went read-only or disappeared entirely, and the feature broke.

There's no shame in dependency migrations being hard. There is, however, fault in:

  • Not having a fallback graph state (e.g., "metrics temporarily unavailable" instead of a broken render)
  • Taking 16 days from outage to root-cause acknowledgement
  • Not having a parallel-write, parallel-read setup during the deprecation window so the cutover could be tested live
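The first of those faults is the cheapest to fix. A minimal sketch of a graceful-degradation wrapper, assuming a hypothetical upstream call (`fetch_upstream_series` and `MetricsUnavailable` are illustrative names, not Statuspage internals):

```python
from dataclasses import dataclass, field

@dataclass
class GraphPayload:
    available: bool
    points: list = field(default_factory=list)  # [(timestamp, value), ...]
    message: str = ""

class MetricsUnavailable(Exception):
    """Raised when the upstream metrics provider fails or is gone."""

def fetch_upstream_series(metric_id: str) -> list:
    # Stand-in for the real upstream call (e.g. a Librato-style API).
    # Simulates the provider having gone away entirely.
    raise MetricsUnavailable(f"upstream gone for {metric_id}")

def render_graph(metric_id: str) -> GraphPayload:
    """Never emit a broken render: degrade to an explicit placeholder."""
    try:
        return GraphPayload(available=True,
                            points=fetch_upstream_series(metric_id))
    except MetricsUnavailable:
        return GraphPayload(available=False,
                            message="Metrics temporarily unavailable")
```

The page template then branches on `payload.available` and shows the message instead of an empty or erroring chart. The wrapper costs a few lines; not having it cost three weeks of blank graphs.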

Why this is more embarrassing than a normal outage

A normal outage on a normal product: you're sorry, you fix it, you write a postmortem. Customers grumble. Renewals are slightly harder for one quarter.

A 21-day outage on a status page product is categorically different. The product's entire value proposition is: trust us to honestly tell your customers when things are broken. The product was, itself, broken. For three weeks. With slow communication.

Every customer's renewal conversation now starts with: "How do you tell us when your status-page product is broken? Apparently the answer is: 21 days late."

OneUptime put it this way: "Every minute unreported is a minute that doesn't count against your SLA." Twenty-one days of unreported degraded function, on a product that bills at $1,499/month for Enterprise. The math is unkind.

The deprecation discipline lesson

Underneath the storytelling, this is a normal engineering failure. Specifically, it's the failure mode where:

  1. An upstream dependency announces deprecation
  2. The downstream team adds it to a backlog
  3. The deprecation deadline approaches
  4. The team underestimates the migration effort or overestimates how much slack the upstream will give
  5. Deadline arrives, dependency goes away, downstream feature breaks

Every infrastructure team has lived this. The fix is not heroic; it's discipline. Specifically:

  • Track external deprecations as P1 work, with calendar deadlines, not "we'll do it eventually."
  • Migrate to the new provider before the old one is dead, not at the deadline. The migration window should be six months minimum.
  • Run parallel read/write across both providers. Cut over only after the new provider has matched the old one's behavior in production for two weeks.
  • Have a graceful-degradation fallback for the feature. "Metrics temporarily unavailable" is acceptable. A broken graph render is not.
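The parallel read/write point deserves a concrete shape. A sketch of a dual-provider client, with `OldProvider` and `NewProvider` as stand-ins for real SDKs (all names here are illustrative):

```python
import logging

class OldProvider:
    """Stand-in for the deprecated upstream (e.g. a Librato-style client)."""
    def __init__(self):
        self.data = {}
    def write(self, metric, point):
        self.data.setdefault(metric, []).append(point)
    def read(self, metric):
        return self.data.get(metric, [])

class NewProvider(OldProvider):
    """Stand-in for the replacement provider; same interface for the sketch."""

class DualMetricsClient:
    """Write to both providers; read from the primary, diff against the shadow.

    Flip cut_over only after the mismatch rate has stayed at ~0 in
    production for the full soak window (e.g. two weeks)."""
    def __init__(self, old, new, cut_over=False):
        self.old, self.new, self.cut_over = old, new, cut_over
        self.mismatches = 0

    def write(self, metric, point):
        self.old.write(metric, point)          # primary write path
        try:
            self.new.write(metric, point)      # shadow write; its failure
        except Exception:                      # must never break production
            logging.warning("shadow write failed for %s", metric)

    def read(self, metric):
        primary = self.new if self.cut_over else self.old
        shadow = self.old if self.cut_over else self.new
        result = primary.read(metric)
        if shadow.read(metric) != result:
            self.mismatches += 1               # feeds the cut-over decision
        return result
```

With this in place, the deprecation date becomes a non-event: flipping `cut_over` is a one-line change that has already been validated against live traffic.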

Atlassian almost certainly knows this. Statuspage is a smaller part of Atlassian's portfolio, the team has likely been resource-constrained, and the deprecation slipped down the priority list. That's the most likely root cause beneath the technical root cause.

What this means for vendor risk

If you're a Statuspage customer today, ask Atlassian:

  1. What's the current backlog of upstream deprecations affecting Statuspage features? Get the list. Get the dates. Demand visibility.
  2. What's the SLA on individual features (vs. the page itself)? System Metrics being broken for 21 days probably did not violate the page-availability SLA. It violated the feature-availability assumption that customers reasonably had. If the contract doesn't cover feature-level SLAs, the contract is letting the vendor off too easy.
  3. What's the parallel-write / parallel-read setup for current third-party dependencies? If the answer is "we don't have one," you're one deprecation away from another 21-day outage.

If you're evaluating Statuspage for new procurement, weight this incident heavily. Twenty-one days is not a short outage. The product team's response cadence (16 days to root-cause acknowledgement) is not best-in-class.

What Sutrace does differently

We have less to lose by saying this because we're newer. But the architectural choices are intentional:

  • No third-party metrics provider for our metrics graphs. We compute and store our own. The upstream dependency for graphs is Firestore (where everything else lives). One failure mode, not two.
  • Auto-driven status pages. When a feature is broken, the relevant component on the public page flips to degraded automatically. Even if the engineering team hasn't acknowledged the incident yet, the page reflects monitored reality. (See the honest status page use-case.)
  • Public dependency list. We publish our list of third-party dependencies and their current health on our own status page. If Postmark is having issues, our customers know that affects their incident-notification email delivery.
  • EU residency, architectural. Subscriber data, incident bodies, component history — all in EU-west Firestore. Not contractual. Architectural.
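The auto-driven flip is simple to reason about. A sketch of deriving a public component status from recent monitor results, with thresholds and names that are illustrative rather than Sutrace's actual logic:

```python
def component_status(recent_checks: list,
                     degraded_after: int = 2,
                     down_after: int = 5) -> str:
    """Map monitor results to a public component status.

    recent_checks: newest-first list of pass/fail booleans from the monitor.
    The thresholds are hypothetical tuning knobs, not product defaults.
    """
    consecutive_failures = 0
    for ok in recent_checks:
        if ok:
            break
        consecutive_failures += 1
    if consecutive_failures >= down_after:
        return "major_outage"
    if consecutive_failures >= degraded_after:
        return "degraded"
    return "operational"
```

Because the status is computed from monitor data on every evaluation, the public page degrades (and recovers) without waiting for a human to acknowledge anything.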

Statuspage at the Business tier is $399/month for the status page alone, and you pay separately for monitoring, on-call, and uptime. Sutrace at $99/month bundles all four. See Sutrace as a Statuspage alternative for the direct comparison.

Where Atlassian still wins

To be clear: this incident does not make Statuspage a bad product. It still has the largest install base, the deepest integrations, the longest track record. The brand value alone is worth real money for some procurement situations.

What it does mean: the long-standing assumption that "buying Atlassian = buying reliability" needs to be re-examined for the Statuspage product line specifically. The team is smaller than people think. The feature surface is evidently broader than the team can maintain. A different vendor, one that treats the status page as its primary product rather than a tertiary acquisition, may be the right choice for new procurement decisions in 2026.

What you should do this week

  1. Read OneUptime's full analysis. It's the canonical writeup.
  2. If you use Statuspage's System Metrics feature: check that it's actually working. Don't assume.
  3. List the third-party dependencies of your current status page. Ask the vendor what their migration plans are when those upstreams deprecate.
  4. Test your status page failover. If you don't have one, build one. The bar is now: when your status page vendor breaks for three weeks, what's your plan?
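Step 2 ("check that it's actually working") can be automated. A sketch of a sanity check you could run from cron or CI; the URL and JSON shape are assumptions about a generic metrics endpoint, so adapt them to whatever your page actually serves:

```python
import json
import urllib.request

def series_has_points(payload: dict, min_points: int = 1) -> bool:
    """Assumed shape: {"series": [{"points": [[ts, value], ...]}, ...]}."""
    series = payload.get("series", [])
    return any(len(s.get("points", [])) >= min_points for s in series)

def metrics_look_alive(url: str, min_points: int = 1) -> bool:
    """Fetch a metrics endpoint and verify it returns non-empty series."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            payload = json.load(resp)
    except Exception:
        return False  # unreachable, non-JSON, or erroring endpoint
    return series_has_points(payload, min_points)
```

Alert when this returns False, so a blank graph is caught by you before your customers see it.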

The status page is a real engineering surface. February 2026 was the proof.


Try Sutrace free at sutrace.io. Read more in our pillar piece on why most status pages lie and the 2025 chronology.