A status page that auto-reflects monitored reality

Most public status pages are PR pages that say "all systems operational" while half the regions are on fire. The fix is architectural, not editorial. Here's how Sutrace does it.


The 150-word answer

On 18 November 2025, Cloudflare had a global outage. Their public status page went down too. Cloudflare's own write-up admitted it: "Cloudflare's status page went down. The status page is hosted completely off Cloudflare's infrastructure with no dependencies on Cloudflare. While it turned out to be a coincidence…" — see the official postmortem.

That admission, that even the company that built a global edge network couldn't keep its status page up during its own outage, is the entire reason this use case exists.

A status page should: (1) be auto-driven by what your monitors actually see, (2) be hosted off your primary infrastructure on different cloud, DNS, and CDN, and (3) allow PR-edit mode as opt-in for incidents that need nuance. Sutrace does all three by default. Most competitors do at most one.

Why most status pages lie

We wrote the long version: why most status pages lie — the evidence. The short version, in three points:

1. They're hosted on the same infra they're reporting on. When AWS us-east-1 went down on 20 October 2025, AWS's own Service Health Dashboard struggled for the first ~80 minutes — because the dashboard had a dependency on the region that was down.

2. They're updated by humans during the worst possible moment. Your incident commander is fighting the fire. Their twelfth priority is logging into Statuspage and writing a tasteful 80-word update. The status page lags reality by 20-90 minutes by default.

3. There's an incentive to delay. As OneUptime put it: "Every minute unreported is a minute that doesn't count against your SLA." The PR team and the SRE team have opposing incentives during an incident. PR usually wins.

What "auto-reflects monitored reality" means

In a Sutrace status page, every public component is bound to one or more monitored signals: synthetic checks, uptime probes, SSL cert state, SLO burn rate, error budget remaining. When the monitored signal crosses its configured threshold, the component flips state on the public page.

The result: when your API returns 5xx in three regions for three consecutive minutes, the "API" component on your public page flips to Degraded — automatically — before your on-call has finished pasting the stack trace into Slack.
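The "three consecutive minutes" rule above can be sketched as a small state machine. This is an illustrative sketch, not Sutrace's actual API; the class and parameter names are assumptions.

```python
from collections import deque

class AutoDriveRule:
    """Hypothetical auto-drive rule: flip a component to 'degraded' when a
    bound signal breaches its threshold for N consecutive evaluation windows."""

    def __init__(self, threshold: float, consecutive: int):
        self.threshold = threshold      # e.g. 5xx error rate above 5%
        self.consecutive = consecutive  # e.g. 3 one-minute windows
        self.window = deque(maxlen=consecutive)

    def observe(self, value: float) -> str:
        """Feed one evaluation window; return the resulting component state."""
        self.window.append(value > self.threshold)
        if len(self.window) == self.consecutive and all(self.window):
            return "degraded"
        return "operational"

rule = AutoDriveRule(threshold=0.05, consecutive=3)
states = [rule.observe(v) for v in [0.01, 0.08, 0.09, 0.12]]
# Only after three consecutive breaches does the component flip.
```

A single bad minute doesn't flip the page; three in a row do, which is what keeps the public page from flapping on transient blips.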

You can override the auto-driven state by writing a manual incident, which is PR-edit mode.

PR-edit mode is opt-in, not the default

Some incidents need nuance. "5% of users in Frankfurt seeing slow payment confirmation between 14:32 and 14:47 UTC" is not something a binary green-yellow-red component captures well. You write that as an incident, in plain English, with a manual component-state override.

In Sutrace, this is a deliberate per-incident decision. You can:

  • Let auto-drive run uninterrupted (default for every component)
  • Open an incident that supplements auto-drive (manual incident, components stay auto-driven)
  • Open an incident that overrides auto-drive (manual incident, components manually pinned)

The override always has a visible expiry. If you pin a component to "operational" for 4 hours and don't reset it, the auto-drive resumes and reports reality. This prevents the most common status-page lie: someone sets the page to green, the incident continues, the page is still green hours later because nobody remembered to update.

Hosted off primary infrastructure — the architecture

The Sutrace status page (status.sutrace.io and every customer's custom-domain status page) runs on a deliberately disjoint infrastructure stack:

| Layer | Primary infra | Status page infra |
| --- | --- | --- |
| Compute | Firebase Functions (Google) | Cloudflare Pages |
| DNS resolver | Google Cloud DNS | Cloudflare DNS, with an NS3 fallback at deSEC |
| CDN / edge | Cloudflare on app.sutrace.io | Cloudflare on status, with origin failover to Bunny.net |
| State store | Firestore EU | Firestore EU + read-only static snapshot rebuilt every 30s, served from object storage |
| Email (subscriber notify) | Postmark | Resend (separate vendor, for notification blasts during a primary outage) |

The static snapshot is the key piece. Every 30 seconds, the current component-state JSON is serialized and uploaded to a public object store. If the entire Sutrace primary infrastructure is unreachable, the status page can serve from the cached snapshot — frozen in time but still correct as of 30 seconds ago. We test this once a month by hard-failing the primary during a scheduled maintenance window.
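The publish-and-fallback loop above can be sketched in a few lines. This is a hedged illustration of the idea, not the real implementation: `object_store` stands in for any public bucket, and all names are assumptions.

```python
import json
import time

def publish_snapshot(components: dict, object_store: dict) -> None:
    """Serialize current component state to a static blob (runs every 30s)."""
    object_store["status.json"] = json.dumps(
        {"generated_at": time.time(), "components": components}
    )

def read_status(live_fetch, object_store: dict) -> dict:
    """Serve live state; fall back to the last snapshot if the primary is down."""
    try:
        return live_fetch()  # primary path: read from the live state store
    except Exception:
        # Frozen in time, but still correct as of the last publish.
        return json.loads(object_store["status.json"])
```

Because the snapshot is a plain static object, the fallback path has no dependency on the primary state store, compute tier, or database driver, which is the whole point.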

This isn't novel. It's the architecture Cloudflare thought they had on 18 November 2025 — and the post-mortem revealed an unintentional shared dependency. Architecture matters; intent isn't enough. We test ours.

What the public page actually shows

A Sutrace public status page surfaces:

  1. Components — your services, with auto-driven state from bound monitors.
  2. Live regional probe map — each of our 14 probe regions, real-time green/red. Optional. Some teams turn this off because it surfaces too much.
  3. Open incidents — manually-authored, with a timeline.
  4. Scheduled maintenance — manual, future-dated.
  5. Historical uptime — last 90 days per component, derived from the monitored data, not from incident posting cadence. This is the number that matters and the one most status pages get wrong, because they only count "during a posted incident" as downtime.
  6. SLO burn rate widget — optional. Some teams want their customers to see error budget remaining; most don't.
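The distinction in point 5 — uptime derived from monitor data versus uptime counted only "during a posted incident" — is worth making concrete. A toy sketch, with made-up numbers; `samples` are (timestamp, ok) probe results and `incidents` are posted (start, end) windows.

```python
def monitored_uptime(samples: list) -> float:
    """Uptime as the probes actually saw it."""
    return sum(ok for _, ok in samples) / len(samples)

def incident_uptime(samples: list, incidents: list) -> float:
    """Uptime counting a sample as 'down' only inside a posted incident —
    the method most status pages use, and why their numbers flatter."""
    down = sum(
        1 for t, _ in samples
        if any(start <= t < end for start, end in incidents)
    )
    return 1 - down / len(samples)

# 100 one-minute probes; the service was down for minutes 30-59,
# but only a 10-minute incident (40-50) was ever posted.
samples = [(t, t < 30 or t >= 60) for t in range(100)]
monitored_uptime(samples)            # 0.70 — reality
incident_uptime(samples, [(40, 50)]) # 0.90 — the posted-incident number
```

The 20-point gap in this toy example is exactly the gap between a probe-derived history and an incident-cadence history.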

Internal status page mode

In addition to public status pages, Sutrace offers an internal status page — same architecture, auth-gated. Customer success teams use this to get the unfiltered view (every probe, every region, every SLO) while the public page shows the curated version. The internal page also pulls from the same auto-drive engine, so internal and external state can never diverge.

Migration from existing status pages

We import:

  • Atlassian Statuspage CSV exports (components, incidents, subscribers)
  • Better Stack JSON exports
  • Pingdom status page + check definitions
  • Hyperping JSON

Subscriber lists are migrated with a re-confirmation email, because consent receipts don't transfer between processors. This adds friction; it's the right thing to do.

When this isn't for you

If your status page is for regulatory reporting (e.g., regulated financial services with mandatory written incident communication), auto-drive may not be acceptable to your compliance team. They want the human in the loop. We support PR-edit-only mode for those cases — it just throws away half the value.

If your team has no monitoring at all today, a status page is the wrong first investment. Get monitoring in place first. Sutrace bundles them; you can also use Better Stack, Hyperping, or roll your own with checks-as-code via Checkly.

What this connects to in Sutrace


Try Sutrace free at sutrace.io. The free tier includes 1 public status page with auto-drive, 5 components, 100 subscribers, custom domain. Forever-free, no credit card.