Core Concepts

This page explains the key ideas behind CheckUpstream. Understanding these concepts will help you get the most out of the platform.

Services

A service is an external API or platform that your application depends on, such as Stripe, OpenAI, AWS S3, or Supabase. CheckUpstream maintains a curated registry of 221 services, each mapped to known domains and status pages.

Services have a status that reflects their current health:

Operational: functioning normally
Degraded: performance is below normal (higher latency, intermittent errors)
Partial outage: some components or regions are unavailable
Major outage: the service is largely or fully unavailable
Unknown: status could not be determined (status page unreachable or unparsable)

Note

Status colors in the dashboard are sacred: green = operational, yellow = degraded, orange = partial outage, red = major outage. They are never used decoratively.
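
For code that consumes these statuses, the model can be sketched as an enumeration with a color lookup. This is an illustrative sketch, not CheckUpstream's SDK: the names Status, STATUS_COLORS, and color_for are hypothetical, and because no color is documented for the unknown status it is left unmapped.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    """The five service statuses described above (identifiers are hypothetical)."""
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    PARTIAL_OUTAGE = "partial_outage"
    MAJOR_OUTAGE = "major_outage"
    UNKNOWN = "unknown"


# Dashboard colors as listed in the note. No color is documented for UNKNOWN,
# so it is deliberately omitted rather than guessed.
STATUS_COLORS = {
    Status.OPERATIONAL: "green",
    Status.DEGRADED: "yellow",
    Status.PARTIAL_OUTAGE: "orange",
    Status.MAJOR_OUTAGE: "red",
}


def color_for(status: Status) -> Optional[str]:
    """Return the dashboard color for a status, or None if none is documented."""
    return STATUS_COLORS.get(status)
```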

Projects

A project is your application, the thing that depends on upstream services. Each project has its own API key, dependency list, and alert configuration.

You might have one project per microservice, per environment (staging vs. production), or per team. Projects are the top-level organizational unit in CheckUpstream.

Example: A SaaS app might have three projects:

payments-api: depends on Stripe, Plaid
ai-pipeline: depends on OpenAI, Anthropic, Pinecone
main-app: depends on Supabase, Resend, Clerk

Dependencies

A dependency is the relationship between a project and a service. After you install a CheckUpstream SDK or configure your project manually, CheckUpstream tracks the project's dependencies for you.

Dependencies are discovered in two ways:

SDK auto-detection: The SDK intercepts outbound HTTP calls and recognizes requests to known service domains. No configuration needed.
Package mapping: CheckUpstream analyzes your dependency manifest (package.json, requirements.txt, go.mod, etc.) to infer which services you use based on their official client libraries.
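
Both discovery paths boil down to a lookup against the service registry. The sketch below shows the idea under stated assumptions: the two small dictionaries are hypothetical slices of a registry (not CheckUpstream's actual data), the function names are invented for illustration, and only package.json is handled.

```python
import json
from typing import Optional, Set
from urllib.parse import urlparse

# Hypothetical registry slices, for illustration only.
DOMAIN_TO_SERVICE = {
    "api.stripe.com": "Stripe",
    "api.openai.com": "OpenAI",
    "supabase.co": "Supabase",
}
PACKAGE_TO_SERVICE = {
    "stripe": "Stripe",
    "openai": "OpenAI",
    "@supabase/supabase-js": "Supabase",
}


def service_for_url(url: str) -> Optional[str]:
    """Auto-detection path: match an outbound request's host to a known domain."""
    host = (urlparse(url).hostname or "").lower()
    for domain, service in DOMAIN_TO_SERVICE.items():
        # Accept the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return service
    return None


def services_from_package_json(text: str) -> Set[str]:
    """Package-mapping path: infer services from declared client libraries."""
    manifest = json.loads(text)
    names = set(manifest.get("dependencies", {})) | set(manifest.get("devDependencies", {}))
    return {PACKAGE_TO_SERVICE[name] for name in names if name in PACKAGE_TO_SERVICE}
```

Suffix matching on the host keeps project-specific subdomains (as some providers issue) mapped to the same service, while unknown hosts simply return None.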

Status Polling

CheckUpstream monitors upstream services through status polling, periodically fetching and parsing each service's public status page.

How it works

Fetch: CheckUpstream requests each service's status page (typically an Atlassian Statuspage, Instatus, or custom endpoint) at regular intervals.
Parse: The response is parsed to extract the current overall status and any active incidents. CheckUpstream supports 10+ status page formats.
Normalize: The raw status is mapped to one of CheckUpstream's five statuses (operational, degraded, partial outage, major outage, or unknown).
Compare: The new status is compared to the previous known status. If it changed, a status transition event is emitted.
Alert: If the transition crosses a configured threshold, alerts are sent to your enabled channels.
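
The parse, normalize, compare, and alert steps can be sketched for one status page format. The example assumes an already-fetched Atlassian Statuspage payload with a status.indicator field (none, minor, major, or critical). The mapping from indicators to CheckUpstream statuses, the severity ordering, and the reading of "crosses a threshold" as "changed to a status at or above the threshold" are all assumptions made for illustration, as are the function names.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    PARTIAL_OUTAGE = "partial_outage"
    MAJOR_OUTAGE = "major_outage"
    UNKNOWN = "unknown"


# Assumed mapping from Statuspage indicators to the five-status model.
INDICATOR_TO_STATUS = {
    "none": Status.OPERATIONAL,
    "minor": Status.DEGRADED,
    "major": Status.PARTIAL_OUTAGE,
    "critical": Status.MAJOR_OUTAGE,
}

# Assumed severity order; UNKNOWN is excluded and never alerts in this sketch.
SEVERITY = {
    Status.OPERATIONAL: 0,
    Status.DEGRADED: 1,
    Status.PARTIAL_OUTAGE: 2,
    Status.MAJOR_OUTAGE: 3,
}


def normalize(payload: Optional[dict]) -> Status:
    """Parse + normalize: unparsable or missing payloads become UNKNOWN."""
    try:
        indicator = payload["status"]["indicator"]
    except (TypeError, KeyError):
        return Status.UNKNOWN
    if not isinstance(indicator, str):
        return Status.UNKNOWN
    return INDICATOR_TO_STATUS.get(indicator, Status.UNKNOWN)


def poll_once(
    previous: Status,
    payload: Optional[dict],
    alert_threshold: Status = Status.DEGRADED,
) -> Tuple[Status, bool, bool]:
    """Compare + alert: return (current status, changed?, should alert?)."""
    current = normalize(payload)
    changed = current != previous
    should_alert = changed and SEVERITY.get(current, -1) >= SEVERITY[alert_threshold]
    return current, changed, should_alert
```

Passing the payload in, rather than fetching inside the function, keeps the comparison logic testable without network access; a real poller would wrap this in a scheduled fetch.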

Polling frequency

Free plan: Every 5 minutes
Pro plan: Every 1 minute
Enterprise plan: Every 30 seconds

SDK telemetry provides an additional signal: if your application observes elevated error rates or latency to a service before the status page updates, CheckUpstream can trigger early warnings.
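
One way to picture the early-warning signal is a sliding window over recent request outcomes. This is a sketch only: the ErrorRateWindow class, the window size, the 20% threshold, and the minimum sample count are hypothetical values chosen for illustration, not CheckUpstream's actual telemetry logic, and latency is not modeled.

```python
from collections import deque


class ErrorRateWindow:
    """Tracks the most recent request outcomes for one upstream service."""

    def __init__(self, size: int = 50, threshold: float = 0.2, min_samples: int = 10):
        self.outcomes = deque(maxlen=size)  # oldest outcomes fall off automatically
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, ok: bool) -> None:
        """Record whether a request to the service succeeded."""
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        """Fraction of failed requests in the window (0.0 when empty)."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def early_warning(self) -> bool:
        """True once enough samples exist and the error rate reaches the threshold."""
        return len(self.outcomes) >= self.min_samples and self.error_rate() >= self.threshold
```

The minimum sample count guards against raising a warning from a handful of failed requests right after startup.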

Blast Radius

Blast radius measures how many of your projects are affected when a service degrades. It answers the question: "If Stripe goes down, what breaks?"

CheckUpstream calculates blast radius by traversing your dependency graph:

A service used by 1 project has a narrow blast radius.
A service used by 10 projects has a wide blast radius.

The dashboard displays blast radius as a count and visual indicator. Use it to prioritize which services deserve the most aggressive alerting and redundancy planning.
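
As a rough sketch of the calculation, blast radius can be derived by inverting a project-to-services mapping. The data below reuses the three example projects from the Projects section; the function name and the data structure are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Set

# Example projects from the Projects section above.
PROJECT_DEPENDENCIES: Dict[str, Set[str]] = {
    "payments-api": {"Stripe", "Plaid"},
    "ai-pipeline": {"OpenAI", "Anthropic", "Pinecone"},
    "main-app": {"Supabase", "Resend", "Clerk"},
}


def blast_radius(project_deps: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert project -> services into service -> affected projects."""
    affected = defaultdict(set)
    for project, services in project_deps.items():
        for service in services:
            affected[service].add(project)
    return dict(affected)
```

With this data, blast_radius(PROJECT_DEPENDENCIES)["Stripe"] contains only payments-api, so every service in the example has a blast radius of 1. If main-app also called OpenAI, OpenAI's blast radius would grow to 2.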

Tip

Services with a wide blast radius are candidates for circuit breakers, fallback behavior, or multi-provider strategies.

Risk Scores

A risk score is a composite number (0–100) that summarizes the reliability risk a service poses to your projects. Higher scores mean higher risk.

Risk scores are calculated from three signals:

Incident frequency: How often the service has had incidents in the past 30 days.
Incident duration: How long incidents typically last (mean time to recovery).
Blast radius: How many of your projects depend on the service.
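
The exact formula is not given here, so the following is only one possible way to combine the three signals into a 0–100 score. Every constant is an assumption: the saturation points (10 incidents in 30 days, 240 minutes mean time to recovery) and the 60/40 blend of the strongest signal with the average are hypothetical choices, picked so that a single dominant signal can still produce a high score.

```python
def risk_score(
    incidents_30d: int,
    mean_recovery_minutes: float,
    dependent_projects: int,
    total_projects: int,
) -> int:
    """Illustrative composite risk score in the range 0-100 (not the real formula)."""
    # Scale each signal to 0..1; the saturation points are assumptions.
    frequency = min(max(incidents_30d, 0) / 10, 1.0)
    duration = min(max(mean_recovery_minutes, 0) / 240, 1.0)
    radius = dependent_projects / total_projects if total_projects else 0.0
    radius = min(max(radius, 0.0), 1.0)

    signals = (frequency, duration, radius)
    # Blend the strongest signal with the average so one severe factor
    # is not diluted by two calm ones.
    blended = 0.6 * max(signals) + 0.4 * (sum(signals) / len(signals))
    return round(100 * blended)
```

Under these assumptions, a service with no recent incidents that all three of your projects depend on scores risk_score(0, 0, 3, 3) == 73, which reflects how blast radius alone can push a score up.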

How to read risk scores

Score	Level	Meaning
0–20	Low	Stable service, narrow blast radius
21–50	Moderate	Occasional incidents or moderate blast radius
51–80	High	Frequent incidents or wide blast radius
81–100	Critical	Frequent, long incidents with a wide blast radius
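
The bands in the table translate directly into a lookup. A minimal sketch, assuming integer scores; the function name risk_level is hypothetical.

```python
def risk_level(score: int) -> str:
    """Map a 0-100 risk score to the level names used in the table above."""
    if not 0 <= score <= 100:
        raise ValueError("risk score must be between 0 and 100")
    if score <= 20:
        return "Low"
    if score <= 50:
        return "Moderate"
    if score <= 80:
        return "High"
    return "Critical"
```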

Risk scores update daily. They appear on the dashboard service cards and are available via the /api/risk-score endpoint.

Warning

Risk scores are relative to your project portfolio. A service with a low global incident rate can still have a high risk score if every one of your projects depends on it.