Risk scoring

How the 0–100 risk number is computed and how to read it.

Every release we analyze gets a risk score: a single integer from 0 to 100 meant to answer "how nervous should I be about merging this update?"

The score band

Band             Meaning
0–39 (Low)       Patch-level changes, dependency bumps, doc-only edits. Usually safe to update without a careful read.
40–69 (Medium)   Feature additions or refactors. Worth a glance at the summary before merging.
70–100 (High)    Documented breaking changes, undocumented signature changes, or security-sensitive edits. Read the summary in full before updating.

Wherever the score appears (the Pulse feed and every release card on a source's detail page) it renders as a colour-coded Risk NN badge so you can skim by heat rather than reading numbers: teal for Low, amber for Medium, coral for High.
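
As an illustration only, the band boundaries and badge colours described above can be written as a small lookup. This is a sketch, not the product's code: the names `risk_band`, `BADGE_COLOUR`, and `badge` are hypothetical, and only the thresholds and colours come from this page.

```python
# Hypothetical helper names; thresholds and colours are taken from the table above.
BADGE_COLOUR = {"Low": "teal", "Medium": "amber", "High": "coral"}


def risk_band(score: int) -> str:
    """Map a 0-100 risk score to its band name."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0..100, got {score}")
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


def badge(score: int) -> tuple[str, str]:
    """Return the badge label and its colour for a given score."""
    return f"Risk {score}", BADGE_COLOUR[risk_band(score)]


print(badge(72))  # ('Risk 72', 'coral')
```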

In the email digest, a release flagged with documented breaking changes is never presented below the Medium band, so a card can't read the contradictory "Low risk + breaking changes", even if the raw inputs scored it lower. The underlying 0–100 score shown elsewhere is unchanged; this is a presentation floor specific to the digest.
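
The digest floor can be pictured as a small adjustment applied at presentation time. The sketch below assumes a hypothetical `digest_band` function and a boolean flag for documented breaking changes. It changes only the band shown in the digest and leaves the raw score untouched.

```python
def risk_band(score: int) -> str:
    """Map a 0-100 risk score to its band name (thresholds from this page)."""
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


def digest_band(score: int, has_documented_breaking_changes: bool) -> str:
    """Band shown in the email digest (hypothetical function name).

    A release with documented breaking changes is never shown below Medium.
    The score itself is not modified.
    """
    band = risk_band(score)
    if has_documented_breaking_changes and band == "Low":
        return "Medium"
    return band


print(digest_band(12, has_documented_breaking_changes=True))   # Medium
print(digest_band(12, has_documented_breaking_changes=False))  # Low
```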

"High signal" releases

A busy project can publish dozens of releases that are mostly routine. To make the relevant ones stand out, a release card is tagged High signal when it ships documented breaking changes or security updates, or when it scores in the High band (70+). This is the same filter that drives the Pulse "High-signal releases" section, so a release reads as relevant the same way on both surfaces. The tag keys off content as well as the number: a release with documented breaking changes is flagged High signal even when its raw churn-based score lands in the Low band, so you won't skim past it.
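
Read as a predicate, the High signal rule is an OR over three conditions. The sketch below is one way to express it; the `Release` record and its field names are assumptions made for illustration, not the product's data model.

```python
from dataclasses import dataclass


@dataclass
class Release:
    # Hypothetical record; field names are illustrative only.
    risk_score: int
    has_documented_breaking_changes: bool = False
    has_security_updates: bool = False


def is_high_signal(release: Release) -> bool:
    """True if any one of the three conditions described above holds."""
    return (
        release.has_documented_breaking_changes
        or release.has_security_updates
        or release.risk_score >= 70  # High band
    )


# A low-scoring release that documents a breaking change is still High signal.
print(is_high_signal(Release(risk_score=15, has_documented_breaking_changes=True)))  # True
print(is_high_signal(Release(risk_score=15)))  # False
```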

What goes into the score

The exact weights are tuned over time, but the inputs are stable:

  • Code churn. Lines added and removed across the diff. A 20-line patch is rarely as risky as a 2,000-line refactor.
  • Files modified. Wide changes (many files touched) score higher than narrow ones.
  • Documented breaking changes. Anything the maintainer explicitly flagged in the release notes.
  • Undocumented changes. Signature changes, removed exports, behavior shifts that the diff shows but the changelog doesn't mention. See Undocumented changes.
  • Security signals. Indicators of CVE relevance, dependency vulnerabilities, auth/crypto-related code paths.

How to use the number

Treat the score as a triage signal, not a verdict. Two specific patterns to watch:

  • Low score, undocumented changes flagged. Even a low overall score is worth reading if undocumented changes are present; the score is averaging many signals, and one specific signal might still bite you.
  • High score from churn alone. A large but mechanical refactor (e.g. a monorepo restructure) can push the score high without containing any behavior change. The summary will say so; read it before you assume the worst.
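
If you script your own update checks, the guidance above might translate into a triage rule along these lines. This is a sketch under assumptions: the `triage` function and its return strings are hypothetical, and the undocumented-changes flag is assumed to be available alongside the score.

```python
def triage(score: int, has_undocumented_changes: bool) -> str:
    """Suggest how closely to read a release summary before updating."""
    if score >= 70:
        # High band. A purely mechanical refactor can land here too,
        # and the summary is where you find that out.
        return "read the summary in full"
    if has_undocumented_changes:
        # A single flagged signal can matter even when the overall score is low.
        return "read the summary in full"
    if score >= 40:
        return "glance at the summary"
    return "usually safe to update"


print(triage(18, has_undocumented_changes=True))   # read the summary in full
print(triage(18, has_undocumented_changes=False))  # usually safe to update
```

Undocumented changes override the band here because a low number does not make that particular signal any less relevant.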

If you find scores consistently miscalibrated for a specific source, tell us. Tuning happens at the model and prompt level, not per-user, but feedback shapes the next iteration.

How does this differ from the source trust score?

Risk scores ask "how risky is this particular release?": they look at the diff between two consecutive versions of the same project. The trust score asks an orthogonal question: "how comfortable should I be depending on this project at all?" A well-maintained source can publish a high-risk release (an intentional breaking change), and a low-trust source can publish a low-risk release (a one-line patch). The dangerous combination is low trust + high risk: the alerts surface treats it as the strongest signal to stop and read.
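
Because the two scores are independent, the combination can be checked as a simple conjunction. The sketch below takes trust as a boolean `low_trust` flag, since this page does not describe the trust score's scale. The function name is hypothetical.

```python
def is_strongest_alert(risk_score: int, low_trust: bool) -> bool:
    """True for the low trust + high risk combination described above."""
    high_risk = risk_score >= 70  # High band
    return low_trust and high_risk


# The four combinations: only low trust + high risk is the strongest signal.
for low_trust in (False, True):
    for risk_score in (20, 85):
        print(low_trust, risk_score, is_strongest_alert(risk_score, low_trust))
```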