Parser types
How we extract releases from a source — the difference between a stable adapter and the AI fallback, and what to expect from each.
Every source needs a parser — the code that turns a release-notes page or GitHub repo into a structured list of releases we can analyze. There are two kinds: a purpose-built stable parser, or the AI fallback for anything without dedicated support.
Stability tiers
Stable: a purpose-built adapter for a known source. Releases are extracted deterministically. If the upstream page changes in a way that breaks the adapter, fetches return zero entries with a logged warning so we notice quickly. No per-poll LLM cost.
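For illustration, a stable adapter might look like the sketch below. The `Release` shape and the GitHub example are assumptions made for this page, not our actual internals:

```ts
// Release shape used throughout these sketches -- an assumption,
// not our actual internal type.
type Release = { version: string; date: string; notes: string };

// Hypothetical stable adapter for GitHub releases. It reads a known,
// documented JSON structure, so extraction is deterministic and no
// LLM call is involved.
async function githubReleases(repo: string): Promise<Release[]> {
  const res = await fetch(`https://api.github.com/repos/${repo}/releases`);
  const body = await res.json();

  // If the upstream shape ever changes, return zero entries with a
  // logged warning instead of guessing.
  if (!Array.isArray(body)) {
    console.warn(`stable adapter broke for ${repo}: unexpected response shape`);
    return [];
  }

  return body.map((r) => ({
    version: r.tag_name ?? "",
    date: r.published_at ?? "",
    notes: r.body ?? "",
  }));
}
```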
AI fallback: we don't have a deterministic parser for the source yet, so the page text is sent to a language model that extracts a release list. Accuracy depends on how structured the page is. Output may miss or duplicate entries when the page changes — verify against the source URL before acting on a release. We cache results by content hash so unchanged pages don't re-bill.
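The caching step is worth seeing concretely. A minimal sketch, assuming a SHA-256 hash of the stripped page text as the cache key; `llmExtract` is a hypothetical stand-in for the real model request:

```ts
import { createHash } from "node:crypto";

type Release = { version: string; date: string; notes: string };

// Hypothetical stand-in for the actual model request.
declare function llmExtract(pageText: string): Promise<Release[]>;

// Cache keyed by a SHA-256 hash of the stripped page text: an
// unchanged page produces the same key, so no new LLM call is billed.
const cache = new Map<string, Release[]>();

async function extractWithCache(pageText: string): Promise<Release[]> {
  const key = createHash("sha256").update(pageText).digest("hex");
  const hit = cache.get(key);
  if (hit) return hit;

  const releases = await llmExtract(pageText);
  cache.set(key, releases);
  return releases;
}
```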
The current catalog of stable parsers is at Sources → Catalog.
What about other URLs?
Any URL that isn't handled by a stable parser is routed through the AI fallback. We fetch the page, strip it to plain text, and ask a language model to return a structured list of releases.
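Conceptually that's a three-step pipeline: fetch, strip, extract. A sketch under those assumptions; `chatCompletion`, `stripToText`, and the prompt wording are all illustrative, not our actual implementation:

```ts
type Release = { version: string; date: string; notes: string };

// Illustrative model client; the prompt wording below is an assumption.
declare function chatCompletion(prompt: string): Promise<string>;

// Naive HTML-to-text pass, for illustration only.
function stripToText(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

async function aiFallback(url: string): Promise<Release[]> {
  const html = await (await fetch(url)).text(); // raw HTML only; no JS runs
  const text = stripToText(html);
  const answer = await chatCompletion(
    `List every release on this page as a JSON array of ` +
      `{version, date, notes} objects:\n\n${text}`
  );
  return JSON.parse(answer) as Release[];
}
```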
When the fallback works well
- The page is a single, scrollable changelog.
- Each release has a clear heading (e.g. `## v1.2.3 — 2025-04-01`).
- The page renders server-side (no JavaScript hydration required to see the changelog).
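For a concrete picture, an invented changelog shaped like this tends to extract cleanly:

```markdown
# Changelog

## v1.2.3 — 2025-04-01
- Fixed crash on empty config files.

## v1.2.2 — 2025-03-18
- Added retry logic to the sync job.
```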
When it struggles
- Paginated changelogs. We only see the first page of HTML.
- JS-rendered pages. We fetch raw HTML — no headless browser. If the changelog only appears after a client-side fetch, we won't see it.
- Mixed content. Pages that intermix unrelated content with the changelog may produce noisy extractions.
If a URL keeps producing wrong or missing releases, tell us — we'll prioritize a stable adapter. See Reference → Parser stability.
Why we don't run AI on stable parsers
Stable parsers extract release data deterministically from page structure, so there's nothing for the model to do. Skipping the LLM call keeps polling cheap and predictable — the only AI cost on a stable source is the one-time diff analysis when a new release is detected.
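Putting it together, per-poll routing looks roughly like the sketch below; `adapters`, `aiFallback`, and `analyzeDiff` are illustrative names, not real API:

```ts
type Release = { version: string; date: string; notes: string };

// Illustrative names: a registry of stable adapters, the AI fallback,
// and the one-time diff analysis run when a new release appears.
declare const adapters: Map<string, (url: string) => Promise<Release[]>>;
declare function aiFallback(url: string): Promise<Release[]>;
declare function analyzeDiff(release: Release): Promise<void>;

async function poll(sourceId: string, url: string, seen: Set<string>) {
  // Stable path is deterministic: no LLM call during polling.
  const adapter = adapters.get(sourceId);
  const releases = adapter ? await adapter(url) : await aiFallback(url);

  for (const release of releases) {
    if (!seen.has(release.version)) {
      seen.add(release.version);
      await analyzeDiff(release); // the only AI spend on a stable source
    }
  }
}
```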