htmlparser2 is a fast, forgiving HTML and XML parser for JavaScript, designed to parse documents with a callback-based interface. It’s useful for streaming-style parsing where you react to events like opening tags, attributes, text, closing tags, and comments without needing to build a full DOM up front.
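The callback-based interface described above can be sketched roughly as follows. This is a minimal illustration, not the library's canonical usage: `createEventLogger` and `logEvents` are hypothetical helper names introduced here, while `Parser`, `onopentag`, `ontext`, `onclosetag`, `oncomment`, `write`, and `end` are htmlparser2's own names. A dynamic `import()` is used on the assumption that the installed version may be ESM-only.

```javascript
// Hypothetical helper: builds a handler object that records each parse event.
function createEventLogger() {
  const events = [];
  const handler = {
    // Called when an opening tag and all of its attributes have been read.
    onopentag(name, attribs) {
      events.push(["open", name, attribs]);
    },
    // Called for text content; may fire more than once for a single text run.
    ontext(text) {
      events.push(["text", text]);
    },
    // Called for closing tags, including implied ones.
    onclosetag(name) {
      events.push(["close", name]);
    },
    // Called with the body of a comment.
    oncomment(data) {
      events.push(["comment", data]);
    },
  };
  return { events, handler };
}

// Hypothetical wrapper: feeds a string to the parser and returns the event log.
// Dynamic import() works from both CommonJS and ES modules.
async function logEvents(html) {
  const { Parser } = await import("htmlparser2");
  const { events, handler } = createEventLogger();
  const parser = new Parser(handler);
  parser.write(html); // can be called repeatedly with successive chunks
  parser.end();       // signals end of input
  return events;
}
```

Calling `logEvents('<a href="/x">hi</a><!-- note -->').then(console.log)` prints the recorded events in document order. No DOM is built along the way; the handler sees each event as the input is consumed.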
Project status
- Actively maintained: the most recent GitHub push was on 2026-06-02.
- Release cadence is active: v10.1.0 shipped in January 2026, followed by two major releases on consecutive days in March 2026 (v11.0.0 on 2026-03-19, v12.0.0 on 2026-03-20).
Recent updates
v12.0.0
2 months ago
v12.0.0 significantly refactors HTML parsing to align with the WHATWG spec, with most behavioral changes restricted to HTML mode. It updates tokenization and parser state handling for raw-text tags, foreign namespaces (SVG/MathML), HTML bogus comments and declarations, and several implicit open/close rules for tags.
Breaking
v11.0.0
2 months ago
v11.0.0 is a major release that makes htmlparser2 ESM-only, raises the minimum supported Node.js version, and bumps several core dependencies to new major versions. It also adds a new Web Streams API integration via `WebWritableStream`, and includes parser correctness fixes for HTML comment endings, XML processing instructions, and parser reset state.
Breaking, Features
v10.1.0
4 months ago
v10.1.0 bumps the runtime dependency entities to v7.0.1 and updates packaging so test files are no longer included in the published module. The diff also shows a non-trivial change to Tokenizer entity-decoding control flow, which is not called out in the release notes.
v10.0.0
12/24/2024
v10.0.0 focuses on test infrastructure migration to Vitest, adds support for parsing <xmp> as a special tag, and includes a breaking change to the WritableStream import path. The code diff also shows substantial packaging and build changes (moving from lib/ to dist/, adding ESM exports, and updating dependency major versions).
Breaking, Features
v9.1.0
1/5/2024
v9.1.0 adds exports for `QuoteType` and the `Handler` interface, and it updates the tokenizer to treat `<textarea>` like other special tags (for correct text handling). It also fixes `onattribend` reporting for `endIndex`, which adjusts location-related indices during tokenization.
Features
v9.0.0
5/10/2023
v9.0.0 introduces a new `createDocumentStream` API and rewires the tokenizer to use `EntityDecoder` from the `entities` package. It also changes core parser internals by reversing internal stacks, which affects how tag closure behavior and entity/index handling work.
Breaking, Features
v8.0.2
3/22/2023
v8.0.2 is primarily a bug fix release for htmlparser2’s tokenizer behavior, plus CI/test maintenance work. The documented change resets the tokenizer baseState after closing tag name parsing, and the rest of the notes cover dependency bumps, workflow hardening, and refactoring tests to use specs/snapshots.
Security
v8.0.1
4/29/2022
v8.0.1 primarily updates the package metadata to expose a missing WritableStream entrypoint. The diff also shows unrelated CI workflow changes (CodeQL action version bump) and development dependency bumps reflected in package-lock.json.
Features
v8.0.0
4/23/2022
v8.0.0 is a major release that modernizes htmlparser2 for both CommonJS and ESM, and includes substantial internal parsing refactors to reduce memory overhead. It also removes the deprecated FeedHandler and tightens input type handling for Parser.write/end to strings only.
Breaking, Features
v7.2.0
11/11/2021
v7.2.0 focuses on tokenizer behavior around entities and event ordering, plus a broader tokenizer refactor aimed at better performance (reported ~5% speed-up). The release notes call out fixes for decoding entities after the '<' character, stringifying non-string chunks, and emitting text before entities once an entity is confirmed.
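Because text can be emitted separately from a following entity, a handler should not assume that one text run arrives in a single `ontext` call. The sketch below shows one way to cope, assuming that `ontext` may fire several times for one run of text. `createTextCollector` and `collectText` are hypothetical names introduced for illustration; `Parser`, `ontext`, `onopentag`, `onclosetag`, `onend`, and the `decodeEntities` option are the library's own.

```javascript
// Hypothetical helper: merges consecutive text events into one string per text run.
function createTextCollector() {
  const runs = [];
  let buffer = "";
  // Push the accumulated text (if any) as one run and reset the buffer.
  const flush = () => {
    if (buffer !== "") {
      runs.push(buffer);
      buffer = "";
    }
  };
  const handler = {
    // Text and decoded entities may arrive as separate chunks; append them all.
    ontext(chunk) {
      buffer += chunk;
    },
    // Any tag boundary ends the current text run.
    onopentag() {
      flush();
    },
    onclosetag() {
      flush();
    },
    // End of input ends the final run.
    onend() {
      flush();
    },
  };
  return { runs, handler };
}

// Hypothetical wrapper: parses a string and returns the merged text runs.
async function collectText(html) {
  const { Parser } = await import("htmlparser2");
  const { runs, handler } = createTextCollector();
  const parser = new Parser(handler, { decodeEntities: true });
  parser.write(html);
  parser.end();
  return runs;
}
```

Buffering until a tag boundary makes the result independent of how the tokenizer splits text around entities, so the handler behaves the same across versions that changed the event ordering.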