jsdom/html-encoding-sniffer

GitHub
1 watcher · Open source

Last release: 2025-12-26

Determines the character encoding of an HTML byte stream by implementing the HTML Standard’s encoding sniffing algorithm. It pre-scans the first 1024 bytes to look for `<meta charset>`-related patterns and returns a canonical encoding name, useful for decoding HTML content correctly from raw bytes.
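
The flow described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `sniffEncodingSketch`, the small `NAMES` table, and the regex-based scan are all assumptions made for illustration. The HTML Standard specifies a byte-level prescan, which this sketch replaces with a single regular expression over the first 1024 bytes.

```javascript
// Simplified illustration only; NOT the library's implementation.
const PRESCAN_LIMIT = 1024; // number of leading bytes examined

// Tiny, deliberately incomplete label -> canonical-name table (hypothetical subset).
const NAMES = new Map([
  ["utf-8", "UTF-8"],
  ["utf8", "UTF-8"],
  ["iso-8859-1", "windows-1252"],
  ["latin1", "windows-1252"],
  ["windows-1252", "windows-1252"],
  ["shift_jis", "Shift_JIS"],
]);

function sniffEncodingSketch(bytes, defaultEncoding = "windows-1252") {
  // 1. A byte order mark, if present, decides the encoding.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";

  // 2. Only the first 1024 bytes are scanned. Each byte is mapped to the
  //    code point of the same value so that string offsets equal byte offsets.
  const head = Array.from(bytes.subarray(0, PRESCAN_LIMIT), (b) =>
    String.fromCharCode(b)
  ).join("");

  // 3. Regex stand-in for the spec's prescan: find a charset declaration
  //    inside a <meta ...> tag, quoted or unquoted.
  const match = /<meta[^>]*?charset\s*=\s*["']?([^"'\s;>\/]+)/i.exec(head);
  if (match) {
    const name = NAMES.get(match[1].toLowerCase());
    if (name) return name;
  }

  // 4. Nothing usable was found: fall back to the default.
  return defaultEncoding;
}
```

Because the function only indexes into `bytes` and calls `subarray`, it accepts any `Uint8Array`, including a Node.js `Buffer`.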

Project status

  • Maintenance status: The last upstream update was 2025-12-26, and the repo shows two major version bumps that month (v5.0.0 on 2025-12-23, v6.0.0 on 2025-12-26), suggesting active maintenance as of today (2026-06-11).
  • Update cadence: The two most recent releases are tightly clustered (v5.0.0 on 2025-12-23, then v6.0.0 three days later), but earlier gaps are much longer (for example, v4.0.0 on 2023-11-12, then v5.0.0 roughly two years later), so cadence is bursty rather than continuous.

AI summary generated 2026-06-11

AI-generated from public sources. May be inaccurate.

Recent updates

  • v6.0.0

    2025-12-26

    Release 6.0.0 updates html-encoding-sniffer to return canonical encoding names (like "UTF-8") instead of lowercased encoding labels, and introduces an `xml` option to change how encoding is determined. The code also adjusts defaulting behavior and expands the supported option set, and it bumps `@exodus/bytes` to a newer version.

    Breaking, Features
  • v5.0.0

    2025-12-23

    Release v5.0.0 raises the minimum supported Node.js versions and changes the main API output from encoding names to lowercased encoding labels. The implementation also swaps out the underlying encoding mapping library used by the sniffing algorithm.

    Breaking
  • v4.0.0

    2023-11-12

    Release 4.0.0 raises the minimum supported Node.js version to 18 and, according to its release notes, fixes the handling of `<meta charset="x-user-defined">`. The actual diff shows additional ecosystem and dependency changes beyond the release notes.

    Breaking
  • v3.0.0

    2021-09-18

    Release 3.0.0 raises the minimum supported Node.js version to 12 and formally documents and tests support for passing any Uint8Array to the encoding sniffer. The core implementation changes the exported function to operate on a Uint8Array, updating internal helpers to use byteLength and Uint8Array indexing instead of Buffer-oriented assumptions.

    Breaking, Features
  • v2.0.1

    2020-02-23

    Release v2.0.1 focuses on fixing edge cases in HTML charset sniffing, specifically around malformed or duplicated meta tag attributes. The main changes are in the meta attribute prescan logic and the routine that extracts the encoding from a meta tag’s content string.

  • v2.0.0

    2020-02-23

    Release 2.0.0 raises the minimum supported Node.js version to v10 and includes targeted fixes to HTML meta charset sniffing for malformed patterns like `><meta` and short comment forms like `<!-->`. The code changes go beyond the release notes by modifying how the exported sniffing function accepts its options and by adjusting internal scanning/indexing logic in the HTML parser.

    Breaking
  • v1.0.2

    2017-10-23

    Release v1.0.2 switches the project licensing from WTFPL to the MIT license. No functional code changes are shown in the provided diff, only license and package metadata updates.

    Breaking
  • v1.0.1

    2016-10-16

    Release v1.0.1 fixes an off-by-one error in the HTML encoding sniffer when parsing `<meta http-equiv>` tags with unquoted attributes. As a result, charset detection should work correctly in cases that previously failed to identify the encoding.

  • v1.0.0

    2016-10-16

    Version 1.0.0 introduces code extracted from jsdom, specifically from the jsdom encoding helper logic. The release notes indicate changes to perform manual, per-spec parsing of Content-Type values.
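
Several of the releases above (v1.0.0, v1.0.1, v2.0.1) concern pulling a charset out of a Content-Type-style string, such as the `content` attribute of `<meta http-equiv>`, including unquoted values. The sketch below follows the general shape of the HTML Standard's steps for extracting a character encoding from a meta element, as best understood here. It is not the library's code: the name `extractCharsetLabel` is hypothetical, and the function returns the raw label text rather than mapping it to an encoding.

```javascript
// Illustrative sketch only; NOT the library's implementation.
const isWs = (ch) =>
  ch === "\t" || ch === "\n" || ch === "\f" || ch === "\r" || ch === " ";

function extractCharsetLabel(content) {
  // ASCII-only lowercasing keeps string length (and thus indices) unchanged.
  const lower = content.replace(/[A-Z]/g, (c) => c.toLowerCase());
  let pos = 0;

  while (true) {
    // Find the next "charset", ASCII case-insensitively.
    const found = lower.indexOf("charset", pos);
    if (found === -1) return null;

    // Skip whitespace after "charset" and require "=".
    let i = found + "charset".length;
    while (i < content.length && isWs(content[i])) i++;
    if (content[i] !== "=") {
      pos = i; // not a declaration; keep searching from here
      continue;
    }

    // Skip "=" and any whitespace after it.
    i++;
    while (i < content.length && isWs(content[i])) i++;
    if (i >= content.length) return null;

    const ch = content[i];
    if (ch === '"' || ch === "'") {
      // Quoted value: needs a matching closing quote.
      const end = content.indexOf(ch, i + 1);
      return end === -1 ? null : content.slice(i + 1, end);
    }

    // Unquoted value: runs until whitespace, ";" or the end of the string.
    let end = i;
    while (end < content.length && !isWs(content[end]) && content[end] !== ";") {
      end++;
    }
    return content.slice(i, end);
  }
}
```

For instance, `extractCharsetLabel("text/html; charset=utf-8")` yields the unquoted label `utf-8`, while a value with an opening quote and no closing quote yields `null`.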