
jsdom/html-encoding-sniffer

GitHub · 1 watcher · Open source

Last release: 5 months ago

Determines the character encoding of an HTML byte stream by implementing the HTML Standard’s encoding sniffing algorithm. It pre-scans the first 1024 bytes for `<meta charset>` and `<meta http-equiv="Content-Type">` declarations and returns a canonical encoding name, which is what you need to decode HTML content correctly from raw bytes.
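The overall flow can be illustrated with a deliberately simplified TypeScript sketch. It is not html-encoding-sniffer's implementation or API: the function name `sniffEncodingSketch` and its `fallback` parameter are hypothetical, and a regular expression stands in for the spec's byte-by-byte prescan, so it ignores comments, `http-equiv` rules, and many other details. It checks for a byte order mark first, as the HTML Standard's algorithm does, and then searches only the first 1024 bytes for a `charset=` inside a `<meta` tag.

```typescript
// Simplified illustration only; NOT the library's code or API.
// Returns an encoding label (not resolved to a canonical name) or the fallback.
function sniffEncodingSketch(bytes: Uint8Array, fallback = "windows-1252"): string {
  // 1. Byte order mark check: a BOM wins over anything declared in the markup.
  if (bytes.byteLength >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.byteLength >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes.byteLength >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";

  // 2. Prescan window: only the first 1024 bytes are examined.
  const limit = Math.min(bytes.byteLength, 1024);
  let text = "";
  for (let i = 0; i < limit; i++) {
    text += String.fromCharCode(bytes[i]); // one char per byte, like a Latin-1 view
  }

  // 3. Crude stand-in for the prescan: find charset=... inside a <meta ...> tag.
  const match = /<meta[\t\n\f\r /][^>]*?charset[\t\n\f\r ]*=[\t\n\f\r ]*["']?([^\t\n\f\r "';>]+)/i.exec(text);
  return match ? match[1] : fallback;
}
```

For example, `sniffEncodingSketch(Uint8Array.from('<meta charset="utf-8">', (c) => c.charCodeAt(0)))` returns `"utf-8"`, while a declaration that starts after byte 1024 is not seen and the fallback is returned.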

Project status

  • Appears actively maintained, with recent commits and two major releases three days apart in late 2025 (v5.0.0 on 2025-12-23, v6.0.0 on 2025-12-26), suggesting ongoing development rather than maintenance mode.
  • Release cadence is irregular: only 3 days separated v5 and v6, but just over two years separated v4 (2023-11-12) and v5 (2025-12-23).

AI summary generated Today

AI-generated from public sources. May be inaccurate.

Recent updates

  • v6.0.0

    5 months ago

    Release 6.0.0 updates html-encoding-sniffer to return canonical encoding names (like "UTF-8") instead of lowercased encoding labels, and introduces an `xml` option to change how encoding is determined. The code also adjusts defaulting behavior and expands the supported option set, and it bumps `@exodus/bytes` to a newer version.

    Breaking, Features
  • v5.0.0

    5 months ago

    Release v5.0.0 raises the minimum supported Node.js versions and changes the main API's return value from encoding names to lowercased encoding labels. The implementation also swaps out the underlying library that maps encoding labels for the sniffing algorithm.

    Breaking
  • v4.0.0

    11/12/2023

    Release 4.0.0 raises the minimum supported Node.js version to 18 and claims a fix for correctly handling `<meta charset="x-user-defined">`. The actual diff shows additional ecosystem and dependency changes beyond the release notes.

    Breaking
  • v3.0.0

    9/18/2021

    Release 3.0.0 raises the minimum supported Node.js version to 12 and formally documents and tests support for passing any Uint8Array to the encoding sniffer. The core implementation changes the exported function to operate on a Uint8Array, updating internal helpers to use byteLength and Uint8Array indexing instead of Buffer-oriented assumptions.

    Breaking, Features
  • v2.0.1

    2/23/2020

    Release v2.0.1 focuses on fixing edge cases in HTML charset sniffing, specifically around malformed or duplicated meta tag attributes. The main changes are in the meta attribute prescan logic and the routine that extracts the encoding from a meta tag’s content string.

  • v2.0.0

    2/23/2020

    Release 2.0.0 raises the minimum supported Node.js version to v10 and includes targeted fixes to HTML meta charset sniffing for malformed patterns like `><meta` and short comment forms like `<!-->`. The code changes go beyond the release notes by modifying how the exported sniffing function accepts its options and by adjusting internal scanning/indexing logic in the HTML parser.

    Breaking
  • v1.0.2

    10/23/2017

    Release v1.0.2 switches the project licensing from WTFPL to the MIT license. No functional code changes are shown in the provided diff, only license and package metadata updates.

    Breaking
  • v1.0.1

    10/16/2016

    Release v1.0.1 fixes an off-by-one error in the HTML encoding sniffer when parsing `<meta http-equiv>` tags with unquoted attributes. As a result, charset detection should work correctly in cases that previously failed to identify the encoding.

  • v1.0.0

    10/16/2016

    Version 1.0.0 introduces code extracted from jsdom's encoding helper logic. According to the release notes, it includes changes that parse Content-Type values manually, following the spec.
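For the v6.0.0 change from lowercased labels to canonical names: in the WHATWG Encoding Standard, many labels map to one encoding, and "getting an encoding" from a label means trimming ASCII whitespace, ASCII-lowercasing it, and looking it up. The sketch below assumes a tiny hand-picked subset of the standard's label table; `canonicalEncodingName` is a hypothetical helper, not an export of html-encoding-sniffer or `@exodus/bytes`.

```typescript
// Tiny subset of the Encoding Standard's label table, for illustration only.
const LABEL_TO_NAME: ReadonlyMap<string, string> = new Map([
  ["unicode-1-1-utf-8", "UTF-8"],
  ["utf-8", "UTF-8"],
  ["utf8", "UTF-8"],
  ["ascii", "windows-1252"],
  ["iso-8859-1", "windows-1252"],
  ["latin1", "windows-1252"],
  ["us-ascii", "windows-1252"],
  ["windows-1252", "windows-1252"],
  ["shift_jis", "Shift_JIS"],
  ["sjis", "Shift_JIS"],
]);

// ASCII-only lowercasing: only A-Z are changed, no Unicode case mapping.
function asciiLowercase(s: string): string {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

// Hypothetical helper: label -> canonical name, or null for an unknown label.
function canonicalEncodingName(label: string): string | null {
  // Strip leading/trailing ASCII whitespace (TAB, LF, FF, CR, SPACE).
  const trimmed = label.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  return LABEL_TO_NAME.get(asciiLowercase(trimmed)) ?? null;
}
```

Here `canonicalEncodingName(" Latin1 ")` gives `"windows-1252"`, whereas merely lowercasing the label would give `"latin1"`.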
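The v4.0.0 `x-user-defined` fix relates to a rule in the HTML Standard's prescan: when a `<meta>` declares `x-user-defined`, the prescan result becomes `windows-1252`, and a declared `UTF-16BE` or `UTF-16LE` becomes `UTF-8`. The sketch assumes the declared label has already been resolved to a canonical encoding name; `adjustPrescanResult` is a hypothetical name, and this is not the library's code.

```typescript
// Post-processing of an encoding name found by the <meta> prescan (sketch).
function adjustPrescanResult(name: string): string {
  // The prescan replaces a declared UTF-16BE/LE with UTF-8.
  if (name === "UTF-16BE" || name === "UTF-16LE") return "UTF-8";
  // A declared x-user-defined is replaced with windows-1252.
  if (name === "x-user-defined") return "windows-1252";
  // Every other encoding is kept as declared.
  return name;
}
```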
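The v3.0.0 move to plain `Uint8Array` input means helpers rely only on `byteLength` and numeric indexing rather than `Buffer` methods. A hypothetical helper in that style, which checks for an ASCII case-insensitive match (such as `<meta`) at a given offset, might look like this. Because a Node.js `Buffer` is a `Uint8Array` subclass, the same function accepts either, as well as `subarray` views.

```typescript
// Returns true if `bytes` contains `pattern` (ASCII, compared case-insensitively)
// starting at `offset`. Uses only byteLength and indexing, so any Uint8Array works.
function matchesAsciiCaseInsensitive(bytes: Uint8Array, offset: number, pattern: string): boolean {
  if (offset < 0 || offset + pattern.length > bytes.byteLength) return false;
  for (let i = 0; i < pattern.length; i++) {
    let b = bytes[offset + i];
    if (b >= 0x41 && b <= 0x5a) b += 0x20; // fold A-Z to a-z
    let p = pattern.charCodeAt(i);
    if (p >= 0x41 && p <= 0x5a) p += 0x20;
    if (b !== p) return false;
  }
  return true;
}
```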
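The v2.0.1 routine that extracts an encoding from a `<meta>` tag's `content` string corresponds to the HTML Standard's algorithm for extracting a character encoding from a meta element. The TypeScript sketch below follows those steps but returns the raw label instead of resolving it to an encoding; `extractCharsetLabel` is a hypothetical name and this is not the library's source. A `charset` not followed by `=` makes the search resume further on rather than fail, and an unmatched quote yields no result.

```typescript
const ASCII_WHITESPACE = new Set(["\t", "\n", "\f", "\r", " "]);

// ASCII-only lowercasing; preserves string length so indexes stay aligned.
function asciiLowercase(s: string): string {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

// Returns the charset label found in a meta content value, or null.
function extractCharsetLabel(s: string): string | null {
  const lower = asciiLowercase(s); // used for the case-insensitive search
  let position = 0;
  for (;;) {
    const found = lower.indexOf("charset", position);
    if (found === -1) return null;
    let i = found + "charset".length;
    while (i < s.length && ASCII_WHITESPACE.has(s[i])) i++;
    if (s[i] !== "=") {
      position = i; // resume searching from the character after "charset"
      continue;
    }
    i++; // skip "="
    while (i < s.length && ASCII_WHITESPACE.has(s[i])) i++;
    if (i >= s.length) return null; // nothing after "="
    const c = s[i];
    if (c === '"' || c === "'") {
      const close = s.indexOf(c, i + 1);
      return close === -1 ? null : s.slice(i + 1, close); // unmatched quote: nothing
    }
    // Unquoted: up to the first ASCII whitespace, ";", or end of string.
    let end = i;
    while (end < s.length && !ASCII_WHITESPACE.has(s[end]) && s[end] !== ";") end++;
    return s.slice(i, end);
  }
}
```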
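The v2.0.0 `<!-->` fix follows from how the HTML Standard's prescan skips comments: after `<!--`, it advances to the first `>` that is preceded by two `-` bytes and comes after the `<`, and those two dashes may be the ones from `<!--` itself. So `<!-->` is already a complete comment, whereas a scanner that only looks for `-->` after the opening `<!--` would keep going. A minimal sketch, with `skipComment` as a hypothetical helper that is not the library's code:

```typescript
// Precondition: bytes[start..start+3] is "<!--".
// Returns the index just past the closing ">", or bytes.byteLength if none.
function skipComment(bytes: Uint8Array, start: number): number {
  // Starting at start + 4 lets the dashes of "<!--" count toward "-->",
  // so "<!-->" and "<!--->" close immediately.
  for (let j = start + 4; j < bytes.byteLength; j++) {
    if (bytes[j] === 0x3e && bytes[j - 1] === 0x2d && bytes[j - 2] === 0x2d) {
      return j + 1;
    }
  }
  return bytes.byteLength; // unterminated comment: consume the rest
}
```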
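The v1.0.1 off-by-one involved unquoted attribute values such as `http-equiv=Content-Type`. In the HTML Standard's prescan, an unquoted value has its ASCII uppercase letters lowercased and ends at ASCII whitespace or `>`; boundary handling like this is where an off-by-one can slip in. The sketch below is a hypothetical `readUnquotedValue` helper illustrating only that case, not the v1.0.1 patch; as a design choice it returns the index of the terminating byte so the caller can inspect it.

```typescript
// Reads an unquoted attribute value starting at `start`.
// Returns the ASCII-lowercased value and the index of the byte that ended it
// (whitespace or ">"), or bytes.byteLength if the input ran out first.
function readUnquotedValue(bytes: Uint8Array, start: number): { value: string; end: number } {
  let value = "";
  let i = start;
  for (; i < bytes.byteLength; i++) {
    const b = bytes[i];
    // Terminators: TAB, LF, FF, CR, SPACE, ">"
    if (b === 0x09 || b === 0x0a || b === 0x0c || b === 0x0d || b === 0x20 || b === 0x3e) break;
    // Uppercase ASCII letters are lowercased; other bytes are kept as-is.
    value += String.fromCharCode(b >= 0x41 && b <= 0x5a ? b + 0x20 : b);
  }
  return { value, end: i };
}
```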