Determines the character encoding of an HTML byte stream by implementing the HTML Standard’s encoding sniffing algorithm. It pre-scans up to the first 1024 bytes for `<meta charset>` and related declarations, such as `<meta http-equiv="Content-Type">`, and returns a canonical encoding name that can be used to decode the raw bytes correctly.
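The following simplified TypeScript sketch illustrates that flow. It is not the package's implementation or API. The names `sniffSketch`, `bomSniff`, `naiveMetaCharset`, and `getEncoding`, the abbreviated label table, and the `windows-1252` fallback are all assumptions made for illustration. The sketch first checks for a byte order mark. It then replaces the spec's byte-level prescan with a single regular expression over the first 1024 bytes. Finally, it applies two overrides from the HTML Standard to the prescan result: a UTF-16 label becomes UTF-8, and `x-user-defined` becomes `windows-1252`.

```typescript
// Simplified sketch of HTML encoding sniffing (hypothetical; not the package's code).
// Order: byte order mark, then a naive <meta> search in the first 1024 bytes, then a default.

// A small subset of the WHATWG Encoding Standard's label table: label -> canonical name.
const LABELS: Record<string, string> = {
  "utf-8": "UTF-8",
  "utf8": "UTF-8",
  "unicode-1-1-utf-8": "UTF-8",
  "iso-8859-1": "windows-1252",
  "latin1": "windows-1252",
  "us-ascii": "windows-1252",
  "windows-1252": "windows-1252",
  "shift_jis": "Shift_JIS",
  "utf-16be": "UTF-16BE",
  "utf-16le": "UTF-16LE",
  "x-user-defined": "x-user-defined",
};

// "Get an encoding": trim ASCII whitespace, ASCII-lowercase, then look the label up.
function getEncoding(label: string): string | null {
  const key = label
    .replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "")
    .replace(/[A-Z]/g, (c) => c.toLowerCase());
  return Object.prototype.hasOwnProperty.call(LABELS, key) ? LABELS[key] : null;
}

// A byte order mark takes precedence over anything declared in the markup.
function bomSniff(bytes: Uint8Array): string | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";
  return null;
}

// Naive stand-in for the spec's prescan: one regex over the first 1024 bytes.
// It does not skip comments, does not check http-equiv/content pairing, and gives up
// if the first label it finds is unknown instead of looking at later <meta> tags.
function naiveMetaCharset(bytes: Uint8Array): string | null {
  let text = "";
  const end = Math.min(bytes.length, 1024);
  for (let i = 0; i < end; i++) text += String.fromCharCode(bytes[i]); // one char per byte
  const match = /<meta[^>]*?charset[\t\n\f\r ]*=[\t\n\f\r ]*["']?([^"'\t\n\f\r ;>\/]+)/i.exec(text);
  if (match === null) return null;
  const encoding = getEncoding(match[1]);
  if (encoding === null) return null;
  // Overrides the HTML Standard applies to an encoding found by the prescan.
  if (encoding === "UTF-16BE" || encoding === "UTF-16LE") return "UTF-8";
  if (encoding === "x-user-defined") return "windows-1252";
  return encoding;
}

// Assumed fallback: the spec leaves the default implementation-defined.
function sniffSketch(bytes: Uint8Array, defaultEncoding: string = "windows-1252"): string {
  return bomSniff(bytes) || naiveMetaCharset(bytes) || defaultEncoding;
}
```

The `getEncoding` step shows the difference between a label (such as `utf8` or `latin1`) and the canonical name it maps to (`UTF-8` or `windows-1252`). The v5 and v6 release notes below describe changes to exactly this distinction.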
Project status
- Actively maintained, with recent commits and back-to-back updates in late 2025 (v5.0.0 on 2025-12-23, v6.0.0 on 2025-12-26), indicating ongoing development rather than maintenance mode.
- Update cadence is irregular: only 3 days separated v5 and v6, but just over two years passed between v4 (2023-11-12) and v5 (2025-12-23).
Recent updates
v6.0.0
12/26/2025
Release 6.0.0 changes html-encoding-sniffer to return canonical encoding names (such as "UTF-8") instead of lowercased encoding labels. It also introduces an `xml` option that changes how the encoding is determined, adjusts the defaulting behavior, expands the set of supported options, and bumps `@exodus/bytes` to a newer version.
Tags: Breaking, Features
v5.0.0
12/23/2025
Release v5.0.0 raises the minimum supported Node.js versions and changes the main API's return value from encoding names to lowercased encoding labels. It also replaces the encoding-mapping library that the sniffing algorithm relies on.
Tags: Breaking
v4.0.0
11/12/2023
Release 4.0.0 raises the minimum supported Node.js version to 18 and, according to its release notes, fixes the handling of `<meta charset="x-user-defined">`. The diff also contains ecosystem and dependency changes that the release notes do not mention.
Tags: Breaking
v3.0.0
9/18/2021
Release 3.0.0 raises the minimum supported Node.js version to 12 and formally documents and tests support for passing any Uint8Array to the sniffer. The exported function now operates on a Uint8Array, and its internal helpers use byteLength and Uint8Array indexing instead of Buffer-specific assumptions.
Tags: Breaking, Features
v2.0.1
2/23/2020
Release v2.0.1 fixes edge cases in HTML charset sniffing that involve malformed or duplicated meta tag attributes. The main changes are in the meta attribute prescan logic and in the routine that extracts the encoding from a meta tag’s content string.
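The HTML Standard calls the content-string step mentioned in the v2.0.1 note "extracting a character encoding from a meta element". The TypeScript sketch below follows those steps for illustration only. It is not taken from the package, and the name `extractCharsetLabel` is hypothetical. The sketch returns the raw label, which the spec would then pass through the Encoding Standard's "get an encoding" lookup. It returns `null` wherever the spec returns nothing.

```typescript
// Hypothetical helper following the HTML Standard's "extracting a character encoding
// from a meta element" steps. Returns the raw label, or null for "nothing".
const ASCII_WS = "\t\n\f\r ";

function extractCharsetLabel(s: string): string | null {
  // ASCII-lowercase copy (same length as s) for the case-insensitive "charset" search.
  const lower = s.replace(/[A-Z]/g, (c) => c.toLowerCase());
  let position = 0;
  for (;;) {
    // Loop: find the next "charset" at or after position.
    const found = lower.indexOf("charset", position);
    if (found === -1) return null;
    let i = found + 7;
    // Skip ASCII whitespace after the word.
    while (i < s.length && ASCII_WS.indexOf(s[i]) !== -1) i++;
    if (s[i] !== "=") {
      // Not followed by "=": resume the search at that character.
      position = i;
      continue;
    }
    i++;
    // Skip ASCII whitespace after the equals sign.
    while (i < s.length && ASCII_WS.indexOf(s[i]) !== -1) i++;
    if (i >= s.length) return null; // no next character
    const c = s[i];
    if (c === '"' || c === "'") {
      // Quoted: the label lies between this quote and its next occurrence.
      const close = s.indexOf(c, i + 1);
      return close === -1 ? null : s.slice(i + 1, close); // unmatched quote -> nothing
    }
    // Unquoted: the label runs to ASCII whitespace, ";", or the end of the string.
    let j = i;
    while (j < s.length && s[j] !== ";" && ASCII_WS.indexOf(s[j]) === -1) j++;
    return s.slice(i, j);
  }
}
```

For example, `extractCharsetLabel('text/html; charset="utf-8"')` yields `utf-8`. An unmatched quote, as in `charset='utf-8`, yields `null`.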
v2.0.0
2/23/2020
Release 2.0.0 raises the minimum supported Node.js version to v10 and includes targeted fixes to meta charset sniffing for malformed patterns such as `><meta` and short comment forms such as `<!-->`. The code changes go beyond the release notes: they modify how the exported sniffing function accepts its options and adjust internal scanning and indexing logic in the HTML parser.
Tags: Breaking
v1.0.2
10/23/2017
Release v1.0.2 switches the project license from WTFPL to MIT. The diff contains only license and package metadata updates, with no functional code changes.
Tags: Breaking
v1.0.1
10/16/2016
Release v1.0.1 fixes an off-by-one error in parsing `<meta http-equiv>` tags with unquoted attribute values. As a result, charset detection should now succeed in cases where it previously failed to identify the encoding.
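For context, the HTML Standard's prescan reads an attribute value in one of three ways:

- A quoted value runs to the matching quote, and the position moves past that quote.
- A `>` immediately after `=` yields an empty value.
- An unquoted value runs until ASCII whitespace or `>`, and the position is left on that terminator.

The TypeScript sketch below shows only this value-reading step. It is not the package's code and does not reproduce the specific v1.0.1 bug, and the names `readAttributeValue`, `isAsciiWhitespace`, and `lowerByte` are hypothetical. The end position differs between the quoted and unquoted branches, and that kind of boundary detail is where an off-by-one error can creep in.

```typescript
// Hypothetical sketch of the attribute-value part of the HTML Standard's
// "get an attribute" prescan step. Returns null if the input ends too early.
interface ValueResult {
  value: string;
  position: number; // index of the first byte not consumed by the value
}

function isAsciiWhitespace(b: number): boolean {
  return b === 0x09 || b === 0x0a || b === 0x0c || b === 0x0d || b === 0x20;
}

// A-Z are lowercased; any other byte becomes the code point with the same value.
function lowerByte(b: number): string {
  return String.fromCharCode(b >= 0x41 && b <= 0x5a ? b + 0x20 : b);
}

function readAttributeValue(bytes: Uint8Array, position: number): ValueResult | null {
  // Skip whitespace before the value.
  while (position < bytes.length && isAsciiWhitespace(bytes[position])) position++;
  if (position >= bytes.length) return null;

  const first = bytes[position];
  if (first === 0x22 || first === 0x27) {
    // Quoted value: read until the matching quote, then step past it.
    let value = "";
    for (let i = position + 1; i < bytes.length; i++) {
      if (bytes[i] === first) return { value, position: i + 1 };
      value += lowerByte(bytes[i]);
    }
    return null; // unterminated quote
  }
  if (first === 0x3e) return { value: "", position }; // ">" right after "=": empty value

  // Unquoted value: the first byte always belongs to the value; then stop at
  // whitespace or ">" without consuming it.
  let value = lowerByte(first);
  let i = position + 1;
  while (i < bytes.length) {
    const b = bytes[i];
    if (isAsciiWhitespace(b) || b === 0x3e) return { value, position: i };
    value += lowerByte(b);
    i++;
  }
  return null; // ran off the end of the input
}
```

For example, in `<meta http-equiv=Content-Type content="text/html; charset=utf-8">`, calling it at the position just after `http-equiv=` yields `content-type` and leaves the position on the following space.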
v1.0.0
10/16/2016
Version 1.0.0 introduces code extracted from jsdom’s encoding helper logic. According to the release notes, Content-Type values are now parsed manually, following the spec.