Skip to content

chore(deps): bump chardet from 5.2.0 to 7.0.1 in /services/knowledge-base/requirements#52

Open
dependabot[bot] wants to merge 1 commit into
mainfrom
dependabot/pip/services/knowledge-base/requirements/chardet-7.0.1
Open

chore(deps): bump chardet from 5.2.0 to 7.0.1 in /services/knowledge-base/requirements#52
dependabot[bot] wants to merge 1 commit into
mainfrom
dependabot/pip/services/knowledge-base/requirements/chardet-7.0.1

Conversation

@dependabot
Copy link
Copy Markdown

@dependabot dependabot Bot commented on behalf of github Mar 9, 2026

Bumps chardet from 5.2.0 to 7.0.1.

Release notes

Sourced from chardet's releases.

7.0.1

Fixes

  • Fixed false UTF-7 detection of SHA-1 git hashes (#324, fixing #323) — requirements files with VCS pins (e.g., +4bafdea3...) were misdetected as UTF-7, breaking tools like tox
  • Fixed _SINGLE_LANG_MAP missing aliases for single-language encoding lookup (e.g., big5big5hkscs)
  • Fixed PyPy TypeError in UTF-7 codec handling

Improvements

  • Retrained bigram models — 24 previously failing test cases now pass
  • Updated language equivalences for mutual intelligibility (Slovak/Czech, East Slavic + Bulgarian, Malay/Indonesian, Scandinavian languages)

New Contributors

  • @​rembish made their first contribution — both reporting the UTF-7 false detection issue and submitting the fix! (#323, #324)

7.0.0

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate!

Highlights:

  • MIT license (previous versions were LGPL)
  • 96.8% accuracy on 2,179 test files (+2.3pp vs chardet 6.0.0, +7.7pp vs charset-normalizer)
  • 41x faster than chardet 6.0.0 with mypyc (28x pure Python), 7.5x faster than charset-normalizer
  • Language detection for every result (90.5% accuracy across 49 languages)
  • 99 encodings across six eras (MODERN_WEB, LEGACY_ISO, LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME)
  • 12-stage detection pipeline — BOM, UTF-16/32 patterns, escape sequences, binary detection, markup charset, ASCII, UTF-8 validation, byte validity, CJK gating, structural probing, statistical scoring, post-processing
  • Bigram frequency models trained on CulturaX multilingual corpus data for all supported language/encoding pairs
  • Optional mypyc compilation — 1.49x additional speedup on CPython
  • Thread-safe detect() and detect_all() with no measurable overhead; scales on free-threaded Python 3.13t+
  • Negligible import memory (96 B)
  • Zero runtime dependencies

Breaking changes vs 6.0.0:

  • detect() and detect_all() now default to encoding_era=EncodingEra.ALL (6.0.0 defaulted to MODERN_WEB)
  • Internal architecture is completely different (probers replaced by pipeline stages). Only the public API is preserved.
  • LanguageFilter is accepted but ignored (deprecation warning emitted)
  • chunk_size is accepted but ignored (deprecation warning emitted)

6.0.0.post1

  • Fixed version number in chardet/version.py still being set to 6.0.0dev0. Otherwise identical to 6.0.0.

6.0.0

Features

  • Unified single-byte charset detection: Instead of only having trained language models for a handful of languages (Bulgarian, Greek, Hebrew, Hungarian, Russian, Thai, Turkish) and relying on special-case Latin1Prober and MacRomanProber heuristics for Western encodings, chardet now treats all single-byte charsets the same way: every encoding gets proper language-specific bigram models trained on CulturaX corpus data. This means chardet can now accurately detect both the encoding and the language for all supported single-byte encodings.
  • 38 new languages: Arabic, Belarusian, Breton, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi, Finnish, French, German, Icelandic, Indonesian, Irish, Italian, Kazakh, Latvian, Lithuanian, Macedonian, Malay, Maltese, Norwegian, Polish, Portuguese, Romanian, Scottish Gaelic, Serbian, Slovak, Slovene, Spanish, Swedish, Tajik, Ukrainian, Vietnamese, and Welsh. Existing models for Bulgarian, Greek, Hebrew, Hungarian, Russian, Thai, and Turkish were also retrained with the new pipeline.
  • EncodingEra filtering: New encoding_era parameter to detect allows filtering by an EncodingEra flag enum (MODERN_WEB, LEGACY_ISO, LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME, ALL) allows callers to restrict detection to encodings from a specific era. detect() and detect_all() default to MODERN_WEB. The new MODERN_WEB default should drastically improve accuracy for users who are not working with legacy data. The tiers are:
    • MODERN_WEB: UTF-8/16/32, Windows-125x, CP874, CJK multi-byte (widely used on the web)

... (truncated)

Changelog

Sourced from chardet's changelog.

7.0.1 (2026-03-04)

Fixes:

  • Fixed false UTF-7 detection of SHA-1 git hashes ([#324](https://github.com/chardet/chardet/issues/324) <https://github.com/chardet/chardet/issues/324>_)
  • Fixed _SINGLE_LANG_MAP missing aliases for single-language encoding lookup (e.g., big5big5hkscs)
  • Fixed PyPy TypeError in UTF-7 codec handling

Improvements:

  • Retrained bigram models — 24 previously failing test cases now pass
  • Updated language equivalences for mutual intelligibility (Slovak/Czech, East Slavic + Bulgarian, Malay/Indonesian, Scandinavian languages)

7.0.0 (2026-03-02)

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x.

Highlights:

  • MIT license (previous versions were LGPL)
  • 96.8% accuracy on 2,179 test files (+2.3pp vs chardet 6.0.0, +7.7pp vs charset-normalizer)
  • 41x faster than chardet 6.0.0 with mypyc (28x pure Python), 7.5x faster than charset-normalizer
  • Language detection for every result (90.5% accuracy across 49 languages)
  • 99 encodings across six eras (MODERN_WEB, LEGACY_ISO, LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME)
  • 12-stage detection pipeline — BOM, UTF-16/32 patterns, escape sequences, binary detection, markup charset, ASCII, UTF-8 validation, byte validity, CJK gating, structural probing, statistical scoring, post-processing
  • Bigram frequency models trained on CulturaX multilingual corpus data for all supported language/encoding pairs
  • Optional mypyc compilation — 1.49x additional speedup on CPython
  • Thread-safe detect() and detect_all() with no measurable overhead; scales on free-threaded Python 3.13t+
  • Negligible import memory (96 B)
  • Zero runtime dependencies

Breaking changes vs 6.0.0:

  • detect() and detect_all() now default to encoding_era=EncodingEra.ALL (6.0.0 defaulted to MODERN_WEB)

... (truncated)

Commits
  • 330e41e docs: update benchmark numbers for expanded test suite (2,510 files)
  • 83eb965 fix: remove unused cached_specs and add version mismatch diagnostic
  • b5ef193 feat: skip venv creation when full cache exists for detector
  • d98e26a fix: use project_root parameter instead of pip_args[0] in _resolve_version_wi...
  • 5a85c25 feat: add helpers for venv-less version/tag resolution and cache checking
  • f4917a3 Remove plans
  • 06ae339 Use package name in cache filenames and enrich display labels
  • 90fff1d Fix precommit hook failures
  • 611fc0b Bump coverage requirements up to 95% since we have 100%
  • cc21964 Add separate lint job back
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.0.1.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.0.1)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot @github
Copy link
Copy Markdown
Author

dependabot Bot commented on behalf of github Mar 9, 2026

Labels

The following labels could not be found: dependencies, python. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 9, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

License Issues

services/knowledge-base/requirements/processing.txt

PackageVersionLicenseIssue Type
chardet7.0.1NullUnknown License

OpenSSF Scorecard

PackageVersionScoreDetails
pip/chardet 7.0.1 UnknownUnknown

Scanned Files

  • services/knowledge-base/requirements/processing.txt

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs within 14 days.

@github-actions github-actions Bot added the stale label May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants