Suggestion
Move the tree-sitter-language-pack dependency forward to the 1.9.0rc.x rewrite. The current pins are exactly the combination that deadlocks on macOS arm64 for Java, and the upstream fix only exists in the rewrite — so today, downstream users on Apple Silicon still hang on Java even though the upstream bug is resolved.
Current state
pyproject.toml on main pins:
"tree-sitter>=0.25,<0.26",
"tree-sitter-language-pack>=1.0,<1.8.0,!=1.6.3",
That is precisely the broken matrix cell: legacy tree-sitter-language-pack 1.x + tree-sitter 0.25.x on macOS arm64 → get_language("java") / get_parser("java") hang forever (no exception, no timeout, only kill -9; even faulthandler ... exit=True can't interrupt it because the block is inside the native extension). This is the same root cause as #151 (Java code incompatibility) and #133 (MCP timeout).
Upstream is resolved — but only in the rewrite
See kreuzberg-dev/tree-sitter-language-pack#137. The maintainer confirmed the legacy 1.x line is the pre-rewrite Cython module and is unmaintained; the current line is a Rust + PyO3 rewrite shipping under the same package name as 1.9.0rc.x. The native deadlock is fixed there.
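Because both lines ship under one package name, code that has to behave differently on each would need to look at the installed version. A minimal sketch, assuming the 1.9 boundary described in this issue holds (the status of any 1.8.x release is not established here); the helper names are hypothetical:

```python
import re
from importlib import metadata

# Leading "major.minor" of a version string such as "1.9.0rc53".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)")


def is_rewrite_version(version: str) -> bool:
    """Return True if `version` is 1.9 or newer (assumed to be the rewrite)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 9)


def installed_pack_is_rewrite() -> bool:
    """Check the installed distribution; raises PackageNotFoundError if absent."""
    return is_rewrite_version(metadata.version("tree-sitter-language-pack"))
```

The check reads distribution metadata only, so it does not import the native extension and cannot itself trigger the hang.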
Re-tested on the exact affected machine (macOS 26 / Darwin 25.4.0 / Apple Silicon arm64 / Python 3.14.2) with the latest pre-release 1.9.0rc53 (1.9.0rc34 was pulled from PyPI):
import: 0.017s
get_language('java'): 0.000s -> <Language object>
get_parser('java'): 0.000s -> <Parser object>
No hang, watchdog never fires. (Upstream confirmation: kreuzberg-dev/tree-sitter-language-pack#137#issuecomment-4726882770)
Why the existing line-chunking fallback (#115) does not cover this
The #115 fallback catches the case where a parser can't be downloaded (a raised exception) and degrades to line chunking. This bug is a native deadlock — it raises nothing and can't be signaled — so the try/except never fires and the process just hangs. The fallback therefore does not protect macOS arm64 Java users from this.
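One way to turn a hang into something a fallback can act on is to probe the parser in a throwaway child process with a hard timeout. The sketch below is illustrative only: runs_within, parser_is_usable and chunk_by_lines are hypothetical names, not functions from this repository, and the 10-second default is an arbitrary choice.

```python
import subprocess
import sys

# Hypothetical probe script: the child imports the package and loads one grammar.
PROBE_TEMPLATE = (
    "from tree_sitter_language_pack import get_parser\n"
    "get_parser({language!r})\n"
)


def runs_within(code: str, timeout_s: float) -> bool:
    """Run `code` in a fresh interpreter; False on timeout or non-zero exit."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so a child
        # stuck inside native code does not linger.
        return False
    return result.returncode == 0


def parser_is_usable(language: str, timeout_s: float = 10.0) -> bool:
    """Probe one grammar out of process so a deadlock cannot block the caller."""
    return runs_within(PROBE_TEMPLATE.format(language=language), timeout_s)


def chunk_by_lines(text: str, max_lines: int = 40) -> list[str]:
    """Stand-in for a line-chunking fallback: fixed-size groups of lines."""
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]
```

A caller could run parser_is_usable("java") once per language, cache the result, and route to chunk_by_lines when it returns False. This costs one interpreter start per probed language and is only a workaround; it does not remove the underlying deadlock.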
Recommendation
Relax the <1.8.0 upper bound to allow 1.9.0rc.x (adopt the pre-release now, or pin to >=1.9.0 once the non-rc 1.9.0 is cut upstream).
⚠️ Not a drop-in version bump — it's a breaking API change. In the rewrite:
- Parser.parse() now takes str, not bytes (passing bytes raises TypeError).
- Node accessors changed.
This is exactly what the closed PR #128 (fix: tree-sitter 0.25+ API compatibility in chunking, branch fix-bytes-str-tree-sitter-parse) was addressing — that work likely needs to be revived and landed alongside the dependency bump.
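The str/bytes difference could be bridged during a transition with a small wrapper. This is a sketch, not the approach taken in PR #128: parse_source is a hypothetical helper, and it assumes the legacy parser rejects str with TypeError in the same way the rewrite is reported to reject bytes.

```python
from typing import Any


def parse_source(parser: Any, source: str) -> Any:
    """Parse `source` with either parser flavour.

    Tries str first (the rewrite's signature as reported in this issue) and
    retries with UTF-8 bytes if the parser raises TypeError (assumed legacy
    behaviour).
    """
    try:
        return parser.parse(source)
    except TypeError:
        # On the rewrite this retry raises TypeError again, so an unrelated
        # TypeError is still surfaced rather than silently swallowed.
        return parser.parse(source.encode("utf-8"))
```

The wrapper only covers the parse() argument type. The Node accessor changes are not handled here and would still need the work from PR #128 or an equivalent.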
Related