Upgrade tree-sitter-language-pack to the 1.9.0rc.x rewrite — fixes macOS arm64 Java hang (upstream resolved in #137) #200

@wolgy

Suggestion

Move the tree-sitter-language-pack dependency forward to the 1.9.0rc.x rewrite. The current pins are exactly the combination that deadlocks on macOS arm64 for Java, and the upstream fix only exists in the rewrite — so today, downstream users on Apple Silicon still hang on Java even though the upstream bug is resolved.

Current state

pyproject.toml on main pins:

"tree-sitter>=0.25,<0.26",
"tree-sitter-language-pack>=1.0,<1.8.0,!=1.6.3",

That is precisely the broken matrix cell: legacy tree-sitter-language-pack 1.x + tree-sitter 0.25.x on macOS arm64. In that combination, get_language("java") and get_parser("java") hang forever: no exception, no timeout, and only kill -9 ends the process. Even faulthandler ... exit=True can't interrupt it, because the thread is blocked inside the native extension. This is the same root cause as #151 (Java code incompatibility) and #133 (MCP timeout).

Upstream is resolved — but only in the rewrite

See kreuzberg-dev/tree-sitter-language-pack#137. The maintainer confirmed the legacy 1.x line is the pre-rewrite Cython module and is unmaintained; the current line is a Rust + PyO3 rewrite shipping under the same package name as 1.9.0rc.x. The native deadlock is fixed there.

Re-tested on the exact affected machine (macOS 26 / Darwin 25.4.0 / Apple Silicon arm64 / Python 3.14.2) with the latest pre-release 1.9.0rc53 (1.9.0rc34 was pulled from PyPI):

import: 0.017s
get_language('java'): 0.000s -> <Language object>
get_parser('java'):   0.000s -> <Parser object>

No hang, watchdog never fires. (Upstream confirmation: kreuzberg-dev/tree-sitter-language-pack#137#issuecomment-4726882770)

Why the existing line-chunking fallback (#115) does not cover this

The #115 fallback catches the case where a parser can't be downloaded (a raised exception) and degrades to line chunking. This bug is a native deadlock — it raises nothing and can't be signaled — so the try/except never fires and the process just hangs. The fallback therefore does not protect macOS arm64 Java users from this.
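Because the hang cannot be caught or signaled in-process, one way to guard against it is to probe the parser in a child interpreter that the parent can kill. The following is a minimal sketch using only the standard library. The helper name `finishes_in_time`, the `JAVA_PROBE` snippet, and the 10-second budget are assumptions made for illustration, not code from this repository.

```python
import subprocess
import sys

# Hypothetical probe snippet: load the Java parser in a fresh interpreter.
JAVA_PROBE = "from tree_sitter_language_pack import get_parser; get_parser('java')"


def finishes_in_time(snippet: str, timeout_s: float = 10.0) -> bool:
    """Run `snippet` in a child Python process.

    Returns True only if the child exits with status 0 within `timeout_s`.
    On timeout, subprocess.run kills the child and waits for it, which works
    even when the child is stuck inside a native extension.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            timeout=timeout_s,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Illustrative use: decide up front whether tree-sitter parsing is safe.
    if finishes_in_time(JAVA_PROBE):
        print("java parser loads; tree-sitter chunking can be used")
    else:
        print("java parser hung or failed; fall back to line chunking")
```

A False result covers both the deadlock and an ordinary import failure, so a caller could route either case to the same line-chunking path that #115 already uses for download errors. The cost is one extra interpreter start per probed language, so the result would likely need to be cached.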

Recommendation

Relax the <1.8.0 upper bound to allow 1.9.0rc.x (adopt the pre-release now, or pin to >=1.9.0 once the non-rc 1.9.0 is cut upstream).

⚠️ Not a drop-in version bump — it's a breaking API change. In the rewrite:

  • Parser.parse() now takes str, not bytes (passing bytes raises TypeError).
  • Node accessors changed.

This is exactly what the closed PR #128 (fix: tree-sitter 0.25+ API compatibility in chunking, branch fix-bytes-str-tree-sitter-parse) was addressing — that work likely needs to be revived and landed alongside the dependency bump.
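To illustrate the kind of shim the bytes/str change calls for, here is a small sketch that tries the str form first and falls back to bytes. It relies only on what is stated above (the rewrite's Parser.parse() takes str and raises TypeError for bytes). The helper name `parse_compat` is hypothetical, and the assumption that a legacy parser rejects str with TypeError is mine; it is not taken from PR #128.

```python
from typing import Any, Union


def parse_compat(parser: Any, source: Union[str, bytes], encoding: str = "utf-8") -> Any:
    """Parse `source` with either parser generation.

    Tries the str-based call first (the rewrite's behavior per this issue).
    If the parser rejects str with TypeError (assumed legacy behavior),
    retries with the encoded bytes.
    """
    text = source.decode(encoding) if isinstance(source, bytes) else source
    try:
        return parser.parse(text)
    except TypeError:
        return parser.parse(text.encode(encoding))
```

One caveat with this approach: a TypeError raised for an unrelated reason inside parse() would also trigger the retry, so detecting the installed package version once at import time may be the more robust choice. The sketch also does not address the changed Node accessors.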
