fix(extract): route extensionless shebang scripts to their AST extractor #1683
Open
Stashub wants to merge 1 commit into
detect.classify_file already labels extensionless files with a bash/python/node/... shebang as CODE via _shebang_interpreter, but _get_extractor dispatched purely on path.suffix, so a CLI entry point like `devctl` or `manage` was detected as code and then silently contributed zero nodes to the graph (its doc-referenced symbols stayed dangling stubs).

Resolve extensionless files through the same _shebang_interpreter and a new _SHEBANG_DISPATCH map. Only interpreters with a real extractor are mapped (python/bash-family/node/ruby/lua/php/julia); detect's wider set (perl, fish, tcsh, Rscript) stays unmapped and skipped rather than being mis-parsed by a wrong grammar.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Problem
`detect.classify_file` already labels extensionless files with a bash/python/node/… shebang as CODE via `_shebang_interpreter`, but `_get_extractor` dispatches purely on the path suffix. So a CLI entry point like `devctl` or `manage` is detected as code and then silently contributes zero nodes to the graph, and its doc-referenced symbols stay dangling stubs.

Found while graphing a devops repo whose main artifact is an extensionless bash CLI: the file most referenced by the docs produced no AST nodes at all.
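To make the failure mode concrete, here is a minimal sketch of suffix-only dispatch. The names `SUFFIX_DISPATCH` and `suffix_only_extractor` are hypothetical stand-ins rather than the project's actual code, and the extractors are represented by plain strings so the snippet runs on its own.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical suffix table; the real project maps suffixes to extractor callables.
SUFFIX_DISPATCH = {".sh": "bash", ".py": "python"}


def suffix_only_extractor(path: Path) -> str | None:
    """Pick an extractor from the file suffix alone (the pre-fix behavior, as assumed here)."""
    return SUFFIX_DISPATCH.get(path.suffix)


print(suffix_only_extractor(Path("scripts/deploy.sh")))  # prints: bash
print(suffix_only_extractor(Path("bin/devctl")))         # prints: None
```

An extensionless path has an empty `suffix`, so the lookup can never match, regardless of what the file's shebang says.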
Fix
Resolve extensionless files through the same `_shebang_interpreter` and a new `_SHEBANG_DISPATCH` map, so extraction honors the same signal as detection.

Only interpreters with a real extractor are mapped (python / bash-family / node / ruby / lua / php / julia). detect's wider set (perl, fish, tcsh, Rscript) stays unmapped and skipped rather than being mis-parsed by a wrong grammar.
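The dispatch described above could look roughly like the following sketch. It is not the PR's implementation: `shebang_interpreter`, `SHEBANG_DISPATCH`, `SUFFIX_DISPATCH`, and `get_extractor` are illustrative names, the map contents are assumptions, extractors are plain strings, and the `env` handling is deliberately simplified (it skips flags and `VAR=value` words but does not model flags that take an argument).

```python
from __future__ import annotations

import os
from pathlib import Path

# Hypothetical tables; strings stand in for real extractor callables.
SUFFIX_DISPATCH = {".py": "python", ".sh": "bash", ".js": "javascript"}
SHEBANG_DISPATCH = {
    "python": "python",
    "python3": "python",
    "bash": "bash",
    "sh": "bash",
    "node": "javascript",
    # perl, fish, etc. are intentionally absent: no extractor, so they stay skipped.
}


def shebang_interpreter(path: Path) -> str | None:
    """Return the interpreter basename named on the shebang line, or None."""
    try:
        with path.open("rb") as fh:
            first = fh.readline(256)
    except OSError:
        return None
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].decode("utf-8", errors="replace").split()
    if not parts:
        return None
    name = os.path.basename(parts[0])
    if name == "env":
        # Simplified: drop env flags (e.g. -S) and VAR=value assignments.
        rest = [p for p in parts[1:] if not p.startswith("-") and "=" not in p]
        if not rest:
            return None
        name = os.path.basename(rest[0])
    return name


def get_extractor(path: Path) -> str | None:
    """Dispatch on suffix when present; otherwise fall back to the shebang."""
    if path.suffix:
        return SUFFIX_DISPATCH.get(path.suffix.lower())
    interp = shebang_interpreter(path)
    return SHEBANG_DISPATCH.get(interp) if interp else None
```

Keeping the shebang map separate from the detection set is what lets an interpreter be recognized as code without being handed to a grammar that cannot parse it: an unmapped interpreter simply yields `None`.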
Tests
- `test_extensionless_shebang_via_dispatch`: bash/python/`env -S` shebangs resolve to their extractors
- `test_extensionless_without_usable_shebang_stays_unsupported`: no shebang / no-extractor interpreters (perl) still return `None`
- `test_extract_extensionless_bash_cli_end_to_end`: a shebang-only bash CLI contributes nodes with the same ID scheme as a `.sh` file, so doc-created stub IDs merge

`pytest tests/test_extract.py tests/test_detect.py`: 236 passed.

🤖 Generated with Claude Code