Rewrite SourceMapper to scan directories with sourceMappingURL support#292
Rewrite SourceMapper to scan directories with sourceMappingURL support#292
Conversation
…L support Replace `SourceMapper.create(searchDirs[])` with a two-part API: - `new SourceMapper()` constructs an empty mapper synchronously - `await sm.loadDirectory(dir)` populates it asynchronously This separates construction from loading, allowing callers to fire off the async scan without blocking profiler initialization. In production the scan is fire-and-forget (completes well before the first profile is taken); in tests it is awaited directly. The directory scan now uses a two-phase approach per JS file: Phase 1 (higher priority): reads each .js/.cjs/.mjs file and checks for a `sourceMappingURL` annotation (per TC39 ECMA-426). Inline `data:application/json;base64,` URLs are decoded in-memory; external file URLs are loaded from disk if the file exists. Phase 2 (fallback): processes .map files found in the directory using the original logic (file property → naming convention). Skips any JS file that Phase 1 already resolved. processSourceMap is refactored to parse just the `file` JSON property before creating the SourceMapConsumer, so we can bail out early (skipping consumer creation) if the JS file was already loaded in Phase 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-adds the static async create(searchDirs[], debug?) factory method, now implemented as a thin wrapper over new SourceMapper() + loadDirectory(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous regex-scan-all approach was incorrect. The spec requires:
- Split source on line terminators (\r\n, \n, \r, \u2028, \u2029)
- Iterate lines from the end, skipping empty/whitespace-only lines
- On the first non-empty line found, return null immediately if it does
not carry a valid annotation (early-exit semantics: the URL must be on
the very last non-empty line of the file)
- Return null if the line has no "//" comment marker
- Return null if the comment text contains quote chars (", ', `)
- Apply MatchSourceMapURL pattern ^[@#]\s*sourceMappingURL=(\S*?)\s*$
to the comment text; return the captured URL or null
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the full readFile() call in Phase 1 with readSourceMappingURL(), which opens the file, seeks to EOF - 4KB, and reads only that tail. Correctness argument: the annotation must be on the last non-empty line (ECMA-426). A single source-code line contains no embedded line terminators, so if the tail contains at least one line terminator the last non-empty line starts somewhere inside the tail (after the last terminator) and extends to EOF — it is fully captured. extractSourceMappingURL receives a complete last line and produces the correct result. The only case where the tail contains no line terminator is when the file ends with one very long unbroken line — a large inline data: map whose base64 payload exceeds 4 KB. In that case we fall back to a full readFile. Based on a survey of 30 real-world inline source maps on GitHub, the median decoded map size is 0.6 KB (~0.8 KB base64), so the fallback is uncommon. All external sourceMappingURL annotations are always captured in the tail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous check (LINE_TERM_RE.test(tail)) was wrong: a file whose last non-empty line is a large inline map followed by trailing empty lines would produce a tail like "<end-of-base64>\n\n" — line terminators present, but the last non-empty content is still the first split segment, potentially extending before the tail window. The correct condition: the last non-empty line is fully captured iff it is NOT the first element of tail.split(LINE_SPLIT_RE) — i.e. a line terminator that precedes it also falls within the tail (lastNonEmptyIdx > 0). Walk back from the end of the split array skipping whitespace-only segments, then check whether the last non-empty segment index is > 0. This also correctly handles the case where the large inline map line itself is followed by a single trailing newline: tail = "<base64-end>\n" gives lines = ["<base64-end>", ""], lastNonEmptyIdx = 0, tailSize < size → fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers extractSourceMappingURL: - standard annotation, legacy //@ prefix, trailing whitespace-only lines - leading whitespace before //, all line-terminator variants - early-exit: last non-empty line with no //, non-matching //, quote chars - empty / whitespace-only content - inline data: URL Covers readSourceMappingURL tail-read logic: - small file (fits in tail): external URL, inline URL, no annotation - large file: external URL captured in tail - large inline map, no trailing newline → full-file fallback - large inline map, single trailing newline → full-file fallback - large inline map, multiple trailing empty lines → full-file fallback (regression case for the lastNonEmptyIdx === 0 bug) - large file with no annotation → undefined - empty file → undefined Also exports extractSourceMappingURL, readSourceMappingURL, and ANNOTATION_TAIL_BYTES so the tests can reference the actual constant. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Overall package sizeSelf size: 2.03 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | node-gyp-build | 3.9.0 | 8.81 kB | 8.81 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
Adds two integration tests for SourceMapper.loadDirectory(): - When a JS file's sourceMappingURL annotation points to a file that does not exist, Phase 1 silently skips it and Phase 2 loads the mapping from a conventional .map file found in the directory scan. - When the annotation points to a missing file and no .map fallback exists either, hasMappingInfo() returns false for that JS file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…al diff Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
During phase 1 you're using relative key to so store data on infoMap, while in phase 2 we have absolute key, this means that we will end up with two same entries with different keys.
ts/src/sourcemapper/sourcemapper.ts
Outdated
| // is already loaded (e.g. via a sourceMappingURL annotation). | ||
| let rawFile: string | undefined; | ||
| try { | ||
| rawFile = (JSON.parse(contents) as {file?: string}).file; |
There was a problem hiding this comment.
IMO it's better to save it as JSON and later send it to SourceMapperConsumer as JSON object to avoid internal parsing again https://github.com/mozilla/source-map/blob/f4f1a9079a66083c0262275b26d050bd5f4f2a33/lib/source-map-consumer.js#L178
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cce867a7c4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const consumer = (await new sourceMap.SourceMapConsumer( | ||
| mapContent as {} as sourceMap.RawSourceMap, | ||
| )) as {} as sourceMap.RawSourceMap; | ||
| this.infoMap.set(jsPath, {mapFileDir: mapDir, mapConsumer: consumer}); |
There was a problem hiding this comment.
Normalize JS path before storing sourceMappingURL mapping
Phase 1 stores mappings under jsPath as received from walk, which can be relative when loadDirectory()/create() is called with a relative search dir; later lookups in mappingInfo() are done against normalized runtime frame paths (typically absolute), so inline maps (and external maps loaded only via annotation) can be loaded but never found. This is a regression from the .map path flow, which resolves generated paths to absolute paths before inserting into infoMap.
Useful? React with 👍 / 👎.
- Resolve searchDir to an absolute path at the start of loadDirectory() so that all paths stored in infoMap from Phase 1 (walk-derived) are consistent with the absolute paths produced by Phase 2 (path.resolve), preventing duplicate entries with different keys for the same file. - Parse source map JSON once in processSourceMap and reuse the object when constructing SourceMapConsumer, avoiding a second internal parse. Apply the same pattern in loadMapContent. Addresses review feedback on PR #292. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Replaces the old
SourceMapper.create(searchDirs[])API (which only found.mapfiles and linked them to JS files via thefileproperty) with a two-phase directory scan that also readssourceMappingURLannotations from JS files per TC39 ECMA-426.new SourceMapper()constructs an empty mapper synchronouslyawait sm.loadDirectory(dir)populates it asynchronously — safe to fire-and-forget in production (first profile is collected after ~65 s), awaited directly in testsSourceMapper.create(searchDirs[], debug?)is kept for backwards compatibility, delegating to the new APIPhase 1 —
sourceMappingURLannotation (higher priority)Each
.js/.cjs/.mjsfile in the directory is read to check for asourceMappingURLannotation on its last non-empty line, per the TC39 ECMA-426 §11.1.2.1 spec (without-parsing variant):\r\n,\n,\r,\u2028,\u2029)nullif it doesn't carry a valid annotation",', or`is rejectedMatchSourceMapURLpattern:^[@#]\s*sourceMappingURL=(\S*?)\s*$applied to the text after//data:application/json;base64,…URLs and external file paths (loaded only if the file exists)Phase 2 —
.mapfile fallbackFor any JS file not resolved in Phase 1, the original
.map-file scan logic runs as before (using thefileproperty or naming convention). TheprocessSourceMapfunction is refactored to parse thefileproperty from JSON before creating aSourceMapConsumer, enabling an early exit if the JS file was already loaded.Tail-read optimisation
Instead of reading entire JS files to extract the annotation, only the last 4 KB is read using a positioned
fd.read(). The last non-empty line is always fully captured in the tail as long as a line terminator preceding it also falls within the tail window (lastNonEmptyIdx > 0in the split result). If not — i.e. the tail is entirely inside a very long inline-map line — the function falls back to a full file read. Based on a survey of 30 real-world inline source maps on GitHub, the median decoded map size is 0.6 KB, so the fallback is uncommon. All externalsourceMappingURLannotations are always captured in the 4 KB tail.Test plan
ts/test/test-sourcemapper.tswith 23 tests covering:extractSourceMappingURL: standard annotation, legacy//@prefix, trailing whitespace lines, leading whitespace before//, all line-terminator variants, early-exit semantics, quote-char rejection, inlinedata:URL, edge cases (empty, whitespace-only)readSourceMappingURL: small files (tail captures everything), large file with short external URL, large inline map with no trailing newline (fallback), with one trailing newline (fallback), with multiple trailing empty lines (fallback — regression case), large file with no annotation, empty filenpm test)🤖 Generated with Claude Code
Jira: PROF-13986