Skip to content

Rewrite SourceMapper to scan directories with sourceMappingURL support#292

Merged
szegedi merged 9 commits intomainfrom
lazy-sourcemapper
Mar 12, 2026
Merged

Rewrite SourceMapper to scan directories with sourceMappingURL support#292
szegedi merged 9 commits intomainfrom
lazy-sourcemapper

Conversation

@szegedi
Copy link

@szegedi szegedi commented Mar 9, 2026

Summary

Replaces the old SourceMapper.create(searchDirs[]) API (which only found .map files and linked them to JS files via the file property) with a two-phase directory scan that also reads sourceMappingURL annotations from JS files per TC39 ECMA-426.

  • new SourceMapper() constructs an empty mapper synchronously
  • await sm.loadDirectory(dir) populates it asynchronously — safe to fire-and-forget in production (first profile is collected after ~65 s), awaited directly in tests
  • SourceMapper.create(searchDirs[], debug?) is kept for backwards compatibility, delegating to the new API

Phase 1 — sourceMappingURL annotation (higher priority)

Each .js/.cjs/.mjs file in the directory is read to check for a sourceMappingURL annotation on its last non-empty line, per the TC39 ECMA-426 §11.1.2.1 spec (without-parsing variant):

  • Lines are split on all ECMA-262 line terminators (\r\n, \n, \r, \u2028, \u2029)
  • Lines are scanned from the end; empty/whitespace-only lines are skipped
  • The first non-empty line found is the only candidate — early return null if it doesn't carry a valid annotation
  • A comment containing ", ', or ` is rejected
  • MatchSourceMapURL pattern: ^[@#]\s*sourceMappingURL=(\S*?)\s*$ applied to the text after //
  • Supports inline data:application/json;base64,… URLs and external file paths (loaded only if the file exists)

Phase 2 — .map file fallback

For any JS file not resolved in Phase 1, the original .map-file scan logic runs as before (using the file property or naming convention). The processSourceMap function is refactored to parse the file property from JSON before creating a SourceMapConsumer, enabling an early exit if the JS file was already loaded.

Tail-read optimisation

Instead of reading entire JS files to extract the annotation, only the last 4 KB is read using a positioned fd.read(). The last non-empty line is always fully captured in the tail as long as a line terminator preceding it also falls within the tail window (lastNonEmptyIdx > 0 in the split result). If not — i.e. the tail is entirely inside a very long inline-map line — the function falls back to a full file read. Based on a survey of 30 real-world inline source maps on GitHub, the median decoded map size is 0.6 KB, so the fallback is uncommon. All external sourceMappingURL annotations are always captured in the 4 KB tail.

Test plan

  • New test file ts/test/test-sourcemapper.ts with 23 tests covering:
    • extractSourceMappingURL: standard annotation, legacy //@ prefix, trailing whitespace lines, leading whitespace before //, all line-terminator variants, early-exit semantics, quote-char rejection, inline data: URL, edge cases (empty, whitespace-only)
    • readSourceMappingURL: small files (tail captures everything), large file with short external URL, large inline map with no trailing newline (fallback), with one trailing newline (fallback), with multiple trailing empty lines (fallback — regression case), large file with no annotation, empty file
  • All 75 tests pass (npm test)

🤖 Generated with Claude Code

Jira: PROF-13986

szegedi and others added 6 commits March 5, 2026 08:12
…L support

Replace `SourceMapper.create(searchDirs[])` with a two-part API:
- `new SourceMapper()` constructs an empty mapper synchronously
- `await sm.loadDirectory(dir)` populates it asynchronously

This separates construction from loading, allowing callers to fire off
the async scan without blocking profiler initialization. In production the
scan is fire-and-forget (completes well before the first profile is taken);
in tests it is awaited directly.

The directory scan now uses a two-phase approach per JS file:

  Phase 1 (higher priority): reads each .js/.cjs/.mjs file and checks for
  a `sourceMappingURL` annotation (per TC39 ECMA-426). Inline
  `data:application/json;base64,` URLs are decoded in-memory; external
  file URLs are loaded from disk if the file exists.

  Phase 2 (fallback): processes .map files found in the directory using
  the original logic (file property → naming convention). Skips any JS
  file that Phase 1 already resolved.

processSourceMap is refactored to parse just the `file` JSON property
before creating the SourceMapConsumer, so we can bail out early (skipping
consumer creation) if the JS file was already loaded in Phase 1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-adds the static async create(searchDirs[], debug?) factory method,
now implemented as a thin wrapper over new SourceMapper() + loadDirectory().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous regex-scan-all approach was incorrect. The spec requires:

- Split source on line terminators (\r\n, \n, \r, \u2028, \u2029)
- Iterate lines from the end, skipping empty/whitespace-only lines
- On the first non-empty line found, return null immediately if it does
  not carry a valid annotation (early-exit semantics: the URL must be on
  the very last non-empty line of the file)
- Return null if the line has no "//" comment marker
- Return null if the comment text contains quote chars (", ', `)
- Apply MatchSourceMapURL pattern ^[@#]\s*sourceMappingURL=(\S*?)\s*$
  to the comment text; return the captured URL or null

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the full readFile() call in Phase 1 with readSourceMappingURL(),
which opens the file, seeks to EOF - 4KB, and reads only that tail.

Correctness argument: the annotation must be on the last non-empty line
(ECMA-426). A single source-code line contains no embedded line terminators,
so if the tail contains at least one line terminator the last non-empty line
starts somewhere inside the tail (after the last terminator) and extends to
EOF — it is fully captured. extractSourceMappingURL receives a complete last
line and produces the correct result.

The only case where the tail contains no line terminator is when the file
ends with one very long unbroken line — a large inline data: map whose
base64 payload exceeds 4 KB. In that case we fall back to a full readFile.
Based on a survey of 30 real-world inline source maps on GitHub, the median
decoded map size is 0.6 KB (~0.8 KB base64), so the fallback is uncommon.
All external sourceMappingURL annotations are always captured in the tail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous check (LINE_TERM_RE.test(tail)) was wrong: a file whose last
non-empty line is a large inline map followed by trailing empty lines would
produce a tail like "<end-of-base64>\n\n" — line terminators present, but the
last non-empty content is still the first split segment, potentially extending
before the tail window.

The correct condition: the last non-empty line is fully captured iff it is
NOT the first element of tail.split(LINE_SPLIT_RE) — i.e. a line terminator
that precedes it also falls within the tail (lastNonEmptyIdx > 0). Walk back
from the end of the split array skipping whitespace-only segments, then check
whether the last non-empty segment index is > 0.

This also correctly handles the case where the large inline map line itself
is followed by a single trailing newline: tail = "<base64-end>\n" gives
lines = ["<base64-end>", ""], lastNonEmptyIdx = 0, tailSize < size → fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers extractSourceMappingURL:
- standard annotation, legacy //@ prefix, trailing whitespace-only lines
- leading whitespace before //, all line-terminator variants
- early-exit: last non-empty line with no //, non-matching //, quote chars
- empty / whitespace-only content
- inline data: URL

Covers readSourceMappingURL tail-read logic:
- small file (fits in tail): external URL, inline URL, no annotation
- large file: external URL captured in tail
- large inline map, no trailing newline → full-file fallback
- large inline map, single trailing newline → full-file fallback
- large inline map, multiple trailing empty lines → full-file fallback
  (regression case for the lastNonEmptyIdx === 0 bug)
- large file with no annotation → undefined
- empty file → undefined

Also exports extractSourceMappingURL, readSourceMappingURL, and
ANNOTATION_TAIL_BYTES so the tests can reference the actual constant.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 9, 2026

Overall package size

Self size: 2.03 MB
Deduped: 2.39 MB
No deduping: 2.39 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | node-gyp-build | 3.9.0 | 8.81 kB | 8.81 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

Adds two integration tests for SourceMapper.loadDirectory():

- When a JS file's sourceMappingURL annotation points to a file that does
  not exist, Phase 1 silently skips it and Phase 2 loads the mapping from
  a conventional .map file found in the directory scan.

- When the annotation points to a missing file and no .map fallback exists
  either, hasMappingInfo() returns false for that JS file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pr-commenter
Copy link

pr-commenter bot commented Mar 9, 2026

Benchmarks

Benchmark execution time: 2026-03-12 14:45:45

Comparing candidate commit 6e4e74a in PR branch lazy-sourcemapper with baseline commit 37c40ef in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 93 metrics, 27 unstable metrics.

@szegedi szegedi added the semver-minor Usually minor non-breaking improvements label Mar 11, 2026
…al diff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During phase 1 you're using relative key to so store data on infoMap, while in phase 2 we have absolute key, this means that we will end up with two same entries with different keys.

// is already loaded (e.g. via a sourceMappingURL annotation).
let rawFile: string | undefined;
try {
rawFile = (JSON.parse(contents) as {file?: string}).file;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's better to save it as JSON and later send it to SourceMapperConsumer as JSON object to avoid internal parsing again https://github.com/mozilla/source-map/blob/f4f1a9079a66083c0262275b26d050bd5f4f2a33/lib/source-map-consumer.js#L178

@IlyasShabi
Copy link

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cce867a7c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const consumer = (await new sourceMap.SourceMapConsumer(
mapContent as {} as sourceMap.RawSourceMap,
)) as {} as sourceMap.RawSourceMap;
this.infoMap.set(jsPath, {mapFileDir: mapDir, mapConsumer: consumer});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize JS path before storing sourceMappingURL mapping

Phase 1 stores mappings under jsPath as received from walk, which can be relative when loadDirectory()/create() is called with a relative search dir; later lookups in mappingInfo() are done against normalized runtime frame paths (typically absolute), so inline maps (and external maps loaded only via annotation) can be loaded but never found. This is a regression from the .map path flow, which resolves generated paths to absolute paths before inserting into infoMap.

Useful? React with 👍 / 👎.

- Resolve searchDir to an absolute path at the start of loadDirectory()
  so that all paths stored in infoMap from Phase 1 (walk-derived) are
  consistent with the absolute paths produced by Phase 2 (path.resolve),
  preventing duplicate entries with different keys for the same file.

- Parse source map JSON once in processSourceMap and reuse the object
  when constructing SourceMapConsumer, avoiding a second internal parse.
  Apply the same pattern in loadMapContent.

Addresses review feedback on PR #292.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link

@IlyasShabi IlyasShabi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done 👏

@szegedi szegedi enabled auto-merge (squash) March 12, 2026 14:46
@szegedi szegedi disabled auto-merge March 12, 2026 14:47
@szegedi szegedi enabled auto-merge (squash) March 12, 2026 15:02
@szegedi szegedi merged commit 6ab3e39 into main Mar 12, 2026
174 of 181 checks passed
@szegedi szegedi deleted the lazy-sourcemapper branch March 12, 2026 15:07
@szegedi szegedi mentioned this pull request Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver-minor Usually minor non-breaking improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants