Rewrite SourceMapper to scan directories with sourceMappingURL support by szegedi · Pull Request #292 · DataDog/pprof-nodejs

szegedi · 2026-03-09T13:42:17Z

Summary

Replaces the old SourceMapper.create(searchDirs[]) API (which only found .map files and linked them to JS files via the file property) with a two-phase directory scan that also reads sourceMappingURL annotations from JS files per TC39 ECMA-426.

new SourceMapper() constructs an empty mapper synchronously
await sm.loadDirectory(dir) populates it asynchronously — safe to fire-and-forget in production (first profile is collected after ~65 s), awaited directly in tests
SourceMapper.create(searchDirs[], debug?) is kept for backwards compatibility, delegating to the new API

Phase 1 — `sourceMappingURL` annotation (higher priority)

Each .js/.cjs/.mjs file in the directory is read to check for a sourceMappingURL annotation on its last non-empty line, per the TC39 ECMA-426 §11.1.2.1 spec (without-parsing variant):

Lines are split on all ECMA-262 line terminators (\r\n, \n, \r, \u2028, \u2029)
Lines are scanned from the end; empty/whitespace-only lines are skipped
The first non-empty line found is the only candidate — early return null if it doesn't carry a valid annotation
A comment containing ", ', or ` is rejected
MatchSourceMapURL pattern: ^[@#]\s*sourceMappingURL=(\S*?)\s*$ applied to the text after //
Supports inline data:application/json;base64,… URLs and external file paths (loaded only if the file exists)

Phase 2 — `.map` file fallback

For any JS file not resolved in Phase 1, the original .map-file scan logic runs as before (using the file property or naming convention). The processSourceMap function is refactored to parse the file property from JSON before creating a SourceMapConsumer, enabling an early exit if the JS file was already loaded.

Tail-read optimisation

Instead of reading entire JS files to extract the annotation, only the last 4 KB is read using a positioned fd.read(). The last non-empty line is always fully captured in the tail as long as a line terminator preceding it also falls within the tail window (lastNonEmptyIdx > 0 in the split result). If not — i.e. the tail is entirely inside a very long inline-map line — the function falls back to a full file read. Based on a survey of 30 real-world inline source maps on GitHub, the median decoded map size is 0.6 KB, so the fallback is uncommon. All external sourceMappingURL annotations are always captured in the 4 KB tail.

Test plan

New test file ts/test/test-sourcemapper.ts with 23 tests covering:
- extractSourceMappingURL: standard annotation, legacy //@ prefix, trailing whitespace lines, leading whitespace before //, all line-terminator variants, early-exit semantics, quote-char rejection, inline data: URL, edge cases (empty, whitespace-only)
- readSourceMappingURL: small files (tail captures everything), large file with short external URL, large inline map with no trailing newline (fallback), with one trailing newline (fallback), with multiple trailing empty lines (fallback — regression case), large file with no annotation, empty file
All 75 tests pass (npm test)

🤖 Generated with Claude Code

Jira: PROF-13986

…L support Replace `SourceMapper.create(searchDirs[])` with a two-part API: - `new SourceMapper()` constructs an empty mapper synchronously - `await sm.loadDirectory(dir)` populates it asynchronously This separates construction from loading, allowing callers to fire off the async scan without blocking profiler initialization. In production the scan is fire-and-forget (completes well before the first profile is taken); in tests it is awaited directly. The directory scan now uses a two-phase approach per JS file: Phase 1 (higher priority): reads each .js/.cjs/.mjs file and checks for a `sourceMappingURL` annotation (per TC39 ECMA-426). Inline `data:application/json;base64,` URLs are decoded in-memory; external file URLs are loaded from disk if the file exists. Phase 2 (fallback): processes .map files found in the directory using the original logic (file property → naming convention). Skips any JS file that Phase 1 already resolved. processSourceMap is refactored to parse just the `file` JSON property before creating the SourceMapConsumer, so we can bail out early (skipping consumer creation) if the JS file was already loaded in Phase 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Re-adds the static async create(searchDirs[], debug?) factory method, now implemented as a thin wrapper over new SourceMapper() + loadDirectory(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous regex-scan-all approach was incorrect. The spec requires: - Split source on line terminators (\r\n, \n, \r, \u2028, \u2029) - Iterate lines from the end, skipping empty/whitespace-only lines - On the first non-empty line found, return null immediately if it does not carry a valid annotation (early-exit semantics: the URL must be on the very last non-empty line of the file) - Return null if the line has no "//" comment marker - Return null if the comment text contains quote chars (", ', `) - Apply MatchSourceMapURL pattern ^[@#]\s*sourceMappingURL=(\S*?)\s*$ to the comment text; return the captured URL or null Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replaces the full readFile() call in Phase 1 with readSourceMappingURL(), which opens the file, seeks to EOF - 4KB, and reads only that tail. Correctness argument: the annotation must be on the last non-empty line (ECMA-426). A single source-code line contains no embedded line terminators, so if the tail contains at least one line terminator the last non-empty line starts somewhere inside the tail (after the last terminator) and extends to EOF — it is fully captured. extractSourceMappingURL receives a complete last line and produces the correct result. The only case where the tail contains no line terminator is when the file ends with one very long unbroken line — a large inline data: map whose base64 payload exceeds 4 KB. In that case we fall back to a full readFile. Based on a survey of 30 real-world inline source maps on GitHub, the median decoded map size is 0.6 KB (~0.8 KB base64), so the fallback is uncommon. All external sourceMappingURL annotations are always captured in the tail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous check (LINE_TERM_RE.test(tail)) was wrong: a file whose last non-empty line is a large inline map followed by trailing empty lines would produce a tail like "<end-of-base64>\n\n" — line terminators present, but the last non-empty content is still the first split segment, potentially extending before the tail window. The correct condition: the last non-empty line is fully captured iff it is NOT the first element of tail.split(LINE_SPLIT_RE) — i.e. a line terminator that precedes it also falls within the tail (lastNonEmptyIdx > 0). Walk back from the end of the split array skipping whitespace-only segments, then check whether the last non-empty segment index is > 0. This also correctly handles the case where the large inline map line itself is followed by a single trailing newline: tail = "<base64-end>\n" gives lines = ["<base64-end>", ""], lastNonEmptyIdx = 0, tailSize < size → fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Covers extractSourceMappingURL: - standard annotation, legacy //@ prefix, trailing whitespace-only lines - leading whitespace before //, all line-terminator variants - early-exit: last non-empty line with no //, non-matching //, quote chars - empty / whitespace-only content - inline data: URL Covers readSourceMappingURL tail-read logic: - small file (fits in tail): external URL, inline URL, no annotation - large file: external URL captured in tail - large inline map, no trailing newline → full-file fallback - large inline map, single trailing newline → full-file fallback - large inline map, multiple trailing empty lines → full-file fallback (regression case for the lastNonEmptyIdx === 0 bug) - large file with no annotation → undefined - empty file → undefined Also exports extractSourceMappingURL, readSourceMappingURL, and ANNOTATION_TAIL_BYTES so the tests can reference the actual constant. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-09T13:42:53Z

Overall package size

Self size: 2.03 MB
Deduped: 2.39 MB
No deduping: 2.39 MB

Dependency sizes

| name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | node-gyp-build | 3.9.0 | 8.81 kB | 8.81 kB |

_{🤖 This report was automatically generated by heaviest-objects-in-the-universe}

Adds two integration tests for SourceMapper.loadDirectory(): - When a JS file's sourceMappingURL annotation points to a file that does not exist, Phase 1 silently skips it and Phase 2 loads the mapping from a conventional .map file found in the directory scan. - When the annotation points to a missing file and no .map fallback exists either, hasMappingInfo() returns false for that JS file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pr-commenter · 2026-03-09T13:49:00Z

Benchmarks

Benchmark execution time: 2026-03-12 14:45:45

Comparing candidate commit 6e4e74a in PR branch lazy-sourcemapper with baseline commit 37c40ef in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 93 metrics, 27 unstable metrics.

…al diff Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

IlyasShabi · 2026-03-12T10:05:05Z

ts/src/sourcemapper/sourcemapper.ts

During phase 1 you're using relative key to so store data on infoMap, while in phase 2 we have absolute key, this means that we will end up with two same entries with different keys.

IlyasShabi · 2026-03-12T10:15:55Z

ts/src/sourcemapper/sourcemapper.ts

+  // is already loaded (e.g. via a sourceMappingURL annotation).
+  let rawFile: string | undefined;
+  try {
+    rawFile = (JSON.parse(contents) as {file?: string}).file;


IMO it's better to save it as JSON and later send it to SourceMapperConsumer as JSON object to avoid internal parsing again https://github.com/mozilla/source-map/blob/f4f1a9079a66083c0262275b26d050bd5f4f2a33/lib/source-map-consumer.js#L178

IlyasShabi · 2026-03-12T10:17:57Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cce867a7c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-12T10:25:31Z

ts/src/sourcemapper/sourcemapper.ts

+      const consumer = (await new sourceMap.SourceMapConsumer(
+        mapContent as {} as sourceMap.RawSourceMap,
+      )) as {} as sourceMap.RawSourceMap;
+      this.infoMap.set(jsPath, {mapFileDir: mapDir, mapConsumer: consumer});


Normalize JS path before storing sourceMappingURL mapping

Phase 1 stores mappings under jsPath as received from walk, which can be relative when loadDirectory()/create() is called with a relative search dir; later lookups in mappingInfo() are done against normalized runtime frame paths (typically absolute), so inline maps (and external maps loaded only via annotation) can be loaded but never found. This is a regression from the .map path flow, which resolves generated paths to absolute paths before inserting into infoMap.

Useful? React with 👍 / 👎.

- Resolve searchDir to an absolute path at the start of loadDirectory() so that all paths stored in infoMap from Phase 1 (walk-derived) are consistent with the absolute paths produced by Phase 2 (path.resolve), preventing duplicate entries with different keys for the same file. - Parse source map JSON once in processSourceMap and reuse the object when constructing SourceMapConsumer, avoiding a second internal parse. Apply the same pattern in loadMapContent. Addresses review feedback on PR #292. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

IlyasShabi

Well done 👏

szegedi and others added 6 commits March 5, 2026 08:12

Restore SourceMapper.create() as backwards-compatible delegate

d4af13f

Re-adds the static async create(searchDirs[], debug?) factory method, now implemented as a thin wrapper over new SourceMapper() + loadDirectory(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

szegedi requested review from IlyasShabi, nsavoire and r1viollet as code owners March 9, 2026 13:42

szegedi added the semver-minor Usually minor non-breaking improvements label Mar 11, 2026

Revert test-profile-serializer to use SourceMapper.create() for minim…

cce867a

…al diff Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

szegedi mentioned this pull request Mar 11, 2026

Report missing source maps in profile comments #298

Open

2 tasks

IlyasShabi reviewed Mar 12, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Mar 12, 2026

View reviewed changes

IlyasShabi approved these changes Mar 12, 2026

View reviewed changes

szegedi enabled auto-merge (squash) March 12, 2026 14:46

szegedi disabled auto-merge March 12, 2026 14:47

szegedi enabled auto-merge (squash) March 12, 2026 15:02

szegedi merged commit 6ab3e39 into main Mar 12, 2026
174 of 181 checks passed

szegedi deleted the lazy-sourcemapper branch March 12, 2026 15:07

szegedi mentioned this pull request Mar 13, 2026

v5.14.0 #295

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite SourceMapper to scan directories with sourceMappingURL support#292

Rewrite SourceMapper to scan directories with sourceMappingURL support#292
szegedi merged 9 commits intomainfrom
lazy-sourcemapper

szegedi commented Mar 9, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

pr-commenter bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

IlyasShabi Mar 12, 2026

Uh oh!

IlyasShabi Mar 12, 2026

Uh oh!

IlyasShabi commented Mar 12, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 12, 2026

Uh oh!

IlyasShabi left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

szegedi commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Phase 1 — sourceMappingURL annotation (higher priority)

Phase 2 — .map file fallback

Tail-read optimisation

Test plan

Uh oh!

github-actions bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overall package size

Uh oh!

pr-commenter bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks

Uh oh!

IlyasShabi Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

IlyasShabi Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

IlyasShabi commented Mar 12, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

IlyasShabi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

szegedi commented Mar 9, 2026 •

edited

Loading

Phase 1 — `sourceMappingURL` annotation (higher priority)

Phase 2 — `.map` file fallback

github-actions bot commented Mar 9, 2026 •

edited

Loading

pr-commenter bot commented Mar 9, 2026 •

edited

Loading