
Add --source-map CLI option for source-map generation #62

Draft
elv3rs wants to merge 2 commits into aradi:main from elv3rs:fypp/01-source-map

elv3rs commented Mar 2, 2026

This PR (coauthored by Claude) adds source-map support to fypp, enabling downstream tools to map byte
ranges in expanded Fortran output back to the corresponding byte ranges in the
original .fypp template files. A new --source-map FILE CLI option instructs
fypp to write a JSON file alongside the normal output, describing how every
region of the generated code relates to the source template.

The implementation introduces a SourceMapRenderer class (subclass of
Renderer), which tracks and records output byte offsets during rendering.

All changes are purely additive. When --source-map is not passed (or
FyppOptions.source_map is None), the existing Renderer is used and
behaviour is identical to the current release.

Diff: 3 files changed (src/fypp.py: +205/−5, test/test_source_map.py: +468,
README.rst: +14).

Motivation

Linters such as fortitude cannot natively understand fypp directives and therefore have to run on the preprocessed file.
Without a mapping from the expanded output back to the original source template, features such as
go-to-definition and auto-fixes cannot work reliably.

With an option to write a JSON map, positions in the expanded output can be mapped back to the
corresponding file and position in the original source.

Changes

CLI / Options

  • New --source-map FILE option added to the optparse-based argument parser.
  • New FyppOptions.source_map attribute (default: None).

Parser

  • Added _cur_char_span instance variable that records (char_start, char_end) character offsets for each parsed region, set in _parse()
    before every _process_text and directive handler call.
  • _parse_txt() now populates per-file lookup tables inline:
    • _file_contents — raw text of each parsed file (for content-based fixups).
    • _file_char_to_byte — character offset → byte offset tables.
    • _file_line_to_char — line number → character offset tables.
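The lookup tables and the span conversion they enable can be sketched as follows. This is a minimal illustration, assuming UTF-8 source files and 0-based, end-exclusive line spans; build_char_to_byte, build_line_to_char and span_to_byte_range are hypothetical helper names, not the functions added in this PR. The char-to-byte table has one more entry than the text has characters, so an end offset at the very end of the file can also be translated.

```python
# Hypothetical stand-ins for the per-file lookup tables (_file_char_to_byte,
# _file_line_to_char) and the span conversion; not fypp's actual code.

def build_char_to_byte(text, encoding="utf-8"):
    """Entry i is the byte offset of character i; one extra entry for len(text)."""
    table = [0]
    offset = 0
    for ch in text:
        offset += len(ch.encode(encoding))
        table.append(offset)
    return table


def build_line_to_char(text):
    """Entry n is the character offset at which 0-based line n starts."""
    table = [0]
    for idx, ch in enumerate(text):
        if ch == "\n":  # also covers CRLF, since "\r\n" ends in "\n"
            table.append(idx + 1)
    return table


def span_to_byte_range(line_span, char_span, line_to_char, char_to_byte):
    """Convert a parser span to (src_byte_start, src_byte_end).

    Assumes line_span = (start_line, end_line) is 0-based and end-exclusive.
    A char_span = (char_start, char_end), when present, wins because it can
    point inside a line (e.g. an inline ${...}$ eval).
    """
    if char_span is not None:
        char_start, char_end = char_span
    else:
        start_line, end_line = line_span
        char_start = line_to_char[start_line]
        if end_line < len(line_to_char):
            char_end = line_to_char[end_line]
        else:
            char_end = len(char_to_byte) - 1  # clamp to end of text
    return char_to_byte[char_start], char_to_byte[char_end]


text = "a = 1\nb = '\u00e9'\n"
c2b = build_char_to_byte(text)
l2c = build_line_to_char(text)
print(span_to_byte_range((1, 2), None, l2c, c2b))      # (6, 15)
print(span_to_byte_range((0, 1), (11, 12), l2c, c2b))  # (11, 13): just the 'é'
```

The second call shows why a character span matters for Unicode byte accuracy: the two-byte 'é' occupies one character but two bytes in the source file.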

Builder

  • _parser_ref field added to Builder; handle_eval and handle_text now
    append getattr(self._parser_ref, '_cur_char_span', None) to node tuples
    so the renderer can access sub-line byte ranges.

Renderer base class

  • Added _on_txt(node) hook (no-op in base class) called before appending
    text output, allowing subclasses to intercept verbatim text nodes.
  • _get_eval() signature extended with optional char_span parameter;
    _render() now passes *node[1:5] (was *node[1:4]) to forward the char span.
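The no-op hook and the optional trailing parameter leave existing callers of the base class unaffected while letting a subclass observe every node. The toy model below illustrates that extension pattern; the node tuples, the variables lookup and RecordingRenderer are simplified assumptions and do not reproduce fypp's real tree layout or Renderer.

```python
# Toy model: node tuples, method bodies, and the "variables" lookup are
# simplified assumptions, not fypp's real tree format or Renderer.

class Renderer:
    def __init__(self, variables):
        self.variables = variables

    def render(self, tree):
        out = []
        for node in tree:
            if node[0] == "txt":
                self._on_txt(node)  # hook; does nothing in the base class
                out.append(node[1])
            elif node[0] == "eval":
                # Forward everything after the tag, including an optional
                # trailing char_span, to _get_eval.
                out.append(self._get_eval(*node[1:]))
        return "".join(out)

    def _on_txt(self, node):
        pass

    def _get_eval(self, name, char_span=None):
        return str(self.variables[name])


class RecordingRenderer(Renderer):
    """Subclass that observes text and eval nodes without changing output."""

    def __init__(self, variables):
        super().__init__(variables)
        self.events = []

    def _on_txt(self, node):
        self.events.append(("verbatim", node[1]))

    def _get_eval(self, name, char_span=None):
        result = super()._get_eval(name, char_span)
        self.events.append(("expanded", name, char_span))
        return result


tree = [("txt", "x = "), ("eval", "n", (4, 9)), ("txt", "\n")]
print(repr(Renderer({"n": 3}).render(tree)))       # 'x = 3\n'
print(Renderer({"n": 3}).render([("eval", "n")]))  # old-style node without a span
rec = RecordingRenderer({"n": 3})
rec.render(tree)
print(rec.events)
```

Because char_span defaults to None, nodes built without the extra element still render, which is what keeps the change additive.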

New SourceMapRenderer class

  • render(): resets state, calls the parent, then runs the fixup and merge passes.
  • _on_txt(): records a verbatim mapping for each text node.
  • _span_to_byte_range(): converts a (start_line, end_line) span (with optional char_span) to (src_byte_start, src_byte_end).
  • _record(): records a mapping entry (verbatim, expanded, or generated).
  • _get_eval(): source-map-aware eval; records an expanded mapping.
  • _get_called_content(): source-map-aware macro call; records an expanded mapping.
  • _get_muted_content(): saves and restores map state so muted content does not corrupt offsets.
  • _fixup_out_byte_offsets(): post-render pass that corrects output byte offsets via content search.
  • _build_insertion_patterns(): builds regex patterns for fold/linenum insertions.
  • _find_with_insertions(): matches verbatim content while allowing generated insertions.
  • _merge_adjacent_verbatim(): coalesces consecutive verbatim entries with contiguous ranges.
  • get_source_map(): returns the final { version, source_file, mappings } dict.
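Two of the post-render passes lend themselves to short sketches: locating a verbatim chunk in output that may have been folded, and merging contiguous verbatim entries. The sketch assumes a fold inserts "&", a newline, optional indentation and "&" between two characters; the actual _build_insertion_patterns() may also cover line-marker insertions and other folding modes. find_with_folds, FOLD and merge_adjacent_verbatim are hypothetical names.

```python
import re

# Assumed shape of one fold insertion: "&", newline, indentation, "&".
FOLD = r"(?:&\n[ \t]*&)?"


def find_with_folds(verbatim, output, start=0):
    """Return (byte_start, byte_end) of verbatim inside output, or None.

    A fold insertion may appear between any two characters of verbatim.
    start is a character offset in output where the search begins.
    """
    pattern = re.compile(FOLD.join(re.escape(ch) for ch in verbatim))
    match = pattern.search(output, start)
    if match is None:
        return None
    byte_start = len(output[:match.start()].encode("utf-8"))
    byte_end = byte_start + len(match.group(0).encode("utf-8"))
    return byte_start, byte_end


def merge_adjacent_verbatim(mappings):
    """Coalesce consecutive verbatim entries that are contiguous in both
    output and source (same src_file). Input entries are not modified."""
    merged = []
    for entry in mappings:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev["kind"] == "verbatim"
                and entry["kind"] == "verbatim"
                and prev["src_file"] == entry["src_file"]
                and prev["out_byte_end"] == entry["out_byte_start"]
                and prev["src_byte_end"] == entry["src_byte_start"]):
            prev["out_byte_end"] = entry["out_byte_end"]
            prev["src_byte_end"] = entry["src_byte_end"]
        else:
            merged.append(dict(entry))  # copy so the input stays untouched
    return merged


print(find_with_folds("call foo(a, b)", "call foo(a,&\n    & b)\n"))  # (0, 21)
```

Merging only when both the output and the source ranges touch ensures a merged entry is still an exact 1:1 copy.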

The _linenumdir method is wrapped in __init__ to automatically record
generated content for line-number directives without modifying the base class.
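Wrapping a method on the instance works because attribute lookup finds an instance attribute before the class attribute, so base-class code that calls self._linenumdir transparently goes through the wrapper. A sketch of that technique, with a stand-in BaseRenderer, a hypothetical emit() caller, and an assumed cpp-style marker format (none of these are fypp's actual code):

```python
class BaseRenderer:
    """Stand-in base class; the marker format is an assumption."""

    def _linenumdir(self, linenr, fname):
        return '# {} "{}"\n'.format(linenr + 1, fname)

    def emit(self, linenr, fname, body):
        # Base-class code calls self._linenumdir, so an instance attribute
        # set in a subclass __init__ is picked up here too.
        return self._linenumdir(linenr, fname) + body


class MapRenderer(BaseRenderer):
    def __init__(self):
        self.generated = []  # text of every line marker emitted
        original = self._linenumdir  # bound method from the base class

        def recording_linenumdir(linenr, fname):
            marker = original(linenr, fname)
            self.generated.append(marker)
            return marker

        # Shadow the class attribute for this instance only; BaseRenderer
        # itself is left untouched.
        self._linenumdir = recording_linenumdir


r = MapRenderer()
print(repr(r.emit(0, "a.fypp", "x = 1\n")))  # '# 1 "a.fypp"\nx = 1\n'
print(r.generated)                            # ['# 1 "a.fypp"\n']
```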

Fypp orchestrator changes

  • When source_map is set, creates a SourceMapRenderer instead of the
    default Renderer.
  • process_file(): writes the JSON source map after processing.
  • New process_text_with_map(): returns (output, source_map_dict or None).
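The renderer selection and the entry points can be modelled as below. Options, Processor and the identity render() are hypothetical stand-ins for FyppOptions, Fypp and real preprocessing; the sketch only shows the dispatch, the (output, map) / (output, None) return convention, and writing the JSON map as a side effect of process_file.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Options:
    """Stand-in for FyppOptions; only the new field is modelled."""
    source_map: Optional[str] = None


class Renderer:
    def render(self, txt):
        return txt  # placeholder for real preprocessing


class SourceMapRenderer(Renderer):
    def render(self, txt):
        out = super().render(txt)
        nbytes = len(out.encode("utf-8"))
        # Identity "preprocessing" means one verbatim entry covers everything.
        self._map = {
            "version": 1,
            "source_file": "<string>",
            "mappings": [{
                "kind": "verbatim",
                "out_byte_start": 0, "out_byte_end": nbytes,
                "src_file": "<string>",
                "src_byte_start": 0, "src_byte_end": nbytes,
            }],
        }
        return out

    def get_source_map(self):
        return self._map


class Processor:
    """Hypothetical orchestrator mirroring the behaviour described above."""

    def __init__(self, options):
        self.options = options
        if options.source_map is not None:
            self.renderer = SourceMapRenderer()
        else:
            self.renderer = Renderer()

    def process_text(self, txt):
        return self.renderer.render(txt)  # always a plain str

    def process_text_with_map(self, txt):
        out = self.renderer.render(txt)
        if isinstance(self.renderer, SourceMapRenderer):
            return out, self.renderer.get_source_map()
        return out, None

    def process_file(self, infile, outfile):
        with open(infile, encoding="utf-8") as fin:
            out, smap = self.process_text_with_map(fin.read())
        with open(outfile, "w", encoding="utf-8") as fout:
            fout.write(out)
        if smap is not None:  # side effect only when the option is set
            with open(self.options.source_map, "w", encoding="utf-8") as fmap:
                json.dump(smap, fmap, indent=2)


print(Processor(Options()).process_text_with_map("x\n"))  # ('x\n', None)
```

Keeping process_text unchanged and adding a separate tuple-returning method avoids changing the return type that existing callers rely on.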

Source Map JSON Format

{
  "version": 1,
  "source_file": "example.fypp",
  "mappings": [
    {
      "kind": "verbatim",
      "out_byte_start": 0,
      "out_byte_end": 42,
      "src_file": "example.fypp",
      "src_byte_start": 0,
      "src_byte_end": 42
    },
    {
      "kind": "expanded",
      "out_byte_start": 42,
      "out_byte_end": 70,
      "src_file": "example.fypp",
      "src_byte_start": 42,
      "src_byte_end": 85
    },
    {
      "kind": "generated",
      "out_byte_start": 70,
      "out_byte_end": 95
    }
  ]
}

Mapping kinds

  • verbatim: output bytes are an exact 1:1 copy of the source bytes. Fields: out_byte_start/end, src_file, src_byte_start/end.
  • expanded: output bytes were produced by evaluating or calling a source region (e.g. ${expr}$, macro call). Fields: out_byte_start/end, src_file, src_byte_start/end.
  • generated: output bytes have no corresponding source (e.g. #line directives inserted by fypp). Fields: out_byte_start/end.
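A consumer such as a linter can translate an output byte offset back to the source with a binary search over the mappings. This sketch assumes the mappings are sorted by out_byte_start and do not overlap; it returns the exact source byte for verbatim entries, the whole originating region for expanded entries, and None for generated bytes. lookup and EXAMPLE_MAP (a copy of the example map above) are illustrative names, not part of the PR.

```python
import bisect

# The example map from the "Source Map JSON Format" section.
EXAMPLE_MAP = {
    "version": 1,
    "source_file": "example.fypp",
    "mappings": [
        {"kind": "verbatim", "out_byte_start": 0, "out_byte_end": 42,
         "src_file": "example.fypp", "src_byte_start": 0, "src_byte_end": 42},
        {"kind": "expanded", "out_byte_start": 42, "out_byte_end": 70,
         "src_file": "example.fypp", "src_byte_start": 42, "src_byte_end": 85},
        {"kind": "generated", "out_byte_start": 70, "out_byte_end": 95},
    ],
}


def lookup(source_map, out_byte):
    """Map an output byte offset to (src_file, src_byte_start, src_byte_end).

    Returns None for generated bytes and for offsets outside every mapping.
    Assumes mappings are sorted by out_byte_start and non-overlapping.
    """
    mappings = source_map["mappings"]
    starts = [m["out_byte_start"] for m in mappings]
    i = bisect.bisect_right(starts, out_byte) - 1
    if i < 0:
        return None
    m = mappings[i]
    if not m["out_byte_start"] <= out_byte < m["out_byte_end"]:
        return None
    if m["kind"] == "verbatim":
        # 1:1 copy: the offset within the entry carries over unchanged.
        src = m["src_byte_start"] + (out_byte - m["out_byte_start"])
        return m["src_file"], src, src + 1
    if m["kind"] == "expanded":
        # Only region-level precision: point at the whole source construct.
        return m["src_file"], m["src_byte_start"], m["src_byte_end"]
    return None  # "generated": no source counterpart


print(lookup(EXAMPLE_MAP, 10))  # ('example.fypp', 10, 11)
print(lookup(EXAMPLE_MAP, 50))  # ('example.fypp', 42, 85)
```

A real consumer would build the list of start offsets once per map rather than on every call.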

Testing

  • 50 unit tests in test/test_source_map.py covering:
    • Basic mapping kinds (inline eval, line eval, generated, if/for/nested-if)
    • Include file mapping (verifies multi-file source tracking)
    • Edge cases (no directives, multiline continuation, verbatim byte accuracy,
      escape sequences, continuation markers, hash-in-verbatim with folding)
    • Unicode byte accuracy
    • Line folding with and without line numbering
    • Muted content with line numbering
    • Macro calls
    • API tests (process_text, process_text_with_map, CLI file output)
    • JSON format validation (version, field presence, no overlaps, continuous
      coverage including folded and fold+linenum cases)
    • Stress edge cases: empty lines between directives, backslash in verbatim,
      eval producing #line-like output, very short fold length (7), CRLF,
      tabs near fold points, empty eval before verbatim, nested include chains,
      multiple mute blocks, #:call/#:endcall block syntax
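The JSON format checks listed above can be expressed as a small validator. The sketch assumes that "continuous coverage" means the mappings tile the output in order from byte 0 to its last byte with no gaps; validate_source_map and REQUIRED_FIELDS are hypothetical names, not the helpers used in test/test_source_map.py.

```python
REQUIRED_FIELDS = {
    "verbatim": {"out_byte_start", "out_byte_end", "src_file",
                 "src_byte_start", "src_byte_end"},
    "expanded": {"out_byte_start", "out_byte_end", "src_file",
                 "src_byte_start", "src_byte_end"},
    "generated": {"out_byte_start", "out_byte_end"},
}


def validate_source_map(smap, output_nbytes):
    """Return a list of problems; an empty list means the map looks valid."""
    problems = []
    if smap.get("version") != 1:
        problems.append("unexpected version: {!r}".format(smap.get("version")))
    cursor = 0  # next output byte that should be covered
    for idx, entry in enumerate(smap.get("mappings", [])):
        kind = entry.get("kind")
        if kind not in REQUIRED_FIELDS:
            problems.append("mapping {}: unknown kind {!r}".format(idx, kind))
            continue
        missing = REQUIRED_FIELDS[kind] - set(entry)
        if missing:
            problems.append("mapping {}: missing {}".format(idx, sorted(missing)))
            continue
        start, end = entry["out_byte_start"], entry["out_byte_end"]
        if start > cursor:
            problems.append("gap before mapping {} ({}..{})".format(idx, cursor, start))
        elif start < cursor:
            problems.append("mapping {} overlaps previous entry".format(idx))
        if end < start:
            problems.append("mapping {}: negative length".format(idx))
        cursor = max(cursor, end)
    if cursor != output_nbytes:
        problems.append(
            "coverage ends at {}, output has {} bytes".format(cursor, output_nbytes))
    return problems
```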

API Impact

Fypp.process_text_with_map(txt)

New method that returns (output_str, source_map_dict) when
FyppOptions.source_map is set, or (output_str, None) otherwise.

Fypp.process_text(txt)

Unchanged — always returns a plain str.

Fypp.process_file(infile, outfile)

Unchanged return value. When source_map is set, the JSON map is written
to the specified path as a side-effect.

Breaking Changes

None. All changes are additive and gated behind the source_map option:

  • Default value of FyppOptions.source_map is None.
  • When source_map is None, the standard Renderer is used and the output
    is byte-for-byte identical to the current release.

Checklist

  • --source-map FILE CLI option added and documented in --help / README
  • FyppOptions.source_map attribute with None default
  • SourceMapRenderer class with verbatim/expanded/generated mapping kinds
  • Parser extended with _cur_char_span and per-file lookup tables
  • Builder propagates char spans in tree nodes
  • process_file writes JSON source map when option is set
  • process_text_with_map returns (output, map) tuple
  • Muted content does not corrupt source map byte offsets
  • Output byte offsets are correct after line folding
  • Adjacent verbatim entries are merged for compact output
  • No behavioural change when --source-map is not used
  • 50 tests cover mapping kinds, round-trips, API, and edge cases

elv3rs added 2 commits March 2, 2026 12:45
Introduce SourceMapRenderer, a subclass of Renderer that records byte-level
mappings between preprocessor output and original source files. The mapping
tracks verbatim (copied) and expanded (macro/eval) regions, supports line
folding with continuation markers, and handles #line directive insertions.

Key additions:
- SourceMapRenderer class with post-render fixup for folded/line-numbered output
- --source-map CLI option to write JSON mapping files
- Fypp.process_text_with_map() API for programmatic access
- Parser augmented to track per-file char-to-byte and line-to-char tables

50 tests covering:
- Mapping kinds (verbatim, expanded, generated, mixed)
- Field completeness and source map structure
- Include file handling
- Edge cases: empty input, whitespace, line folding, line numbering,
  folding+linenums combined, special characters, escape sequences,
  continuous coverage, hash-in-verbatim, direct calls, muted regions
- CLI integration (--source-map flag)
- Public API (process_text_with_map)
aradi marked this pull request as draft on March 3, 2026 10:03