refactor(sexp): table-drive ANSI-C escape dispatch#67

Merged
mpecan merged 1 commit into main from feat/56-ansi-c-dispatch on Apr 17, 2026

@mpecan mpecan commented Apr 17, 2026

Summary

Refactors process_ansi_c_content in src/sexp/ansi_c.rs. The old function was ~140 lines with a giant inline match on the escape character and duplicated logic across \x, \u, \U, and octal branches (each independently read digits, checked for literal-fallback, checked for NUL-truncation, and re-derived the 0x01/0x7F CTLESC prefix).

The new shape:

  • process_ansi_c_content: 20-line loop over chars/pos that dispatches each \X to handle_escape.
  • handle_escape: dispatcher returning bool (true on NUL truncation).
  • simple_escape: const fn lookup table for the 1-char escapes (\n, \t, \r, \a, \b, \f, \v, \e/\E, \\, \").
  • handle_hex: \xNN.
  • handle_unicode: \uNNNN and \UNNNNNNNN, consolidated via a width parameter.
  • handle_octal: octal escapes of up to 3 digits.
  • handle_control: \cX.
  • push_with_ctlesc: centralises the 0x01/0x7F CTLESC prefix logic previously duplicated in the hex and octal paths.
  • push_escaped_quote: inlines the 4-character output for \', removing the unnecessary recursive process_ansi_c_continue helper (the recursion was semantically equivalent to letting the outer loop continue).
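For readers without the diff open, here is a minimal sketch of what a loop plus dispatcher plus const lookup table of this shape might look like. It is not the code in src/sexp/ansi_c.rs: the signatures, the `&[char]`/`&mut usize` cursor convention, the literal fallback for unknown escapes, and the U+FFFD substitution for invalid code points are all assumptions made for illustration. Only the function names and the "bool means NUL truncation" contract come from the description above.

```rust
/// Hypothetical lookup for the one-character escapes listed above.
const fn simple_escape(esc: char) -> Option<char> {
    match esc {
        'n' => Some('\n'),
        't' => Some('\t'),
        'r' => Some('\r'),
        'a' => Some('\x07'),
        'b' => Some('\x08'),
        'f' => Some('\x0C'),
        'v' => Some('\x0B'),
        'e' | 'E' => Some('\x1B'),
        '\\' => Some('\\'),
        '"' => Some('"'),
        _ => None,
    }
}

/// `\uNNNN` (width = 4) and `\UNNNNNNNN` (width = 8) share one helper.
/// Returns true when the decoded value is NUL and decoding should stop.
fn handle_unicode(chars: &[char], pos: &mut usize, out: &mut String, width: usize) -> bool {
    let start = *pos;
    let mut value: u32 = 0;
    while *pos - start < width {
        let Some(d) = chars.get(*pos).and_then(|c| c.to_digit(16)) else { break };
        value = value * 16 + d; // at most 8 hex digits, so this fits in u32
        *pos += 1;
    }
    if *pos == start {
        // Assumption: with no digits, the escape is kept literally.
        out.push('\\');
        out.push(if width == 4 { 'u' } else { 'U' });
        return false;
    }
    if value == 0 {
        return true;
    }
    // Assumption: invalid code points become U+FFFD.
    out.push(char::from_u32(value).unwrap_or('\u{FFFD}'));
    false
}

/// Dispatcher: `*pos` points at the character after the backslash.
/// Only the simple and unicode paths are wired up in this sketch.
fn handle_escape(chars: &[char], pos: &mut usize, out: &mut String) -> bool {
    let esc = chars[*pos];
    *pos += 1;
    if let Some(c) = simple_escape(esc) {
        out.push(c);
        return false;
    }
    match esc {
        'u' => handle_unicode(chars, pos, out, 4),
        'U' => handle_unicode(chars, pos, out, 8),
        _ => {
            // Assumption: unknown escapes pass through unchanged.
            out.push('\\');
            out.push(esc);
            false
        }
    }
}

/// Outer loop: copy ordinary characters, hand each `\X` to the dispatcher.
fn process_ansi_c_content(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    let mut pos = 0;
    while pos < chars.len() {
        let c = chars[pos];
        pos += 1;
        if c == '\\' && pos < chars.len() {
            if handle_escape(&chars, &mut pos, &mut out) {
                break; // NUL truncation
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Under these assumptions, `process_ansi_c_content("a\\nb")` yields `a`, a newline, then `b`, and an input containing `\u0000` is cut off at that point. Returning a bool from every handler keeps the truncation check in one place in the outer loop instead of repeating it in each branch.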

Drops #[allow(clippy::too_many_lines)].

Test plan

  • cargo fmt
  • cargo clippy --all-targets -- -D warnings — no warnings
  • cargo test — 252 passed
  • Oracle suite — 12 oracle_* tests pass (cargo test --test integration oracle_)
  • ANSI-C targeted: tests/parable/24_ansi_c_quoting.tests (325 lines), tests/oracle/ansi_c_escapes.tests (128 lines), tests/oracle/ansi_c_processing.tests (113 lines) — all green
  • Decoding is byte-identical (constitutional: compatibility is correctness)

Stack

Part of the v0.2.0 refactoring cycle (#61). This is PR 6 of 10.

Closes #56

🤖 Generated with Claude Code

process_ansi_c_content shrinks from ~140 lines to a 20-line loop that
delegates to per-escape-kind helpers:

- simple_escape(esc) -> Option<char>: const table of 1-char escapes
  (n/t/r/a/b/f/v/e/E/backslash/doublequote)
- handle_escape: dispatcher returning bool (true on NUL truncation)
- handle_hex: \xNN (up to 2 hex digits, 0x80+ -> U+FFFD, 0x01/0x7F CTLESC)
- handle_unicode: \uNNNN (width=4) and \UNNNNNNNN (width=8) share one helper
- handle_octal: octal escapes followed by up to 2 octal digits, CTLESC
  for 0x01/0x7F
- handle_control: \cX -> chr(X & 0x1F); \c@ silently dropped
- push_with_ctlesc: centralises 0x01/0x7F prefix logic (previously duped
  across hex and octal branches)
- push_escaped_quote: inlines the 4-char output for escaped single quote;
  removes the unnecessary recursive process_ansi_c_continue (the
  recursion was semantically equivalent to letting the outer loop
  continue)
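The hex, control, and CTLESC rules in the list above can be sketched as follows. This is an illustration under stated assumptions, not the merged code: the signatures and cursor convention are hypothetical, the literal fallback for `\x` and `\c` with nothing after them is assumed, and the meaning of "CTLESC prefix" is taken to be "emit a 0x01 byte before a decoded 0x01 or 0x7F", which matches bash's convention but is not spelled out in this commit message.

```rust
/// Assumed marker: 0x01 is emitted before a decoded 0x01 or 0x7F.
const CTLESC: char = '\x01';

/// Single place for the 0x01/0x7F prefix rule.
fn push_with_ctlesc(out: &mut String, c: char) {
    if c == '\x01' || c == '\x7F' {
        out.push(CTLESC);
    }
    out.push(c);
}

/// `\xNN`: reads up to 2 hex digits starting at `*pos`.
/// Returns true when the value is NUL and decoding should stop.
fn handle_hex(chars: &[char], pos: &mut usize, out: &mut String) -> bool {
    let start = *pos;
    let mut value: u32 = 0;
    while *pos - start < 2 {
        let Some(d) = chars.get(*pos).and_then(|c| c.to_digit(16)) else { break };
        value = value * 16 + d;
        *pos += 1;
    }
    if *pos == start {
        // Assumption: `\x` with no hex digits is kept literally.
        out.push_str("\\x");
        return false;
    }
    match value {
        0 => true,
        1..=0x7F => {
            push_with_ctlesc(out, char::from(value as u8));
            false
        }
        // 0x80 and above are not valid single-byte UTF-8: U+FFFD.
        _ => {
            out.push('\u{FFFD}');
            false
        }
    }
}

/// `\cX` -> chr(X & 0x1F). A zero result (as for `\c@`) is dropped.
/// Whether control results also get the CTLESC prefix is not stated
/// in the source; this sketch does not add it.
fn handle_control(chars: &[char], pos: &mut usize, out: &mut String) -> bool {
    let Some(&x) = chars.get(*pos) else {
        // Assumption: a trailing `\c` is kept literally.
        out.push_str("\\c");
        return false;
    };
    *pos += 1;
    let code = (x as u32) & 0x1F;
    if code != 0 {
        out.push(char::from(code as u8));
    }
    false
}
```

With this layout the octal handler could call `push_with_ctlesc` the same way `handle_hex` does, which is the duplication the commit says it removed. Note that the sketch drops every `X` whose low five bits are zero; the commit message only names `\c@`.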

Drops the #[allow(clippy::too_many_lines)] attribute. Decoding is
byte-identical: all parable corpus tests (including the 325-line
24_ansi_c_quoting.tests) and both ansi_c oracle test files pass.

Part of #61 (v0.2.0 cycle).

Closes #56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mpecan mpecan merged commit 91a5267 into main Apr 17, 2026
5 checks passed
@mpecan mpecan deleted the feat/56-ansi-c-dispatch branch April 17, 2026 13:19
mpecan pushed a commit that referenced this pull request Apr 19, 2026
🤖 I have created a release *beep* *boop*
---


## [0.2.0](rable-v0.1.15...rable-v0.2.0) (2026-04-18)


### ⚠ BREAKING CHANGES

* tighten lexer API surface and relocate WordSpan to ast
([#70](#70))

### Bug Fixes

* **format:** align cmdsub reformatter with bash canonical form
([#49](#49))
([c7a4411](c7a4411))
* **lexer:** accept sloppy heredoc terminator in cmdsub mode
([#50](#50))
([40f394f](40f394f))
* **lexer:** backticks opaque when content is invalid
([#71](#71))
([e72166f](e72166f)),
closes [#38](#38)
* **lexer:** disable reserved-word recognition after assignment words
([#44](#44))
([42e1fc0](42e1fc0))
* **lexer:** stop treating ]] and unbalanced [...] as special outside
conditionals ([#45](#45))
([4bf5a5c](4bf5a5c))
* **parser:** fall back from (( … )) arith to nested subshells
([#48](#48))
([1437f00](1437f00))


### Code Refactoring

* **format:** introduce Formatter struct
([#65](#65))
([d965a8f](d965a8f))
* **lexer:** drop `Result<Token>` wrapper from operator readers
([#62](#62))
([d52a841](d52a841))
* **lexer:** split read_word_token into classify + advance + dispatch
helpers ([#63](#63))
([3ba09f5](3ba09f5))
* **parser:** extract fill_heredoc_contents visitor helpers
([#68](#68))
([40e6165](40e6165))
* **parser:** extract helpers from three oversize parsers
([#69](#69))
([25d0762](25d0762))
* **sexp:** dispatch NodeKind Display to per-category helpers
([#66](#66))
([44b0330](44b0330))
* **sexp:** table-drive ANSI-C escape dispatch
([#67](#67))
([91a5267](91a5267))
* tighten lexer API surface and relocate WordSpan to ast
([#70](#70))
([5171d01](5171d01))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: repository-butler[bot] <166800726+repository-butler[bot]@users.noreply.github.com>