
feat: Add detection of marginal numbers for PDFs#1006

Open
jcs-zfc wants to merge 1 commit into datalab-to:master from jcs-zfc:handleMarginals
jcs-zfc commented Mar 4, 2026

Context
Marginal numbers (Randnummern) are essential in many document types, such as German legal texts. They are critical for citation but disrupt the natural text flow if treated as standard body text.

Solution

  • Introduced the --marginal CLI parameter.
  • This feature is strictly opt-in. Marginalia detection is heuristic. Without the flag, the behavior remains unchanged.
  • Marginal numbers are rendered as <aside>NR</aside>, where NR is the detected number, to preserve both citation accuracy and readability.
  • Implemented MarginalProcessor and a dedicated marginal block scheme.
  • Updated README.md (list of CLI parameters).
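
To illustrate the idea, here is a minimal Python sketch of a position-based heuristic. It is not the code from this PR and does not use marker's API: `TextBlock`, `is_marginal`, `render`, the 12% margin ratio, and the "at most four digits" rule are all assumptions made for illustration. It only shows the opt-in behavior and the `<aside>` rendering described above.

```python
import re
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical text block: its text plus left/right x-coordinates."""
    text: str
    x0: float
    x1: float


# Assumption: a marginal number is a short, purely numeric string.
MARGINAL_RE = re.compile(r"^\d{1,4}$")


def is_marginal(block: TextBlock, page_width: float, margin_ratio: float = 0.12) -> bool:
    """Guess whether a block is a marginal number.

    The block must be numeric and lie entirely inside the left or right
    margin band (margin_ratio of the page width, an assumed threshold).
    """
    if not MARGINAL_RE.match(block.text.strip()):
        return False
    margin = page_width * margin_ratio
    in_left = block.x1 <= margin
    in_right = block.x0 >= page_width - margin
    return in_left or in_right


def render(blocks, page_width: float, detect_marginals: bool = False) -> str:
    """Render blocks line by line; wrap marginals in <aside> only when opted in."""
    parts = []
    for block in blocks:
        text = block.text.strip()
        if detect_marginals and is_marginal(block, page_width):
            parts.append(f"<aside>{text}</aside>")
        else:
            parts.append(text)
    return "\n".join(parts)


blocks = [
    TextBlock("12", 20, 40),                         # number in the left margin
    TextBlock("The appeal is unfounded.", 90, 520),  # body text
    TextBlock("2024", 100, 140),                     # number inside the body
]

print(render(blocks, page_width=595, detect_marginals=True))
```

With `detect_marginals=False` (the default in this sketch, mirroring the opt-in flag), all three blocks are emitted as plain text. A number that sits inside the body column, such as the year above, is left untouched even when detection is on.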

Example Reference
Marginal numbers are commonly found, for example, in decisions of the German Federal Court of Justice (BGH):
https://www.bundesgerichtshof.de/SiteGlobals/Forms/Suche/EntscheidungssucheBGH_Formular.html

(Note: This is the last of three independent PRs to improve marker.)

- Marginal numbers (Randnummern) are common, e.g. in German legal texts. They are needed for correct citation, but they disturb the text flow when not set aside.
- Add --marginal CLI parameter. (Without that parameter nothing changes. Detecting marginals is largely guesswork, so it is kept behind the flag.)
- Render as <aside>#</aside>.
- Implement MarginalProcessor and marginal block scheme.
- Update README.md.