Lutaml integration #105
Open
andrew2net wants to merge 102 commits into
when :content then "tableOfContents"
else note[0].to_s
end
next mem if type == "howpublished" && note[1].to_s.match?(/^\\publisher\{.+\},\\url\{.+\}$/)
Check failure
Code scanning / CodeQL
Polynomial regular expression used on uncontrolled data (severity: High)
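One possible way to address this kind of alert, shown only as a sketch: the two greedy `.+` groups in the flagged pattern can each match `}` and `{`, so an input with many repeated `},\url{` fragments can be split in many ways before the match fails. Replacing `.+` with a negated character class removes that ambiguity. The constant and method names below are hypothetical and do not appear in the PR, and the stricter pattern changes behaviour: it rejects values that contain braces inside the publisher or URL.

```ruby
# Hypothetical helper, not part of the PR.
# [^{}]+ cannot cross a brace, so there is only one way to split the
# input between the two groups. \A and \z anchor the whole string
# (unlike ^ and $, which anchor individual lines in Ruby).
HOWPUBLISHED_RE = /\A\\publisher\{[^{}]+\},\\url\{[^{}]+\}\z/

def publisher_url_note?(text)
  text.to_s.match?(HOWPUBLISHED_RE)
end

publisher_url_note?('\publisher{ACME},\url{https://example.com}') # matches
publisher_url_note?('plain text')                                 # does not match
```

Whether the looser original behaviour (braces allowed inside either group) is needed is a question for the author; if it is, a length cap on the input before matching would be another option.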
…ify date parsing in hash_to_bib method fixes relaton/relaton#143
Integrate updates to bibliographic models and converters, including new attributes for formatted references and abstracts, and adjustments to versioning in YAML and XML fixtures.
… fixtures issue relaton/relaton#146
…AML, and related specs
* Update lutaml-model gem version to 0.8.0 and update dependencies; refactor XML and YAML fixtures for consistency
* Refactor Ext class to include schema_version method; update XML and YAML fixtures to omit schema-version attribute
* Fix XML root mapping for Docidentifier class
* Refactor bibliographic item classes to share attributes and XML mappings; introduce ItemShared module for cleaner code organization
* Refactor Contributor class by removing commented-out code and simplifying entity handling; retain import of ContributionInfo attributes
* Fix XML root name extraction in Item class to use Nokogiri for proper class dispatch
* Refactor LocalizedMarkedUpString class by simplifying content attribute handling and removing unused methods
* Reintroduce BibitemShared and BibdataShared mixins for flavor gems

  Flavor gems (relaton-iso et al.) subclass Bib::Item to add flavor-specific docidentifier/relation/ext overrides and still need to emit <bibitem> (no ext) and <bibdata> (no id) serializations. The 5d8d02f refactor removed the old BibitemShared/BibdataShared mixins in favor of ItemShared lambdas consumed directly by Bib::Bibitem and Bib::Bibdata, which broke every flavor gem that used the old include-based API.

  Bring the two mixin constants back as thin included hooks that set the XML root and prune the one attribute that does not belong on that root, so flavor gems can go back to a one-line `include Bib::BibitemShared` / `include Bib::BibdataShared` with no lutaml-model mapping surgery at each call site.
* Refactor Bibdata and Bibitem classes to inherit from Item and include shared mixins for cleaner code organization
* Add render_default option to schema-version mapping in Item class XML serialization
* Add key_value mapping for schema_version and other attributes in Ext class
* Refactor organization name handling in RFCXML converters to support array content
* Add PlainDate type and update revision_date attribute in Version class
* Remove GitHub gem sources for lutaml-model and rfcxml; both are on RubyGems now
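The "thin included hook" idea from the BibitemShared/BibdataShared commit can be illustrated in plain Ruby. This is a sketch only: it does not use the lutaml-model mapping DSL, and `ItemSketch`, `xml_root`, `xml_attributes`, and the attribute list are hypothetical stand-ins rather than names from the relaton-bib code.

```ruby
# Stand-in for a serializable base class (hypothetical, not lutaml-model).
class ItemSketch
  class << self
    attr_writer :xml_root, :xml_attributes

    def xml_root
      @xml_root || "item"
    end

    def xml_attributes
      @xml_attributes || %i[id docidentifier title ext]
    end

    # Give each subclass its own copy so pruning never leaks to siblings.
    def inherited(subclass)
      super
      subclass.xml_root = xml_root
      subclass.xml_attributes = xml_attributes.dup
    end
  end
end

# Each mixin sets the root and prunes the one attribute that does not
# belong on that root, inside its `included` hook.
module BibitemSharedSketch
  def self.included(base)
    base.xml_root = "bibitem"
    base.xml_attributes = base.xml_attributes - [:ext]
  end
end

module BibdataSharedSketch
  def self.included(base)
    base.xml_root = "bibdata"
    base.xml_attributes = base.xml_attributes - [:id]
  end
end

# A flavor gem would then need only a one-line include per class.
class FlavorBibitem < ItemSketch
  include BibitemSharedSketch
end

class FlavorBibdata < ItemSketch
  include BibdataSharedSketch
end
```

The point of the pattern is that the call site stays a single `include`, while the root name and the pruning rule live in one place.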
) The version element no longer carries nested <revision-date> and <draft> children. It is now a simple text element with an optional `type` attribute.

The Version model reads both the new shape and the legacy shape (for both XML and YAML), folding legacy values into `content` ("draft (revision-date)" when both are present), and always emits the new shape on output. This keeps in-production v2 datasets readable while new datasets use the new shape natively.

HashParserV1 (v1 -> v2 migration) is updated to produce the new shape directly. Local biblio.rng is synced from metanorma-model-iso.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
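The folding rule described in this commit might look roughly like the following. It is a sketch under assumptions: `VersionSketch` and `fold_legacy_version` are hypothetical names, and the behaviour when only one legacy value is present (use whichever exists) is a guess, since the commit message only specifies the case where both are present.

```ruby
# Hypothetical stand-in for the new Version shape: text content plus
# an optional type attribute.
VersionSketch = Struct.new(:content, :type, keyword_init: true)

# Fold legacy <draft> / <revision-date> values into a single content string.
def fold_legacy_version(draft: nil, revision_date: nil, type: nil)
  content =
    if draft && revision_date
      "#{draft} (#{revision_date})" # the case stated in the commit message
    else
      draft || revision_date        # assumption: keep whichever is present
    end
  VersionSketch.new(content: content, type: type)
end

v = fold_legacy_version(draft: "2", revision_date: "2024-01-15")
v.content # the folded "draft (revision-date)" string
```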
* Escape and <p>-wrap abstract content in BibXML converter

  The BibXML <t> abstract paragraphs may contain entities like <mailto:x> that the rfcxml-model parser decodes back into bare < / > characters. When the resulting string was stored verbatim in Abstract.content (raw: true) and later round-tripped through YAML to XML, lutaml-model attempted to parse the bare brackets as XML markup and crashed (Moxml::ParseError).

  FromRfcxml#abstract now CGI.escapeHTML the joined <t> text and wraps each paragraph in <p>...</p>, matching the format already used for RFC index entries. ToRfcxml#create_abstract is updated as the inverse to keep BibXML round-trips lossless: it unwraps <p> and unescapes entities when emitting <t>.

  Fixes the crash reported as relaton-ietf#128 on "IETF I-D.draft-abarth-cake-01".
* Update Ruby version to 3.2 in integration tests workflow
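The escape-and-wrap step and its inverse can be sketched with Ruby's standard `CGI.escapeHTML` / `CGI.unescapeHTML`. The function names and the array-of-paragraphs input are assumptions made for illustration; the actual `FromRfcxml#abstract` and `ToRfcxml#create_abstract` operate on rfcxml model objects and may differ in detail.

```ruby
require "cgi"

# Hypothetical forward step: escape each paragraph's text, wrap it in <p>.
def wrap_abstract(paragraphs)
  paragraphs.map { |text| "<p>#{CGI.escapeHTML(text)}</p>" }.join
end

# Hypothetical inverse: pull the text out of each <p> and unescape it.
# Safe only because the forward step escaped every "<" inside the text.
def unwrap_abstract(content)
  content.scan(%r{<p>(.*?)</p>}m).flatten.map { |text| CGI.unescapeHTML(text) }
end

stored = wrap_abstract(["Contact <mailto:x> for details.", "Second paragraph."])
unwrap_abstract(stored) # returns the original two strings
```

Because the bare brackets are stored as `&lt;` and `&gt;`, a later XML parse of the stored content sees only well-formed `<p>` elements.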
…serves <text> (#113)

Under lutaml-model 0.8, child elements parsed from XML and re-serialized via a parent collection skip attributes whose backing ivar holds no explicit value (the user-defined `#text` accessor is not consulted in that path). Push the Isoics fallback through the public `text=` writer when `code` is assigned, and refuse the post-parse `using_default_for` mark, so the description is emitted on subsequent `to_xml`.

Adds spec coverage for the XML round-trip behaviour: `ICS.new(code:)`, `ICS.from_xml`, `Ext.from_xml` nesting an ICS with no `<text>`, and the explicit-text-wins case.

Closes #112.
Strip inline markup outside the basicdoc PureTextElement whitelist (plus <p>, <eref>, <xref>) from raw marked-up content. Disallowed elements are unwrapped: tags removed, inner text kept. <italic> is renamed to <em> to preserve emphasis when ingesting JATS-style sources.

Sanitization runs at the content= setter via a prepended module so parse-time, initializer, and programmatic assignment all share one chokepoint.

issue relaton/relaton-doi#21

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
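A minimal sketch of the "prepended module at the `content=` setter" pattern follows. Several things here are assumptions: the `ALLOWED` list is an illustrative subset and not the actual basicdoc PureTextElement set, the tag handling uses a simple regex where the real Sanitizer may use an XML parser, and `SanitizerSketch` / `MarkedUpStringSketch` are hypothetical names.

```ruby
module SanitizerSketch
  # Illustrative subset only; the real allow-list is defined in the PR.
  ALLOWED = %w[em strong sub sup tt p eref xref].freeze
  # Matches an opening, closing, or self-closing tag.
  TAG = %r{<(/?)([A-Za-z][\w-]*)([^<>]*)>}

  def self.clean(markup)
    markup.to_s.gsub(TAG) do
      closing, name, rest = $1, $2, $3
      name = "em" if name == "italic" # keep emphasis from JATS-style input
      # Disallowed tags are dropped; their inner text is left in place.
      ALLOWED.include?(name) ? "<#{closing}#{name}#{rest}>" : ""
    end
  end

  # Because the module is prepended, this runs before the class's own
  # writer, so every assignment path goes through one chokepoint.
  def content=(value)
    super(SanitizerSketch.clean(value))
  end
end

class MarkedUpStringSketch
  attr_accessor :content
  prepend SanitizerSketch

  def initialize(content: nil)
    self.content = content # the initializer also goes through the setter
  end
end

s = MarkedUpStringSketch.new(content: 'A <italic>b</italic> <span class="x">c</span>')
s.content # markup with <italic> renamed and <span> unwrapped
```

Prepending rather than overriding in place keeps the sanitizing step separate from the class's own writer, which is one way to get the single chokepoint the commit describes.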
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#115)

The Sanitizer's ALLOWED set was strict basicdoc PureTextElement plus <p>, <eref>, <xref>. <fn> was not in the list, so the content= setter on LocalizedMarkedUpString (used by Title, Note, Abstract, and other text-bearing fields) unwrapped <fn> at assignment time and kept only its inner <p> body. Downstream consumers (relaton-render and isodoc) were left with an orphan <p> they could not place, producing a visible regression in formattedref rendering of ISO-style titles with footnotes (e.g. isodoc spec/isodoc/footnotes_spec.rb:4).

Adds <fn> to ALLOWED. <fn> is strictly speaking not in basicdoc PureTextElement, but it is a legitimate child of <title> in real Metanorma bibliographic input (ISO disclaimer footnotes), is in relaton-render's own inline-tag allow-list, and the existing "plus <p>, <eref>, <xref>" carve-out already concedes that the Sanitizer needs to be a touch broader than strict PureTextElement to handle real bibliographic content.
No description provided.