Lutaml integration #105

Open
andrew2net wants to merge 102 commits into main from lutaml-integration

Conversation

@andrew2net

Contributor

No description provided.

when :content then "tableOfContents"
else note[0].to_s
end
next mem if type == "howpublished" && note[1].to_s.match?(/^\\publisher\{.+\},\\url\{.+\}$/)

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This regular expression that depends on a library input may run slow on strings starting with '\publisher{a},\url{' and with many repetitions of 'a},\url{'.
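
One way to avoid the backtracking flagged above is to replace the two `.+` groups with plain string checks that run in linear time. The following is only a sketch: the helper name `publisher_url_note?` and the two constants are hypothetical and not part of the PR, and it assumes the note value is a single line (the original `^`/`$` anchors would also match an individual line inside a multi-line string).

```ruby
# Hypothetical helper, not from the PR: a linear-time stand-in for
# /^\\publisher\{.+\},\\url\{.+\}$/ on single-line input.
PUBLISHER_PREFIX = "\\publisher{"
URL_SEPARATOR = "},\\url{"

def publisher_url_note?(text)
  return false if text.include?("\n") # assumption: single-line values only
  return false unless text.start_with?(PUBLISHER_PREFIX) && text.end_with?("}")

  # The separator must come after at least one publisher character ...
  sep = text.index(URL_SEPARATOR, PUBLISHER_PREFIX.length + 1)
  # ... and leave at least one URL character before the closing brace.
  !sep.nil? && sep + URL_SEPARATOR.length < text.length - 1
end
```

With such a helper the guard could read `next mem if type == "howpublished" && publisher_url_note?(note[1].to_s)`; whether rejecting multi-line values is acceptable depends on the data the converter actually sees.
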
andrew2net and others added 27 commits March 13, 2026 17:06
…ify date parsing in hash_to_bib method

fixes relaton/relaton#143
Integrate updates to bibliographic models and converters, including new attributes for formatted references and abstracts, and adjustments to versioning in YAML and XML fixtures.
* Update lutaml-model gem version to 0.8.0 and update dependencies; refactor XML and YAML fixtures for consistency

* Refactor Ext class to include schema_version method; update XML and YAML fixtures to omit schema-version attribute

* Fix XML root mapping for Docidentifier class

* Refactor bibliographic item classes to share attributes and XML mappings; introduce ItemShared module for cleaner code organization

* Refactor Contributor class by removing commented-out code and simplifying entity handling; retain import of ContributionInfo attributes

* Fix XML root name extraction in Item class to use Nokogiri for proper class dispatch

* Refactor LocalizedMarkedUpString class by simplifying content attribute handling and removing unused methods

* Reintroduce BibitemShared and BibdataShared mixins for flavor gems

Flavor gems (relaton-iso et al.) subclass Bib::Item to add
flavor-specific docidentifier/relation/ext overrides and still need to
emit <bibitem> (no ext) and <bibdata> (no id) serializations. The
5d8d02f refactor removed the old BibitemShared/BibdataShared mixins in
favor of ItemShared lambdas consumed directly by Bib::Bibitem and
Bib::Bibdata, which broke every flavor gem that used the old
include-based API.

Bring the two mixin constants back as thin included hooks that set the
XML root and prune the one attribute that does not belong on that root,
so flavor gems can go back to a one-line `include Bib::BibitemShared` /
`include Bib::BibdataShared` with no lutaml-model mapping surgery at
each call site.
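
The shape of such an `included` hook can be sketched in plain Ruby. Everything below is a toy stand-in: `ToyItem`, `xml_root`, and `attribute_names` are hypothetical names invented for illustration and do not reflect the lutaml-model mapping API or the actual relaton-bib classes.

```ruby
# Toy base class with a class-level root name and attribute list.
# Assumption: this is NOT the lutaml-model API, only an illustration.
class ToyItem
  class << self
    attr_writer :xml_root

    def xml_root
      @xml_root || (superclass.respond_to?(:xml_root) ? superclass.xml_root : "item")
    end

    # Each subclass gets its own copy, so pruning never leaks upward.
    def attribute_names
      @attribute_names ||=
        superclass.respond_to?(:attribute_names) ? superclass.attribute_names.dup : %i[id title ext]
    end
  end
end

module ToyBibitemShared
  def self.included(base)
    base.xml_root = "bibitem"
    base.attribute_names.delete(:ext) # <bibitem> carries no ext
  end
end

module ToyBibdataShared
  def self.included(base)
    base.xml_root = "bibdata"
    base.attribute_names.delete(:id) # <bibdata> carries no id
  end
end

# A flavor gem subclasses the item, then needs only a one-line include.
class ToyIsoItem < ToyItem; end

class ToyIsoBibitem < ToyIsoItem
  include ToyBibitemShared
end

class ToyIsoBibdata < ToyIsoItem
  include ToyBibdataShared
end
```

The point of the design is that the flavor subclass keeps its own overrides and the mixin touches only the root name and the one attribute that does not belong on that root.
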

* Refactor Bibdata and Bibitem classes to inherit from Item and include shared mixins for cleaner code organization

* Add render_default option to schema-version mapping in Item class XML serialization

* Add key_value mapping for schema_version and other attributes in Ext class

* Refactor organization name handling in RFCXML converters to support array content

* Add PlainDate type and update revision_date attribute in Version class

* Remove GitHub gem sources for lutaml-model and rfcxml; both are on RubyGems now
)

The version element no longer carries nested <revision-date> and <draft>
children. It is now a simple text element with an optional `type`
attribute. The Version model reads both the new shape and the legacy
shape (for both XML and YAML), folding legacy values into `content`
("draft (revision-date)" when both are present), and always emits the
new shape on output. This keeps in-production v2 datasets readable while
new datasets use the new shape natively.

HashParserV1 (v1 -> v2 migration) is updated to produce the new shape
directly. Local biblio.rng is synced from metanorma-model-iso.
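
The folding rule for legacy values can be illustrated with a small function. This is a sketch under assumptions: the name `fold_legacy_version` is hypothetical, and the behaviour when only one of the two legacy values is present (use that value alone) is a guess, since the commit message specifies only the both-present case.

```ruby
# Hypothetical helper: fold legacy <draft> and <revision-date> values
# into the single text content of the new <version> shape.
def fold_legacy_version(draft: nil, revision_date: nil)
  draft = draft.to_s.strip
  revision_date = revision_date.to_s.strip

  return nil if draft.empty? && revision_date.empty?
  return revision_date if draft.empty? # assumption: lone value is used as-is
  return draft if revision_date.empty? # assumption: lone value is used as-is

  "#{draft} (#{revision_date})" # "draft (revision-date)" when both are present
end
```

For example, `fold_legacy_version(draft: "2", revision_date: "2020-01-01")` yields `"2 (2020-01-01)"`.
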

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Escape and <p>-wrap abstract content in BibXML converter

The BibXML <t> abstract paragraphs may contain entities like
&lt;mailto:x&gt; that the rfcxml-model parser decodes back into bare
< / > characters. When the resulting string was stored verbatim in
Abstract.content (raw: true) and later round-tripped through YAML to
XML, lutaml-model attempted to parse the bare brackets as XML markup
and crashed (Moxml::ParseError).

FromRfcxml#abstract now applies CGI.escapeHTML to the joined <t> text
and wraps each paragraph in <p>...</p>, matching the format already used
for RFC index entries. ToRfcxml#create_abstract is updated as the
inverse to keep BibXML round-trips lossless: it unwraps <p> and
unescapes entities when emitting <t>.
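
The escape/wrap step and its inverse can be sketched with the standard library alone. The function names below are hypothetical and the sketch escapes each paragraph separately, which is an assumption about how the converter joins `<t>` text; it is not the code from FromRfcxml or ToRfcxml.

```ruby
require "cgi"

# Hypothetical forward step: escape each decoded paragraph and wrap it
# in <p>...</p> so bare < and > never reach the XML parser.
def paragraphs_to_content(paragraphs)
  paragraphs.map { |text| "<p>#{CGI.escapeHTML(text)}</p>" }.join
end

# Hypothetical inverse step: unwrap <p> and unescape entities again.
def content_to_paragraphs(content)
  content.scan(%r{<p>(.*?)</p>}m).flatten.map { |body| CGI.unescapeHTML(body) }
end
```

A paragraph such as `Contact <mailto:x@example.org>` becomes `<p>Contact &lt;mailto:x@example.org&gt;</p>`, and `content_to_paragraphs` recovers the original string, which is what keeps the round-trip lossless in this sketch.
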

Fixes the crash reported as relaton-ietf#128 on
"IETF I-D.draft-abarth-cake-01".

* Update Ruby version to 3.2 in integration tests workflow
…serves <text> (#113)

Under lutaml-model 0.8, child elements parsed from XML and re-serialized
via a parent collection skip attributes whose backing ivar holds no
explicit value (the user-defined `#text` accessor is not consulted in
that path). Push the Isoics fallback through the public `text=` writer
when `code` is assigned, and refuse the post-parse `using_default_for`
mark, so the description is emitted on subsequent `to_xml`.

Adds spec coverage for the XML round-trip behaviour: `ICS.new(code:)`,
`ICS.from_xml`, `Ext.from_xml` nesting an ICS with no `<text>`, and the
explicit-text-wins case.

Closes #112.
Strip inline markup outside the basicdoc PureTextElement whitelist
(plus <p>, <eref>, <xref>) from raw marked-up content. Disallowed
elements are unwrapped — tags removed, inner text kept. <italic> is
renamed to <em> to preserve emphasis when ingesting JATS-style sources.

Sanitization runs at the content= setter via a prepended module so
parse-time, initializer, and programmatic assignment all share one
chokepoint.
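
The prepend-at-the-setter pattern can be sketched as follows. All names are hypothetical (`ToySanitizer`, `ToyMarkedUpString`), the `ALLOWED` list is an illustrative subset rather than the real whitelist, and the regex-based tag handling is a deliberately crude substitute for whatever parser the actual Sanitizer uses.

```ruby
module ToySanitizer
  # Assumption: illustrative subset, not the real PureTextElement whitelist.
  ALLOWED = %w[em strong sub sup tt p eref xref].freeze
  TAG = %r{<(/?)([A-Za-z][\w-]*)([^<>]*)>}

  # Unwrap disallowed elements (drop the tag, keep the inner text) and
  # rename <italic> to <em>.
  def self.clean(markup)
    markup.gsub(TAG) do
      closing = Regexp.last_match(1)
      name = Regexp.last_match(2)
      rest = Regexp.last_match(3)
      name = "em" if name == "italic"
      ALLOWED.include?(name) ? "<#{closing}#{name}#{rest}>" : ""
    end
  end

  # Prepended, so this runs before the class's own content= writer.
  def content=(value)
    super(value.is_a?(String) ? ToySanitizer.clean(value) : value)
  end
end

class ToyMarkedUpString
  prepend ToySanitizer
  attr_accessor :content

  def initialize(content = nil)
    self.content = content # the initializer goes through the same setter
  end
end
```

Because the module is prepended, construction and later assignment both pass through `clean` without either call site knowing about it.
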

issue relaton/relaton-doi#21

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#115)

The Sanitizer's ALLOWED set was strict basicdoc PureTextElement plus
<p>, <eref>, <xref>. <fn> was not in the list, so the content= setter
on LocalizedMarkedUpString (used by Title, Note, Abstract, and other
text-bearing fields) unwrapped <fn> at assignment time and kept only
its inner <p> body. Downstream consumers — relaton-render and isodoc —
were left with an orphan <p> they could not place, producing a visible
regression in formattedref rendering of ISO-style titles with
footnotes (e.g. isodoc spec/isodoc/footnotes_spec.rb:4).

Adds <fn> to ALLOWED. <fn> is strictly speaking not in basicdoc
PureTextElement, but it is a legitimate child of <title> in real
Metanorma bibliographic input (ISO disclaimer footnotes), is in
relaton-render's own inline-tag allow-list, and the existing "plus
<p>, <eref>, <xref>" carve-out already concedes that the Sanitizer
needs to be a touch broader than strict PureTextElement to handle
real bibliographic content.
3 participants