✨ Feature / SafeXml — single hardened xmerl_scan entry point#29
Open
docJerem wants to merge 2 commits into
Open
✨ Feature / SafeXml — single hardened xmerl_scan entry point#29docJerem wants to merge 2 commits into
docJerem wants to merge 2 commits into
Conversation
Introduce a single hardening surface for parsing untrusted SAML/XML. `SafeXml.scan/1` and `scan!/1` are now the only call sites of `:xmerl_scan.string` in `lib/`; they always pass `allow_entities: false`, `namespace_conformant: true`, and `quiet: true`. Callers cannot weaken these options. The binary→charlist conversion is centralised in `SafeXml.scan/1` too, which removes the per-site duplication that #22 (UTF-8 fix) had to patch in four files — future encoding changes are now a one-file diff. Replace the four ad-hoc `:xmerl_scan.string(...)` invocations: * `ExSaml.Core.Binding.decode_response/2` (DEFLATE + non-DEFLATE paths) * `ExSaml.Core.Sp.decrypt_assertion/2` * `ExSaml.Metadata.parse/1` Behaviour preserved: * `Binding` / `Sp` callers still receive a raise on malformed input (via `scan!/1`) — the surrounding `extract_assertion/3` and friends keep their existing `rescue`/`catch` wrappers. * `Metadata.parse/1` still returns `{:error, invalid_xml_violation()}` on any parse failure or non-element root. Behaviour change worth noting: * `Binding.decode_response/2` now suppresses xmerl warnings on malformed payloads (`quiet: true`). Previously they were logged. Test coverage: * New `ExSaml.Core.Xml.SafeXmlTest` exercises well-formed XML, binary vs. charlist parity, UTF-8 content, malformed input, empty input, XXE (external entity), internal entity expansion, and the raising variant. Sets up steps 2–4 of the plan discussed on #15 (AST-based check, config invariants for app-side concerns, CI wiring of `mix security.check_release`).
4 tasks
Drop the raising variant. `SafeXml.scan/1` is now the single entry point
and always returns `{:ok, xml()} | {:error, :invalid_xml}` — more
composable in `with`/`case` chains, safer to plug into pipelines.
Cascade the tuple through the callers so the error stays a value end to
end (no rescue at site, no hidden raise to catch):
* `Binding.decode_response/2` public spec is now
`{:ok, xml()} | {:error, :invalid_xml}` (was `xml()`).
* `Sp.decrypt_assertion/2` now returns
`{:ok, xml()} | {:error, :invalid_xml | :decrypt_failed}`;
`extract_assertion/3` consumes it via `with`.
* `Helper.decode_saml_payload/2` replaces its `rescue` with a `case`
on the tuple; the error term changes from a stringified
`inspect(error)` to the atom reason (`{:invalid_response, reason}`)
— no caller pattern-matches on the previous shape.
Add a configurable error hook for SIEM / OCSF pipelines:
config :ex_saml, :safe_xml,
on_error: &MyApp.SIEM.report_xml_error/1
The arity-1 handler receives a context map
`%{reason: :invalid_xml, kind: :error | :exit | :throw, payload: …}`.
Any exception from the handler is swallowed — the handler can never
mask the original parse failure or alter control flow.
Tests updated:
* `SafeXmlTest` drops the `scan!/1` cases, adds five `:on_error`
cases (called once per failure, not called on success, swallows
handler errors, no-op when unconfigured, no-op when value is not
an arity-1 function).
* `BindingTest` patterns the `{:ok, _}` shape across the 7 call
sites (roundtrip + UTF-8 regression suite).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
New module
ExSaml.Core.Xml.SafeXml— single entry point for parsing untrusted SAML/XML via:xmerl_scan. Always forcesallow_entities: false,namespace_conformant: true,quiet: true; callers cannot weaken these options.Tuple-only API.
SafeXml.scan/1returns{:ok, xml()} | {:error, :invalid_xml}— no raising variant. Errors are values, composable inwith/casepipelines.Tuple cascade through callers.
Binding.decode_response/2(public),Sp.decrypt_assertion/2(private) andHelper.decode_saml_payload/2(private) all propagate the tuple. The previousrescueinHelperis replaced by acase; the error reason is now an atom ({:invalid_response, :invalid_xml}) instead of a stringifiedinspect(error).Configurable error hook for SIEM / OCSF pipelines:
The arity-1 handler receives
%{reason: :invalid_xml, kind: :error | :exit | :throw, payload: …}. Any exception from the handler is silently caught — it can never mask the original parse failure or alter control flow.De-duplication of the four pre-existing
:xmerl_scan.stringcall sites (Metadata.parse/1,Binding.decode_response/2× 2,Sp.decrypt_assertion/2). The binary→charlist conversion (the source of the 🐛 Fix / UTF-8 characters in SAMLResponse rejected by xmerl_scan #22 UTF-8 fix that had to patch four files) is centralised inSafeXml.scan/1— future encoding changes are now a one-file diff.Why this PR
This is PR 1 of #32 (SAML XML parsing surface hardening). The umbrella issue lays out the full four-PR plan; the short version is that the original
mix security.check_releasetask drafted in #15 relied on regex source-scanning, which was silently contournable. The root cause was upstream::xmerl_scanwas called in four different files with slightly different options. With a single hardened call site, a future audit check (PR 2 of #32) becomes incontournable: any reference to:xmerl_scan.*outsidesafe_xml.exis a violation.Behaviour notes
Metadata.parse/1still returns{:error, invalid_xml_violation()}on any parse failure or non-element root.Binding.decode_response/2return shape goes fromxml()to{:ok, xml()} | {:error, :invalid_xml}. Public spec change. Single intra-lib caller (Helper.decode_saml_payload/2) updated; tests updated.Helper.decode_saml_payload/2error shape{:invalid_response, …}now carries an atom reason instead of an interpolatedinspect(error)string. No caller pattern-matches on it (verified viagrep).Binding.decode_response/2now passesquiet: trueto:xmerl_scan(onlyMetadata.parse/1did before). Malformed-payload xmerl warning logs are now silent; error semantics unchanged.Test plan
mix compile --warnings-as-errorscleanmix format --check-formattedcleanmix credo --strict— no issuesmix sobelow --config --skip— scan complete, no findingsmix test— 211 tests, 0 failures, including:SafeXml.scan/1cases (well-formed, binary/charlist parity, UTF-8, malformed, empty, XXE external entity, internal entity):on_errorhook cases (called once per failure, not called on success, swallows handler errors, no-op when unconfigured, no-op when value is not an arity-1 function)Binding.decode_response/2callers inbinding_test.exs(deflate + non-deflate roundtrip + full UTF-8 regression matrix)grep -rn ':xmerl_scan.string' lib/— only invocation lives atsafe_xml.ex:59grep -rn 'scan!' lib/ test/— no remaining callers (no raising API surface)