Summary
The SARIF schema-validation integration suite in tests/sarif_test.rs
never feeds a path that exercises path_to_uri_reference's encoding
branches (spaces, backslashes, non-ASCII, colons), so it cannot catch a
regression that makes the writer emit an artifactLocation.uri the
vendored SARIF schema's uri-reference format constraint rejects —
the exact failure that encoding exists to prevent.
Location
tests/sarif_test.rs:124-207 (the five *_validates_against_schema
tests + the canary), all using clean ASCII paths such as
src/foo.rs, a.rs, src/alpha.rs.
tests/sarif_test.rs:116-122 (the section doc claiming these tests
catch "a wrong-typed value, an extra property the schema rejects").
- Guards production code at
src/output/sarif.rs:58-95
(path_to_uri_reference), whose doc (lines 48-52) states it
percent-encodes "so spaces and other path characters survive
validation" under "the uri-reference format ... GitHub Code
Scanning uses."
Evidence
The vendored schema constrains the URI field with a real assertion the
validator enforces:
tests/fixtures/sarif-2.1.0.json:211-214
"uri": { ... "format": "uri-reference" }
jsonschema::draft7::new(&schema) (tests/common/validators.rs:55)
validates format for Draft-07 by default (the crate's format
validation is draft-dependent; Draft-07 treats format as an
assertion). A raw space is not a legal uri-reference character, so a
document whose uri is src/my file.rs (unencoded) would fail
validate_sarif.
Production deliberately percent-encodes to avoid exactly this
(src/output/sarif.rs:84-92). But every offender fed to the
schema-validation tests has a path that is already a clean
uri-reference requiring no transformation:
sarif_single_offender_validates_against_schema → src/foo.rs
sarif_multi_offender_validates_against_schema → src/alpha.rs, src/beta.rs
sarif_error_severity_validates_against_schema → src/foo.rs
sarif_omitted_optional_fields_validates_against_schema → a.rs
The encoding branch is asserted only at the unit level in
src/output/sarif.rs:634 (space_is_percent_encoded), which checks the
string output of path_to_uri_reference directly but never runs the
result through schema validation. So the one place that proves
schema-conformance (the integration suite) and the one place that
exercises encoding (the unit test) never meet.
Concretely: if path_to_uri_reference regressed to stop
percent-encoding spaces (or emitted a malformed escape), the unit test
space_is_percent_encoded would catch the exact-string break, but the
integration schema suite — the test that exists to certify
GitHub-Code-Scanning-ingestible output — would stay green, because no
fixture carries a character that needs encoding.
Expected Behavior
At least one *_validates_against_schema test should use an offender
path that forces a non-trivial path_to_uri_reference transformation
(e.g. a path with an embedded space, and ideally a Windows-style
backslash path), so the suite proves the encoded URI still satisfies
the schema's uri-reference format — not merely that an
already-clean path does.
Actual Behavior
All schema-validation fixtures use paths that pass through
path_to_uri_reference unchanged, so the suite would still pass even if
the encoding logic were removed entirely.
Impact
A regression in path_to_uri_reference that produces schema-invalid
uri-reference values for real-world paths (spaces are common in
fixture/temp paths; backslashes on Windows runners) would ship
undetected by the integration suite whose stated purpose is to certify
schema conformance. This is the lib-side analogue of the SARIF
URI-correctness concerns already tracked in #798 (colon scheme
ambiguity), but it is a distinct test-quality gap, not a duplicate:
#798 is a production bug in the colon branch; this is the absence of any
schema-validation fixture that exercises any encoding branch.
Summary
The SARIF schema-validation integration suite in
tests/sarif_test.rsnever feeds a path that exercises
path_to_uri_reference's encodingbranches (spaces, backslashes, non-ASCII, colons), so it cannot catch a
regression that makes the writer emit an
artifactLocation.urithevendored SARIF schema's
uri-referenceformatconstraint rejects —the exact failure that encoding exists to prevent.
Location
tests/sarif_test.rs:124-207(the five*_validates_against_schematests + the canary), all using clean ASCII paths such as
src/foo.rs,a.rs,src/alpha.rs.tests/sarif_test.rs:116-122(the section doc claiming these testscatch "a wrong-typed value, an extra property the schema rejects").
src/output/sarif.rs:58-95(
path_to_uri_reference), whose doc (lines 48-52) states itpercent-encodes "so spaces and other path characters survive
validation" under "the
uri-referenceformat ... GitHub CodeScanning uses."
Evidence
The vendored schema constrains the URI field with a real assertion the
validator enforces:
jsonschema::draft7::new(&schema)(tests/common/validators.rs:55)validates
formatfor Draft-07 by default (the crate's formatvalidation is draft-dependent; Draft-07 treats
formatas anassertion). A raw space is not a legal
uri-referencecharacter, so adocument whose
uriissrc/my file.rs(unencoded) would failvalidate_sarif.Production deliberately percent-encodes to avoid exactly this
(
src/output/sarif.rs:84-92). But every offender fed to theschema-validation tests has a path that is already a clean
uri-referencerequiring no transformation:sarif_single_offender_validates_against_schema→src/foo.rssarif_multi_offender_validates_against_schema→src/alpha.rs,src/beta.rssarif_error_severity_validates_against_schema→src/foo.rssarif_omitted_optional_fields_validates_against_schema→a.rsThe encoding branch is asserted only at the unit level in
src/output/sarif.rs:634(space_is_percent_encoded), which checks thestring output of
path_to_uri_referencedirectly but never runs theresult through schema validation. So the one place that proves
schema-conformance (the integration suite) and the one place that
exercises encoding (the unit test) never meet.
Concretely: if
path_to_uri_referenceregressed to stoppercent-encoding spaces (or emitted a malformed escape), the unit test
space_is_percent_encodedwould catch the exact-string break, but theintegration schema suite — the test that exists to certify
GitHub-Code-Scanning-ingestible output — would stay green, because no
fixture carries a character that needs encoding.
Expected Behavior
At least one
*_validates_against_schematest should use an offenderpath that forces a non-trivial
path_to_uri_referencetransformation(e.g. a path with an embedded space, and ideally a Windows-style
backslash path), so the suite proves the encoded URI still satisfies
the schema's
uri-referenceformat — not merely that analready-clean path does.
Actual Behavior
All schema-validation fixtures use paths that pass through
path_to_uri_referenceunchanged, so the suite would still pass even ifthe encoding logic were removed entirely.
Impact
A regression in
path_to_uri_referencethat produces schema-invaliduri-referencevalues for real-world paths (spaces are common infixture/temp paths; backslashes on Windows runners) would ship
undetected by the integration suite whose stated purpose is to certify
schema conformance. This is the lib-side analogue of the SARIF
URI-correctness concerns already tracked in #798 (colon scheme
ambiguity), but it is a distinct test-quality gap, not a duplicate:
#798 is a production bug in the colon branch; this is the absence of any
schema-validation fixture that exercises any encoding branch.