Skip to content

tests(sarif): schema-validation suite never exercises a path needing uri encoding, can't catch a path_to_uri_reference regression #949

@dekobon

Description

@dekobon

Summary

The SARIF schema-validation integration suite in tests/sarif_test.rs
never feeds a path that exercises path_to_uri_reference's encoding
branches (spaces, backslashes, non-ASCII, colons), so it cannot catch a
regression that makes the writer emit an artifactLocation.uri the
vendored SARIF schema's uri-reference format constraint rejects —
the exact failure that encoding exists to prevent.

Location

  • tests/sarif_test.rs:124-207 (the five *_validates_against_schema
    tests + the canary), all using clean ASCII paths such as
    src/foo.rs, a.rs, src/alpha.rs.
  • tests/sarif_test.rs:116-122 (the section doc claiming these tests
    catch "a wrong-typed value, an extra property the schema rejects").
  • Guards production code at src/output/sarif.rs:58-95
    (path_to_uri_reference), whose doc (lines 48-52) states it
    percent-encodes "so spaces and other path characters survive
    validation" under "the uri-reference format ... GitHub Code
    Scanning uses."

Evidence

The vendored schema constrains the URI field with a real assertion the
validator enforces:

tests/fixtures/sarif-2.1.0.json:211-214
  "uri": { ... "format": "uri-reference" }

jsonschema::draft7::new(&schema) (tests/common/validators.rs:55)
validates format for Draft-07 by default (the crate's format
validation is draft-dependent; Draft-07 treats format as an
assertion). A raw space is not a legal uri-reference character, so a
document whose uri is src/my file.rs (unencoded) would fail
validate_sarif.

Production deliberately percent-encodes to avoid exactly this
(src/output/sarif.rs:84-92). But every offender fed to the
schema-validation tests has a path that is already a clean
uri-reference requiring no transformation:

  • sarif_single_offender_validates_against_schemasrc/foo.rs
  • sarif_multi_offender_validates_against_schemasrc/alpha.rs, src/beta.rs
  • sarif_error_severity_validates_against_schemasrc/foo.rs
  • sarif_omitted_optional_fields_validates_against_schemaa.rs

The encoding branch is asserted only at the unit level in
src/output/sarif.rs:634 (space_is_percent_encoded), which checks the
string output of path_to_uri_reference directly but never runs the
result through schema validation. So the one place that proves
schema-conformance (the integration suite) and the one place that
exercises encoding (the unit test) never meet.

Concretely: if path_to_uri_reference regressed to stop
percent-encoding spaces (or emitted a malformed escape), the unit test
space_is_percent_encoded would catch the exact-string break, but the
integration schema suite — the test that exists to certify
GitHub-Code-Scanning-ingestible output — would stay green, because no
fixture carries a character that needs encoding.

Expected Behavior

At least one *_validates_against_schema test should use an offender
path that forces a non-trivial path_to_uri_reference transformation
(e.g. a path with an embedded space, and ideally a Windows-style
backslash path), so the suite proves the encoded URI still satisfies
the schema's uri-reference format — not merely that an
already-clean path does.

Actual Behavior

All schema-validation fixtures use paths that pass through
path_to_uri_reference unchanged, so the suite would still pass even if
the encoding logic were removed entirely.

Impact

A regression in path_to_uri_reference that produces schema-invalid
uri-reference values for real-world paths (spaces are common in
fixture/temp paths; backslashes on Windows runners) would ship
undetected by the integration suite whose stated purpose is to certify
schema conformance. This is the lib-side analogue of the SARIF
URI-correctness concerns already tracked in #798 (colon scheme
ambiguity), but it is a distinct test-quality gap, not a duplicate:
#798 is a production bug in the colon branch; this is the absence of any
schema-validation fixture that exercises any encoding branch.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions