Skip to content

Refactor PATCH operations, add filter parser, and parameterize ID types#39

Open
Paul-E wants to merge 9 commits intoShiftControl-io:mainfrom
Paul-E:main
Open

Refactor PATCH operations, add filter parser, and parameterize ID types#39
Paul-E wants to merge 9 commits intoShiftControl-io:mainfrom
Paul-E:main

Conversation

@Paul-E
Copy link
Copy Markdown
Contributor

@Paul-E Paul-E commented Apr 4, 2026

Refactor PATCH operations

Refactor patch operation datastructures to better leverage the sum type nature of Rust enums such that they more closely models the shape of the data in the SCIM spec. The enum representation expresses the relationship between the op and value , and path and value, which helps make invalid states unrepresentable, leading to rejection of invalid data.

Add parser for filters and PATCH operations

The RFC 7644 provides a grammar for filters, defined by RFC 7644 §3.4.2.2. This PR adds

  • A LALRPOP based parser to parse the filter grammar into data structures.
  • Data structures to build filters and serialize them.

Eg a filter emails[type eq "work"] filter can be both serialized and parsed like so

let vp = ValuePath {
    attr: AttrPath::with_name("emails"),
    filter: Box::new(ValFilter::Attr(AttrExp::Comparison(
        AttrPath::with_name("type"),
        CompareOp::Eq,
        "work".into(),
    ))),
};
let filter = Filter::ValuePath(vp)

// serializing
assert_eq!(filter.to_string(), r#"emails[type eq "work"]"#);
// parsing
let parsed: Filter = r#"emails[type eq "work"]"#).parse();
assert_eq!(Ok(filter), parsed);  

The current implementation exposes the full AST, which can be unwieldy to work with. Future work may improve the DX by offering tools with a different interface

Building the parser

LALRPOP offers two methods of turning src/filter_parser.lalrpop into src/filter_parser.rs, each described in the guide. One method is to use a build script, detecting changes and re-building the Rust parser each run of the compiler. The other is to use the lalrpop cli to generate src/filter_parser.rs without the use of macros or build scripts.

I chose the CLI route so as to not introduce a build script. Build scripts run arbitrary Rust code on each compilation, so some developers and maintainers feel more comfortable using libraries without them. Statically generating src/filter_parser.rs is also more likely to play nice with IDEs and LSP, as it is "just another Rust file".

A drawback of using the CLI is it means a very large src/filter_parser.rs. Because this file is so large, it is in practice not re-viewable. An attacker could hide an exploit in its large footprint. This can be mitigated with CI job that re-generates filter_parser.rs and validates that it matches what is in the MR. As a stopgap, maintainers can perform this action manually.

Make id types parametric

Previously some ID types were parsed as Strings, meaning users of the crate required another parsing pass if their ID UUIDs, emails, etc... With the ID types parameterized, invalid IDs are caught during the initial deserialization pass.

Users of the library may still use String as the ID type to get the old behavior, and indeed this is the default type if none is specified.

Remove Default impls on various types

While using Default is a common lightweight builder pattern for applications, it is probably not the right approach for libraries, as users might not have a grasp of what they need to "fill in" after the struct is created via Default. Objects created via Deafult should be ready to use immediately.

If a builder is desired, something like TypedBuilder is more likely to be the correct approach, as it leverages the compiler to guide users of the library towards constructing correct objects.

@amazon-inspector-singapore
Copy link
Copy Markdown

⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done

@amazon-inspector-singapore
Copy link
Copy Markdown

✅ I finished the code review, and left comments with the issues I found.

@shiftcontrol-dan
Copy link
Copy Markdown
Contributor

PR Red Team Report: #39 — Refactor PATCH operations, add filter parser, and parameterize ID types

Reviewed by: 4 parallel AI agents (Security, Error Handling, Test Coverage, Code Review)
Severity breakdown: 1 Critical | 4 High | 3 Medium | 3 Low | 2 Info


Critical Findings

[C-1] Silent path swallowing via #[serde(untagged)] on OperationTarget

  • Category: Data Integrity / Input Validation | Reviewer: Security, Error Handling, Code Review (3/4 agents)
  • Location: src/models/others.rs (OperationTarget enum)
  • Issue: OperationTarget uses #[serde(untagged)] with WithPath before WithoutPath. If a PATCH payload contains a path field with an invalid filter expression (e.g., "path": "emails[badfilter"), serde tries WithPath, fails on PatchPath deserialization, then silently falls through to WithoutPath. If value is a JSON object, WithoutPath succeeds, silently dropping the malformed path and treating the operation as pathless.
  • Impact: A server receiving {"op":"replace","path":"some[broken filter","value":{"displayName":"pwned"}} would not reject the request. Instead, it would apply a pathless replace, potentially overwriting top-level attributes. This is a data integrity issue that bypasses targeted-attribute semantics.
  • Recommendation: Replace #[serde(untagged)] with a custom Deserialize impl that checks for the path key first. If path is present, always attempt WithPath and propagate errors on failure. Only try WithoutPath when path is absent. Add a test with a malformed path to confirm the fix.

High Findings

[H-1] Stack overflow via deeply nested filter expressions (DoS)

  • Category: Denial of Service | Reviewer: Security, Error Handling
  • Location: src/filter.rs (Display impl, recursive AST traversal)
  • Issue: The LALRPOP parser is table-driven (iterative), so parsing itself won't overflow. However, deeply nested expressions like ((((...(title pr)...)))) produce a deeply nested Box<Filter> AST. Any recursive traversal (Display::fmt, user-written evaluators) will overflow the Rust call stack.
  • Impact: An attacker sends a SCIM filter with thousands of nested parentheses in a GET ?filter=... request. The server parses it, then crashes on any recursive traversal.
  • Recommendation: Add a recursion depth limit (e.g., 64 levels) checked after parsing, or enforce a maximum input length in FromStr::from_str (e.g., 4096 bytes). Document the limit.

[H-2] Committed generated parser with no CI verification (supply chain risk)

  • Category: Supply Chain | Reviewer: Security, Code Review
  • Location: src/filter_parser.rs (13,729 lines)
  • Issue: The PR author explicitly flagged this: "An attacker could hide an exploit in its large footprint." The SHA3 hash in the header only proves which grammar input was used, not that the output is unmodified. Manual review found no suspicious code, but this isn't scalable. The existing CI does not regenerate and verify the parser.
  • Impact: Future modifications to filter_parser.rs could inject malicious code that is practically unreviewable.
  • Recommendation: Add a CI step that regenerates from the grammar and diffs:
    - name: Verify generated filter parser
      run: |
        cargo install lalrpop --version 0.23.1
        lalrpop src/filter_parser.lalrpop
        git diff --exit-code src/filter_parser.rs

[H-3] ListQuery::filter type change creates hard deserialization failures

  • Category: Breaking Behavior Change | Reviewer: Code Review, Error Handling
  • Location: src/models/others.rs:38
  • Issue: ListQuery.filter changed from Option<String> to Option<Filter>. Previously, any filter string (valid or invalid) could be received and handled at the application layer. Now, an invalid filter expression causes the entire ListQuery deserialization to fail, losing access to start_index, count, and other fields.
  • Impact: Servers that currently do tolerant filter handling (logging and ignoring bad filters) will get hard deserialization failures. They cannot return RFC 7644 compliant 400 invalidFilter responses.
  • Recommendation: Document this behavior change explicitly. Consider a RawFilter(String) fallback or a structured error type so downstream servers can produce proper SCIM error responses.

[H-4] Remove variant silently drops RFC-allowed value field

  • Category: Data Loss | Reviewer: Error Handling
  • Location: src/models/others.rs (PatchOperation::Remove)
  • Issue: Remove is defined as Remove { path: PatchPath } with no value field. RFC 7644 Section 3.5.2.2 states that remove operations MAY include a value for complex multi-valued attributes (to remove specific sub-values). Azure AD/Entra sends remove operations with value fields. These values are silently dropped during deserialization because serde ignores unknown fields by default.
  • Impact: Identity providers sending remove-with-value operations (e.g., removing a specific email from a multi-valued attribute) will have the value silently ignored, potentially removing all values instead of the targeted one.
  • Recommendation: Add an optional value field to the Remove variant: Remove { path: PatchPath, #[serde(skip_serializing_if = "Option::is_none")] value: Option<Value> }.

Medium Findings

[M-1] Resource<T> untagged enum risks type confusion

  • Category: Logic Bug | Reviewer: Security, Error Handling
  • Location: src/models/others.rs (Resource enum)
  • Issue: Resource<T> uses #[serde(untagged)] and tries variants in order: User, Schema, Group, ResourceType. Since User and Group have overlapping fields (schemas, id), a payload with fields matching multiple types could deserialize as the wrong variant. The first match wins.
  • Impact: A ListResponse containing mixed resources could misclassify resources, leading to incorrect application logic.
  • Recommendation: Add a discriminator based on the schemas array content, or document variant ordering requirements.

[M-2] No input size/depth limit on filter parser

  • Category: Resource Exhaustion | Reviewer: Security, Error Handling
  • Location: src/filter.rs (FromStr impls)
  • Issue: Filter and PatchPath FromStr implementations accept strings of arbitrary length. A multi-megabyte filter string allocates proportional memory for tokens and AST nodes.
  • Recommendation: Document a recommended maximum filter length and optionally enforce it in FromStr, or advise consumers to enforce limits at the HTTP layer.

[M-3] User<T> has Default, Group<T> does not (asymmetric API)

  • Category: API Inconsistency | Reviewer: Code Review, Error Handling
  • Location: src/models/user.rs:63 (Default kept), src/models/group.rs (Default removed)
  • Issue: User<T> retains Default (works because Option<T> defaults to None), but Group<T> has Default removed. This creates an inconsistent API surface.
  • Recommendation: Either restore Default for Group<T> or remove it from User<T> for symmetry. Document the rationale.

Low Findings

[L-1] Lenient bool deserializer edge cases

  • Category: Input Validation | Reviewer: Error Handling
  • Location: src/utils/serde.rs
  • Issue: The custom deserialize_optional_lenient_bool handles "true"/"false" strings but behavior for "yes", "1", "", or numeric values is unspecified and untested.
  • Recommendation: Document accepted values. Consider handling numeric 1/0 if Entra sends them.

[L-2] Password field included in serialized User output (pre-existing)

  • Category: Information Disclosure | Reviewer: Security
  • Location: src/models/user.rs:37
  • Issue: Pre-existing: password: Option<String> with skip_serializing_if = "Option::is_none" means populated passwords are serialized. RFC 7643 says passwords SHOULD NOT be returned.
  • Recommendation: Consider #[serde(skip_serializing)] while keeping deserialization.

[L-3] No input length validation on filter/path strings

  • Category: Defense in Depth | Reviewer: Security
  • Location: src/filter.rs (FromStr impls)
  • Issue: Unbounded input length for filter parsing from untrusted HTTP parameters.
  • Recommendation: Enforce or document a max length.

Test Coverage Gaps (Critical/High only)

Gap Location Severity
Lenient bool deserializer has zero unit tests src/utils/serde.rs Critical
MemberType case-insensitive deser untested src/models/group.rs Critical
No tests with non-String ID types (e.g., UUID) src/models/user.rs, group.rs Critical
No tests for number/bool/null comparison values in filters src/filter.rs High
Filter/PatchPath JSON serde round-trip untested src/filter.rs High
PatchOperation serialization untested (only deser) src/models/others.rs High
Case-insensitive op names ("Add", "ADD") untested src/models/others.rs High
Entra user test has no field-level assertions src/models/user.rs Medium
No negative test: Remove without path, invalid op values src/models/others.rs High
No negative test: malformed path fallthrough to WithoutPath src/models/others.rs High

Generated by PRRedTeam -- 4 parallel agents reviewing security, bugs/compliance, error handling, and test coverage.

legion-paul and others added 7 commits April 6, 2026 20:49
This allows users of the library to use different types.
Eg String, uiid::Uuid, email_address::EmailAddress, etc...
…der tests

- Replace #[serde(untagged)] on OperationTarget with a custom Deserialize
  impl that checks for the "path" key explicitly. Malformed paths now
  produce an error instead of silently falling through to WithoutPath.
- Add optional `value` field to PatchOperation::Remove to support
  providers (e.g. JumpCloud) that include a value array on remove ops,
  per RFC 7644 Section 3.5.2.2.
- Add Okta test fixtures: deactivate (pathless replace), add members
  (array value with path), replace members (multi-member array).
- Add JumpCloud test fixtures: per-field replace ops, minimal member add,
  remove-with-value.
- Add negative tests: malformed path rejection, invalid op rejection.
Add clippy with -D warnings to catch lint issues early. Add a step that
regenerates filter_parser.rs from the LALRPOP grammar and diffs against
the committed file, ensuring the generated parser has not been tampered
with or fallen out of sync.
@shiftcontrol-dan
Copy link
Copy Markdown
Contributor

Red Team Follow-Up: Findings Status

Resolved

Finding Fix
C-1 (Critical): Silent path swallowing via #[serde(untagged)] on OperationTarget Replaced with custom Deserialize impl that checks for path key explicitly. Malformed paths now return errors instead of silently falling through to WithoutPath.
H-2 (High): No CI verification of generated parser Added Verify generated filter parser step to CI that regenerates filter_parser.rs from the grammar and diffs. CI run passed.
H-4 (High): Remove drops RFC-allowed value field Added value: Option<Value> to PatchOperation::Remove. JumpCloud and other providers that send value on remove ops are now supported.
Test coverage: No provider-specific test fixtures Added 6 test fixtures (3 Okta, 3 JumpCloud) + 2 negative tests (malformed path rejection, invalid op rejection).

Remaining (for contributor or future work)

Finding Status Notes
H-1 (High): Stack overflow via deeply nested filters Open Recursive Display/eval on deep ASTs can overflow the stack. Consider a depth limit after parsing (e.g., 64 levels) or a max input length in FromStr.
H-3 (High): ListQuery::filter hard deser failures Open / Design decision Changing filter from Option<String> to Option<Filter> means invalid filters fail entire deserialization. Downstream servers can't return RFC 7644 400 invalidFilter. Consider documenting this or offering a raw fallback.
M-1 (Medium): Resource<T> untagged enum type confusion Open Overlapping fields between User/Group could cause mis-deserialization. Consider a schemas-based discriminator.
M-2 (Medium): No input size limit on filter parser Open Unbounded input to FromStr. Document recommended max or enforce at library level.
M-3 (Medium): User<T> has Default, Group<T> does not Open Asymmetric API. Intentional or oversight?
L-1/L-2/L-3 Open Lenient bool edge cases, password serialization (pre-existing), input length validation.
Test gaps: lenient bool, MemberType, generic IDs, filter serde Open Zero unit tests for deserialize_optional_lenient_bool, MemberType case-insensitive deser, and User<Uuid>/Group<Uuid>.

@shiftcontrol-dan
Copy link
Copy Markdown
Contributor

Hey @Paul-E, I've run my PR review agent against the PR. Overall I'm ok with it. I addressed the highest ones, if you could commit sign future commits that would be awesome.
Let me know if you'd like to resolve any of the other findings or dispute them, or if you think we should merge as is.

@Paul-E
Copy link
Copy Markdown
Contributor Author

Paul-E commented Apr 7, 2026

Hey @Paul-E, I've run my PR review agent against the PR. Overall I'm ok with it. I addressed the highest ones, if you could commit sign future commits that would be awesome. Let me know if you'd like to resolve any of the other findings or dispute them, or if you think we should merge as is.

Thanks for reviewing, I'll take a look at the feedback this week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants