Conversation
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough

`Parsed` now stores optional CFWS-stripped "clean" strings for the local-part and domain; the parser populates these during obs-mode parsing. `normalize.rs` and `validate.rs` use these helpers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Parser as Parser (src/parser.rs)
    participant Parsed as Parsed (struct)
    participant Normalizer as normalize.rs
    participant Validator as validate.rs
    Client->>Parser: parse(input)
    Parser-->>Parsed: Parsed{ local_part_span, domain_span, local_part_clean?, domain_clean? }
    Client->>Normalizer: normalize(parsed)
    Normalizer->>Parsed: local_part_str(), domain_str()
    Normalizer-->>Client: normalized address
    Client->>Validator: validate(parsed or normalized)
    Validator->>Parsed: local_part_str(), domain_str()
    Validator-->>Client: validation result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed (5 checks)
Codecov Report: ❌ Patch coverage is
Pull request overview
This PR fixes a Lax-mode parsing bug where CFWS/comment text between atoms/labels could be included in the semantic local-part/domain content, which then impacted normalization and validation.
Changes:
- Extend `Parsed` to optionally carry CFWS-stripped `local_part_clean`/`domain_clean`, plus `local_part_str()`/`domain_str()` helpers.
- Update obs-local-part and obs-domain parsing to collect per-atom/label spans and reconstruct clean content when CFWS is present.
- Switch normalization and validation to operate on CFWS-stripped content and add regression tests for the affected cases.
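The accessor pattern described above can be sketched as follows. Field and method names mirror the PR description, but the exact struct layout is an assumption made for illustration, not the crate's actual code:

```rust
// Hypothetical sketch of the Parsed accessors: spans index into the raw
// input, and the optional clean strings exist only when CFWS was stripped.
pub struct Span {
    pub start: usize,
    pub end: usize,
}

pub struct Parsed<'a> {
    pub input: &'a str,
    pub local_part_span: Span,
    pub domain_span: Span,
    // `Some` only when CFWS had to be stripped; `None` keeps the zero-copy path.
    pub local_part_clean: Option<String>,
    pub domain_clean: Option<String>,
}

impl<'a> Parsed<'a> {
    pub fn local_part_str(&self) -> &str {
        self.local_part_clean
            .as_deref()
            .unwrap_or(&self.input[self.local_part_span.start..self.local_part_span.end])
    }

    pub fn domain_str(&self) -> &str {
        self.domain_clean
            .as_deref()
            .unwrap_or(&self.input[self.domain_span.start..self.domain_span.end])
    }
}
```

Callers never need to know which representation they got: `local_part_str()` returns the cleaned string when present and borrows the raw span otherwise.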
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/parser.rs | Adds optional clean fields + helpers; reconstructs CFWS-stripped local-part/domain in obs-mode; adds regression tests. |
| src/normalize.rs | Uses `local_part_str()` / `domain_str()` so normalization operates on CFWS-stripped semantics. |
| src/validate.rs | Uses `local_part_str()` / `domain_str()` for length checks and domain handling. |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/validate.rs`:
- Around line 26-29: The variables raw_local/raw_domain and the comment about
"RAW addr-spec" are misleading because parsed.local_part_str() and
parsed.domain_str() return CFWS-stripped semantic content; update the comment to
say length limits apply to the CFWS-stripped addr-spec content and rename
raw_local/raw_domain to local/domain (and update all references of
raw_local/raw_domain at the places noted) so the names reflect that they contain
cleaned semantic local and domain strings (refer to local_part_str(),
domain_str(), and the variable uses on the lines indicated).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bf216156-deb6-4d1f-808a-b877d8e3b42d
📒 Files selected for processing (3)
src/normalize.rs, src/parser.rs, src/validate.rs
In Lax mode, skip_cfws() between atoms advanced parser.pos, making the resulting Span include CFWS/comment characters in semantic values. For example, `user (comment) . name@example.com` produced local_part span `user (comment) . name` instead of `user.name`.

- Collect atom/label spans during obs-mode parsing and reconstruct clean content by joining with dots
- Add `local_part_clean` / `domain_clean` optional fields to Parsed (None when no CFWS present — preserves zero-copy path)
- Add `local_part_str()` / `domain_str()` helpers that return the clean version when available, raw span otherwise
- Update normalize.rs and validate.rs to use helpers
- Add 8 regression tests covering comments, whitespace, multiple segments, leading CFWS, and zero-copy verification

Closes #13
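The reconstruction this commit describes — joining the collected per-atom byte spans with dots — can be modeled by a small standalone helper. This is an illustrative function, not the crate's actual implementation:

```rust
// Join collected atom/label byte spans with '.' to rebuild the
// CFWS-stripped semantic content from the raw input.
fn join_atoms(input: &str, atoms: &[(usize, usize)]) -> String {
    let mut clean = String::new();
    for (i, &(start, end)) in atoms.iter().enumerate() {
        if i > 0 {
            clean.push('.'); // dot separator between atoms/labels
        }
        clean.push_str(&input[start..end]);
    }
    clean
}
```

For `user (comment) . name`, the parser would collect spans for `user` and `name` only, so the join yields `user.name` with the comment and whitespace gone.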
Trailing CFWS before @/>/EOF was incorrectly marking had_cfws=true even though backtracking excluded it from the span. This caused unnecessary allocation and loss of the zero-copy path.

- Only set had_cfws after '.' is successfully consumed
- Rename raw_local/raw_domain to local/domain_str in validate.rs to reflect CFWS-stripped semantic content
- Add 2 tests verifying trailing CFWS preserves zero-copy path

Closes #13
Force-pushed 2d78302 to ee2b888
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/parser.rs`:
- Around line 353-361: skip_cfws() may append tentative comment spans into
parser.comments but the code only restores parser.pos on backtrack
(parser.restore(save)), leaving duplicate entries; update the backtrack logic in
parse_dot_atom_domain() so you save the current parser.comments.len() (or
equivalent) before calling skip_cfws(), and when you restore after a failed '.'
check (and in the analogous block around lines 521-527) truncate parser.comments
back to that saved length in addition to restoring parser.pos via
parser.restore(save); use the existing parser.save()/parser.restore(save)
pattern and the parser.comments collection to locate and implement the
truncation.
- Around line 339-349: The code currently calls skip_cfws(parser, 0) before
parsing the first local-part word, which incorrectly allows leading CFWS in bare
addr-spec; remove that initial skip_cfws call and only allow CFWS between words.
Concretely, in the local-part parsing logic around eat_atext_run and
try_quoted_string (where atoms is pushed and ErrorKind::EmptyLocalPart is
returned on failure), do not call skip_cfws(parser, 0) before the first
atom_start; instead only invoke skip_cfws when atoms is non-empty (i.e., before
consuming a separator or the next word) so leading CFWS/comments are rejected
for bare addr-spec while still permitting CFWS between word segments.
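The truncate-on-backtrack fix asked for in the first comment can be sketched as a minimal model. `Parser`, its fields, and the inline comment handling are stand-ins shaped after the review text, not the real parser:

```rust
// Snapshot both the position and the comment count before a tentative
// skip_cfws(), then roll BOTH back when no '.' follows — restoring only
// parser.pos would leave a duplicate comment span behind.
struct Parser {
    pos: usize,
    comments: Vec<(usize, usize)>,
}

fn try_consume_dot_after_cfws(parser: &mut Parser, input: &[u8]) -> bool {
    let save_pos = parser.pos;
    let save_comments = parser.comments.len();
    // Simulated skip_cfws(): tentatively record a 3-byte comment like "(c)".
    if input.get(parser.pos) == Some(&b'(') {
        parser.comments.push((parser.pos, parser.pos + 3));
        parser.pos += 3;
    }
    if input.get(parser.pos) == Some(&b'.') {
        parser.pos += 1;
        true
    } else {
        parser.pos = save_pos; // restore position...
        parser.comments.truncate(save_comments); // ...and drop tentative comment spans
        false
    }
}
```

The key point is the paired rollback: position and comment list are saved and restored together, so a failed lookahead leaves no trace.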
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f226571e-74de-4869-9acb-6e4eb41b8b4f
📒 Files selected for processing (3)
src/normalize.rs, src/parser.rs, src/validate.rs
…cktrack state

- Remove skip_cfws before first obs-local-part atom: bare addr-spec does not permit CFWS before the local-part, only between segments
- Replace Vec<&str> with stack-allocated [(usize, usize)] arrays in both parse_dot_atom_local and parse_dot_atom_domain — zero heap allocation when no CFWS is present (the common Lax-mode path)
- Save and truncate parser.comments on backtrack when trailing CFWS before @/>/EOF has no following dot — prevents duplicate spans from tentative skip_cfws calls

Closes #13
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/parser.rs`:
- Around line 338-396: The parser currently uses a fixed-size inline buffer
(MAX_ATOMS, bounds, count) and silently ignores atoms once count reaches
MAX_ATOMS, which causes reconstructed cleaned values to drop trailing labels
when had_cfws is true; change the logic in the local-part/domain parsing
function (the code around MAX_ATOMS, bounds, count, had_cfws, eat_atext_run,
try_quoted_string and the reconstruction that builds clean) to either (a) on
first overflow allocate a Vec<(usize,usize)> on the heap, copy existing bounds
into it and continue pushing new atom spans into that Vec, or (b) return an
explicit parse error instead of continuing; ensure downstream calls to
local_part_str()/domain_str() use the full heap-backed spans when used and
remove any reconstruction that uses only bounds[..count] when overflow occurred.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fa309430-dec6-493b-81d8-ff7d8527dbf5
📒 Files selected for processing (1)
src/parser.rs
… builder

Removes the bounded [(usize, usize)] arrays that silently dropped atoms/labels beyond MAX_ATOMS/MAX_LABELS. Now builds the clean String incrementally: zero allocation when no CFWS is present (Option<String> stays None), and when CFWS is first detected the contiguous prefix is copied from the raw span then subsequent atoms are appended directly. No capacity limits, no silent truncation.

Closes #13
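The zero-copy decision this commit preserves can be modeled compactly. The real code builds the string incrementally; this equivalent sketch recomputes the joined form and compares it to the raw span, which has the same observable result:

```rust
// Return None (zero-copy) when the raw span already equals the dot-joined
// atoms — i.e. no CFWS was present — and an owned cleaned String otherwise.
fn clean_if_needed(input: &str, atoms: &[(usize, usize)]) -> Option<String> {
    let raw = &input[atoms.first()?.0..atoms.last()?.1];
    let joined = atoms
        .iter()
        .map(|&(s, e)| &input[s..e])
        .collect::<Vec<_>>()
        .join(".");
    if joined == raw {
        None // the span can be used directly, no allocation
    } else {
        Some(joined) // CFWS present: allocate the cleaned form
    }
}
```

Unlike a fixed `[(usize, usize); MAX_ATOMS]` buffer, nothing here has a capacity limit, so no atom can be silently dropped.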
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/parser.rs`:
- Around line 373-385: The current logic seeds the clean buffer with
parser.input[outer_start..last_clean_end + 1], which includes the '.' separator
and then later unconditionally appends another '.', producing double dots when
CFWS appears after the dot; update the seeding so that when initializing clean
(in the block that checks clean.is_none() && parser.pos > last_clean_end + 1)
you copy only up to last_clean_end (exclude the trailing dot) so that the later
s.push('.') in the atom-parsing branch yields a single separator; locate this
behavior around the variables/identifiers clean, outer_start, last_clean_end,
parser.pos, atom_start and the calls
eat_atext_run/try_quoted_string/ErrorKind::EmptyLocalPart and adjust the slice
bounds accordingly (the same fix should be applied to the analogous block at
lines 533-543).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3bb641f3-18a8-4b06-8681-413d3b4d2c5c
📒 Files selected for processing (1)
src/parser.rs
When CFWS appeared only after a dot (e.g. "user. name"), the clean
buffer was seeded with the prefix including the dot, then push('.')
appended another — producing "user..name". Seed now copies only up
to last_clean_end (before the dot), so push('.') yields a single
separator. Applies to both obs-local-part and obs-domain.
Closes #13
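The seeding rule from this commit can be reproduced in isolation. Variable names follow the commit message (`outer_start`, `last_clean_end`) but the function itself is an assumption for illustration:

```rust
// Seed the clean buffer only up to last_clean_end — i.e. BEFORE the '.' —
// so the subsequent push('.') emits a single separator. Seeding through the
// dot would produce "user..name" for inputs like "user. name".
fn rebuild_after_dot_cfws(
    input: &str,
    outer_start: usize,
    last_clean_end: usize,
    next_atom: (usize, usize),
) -> String {
    let mut clean = String::with_capacity(last_clean_end - outer_start);
    clean.push_str(&input[outer_start..last_clean_end]); // excludes the trailing '.'
    clean.push('.');
    clean.push_str(&input[next_atom.0..next_atom.1]);
    clean
}
```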
@coderabbitai re-review

✅ Full review triggered.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/normalize.rs (1)
32-45: 🧹 Nitpick | 🔵 Trivial: Add one end-to-end normalization test for obs-form CFWS.

`raw_local` and `raw_domain` now come from the cleaned accessors, but the tests in this file still only cover already-clean addr-specs. A `parse_and_normalize()` regression test with CFWS between local atoms and domain labels would lock in that quote detection, NFC, and provider rules run on the stripped value. Based on learnings: all normalization paths must have tests for edge cases (Unicode NFC, IDNA, confusables).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/normalize.rs` around lines 32 - 45, Tests currently only cover already-clean addr-specs and miss obs-form CFWS cases; add an end-to-end test that exercises parse_and_normalize() with CFWS (comments/folding) between local atoms and between domain labels to ensure raw_local/raw_domain obtained via parsed.local_part_str()/parsed.domain_str() are the cleaned values used for quote detection (is_quoted), unescape_quoted_string, NFC normalization (nfc_local/nfc_domain), and provider rules. The test should pass an addr-spec containing CFWS around atoms/labels and assert the normalized output equals the expected cleaned, NFC-normalized, IDNA/filtered result so regressions where quote detection or NFC runs on unstripped input are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/parser.rs`:
- Around line 1187-1206: The tests
obs_trailing_cfws_before_at_preserves_zero_copy and
obs_trailing_cfws_after_domain_preserves_zero_copy only assert zero-copy but not
that comments weren't duplicated; update each test to also assert the
Parsed.comments collection contains the expected single entry (e.g.,
assert_eq!(p.comments.len(), 1) or assert the exact span/value) to ensure
skip_cfws() didn't record the same comment twice — locate these assertions near
the existing p.local_part_clean/p.domain_clean checks and reference
Parsed.comments and the tests named
obs_trailing_cfws_before_at_preserves_zero_copy and
obs_trailing_cfws_after_domain_preserves_zero_copy.
- Around line 341-343: The parser still accepts leading WSP because the code
trims the input earlier (see the input.trim() call) so the bare addr-spec branch
(the guard using eat_atext_run(parser) and try_quoted_string(parser)) never sees
leading whitespace; to fix, stop trimming the left side (remove or change the
earlier input.trim() to only trim trailing WSP) or explicitly reject leading
WSP/CFWS before entering the bare addr-spec logic: ensure the pre-parse
normalization does not remove leading whitespace and/or add a check before the
eat_atext_run(parser) / try_quoted_string(parser) branch that fails on any
leading WSP or leading CFWS/comment, and apply the same change at the other
affected region (the block around lines 1157-1169) so bare addr-specs never
accept leading CFWS/comments.
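The guard suggested in the second comment amounts to a first-byte check before entering the bare addr-spec branch. The helper name here is hypothetical; only trailing WSP would be trimmed beforehand:

```rust
// A bare addr-spec must not begin with whitespace or a '(' comment;
// reject before the first atom is parsed rather than trimming it away.
fn has_leading_cfws(input: &str) -> bool {
    matches!(input.bytes().next(), Some(b' ' | b'\t' | b'('))
}
```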
---
Outside diff comments:
In `@src/normalize.rs`:
- Around line 32-45: Tests currently only cover already-clean addr-specs and
miss obs-form CFWS cases; add an end-to-end test that exercises
parse_and_normalize() with CFWS (comments/folding) between local atoms and
between domain labels to ensure raw_local/raw_domain obtained via
parsed.local_part_str()/parsed.domain_str() are the cleaned values used for
quote detection (is_quoted), unescape_quoted_string, NFC normalization
(nfc_local/nfc_domain), and provider rules. The test should pass an addr-spec
containing CFWS around atoms/labels and assert the normalized output equals the
expected cleaned, NFC-normalized, IDNA/filtered result so regressions where
quote detection or NFC runs on unstripped input are caught.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 07e5181f-1c05-4554-95af-9a3e6624c6f4
📒 Files selected for processing (3)
src/normalize.rs, src/parser.rs, src/validate.rs
- Use span-based String::with_capacity instead of fixed 64 for clean buffer seeding in both obs-local-part and obs-domain
- Rename raw_local/raw_domain to local/domain_str in normalize.rs to reflect CFWS-stripped semantic content
- Clarify leading-comment test wording: trim() handles WSP, parser rejects comments
- Assert comments.len() in backtrack tests to catch duplicates
- Add obs-form end-to-end normalization test (CFWS + lowercase)

Closes #13
♻️ Duplicate comments (1)
src/parser.rs (1)
1186-1207: 🧹 Nitpick | 🔵 Trivial: Consider asserting `comments.len()` to strengthen backtrack tests.

These tests verify zero-copy behavior via `*_clean.is_none()`, but they don't assert that the comment deduplication (via `comments.truncate()` on backtrack) is working. Adding `assert_eq!(p.comments.len(), 1)` to one local-part and one domain test would catch regressions where tentative `skip_cfws()` re-records the same comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/parser.rs` around lines 1186 - 1207, The tests obs_trailing_cfws_before_at_preserves_zero_copy and obs_trailing_cfws_after_domain_preserves_zero_copy check zero-copy via local_part_clean/domain_clean but don’t verify comment deduplication; update each test to also assert that p.comments.len() == 1 to ensure skip_cfws() backtracking and comments.truncate() correctly avoid re-recording the same comment during tentative CFWS parsing (i.e., add an assert_eq!(p.comments.len(), 1) in each test after obtaining p).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/parser.rs`:
- Around line 1186-1207: The tests
obs_trailing_cfws_before_at_preserves_zero_copy and
obs_trailing_cfws_after_domain_preserves_zero_copy check zero-copy via
local_part_clean/domain_clean but don’t verify comment deduplication; update
each test to also assert that p.comments.len() == 1 to ensure skip_cfws()
backtracking and comments.truncate() correctly avoid re-recording the same
comment during tentative CFWS parsing (i.e., add an assert_eq!(p.comments.len(),
1) in each test after obtaining p).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 62f14757-1bfb-4cd4-bd69-a4e6343d4eba
📒 Files selected for processing (1)
src/parser.rs
@coderabbitai Re: obs-form normalization test — added
🧩 Analysis chain: Let me take a look at the new test to verify it covers all the required edge cases.
The learning stored for this repo explicitly requires:
Suggested extension to make the test fully match its description:

```rust
#[test]
fn obs_cfws_stripped_before_normalization() {
    let config = Config::builder()
        .strictness(crate::Strictness::Lax)
        .lowercase_all()
        .build();
    // ASCII path: CFWS stripped + case folded.
    let n = parse_and_normalize("User (comment) . Name@Example (c) . COM", &config);
    assert_eq!(n.local_part, "user.name", "CFWS stripped + lowercased");
    assert_eq!(n.domain, "example.com", "domain CFWS stripped + lowercased");
    // IDNA path: obs-domain with CFWS around an IDN label.
    let n = parse_and_normalize("user@M\u{00FC}nchen (c) . DE", &config);
    assert_eq!(n.domain, "xn--mnchen-3ya.de", "IDN domain CFWS stripped + punycode encoded");
    assert_eq!(n.domain_unicode.as_deref(), Some("münchen.de"));
    // NFC path: obs-local with a decomposed Unicode character (e + combining acute → é).
    // 'e' + U+0301 (combining acute) should NFC-normalize to U+00E9 'é'.
    let n = parse_and_normalize("use\u{0301}r (c) . name@example.com", &config);
    assert_eq!(n.local_part, "usér.name", "obs-local NFC-normalized after CFWS strip");
}
```

The ASCII sub-case can stay as-is — just add the IDN and NFC sub-cases to make the test actually validate what its comment claims.

🧠 Learnings used
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/normalize.rs`:
- Around line 360-371: The test obs_cfws_stripped_before_normalization currently
claims to verify NFC, case folding, and IDNA encoding but uses ASCII-only input
so NFC and IDNA paths aren't exercised; update the test either by changing the
comment to accurately reflect that it only checks CFWS stripping and
lowercasing, or extend it to include additional sub-cases that exercise NFC and
IDNA (e.g., call parse_and_normalize with a local part containing a decomposed é
like "cafe\u0301" to force .nfc() behavior and a domain with non-ASCII chars
like "münchen.de" to exercise domain_to_ascii_strict/punycode conversion),
keeping references to the existing test function
obs_cfws_stripped_before_normalization and helper parse_and_normalize so
reviewers can find and modify the right test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 74c39307-f973-4751-9209-d3351a39e88a
📒 Files selected for processing (2)
src/normalize.rs, src/parser.rs
- Size String::with_capacity to known clean prefix length instead of remaining input length — avoids over-allocation for long inputs
- Fix test wording: obs_cfws_stripped_before_normalization only tests case folding (ASCII input), not NFC/IDNA
- Add obs_cfws_stripped_with_idna test exercising CFWS stripping through IDNA punycode encoding (münchen → xn--mnchen-3ya)

Closes #13
Summary
- `skip_cfws()` between atoms advanced `parser.pos`, making spans include CFWS/comment characters in semantic local-part and domain values
- Clean content is rebuilt only when CFWS is present (`Option<String>` stays `None` on the zero-copy path)
- Add `local_part_clean`/`domain_clean` optional fields to `Parsed` with `local_part_str()`/`domain_str()` accessors
- Update `normalize.rs` and `validate.rs` to use CFWS-stripped content

Technical Details
Before: `user (comment) . name@example.com` → local_part span = `user (comment) . name`
After: `user (comment) . name@example.com` → `local_part_str()` returns `user.name`

Key design decisions:
- `String::with_capacity(last_clean_end - outer_start)` — sizes to the known prefix, not the remaining input
- `had_cfws` flag only set after a dot is successfully consumed; trailing CFWS before `@`/`>`/EOF is backtracked without allocation
- `parser.comments` truncated alongside `parser.pos` on backtrack

Test Plan
Closes #13