xarf-rs

A Rust implementation of XARF v4 — the eXtended Abuse Reporting Format. Parses, validates, and generates abuse reports with a small, dependency-light API. Compatible with XARF v3 input via automatic conversion.

[dependencies]
xarf-rs = "0.1"

The crate publishes as xarf-rs on crates.io but imports as xarf:

use xarf::{parse, ReportBuilder, Contact, create_evidence};

What XARF is

XARF is a JSON standard for describing abuse incidents (spam, DDoS, phishing, copyright violations, compromised infrastructure, and so on) in a machine-readable, schema-validated form. The full v4 specification lives at github.com/xarf/xarf-spec. This crate ships the canonical JSON Schemas embedded at compile time — no network or filesystem access is required at runtime.

Quick start

use serde_json::json;
use xarf::{create_evidence, parse, Contact, ReportBuilder};

// Parse an incoming report.
let result = parse(r#"{"xarf_version": "4.2.0", "...": "..."}"#).unwrap();
if result.errors.is_empty() {
    let report = result.report.unwrap();
    println!("{}/{}", report.category.as_str(), report.type_);
}

// Build a new report programmatically.
let evidence = create_evidence("text/plain", b"original spam payload");
let result = ReportBuilder::new("messaging", "spam", "192.0.2.1")
    .reporter(Contact::new("Acme", "abuse@acme.example", "acme.example"))
    .sender(Contact::new("Acme", "abuse@acme.example", "acme.example"))
    .source_port(25)
    .extra("protocol", json!("smtp"))
    .extra("smtp_from", json!("spam@bad.example"))
    .add_evidence(evidence)
    .build()
    .unwrap();

API surface

Function / type	Purpose
`parse(json)`	Parse a JSON string. Auto-converts v3 input.
`parse_value(value, ...)`	Same, but starts from a `serde_json::Value`.
`validate(value, ...)`	Validate without typed deserialization.
`ReportBuilder`	Build a report programmatically; auto-fills `report_id`/timestamp.
`create_report(...)`	Functional shorthand for `ReportBuilder`.
`create_evidence(...)`	Compute hash + base64 + size from raw bytes.
`is_v3_report(value)`	Detect XARF v3 format.
`convert_v3_to_v4(...)`	Convert v3 → v4 JSON.
`Report`, `Contact`,
`Evidence`, `Category`	Typed views over a parsed report.

Validation modes

Standard (default): required fields enforced; recommended fields ignored; unknown fields surface as warnings.
Strict (ParseOptions { strict: true, .. }): recommended fields promoted to required; unknown fields become errors.
Show missing optional (show_missing_optional: true): populate info with every absent recommended/optional field — useful for review tools.

XARF v3 backward compatibility

parse() automatically detects v3 input (Version: "3"/"3.0"/"3.0.0" plus ReporterInfo and Report), converts it to v4, and surfaces a deprecation warning. Behavioural parity is held with the reference Python implementation's v3_compat module.

Architecture

Core fields on Report are strongly typed (Contact, Evidence, Category); category-specific fields flow through a sorted BTreeMap<String, serde_json::Value> for lossless round-tripping and forward compatibility.
The 33 type schemas plus the master schema and core schema are bundled into the binary with include_str!.
Strict-mode validation deep-copies the master schema once at startup and promotes every x-recommended: true property to its parent's required array — matching the algorithm in the v4 implementer's guide.
Schema reference resolution ($ref between type schemas and the core schema) is satisfied entirely from the embedded documents through a custom jsonschema::Retrieve retriever.

Performance

The hot paths (parse, validate, build) all run at single-digit µs on commodity hardware, well inside the XARF v4 spec's <1ms typical-report target:

Operation	Time	Throughput
`parse` — small spam (~500 B)	~9 µs	~68 MiB/s
`parse` — DDoS report (~700 B)	~8 µs	~68 MiB/s
`parse` — large bulk (~40 KB)	~23 µs	~1.7 GiB/s
`parse` (strict mode)	~10 µs	~60 MiB/s
`validate` only	~7 µs	—
`convert_v3_to_v4`	~5 µs	~100 MiB/s
`is_v3_report` detection	~30 ns	~16 GiB/s
`create_evidence` (SHA-256, 64 KB)	~39 µs	~1.6 GiB/s
`create_evidence` (MD5, 64 KB)	~85 µs	~730 MiB/s

The compiled jsonschema::Validator for the master schema is built once on first use and cached for the lifetime of the process — concurrent parses from many tasks share a single immutable validator.

Regression detection

The crate ships with three layers of defense against perf regressions:

Committed perf-budget tests (tests/perf_budgets.rs) — wall-clock bounds with 5–180× headroom, run as part of cargo test on any contributor's machine. They catch catastrophic regressions (e.g. an accidentally-uncached schema validator) without any CI setup. Run:
```
cargo test --release --test perf_budgets -- --nocapture
```
CI-side benchmark comparison (.github/workflows/bench.yml) — runs cargo bench on every PR, compares against the main baseline stored on a gh-pages branch, and fails the build if any benchmark regresses by more than 10%. Posts a comment on the PR with the diff. Powered by benchmark-action/github-action-benchmark.
Local criterion baselines (benches/xarf.rs) — for fine-grained work on a single machine:
```
cargo bench --bench xarf -- --save-baseline main
# ... make changes ...
cargo bench --bench xarf -- --baseline main
```
Criterion writes reports to target/criterion/ (gitignored, so these stay local) and flags any operation that regresses by more than 5%.

The committed tests are the universal fallback; the CI workflow is the authoritative regression gate; the local baselines are for development.

Sync and async

The public API is synchronous and performs zero I/O at runtime — all 34 JSON schemas are embedded into the binary with include_str!, so there is no filesystem or network access to await on. That means:

Call parse, validate, ReportBuilder::build, etc. directly from any context — synchronous CLI tools, tokio tasks, async-std, threads, anywhere.
Every public type is Send + Sync (verified at compile time with static_assertions), so reports flow across tokio::spawn boundaries and channels without wrapping.
For very large batches you can push the work off the runtime with tokio::task::spawn_blocking(move || xarf::parse(...)) — but for normal message sizes the parse is microseconds and you don't need to.

See tests/async_compat.rs for working examples of multi-threaded tokio runtimes, spawn, spawn_blocking, and concurrent registry use.

Testing

The crate ships with 115 tests across eleven test binaries:

golden_parser_tests — exercises every sample from the shared xarf-parser-tests corpus (44 valid + 5 invalid) plus the 32 canonical samples from xarf-spec.
snapshots — insta snapshot tests covering v3→v4 conversion output, validation error shape, strict-mode promotion, show_missing_optional output, generator output, and round-trip stability. Run cargo install cargo-insta && cargo insta review to inspect pending changes.
async_compat — Send + Sync compile-time assertions plus tokio integration tests (single + multi-thread runtimes, spawn boundary, concurrent registry use, spawn_blocking batches).
parser_tests — JSON parsing, strict mode, info mode, unknown-field handling, evidence and tag round-trips.
validator_tests — schema enforcement for every category, format validation (UUID, email, hostname, datetime), x-recommended promotion in strict mode.
generator_tests — ReportBuilder auto-metadata, hash algorithms, evidence creation, strict-mode generation errors.
v3_compat_tests — detection, conversion, every documented v3 ReportType, error paths.
model_tests — Category (un)known, contact/evidence (de)serialization, internal stripping.
round_trip — parse → mutate → serialize → re-parse for every spec sample, plus extension-field preservation.
smoke, api_surface — sanity checks.

Run them all:

cargo test

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.github		.github
benches		benches
schemas/v4		schemas/v4
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
deny.toml		deny.toml
release-plz.toml		release-plz.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

xarf-rs

What XARF is

Quick start

API surface

Validation modes

XARF v3 backward compatibility

Architecture

Performance

Regression detection

Sync and async

Testing

License

About

Uh oh!

Releases 5

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

xarf-rs

What XARF is

Quick start

API surface

Validation modes

XARF v3 backward compatibility

Architecture

Performance

Regression detection

Sync and async

Testing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages