A benchmark dataset for AI smart-contract auditing. It collects the vulnerable contract source code behind real-world historical exploits and is intended for evaluating and training auditing capabilities (e.g., together with tools like ai-auditing-engine).
- Sources: Publicly available on-chain / security incident reports and repository snapshots, corresponding to contracts exploited or found defective in each incident.
- Goals:
- Provide real vulnerable samples that are reproducible and comparable.
- Support evaluating AI auditing performance at two granularities: full context and reduced attack surface.
Data lives under dataset/ and is organized by incident. Each incident directory name follows: {IncidentDateYYYYMMDD}_{ProjectOrProtocolSlug} (date first, to make sorting and searching easier).
dataset/
├── benchmark_complete/ # Full code of the exploited contracts (incl. deps/libs), as close as possible to an auditable/compilable snapshot
└── benchmark_simplified/ # Only vulnerability-related functions + minimal required deps; obviously irrelevant logic removed
- benchmark_complete contains the full source tree of the exploited contracts (including interfaces, libraries, third-party dependencies, etc.), useful for:
- Cross-contract and cross-module interaction analysis
- Audit workflows that require a full call graph and state-flow context
- benchmark_simplified starts from the code for the same incident, keeps only the vulnerable functions (and the minimal dependencies required for compilation and semantic understanding), and removes functions unrelated to the vulnerability. This is useful for:
- Reducing input scope when integrating with engines like ai-auditing-engine, making it easier to pinpoint vulnerabilities precisely
- Lowering token and compute costs, speeding up iterative evaluation
Note:
benchmark_simplified may still include some library files or interfaces, because the vulnerable functions can be coupled via types, constants, or math libraries. The guiding principle is “minimum necessary,” not “single-file (.sol) only.”
The CSV in the repository root lists all incidents currently included in dataset/ and should be treated as the authoritative source of metadata:
- Chinese: ai-auditing-benchmark_cn.csv
- English: ai-auditing-benchmark_en.csv
Both CSVs have identical rows; only the field language differs. Column meanings:
- Attack date: Incident date (YYYY.MM.DD).
- Project: The exploited project or protocol (the display name may differ slightly from the directory slug, e.g., with @ or parenthetical notes).
- Vulnerability: A short description of the vulnerability type.
- Vulnerability details: The exploit technique and defect description.
- Attack transaction: Representative on-chain transaction hash.
- Vulnerable contract address: Related contract address(es) (may span multiple lines within a cell).
- Loss (10k USD): Reported or estimated loss amount, in units of 10,000 USD.
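The columns above can be read with a standard CSV parser. The following is a minimal sketch, assuming the English CSV uses the column names listed above as its header row and that multi-line address cells are quoted; neither assumption is verified here. The helper names `load_incidents` and `split_addresses` and the sample row are hypothetical.

```python
import csv
import io


def load_incidents(csv_text: str) -> list[dict[str, str]]:
    """Parse benchmark CSV text into one dict per incident row."""
    # The csv module keeps newlines inside quoted cells, which matters for
    # address cells that span multiple lines.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return [dict(row) for row in reader]


def split_addresses(cell: str) -> list[str]:
    """Split a multi-line address cell into individual non-empty entries."""
    return [line.strip() for line in cell.splitlines() if line.strip()]


# Made-up sample for illustration only; not taken from the real CSV.
SAMPLE = (
    "Attack date,Project,Vulnerable contract address\r\n"
    '2025.05.28,@ExampleProtocol,"0xAAA\n0xBBB"\r\n'
)

rows = load_incidents(SAMPLE)
print(rows[0]["Project"])
print(split_addresses(rows[0]["Vulnerable contract address"]))
```

When reading the real file from disk, opening it with `newline=""` (as the `csv` module documentation recommends) preserves embedded line breaks in the same way.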
Mapping to directory names: Incident folder names under both dataset/benchmark_complete and dataset/benchmark_simplified use {IncidentDateYYYYMMDD}_{ProjectOrProtocolSlug}. The date is derived from Attack date as an 8-digit number (e.g., 2025.05.28 → 20250528). {ProjectOrProtocolSlug} corresponds to the Project column and is typically a filesystem-safe slug in lowercase/camel case (e.g., @Corkprotocol in the table maps to 20250528_Corkprotocol). If the Project column includes extra notes (e.g., addresses in parentheses), the directory name usually still uses a short protocol identifier; the actual folder names in the repo are authoritative.
Source-tree paths vary by incident. Browse within the corresponding directory by subproject / contract name.
- Find the target row in the CSV (by Attack date / Project).
- Convert Attack date to YYYYMMDD, and combine it with the project slug: {YYYYMMDD}_{ProjectSlug}.
- Choose a granularity:
  - dataset/benchmark_complete/{dir}/...: Full context (closer to real audit inputs).
  - dataset/benchmark_simplified/{dir}/...: Minimal necessary slice (fewer tokens, faster regression).
Example: 2025.05.28 + @Corkprotocol → dataset/benchmark_simplified/20250528_Corkprotocol/
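The lookup steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's own tooling: `naive_slug` is a hypothetical heuristic that drops a leading `@` and any parenthetical note, and it will not match every folder, since the actual directory names in the repo are authoritative.

```python
from datetime import datetime
from pathlib import PurePosixPath


def naive_slug(project: str) -> str:
    """Heuristic slug (assumption): drop parenthetical notes and a leading '@'."""
    return project.split("(", 1)[0].strip().lstrip("@").strip()


def incident_dir_name(attack_date: str, project: str) -> str:
    """Turn 'YYYY.MM.DD' plus a Project value into '{YYYYMMDD}_{slug}'."""
    # strptime also validates the date, so malformed input raises ValueError.
    day = datetime.strptime(attack_date.strip(), "%Y.%m.%d")
    return f"{day:%Y%m%d}_{naive_slug(project)}"


def incident_path(attack_date: str, project: str,
                  granularity: str = "simplified",
                  root: str = "dataset") -> PurePosixPath:
    """Build the expected incident directory for one granularity."""
    if granularity not in ("complete", "simplified"):
        raise ValueError(f"unknown granularity: {granularity!r}")
    return (PurePosixPath(root) / f"benchmark_{granularity}"
            / incident_dir_name(attack_date, project))


print(incident_path("2025.05.28", "@Corkprotocol"))
# dataset/benchmark_simplified/20250528_Corkprotocol
```

If the computed path does not exist, fall back to browsing the dataset directories for the matching date prefix.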
- Regression and comparison: For the same incident, run the same audit prompts/pipeline on both benchmark_complete and benchmark_simplified, and compare detection rate, false positives, and cost.
- Day-to-day iteration: During development, use benchmark_simplified for quick validation; before release, spot-check with benchmark_complete for more production-like context.
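One way to compare the two granularities is to record a small result per incident and aggregate. The sketch below is only an illustration: the `AuditResult` fields, the use of token counts as the cost measure, and all numbers are assumptions, not output from any real audit run.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    incident: str               # incident directory name
    found_vulnerability: bool   # did the audit flag the known vulnerability?
    false_positives: int        # reported findings unrelated to the exploit
    tokens_used: int            # stand-in for cost (assumption)


def summarize(results: list[AuditResult]) -> dict[str, float]:
    """Aggregate detection rate, false positives, and cost per incident."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    return {
        "detection_rate": sum(r.found_vulnerability for r in results) / n,
        "false_positives_per_incident": sum(r.false_positives for r in results) / n,
        "tokens_per_incident": sum(r.tokens_used for r in results) / n,
    }


# Made-up numbers for two hypothetical incidents, one list per granularity.
complete_runs = [
    AuditResult("incident_a", True, 3, 40000),
    AuditResult("incident_b", False, 1, 60000),
]
simplified_runs = [
    AuditResult("incident_a", True, 1, 5000),
    AuditResult("incident_b", True, 0, 7000),
]

for name, runs in (("complete", complete_runs), ("simplified", simplified_runs)):
    print(name, summarize(runs))
```

Keeping the prompts and pipeline identical across both runs is what makes the two summaries comparable.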
- Code snippets in this repository come from publicly available project sources or incident-related public materials; copyright belongs to the original authors. They are provided solely for security research and benchmark evaluation.
- Vulnerable code can be destructive. Do not use it for illegal purposes. If you use this dataset in papers or products, please cite the dataset name and the version/commit information.
Issues and PRs are welcome for adding new incidents, fixing paths, or improving the “vulnerable function” slicing rules. For new entries, please maintain the mapping between benchmark_complete and benchmark_simplified and update the CSV metadata (ai-auditing-benchmark_cn.csv, ai-auditing-benchmark_en.csv). In your PR, briefly describe the incident source and the vulnerability type.
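Before opening a PR, a quick consistency check can confirm that both trees contain the same incident folders and that names follow the date-first pattern. This is a hedged sketch: `check_mapping` is a hypothetical helper, not a script shipped with the repository, and the pattern only checks for an 8-digit prefix followed by an underscore and a slug.

```python
import re
from pathlib import Path

# Eight digits, an underscore, then a non-empty slug.
DIR_PATTERN = re.compile(r"^\d{8}_.+$")


def incident_names(tree: Path) -> set[str]:
    """Names of the immediate subdirectories of one benchmark tree."""
    return {p.name for p in tree.iterdir() if p.is_dir()}


def check_mapping(dataset_root: Path) -> dict[str, list[str]]:
    """Report incidents missing from either tree and badly named folders."""
    complete = incident_names(dataset_root / "benchmark_complete")
    simplified = incident_names(dataset_root / "benchmark_simplified")
    return {
        "only_in_complete": sorted(complete - simplified),
        "only_in_simplified": sorted(simplified - complete),
        "bad_names": sorted(
            name for name in complete | simplified
            if not DIR_PATTERN.match(name)
        ),
    }


if __name__ == "__main__":
    root = Path("dataset")
    if root.is_dir():
        for key, names in check_mapping(root).items():
            print(f"{key}: {names}")
```

All three lists being empty means the two trees are in sync; the CSV rows still need to be checked against the folder names separately.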