Prototype Python utility for generating fictional financial transaction data for audit analytics practice, fraud/anomaly detection exercises, and data science education.
The generator creates two CSV files:
- fake_transactions.csv: generated transactions, both normal and irregular records.
- irregularities.csv: a ledger of injected irregularities and their descriptions.
Maturity note: this is an educational audit-analytics utility, not a production synthetic-data platform. It is intentionally small, standard-library only, and suited to examples, workshops, and detector testing.
- Configurable transaction volume and date range.
- Recurring and random transaction generation.
- Benford-style first-digit amount distribution for normal random transactions.
- Configurable irregularities/anomalies.
- Separate irregularity ledger for validation and training labels.
- Optional --seed argument for reproducible generated datasets.
- Simple example analysis script for detecting selected irregularity patterns.
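
The Benford-style amount distribution listed above can be illustrated with a short sketch. This is not the generator's actual code: the helper name `benford_amount` and the digit-count range are assumptions made for illustration. It draws a leading digit d with probability log10(1 + 1/d), then fills in the remaining digits uniformly using integer cents so the leading digit is preserved exactly.

```python
import math
import random

# Benford's Law: P(first digit = d) = log10(1 + 1/d) for d in 1..9.
BENFORD_WEIGHTS = [math.log10(1 + 1 / d) for d in range(1, 10)]


def benford_amount(rng: random.Random, min_digits: int = 3, max_digits: int = 6) -> float:
    """Return an amount whose leading digit follows Benford's Law.

    Hypothetical helper: with 3 to 6 digits of cents, amounts fall
    between 1.00 and 9999.99.
    """
    first_digit = rng.choices(range(1, 10), weights=BENFORD_WEIGHTS)[0]
    n_digits = rng.randint(min_digits, max_digits)   # total digits in cents
    rest = rng.randrange(10 ** (n_digits - 1))       # remaining digits, uniform
    cents = first_digit * 10 ** (n_digits - 1) + rest
    return cents / 100


rng = random.Random(42)  # seeded so repeated runs give the same amounts
sample = [benford_amount(rng) for _ in range(5)]
print(sample)
```

About 30% of amounts drawn this way start with 1 and under 5% start with 9, which is the shape a first-digit test expects from normal records.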
- Python 3.10 or higher.
- Runtime: Python standard library only.
- Tests: pytest.
git clone https://github.com/busera/fake_transaction_data_generator.git
cd fake_transaction_data_generator
python -m pip install pytest
Generate the sample dataset:
python fake_generator.py -c config.json -o fake_transactions.csv -a irregularities.csv --seed 42
Arguments:
- -c/--config: path to the configuration file. Default: config.json.
- -o/--output: output CSV file for transactions. Default: fake_transactions.csv.
- -a/--anomalies: output CSV file for irregularities. Default: irregularities.csv.
- --seed: optional integer seed for reproducible output.
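
For readers who want to see how flags like these are typically wired up, here is a minimal argparse sketch that mirrors the documented options and defaults. It is an illustration only: `build_parser` is a hypothetical name, and how fake_generator.py applies the seed internally is an assumption.

```python
import argparse
import random


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags and defaults documented above."""
    parser = argparse.ArgumentParser(description="Generate fictional transaction data.")
    parser.add_argument("-c", "--config", default="config.json",
                        help="path to the configuration file")
    parser.add_argument("-o", "--output", default="fake_transactions.csv",
                        help="output CSV file for transactions")
    parser.add_argument("-a", "--anomalies", default="irregularities.csv",
                        help="output CSV file for irregularities")
    parser.add_argument("--seed", type=int, default=None,
                        help="optional integer seed for reproducible output")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--seed", "42"])

# Assumption: a dedicated seeded RNG is one common way to make output repeatable.
rng = random.Random(args.seed)
print(args.config, args.output, args.anomalies, args.seed)
```

Two runs with the same seed and the same config should then produce identical files, which makes detector results comparable across runs.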
Run a basic detector example after generating the CSV:
python examples/analyze_irregularities.py fake_transactions.csv
Run the tests:
python -m pytest -v
Columns of fake_transactions.csv:
- Transaction ID
- Date
- Type
- Amount
- Account
- Description
- Vendor
Columns of irregularities.csv:
- Transaction ID
- Irregularity Type
- Description
The irregularities file acts as a label ledger for validating detection logic or training exercises.
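
One way to use the ledger for validation is to compare the transaction IDs a detector flags against the labeled IDs and compute precision and recall. The sketch below assumes only the column names listed above; the helper names and the sample IDs (`T001` and so on) are made up for illustration and do not reflect the generator's real ID format.

```python
import csv
import io


def load_labeled_ids(ledger_file) -> set[str]:
    """Collect the non-empty Transaction ID values from an irregularity ledger."""
    reader = csv.DictReader(ledger_file)
    return {row["Transaction ID"] for row in reader if row["Transaction ID"]}


def precision_recall(flagged: set[str], labeled: set[str]) -> tuple[float, float]:
    """Score a detector's flagged IDs against the labeled IDs."""
    true_positives = len(flagged & labeled)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# Made-up ledger rows; in practice, pass open("irregularities.csv", newline="").
sample_ledger = io.StringIO(
    "Transaction ID,Irregularity Type,Description\n"
    "T001,high_amount,Example row\n"
    "T002,double_spend,Example row\n"
)
labeled = load_labeled_ids(sample_ledger)
precision, recall = precision_recall({"T001", "T999"}, labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.50
```

Scoring per irregularity type, by filtering on the Irregularity Type column first, shows which patterns a detector misses.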
config.json controls the generated period, volume, vendors, recurring transactions, and irregularity counts.
Example structure:
{
  "num_transactions": 1000,
  "start_date": "2023-01-01",
  "end_date": "2023-12-31",
  "irregularities": {
    "high_amount": {"count": 10},
    "frequency_change": {"count": 10},
    "double_spend": {"count": 10},
    "missing_id": {"count": 2},
    "incorrect_date": {"count": 10},
    "mismatched_description": {"count": 10},
    "wrong_account": {"count": 10},
    "personal_expense": {"count": 10},
    "benford_violation": {"count": 10},
    "subtle_skimming": {"count": 30},
    "seasonal_anomaly": {"count": 10},
    "round_number_bias": {"count": 10},
    "cumulative_irregularity": {
      "enabled": true,
      "count": 36,
      "threshold": 0.005
    }
  },
  "vendors": ["ABC Office Supplies", "XYZ Tech Solutions", "123 Cleaning Services"],
  "personal_vendors": ["Luxury Resort & Spa", "Designer Clothing Co.", "Gourmet Restaurant"],
  "personal_expense_descriptions": ["Team Building Retreat", "Client Entertainment", "Office Decor"],
  "recurring_transactions": [
    {
      "vendor": "City Power & Utilities",
      "amount": 500,
      "day": 15,
      "description": "Monthly Utility Bill"
    }
  ]
}
- high_amount: unusually large transaction amount.
- frequency_change: altered date for a recurring transaction.
- double_spend: duplicate transaction with a new ID and timestamp offset.
- missing_id: transaction with no transaction ID.
- incorrect_date: future-dated transaction.
- mismatched_description: description conflicts with transaction type.
- wrong_account: invalid account-number pattern.
- personal_expense: personal vendor/description disguised as business expense.
- benford_violation: amount distribution biased away from Benford's Law.
- subtle_skimming: small reductions across nearby transactions.
- seasonal_anomaly: seasonal mismatch, such as summer equipment in winter.
- round_number_bias: suspiciously round transaction amounts.
- cumulative_irregularity: small cumulative increases across multiple transactions.
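
A few of these patterns lend themselves to simple rule-based checks. The sketch below is independent of the bundled example script and covers three types: missing_id, round_number_bias, and double_spend. It assumes rows are dictionaries keyed by the transaction column names, that a missing ID appears as an empty string, and that a multiple of 100 counts as "round"; all function names and sample rows are hypothetical.

```python
from collections import defaultdict


def find_missing_ids(rows):
    """Return rows whose Transaction ID is empty (missing_id)."""
    return [r for r in rows if not r["Transaction ID"].strip()]


def find_round_amounts(rows, multiple=100):
    """Return rows whose amount is an exact multiple of `multiple` (round_number_bias)."""
    hits = []
    for r in rows:
        cents = round(float(r["Amount"]) * 100)  # compare in integer cents
        if cents != 0 and cents % (multiple * 100) == 0:
            hits.append(r)
    return hits


def find_possible_duplicates(rows):
    """Group rows sharing vendor, amount, and description (double_spend candidates)."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["Vendor"], r["Amount"], r["Description"])].append(r)
    return [group for group in groups.values() if len(group) > 1]


# Made-up rows for illustration; real rows would come from csv.DictReader.
rows = [
    {"Transaction ID": "A1", "Amount": "500.00", "Vendor": "V", "Description": "D"},
    {"Transaction ID": "", "Amount": "123.45", "Vendor": "W", "Description": "E"},
    {"Transaction ID": "A3", "Amount": "500.00", "Vendor": "V", "Description": "D"},
]
print(len(find_missing_ids(rows)), len(find_round_amounts(rows)), len(find_possible_duplicates(rows)))
```

Legitimate recurring transactions, such as a fixed monthly utility bill, will also trip the round-amount and duplicate rules, so the results are candidates to check against the irregularity ledger rather than confirmed findings.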
This project is useful for:
- Audit analytics workshops and training.
- Testing anomaly-detection scripts.
- Demonstrating how labeled irregularities support validation.
- Practicing data analysis without exposing confidential financial records.
- It does not model a full accounting subledger or ERP export.
- Amounts are floats because this is synthetic training data, not financial posting logic.
- The included detector example is intentionally simple and rule-based.
- Generated patterns are educational examples, not proof of real-world fraud typologies.
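
The float caveat above matters mostly when amounts are summed or compared for equality. A minimal sketch of the difference, using Python's standard `decimal` module with made-up values:

```python
from decimal import Decimal

# Made-up amounts as they might appear as text in a CSV column.
amounts = ["0.10", "0.20", "0.30"]

float_total = sum(float(a) for a in amounts)
decimal_total = sum(Decimal(a) for a in amounts)

print(float_total == 0.6)                # False: binary floats accumulate rounding error
print(decimal_total == Decimal("0.60"))  # True: Decimal keeps exact cents
```

For practice detectors, comparing floats with a small tolerance or converting to `Decimal` before summing avoids false mismatches.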
MIT License. See LICENSE.