busera/fake_transaction_data_generator

Fake Transaction Data Generator

Overview

Prototype Python utility for generating fictional financial transaction data for audit analytics practice, fraud/anomaly detection exercises, and data science education.

The generator creates two CSV files:

  • fake_transactions.csv — generated transactions with normal and irregular records.
  • irregularities.csv — a ledger of injected irregularities and descriptions.

Maturity note: this is an educational audit-analytics utility, not a production synthetic-data platform. It is intentionally small, standard-library only, and suited to examples, workshops, and detector testing.

Features

  • Configurable transaction volume and date range.
  • Recurring and random transaction generation.
  • Benford-style first-digit amount distribution for normal random transactions.
  • Configurable irregularities/anomalies.
  • Separate irregularity ledger for validation and training labels.
  • Optional --seed argument for reproducible generated datasets.
  • Simple example analysis script for detecting selected irregularity patterns.
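
The Benford-style distribution and the --seed behavior listed above can be illustrated with a short sketch. This is not the code in fake_generator.py; the function name benford_amount, the magnitude range, and the use of integer cents are assumptions made for illustration. It draws the leading digit with probability log10(1 + 1/d) and then fills in the remaining digits uniformly.

```python
import math
import random

DIGITS = list(range(1, 10))
# Benford's Law: P(first digit = d) = log10(1 + 1/d) for d in 1..9.
BENFORD_WEIGHTS = [math.log10(1 + 1 / d) for d in DIGITS]


def benford_amount(rng: random.Random, min_exp: int = 1, max_exp: int = 4) -> float:
    """Return an amount whose leading digit follows Benford's Law.

    Hypothetical helper: the exponent range (10.00 to 99,999.99 by default)
    is an assumption, not a value taken from the project.
    """
    first_digit = rng.choices(DIGITS, weights=BENFORD_WEIGHTS, k=1)[0]
    exponent = rng.randint(min_exp, max_exp)
    # Work in integer cents so rounding can never change the leading digit.
    low_cents = first_digit * 10 ** exponent * 100
    high_cents = (first_digit + 1) * 10 ** exponent * 100
    return rng.randrange(low_cents, high_cents) / 100


# A dedicated, seeded Random instance makes the output reproducible.
rng = random.Random(42)
sample = [benford_amount(rng) for _ in range(5)]
print(sample)
```

Passing a seeded random.Random instance around, rather than relying on the module-level generator, is one way to keep two runs with the same seed identical.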

Requirements

  • Python 3.10 or higher.
  • Runtime: Python standard library only.
  • Tests: pytest.

Installation

git clone https://github.com/busera/fake_transaction_data_generator.git
cd fake_transaction_data_generator
python -m pip install pytest

Usage

Generate the sample dataset:

python fake_generator.py -c config.json -o fake_transactions.csv -a irregularities.csv --seed 42

Arguments:

  • -c / --config: path to the configuration file. Default: config.json.
  • -o / --output: output CSV file for transactions. Default: fake_transactions.csv.
  • -a / --anomalies: output CSV file for irregularities. Default: irregularities.csv.
  • --seed: optional integer seed for reproducible output.
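
For readers who want to script around the tool, the arguments above map naturally onto argparse. The following is a sketch of how such a command line could be declared, using only the flags and defaults documented here; the function name build_parser and the help strings are assumptions, and the real fake_generator.py may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Generate fictional transaction data with labeled irregularities."
    )
    parser.add_argument("-c", "--config", default="config.json",
                        help="path to the configuration file")
    parser.add_argument("-o", "--output", default="fake_transactions.csv",
                        help="output CSV file for transactions")
    parser.add_argument("-a", "--anomalies", default="irregularities.csv",
                        help="output CSV file for irregularities")
    parser.add_argument("--seed", type=int, default=None,
                        help="optional integer seed for reproducible output")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(["-o", "run1.csv", "--seed", "42"])
print(args.config, args.output, args.anomalies, args.seed)
```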

Run a basic detector example after generating the CSV:

python examples/analyze_irregularities.py fake_transactions.csv

Run the tests:

python -m pytest -v

Output Files

Transactions CSV

Columns:

  • Transaction ID
  • Date
  • Type
  • Amount
  • Account
  • Description
  • Vendor

Irregularities CSV

Columns:

  • Transaction ID
  • Irregularity Type
  • Description

The irregularities file acts as a label ledger for validating detection logic or training exercises.
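
One way to use the ledger is to join it to the transactions on Transaction ID. The sketch below uses the column names listed above, but the sample rows, the Type and Account values, and the helper names load_labels and attach_labels are made up for illustration. Rows with an empty Transaction ID cannot be matched by ID, so this sketch leaves them unlabeled.

```python
import csv
import io
from collections import defaultdict


def load_labels(ledger_file):
    """Map each Transaction ID to the set of irregularity types recorded for it."""
    labels = defaultdict(set)
    for row in csv.DictReader(ledger_file):
        labels[row["Transaction ID"]].add(row["Irregularity Type"])
    return labels


def attach_labels(transactions_file, labels):
    """Return transaction rows with an added 'labels' list (empty if normal)."""
    labeled = []
    for row in csv.DictReader(transactions_file):
        tx_id = row["Transaction ID"]
        types = labels.get(tx_id, set()) if tx_id else set()
        labeled.append({**row, "labels": sorted(types)})
    return labeled


# Made-up sample data; with real output, open the two CSV files instead.
transactions_csv = io.StringIO(
    "Transaction ID,Date,Type,Amount,Account,Description,Vendor\n"
    "T1,2023-01-15,Expense,500.0,ACC-1,Monthly Utility Bill,City Power & Utilities\n"
    "T2,2023-02-03,Expense,98000.0,ACC-1,Office supplies,ABC Office Supplies\n"
)
ledger_csv = io.StringIO(
    "Transaction ID,Irregularity Type,Description\n"
    "T2,high_amount,Unusually large amount\n"
)

labeled = attach_labels(transactions_csv, load_labels(ledger_csv))
for row in labeled:
    print(row["Transaction ID"], row["labels"])
```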

Configuration

config.json controls the generated period, volume, vendors, recurring transactions, and irregularity counts.

Example structure:

{
  "num_transactions": 1000,
  "start_date": "2023-01-01",
  "end_date": "2023-12-31",
  "irregularities": {
    "high_amount": {"count": 10},
    "frequency_change": {"count": 10},
    "double_spend": {"count": 10},
    "missing_id": {"count": 2},
    "incorrect_date": {"count": 10},
    "mismatched_description": {"count": 10},
    "wrong_account": {"count": 10},
    "personal_expense": {"count": 10},
    "benford_violation": {"count": 10},
    "subtle_skimming": {"count": 30},
    "seasonal_anomaly": {"count": 10},
    "round_number_bias": {"count": 10},
    "cumulative_irregularity": {
      "enabled": true,
      "count": 36,
      "threshold": 0.005
    }
  },
  "vendors": ["ABC Office Supplies", "XYZ Tech Solutions", "123 Cleaning Services"],
  "personal_vendors": ["Luxury Resort & Spa", "Designer Clothing Co.", "Gourmet Restaurant"],
  "personal_expense_descriptions": ["Team Building Retreat", "Client Entertainment", "Office Decor"],
  "recurring_transactions": [
    {
      "vendor": "City Power & Utilities",
      "amount": 500,
      "day": 15,
      "description": "Monthly Utility Bill"
    }
  ]
}
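
Before generating data it can help to sanity-check a configuration of this shape. The checks below are a sketch under stated assumptions: the function name validate_config is hypothetical, and the rule that the total irregularity count should not exceed num_transactions is an illustrative guess rather than a documented constraint of the generator.

```python
import json
from datetime import date


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    start = date.fromisoformat(config["start_date"])
    end = date.fromisoformat(config["end_date"])
    if start > end:
        problems.append("start_date is after end_date")
    if config["num_transactions"] <= 0:
        problems.append("num_transactions must be positive")
    # Assumed rule: injected irregularities should fit within the total volume.
    total = sum(spec.get("count", 0) for spec in config.get("irregularities", {}).values())
    if total > config["num_transactions"]:
        problems.append(f"{total} irregularities exceed num_transactions")
    for item in config.get("recurring_transactions", []):
        if not 1 <= item["day"] <= 31:
            problems.append(f"invalid day for recurring vendor {item['vendor']!r}")
    return problems


# Trimmed-down version of the example structure, inlined for the sketch.
example = json.loads("""
{
  "num_transactions": 1000,
  "start_date": "2023-01-01",
  "end_date": "2023-12-31",
  "irregularities": {"high_amount": {"count": 10}, "missing_id": {"count": 2}},
  "recurring_transactions": [
    {"vendor": "City Power & Utilities", "amount": 500, "day": 15,
     "description": "Monthly Utility Bill"}
  ]
}
""")
print(validate_config(example))
```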

Irregularity Types

  • high_amount: unusually large transaction amount.
  • frequency_change: altered date for a recurring transaction.
  • double_spend: duplicate transaction with a new ID and timestamp offset.
  • missing_id: transaction with no transaction ID.
  • incorrect_date: future-dated transaction.
  • mismatched_description: description conflicts with transaction type.
  • wrong_account: invalid account-number pattern.
  • personal_expense: personal vendor/description disguised as business expense.
  • benford_violation: amount distribution biased away from Benford's Law.
  • subtle_skimming: small reductions across nearby transactions.
  • seasonal_anomaly: seasonal mismatch, such as summer equipment in winter.
  • round_number_bias: suspiciously round transaction amounts.
  • cumulative_irregularity: small cumulative increases across multiple transactions.
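
Several of these types can be caught with simple row-level rules. The following sketch checks three of them (missing_id, incorrect_date, round_number_bias) and is separate from examples/analyze_irregularities.py. It assumes the Date column is ISO 8601, that an as-of cutoff date is supplied by the caller, and that "round" means an exact multiple of 100; the function name flag_row and all three assumptions are illustrative.

```python
from datetime import datetime
from decimal import Decimal


def flag_row(row: dict, as_of: datetime, round_unit: Decimal = Decimal("100")) -> list[str]:
    """Return the rule-based flags raised for one transaction row."""
    flags = []
    if not row.get("Transaction ID", "").strip():
        flags.append("missing_id")
    # Assumes ISO dates such as 2023-06-01 or 2023-06-01 14:30:00.
    if datetime.fromisoformat(row["Date"]) > as_of:
        flags.append("incorrect_date")
    # Decimal avoids float remainder noise when testing for round amounts.
    amount = Decimal(row["Amount"])
    if amount > 0 and amount % round_unit == 0:
        flags.append("round_number_bias")
    return flags


# Made-up rows using the documented column names.
as_of = datetime(2024, 1, 1)
rows = [
    {"Transaction ID": "T1", "Date": "2023-03-10", "Amount": "123.45"},
    {"Transaction ID": "", "Date": "2023-03-11", "Amount": "77.10"},
    {"Transaction ID": "T3", "Date": "2024-06-01", "Amount": "5000.00"},
]
for row in rows:
    print(row["Transaction ID"] or "<none>", flag_row(row, as_of))
```

A rule like the round-amount check will also flag legitimate records, for example a recurring bill of exactly 500, which is where the irregularity ledger becomes useful for measuring false positives.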

Purpose and Use Cases

This project is useful for:

  • Audit analytics workshops and training.
  • Testing anomaly-detection scripts.
  • Demonstrating how labeled irregularities support validation.
  • Practicing data analysis without exposing confidential financial records.

Limitations

  • It does not model a full accounting subledger or ERP export.
  • Amounts are stored as floats, which is acceptable for synthetic training data but would not be for financial posting logic.
  • The included detector example is intentionally simple and rule-based.
  • Generated patterns are educational examples, not proof of real-world fraud typologies.
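
The float limitation matters mainly when exact equality or exact totals are needed. A minimal illustration using only the standard library shows the difference and how amounts read from the CSV can be converted to Decimal when an exercise requires exact arithmetic:

```python
from decimal import Decimal

# Three payments of 0.10 summed as binary floats do not equal 0.30 exactly.
float_total = sum([0.10] * 3)
# The same sum with Decimal, built from strings, is exact.
decimal_total = sum([Decimal("0.10")] * 3)

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True
```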

License

MIT License. See LICENSE.
