Digital payment systems face sophisticated fraud attacks where attackers drain accounts and disappear within seconds. Traditional rule-based systems (like threshold checks) fail because fraud is behavioural, not just transactional. This project analyses a large-scale financial transaction dataset (PaySim) using advanced SQL analytics and window functions to discover real fraud patterns and design a better detection strategy. Instead of simple counting queries, the project answers a business question:
How do fraudsters behave — and how can a bank stop them before money leaves the system?
The PaySim dataset simulates real mobile money transactions.
Each row represents one transaction with fields such as:
step→ time sequence of transactiontype→ transaction category (TRANSFER, CASH_OUT, PAYMENT, etc.)amount→ transaction valuenameOrig→ sender accountnameDest→ receiver accountoldbalanceOrg / newbalanceOrig→ sender balance before & afteroldbalanceDest / newbalanceDest→ receiver balance before & afterisFraud→ actual fraud labelisFlaggedFraud→ bank’s original detection rule
Dataset size: Millions of transactions (highly imbalanced fraud data)
- Measure the effectiveness of the bank’s current fraud detection rule
- Identify behavioural patterns of fraudsters
- Detect mule accounts and attack chains
- Analyse temporal fraud activity
- Build a rule-based fraud detection model using SQL only
- Provide actionable business recommendations
- MySQL
- Window Functions (LAG, LEAD, ROW_NUMBER, RUNNING SUM)
- Behavioural pattern detection
- Sequential transaction analysis
- Rule-based modelling
No machine learning was used — the focus is analytical thinking using SQL.
- Compared
isFlaggedFraudvsisFraud - Measured false positives and false negatives
Result:
The bank rule detects only a very small portion of actual frauds
Conclusion: The existing system relies on thresholds instead of behaviour
Fraud occurs almost exclusively in:
- TRANSFER
- CASH_OUT
Safe transaction types:
- PAYMENT
- DEBIT
- CASH_IN
Conclusion: Fraud is behavioural and targeted — not random
Using LEAD():
TRANSFER → CASH_OUT pattern discovered
Meaning:
Fraudsters immediately withdraw transferred funds to avoid reversal
Major discovery:
- Sender balance becomes zero after fraud
- Amount ≈ full account balance
- Receiver accounts initially empty
Conclusion: Attack pattern = account draining into mule accounts
Using partitioned window analysis:
- Many victims → one destination account
- Accounts receive repeated fraudulent transfers
Conclusion: Fraud network structure exists (hub-and-spoke model)
- Fraud occurs in bursts
- Multiple rapid transactions
- Accounts disappear after a few transactions
Conclusion: Fraud attacks are planned sessions, not random activity
Fraudster workflow:
- Compromise account
- Transfer full balance
- Send to the mule account
- Immediately cash out
- Abandon account
A high-risk transaction typically satisfies:
- Transaction type = TRANSFER or CASH_OUT
- Amount equals sender balance
- Sender balance becomes zero
- Followed by a cash withdrawal
This rule detects fraud far better than the bank’s original rule.
- Monitor behaviour, not just amount thresholds
- Flag accounts performing full-balance transfers
- Detect transfer → withdrawal sequences in real time
- Freeze accounts receiving funds from multiple senders
- Implement session-based monitoring instead of single-transaction checks
- Fraud detection is a behavioural problem, not a classification problem
- SQL window functions can detect fraud patterns effectively
- Simple rule-based logic can outperform naive detection rules
- Import dataset using
LOAD DATA INFILE - Run analysis queries from
/sql_queries - Review the behavioural detection section
This project demonstrates analytical thinking using SQL instead of machine learning, simulating a real-world banking fraud investigation workflow. The focus is on reasoning, pattern detection, and business interpretation — skills expected from a data analyst in financial risk analytics.
Abhishek Singh Data Analysis Portfolio Project