A privacy-preserving Android application that detects SMS-based financial fraud (smishing) in real time using a three-tier hybrid detection pipeline powered by on-device AI. All processing runs locally with zero cloud dependency.
- Key Metrics
- System Architecture
- Detection Pipeline
- Tier 1: NLP Engine
- Tier 2: Domain Intelligence Module
- Tier 3: Heuristic Rule Engine
- Risk Scoring Formula
- Auto-Block & Background Protection
- Data Models
- Project Structure
- Tech Stack
- Getting Started
- Testing
| Metric | Value |
|---|---|
| Processing Location | 100% on-device (Edge AI) |
| Average Latency | <100ms per message |
| Detection Accuracy | >96% (combined pipeline) |
| False Positive Rate | <0.8% |
| Model Size | 487 KB (8-bit quantized TFLite) |
| Battery Impact | <0.003% per message |
| Privacy | Zero data transmission |
| Offline Capability | Fully functional |
| Supported Platform | Android 8.0+ (API 26+) |
| APK Size | ~9 MB |
Incoming SMS (BroadcastReceiver / Telephony Plugin)
│
▼
┌─────────────────────────────────┐
│ Message Preprocessing │ ← Text normalization, URL extraction,
│ (MessagePreprocessor) │ tokenization, feature calculation
└─────────────┬───────────────────┘
│
┌─────────┼─────────┬──────────────┐
▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌──────────┐
│ TIER 1 │ │ TIER 2 │ │ TIER 3 │ │STRUCTURAL│
│ NLP │ │ DIM │ │ RULES │ │ FEATURES │
│ (40%) │ │ (30%) │ │ (20%) │ │ (10%) │
└───┬────┘ └───┬────┘ └───┬────┘ └────┬─────┘
│ │ │ │
└──────────┴──────────┴────────────┘
│
▼
┌─────────────────────────────────┐
│ Risk Scoring Engine │ ← Weighted combination: 0.4·NLP + 0.3·DIM
│ (RiskEngine) │ + 0.2·Rules + 0.1·Structural
└─────────────┬───────────────────┘
│
▼
┌─────────────────────────────────┐
│ Decision Engine │ ← SAFE (≤0.3) / SUSPICIOUS (0.3-0.6)
│ (Classification + Action) │ / FRAUD (>0.6)
└─────────────┬───────────────────┘
│
┌─────────┼─────────┬──────────────┐
▼ ▼ ▼ ▼
Notification SQLite UI Update Auto-Block
Alert Log (Stream) (if FRAUD)
│
▼
blocked_messages
(SQLite)
- SMS Reception —
SmsListenerintercepts via thetelephonyplugin. Both foreground and background handlers are registered withlistenInBackground: true. Background handler runs even when the app is closed or killed. - Preprocessing (<5ms) —
MessagePreprocessornormalizes text, extracts URLs via regex, tokenizes, and computes structural features (uppercase ratio, currency symbols, URL presence). - Parallel Analysis (30-50ms each) — Three tiers plus structural scoring run concurrently via
Future.wait(). - Risk Scoring (<5ms) —
RiskEnginecombines all four signals using the weighted formula. - Decision + Action (<5ms) — Classifies risk level, then:
- FRAUD → Auto-blocks message (saves to
blocked_messagestable), shows red high-priority notification - SUSPICIOUS → Shows orange warning notification
- SAFE → No action
- All results are logged to
fraud_logsand pushed to the dashboard stream.
- FRAUD → Auto-blocks message (saves to
| Scenario | Active Components | Accuracy |
|---|---|---|
| No Internet | NLP + Offline DIM (TLD/entropy/brand checks) + Rules + Structural | ~90% |
| Internet Available | NLP + Full DIM (WHOIS + DNS + redirects) + Rules + Structural | ~96% |
| No URL in Message | NLP + Rules + Structural (DIM skipped) | ~88% |
SMS Received
→ SmsListener._handleIncomingSms()
→ RiskEngine._ensureInitialized() // Lazy-load TFLite model
→ DatabaseService.isTrustedSender() // Skip if whitelisted
→ MessagePreprocessor.cleanText() // Lowercase, normalize whitespace
→ MessagePreprocessor.extractUrls() // Regex URL extraction
→ Future.wait([ // PARALLEL EXECUTION
_runNlpAnalysis(cleanedText), // Tier 1: TFLite inference
_runRuleAnalysis(cleanedText), // Tier 3: Keyword matching
_runDomainAnalysis(message, urls), // Tier 2: Full domain analysis
])
→ MessagePreprocessor.calculateStructuralScore() // Structural features
→ Weighted combination → finalScore
→ RiskThresholds.getRiskLevel(finalScore)
→ DatabaseService.logResult()
→ StreamController.add(result) // Push to UI
→ if FRAUD:
DatabaseService.blockMessage(result) // Auto-block
NotificationService.showFraudAlert() // Red notification
→ if SUSPICIOUS:
NotificationService.showSuspiciousAlert() // Orange notification
| Phase | Target | Component |
|---|---|---|
| SMS Interception | 5ms | SmsListener |
| Preprocessing | 5-10ms | MessagePreprocessor |
| NLP Inference | 40-50ms | NlpClassifier (TFLite) |
| Domain Intelligence | 30-50ms | DomainIntelligence |
| Rule Engine | 5-10ms | RuleEngine |
| Risk Scoring | <5ms | RiskEngine |
| Total | 80-120ms | End-to-end |
Primary Model: Quantized MobileBERT variant
- Size: 487 KB (8-bit quantized)
- Input: 128 tokens max (padded/truncated)
- Output: Single fraud probability float (0.0–1.0)
- Runtime: TensorFlow Lite via
tflite_flutterpackage - Threads: 4 (parallel inference)
Fallback Model: Keyword-weighted scorer
- 27 fraud indicator keywords with individual weights (0.45–0.85)
- Activated when TFLite model fails to load or inference exceeds timeout
- Average score with multi-match boost:
avgScore + (matchCount - 1) * 0.05
Raw Text
→ _preprocessText() // Lowercase, strip non-alphanumeric
→ _tokenize() // Map words to vocabulary indices
→ Pad to 128 tokens // Zero-pad or truncate
→ Interpreter.run() // TFLite forward pass
→ Clamp output to [0.0, 1.0] // Final fraud probability
Loaded from assets/nlp/vocabulary.json — a Map<String, int> mapping words to token indices. Unknown words are silently skipped during tokenization.
| Keyword | Weight | Keyword | Weight |
|---|---|---|---|
| disconnected | 0.85 | kyc | 0.75 |
| suspended | 0.78 | lottery | 0.75 |
| deactivated | 0.75 | blocked | 0.72 |
| immediately | 0.72 | urgent | 0.70 |
| expired | 0.70 | verify | 0.65 |
| aadhaar | 0.65 | otp | 0.65 |
| Parameter | Value |
|---|---|
| Training Data | ~16,000 messages (UCI SMS Spam + Mendeley + Financial Fraud) |
| Distribution | ~40% fraud, ~60% legitimate |
| Features | TF-IDF (5000 features, unigram + bigram) |
| Classifier | Logistic Regression (liblinear, balanced class weight) |
| Test Accuracy | 91.2% |
| Precision | 87.8% |
| Recall | 88.5% |
| F1-Score | 88.1% |
The DIM analyzes every URL found in a message through a five-check pipeline plus additional signals. All URL analyses run in parallel via Future.wait().
URL extracted from message
→ UrlResolver.resolveRedirectChain() // Follow redirects (max 8 hops)
→ [In parallel after final domain known:]
DnsAnalyzer.analyze() // DNS-over-HTTPS via Cloudflare
WhoisAnalyzer.lookup() // RDAP/WHOIS domain age
→ DomainScorer.calculate() // Combine all signals → 0-100 score
| # | Signal | Max Points | Severity | Description |
|---|---|---|---|---|
| 1 | Redirect Chain | 25 | Medium-High | 8 pts per redirect hop. Phishing sites use chains to evade detection. |
| 2 | TLD Reputation | 25 | High | Flags .xyz, .tk, .ml, .cf, .ga, .gq, .pw, .buzz, .click, .link, .stream, .download, .loan, .win, .review, .party |
| 3 | Domain Age | 35 | Critical | Very new (<7 days) = 35pts, new (<30 days) = 22pts. Phishing domains are typically hours old. |
| 4 | Shannon Entropy | 18 | Medium | Entropy > 3.8 indicates algorithmically generated domain names (DGA). Formula: H = -Σ p(x) log₂ p(x) |
| 5 | Brand Impersonation | 30 | Critical | Checks if domain contains trusted brand names (SBI, HDFC, ICICI, Airtel, Jio, Paytm, etc.) on a non-official domain. |
| 6 | Homograph Attack | 45 | Critical | Detects Cyrillic/Greek Unicode lookalike characters mixed with Latin script. Punycode (xn--) = +20pts, mixed scripts = +25pts. |
| 7 | Typosquatting | 30+20 | Critical | Levenshtein distance against 22 legitimate Indian financial/utility domains. Similarity >75% = +30pts. Additional hyphen analysis = +7pts per excess hyphen. |
| 8 | DNS Profile | 47 | Medium-High | Missing MX/SPF/DMARC records = +15pts. Free/bulletproof hosting = +20pts. Suspicious nameservers = +12pts. |
| 9 | Direct IP / Shortener | 40/15 | Critical/Medium | IP address URL = +40pts. URL shortener (bit.ly, tinyurl, t.co, etc.) = +15pts. |
Final domain score: Clamped to 0–100, normalized to 0.0–1.0 for the risk formula.
// Character-level scan for confusable Unicode
for (final rune in domain.runes) {
if (_homoglyphs.containsKey(rune)) hasHomoglyph = true; // Cyrillic а,е,о,р,с...
if (rune in Latin range) hasLatin = true;
}
// Mixed scripts → strongest signal
if (hasHomoglyph && hasLatin) → +25 points
if (hasHomoglyph only) → +15 points
if (domain.contains('xn--')) → +20 points (Punycode)Covers 19 confusable characters across Cyrillic (а, е, о, р, с, у, х, і, ѕ, ј, һ, в) and Greek (ο, ν, τ, α, ε, κ, ι).
Compares the URL domain against 22 legitimate Indian financial domains:
sbi.co.in, onlinesbi.sbi, hdfcbank.com, icicibank.com,
axisbank.com, kotak.com, yesbank.in, bankofbaroda.in,
pnbindia.in, adanielectricity.com, tatpower.com,
bescom.co.in, mahadiscom.in, airtel.in, jio.com,
paytm.com, phonepe.com, googlepay.com, amazonpay.in,
irctc.co.in, incometax.gov.in, epfindia.gov.in
Uses an optimized two-row DP implementation of edit distance. Similarity threshold: >75% match on a non-identical domain triggers +30pts.
Results are cached in SQLite (domain_cache table) for 24 hours to avoid redundant lookups. Cache is keyed by domain name and stores the phishing boolean and score.
| Category | Keywords (with point weights) |
|---|---|
| Urgency (10-25 pts) | urgent, immediately, today, within hours, last chance, expire, final notice, act now, right now |
| Payment (8-20 pts) | pay, payment, bill, amount due, outstanding, transfer, upi, rupees |
| Threat (15-25 pts) | disconnect, disconnected, suspend, block, deactivate, terminated, cut off, service stopped |
| Verification (12-20 pts) | verify, kyc, update details, confirm your, validate, authenticate |
| Combination | Bonus Points |
|---|---|
| Urgency + Payment + Threat | +40 (Critical) |
| Urgency + Payment | +20 |
Score clamped to 0–100, normalized to 0.0–1.0 for the risk formula. Also returns a list of triggered rules with category labels for display in the alert detail screen.
FinalScore = 0.40 × P_nlp + 0.30 × S_domain/100 + 0.20 × R_rules/100 + 0.10 × F_structural
Where:
- P_nlp (0.0–1.0) — NLP model fraud probability
- S_domain (0–100) — Domain intelligence score, normalized to 0–1
- R_rules (0–100) — Rule engine score, normalized to 0–1
- F_structural (0.0–1.0) — Structural features score
| Feature | Points | Condition |
|---|---|---|
| URL present | +0.30 | Any URL in message |
| High uppercase | +0.20 | >30% uppercase characters |
| Currency symbols | +0.15 | Contains ₹, Rs, INR, rupee |
| Short + URL | +0.20 | Message < 100 chars with URL |
| Multiple exclamation marks | +0.15 | Two or more consecutive ! |
Output clamped to 0.0–1.0.
- 40% NLP — Captures semantic intent; most important for text-only analysis
- 30% Domain — Infrastructure analysis; critical for URL-based fraud
- 20% Rules — Pattern matching; high precision for known fraud templates
- 10% Structural — Auxiliary signal; quick computation, detects formatting anomalies
| Risk Score | Classification | User Action | Notification | Auto-Block |
|---|---|---|---|---|
| 0.00 – 0.30 | SAFE | None | None | No |
| 0.31 – 0.60 | SUSPICIOUS | Warning banner | Orange alert | No |
| 0.61 – 1.00 | FRAUD | High alert + blocked | Red alert (high priority) | Yes |
Message: "Your HDFC account blocked. Verify now: http://hdfc-verify-new.tk"
Tier 1 (NLP): P_nlp = 0.89
Tier 2 (DIM): S_domain = 95 (new .tk domain + brand impersonation)
Tier 3 (Rules): R_rules = 95 ("blocked" + "verify" pattern)
Structural: F_structural = 0.70 (has URL, moderate urgency)
FinalScore = 0.40(0.89) + 0.30(0.95) + 0.20(0.95) + 0.10(0.70)
= 0.356 + 0.285 + 0.190 + 0.070
= 0.901
Classification: FRAUD (>0.60)
The app uses the telephony plugin's listenInBackground: true mode combined with a @pragma('vm:entry-point') background handler. This ensures SMS analysis runs even when:
- The app is minimized / in the background
- The app has been swiped away from recents
- The device screen is off
Background SMS Received
→ _backgroundMessageHandler() // @pragma('vm:entry-point')
→ NotificationService.initialize() // Re-init (new isolate)
→ RiskEngine().analyzeMessage() // Full 3-tier pipeline
→ if FRAUD:
DatabaseService.blockMessage() // Save to blocked_messages table
NotificationService.showFraudAlert("Blocked message from: ...")
→ if SUSPICIOUS:
NotificationService.showSuspiciousAlert()
When any message scores >0.60 (FRAUD classification), the system automatically:
- Blocks the message — saves complete message data (sender, text, scores, URLs, rules) to the
blocked_messagesSQLite table - Sends a notification — high-priority red notification: "Blocked message from: [sender]"
- Logs the analysis — full detection result saved to
fraud_logsfor history
This happens identically in both foreground (via onSmsAnalyzed callback) and background (via _backgroundMessageHandler).
The app provides a dedicated Blocked Messages screen accessible from the dashboard:
- Dashboard banner — red gradient card showing blocked count, taps through to blocked list
- Blocked list — each blocked message shows:
- Sender + timestamp + risk score badge
- Full message text with URLs displayed struck-through (disabled links)
- Details button — bottom sheet with full score breakdown (NLP, Domain, Rules)
- Unblock button — removes message from blocked list
- Clear All — bulk delete all blocked messages
CREATE TABLE blocked_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sender TEXT,
message TEXT, -- Full original message text
risk_score REAL, -- 0.0 – 1.0
urls TEXT, -- Comma-separated detected URLs
rules TEXT, -- Pipe-separated triggered rules
domain_score INTEGER, -- 0 – 100
nlp_score REAL, -- 0.0 – 1.0
blocked_at TEXT -- ISO 8601 timestamp
);| Permission | Purpose |
|---|---|
RECEIVE_SMS |
Intercept SMS in real time (foreground + background) |
READ_SMS |
Read message content for analysis |
POST_NOTIFICATIONS |
Display fraud/suspicious alerts when app is backgrounded |
READ_PHONE_STATE |
Identify sender in background processing |
The telephony plugin automatically registers its own BroadcastReceiver for SMS_RECEIVED intents, so no custom receiver is needed in AndroidManifest.xml.
class SmsAnalysisResult {
final String originalMessage; // Raw SMS text
final String sender; // Phone number or alphanumeric ID
final String riskLevel; // SAFE / SUSPICIOUS / FRAUD
final double riskScore; // 0.0 – 1.0
final List<String> detectedUrls;
final List<String> triggeredRules;
final int domainScore; // 0 – 100
final double nlpScore; // 0.0 – 1.0
final DateTime timestamp;
final Map<String, dynamic> explanation; // Full score breakdown
}class DomainScore {
final String domain;
final String originalUrl;
final String finalUrl; // After redirect resolution
final int score; // 0 – 100
final List<ScoringIndicator> indicators; // Detailed findings
final List<String> ipAddresses;
final String registrar;
final String domainAge;
final DateTime? createdDate;
final List<String> nameservers;
}-- Detection logs
CREATE TABLE fraud_logs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sender TEXT,
risk_level TEXT, -- SAFE / SUSPICIOUS / FRAUD
risk_score REAL, -- 0.0 – 1.0
urls TEXT, -- Comma-separated
rules TEXT, -- Pipe-separated
domain_score INTEGER, -- 0 – 100
nlp_score REAL, -- 0.0 – 1.0
timestamp TEXT -- ISO 8601
);
-- Whitelisted senders (bypass analysis)
CREATE TABLE trusted_senders (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sender TEXT UNIQUE,
added_at TEXT
);
-- Domain analysis cache (24-hour TTL)
CREATE TABLE domain_cache (
id INTEGER PRIMARY KEY AUTOINCREMENT,
domain TEXT UNIQUE,
is_phishing INTEGER,
score INTEGER,
cached_at TEXT
);
-- Auto-blocked fraud messages
CREATE TABLE blocked_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sender TEXT,
message TEXT,
risk_score REAL,
urls TEXT,
rules TEXT,
domain_score INTEGER,
nlp_score REAL,
blocked_at TEXT
);lib/
├── main.dart # App entry, service initialization
├── core/
│ ├── constants/
│ │ ├── app_colors.dart # Color palette (Material 3)
│ │ ├── app_strings.dart # All UI strings
│ │ └── risk_thresholds.dart # Weights (40/30/20/10), thresholds
│ ├── theme/
│ │ └── app_theme.dart # Material 3 theme configuration
│ └── utils/
│ ├── extensions.dart # DateTime.timeAgo, riskColor, etc.
│ └── logger.dart # Tagged logging (NLP/DIM/Rule/SMS)
├── models/
│ ├── sms_analysis_result.dart # Core result model + toMap()
│ ├── domain_score.dart # Domain score model
│ └── nlp_result.dart # NLP result model
├── services/
│ ├── sms_listener.dart # SMS interception (foreground + background)
│ ├── risk_engine.dart # Orchestrator: parallel analysis + scoring
│ ├── rule_engine.dart # Keyword categories + combo bonuses
│ ├── preprocessor.dart # Text cleaning, URL extraction, structural score
│ ├── database_service.dart # SQLite CRUD (logs, trusted, cache, blocked)
│ ├── notification_service.dart # Local notifications (fraud + suspicious)
│ └── domain_intelligence/
│ ├── domain_intelligence.dart # DIM orchestrator (parallel URL analysis)
│ ├── url_resolver.dart # HTTP redirect chain follower (max 8 hops)
│ ├── dns_analyzer.dart # Cloudflare DNS-over-HTTPS queries
│ ├── domain_scorer.dart # 9-signal scorer + homograph + Levenshtein
│ └── whois_analyzer.dart # RDAP/WHOIS domain age lookups
├── nlp/
│ └── nlp_classifier.dart # TFLite inference + keyword fallback
└── ui/
├── dashboard_screen.dart # Stats, pie chart, blocked banner, recent alerts
├── alert_screen.dart # Score breakdown, URLs, rules, actions
├── blocked_screen.dart # Auto-blocked fraud messages with unblock
├── history_screen.dart # Full log with risk-level filters
├── settings_screen.dart # Trusted senders, scan inbox, data mgmt
└── widgets/
├── risk_badge.dart # Color-coded risk level badge
└── stat_card.dart # Statistics card widget
assets/
├── models/
│ └── fraud_model.tflite # Quantized TFLite model (487 KB)
├── nlp/
│ ├── vocabulary.json # Word → token index mapping
│ ├── model_weights.json # Model weight parameters
│ └── nlp_config.json # NLP configuration
└── data/
├── keywords.json # Fraud keyword database
└── tld_list.json # Suspicious TLD list
test/
├── widget_test.dart # App load test
├── rule_engine_test.dart # Keyword detection + combo tests
├── preprocessor_test.dart # URL extraction, cleaning, structural score
└── risk_thresholds_test.dart # Weight sum, normalization, classification
| Component | Technology | Version |
|---|---|---|
| Framework | Flutter | >=3.0.0 |
| Language | Dart | >=3.0.0 |
| ML Runtime | TensorFlow Lite | tflite_flutter ^0.12.1 |
| SMS Access | Telephony plugin | telephony ^0.2.0 |
| Database | SQLite | sqflite ^2.3.0 |
| DNS Queries | Cloudflare DNS-over-HTTPS | http ^1.2.0 |
| WHOIS | RDAP.org API | http ^1.2.0 |
| Notifications | Flutter Local Notifications | ^17.0.0 |
| Permissions | Permission Handler | ^11.3.0 |
| Charts | fl_chart | ^0.68.0 |
| Fonts | Google Fonts | ^6.2.1 |
| Date Formatting | intl | ^0.19.0 |
- Flutter SDK >= 3.0.0
- Android NDK 27.0.12077973
- Android device or emulator (API 26+)
- Java 17
# Clone the repository
git clone https://github.com/yourusername/smisdroid.git
cd smisdroid
# Install dependencies
flutter pub get
# Run on connected device
flutter run| Permission | Purpose |
|---|---|
RECEIVE_SMS |
Intercept incoming SMS in real time |
READ_SMS |
Access SMS inbox for batch scanning |
SEND_SMS |
Reserved for future features |
INTERNET |
Domain intelligence (WHOIS, DNS) |
ACCESS_NETWORK_STATE |
Detect online/offline mode |
POST_NOTIFICATIONS |
Display fraud/suspicious alerts |
READ_PHONE_STATE |
Sender identification |
flutter build apk --releaseflutter test| Test File | Coverage Area | Tests |
|---|---|---|
rule_engine_test.dart |
Keyword detection, combo bonuses, edge cases | 8 |
preprocessor_test.dart |
URL extraction, text cleaning, structural scoring | 11 |
risk_thresholds_test.dart |
Weight validation, normalization, classification | 10 |
widget_test.dart |
App initialization | 2 |
| Total | 31 |
| Message | Expected | Score | Action |
|---|---|---|---|
| "Your electricity bill Rs.2500 due immediately! Pay at http://bill-pay.xyz or disconnection TODAY!" | FRAUD | >0.80 | Auto-blocked + notification |
| "Dear Customer, Rs.5000 debited from account. Bal: Rs.25000. -ICICI" | SAFE | <0.30 | None |
| "Your account is blocked. Verify at http://verify.xyz" | FRAUD | >0.70 | Auto-blocked + notification |
| "Update KYC details urgently" | SUSPICIOUS | 0.3-0.6 | Warning notification |
| "Meeting at 3pm tomorrow. See you there!" | SAFE | <0.10 | None |
- Jisasoftech. (2025). How India's fintech fraud patterns are evolving in 2025.
- TensorFlow. TensorFlow Lite for mobile machine learning. https://www.tensorflow.org/lite
- Android Developers. SMS best practices and implementation. https://developer.android.com/guide/topics/sms
- NIST. Cybersecurity framework for phishing prevention. https://www.nist.gov
- UCI Machine Learning Repository. SMS Spam Collection Dataset.
- Cloudflare. DNS over HTTPS. https://developers.cloudflare.com/1.1.1.1/dns-over-https
Version: 1.0.0 License: MIT