The first AI verification engine to combine five deterministic signals
with statistically proven superiority over cosine-only baselines.
Live Demo | Research Benchmark | API Reference | SDKs
TruthLayer is a production serverless API that sits between your AI model and your users. Every AI output passes through a five-signal deterministic verification pipeline before it reaches production, catching hallucinations that embedding models fundamentally cannot see.
A single API call extracts claims, embeds them via Amazon Bedrock Titan V2, runs four independent contradiction detectors, applies calibrated confidence scoring, and checks the entire response for internal self-consistency, all in under one second, at $1.50/month operational cost, with zero external Python dependencies.
AWS 10,000 AIdeas Competition: Top 50 Finalist. April 2026.
TruthLayer's architecture is built on a foundational insight: no single signal is sufficient. Cosine similarity catches semantic drift. Entity contradiction catches what embeddings cannot see by design. Platt scaling converts raw scores into calibrated probabilities. McNemar's test provides mathematical proof. Intra-response consistency catches what no source-document checker ever has.
Claims and source chunks are embedded using Amazon Bedrock Titan Embeddings V2 (1024-dimensional vectors). Cosine similarity determines topical relevance between each claim and its best-matching source chunk.
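As a rough illustration of this first signal, the sketch below computes cosine similarity and picks the best-matching chunk using only the standard library. The function names are hypothetical, and toy 3-dimensional vectors stand in for real Titan embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def best_match(
    claim_vec: Sequence[float], chunk_vecs: List[Sequence[float]]
) -> Tuple[int, float]:
    """Return (index, similarity) of the chunk closest to the claim."""
    if not chunk_vecs:
        raise ValueError("no source chunks to compare against")
    scores = [cosine_similarity(claim_vec, c) for c in chunk_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy vectors; real inputs would be 1024-dimensional embeddings.
claim = [1.0, 0.0, 1.0]
chunks = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [0.5, 0.5, 0.5]]
index, score = best_match(claim, chunks)
print(index, round(score, 4))
```

The second chunk wins here because it points in almost the same direction as the claim vector.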
Limitation by design: a model trained on language distribution cannot reliably distinguish "founded in 2003" from "founded in 2013". This is not a bug; it is a fundamental property of distributed representations. Signals 2 through 5 exist specifically to handle this.
Unit-aware regex detection of value mismatches. Compares (value, unit) tuples, not raw substrings, eliminating false positives from substring collisions.
Claim: "The SLA guarantees 99.9% uptime."
Source: "The SLA guarantees 99.99% uptime."
Signal: NUMERICAL_MISMATCH | Severity: CRITICAL | Penalty: ×0.35
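A minimal sketch of the (value, unit) comparison described above might look like the following. The regex, the short unit whitelist, and the function names are assumptions made for illustration, not TruthLayer's actual patterns.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical unit whitelist for illustration; a real engine would cover more units.
_VALUE_UNIT = re.compile(
    r"(\d+(?:\.\d+)?)\s*(%|mg|kg|g|ms|s|mb|gb)(?![a-z])",
    re.IGNORECASE,
)


def extract_value_units(text: str) -> Set[Tuple[float, str]]:
    """Collect every (value, unit) tuple found in the text."""
    return {(float(v), u.lower()) for v, u in _VALUE_UNIT.findall(text)}


def numerical_mismatches(claim: str, source: str) -> List[Tuple[float, str]]:
    """Return claim tuples whose unit appears in the source with a different value."""
    source_by_unit: Dict[str, Set[float]] = {}
    for value, unit in extract_value_units(source):
        source_by_unit.setdefault(unit, set()).add(value)

    mismatches = []
    for value, unit in sorted(extract_value_units(claim)):
        known = source_by_unit.get(unit)
        if known and value not in known:
            mismatches.append((value, unit))
    return mismatches


print(numerical_mismatches(
    "The SLA guarantees 99.9% uptime.",
    "The SLA guarantees 99.99% uptime.",
))
```

A raw substring test would find "99.9" inside "99.99" and treat the claim as supported; comparing parsed tuples avoids that collision.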
Three-layer negation engine:
- S2A vicinity guard: 3-stage decision tree distinguishing predicate negation ("not permitted") from requirement-conditional language ("not to exceed")
- Soft negation words: never, no, without, false in a windowed anchor scan
- Semantic antonym pairs: 46+ bidirectional pairs (permitted↔prohibited, safe↔contraindicated)
Claim: "User data is not shared with third parties."
Source: "User data is shared with third parties."
Signal: S2A_NEGATION_POLARITY | Severity: HIGH | Penalty: ×0.38
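The negation layers can be illustrated with a much simplified sketch. The word lists below are small stand-ins taken from the examples above, the window size is arbitrary, and including "not" among the scanned words is an assumption; the real three-stage guard is more involved.

```python
import re
from typing import List

# Illustrative word lists; the engine described above uses 46+ antonym pairs.
NEGATION_WORDS = {"not", "never", "no", "without", "false"}
CONDITIONAL_PHRASES = ("not to exceed",)  # "not" limits a quantity here
ANTONYM_PAIRS = {("permitted", "prohibited"), ("safe", "contraindicated")}


def _tokens(text: str) -> List[str]:
    lowered = text.lower()
    for phrase in CONDITIONAL_PHRASES:
        # Drop the leading "not" so the guard phrase is not read as negation.
        lowered = lowered.replace(phrase, phrase.split(" ", 1)[1])
    return re.findall(r"[a-z']+", lowered)


def is_negated(text: str, anchor: str, window: int = 3) -> bool:
    """True if a negation word appears within `window` tokens before `anchor`."""
    tokens = _tokens(text)
    if anchor not in tokens:
        return False
    pos = tokens.index(anchor)
    return any(tok in NEGATION_WORDS for tok in tokens[max(0, pos - window):pos])


def polarity_conflict(claim: str, source: str, anchor: str) -> bool:
    """True if both texts contain `anchor` but only one of them negates it."""
    if anchor not in _tokens(claim) or anchor not in _tokens(source):
        return False
    return is_negated(claim, anchor) != is_negated(source, anchor)


def antonym_conflict(claim: str, source: str) -> bool:
    """True if the claim uses one word of an antonym pair and the source the other."""
    c, s = set(_tokens(claim)), set(_tokens(source))
    return any((a in c and b in s) or (b in c and a in s) for a, b in ANTONYM_PAIRS)


print(polarity_conflict(
    "User data is not shared with third parties.",
    "User data is shared with third parties.",
    anchor="shared",
))
```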
Deterministic regex-based detection of calendar year disjointness and relative-duration mismatches.
Claim: "GDPR was adopted in 2014."
Source: "GDPR was adopted in 2016."
Signal: TEMPORAL_CONTRADICTION | Severity: CRITICAL | Penalty: ×0.35
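Year disjointness, the simpler half of this signal, can be sketched as below. The year pattern (1500 to 2099) and the function names are assumptions for illustration; relative-duration checks are not covered.

```python
import re
from typing import Set

# Assumed range of plausible calendar years: 1500 through 2099.
_YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def years_in(text: str) -> Set[int]:
    """Extract every four-digit calendar year mentioned in the text."""
    return {int(y) for y in _YEAR.findall(text)}


def temporal_contradiction(claim: str, source: str) -> bool:
    """Flag only when both sides mention years and share none of them."""
    claim_years, source_years = years_in(claim), years_in(source)
    return (
        bool(claim_years)
        and bool(source_years)
        and claim_years.isdisjoint(source_years)
    )


print(temporal_contradiction("GDPR was adopted in 2014.", "GDPR was adopted in 2016."))
```

Requiring years on both sides keeps the check from firing when the source simply does not mention a date.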
No hallucination verifier in existence implements this. After all claims are verified against source documents, TruthLayer runs compute_alignment_penalty() in both directions for every ordered pair of claims (i, j) within the same AI response:
∀ i < j : detect_contradiction(claim_i, claim_j)
          detect_contradiction(claim_j, claim_i)
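The pairwise rule above can be sketched as a loop over all index pairs with i < j, calling a detector in both directions. The detector signature and the toy milligram detector are hypothetical; only the output field names follow the `internal_consistency` structure used in the API response.

```python
import re
from itertools import combinations
from typing import Callable, Dict, List, Optional

Detector = Callable[[str, str], Optional[Dict]]


def check_internal_consistency(claims: List[str], detect: Detector) -> Dict:
    """Run `detect` on every pair of claims, in both directions."""
    conflicts = []
    for i, j in combinations(range(len(claims)), 2):  # all pairs with i < j
        # The detector may be asymmetric, so try (i, j) and then (j, i).
        hit = detect(claims[i], claims[j]) or detect(claims[j], claims[i])
        if hit:
            conflicts.append({"claim_a_index": i, "claim_b_index": j, **hit})
    return {
        "consistent": not conflicts,
        "conflict_count": len(conflicts),
        "conflicts": conflicts,
    }


def toy_detector(a: str, b: str) -> Optional[Dict]:
    """Hypothetical detector: flags disjoint milligram values."""
    va = set(re.findall(r"(\d+)\s*mg", a))
    vb = set(re.findall(r"(\d+)\s*mg", b))
    if va and vb and va.isdisjoint(vb):
        return {"signal": "NUMERICAL_MISMATCH", "severity": "CRITICAL", "penalty": 0.35}
    return None


report = check_internal_consistency(
    ["The dose is 400mg.", "Patients take 40mg daily.", "Take with food."],
    toy_detector,
)
print(report["consistent"], report["conflict_count"])
```

With n claims this performs n(n-1)/2 pair checks, which stays cheap for the handful of claims in a typical response.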
An AI could pass every source-document check and still internally contradict itself. Only TruthLayer catches that. The new internal_consistency field is present in every API response.
"internal_consistency": {
"consistent": false,
"conflict_count": 1,
"conflicts": [{
"claim_a_index": 0,
"claim_b_index": 1,
"signal": "NUMERICAL_MISMATCH",
"severity": "CRITICAL",
"explanation": "Claim states 400mg; other claim states 40mg.",
"penalty": 0.35
}]
}

flowchart TD
A["Client SDK / cURL"] -->|x-api-key header| B["API Gateway<br/>Auth + Rate Limit"]
B --> C["Lambda /verify"]
C --> D["Claim Extractor"]
D --> E{"Claims?"}
E -->|None| Z1["Return: trivially consistent"]
E -->|1+| F["Bedrock Titan V2<br/>1024-dim Embeddings"]
F --> G["DynamoDB<br/>Embedding Cache<br/>SHA-256 / 7-day TTL"]
G -->|cache hit| H["Cosine Similarity"]
G -->|cache miss| F2["Bedrock API"]
F2 --> H
H --> I
subgraph ECE ["Entity Contradiction Engine"]
I["Signal 2: Numerical<br/>Unit-aware regex"] --> J
J["Signal 3: Negation<br/>S2A guard + antonyms"] --> K
K["Signal 4: Temporal<br/>Year + duration regex"]
end
K --> L["Adjusted Similarity<br/>sim x penalty_1 x penalty_2"]
L --> M["Platt Scaling<br/>sigma x 12.07x minus 6.64"]
M --> N["Classification<br/>0.80+ VERIFIED · 0.55+ UNCERTAIN · else UNSUPPORTED"]
N --> O
subgraph S5 ["Signal 5: Intra-Response Consistency"]
O["Pairwise check: all i less than j<br/>penalty applied both directions"]
end
O --> P["API Response<br/>claims · summary · internal_consistency · metadata"]
P --> A
style A fill:#6366F1,stroke:#4F46E5,color:#fff
style B fill:#1e1e2e,stroke:#6366F1,color:#cdd6f4
style C fill:#1e1e2e,stroke:#6366F1,color:#cdd6f4
style D fill:#1e1e2e,stroke:#818CF8,color:#cdd6f4
style E fill:#374151,stroke:#6366F1,color:#fff
style F fill:#FF9900,stroke:#cc7a00,color:#fff
style F2 fill:#FF9900,stroke:#cc7a00,color:#fff
style G fill:#1e1e2e,stroke:#22C55E,color:#cdd6f4
style H fill:#1e1e2e,stroke:#818CF8,color:#cdd6f4
style L fill:#1e1e2e,stroke:#818CF8,color:#cdd6f4
style M fill:#6366F1,stroke:#4F46E5,color:#fff
style N fill:#1e1e2e,stroke:#22C55E,color:#cdd6f4
style P fill:#22C55E,stroke:#16A34A,color:#fff
style Z1 fill:#EF4444,stroke:#DC2626,color:#fff
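The embedding cache step in the diagram (SHA-256 key, 7-day TTL) could be approximated as follows. This is only a sketch: an in-memory dict stands in for DynamoDB, and the class and parameter names are hypothetical.

```python
import hashlib
import time
from typing import Callable, Dict, List, Tuple

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def cache_key(text: str) -> str:
    """SHA-256 hex digest of the text, used as the cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class CachedEmbedder:
    """Wraps an embedding function with a TTL cache (dict stands in for DynamoDB)."""

    def __init__(
        self,
        embed: Callable[[str], List[float]],
        now: Callable[[], float] = time.time,
    ) -> None:
        self._embed = embed
        self._now = now
        self._store: Dict[str, Tuple[List[float], float]] = {}  # key -> (vector, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, text: str) -> List[float]:
        key = cache_key(text)
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._now():
            self.hits += 1
            return entry[0]
        # Cache miss or expired entry: call the provider and store the result.
        self.misses += 1
        vector = self._embed(text)
        self._store[key] = (vector, self._now() + SEVEN_DAYS_SECONDS)
        return vector
```

Passing the clock in as a parameter makes expiry testable without waiting seven days.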
TruthLayer's advantage over a cosine-only baseline is not just asserted: it is tested for statistical significance with McNemar's test on the 300-case adversarial benchmark.
| Metric | TruthLayer (Five-Signal) | Cosine-Only Baseline |
|---|---|---|
| Accuracy | 90.33% | 86.67% |
| Precision | 95.33% | 82.00% |
| Recall | 86.67% | 84.00% |
| F1 Score | 90.79% | 83.00% |
| Avg Latency | ~925ms | ~925ms |
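As a quick sanity check, the F1 column can be recomputed from the precision and recall columns with the standard harmonic-mean formula. The helper name is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.9533, 0.8667) * 100, 2))  # 90.79
print(round(f1_score(0.82, 0.84) * 100, 2))      # 82.99
```

The baseline value agrees with the table's 83.00% to within rounding of the published precision and recall.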
McNemar's Test (300-case benchmark, Yates' continuity correction):
| | Cosine Correct | Cosine Wrong |
|---|---|---|
| TruthLayer Correct | a | b (entity engine wins) |
| TruthLayer Wrong | c | d |
- H₀: b = c (classifiers equivalent)
- H₁: b > c (TruthLayer superior)
- χ² statistic computed; see BENCHMARK.md for exact contingency table
- p < 0.05: H₀ rejected with real Bedrock embeddings
See BENCHMARK.md for the full research-grade whitepaper: adversarial case design, Platt scaling derivation, and the complete McNemar contingency table.
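For readers who want to see the test itself, here is a minimal stdlib sketch of McNemar's test with Yates' continuity correction and an erfc-based p-value. The discordant counts used in the example (b = 20, c = 5) are made up for illustration and are not the benchmark's numbers; those are in BENCHMARK.md.

```python
import math
from typing import Tuple


def mcnemar_test(b: int, c: int) -> Tuple[float, float]:
    """Return (chi2, p_value) for discordant counts b and c.

    Uses Yates' continuity correction: chi2 = (|b - c| - 1)^2 / (b + c).
    The p-value is the two-sided tail of a chi-square with 1 degree of
    freedom, which equals erfc(sqrt(chi2 / 2)).
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0  # no discordant pairs: nothing to distinguish
    chi2 = (abs(b - c) - 1) ** 2 / n
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p = mcnemar_test(b=20, c=5)
print(round(chi2, 2), p < 0.05)
```

Only the discordant cells b and c enter the statistic; cases where both classifiers agree carry no information about which one is better.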
Every claim receives a Platt-scaled probability, not a rescaled cosine similarity.
Before: confidence = similarity_score × 100 (a rescaled distance, not a probability)
After: confidence = σ(12.07 × sim − 6.64) × 100 (a calibrated probability)
Boundary conditions derived analytically from the 300-case benchmark:
- σ(12.07 × 0.80 − 6.64) = 0.9533, which matches measured precision at the VERIFIED threshold
- σ(12.07 × 0.55 − 6.64) = 0.5000, which is 50% at the uncertain midpoint
When TruthLayer reports 95.3% confidence, it means: of all claims scored at this level on the benchmark, 95.3% are factually correct. This is a calibrated posterior probability, not a cosine distance.
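The calibration step can be reproduced in a few lines. The constants and thresholds below come from this README; the function names are hypothetical, as is the assumption that penalties multiply the similarity and that thresholds apply to the adjusted similarity before calibration.

```python
import math
from typing import Iterable

PLATT_A = 12.07
PLATT_B = -6.64
VERIFIED_THRESHOLD = 0.80
UNCERTAIN_THRESHOLD = 0.55


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def calibrated_confidence(similarity: float) -> float:
    """Platt-scaled confidence, as a percentage."""
    return sigmoid(PLATT_A * similarity + PLATT_B) * 100.0


def adjusted_similarity(similarity: float, penalties: Iterable[float]) -> float:
    """Multiply the raw similarity by each contradiction penalty."""
    for penalty in penalties:
        similarity *= penalty
    return similarity


def classify(similarity: float) -> str:
    """Assumed rule: thresholds apply to the (adjusted) similarity."""
    if similarity >= VERIFIED_THRESHOLD:
        return "VERIFIED"
    if similarity >= UNCERTAIN_THRESHOLD:
        return "UNCERTAIN"
    return "UNSUPPORTED"


print(round(calibrated_confidence(0.80), 2))  # 95.33
penalised = adjusted_similarity(0.90, [0.35])  # one CRITICAL penalty
print(classify(penalised))  # UNSUPPORTED
```

A single ×0.35 penalty is enough to drop even a 0.90 similarity below the UNCERTAIN floor, which is how a contradiction overrides a high topical match.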
pip install truthlayer-sdk

from truthlayer import TruthLayer
tl = TruthLayer(api_key="tl_your_key")
result = tl.verify(
ai_response="Python 3.11 was released in October 2022. It is 25% faster than 3.10.",
source_documents=["Python 3.11 was released on October 24, 2022, with up to 25% performance gains."]
)
for claim in result.claims:
print(f"{claim.status:12} {claim.confidence:5.1f}% {claim.text}")
# Output:
# VERIFIED 89.4% Python 3.11 was released in October 2022.
# VERIFIED 84.2% It is 25% faster than 3.10.
# Internal consistency
print("Self-consistent:", result.internal_consistency["consistent"])

curl -X POST https://qoa10ns4c5.execute-api.us-east-1.amazonaws.com/prod/verify \
-H "Content-Type: application/json" \
-H "x-api-key: tl_your_key" \
-d '{
"ai_response": "GDPR fines can be up to 4% of annual revenue.",
"source_documents": ["GDPR non-compliance can lead to fines of up to 4% of annual global turnover."]
}'

{
"claims": [
{
"text": "GDPR fines can be up to 4% of annual revenue.",
"status": "VERIFIED",
"confidence": 87.6,
"similarity_score": 0.8241,
"matched_source": "GDPR non-compliance can lead to fines of up to 4%...",
"contradiction_evidence": null
}
],
"summary": {
"verified": 1,
"uncertain": 0,
"unsupported": 0
},
"internal_consistency": {
"consistent": true,
"conflict_count": 0,
"conflicts": []
},
"metadata": {
"latency_ms": 893,
"embedding_ms": 720,
"provider": "BedrockEmbeddingProvider",
"total_claims": 1,
"source_chunks": 3,
"cache_hits": 2,
"cache_misses": 1,
"calibration_model": "platt_scaling_n300"
}
}

When a contradiction is detected, claim-level contradiction_evidence is populated:
"contradiction_evidence": {
"signal": "NUMERICAL_MISMATCH",
"severity": "CRITICAL",
"penalty_applied": 0.35,
"claim_fragment": "400mg",
"source_fragment": "40mg",
"explanation": "Claim states 400mg; source specifies 40mg. Numerical value mismatch."
}

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | `/verify` | Yes | Verify AI response against sources |
| POST | `/documents` | Yes | Upload a source document |
| GET | `/documents` | Yes | List all documents |
| GET | `/documents/{id}` | Yes | Get a specific document |
| DELETE | `/documents/{id}` | Yes | Delete a document |
| GET | `/analytics?action=summary` | Yes | Aggregate verification statistics |
| GET | `/analytics?action=trends&days=7` | Yes | Daily trend data |
| GET | `/health` | No | Health check (no auth) |
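For callers who prefer not to install the SDK, a `/verify` request can be built with the standard library alone. This is a sketch under assumptions: the helper names are hypothetical, the base URL is whatever your deployment exposes, and the payload fields mirror the cURL example in this README.

```python
import json
import urllib.request
from typing import Dict, List


def build_verify_request(
    base_url: str, api_key: str, ai_response: str, source_documents: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST /verify request."""
    payload = json.dumps(
        {"ai_response": ai_response, "source_documents": source_documents}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/verify",
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def verify(
    base_url: str, api_key: str, ai_response: str, source_documents: List[str]
) -> Dict:
    """Send the request and decode the JSON response."""
    request = build_verify_request(base_url, api_key, ai_response, source_documents)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `verify("https://YOUR-API.execute-api.us-east-1.amazonaws.com/prod", "tl_your_key", text, sources)` should return the same JSON structure shown in the response example.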
TruthLayer/
├── src/                                  # Core verification engine (Lambda Layer)
│   ├── embeddings/
│   │   ├── base.py                       # EmbeddingProvider abstract interface
│   │   ├── bedrock_provider.py           # Titan V2 (1024-dim, us-east-1)
│   │   └── cached_provider.py            # DynamoDB cache (SHA-256, 7-day TTL)
│   ├── verifier/
│   │   ├── verifier.py                   # Orchestrator + _check_internal_consistency()
│   │   ├── claim_extractor.py            # Sentence-boundary claim splitter
│   │   ├── similarity_engine.py          # Cosine similarity, best-match selection
│   │   ├── confidence_scorer.py          # Threshold classifier (VERIFIED/UNCERTAIN/UNSUPPORTED)
│   │   ├── entity_checker.py             # Signals 2-4: numerical, negation, temporal
│   │   └── calibration.py                # Platt scaling (A=12.07, B=-6.64)
│   ├── stats/
│   │   └── mcnemar.py                    # McNemar's test, erfc-based p-value (stdlib only)
│   ├── mocks/
│   │   └── embedding_provider.py         # MockEmbeddingProvider (TF-IDF, no AWS needed)
│   ├── utils/
│   │   ├── auth.py                       # SHA-256 API key validation + rate limiting
│   │   └── text_splitter.py              # Document chunker (500 chars, 50 overlap)
│   └── config.py                         # All thresholds and environment variables
├── lambda/                               # AWS Lambda handlers
│   ├── verify/handler.py                 # POST /verify
│   ├── documents/handler.py              # CRUD /documents
│   ├── analytics/handler.py              # GET /analytics
│   └── health/handler.py                 # GET /health
├── benchmarks/
│   ├── adversarial_benchmark.py          # 300-case test suite (numerical/negation/superlative)
│   ├── run_benchmarks.py                 # Precision/recall/F1 measurement
│   ├── run_mcnemar.py                    # McNemar's statistical proof runner
│   └── fit_calibration.py                # Platt scaling reproducibility script
├── tests/                                # 286 pytest unit tests
│   ├── test_entity_checker.py            # Contradiction engine (157 cases)
│   ├── test_calibration.py               # Platt scaling (44 cases)
│   ├── test_mcnemar.py                   # Statistical test (46 cases)
│   └── test_internal_consistency.py      # Intra-response (39 cases)
├── sdk/
│   ├── python/truthlayer/                # Python SDK (zero dependencies)
│   └── js/truthlayer.ts                  # TypeScript SDK (native fetch)
├── integrations/
│   ├── langchain_integration.py          # LangChain output verifier
│   └── fastapi_middleware.py             # FastAPI request middleware
├── dashboard/                            # Next.js 16 (deployed on Vercel)
├── docs/                                 # Technical documentation (11 documents)
├── examples/                             # Integration demos (LangChain, FastAPI, legal)
├── BENCHMARK.md                          # Research-grade benchmark whitepaper
├── template.yaml                         # AWS SAM IaC (single source of truth)
└── samconfig.toml                        # SAM deployment configuration
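The document chunker listed in the tree (500 characters, 50 overlap) is simple enough to sketch. The function name and edge-case handling below are assumptions and may differ from `text_splitter.py`.

```python
from typing import List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # each chunk starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(1200))
pieces = split_text(sample)
print([len(p) for p in pieces])  # [500, 500, 300]
```

The overlap means a claim that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.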
- AWS CLI configured (us-east-1, Bedrock enabled)
- AWS SAM CLI
- Python 3.9+, Node.js 18+
# Step 1: Sync source into Lambda Layer
python -c "import shutil; shutil.copytree('src', 'layer/python/src', dirs_exist_ok=True)"
# Step 2: Build and deploy
sam build
sam deploy
# Verify deployment
curl https://YOUR-API.execute-api.us-east-1.amazonaws.com/prod/health

# All 286 tests: MockEmbeddingProvider, zero AWS calls
pytest tests/ -v
# Statistical proof (offline, deterministic)
python benchmarks/run_mcnemar.py
# Calibration audit (reproduce Platt constants)
python benchmarks/fit_calibration.py

cd dashboard
cp .env.local.example .env.local # Add your API URL and key
npm install && npm run dev          # http://localhost:3000

| Variable | Default | Description |
|---|---|---|
| `BEDROCK_MODEL_ID` | `amazon.titan-embed-text-v2:0` | Embedding model |
| `BEDROCK_REGION` | `us-east-1` | AWS region |
| `BEDROCK_EMBEDDING_DIMENSION` | `1024` | Vector dimensions |
| `VERIFIED_THRESHOLD` | `0.80` | VERIFIED classification floor |
| `UNCERTAIN_THRESHOLD` | `0.55` | UNCERTAIN classification floor |
| `DOCUMENTS_TABLE` | `TruthLayerDocuments` | DynamoDB table |
| `EMBEDDINGS_TABLE` | `TruthLayerEmbeddings` | Embedding cache table |
| `APIKEYS_TABLE` | `TruthLayerApiKeys` | API key table |
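One way these variables might be read with their defaults is shown below. The `Settings` class and `load_settings` function are hypothetical and not necessarily how `config.py` is organised; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    bedrock_model_id: str
    bedrock_region: str
    bedrock_embedding_dimension: int
    verified_threshold: float
    uncertain_threshold: float
    documents_table: str
    embeddings_table: str
    apikeys_table: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, falling back to documented defaults."""
    settings = Settings(
        bedrock_model_id=env.get("BEDROCK_MODEL_ID", "amazon.titan-embed-text-v2:0"),
        bedrock_region=env.get("BEDROCK_REGION", "us-east-1"),
        bedrock_embedding_dimension=int(env.get("BEDROCK_EMBEDDING_DIMENSION", "1024")),
        verified_threshold=float(env.get("VERIFIED_THRESHOLD", "0.80")),
        uncertain_threshold=float(env.get("UNCERTAIN_THRESHOLD", "0.55")),
        documents_table=env.get("DOCUMENTS_TABLE", "TruthLayerDocuments"),
        embeddings_table=env.get("EMBEDDINGS_TABLE", "TruthLayerEmbeddings"),
        apikeys_table=env.get("APIKEYS_TABLE", "TruthLayerApiKeys"),
    )
    # Sanity check (an added assumption): the UNCERTAIN floor sits below the VERIFIED floor.
    if not settings.uncertain_threshold < settings.verified_threshold:
        raise ValueError("UNCERTAIN_THRESHOLD must be below VERIFIED_THRESHOLD")
    return settings


defaults = load_settings({})
print(defaults.verified_threshold, defaults.bedrock_embedding_dimension)
```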
| Limitation | What TruthLayer Does |
|---|---|
| Cosine similarity is blind to numerical transpositions | Signal 2 catches $400 vs $40, 99.9% vs 99.99% |
| Embeddings encode negation into similar vector space | Signal 3 catches "not permitted" vs "permitted" explicitly |
| No verifier catches self-contradictory AI output | Signal 5 checks every claim pair for internal coherence |
| Reported confidence scores are not probabilities | Platt scaling makes confidence a calibrated posterior |
| "Better performance" claims can't be trusted | McNemar's test provides formal statistical proof |
| High operational cost | $1.50/month on AWS serverless with DynamoDB caching |
| Layer | Technology |
|---|---|
| Verification Engine | Python 3.9, stdlib only (re, math, dataclasses) |
| Embeddings | Amazon Bedrock Titan Embeddings V2 (1024-dim) |
| Inference | AWS Lambda (arm64, 512 MB) |
| Gateway | AWS API Gateway (REST, usage plans) |
| Storage | AWS DynamoDB (on-demand, 4 tables) |
| IaC | AWS SAM (CloudFormation) |
| Dashboard | Next.js 16, TypeScript, Framer Motion (Vercel) |
| SDKs | Python (stdlib), TypeScript (fetch) |
| Tests | pytest, 286 tests, MockEmbeddingProvider |
© 2026 Prakhar Shukla. All Rights Reserved.
Provided for portfolio and competition review. See LICENSE.
Contact: prakhar230125@gmail.com
TruthLayer: Five signals. Mathematical proof. $1.50/month.
The only AI verification engine with statistically proven superiority.