From db9891c1473abfa4f95c0acbca52bfeecaf8f141 Mon Sep 17 00:00:00 2001
From: EgonBot
Date: Wed, 25 Feb 2026 18:16:25 +0000
Subject: [PATCH 1/2] docs: add domain profile schema

---
 docs/domain-profiles/domain-profile-schema.md | 165 ++++++++++++++++++
 1 file changed, 165 insertions(+)
 create mode 100644 docs/domain-profiles/domain-profile-schema.md

diff --git a/docs/domain-profiles/domain-profile-schema.md b/docs/domain-profiles/domain-profile-schema.md
new file mode 100644
index 00000000..39cf7ae9
--- /dev/null
+++ b/docs/domain-profiles/domain-profile-schema.md
@@ -0,0 +1,165 @@
+---
+title: Domain Profiles for FermiSanityCheck
+---
+
+# Domain Profile Schema
+
+Domain profiles encode the **context** that FermiSanityCheck needs to interpret numerical assumptions correctly. Each profile describes the currency/unit conventions, confidence language, and detection signals for a vertical, so the system can normalize messy input (e.g., invoices, emails, photos) into a consistent form before validating it.
+
+## Schema (YAML)
+
+```yaml
+profiles:
+  - id:
+    name:
+    description:
+    currency:
+      default:
+      aliases:
+        -
+    units:
+      metric: true
+      convert:
+        - from: "sqft"
+          to: "m2"
+          factor: 0.092903
+        - from: "lbs"
+          to: "kg"
+          factor: 0.453592
+    heuristics:
+      budget_keywords:
+        - budget
+        - cost
+        - invoice
+      timeline_keywords:
+        - days
+        - weeks
+        - timetable
+      team_keywords:
+        - crew
+        - workers
+      confidence_keywords:
+        high:
+          - guarantee
+          - have done this
+        medium:
+          - plan to
+          - intend
+        low:
+          - estimate
+          - hope
+    detection:
+      currency_signals:
+        - DKK
+        - "kr"
+      unit_signals:
+        - m2
+        - meter
+        - hours
+      keyword_signals:
+        - contractor
+        - materials
+        - carpentry
+```
+
+### Fields explained
+
+- **id / name / description**: a machine-readable identifier, a human-friendly name, and a short description of the domain profile.
+- **currency**: the canonical currency plus local aliases (e.g., DKK, kr, kroner), so every budget can be mapped to one reference currency before comparison.
+- **units**: a flag indicating whether the domain is metric-first, plus conversion factors for normalizing common imperial units that may still appear.
+- **heuristics**: keyword lists partitioned by topic (budget, timeline, team), plus per-tier confidence keywords.
+- **detection**: signals used to match incoming documents to this profile (currencies, units, domain-specific keywords). Consumed by the auto-detection logic.
+
+## Examples
+
+### Carpenter (DKK / metric crafts)
+
+```yaml
+- id: carpenter
+  name: Carpenter / small contractor
+  description: Tradespeople working with materials, local currencies, and hourly estimates.
+  currency:
+    default: DKK
+    aliases: ["kr", "dkk", "kroner"]
+  units:
+    metric: true
+    convert:
+      - from: "sqft"
+        to: "m2"
+        factor: 0.092903
+      - from: "ft"
+        to: "m"
+        factor: 0.3048
+  heuristics:
+    budget_keywords: ["material", "invoice", "estimate", "quote", "project cost"]
+    timeline_keywords: ["days", "weeks", "duration", "weather delay", "delivery"]
+    team_keywords: ["crew", "workers", "carpenter", "helper"]
+    confidence_keywords:
+      high: ["I've done this", "guarantee", "know"]
+      medium: ["plan to", "expect"]
+      low: ["estimate", "maybe", "roughly"]
+  detection:
+    currency_signals: ["DKK", "kr", "kroner"]
+    unit_signals: ["m2", "meter", "cm", "mm"]
+    keyword_signals: ["carpenter", "wood", "build", "materials", "client site"]
+```
+
+### Dentist (clinical services)
+
+```yaml
+- id: dentist
+  name: Dental clinic
+  description: Small medical/dental practices with patient capacity and procedural budgets.
+  currency:
+    default: USD
+    aliases: ["usd", "$", "dollars", "clinic credit"]
+  units:
+    metric: true
+    convert:
+      - from: "chair"
+        to: "unit"
+        factor: 1
+  heuristics:
+    budget_keywords: ["treatment", "insurance", "revenue", "procedure cost"]
+    timeline_keywords: ["week", "patient", "appointment", "quarter"]
+    team_keywords: ["doctor", "assistant", "hygienist"]
+    confidence_keywords:
+      high: ["patient guarantee", "clinically proven", "always"]
+      medium: ["plan to", "expect"]
+      low: ["estimate", "maybe"]
+  detection:
+    currency_signals: ["USD", "$", "dollars", "USD/per"]
+    keyword_signals: ["patient", "clinic", "treatment", "appointment", "revenue"]
+```
+
+### Personal Project (family trip / weight loss)
+
+```yaml
+- id: personal
+  name: Personal project/goal
+  description: Non-commercial plans with budgets, timelines, and behavioral commitments.
+  currency:
+    default: USD
+    aliases: ["usd", "$", "personal budget"]
+  units:
+    metric: true
+  heuristics:
+    budget_keywords: ["budget", "cost", "ticket", "transport"]
+    timeline_keywords: ["days", "weeks", "schedule"]
+    team_keywords: ["family", "participants", "people"]
+    confidence_keywords:
+      high: ["definitely", "committed"]
+      medium: ["plan to", "expect"]
+      low: ["maybe", "hope to"]
+  detection:
+    keyword_signals: ["family", "trip", "weight loss", "goal", "personal"]
+```
+
+## Domain detection logic (overview)
+
+1. **Scan incoming data** (assumption metadata, extracted keywords, currency mentions, units).
+2. **Score each profile** by counting matches across the lists in its `detection` section (`currency_signals`, `unit_signals`, `keyword_signals`).
+3. **Pick the highest-scoring profile** above a configurable threshold (default: majority signal). If no profile wins, fall back to the `default` profile (e.g., "general business").
+4. **Tag the assumption** with the chosen profile so the normalizer/validator applies the correct heuristics and conversions.
+
+This schema can be extended when new domains appear (A2A tokenization, manufacturing, etc.). Once the detection logic tags a profile, the normalizer can apply metric conversions, currency mapping, and confidence heuristics that match the domain's expectations.

From cb8fa7decfbe8ad845e8b4083e976e00c0be6aeb Mon Sep 17 00:00:00 2001
From: Larry the Laptop Lobster
Date: Wed, 25 Feb 2026 13:46:25 -0500
Subject: [PATCH 2/2] doc: PlanExe 2026 strategic proposal - Auditor for autonomous agents

Proposes repositioning PlanExe from plan generator to validation auditor
for autonomous agent loops. Addresses the 2026 market shift: agents don't
need hallucinated plans; they need trusted validation layers.

Phase 1 (FermiSanityCheck) + Phase 2 (domain profiles) implement this
strategy.

Seeks Simon's feedback on positioning before PR updates and Phase 3
planning.
---
 ...-FEB-2026-planexe-2026-strategy-auditor.md | 136 ++++++++++++++++++
 1 file changed, 136 insertions(+)
 create mode 100644 docs/26-FEB-2026-planexe-2026-strategy-auditor.md

diff --git a/docs/26-FEB-2026-planexe-2026-strategy-auditor.md b/docs/26-FEB-2026-planexe-2026-strategy-auditor.md
new file mode 100644
index 00000000..7280f564
--- /dev/null
+++ b/docs/26-FEB-2026-planexe-2026-strategy-auditor.md
@@ -0,0 +1,136 @@
+# PlanExe 2026: From Plan Generator to Autonomous Agent Auditor
+
+**Date:** 26 February 2026
+**Authors:** Larry, Egon, Simon (for review)
+**Status:** Strategic Proposal for Feedback
+
+---
+
+## Executive Summary
+
+PlanExe was originally positioned as a plan *generator*: take a vague idea and have an LLM dream up a business plan. In 2025, we learned that LLMs hallucinate plans with no grounding. By 2026, the market has moved on: agents don't need another plan generator that hallucinates.
+
+**What agents actually need:** A trusted auditing layer that validates whether the assumptions driving their autonomous workflows are sane.
+
+This proposal argues that PlanExe's real value in 2026 is as **the canonical auditing gate for autonomous agent loops**: not a plan creator, but a safety layer that catches hallucinated assumptions before they propagate downstream.
+
+---
+
+## The Problem: Autonomous Agents in Bubbles
+
+Agents run in isolation. They have no world model. They can't verify whether their assumptions are grounded in reality. They hallucinate:
+- Cost estimates that are off by orders of magnitude
+- Timelines that ignore real-world constraints
+- Team sizes that make no sense
+
+**The consequence:** Bad assumptions → bad downstream decisions → failed autonomy.
+
+Agents need an external oracle that can say: **"This assumption is grounded. Proceed."** or **"This looks hallucinated. Re-evaluate."**
+
+---
+
+## The Opportunity: Validation as a Service
+
+**What we've built in Phases 1-2:**
+
+1. **FermiSanityCheck (Phase 1)**: A validation gate that inspects every quantified assumption:
+   - Are bounds present and non-contradictory?
+   - Is the span ratio reasonable (≤100×)?
+   - Does each low-confidence claim have supporting evidence?
+   - Do the numbers pass domain heuristics?
+
+   **Output:** Structured JSON + Markdown that agents can parse deterministically.
+
+2. **Domain-Aware Auditor (Phase 2)**: Auto-detects the domain (carpenter, dentist, personal project) and normalizes values to that domain's conventions:
+   - Currency → domain default + EUR for comparison
+   - Units → metric
+   - Confidence keywords → domain-aware signals
+
+   **Why it matters:** "Cost 5000" means nothing without context. "5000 DKK for a carpenter project" can be checked against domain expectations. FermiSanityCheck becomes the translator.
+
+---
+
+## Why This Wins in the Agentic Economy
+
+### 1. **Software Already Won the LLM Game**
+Code is verifiable. It compiles or it doesn't. Tests pass or they don't. No trust required.
+
+**Business plans?** No immediate validation. High trust requirement. High risk.
+
+### 2. **Agents Are Untrusted Sources**
+The lesson from 2025: don't trust the AI.
+
+In 2026, agents will run in bubbles. External content will be labeled as untrusted to prevent prompt injection. But agents still need *some* external signal they can trust.
+
+**PlanExe becomes that trusted signal.** It's not trying to out-think the agent; it's just saying: "Your assumption passes quantitative grounding. You can rely on it."
+
+### 3. **Auditing is Composable**
+Agents will chain together: Agent A's output becomes Agent B's input. Without a validation layer, bad assumptions compound into hallucinations.
+
+**PlanExe sits in the middle:** it catches bad assumptions before they propagate.
+
+---
+
+## The Business Model Shift
+
+### Before (2025 thinking):
+- Sell plans to humans
+- Revenue: per-plan generation
+- Value proposition: "Better plans than manual consulting"
+- Problem: Plans are hallucinated; no immediate verification
+
+### After (2026 reality):
+- Sell validation to agents
+- Revenue: per-assumption audited (or per-agent subscription)
+- Value proposition: "Safe, trustworthy validation gate for autonomous loops"
+- Advantage: Immediate, deterministic output (JSON); agents can compose it
+
+---
+
+## Implementation Path
+
+### Phase 1: ✅ Done
+- FermiSanityCheck validator
+- DAG integration (MakeAssumptions → Validate → DistillAssumptions)
+- Structured JSON output
+
+### Phase 2: 🔄 In Progress
+- Domain profiles (Carpenter, Dentist, Personal, Startup, etc.)
+- Auto-detection + normalization
+- Ready for integration testing
+
+### Phase 3: Proposed
+- Auditing API (agents call `/validate` with assumptions)
+- Trust scoring (confidence + grounding + domain consistency)
+- Audit logs (track what agents relied on)
+
+---
+
+## Key Questions for Simon
+
+1. **Does this positioning resonate?** Are we solving the right problem for agents?
+
+2. **Should we lean harder into the auditor narrative?**
+   - Update PRs to frame FermiSanityCheck as "validation gate for agents"
+   - Reposition marketing toward agent platforms (not humans)
+   - Build toward auditing API (Phase 3)
+
+3. **Or stay hybrid?** Keep the plan-generator story + add auditing as a feature?
+
+4. **What does success look like in 2026?**
+   - Agents paying for a validation service?
+   - PlanExe as required middleware in agentic workflows?
+   - Something else?
+
+---
+
+## Next Steps
+
+1. **Simon's feedback** on positioning (auditor vs. hybrid)
+2. **Phase 2 completion** + integration testing
+3. **PR updates** (if auditor positioning is approved)
+4. **Phase 3 design** (auditing API + trust scoring)
+
+---
+
+**End of proposal.** Ready for Simon's thoughts.
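
The detection steps in `domain-profile-schema.md` and the FermiSanityCheck bounds checks listed in the proposal can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PlanExe's implementation. Every name in it is hypothetical: `score_profile`, `detect_profile`, `Assumption`, `check_bounds`, the `general` fallback id, and the `min_score=2` threshold with its tie rule. A signal counts once when it appears as a case-insensitive substring. "Span ratio" is assumed to mean `high / low`, checked against the proposal's 100× limit.

```python
"""Minimal sketch: domain auto-detection plus Fermi-style bounds checks.

Hypothetical names and assumptions throughout (not PlanExe's actual code):
profiles are plain dicts mirroring the YAML `detection` sections, a signal
counts once if it occurs as a case-insensitive substring, and "span ratio"
is taken to mean high / low.
"""
from dataclasses import dataclass
from typing import Optional

# Detection signals copied from the example profiles in domain-profile-schema.md.
PROFILES = [
    {"id": "carpenter", "detection": {
        "currency_signals": ["DKK", "kr", "kroner"],
        "unit_signals": ["m2", "meter", "cm", "mm"],
        "keyword_signals": ["carpenter", "wood", "build", "materials", "client site"]}},
    {"id": "dentist", "detection": {
        "currency_signals": ["USD", "$", "dollars", "USD/per"],
        "keyword_signals": ["patient", "clinic", "treatment", "appointment", "revenue"]}},
    {"id": "personal", "detection": {
        "keyword_signals": ["family", "trip", "weight loss", "goal", "personal"]}},
]

SIGNAL_SECTIONS = ("currency_signals", "unit_signals", "keyword_signals")


def score_profile(profile: dict, text: str) -> int:
    """Count the distinct detection signals that occur in `text`."""
    haystack = text.lower()
    detection = profile.get("detection", {})
    return sum(
        1
        for section in SIGNAL_SECTIONS
        for signal in detection.get(section, [])  # a profile may omit a section
        if signal.lower() in haystack
    )


def detect_profile(text: str, profiles: list, min_score: int = 2,
                   fallback: str = "general") -> str:
    """Return the best-scoring profile id, or `fallback` if none clearly wins.

    `min_score` and the tie rule are hypothetical stand-ins for the
    configurable threshold mentioned in the detection overview.
    """
    scores = {p["id"]: score_profile(p, text) for p in profiles}
    best_id = max(scores, key=scores.get)
    best = scores[best_id]
    tied = list(scores.values()).count(best) > 1
    if best < min_score or tied:
        return fallback
    return best_id


@dataclass
class Assumption:
    name: str
    low: Optional[float]   # None means the bound was not stated
    high: Optional[float]


def check_bounds(a: Assumption, max_span_ratio: float = 100.0) -> list:
    """Return the problems found; an empty list means the bounds look sane.

    Checks run in order and stop at the first failure.
    """
    if a.low is None or a.high is None:
        return ["missing bound"]
    if a.low > a.high:
        return ["contradictory bounds: low > high"]
    if a.low <= 0:
        # A ratio of bounds is only meaningful for positive quantities.
        return ["span ratio undefined: low bound must be positive"]
    ratio = a.high / a.low
    if ratio > max_span_ratio:
        return [f"span ratio {ratio:.0f}x exceeds {max_span_ratio:.0f}x"]
    return []


if __name__ == "__main__":
    quote = "Quote: 42000 kr for 35 m2 of oak flooring, materials included, carpenter on site."
    print(detect_profile(quote, PROFILES))  # carpenter
    print(detect_profile("Need approx 500 units", PROFILES))  # general
    print(check_bounds(Assumption("timeline_days", 1, 500)))  # ['span ratio 500x exceeds 100x']
```

Substring matching keeps the sketch short but is crude. Short signals such as `kr` or `cm` can fire inside unrelated words, so a real detector would probably want token-level or word-boundary matching.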