khinevich/SpherecastAgent

Spherecast Agent — AI Supply Chain Co-Pilot

TUM.ai × Spherecast Hackathon 2026 challenge

🎬 Watch demo video


Presentation

[Images: presentation slides 1 to 7]


Demo

[Images: demo screenshots 1 to 5]

The Problem

CPG companies overpay for ingredients because sourcing is fragmented. The same ingredient gets purchased by multiple brands from different suppliers with no shared visibility. Nobody sees the combined demand — so nobody captures the volume discount, and nobody checks whether a cheaper alternative is actually safe to use in a given product.

The challenge has two parts:

  1. Which ingredients can actually replace each other? Not just same name — functionally equivalent in the context of a specific finished product.
  2. Is the substitute compliant? Does it preserve allergen claims, dietary certifications, regulatory status, and physical specifications for that product?

Our Solution

We built Agnes — an autonomous AI agent that reasons across fragmented supply chain data to answer sourcing questions. The system ingests BOM and supplier data for 61 CPG companies, enriches every raw material with external knowledge, finds substitutable ingredients via semantic similarity, and scores each candidate with a quantifiable compliance rubric.

An analyst can ask Agnes: "What can replace soy lecithin in product X?" — and get a ranked list of substitutes with a breakdown of why each one scores the way it does, backed by evidence from supplier datasheets and regulatory databases.


Functional Requirements & How We Addressed Them

# | Requirement | Status
R1 | Ingest BOM and supplier data | ✅ Done
R2 | Enrich materials with external knowledge | ✅ Done
R3 | Identify interchangeable components | ✅ Done
R4 | Infer the compliance bar a substitute must meet | ✅ Done
R5 | Score substitutes with explainable reasoning | ✅ Done
R6 | Preserve evidence trails | ✅ Done
R7 | Surface fragmentation across the portfolio | 🔲 Data ready
R8 | Generate consolidated sourcing proposals | 🔜 Future work
R9 | Conversational reasoning interface | ✅ Done

R1 — Ingest: All 61-company BOM and supplier data loaded into a structured database on startup.

R2 — Enrich: Every ingredient enriched with functional role, source origin, allergens, dietary flags, certifications, and regulatory status — pulled from supplier websites, scientific databases, and LLM knowledge. Each fact carries its source and confidence level.

R3 — Identify substitutes: Ingredients converted into semantic vectors. Similarity search finds functionally equivalent candidates even when names differ.
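The similarity search in R3 can be illustrated with a minimal sketch. It assumes each ingredient already has an embedding vector; the three-dimensional toy vectors and the function names below are hypothetical and are not taken from the project's code, which uses real embedding models and may use a vector index instead of a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, treat as dissimilar
    return dot / (norm_a * norm_b)


def top_k_similar(query_name, vectors, k=3):
    """Rank all other ingredients by similarity to the query ingredient."""
    query = vectors[query_name]
    scored = [
        (name, cosine_similarity(query, vec))
        for name, vec in vectors.items()
        if name != query_name
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, for illustration only).
TOY_VECTORS = {
    "soy lecithin": [0.9, 0.1, 0.0],
    "sunflower lecithin": [0.85, 0.15, 0.05],
    "citric acid": [0.0, 0.2, 0.9],
    "xanthan gum": [0.3, 0.9, 0.1],
}

if __name__ == "__main__":
    for name, score in top_k_similar("soy lecithin", TOY_VECTORS, k=2):
        print(f"{name}: {score:.3f}")
```

With these toy values, "sunflower lecithin" ranks first for "soy lecithin" even though the names differ, which is the behavior the similarity search is meant to provide.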

R4 — Infer compliance bar: Enriched spec of the original ingredient defines what any substitute must preserve — dietary claims, allergen profile, regulatory status, physical form.
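One way to picture the compliance bar is as a set of constraints derived from the original ingredient's spec. The sketch below is an assumption about how such a check could look, not the project's implementation; the `IngredientSpec` fields and the `compliance_violations` helper are hypothetical names chosen to mirror the four properties listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngredientSpec:
    """Hypothetical enriched spec; field names are illustrative."""
    name: str
    allergens: frozenset = frozenset()
    dietary_flags: frozenset = frozenset()  # e.g. {"vegan", "kosher"}
    regulatory_status: str = "unknown"
    physical_form: str = "unknown"


def compliance_violations(original, candidate):
    """List the ways a candidate fails to preserve the original's spec."""
    violations = []
    new_allergens = candidate.allergens - original.allergens
    if new_allergens:
        violations.append(f"introduces allergens: {sorted(new_allergens)}")
    lost_flags = original.dietary_flags - candidate.dietary_flags
    if lost_flags:
        violations.append(f"loses dietary claims: {sorted(lost_flags)}")
    if candidate.regulatory_status != original.regulatory_status:
        violations.append("regulatory status differs")
    if candidate.physical_form != original.physical_form:
        violations.append("physical form differs")
    return violations


if __name__ == "__main__":
    soy = IngredientSpec("soy lecithin", frozenset({"soy"}),
                         frozenset({"vegan"}), "GRAS", "liquid")
    egg = IngredientSpec("egg lecithin", frozenset({"egg"}),
                         frozenset(), "GRAS", "liquid")
    print(compliance_violations(soy, egg))
```

A hard pass/fail check like this is stricter than the graded scoring described in R5; it is shown only to make the idea of "what a substitute must preserve" concrete.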

R5 — Score with reasoning: GPT-4o scores each candidate across 5 dimensions (functional equivalence, spec compatibility, regulatory fit, dietary compliance, certification match), each 0–20, total 0–100. Every score includes a breakdown and written justification.
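The arithmetic of the rubric is simple enough to sketch: five dimensions, each bounded at 20, summed to a total out of 100. The dimension keys and the validation helper below are hypothetical; in the actual system the per-dimension scores come from GPT-4o, and how they are validated is not described in this README.

```python
# Dimension keys are illustrative names for the five rubric dimensions.
DIMENSIONS = (
    "functional_equivalence",
    "spec_compatibility",
    "regulatory_fit",
    "dietary_compliance",
    "certification_match",
)
MAX_PER_DIMENSION = 20


def total_score(breakdown):
    """Validate a per-dimension breakdown and return the 0 to 100 total."""
    missing = [d for d in DIMENSIONS if d not in breakdown]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dim in DIMENSIONS:
        score = breakdown[dim]
        if not 0 <= score <= MAX_PER_DIMENSION:
            raise ValueError(f"{dim} out of range: {score}")
    return sum(breakdown[dim] for dim in DIMENSIONS)


if __name__ == "__main__":
    example = {
        "functional_equivalence": 18,
        "spec_compatibility": 16,
        "regulatory_fit": 20,
        "dietary_compliance": 20,
        "certification_match": 12,
    }
    print(total_score(example))  # 18 + 16 + 20 + 20 + 12 = 86
```

Validating the range before summing guards against a model response that returns a score outside the rubric, which would otherwise silently distort the ranking.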

R6 — Evidence trails: Every property carries provenance: where it came from, the URL, and confidence level.
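A property with provenance could be modeled roughly as follows. This is a sketch under assumptions: the class name, field names, source labels, and the 0.0 to 1.0 confidence scale are hypothetical and are not taken from the project's schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EnrichedFact:
    """One enriched property of an ingredient, with its provenance."""
    ingredient: str
    property_name: str       # e.g. "allergens", "functional_role"
    value: str
    source: str              # e.g. "supplier_website", "llm" (hypothetical labels)
    url: Optional[str]       # None when no URL backs the fact
    confidence: float        # assumed scale: 0.0 (low) to 1.0 (high)


def best_fact(facts: List[EnrichedFact], property_name: str) -> Optional[EnrichedFact]:
    """Return the highest-confidence fact for a property, or None if absent."""
    matching = [f for f in facts if f.property_name == property_name]
    return max(matching, key=lambda f: f.confidence, default=None)


if __name__ == "__main__":
    facts = [
        EnrichedFact("soy lecithin", "allergens", "soy", "llm", None, 0.6),
        EnrichedFact("soy lecithin", "allergens", "soy",
                     "supplier_website", "https://example.com/datasheet", 0.9),
    ]
    chosen = best_fact(facts, "allergens")
    print(chosen.source, chosen.url, chosen.confidence)
```

Keeping every fact as a separate record, rather than overwriting a single value, is what lets an analyst see where a claim came from and how much to trust it.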

R7 — Fragmentation: Data layer ready. System can answer "which companies buy the same ingredient from different suppliers?" in a single query. Not yet surfaced as a dedicated UI feature.
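The fragmentation question can be sketched as a single grouped query. The `purchases` table and its columns are a hypothetical simplification for illustration; the project's real schema is not shown in this README and is likely more normalized.

```python
import sqlite3


def fragmented_ingredients(conn):
    """Ingredients bought by more than one company from more than one supplier."""
    query = """
        SELECT ingredient,
               COUNT(DISTINCT company)  AS n_companies,
               COUNT(DISTINCT supplier) AS n_suppliers
        FROM purchases
        GROUP BY ingredient
        HAVING COUNT(DISTINCT company) > 1
           AND COUNT(DISTINCT supplier) > 1
        ORDER BY n_companies DESC, ingredient
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (company TEXT, ingredient TEXT, supplier TEXT)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [
            ("BrandA", "soy lecithin", "SupplierX"),
            ("BrandB", "soy lecithin", "SupplierY"),
            ("BrandC", "soy lecithin", "SupplierX"),
            ("BrandA", "citric acid", "SupplierZ"),
            ("BrandB", "citric acid", "SupplierZ"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in fragmented_ingredients(demo_connection()):
        print(row)
```

In the made-up data, soy lecithin is flagged because three companies buy it from two suppliers, while citric acid is not, since both buyers already share one supplier.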

R8 — Proposals: Future work. Database schema, API contract, and data models fully defined. The agent that generates and writes proposals is the remaining piece.

R9 — Conversational interface: Agnes is an autonomous chat agent. It decides which tools to call based on the analyst's question — no scripted flows. It surfaces its reasoning trace so the analyst sees which data sources were consulted.


Evaluation

We built an evaluation framework (benchmark runner + dataset) to measure how well the compliance engine finds the right substitutes.

Results: ~50% Precision@3 — for a given ingredient and product, the top 3 returned substitutes contain a correct answer roughly half the time.
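The metric as described above (at least one correct substitute among the top 3) can be computed as in the sketch below. Two caveats: this definition is often called hit rate at k, whereas Precision@k in the stricter sense is the fraction of the top k results that are correct; and the function names and sample cases here are hypothetical, not the project's benchmark runner or dataset.

```python
def hit_at_k(ranked, correct, k=3):
    """True if any of the top-k ranked candidates is a correct substitute."""
    return any(candidate in correct for candidate in ranked[:k])


def hit_rate_at_k(cases, k=3):
    """Fraction of benchmark cases with at least one correct answer in the top k.

    Each case is a (ranked_candidates, set_of_correct_answers) pair.
    """
    if not cases:
        return 0.0
    hits = sum(1 for ranked, correct in cases if hit_at_k(ranked, correct, k))
    return hits / len(cases)


if __name__ == "__main__":
    # Made-up cases for illustration only.
    cases = [
        (["sunflower lecithin", "xanthan gum", "guar gum"], {"sunflower lecithin"}),
        (["citric acid", "malic acid", "lactic acid"], {"ascorbic acid"}),
    ]
    print(hit_rate_at_k(cases, k=3))
```

Because each case counts as a hit or a miss, a borderline candidate moving between rank 3 and rank 4 flips the whole case, which makes this metric sensitive to the run-to-run ranking noise discussed below.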

Why not higher:

  • AI non-determinism. GPT-4o produces slightly different rankings even at temperature=0. Borderline cases flip between runs.
  • Dataset bias. Ground-truth dataset was generated using Claude Sonnet 4.7 and Opus with high reasoning effort — not manually verified by domain experts. The "correct" answers reflect what a powerful LLM thinks is substitutable, which may not match real-world procurement expertise. We did not have time to validate against human judgment.
  • Thin specs. Several enrichment sources are implemented but disabled due to API reliability constraints. Many ingredient specs are filled by LLM inference rather than authoritative databases, weakening the similarity search signal.

50% precision on an open-ended substitution task with no organizer-provided ground truth is a reasonable starting point. With a human-verified benchmark and richer enrichment from authoritative sources, we expect this number to improve.


Cost

API spend over the hackathon:

  • Anthropic (Claude Haiku — enrichment): ~$1.14 across 334 calls
  • OpenAI (GPT-4o compliance scoring + embeddings + Agnes chat): ~$X

Primary model: GPT-4o for compliance scoring and Agnes chat reasoning.


Repositories
