Spherecast Agent — AI Supply Chain Co-Pilot

TUM.ai × Spherecast Hackathon 2026 challenge

Presentation

Demo

The Problem

CPG companies overpay for ingredients because sourcing is fragmented. The same ingredient gets purchased by multiple brands from different suppliers with no shared visibility. Nobody sees the combined demand — so nobody captures the volume discount, and nobody checks whether a cheaper alternative is actually safe to use in a given product.

The challenge has two parts:

1. Which ingredients can actually replace each other? Matching names is not enough: candidates must be functionally equivalent in the context of a specific finished product.
2. Is the substitute compliant? Does it preserve allergen claims, dietary certifications, regulatory status, and physical specifications for that product?

Our Solution

We built Agnes — an autonomous AI agent that reasons across fragmented supply chain data to answer sourcing questions. The system ingests BOM and supplier data for 61 CPG companies, enriches every raw material with external knowledge, finds substitutable ingredients via semantic similarity, and scores each candidate with a quantifiable compliance rubric.

An analyst can ask Agnes: "What can replace soy lecithin in product X?" — and get a ranked list of substitutes with a breakdown of why each one scores the way it does, backed by evidence from supplier datasheets and regulatory databases.

Functional Requirements & How We Addressed Them

#	Requirement	Status
R1	Ingest BOM and supplier data	✅ Done
R2	Enrich materials with external knowledge	✅ Done
R3	Identify interchangeable components	✅ Done
R4	Infer the compliance bar a substitute must meet	✅ Done
R5	Score substitutes with explainable reasoning	✅ Done
R6	Preserve evidence trails	✅ Done
R7	Surface fragmentation across the portfolio	🔲 Data ready
R8	Generate consolidated sourcing proposals	🔜 Future work
R9	Conversational reasoning interface	✅ Done

R1 — Ingest: All 61-company BOM and supplier data loaded into a structured database on startup.

R2 — Enrich: Every ingredient enriched with functional role, source origin, allergens, dietary flags, certifications, and regulatory status — pulled from supplier websites, scientific databases, and LLM knowledge. Each fact carries its source and confidence level.

R3 — Identify substitutes: Ingredients converted into semantic vectors. Similarity search finds functionally equivalent candidates even when names differ.
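As a rough illustration of the similarity search described in R3, the sketch below ranks ingredients by cosine similarity between their vectors. The three-dimensional vectors and the function names are made up for the example; the real system uses embedding vectors from a model, and its storage and search code is not shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k_similar(query_name, embeddings, k=3):
    """Return the k ingredients closest to query_name, best first."""
    query_vec = embeddings[query_name]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query_name
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors, NOT real embeddings.
TOY_EMBEDDINGS = {
    "soy lecithin": [0.9, 0.1, 0.0],
    "sunflower lecithin": [0.85, 0.15, 0.05],
    "xanthan gum": [0.3, 0.9, 0.1],
    "cane sugar": [0.0, 0.1, 0.95],
}

if __name__ == "__main__":
    for name, score in top_k_similar("soy lecithin", TOY_EMBEDDINGS, k=2):
        print(f"{name}: {score:.3f}")
```

Because the comparison is on vectors rather than strings, "sunflower lecithin" can rank first for "soy lecithin" even if the two were named completely differently.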

R4 — Infer compliance bar: Enriched spec of the original ingredient defines what any substitute must preserve — dietary claims, allergen profile, regulatory status, physical form.
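One way to picture the compliance bar from R4 is as a set of checks derived from the original ingredient's spec. The sketch below is an assumption about how such a check could look, not the project's code: the `IngredientSpec` fields, the rule that a substitute may not add allergens or drop dietary flags, and the sample values are all illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IngredientSpec:
    """Hypothetical enriched spec; field names are illustrative."""
    name: str
    allergens: frozenset = frozenset()
    dietary_flags: frozenset = frozenset()
    regulatory_status: str = ""
    physical_form: str = ""

def compliance_violations(original, candidate):
    """List the ways candidate fails the bar set by original."""
    violations = []
    added_allergens = candidate.allergens - original.allergens
    if added_allergens:
        violations.append(f"introduces allergens: {sorted(added_allergens)}")
    lost_flags = original.dietary_flags - candidate.dietary_flags
    if lost_flags:
        violations.append(f"loses dietary claims: {sorted(lost_flags)}")
    if candidate.regulatory_status != original.regulatory_status:
        violations.append("regulatory status differs")
    if candidate.physical_form != original.physical_form:
        violations.append("physical form differs")
    return violations

# Toy specs with assumed values, for illustration only.
SOY = IngredientSpec("soy lecithin", frozenset({"soy"}),
                     frozenset({"vegan"}), "approved", "liquid")
SUNFLOWER = IngredientSpec("sunflower lecithin", frozenset(),
                           frozenset({"vegan"}), "approved", "liquid")
EGG = IngredientSpec("egg yolk lecithin", frozenset({"egg"}),
                     frozenset(), "approved", "liquid")

if __name__ == "__main__":
    print(compliance_violations(SOY, SUNFLOWER))
    print(compliance_violations(SOY, EGG))
```

An empty list means the candidate clears the bar. In this toy data the egg-derived candidate fails twice: it adds an allergen and loses the vegan claim.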

R5 — Score with reasoning: GPT-4o scores each candidate across 5 dimensions (functional equivalence, spec compatibility, regulatory fit, dietary compliance, certification match), each 0–20, total 0–100. Every score includes a breakdown and written justification.
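The rubric arithmetic in R5 (five dimensions, each 0 to 20, summing to 0 to 100) can be sketched as a small validated record. The class and key names below are hypothetical, and the per-dimension numbers would come from the model's response in the real system; only the dimension list and ranges are taken from the description above.

```python
from dataclasses import dataclass

DIMENSIONS = (
    "functional_equivalence",
    "spec_compatibility",
    "regulatory_fit",
    "dietary_compliance",
    "certification_match",
)
MAX_PER_DIMENSION = 20

@dataclass
class RubricScore:
    """Hypothetical container for one scored candidate."""
    candidate: str
    scores: dict
    justification: str

    def __post_init__(self):
        # Reject missing or unexpected dimensions and out-of-range values.
        if set(self.scores) != set(DIMENSIONS):
            raise ValueError("scores must cover exactly the five dimensions")
        for dim, value in self.scores.items():
            if not 0 <= value <= MAX_PER_DIMENSION:
                raise ValueError(f"{dim} must be in 0..{MAX_PER_DIMENSION}")

    @property
    def total(self):
        """Overall score in 0..100."""
        return sum(self.scores[dim] for dim in DIMENSIONS)

def rank(candidates):
    """Order candidates by total score, best first."""
    return sorted(candidates, key=lambda s: s.total, reverse=True)

if __name__ == "__main__":
    strong = RubricScore("sunflower lecithin",
                         {dim: 18 for dim in DIMENSIONS}, "close match")
    weak = RubricScore("xanthan gum",
                       {dim: 10 for dim in DIMENSIONS}, "different function")
    for s in rank([weak, strong]):
        print(s.candidate, s.total, s.scores)
```

Validating the breakdown before summing keeps a malformed model response from silently producing a plausible-looking total.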

R6 — Evidence trails: Every property carries provenance: where it came from, the URL, and confidence level.
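A minimal sketch of what a provenance-carrying fact could look like, assuming a three-level confidence scale and the field names shown. Neither is confirmed by the project; the example URL is a placeholder, and `most_trusted` is a hypothetical helper for choosing between conflicting sources.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of confidence levels, lowest to highest.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}

@dataclass(frozen=True)
class SourcedFact:
    """One enriched property together with where it came from."""
    property_name: str
    value: str
    source: str            # e.g. "supplier_website" or "llm_inference"
    url: Optional[str]     # None when there is no page to cite
    confidence: str

    def __post_init__(self):
        if self.confidence not in CONFIDENCE_RANK:
            raise ValueError(f"unknown confidence: {self.confidence}")

def most_trusted(facts, property_name):
    """Pick the highest-confidence fact for a property, or None."""
    matching = [f for f in facts if f.property_name == property_name]
    if not matching:
        return None
    return max(matching, key=lambda f: CONFIDENCE_RANK[f.confidence])

# Placeholder data; the URL is not a real datasheet.
FACTS = [
    SourcedFact("allergens", "soy", "llm_inference", None, "medium"),
    SourcedFact("allergens", "soy", "supplier_website",
                "https://example.com/datasheets/soy-lecithin", "high"),
]

if __name__ == "__main__":
    print(most_trusted(FACTS, "allergens"))
```

Keeping the source and URL on every fact is what lets a score justification point back to a specific document rather than to the model's general knowledge.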

R7 — Fragmentation: Data layer ready. System can answer "which companies buy the same ingredient from different suppliers?" in a single query. Not yet surfaced as a dedicated UI feature.
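The fragmentation question in R7 maps naturally onto a grouped SQL query. The sketch below runs one against an in-memory SQLite database; the `purchases` table, its columns, and the sample rows are assumptions for illustration and do not reflect the project's actual schema or database engine.

```python
import sqlite3

# Ingredients bought by more than one company from more than one supplier.
FRAGMENTATION_QUERY = """
SELECT ingredient,
       COUNT(DISTINCT company)  AS n_companies,
       COUNT(DISTINCT supplier) AS n_suppliers
FROM purchases
GROUP BY ingredient
HAVING COUNT(DISTINCT company) > 1
   AND COUNT(DISTINCT supplier) > 1
ORDER BY ingredient
"""

def find_fragmented_ingredients(conn):
    """Return (ingredient, n_companies, n_suppliers) rows."""
    return conn.execute(FRAGMENTATION_QUERY).fetchall()

def build_demo_db():
    """Create a tiny in-memory database with made-up purchase rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (company TEXT, ingredient TEXT, supplier TEXT)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [
            ("Brand A", "soy lecithin", "Supplier 1"),
            ("Brand B", "soy lecithin", "Supplier 2"),
            ("Brand A", "cane sugar", "Supplier 3"),
            ("Brand B", "cane sugar", "Supplier 3"),
        ],
    )
    return conn

if __name__ == "__main__":
    for row in find_fragmented_ingredients(build_demo_db()):
        print(row)
```

In the demo data only soy lecithin is reported: cane sugar is shared by two brands but already comes from a single supplier, so there is nothing to consolidate.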

R8 — Proposals: Future work. Database schema, API contract, and data models fully defined. The agent that generates and writes proposals is the remaining piece.

R9 — Conversational interface: Agnes is an autonomous chat agent. It decides which tools to call based on the analyst's question — no scripted flows. It surfaces its reasoning trace so the analyst sees which data sources were consulted.

Evaluation

We built an evaluation framework (benchmark runner + dataset) to measure how well the compliance engine finds the right substitutes.

Results: ~50% Precision@3 — for a given ingredient and product, the top 3 returned substitutes contain a correct answer roughly half the time.
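To make the metric concrete, the sketch below computes it exactly as defined in the sentence above: a benchmark case counts as a success if at least one of the top 3 returned substitutes is in the set of correct answers. This is often called hit rate at k. The function names and the sample cases are hypothetical and are not taken from the project's benchmark runner or dataset.

```python
def hit_at_k(ranked, correct, k=3):
    """True if any of the first k ranked items is a correct answer."""
    return any(item in correct for item in ranked[:k])

def score_benchmark(cases, k=3):
    """Fraction of cases with at least one correct answer in the top k.

    Each case is a (ranked_candidates, correct_answers) pair.
    """
    if not cases:
        return 0.0
    hits = sum(1 for ranked, correct in cases if hit_at_k(ranked, correct, k))
    return hits / len(cases)

# Made-up cases for illustration only.
SAMPLE_CASES = [
    (["sunflower lecithin", "xanthan gum", "guar gum"], {"sunflower lecithin"}),
    (["cane sugar", "beet sugar", "honey"], {"agave syrup"}),
]

if __name__ == "__main__":
    print(f"score at 3: {score_benchmark(SAMPLE_CASES, k=3):.0%}")
```

With one hit out of two sample cases, the score is 50%, which is the same shape of result as the figure reported above.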

Why not higher:

AI non-determinism. GPT-4o produces slightly different rankings even at temperature=0. Borderline cases flip between runs.
Dataset bias. Ground-truth dataset was generated using Claude Sonnet 4.7 and Opus with high reasoning effort — not manually verified by domain experts. The "correct" answers reflect what a powerful LLM thinks is substitutable, which may not match real-world procurement expertise. We did not have time to validate against human judgment.
Thin specs. Several enrichment sources are implemented but disabled due to API reliability constraints. Many ingredient specs are filled by LLM inference rather than authoritative databases, weakening the similarity search signal.

50% precision on an open-ended substitution task with no organizer-provided ground truth is a reasonable starting point. With a human-verified benchmark and richer enrichment from the currently disabled sources, we expect this number to improve.

Cost

API spend over the hackathon:

Anthropic (Claude Haiku — enrichment): ~$1.14 across 334 calls
OpenAI (GPT-4o compliance scoring + embeddings + Agnes chat): ~$X

Primary model: GPT-4o for compliance scoring and Agnes chat reasoning.

Repositories

Backend: https://github.com/Luuunch-Gangathon/backend
Frontend: https://github.com/Luuunch-Gangathon/frontend
