
Hermes

Context memory layer for AI coding agents. Extracts architectural decisions from conversations, stores them permanently, and injects them into every prompt so the agent never forgets what was decided, no matter how long the conversation gets.

Evaluated on LoCoEval (48-56 turn repository-oriented coding conversations, 40K token context limit).

Architecture


Why?

Every AI app builder (Bolt, Lovable, v0) treats conversation history as a FIFO buffer. Context fills up, old messages get silently dropped, and the agent starts contradicting itself. User picked Postgres at message 3, defined a schema at message 8, chose JWT auth at message 12. By message 40, all of that is gone.

Hermes fixes this by doing two things before information can be lost:

  • Extraction: After every response, pull out architectural decisions (schema choices, auth methods, routing patterns, dependencies) and store them permanently.
  • Compaction: At task boundaries, summarize old messages instead of dropping them. Topic awareness survives even when raw messages don't.

The agent always knows what was decided (extraction) and what was discussed (compaction). Conversation length becomes irrelevant.


How It Works

Hermes is a TypeScript HTTP server running on Bun, with three endpoints: /init, /query, and /state.

Every round:

  1. Check if conversation topic shifted or context is near limit. If yes, compact.
  2. Assemble prompt: extracted decisions at the top, conversation summary near the query, recent messages in between. (Lost-in-the-Middle placement.)
  3. Call Gemini. Get answer.
  4. Extract decisions from the answer asynchronously. Dedup against existing store (CREATE / MERGE / SKIP).
  5. Return answer.
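The round loop above can be sketched in TypeScript. This is an illustrative sketch, not Hermes' actual code: the types, the token heuristic, and the function names (`handleRound`, `summarize`) are assumptions, and the LLM call is stubbed out as a plain callback.

```typescript
// Hypothetical sketch of the per-round pipeline. Names and types are
// illustrative; the real server lives in src/index.ts.
type Message = { role: "user" | "assistant"; content: string };
type Decision = { summary: string; detail: string };

interface Store {
  decisions: Decision[]; // permanently extracted decisions
  summary: string;       // compounding compaction summary
  recent: Message[];     // raw messages kept verbatim
}

const TOKEN_LIMIT = 35_000;

function estimateTokens(msgs: Message[]): number {
  // crude heuristic: roughly 4 characters per token
  return Math.ceil(msgs.map(m => m.content).join(" ").length / 4);
}

function summarize(msgs: Message[]): string {
  return `(${msgs.length} earlier messages summarized)`;
}

function handleRound(
  store: Store,
  query: string,
  callLLM: (prompt: string) => string,
): string {
  // 1. Compact when context nears the limit
  if (estimateTokens(store.recent) > TOKEN_LIMIT) {
    store.summary = (store.summary + " " + summarize(store.recent)).trim();
    store.recent = [];
  }
  // 2. Assemble prompt: decisions at the top, recent messages in the
  //    middle, summary right before the query (LitM placement)
  const prompt = [
    "DECISIONS:\n" + store.decisions.map(d => "- " + d.summary).join("\n"),
    ...store.recent.map(m => `${m.role}: ${m.content}`),
    "SUMMARY so far: " + store.summary,
    "user: " + query,
  ].join("\n\n");
  // 3. Call the model
  const answer = callLLM(prompt);
  // 4. Async decision extraction would be kicked off here (omitted)
  store.recent.push(
    { role: "user", content: query },
    { role: "assistant", content: answer },
  );
  // 5. Return the answer
  return answer;
}
```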

Memory model: two tiers per decision, L0 (a one-sentence summary) and L1 (structured detail). All decisions across four categories (structure, behavior, relationships, decisions) are always loaded, staying under 4K tokens for 30+ decisions. No classifier, no embeddings, no routing logic.
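One way to picture the two-tier model, as a hedged sketch (the type names and `renderDecisions` helper are assumptions, not the actual Hermes types):

```typescript
// Hypothetical shape of the two-tier memory model. L0 is the one-sentence
// summary that always appears in the prompt; L1 holds structured detail.
type Category = "structure" | "behavior" | "relationships" | "decisions";

interface Decision {
  category: Category;
  l0: string;                 // one-sentence summary, always loaded
  l1: Record<string, string>; // structured detail, e.g. { table: "users" }
}

// Every decision in all four categories is rendered every round; no
// classifier or embedding lookup decides what to include.
function renderDecisions(store: Decision[]): string {
  const byCategory = new Map<Category, Decision[]>();
  for (const d of store) {
    const bucket = byCategory.get(d.category) ?? [];
    bucket.push(d);
    byCategory.set(d.category, bucket);
  }
  return [...byCategory.entries()]
    .map(([cat, ds]) => `## ${cat}\n` + ds.map(d => `- ${d.l0}`).join("\n"))
    .join("\n");
}
```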

Compaction triggers: token pressure (above a 35K-token threshold), or a topic shift gated on pressure (above 25K tokens). Old messages get summarized, and summaries compound across compactions.
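The two triggers combine into a small predicate. The thresholds come from the text above; the function and parameter names are assumptions for illustration:

```typescript
// Sketch of the compaction trigger logic described above.
const HARD_PRESSURE = 35_000; // token pressure: always compact above this
const SOFT_PRESSURE = 25_000; // topic shifts only compact above this

function shouldCompact(contextTokens: number, topicShifted: boolean): boolean {
  if (contextTokens > HARD_PRESSURE) return true;                 // pure token pressure
  if (topicShifted && contextTokens > SOFT_PRESSURE) return true; // gated topic shift
  return false;
}
```

The gate keeps cheap early-conversation topic changes from triggering needless compactions while the context is still small.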


What the LLM Sees

This is the LitM (Lost-in-the-Middle) placement that makes the whole thing work. The summary sits as a synthetic assistant message right before the query, in the high-attention recency zone. This single positioning change took TA from 0.541 to 0.899.

LitM Context
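In message-array terms, the placement looks roughly like the sketch below. The `assembleLitM` helper and the message shape are assumptions, not the documented API; the point is the ordering, with the summary injected as a synthetic assistant message in the recency zone:

```typescript
// Illustrative LitM assembly: decisions at the top, recent messages in the
// middle, the compaction summary immediately before the query.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function assembleLitM(
  decisionsBlock: string,
  recent: ChatMessage[],
  summary: string,
  query: string,
): ChatMessage[] {
  return [
    { role: "system", content: decisionsBlock }, // top: extracted decisions
    ...recent,                                   // middle: recent raw messages
    {                                            // recency zone: synthetic summary
      role: "assistant",
      content: `Summary of earlier discussion: ${summary}`,
    },
    { role: "user", content: query },            // the query itself
  ];
}
```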


Benchmark

Evaluated on LoCoEval multi-hop coding conversations (48-56 turns, 40K token context limit) across two repositories (Kinto, Falcon). Metrics: Information Extraction (IE) measures fact recall accuracy; Topic Awareness (TA) measures whether the agent remembers what was discussed.

Function Completion (FC) was excluded because it tests the underlying model's code generation capability, which is orthogonal to context management. Both agents scored identically on FC with the same functions failing, confirming it measures model skill, not context strategy.

| Metric | TruncateAgent | HermesAgent | Delta |
|---|---|---|---|
| Topic Awareness F1 | 0.381 | 0.736 | +93% |
| Information Extraction F1 | 0.763 | 0.652 | -14.5% |

Per-repo breakdown:

| Repo | Metric | TruncateAgent | HermesAgent | Delta |
|---|---|---|---|---|
| Kinto | TA | 0.362 | 1.000 | +176% |
| Kinto | IE | 0.728 | 0.647 | -11% |
| Falcon | TA | 0.400 | 0.471 | +18% |
| Falcon | IE | 0.797 | 0.657 | -18% |

Both agents use identical code retrieval (SimilarFunctionParser), the same Gemini Flash backbone, the same mock user, and the same judge. The only variable is the context management strategy.

Why TA wins: Extracted decisions + compaction summaries preserve topic awareness that truncation silently drops. Kinto's perfect 1.000 means Hermes remembered every topic from a 48-turn conversation.

Why IE drops: Hermes has higher recall (finds more ground truth facts) but lower precision (generates more false positives). The decisions block gives the model architectural context that encourages over-elaboration. This is a generation-side issue, not a context management failure, and a clear target for iteration.


Where Hermes Fits

| Tool | What it solves | Hermes relationship |
|---|---|---|
| OMEGA, Mem0, Zep | Cross-session memory (between conversations) | Complementary; Hermes is within-session. |
| Letta (MemGPT) | Full agent runtime with memory | Hermes is middleware, not a runtime. No rewrite needed. |
| Deep Agents, FlashCompact | Context compression | Hermes adds decision extraction on top of compression. |
| Factory.ai | Structured coding agent compression | Validates our approach. Same insight, different implementation. |

Stack

| Component | Choice |
|---|---|
| Server | TypeScript, Bun |
| LLM | Gemini 3 Flash (backbone, extraction, compaction); Gemini 3.1 Flash Lite (dedup) |
| Memory | In-memory Map, 4 categories |
| Benchmark | LoCoEval (Python, unmodified except Gemini adapter patches) |
| Bridge | 25-line Python wrapper forwarding HTTP to the TS server |


Quickstart

```sh
# Install dependencies
bun install

# Set your Gemini API key
export GEMINI_API_KEY="your-key-here"

# Start Hermes server
bun run src/index.ts

# Inspect memory store
curl localhost:3000/state | jq
```

Status

Research prototype. The core pipeline works and benchmarks well on TA, but IE precision needs iteration. See LEARNINGS.md for the full development story.

Contributions welcome — especially around extraction prompt tuning, decision compaction, and alternative dedup strategies.


Research

Built on: Lost-in-the-Middle (Stanford/Berkeley) | OpenViking L0/L1/L2 (ByteDance, 2026) | Deep Agents Compaction (LangChain, 2026) | ACE Pattern (Zhang et al., 2026) | Factory.ai Structured Compression (2025)
