Human-like AI Memory ◦ 10Mn+ Token Context ◦ 0.2$/Mn tokens ◦ Conscious Recall
NOTE: That this model is currently in closed alpha. To get access reach out to us
The human brain is a master at compression. It doesn't try to remember every passing detail; instead, it aggressively prunes noise to maintain a sharp, focused, and easily accessible recall of what truly matters. In contrast, traditional AI memory systems try to remember everything. They retrieve whatever is similar—but similar doesn't mean important. The result? Your AI drowns in stale, irrelevant context that degrades every response.
Inspired by how the human brain works, Neocortex takes a similar approach to AI memory: it intelligently forgets noise. Just like how you don't remember every sentence you've ever read or everything happens every day in your life, Neocortex lets low-value memories naturally decay while reinforcing the knowledge that matters — the things you interact with, recall, and build upon.
The result? an AI memory system that can chop through over 10 million tokens accurately at speeds of upto 4000 tokens/second, stays lean and focused, and gets smarter with every interaction.
Neocortex ranks extremely high scores on RAGAS, Babilong, Vending Bench, LoCoMo and HotPotQA
Memories that aren't accessed naturally decay over time. Frequently recalled knowledge becomes more durable. No manual cleanup needed — the system stays lean on its own.
Not all memories are equal. Views, reactions, replies, and content creation all signal what matters. Knowledge people engage with rises to the top; ignored information fades away.
There's no compromise on speed and quality when processing data with Neocortex. Everything is processed at low costs and low latency, while maintain high benchmarks.
Conscious recall is a Neocortex feature that proactively surfaces the most relevant memories for a given moment, instead of waiting for an explicit query.
It continuously tracks what a user has done recently which includes conversations, actions, and signals; and combines that with time-based decay to decide which memories should stay “top of mind.”
When your agent needs context, conscious recall pulls forward the memories that are both recent and repeatedly interacted with, giving the LLM a focused slice of long-term history rather than a noisy dump of everything you’ve ever stored.
Neocortex ships with SDKs for Python, TypeScript/JavaScript, Go, Rust, Dart, C++, C#, and Java, plus plugins for LangGraph, OpenClaw, ElevenLabs, CrewAI, Raycast, Agno Pipecat, Mastra, Autogen and more.
See packages/README.md for details about all the SDKs/Plugins available to use along with documentation and examples.
Below is a simple quickstart example on getting started with Python.
# pip install tinyhumansai
import tinyhumansai as api
client = api.TinyHumanMemoryClient("YOUR_APIKEY_HERE")
# Store a single memory
client.ingest_memory({
"key": "user-preference-theme",
"content": "User prefers dark mode",
"namespace": "preferences",
"metadata": {"source": "onboarding"},
})
# Ask a LLM something from the memory
response = client.recall_with_llm(
prompt="What is the user's preference for theme?",
api_key="OPENAI_API_KEY"
)
print(response.text) # The user prefers dark modeExplore Neocortex in action through a set of real-time demo experiences that show how the memory layer behaves under live usage.
- Real-time chat assistant – A conversational UI that continuously writes and recalls memories so the assistant remembers users across sessions.
- Live activity memory feed – A stream of events (page views, actions, and signals) flowing into Neocortex, letting you inspect how memories are created, updated, and decayed over time.
- Agentic decision demo – A simple agent that uses Neocortex to make stateful decisions over many steps, highlighting how long-horizon context is preserved.
Provide context to your LLM by using a dedicated context role instead of stuffing facts into the system message. Context ingested this way doesn’t consume expensive LLM tokens, and more context doesn’t hurt accuracy.
Standard RAG quality metrics evaluated using RAGAS. Neocortex leads in Answer Relevancy (0.97) and Context Precision (0.75), outperforming FastGraphRAG, Gemini VDB, Mem0, and SuperMemory.
Accuracy across ordering, state-at-time, recency, interval, and sequence questions. Neocortex achieves 100% on recency questions — correctly surfacing the most recent events thanks to its time-decay memory model.
An agent manages a simulated vending machine business over 30 days. Neocortex achieves the highest cumulative P&L (~$295 by day 30) — better memory leads to better decisions over time.
Like contributing towards AGI 🧠? Give this repo a star and spread the love ❤️
Show some love and end up in the hall of fame





