Feature Summary
Add a semantic caching layer for AI roadmap generation, backed by Qdrant. Before querying Gemini, it checks for semantically similar past requests and, on a match, returns the cached result immediately instead of consuming API quota.
Problem Statement
While going through the codebase, I noticed that generateAiRoadmap in roadmap.ai.service.ts makes a new call to GeminiProvider("gemini-2.5-flash-lite") for every request, with no caching. Prompts that are semantically equivalent, like "MERN Stack Beginner Roadmap" and "Beginner roadmap for MERN," are each sent to Gemini separately and return nearly identical results.
I also found that AtsService.scoreResume in ats.service.ts has a 24-hour exact-match cache using prisma.atsScore.findFirst, which indicates that the problem of duplicate requests is already acknowledged. However, that cache only matches inputs that are exactly the same. Semantically similar prompts still hit the API every time.
As user volume increases, this leads to:
- Unnecessary Gemini API costs and rate limit pressure
- Students waiting several seconds for roadmaps that were effectively generated already
Proposed Solution
Introduce a semantic caching layer inside generateAiRoadmap that runs before the Gemini call:
- Normalize and embed the input parameters using the Gemini Embeddings API, which is already in the codebase. No new credentials are needed.
- Query Qdrant for a cosine similarity match above a 0.95 threshold.
- If there's a cache hit, return the stored JSON instantly without making a Gemini call.
- If there's a cache miss, call Gemini as usual and store the result and embedding in Qdrant for future requests.
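The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: InMemoryVectorCache is a hypothetical in-memory stand-in for the Qdrant collection, and withSemanticCache, embed, and generate are hypothetical names for the wrapper, the Gemini embedding call, and the existing Gemini generation call. The real version would delegate the similarity search to Qdrant instead of scanning an array.

```typescript
// Hypothetical sketch: none of these names come from the actual codebase.

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for the Qdrant collection (assumption, for illustration only).
class InMemoryVectorCache<T> {
  private entries: { vector: number[]; value: T }[] = [];
  private threshold: number;

  constructor(threshold = 0.95) {
    this.threshold = threshold;
  }

  // Return the best-matching stored value if its similarity is above the threshold.
  find(vector: number[]): T | undefined {
    let best: T | undefined;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = cosineSimilarity(vector, entry.vector);
      if (score > bestScore) {
        bestScore = score;
        best = entry.value;
      }
    }
    return bestScore > this.threshold ? best : undefined;
  }

  store(vector: number[], value: T): void {
    this.entries.push({ vector, value });
  }
}

// Cache check that runs before the (injected) generation call.
async function withSemanticCache<T>(
  input: string,
  embed: (text: string) => Promise<number[]>,
  generate: (text: string) => Promise<T>,
  cache: InMemoryVectorCache<T>,
): Promise<T> {
  const vector = await embed(input.trim().toLowerCase()); // normalize, then embed
  const hit = cache.find(vector);
  if (hit !== undefined) return hit; // cache hit: skip the Gemini call
  const result = await generate(input); // cache miss: generate as usual
  cache.store(vector, result); // store for future requests
  return result;
}
```

Injecting embed and generate keeps the cache logic testable without network access, and it means the wrapper does not need to know how the existing Gemini call is made.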
DevOps safety, with no risk to existing contributors:
The new feature will be gated behind a SEMANTIC_CACHE_ENABLED environment variable and is active only when it is set to true. If the variable is missing or set to false, the code falls back to the current behavior unchanged. Contributors without Qdrant set up will see no difference at all.
New .env.example entries (for documentation only):
SEMANTIC_CACHE_ENABLED=false
QDRANT_URL=
QDRANT_API_KEY=
QDRANT_COLLECTION_NAME=ai_prompt_cache
The Qdrant Cloud free tier (1GB, no credit card required) is enough for tens of thousands of cached roadmaps at the project's current scale.
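One possible way to read the flag and the Qdrant variables is sketched below. The function name loadSemanticCacheConfig and the SemanticCacheConfig type are hypothetical, and treating an empty QDRANT_URL as "disabled" is an assumption added here so that a half-configured environment also falls back to the current behavior.

```typescript
// Hypothetical sketch: names and the empty-URL fallback are assumptions.
type SemanticCacheConfig =
  | { enabled: false }
  | { enabled: true; url: string; apiKey: string | undefined; collection: string };

function loadSemanticCacheConfig(
  env: Record<string, string | undefined>,
): SemanticCacheConfig {
  // Only the literal value "true" (case-insensitive) turns the cache on.
  const enabled =
    (env["SEMANTIC_CACHE_ENABLED"] ?? "").trim().toLowerCase() === "true";
  const url = (env["QDRANT_URL"] ?? "").trim();

  // Missing flag, flag set to false, or no Qdrant URL: keep the current behavior.
  if (!enabled || url === "") return { enabled: false };

  return {
    enabled: true,
    url,
    apiKey: env["QDRANT_API_KEY"] || undefined,
    collection: env["QDRANT_COLLECTION_NAME"] || "ai_prompt_cache",
  };
}
```

In the service this would be called once, for example as loadSemanticCacheConfig(process.env), and the cache check skipped entirely when enabled is false.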
Files to change:
- server/src/lib/semantic-cache.ts: new file for Qdrant client, embedding, similarity search, and storage
- server/src/module/roadmap/roadmap.ai.service.ts: wrap generateAiRoadmap with a cache check
- .env.example: add 4 new optional variables
Alternatives Considered
pgvector (Postgres extension): use the existing Neon database with vector support. This option eliminates the need for a new service, but the pgvector extension must first be enabled on the hosted database, which needs action from a maintainer.
Hash-based exact caching: normalize and hash prompts, then store results in a new Prisma table. This solution is simpler but does not account for semantically similar inputs, which is the same limitation as the current ATS cache.
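To make the limitation of the hash-based alternative concrete, here is a minimal sketch. The helper names normalizePrompt and promptCacheKey are hypothetical, and the normalization (trim, lowercase, collapse whitespace) is just one plausible choice.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: trim, lowercase, and collapse runs of whitespace
// so that trivial variations of the same prompt share a key.
function normalizePrompt(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

// Hypothetical helper: SHA-256 of the normalized prompt, hex-encoded.
function promptCacheKey(prompt: string): string {
  return createHash("sha256").update(normalizePrompt(prompt)).digest("hex");
}
```

With this scheme, "MERN Stack  Beginner Roadmap" and "mern stack beginner roadmap" produce the same key, but "Beginner roadmap for MERN" produces a different one, so the reworded prompt would still reach Gemini.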
I believe Qdrant is the best choice here because it does not affect any existing infrastructure, and the free tier comfortably supports this project's scale.
Additional Context
I built this same architecture in MergeMind (one of my projects), using Qdrant to store and retrieve past PR reviews by semantic similarity, which keeps LLM feedback consistent across similar code changes. The same pattern applies here, for caching prompts.
I'm also familiar with the roadmap module from PR #623, where I worked directly on roadmap.ai.service.ts and its neighboring infrastructure. Happy to start right away if this is approved.