[FEATURE]: Implement Semantic Caching for AI-Generated Roadmaps using Qdrant #856

### Feature Summary

Add a semantic caching layer for AI roadmap generation using Qdrant. Before querying Gemini, the layer checks for semantically similar past requests and, on a match, returns the cached result immediately instead of using up API quota.

### Problem Statement

While going through the codebase, I noticed that the function `generateAiRoadmap` in `roadmap.ai.service.ts` makes a new call to `GeminiProvider("gemini-2.5-flash-lite")` for every request without any caching. Prompts that are semantically identical, like "MERN Stack Beginner Roadmap" and "Beginner roadmap for MERN," each trigger a separate Gemini call and return almost the same result.

I also found that `AtsService.scoreResume` in `ats.service.ts` has a 24-hour exact-match cache using `prisma.atsScore.findFirst`, which indicates that the issue of duplicate requests is already acknowledged. However, it only captures inputs that are exactly the same. Semantically similar prompts still contact the API each time.

As user volume increases, this leads to:

- Unnecessary Gemini API costs and rate limit pressure
- Students waiting several seconds for roadmaps that were effectively generated already

### Proposed Solution

Introduce a semantic caching layer inside `generateAiRoadmap` that runs before the Gemini call:

- Normalize and embed the input parameters using the Gemini Embeddings API, which is already in the codebase. No new credentials are needed.
- Query Qdrant for a cosine similarity match above a 0.95 threshold.
- If there's a cache hit, return the stored JSON instantly without making a Gemini call.
- If there's a cache miss, call Gemini as usual and store the result and embedding in Qdrant for future requests.
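
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: `VectorStore`, `InMemoryVectorStore`, `cachedGenerate`, and `normalizePrompt` are hypothetical names, the in-memory store stands in for Qdrant, and the embedding and generation functions are injected so that the Gemini calls can be swapped in later.

```typescript
// Sketch only. All names here (VectorStore, InMemoryVectorStore, cachedGenerate,
// normalizePrompt) are hypothetical and do not come from the repository.

type Embedder = (text: string) => Promise<number[]>;

interface VectorStore<T> {
  // Best stored payload whose cosine similarity to `vector` is >= threshold.
  nearest(vector: number[], threshold: number): Promise<T | undefined>;
  put(vector: number[], payload: T): Promise<void>;
}

// Lowercase, trim, and collapse whitespace so trivial variations embed alike.
function normalizePrompt(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stand-in for Qdrant: a linear scan over the stored vectors.
class InMemoryVectorStore<T> implements VectorStore<T> {
  private entries: { vector: number[]; payload: T }[] = [];

  async nearest(vector: number[], threshold: number): Promise<T | undefined> {
    let best: { score: number; payload: T } | undefined;
    for (const entry of this.entries) {
      const score = cosineSimilarity(vector, entry.vector);
      if (score >= threshold && (best === undefined || score > best.score)) {
        best = { score, payload: entry.payload };
      }
    }
    return best?.payload;
  }

  async put(vector: number[], payload: T): Promise<void> {
    this.entries.push({ vector, payload });
  }
}

interface CacheDeps<T> {
  embed: Embedder;
  store: VectorStore<T>;
  generate: (prompt: string) => Promise<T>; // the expensive LLM call
  threshold?: number; // defaults to the 0.95 proposed above
}

async function cachedGenerate<T>(
  prompt: string,
  deps: CacheDeps<T>,
): Promise<{ result: T; cacheHit: boolean }> {
  const threshold = deps.threshold ?? 0.95;
  const vector = await deps.embed(normalizePrompt(prompt));

  // Cache hit: return the stored result without calling the generator.
  const cached = await deps.store.nearest(vector, threshold);
  if (cached !== undefined) return { result: cached, cacheHit: true };

  // Cache miss: generate as usual, then store the result with its embedding.
  const result = await deps.generate(prompt);
  await deps.store.put(vector, result);
  return { result, cacheHit: false };
}
```

Keeping the store behind an interface is a design choice made for this sketch: a Qdrant-backed class could implement the same two methods, and tests could keep using the in-memory version.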

DevOps safety, with no risk to existing contributors:

The feature will be gated behind a `SEMANTIC_CACHE_ENABLED=true` environment variable. If the variable is missing or set to `false`, the code falls back to the current behavior unchanged, so contributors without Qdrant set up will see no difference at all.

New `.env.example` entries (for documentation only):

- `SEMANTIC_CACHE_ENABLED=false`
- `QDRANT_URL=`
- `QDRANT_API_KEY=`
- `QDRANT_COLLECTION_NAME=ai_prompt_cache`

The Qdrant Cloud free tier (1 GB, no credit card required) is enough for tens of thousands of cached roadmaps at the project's current scale.
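
The environment-variable gate described above could be read along these lines. This is a sketch under assumptions: `loadSemanticCacheConfig` and the `SemanticCacheConfig` type are hypothetical names, and treating an enabled flag without a `QDRANT_URL` as "disabled" is one possible choice for keeping existing setups unaffected, not something the project has decided.

```typescript
// Sketch only: the function and type names are hypothetical.
type SemanticCacheConfig =
  | { enabled: false }
  | {
      enabled: true;
      qdrantUrl: string;
      qdrantApiKey: string | undefined;
      collectionName: string;
    };

function loadSemanticCacheConfig(
  env: Record<string, string | undefined>,
): SemanticCacheConfig {
  // Missing or anything other than "true" keeps the current behavior.
  if (env.SEMANTIC_CACHE_ENABLED?.trim().toLowerCase() !== "true") {
    return { enabled: false };
  }

  // Assumption: an enabled flag without a URL falls back to disabled
  // instead of failing at startup.
  const qdrantUrl = env.QDRANT_URL?.trim();
  if (!qdrantUrl) return { enabled: false };

  return {
    enabled: true,
    qdrantUrl,
    qdrantApiKey: env.QDRANT_API_KEY?.trim() || undefined,
    collectionName: env.QDRANT_COLLECTION_NAME?.trim() || "ai_prompt_cache",
  };
}
```

In a Node server this would typically be called once with `process.env`, and `generateAiRoadmap` would skip the cache path entirely whenever `enabled` is `false`.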

Files to change:

- `server/src/lib/semantic-cache.ts`: new file for the Qdrant client, embedding, similarity search, and storage
- `server/src/module/roadmap/roadmap.ai.service.ts`: wrap `generateAiRoadmap` with a cache check
- `.env.example`: add 4 new optional variables

### Alternatives Considered

- pgvector (Postgres extension): use the existing Neon database with vector support. This eliminates the need for a new service, but it requires the pgvector extension to be enabled on the hosted database, which needs action from a maintainer.
- Hash-based exact caching: normalize and hash prompts, then store results in a new Prisma table. This is simpler but does not catch semantically similar inputs, which is the same limitation as the current ATS cache.
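
To make the limitation of the hash-based alternative concrete, here is a small sketch of what such a cache key might look like. `exactCacheKey` is a hypothetical helper, and the normalization rules (trim, lowercase, collapse whitespace) are assumptions chosen for illustration.

```typescript
import { createHash } from "node:crypto";

// Sketch only: hypothetical helper, not code from the repository.
// Normalizes the prompt, then hashes it into a fixed-length cache key.
function exactCacheKey(prompt: string): string {
  const normalized = prompt.trim().toLowerCase().replace(/\s+/g, " ");
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Differences in case and spacing collapse to the same key...
const keyA = exactCacheKey("MERN Stack Beginner Roadmap");
const keyB = exactCacheKey("  mern stack   beginner roadmap ");

// ...but a reworded prompt with the same intent produces a different key.
const keyC = exactCacheKey("Beginner roadmap for MERN");

console.log(keyA === keyB); // true
console.log(keyA === keyC); // false
```

Normalization only absorbs surface differences. Any rewording changes the hash, which is the gap the semantic approach is meant to close.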

I believe Qdrant is the best choice here because it does not affect any existing infrastructure, and the free tier comfortably supports this project's scale.

### Additional Context

I built this same architecture in MergeMind (one of my projects), where I used Qdrant to store and retrieve past PR reviews by semantic similarity, keeping LLM feedback consistent across similar code changes. The same pattern applies here, for caching prompts.
I'm also familiar with the roadmap module from PR #623, where I worked directly on `roadmap.ai.service.ts` and its neighboring infrastructure. Happy to start right away if this is approved.
