[FEATURE]: Implement Semantic Caching for AI-Generated Roadmaps using Qdrant #856

@ChandanMeher4

Description

Feature Summary

Add a semantic caching layer for AI roadmap generation using Qdrant. Before querying Gemini, the layer checks for semantically similar past requests and, on a match, returns the cached result immediately instead of using up API quota.

Problem Statement

While going through the codebase, I noticed that generateAiRoadmap in roadmap.ai.service.ts makes a new call to GeminiProvider("gemini-2.5-flash-lite") for every request, with no caching. Semantically equivalent prompts, such as "MERN Stack Beginner Roadmap" and "Beginner roadmap for MERN," are each sent to Gemini separately and return almost the same result.

I also found that AtsService.scoreResume in ats.service.ts has a 24-hour exact-match cache using prisma.atsScore.findFirst, which shows that duplicate requests are already a recognized problem. However, that cache only matches inputs that are exactly the same; semantically similar prompts still trigger an API call each time.

As user volume increases, this leads to:

  • Unnecessary Gemini API costs and rate limit pressure
  • Students waiting several seconds for roadmaps that were effectively generated already

Proposed Solution

Introduce a semantic caching layer inside generateAiRoadmap that runs before the Gemini call:

  • Normalize and embed the input parameters using the Gemini Embeddings API, which is already in the codebase. No new credentials are needed.
  • Query Qdrant for a cosine similarity match above a 0.95 threshold.
  • If there's a cache hit, return the stored JSON instantly without making a Gemini call.
  • If there's a cache miss, call Gemini as usual and store the result and embedding in Qdrant for future requests.
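
The four steps above can be sketched as a small cache-aside wrapper. This is only an illustration under stated assumptions: `InMemoryVectorIndex` is a hypothetical in-memory stand-in for the Qdrant collection, and `embed` and `generate` are injected placeholders for the Gemini embedding and generation calls. None of these names come from the codebase.

```typescript
// Hypothetical types and names for illustration; none of these come from the codebase.
type Embedding = number[];

interface CacheEntry<T> {
  vector: Embedding;
  payload: T;
}

// Cosine similarity between two equal-length vectors; returns 0 if either is a zero vector.
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for the vector store (Qdrant in the proposal).
class InMemoryVectorIndex<T> {
  private entries: CacheEntry<T>[] = [];

  // Returns the payload of the best match at or above the threshold, or undefined on a miss.
  findNearest(vector: Embedding, threshold: number): T | undefined {
    let best: { score: number; payload: T } | undefined;
    for (const entry of this.entries) {
      const score = cosineSimilarity(vector, entry.vector);
      if (score >= threshold && (best === undefined || score > best.score)) {
        best = { score, payload: entry.payload };
      }
    }
    return best?.payload;
  }

  store(vector: Embedding, payload: T): void {
    this.entries.push({ vector, payload });
  }
}

// Cache-aside wrapper: normalize, embed, look up, and call the generator only on a miss.
async function getOrGenerate<T>(
  prompt: string,
  index: InMemoryVectorIndex<T>,
  embed: (text: string) => Promise<Embedding>,
  generate: (text: string) => Promise<T>,
  threshold = 0.95,
): Promise<T> {
  const normalized = prompt.trim().toLowerCase().replace(/\s+/g, " ");
  const vector = await embed(normalized);
  const cached = index.findNearest(vector, threshold);
  if (cached !== undefined) return cached;
  const fresh = await generate(normalized);
  index.store(vector, fresh);
  return fresh;
}
```

In the real implementation the linear scan would be replaced by a Qdrant similarity search, and the embedding computed for the lookup would be reused when storing the result, so a miss costs one embedding call plus the usual Gemini call.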

DevOps safety (no risk to existing contributors):
The feature will be gated behind a SEMANTIC_CACHE_ENABLED=true environment variable. If the variable is missing or set to false, the code falls back to the current behavior unchanged. Contributors without Qdrant set up will see no difference at all.
New .env.example entries (for documentation only):

  • SEMANTIC_CACHE_ENABLED=false
  • QDRANT_URL=
  • QDRANT_API_KEY=
  • QDRANT_COLLECTION_NAME=ai_prompt_cache

The Qdrant Cloud free tier (1 GB, no credit card required) is enough for tens of thousands of cached roadmaps at the project's current scale.
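
A minimal sketch of how the flag could gate the feature. `SemanticCacheConfig` and `loadSemanticCacheConfig` are hypothetical names, and treating a missing QDRANT_URL as "disabled" is an assumption added here rather than part of the proposal.

```typescript
// Hypothetical config shape and loader; variable names follow the .env.example entries.
interface SemanticCacheConfig {
  enabled: boolean;
  url: string;
  apiKey: string | undefined;
  collectionName: string;
}

type Env = Record<string, string | undefined>;

// Disabled unless the flag is "true" (case-insensitive) AND a Qdrant URL is set.
// The URL requirement is an assumption, so a half-configured setup stays on the old path.
function loadSemanticCacheConfig(env: Env): SemanticCacheConfig {
  const flagOn = env.SEMANTIC_CACHE_ENABLED?.trim().toLowerCase() === "true";
  const url = env.QDRANT_URL?.trim() ?? "";
  return {
    enabled: flagOn && url.length > 0,
    url,
    apiKey: env.QDRANT_API_KEY?.trim() || undefined,
    collectionName: env.QDRANT_COLLECTION_NAME?.trim() || "ai_prompt_cache",
  };
}
```

The service would call this once with `process.env` and skip the cache lookup entirely when `enabled` is false, so the existing Gemini path runs unchanged.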

Files to change:

  • server/src/lib/semantic-cache.ts: new file for Qdrant client, embedding, similarity search, and storage
  • server/src/module/roadmap/roadmap.ai.service.ts: wrap generateAiRoadmap with a cache check
  • .env.example: add 4 new optional variables

Alternatives Considered

  • pgvector (Postgres extension): use the existing Neon database with vector support. This option eliminates the need for a new service but requires the pgvector extension to be enabled on the hosted database, which requires action from a maintainer.
  • Hash-based exact caching: normalize and hash prompts, then store results in a new Prisma table. This solution is simpler but does not account for semantically similar inputs, which is the same limitation as the current ATS cache.
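
To illustrate the limitation of the hash-based alternative, here is a rough sketch of an exact-match key. `normalizePrompt`, `fnv1a`, and `exactCacheKey` are hypothetical helpers, and the 32-bit FNV-1a hash is only a dependency-free stand-in for a real digest such as SHA-256.

```typescript
// Hypothetical helper: lowercases, strips punctuation, and sorts tokens
// so that word order and spacing do not affect the key.
function normalizePrompt(prompt: string): string {
  return prompt
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .sort()
    .join(" ");
}

// 32-bit FNV-1a over character codes (ASCII only after normalization).
// Stand-in for a real digest; not suitable for production cache keys.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

function exactCacheKey(prompt: string): string {
  return fnv1a(normalizePrompt(prompt));
}
```

Even with aggressive normalization, "MERN Stack Beginner Roadmap" and "Beginner roadmap for MERN" normalize to different token sets ("stack" versus "for"), so they get different keys and the second request would still reach Gemini.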

I believe Qdrant is the best choice here because it does not affect any existing infrastructure, and the free tier comfortably supports this project's scale.

Additional Context

I built this exact architecture in MergeMind (one of my projects), where I used Qdrant to store and retrieve past PR reviews by semantic similarity, keeping LLM feedback consistent across similar code changes. The pattern here is the same, applied to caching prompts.
I'm also familiar with the roadmap module from PR #623, where I worked directly on roadmap.ai.service.ts and its neighboring infrastructure. Happy to start right away if this is approved.

Metadata

Assignees

Labels

type:feature (New feature implementation)
