⚡ Bolt: Optimize context generation and tool loop efficiency #53
SuvenSeo wants to merge 1 commit
- Hoisted getFullPrompt outside the tool loop in messageHandler.js to prevent redundant context rebuilding.
- Parallelized knowledge fetching with the other database queries in buildContext.
- Implemented in-memory caching (TTLs of 1 to 5 minutes) for working memory and semantic knowledge results.
- Removed the redundant episodic_memory fetch from buildContext, since history is managed via the messages array.
- Deleted unused helper functions: selectConversationLines, compressVerboseContent, scoreEpisodeForContext, and isContextNoiseEpisode.

These changes reduce database queries by ~9 and expensive LLM calls by up to 4 for a typical 5-iteration tool loop interaction.

Co-authored-by: SuvenSeo <263689617+SuvenSeo@users.noreply.github.com>
Pull request overview
This PR targets latency and cost reductions in the prompt/context pipeline by avoiding redundant context regeneration during multi-turn tool interactions and by caching/parallelizing expensive data fetches.
Changes:
- Added TTL-based in-memory caching for knowledge search results and for several “slow-changing” DB reads, and parallelized knowledge retrieval with other context queries.
- Removed episodic-memory-based conversation line selection from `buildContext()`, relying on the `messages` array passed to the LLM instead.
- Hoisted `getFullPrompt()` out of the Telegram tool loop to avoid repeated prompt builds per iteration.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| frontend/src/lib/services/context.js | Adds knowledge caching + parallel knowledge fetch; introduces caching for working/core/pattern/ideas queries; removes episodic memory selection logic. |
| frontend/src/lib/handlers/messageHandler.js | Moves getFullPrompt() call outside the tool loop to reduce repeated context generation work. |
| .jules/bolt.md | Documents the performance learning/action that motivated the changes. |
```javascript
const cacheKey = `knowledge_${userMessage.trim().toLowerCase().replace(/\s+/g, ' ')}_${keywords.sort().join(',')}`;
const cached = getCache(cacheKey);
if (cached) return cached;
```
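One detail in the key construction above is worth illustrating: `Array.prototype.sort()` sorts in place, so `keywords.sort()` reorders the caller's array as a side effect. A minimal sketch of a non-mutating variant follows. The helper name `buildKnowledgeCacheKey` is hypothetical and does not appear in the PR; only the key format is taken from the excerpt.

```javascript
// Hypothetical helper (not from the PR): builds the same style of key
// as the excerpt, without mutating the caller's `keywords` array.
function buildKnowledgeCacheKey(userMessage, keywords) {
  // Same normalization as the excerpt: trim, lowercase, collapse whitespace.
  const normalized = userMessage.trim().toLowerCase().replace(/\s+/g, ' ');
  // Copy before sorting, because sort() reorders the array it is called on.
  const sortedKeywords = [...keywords].sort();
  return `knowledge_${normalized}_${sortedKeywords.join(',')}`;
}

const keywords = ['tools', 'cache'];
const key = buildKnowledgeCacheKey('  Hello   World ', keywords);
console.log(key);      // knowledge_hello world_cache,tools
console.log(keywords); // [ 'tools', 'cache' ] (original order preserved)
```

Whether the in-place sort matters depends on whether later code relies on keyword order, which the excerpt does not show.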
```javascript
}

// Run all DB queries in parallel - cached where data rarely changes
const cachedWorking = getCache('working_memory');
```
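The pattern this comment points at, running independent queries concurrently while skipping those with a fresh cache entry, might be sketched as follows. Everything here is assumed for illustration: the `fetchWorkingMemory` and `fetchKnowledge` stubs stand in for the real database calls, the `getCache`/`setCache` signatures are guesses based on the call sites visible in the diff, and the 60-second TTL is just one value inside the 1 to 5 minute range the PR mentions.

```javascript
// Hypothetical TTL cache helpers; the real ones in context.js may differ.
const store = new Map();
function setCache(key, value, ttlMs) {
  store.set(key, { value, expiresAt: Date.now() + ttlMs });
}
function getCache(key) {
  const entry = store.get(key);
  // Treat missing and expired entries the same way: a cache miss.
  if (!entry || Date.now() >= entry.expiresAt) return undefined;
  return entry.value;
}

// Stubs standing in for real DB queries; they count how often they run.
const calls = { working: 0, knowledge: 0 };
async function fetchWorkingMemory() {
  calls.working++;
  return ['note A'];
}
async function fetchKnowledge(message) {
  calls.knowledge++;
  return [`fact about ${message}`];
}

async function buildContext(userMessage) {
  const cachedWorking = getCache('working_memory');
  // Both lookups are started before either is awaited, so they overlap.
  // A cache hit is passed through as an already-resolved promise.
  const [working, knowledge] = await Promise.all([
    cachedWorking !== undefined ? Promise.resolve(cachedWorking) : fetchWorkingMemory(),
    fetchKnowledge(userMessage),
  ]);
  if (cachedWorking === undefined) setCache('working_memory', working, 60 * 1000);
  return { working, knowledge };
}
```

In this sketch, two consecutive `buildContext` calls run the working-memory stub once and the knowledge stub twice, since only the former is cached under a message-independent key.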
```javascript
const systemPrompt = await getFullPrompt(processedText);

while (iteration < maxToolIterations) {
  iteration++;
```
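To make the effect of the hoist concrete, the sketch below counts prompt builds for a loop shaped like the excerpt above. `getFullPrompt` and `callModel` are stand-in stubs here, not the project's real functions, and the loop's exit condition is an assumption.

```javascript
let promptBuilds = 0;

// Stub for the expensive prompt builder (DB queries plus reranking in the real code).
async function getFullPrompt(text) {
  promptBuilds++;
  return `system prompt for: ${text}`;
}

// Stub model call: asks for another tool run on every iteration except the last.
async function callModel(systemPrompt, iteration, lastIteration) {
  return { wantsTool: iteration < lastIteration };
}

async function handleMessage(processedText, maxToolIterations = 5) {
  // Built once per user turn, before the loop, and reused on every iteration.
  const systemPrompt = await getFullPrompt(processedText);
  let iteration = 0;
  while (iteration < maxToolIterations) {
    iteration++;
    const reply = await callModel(systemPrompt, iteration, maxToolIterations);
    if (!reply.wantsTool) break;
  }
  return iteration;
}
```

With five iterations, this version builds the prompt once, whereas a call placed inside the loop would build it five times. That difference of four is consistent with the "up to 4" saved calls quoted in the PR description.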
💡 What:
Implemented a series of performance optimizations in the core context generation and message handling pipeline. This includes hoisting expensive calls out of loops, parallelizing database/LLM operations, and implementing tactical in-memory caching.
🎯 Why:
The system was performing redundant and sequential operations during multi-turn tool interactions. Specifically, `getFullPrompt` (which triggers multiple DB queries and an LLM-based reranking) was being called in every iteration of the tool loop, despite the system prompt and context being largely static within a single user turn.
📊 Impact:
🔬 Measurement:
Verified via code review and full test suite execution (`npm test`). The removal of redundant episodic memory fetching was confirmed safe as the conversation history is correctly preserved in the `messages` array passed to the Groq SDK.
PR created automatically by Jules for task 14772828066671693234 started by @SuvenSeo