⚡ Bolt: Optimize context generation and tool loop efficiency #53

Open
SuvenSeo wants to merge 1 commit into master from bolt-optimization-context-loop-14772828066671693234

SuvenSeo (Owner) commented Jun 3, 2026

💡 What:
Implemented a series of performance optimizations in the core context generation and message handling pipeline: hoisting expensive calls out of loops, parallelizing database/LLM operations, and adding short-lived (1 to 5 minute) in-memory caching.

🎯 Why:
The system was performing redundant and sequential operations during multi-turn tool interactions. Specifically, getFullPrompt (which triggers multiple DB queries and an LLM-based reranking) was being called in every iteration of the tool loop despite the system prompt and context being largely static within a single user turn.
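
The effect of hoisting can be illustrated with a small, self-contained sketch. Only `getFullPrompt` and `maxToolIterations` are names taken from this PR; `callModel`, `runToolLoop`, `compare`, and the call counter are hypothetical stand-ins, and the real handler in messageHandler.js may be structured differently.

```javascript
// Hypothetical stand-ins; only getFullPrompt and maxToolIterations come from the PR.
let promptBuilds = 0;

async function getFullPrompt(text) {
  promptBuilds++; // the real function would run DB queries and reranking here
  return `SYSTEM PROMPT for: ${text}`;
}

// Assumed model call: requests a tool on the first four turns, then answers.
async function callModel(systemPrompt, messages) {
  const toolTurns = messages.filter((m) => m.role === 'tool').length;
  return toolTurns < 4
    ? { toolCall: { name: 'lookup' } }
    : { content: 'final answer' };
}

async function runToolLoop(userText, { hoist, maxToolIterations = 5 }) {
  const messages = [{ role: 'user', content: userText }];
  // Hoisted variant: build the prompt once, before the loop starts.
  let systemPrompt = hoist ? await getFullPrompt(userText) : null;
  let iteration = 0;
  while (iteration < maxToolIterations) {
    iteration++;
    // Unhoisted variant: rebuild the prompt on every iteration.
    if (!hoist) systemPrompt = await getFullPrompt(userText);
    const reply = await callModel(systemPrompt, messages);
    if (!reply.toolCall) return reply.content;
    messages.push({ role: 'tool', content: `result of ${reply.toolCall.name}` });
  }
  return null;
}

// Counts prompt builds for one five-iteration interaction, with and without hoisting.
async function compare() {
  promptBuilds = 0;
  await runToolLoop('hi', { hoist: false });
  const before = promptBuilds;
  promptBuilds = 0;
  await runToolLoop('hi', { hoist: true });
  return { before, after: promptBuilds };
}
```

In this sketch a five-iteration interaction builds the prompt five times without hoisting and once with it. The trade-off is that anything written to memory by a tool mid-loop will not appear in the system prompt until the next user turn.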

📊 Impact:

  • Reduces database queries by ~9 per multi-turn message (for a typical 5-iteration tool loop).
  • Reduces expensive LLM calls (for semantic reranking) by up to 4 per multi-turn message.
  • Lowers latency for the final AI response in complex interactions, since the prompt is built once per user turn rather than once per tool iteration.

🔬 Measurement:
Verified via code review and full test suite execution (npm test). The removal of redundant episodic memory fetching was confirmed safe as the conversation history is correctly preserved in the messages array passed to the Groq SDK.


PR created automatically by Jules for task 14772828066671693234 started by @SuvenSeo

- Hoisted getFullPrompt outside the tool loop in messageHandler.js to prevent redundant context rebuilding.
- Parallelized knowledge fetching with other database queries in buildContext.
- Implemented 1-5 minute in-memory caching for working memory and semantic knowledge results.
- Removed redundant episodic_memory fetch from buildContext as history is managed via the messages array.
- Deleted unused helper functions: selectConversationLines, compressVerboseContent, scoreEpisodeForContext, and isContextNoiseEpisode.
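
The in-memory caching item above could look roughly like the following TTL cache. `getCache` is a name that appears in the diff; `setCache`, the `store` map, and the injectable `now` parameter are assumptions added for illustration and testability, not the PR's actual implementation.

```javascript
// Minimal TTL cache sketch. getCache is named in the PR; everything else is assumed.
const store = new Map();

// Stores a value that expires ttlMs milliseconds after `now`.
function setCache(key, value, ttlMs, now = Date.now()) {
  store.set(key, { value, expiresAt: now + ttlMs });
}

// Returns the cached value, or undefined if the key is missing or expired.
function getCache(key, now = Date.now()) {
  const entry = store.get(key);
  if (!entry) return undefined;
  if (now >= entry.expiresAt) {
    store.delete(key); // evict lazily on read
    return undefined;
  }
  return entry.value;
}
```

A caller would write something like `setCache('working_memory', rows, 60_000)` after a database read and check `getCache('working_memory')` before the next one. Because a miss returns `undefined`, comparing against `undefined` rather than relying on truthiness lets falsy values such as `null` or `0` be cached as well.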

These changes reduce database queries by ~9 and expensive LLM calls by up to 4 for a typical 5-iteration tool loop interaction.

Co-authored-by: SuvenSeo <263689617+SuvenSeo@users.noreply.github.com>
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review requested due to automatic review settings June 3, 2026 20:02
@vercel

vercel Bot commented Jun 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| seo-os-agent | Ready | Preview, Comment | Jun 3, 2026 8:03pm |

Copilot AI left a comment

Pull request overview

This PR targets latency and cost reductions in the prompt/context pipeline by avoiding redundant context regeneration during multi-turn tool interactions and by caching/parallelizing expensive data fetches.

Changes:

  • Added TTL-based in-memory caching for knowledge search results and for several “slow-changing” DB reads, and parallelized knowledge retrieval with other context queries.
  • Removed episodic-memory-based conversation line selection from buildContext(), relying on the messages array passed to the LLM instead.
  • Hoisted getFullPrompt() out of the Telegram tool loop to avoid repeated prompt builds per iteration.
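
The parallelization described in the first item can be sketched with `Promise.all`. The fetcher names, the simulated latency, and both `buildContext*` variants below are hypothetical; the actual queries in context.js are not shown on this page.

```javascript
// Hypothetical fetchers that simulate independent I/O with a fixed delay.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const fetchWorkingMemory = () => delay(50, ['task A']);
const fetchCoreMemory = () => delay(50, ['fact B']);
const fetchKnowledge = (query) => delay(50, [`doc about ${query}`]);

// Sequential: each await waits for the previous fetch to finish.
async function buildContextSequential(query) {
  const working = await fetchWorkingMemory();
  const core = await fetchCoreMemory();
  const knowledge = await fetchKnowledge(query);
  return { working, core, knowledge };
}

// Parallel: all fetches start immediately and are awaited together.
async function buildContextParallel(query) {
  const [working, core, knowledge] = await Promise.all([
    fetchWorkingMemory(),
    fetchCoreMemory(),
    fetchKnowledge(query),
  ]);
  return { working, core, knowledge };
}
```

Both variants return the same object; the parallel one takes roughly as long as the slowest fetch instead of the sum of all three. Note that `Promise.all` rejects as soon as any fetch rejects, so if a failed knowledge search should not discard the rest of the context, `Promise.allSettled` or a per-fetch `catch` may be a better fit.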

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| frontend/src/lib/services/context.js | Adds knowledge caching + parallel knowledge fetch; introduces caching for working/core/pattern/ideas queries; removes episodic memory selection logic. |
| frontend/src/lib/handlers/messageHandler.js | Moves getFullPrompt() call outside the tool loop to reduce repeated context generation work. |
| .jules/bolt.md | Documents the performance learning/action that motivated the changes. |

Comment on lines +70 to +72
const cacheKey = `knowledge_${userMessage.trim().toLowerCase().replace(/\s+/g, ' ')}_${keywords.sort().join(',')}`;
const cached = getCache(cacheKey);
if (cached) return cached;
}

// Run all DB queries in parallel — cached where data rarely changes
const cachedWorking = getCache('working_memory');
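
One detail worth noting in the cache-key line quoted above: `Array.prototype.sort` sorts in place, so `keywords.sort()` reorders the caller's array as a side effect. Whether that matters depends on how `keywords` is used later in context.js, which is not visible here. A minimal sketch of the same key format that sorts a copy instead, using a hypothetical helper name:

```javascript
// Hypothetical helper mirroring the quoted key format; sorts a copy of keywords.
function buildKnowledgeCacheKey(userMessage, keywords) {
  // Collapse whitespace and case so equivalent messages share one key.
  const normalized = userMessage.trim().toLowerCase().replace(/\s+/g, ' ');
  // Spread into a new array so the caller's keyword order is left untouched.
  const sortedKeywords = [...keywords].sort();
  return `knowledge_${normalized}_${sortedKeywords.join(',')}`;
}
```
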
Comment on lines +507 to 511
const systemPrompt = await getFullPrompt(processedText);

while (iteration < maxToolIterations) {
iteration++;
