Comments

context truncation/compaction improvements #1831

Draft
krissetto wants to merge 1 commit into docker:main from krissetto:better-cache-stability-and-context-preservation

Conversation

@krissetto
Contributor

  • Removes truncateOldToolContent and MaxToolCallTokens from session to avoid busting cache unnecessarily and potentially confusing models
  • Preserves assistant text as a separate message item before function_call items in convertMessagesToResponseInput (responses API)
  • Lowers the default context limit before compaction to 80% of the model's context length. Anything past 50% usually sees progressively bigger drops in output quality, so 80% seems a good point to compact at

These are opinionated changes; things generally seem to perform better if we let the caching do its job and don't edit/remove things from the history.
Let's do some tests and see how we feel about them
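The threshold change in the last bullet can be sketched roughly as below. This is an illustrative snippet, not cagent's actual code; the names `shouldCompact`, `usedTokens`, and `contextLength` are assumptions for the example.

```go
package main

import "fmt"

// compactionThreshold is the fraction of the model's context window at
// which the session history gets summarized (80% per this PR, down from
// the previous default).
const compactionThreshold = 0.8

// shouldCompact reports whether the current token usage has crossed the
// compaction threshold for a model with the given context length.
func shouldCompact(usedTokens, contextLength int) bool {
	return float64(usedTokens) >= compactionThreshold*float64(contextLength)
}

func main() {
	// With a 128k-context model, the cutoff sits at 102,400 tokens.
	fmt.Println(shouldCompact(100_000, 128_000)) // below the 80% mark
	fmt.Println(shouldCompact(110_000, 128_000)) // above the 80% mark
}
```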

- Removes truncateOldToolContent and MaxToolCallTokens from session to avoid busting cache unnecessarily and potentially confusing models
- Preserves assistant text as a separate message item before function_call
items in convertMessagesToResponseInput (responses API)
- Lowers default context limit before compaction to 80% of model's context length, anything after 50% usually sees progressively bigger drops in output quality

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>
@rumpl
Member

rumpl commented Feb 24, 2026

Preserves assistant text as a separate message item before function_call items in convertMessagesToResponseInput (responses API)

Why?

@rumpl
Member

rumpl commented Feb 24, 2026

The same way we can set the max number of messages to keep in the context, we should maybe make this configurable?

@krissetto
Contributor Author

krissetto commented Feb 24, 2026

Preserves assistant text as a separate message item before function_call items in convertMessagesToResponseInput (responses API)

Why?

Why not? Some models can say something together with their tool calls; we were dropping that text entirely, which just seems wrong to me
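To illustrate the idea: an assistant turn that carries both text and tool calls gets emitted as two input items, a message item followed by the function_call items, instead of dropping the text. The types and field names below are simplified placeholders, not the actual structures used by `convertMessagesToResponseInput`.

```go
package main

import "fmt"

// Message is a simplified assistant turn: optional text plus tool calls.
type Message struct {
	Content   string
	ToolCalls []string // tool call names, simplified
}

// InputItem is a simplified responses-API input item.
type InputItem struct {
	Type    string // "message" or "function_call"
	Payload string
}

// convertAssistantMessage splits an assistant turn into input items,
// preserving any assistant text as its own message item *before* the
// function_call items rather than discarding it.
func convertAssistantMessage(m Message) []InputItem {
	var items []InputItem
	if m.Content != "" {
		items = append(items, InputItem{Type: "message", Payload: m.Content})
	}
	for _, tc := range m.ToolCalls {
		items = append(items, InputItem{Type: "function_call", Payload: tc})
	}
	return items
}

func main() {
	items := convertAssistantMessage(Message{
		Content:   "Let me check the file first.",
		ToolCalls: []string{"read_file"},
	})
	for _, it := range items {
		fmt.Println(it.Type, "->", it.Payload)
	}
}
```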

The same way we can set the max number of messages to keep in the context, we should maybe make this configurable?

Agree. I think the default should only be compaction, not thread truncation, to align with what are likely to be most users' expectations (aka "don't mess with my messages unless you really have to").
But being able to configure it is good imho
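A configurable threshold could look something like this; the `SessionConfig` type, field name, and zero-value handling are purely hypothetical, sketched under the assumption that the 80% default from the PR stays in place.

```go
package main

import "fmt"

// SessionConfig is a hypothetical config knob for the compaction point,
// mirroring how the max number of messages is already configurable.
type SessionConfig struct {
	// CompactionThreshold is the fraction of the model's context window
	// at which history is compacted. Zero or out-of-range values fall
	// back to the default.
	CompactionThreshold float64
}

// threshold returns the effective compaction fraction, defaulting to 0.8.
func (c SessionConfig) threshold() float64 {
	if c.CompactionThreshold <= 0 || c.CompactionThreshold > 1 {
		return 0.8
	}
	return c.CompactionThreshold
}

func main() {
	def := SessionConfig{}
	custom := SessionConfig{CompactionThreshold: 0.5}
	fmt.Println(def.threshold())    // default
	fmt.Println(custom.threshold()) // user-supplied
}
```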

I've been giving this a ride and honestly it feels quite a bit better to me, especially with OpenAI models. Curious to know if you've noticed any issues, even cost-related ones (not busting cache helps with costs in many scenarios too)
