context truncation/compaction improvements #1831
krissetto wants to merge 1 commit into docker:main from
Conversation
- Removes `truncateOldToolContent` and `MaxToolCallTokens` from session to avoid busting cache unnecessarily and potentially confusing models
- Preserves assistant text as a separate message item before `function_call` items in `convertMessagesToResponseInput` (responses API)
- Lowers the default context limit before compaction to 80% of the model's context length; anything after 50% usually sees progressively bigger drops in output quality

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>
Why?
The same way we can set the max number of messages to keep in the context, maybe we should make this configurable?
why not? some models can say something together with their tool calls, and we were dropping it entirely, which just seems wrong to me
agree. i think the default should only be compaction, not thread truncation, to align with what are likely to be most users' expectations (aka "don't mess with my messages unless you really have to"). I've been giving this a ride and honestly it feels quite a bit better to me, especially with openai models. Curious to know if you noticed any issues, even cost-related ones (not busting cache helps with costs too in many scenarios)
These are opinionated changes, things seem to perform generally better if we let the caching do its job and don't edit/remove things from the history.
Let's do some tests and see how we feel about them.