Commit 86b0642

Merge branch 'main' into eval-claude-codex
2 parents 2474081 + d27fe0c commit 86b0642

12 files changed: 427 additions and 293 deletions
.agents/base2/base2-editor.ts

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+import { createBase2 } from './base2'
+
+const definition = {
+  ...createBase2('default', { useEditor: true }),
+  id: 'base2-editor',
+  displayName: 'Buffy the Editor Orchestrator',
+}
+export default definition
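
The new agent file relies on two things: an optional flag that defaults to off, and an object spread whose later keys override earlier ones. A minimal sketch of that mechanism follows. `createBase2Sketch`, `SketchDefinition`, and the shortened agent list are hypothetical stand-ins for illustration, not the real `createBase2` or `SecretAgentDefinition`.

```typescript
// Hypothetical, simplified stand-ins; the real createBase2 returns a much
// larger definition than this sketch does.
type Mode = 'default' | 'fast' | 'max' | 'lite'

interface SketchDefinition {
  displayName: string
  spawnableAgents: string[]
}

function createBase2Sketch(
  mode: Mode,
  options?: { useEditor?: boolean },
): SketchDefinition {
  // Same destructuring-with-default shape as in the diff: callers that omit
  // the flag get useEditor = false.
  const { useEditor = false } = options ?? {}

  // Entries guarded by `cond && 'name'` evaluate to false when the condition
  // is off and are filtered out below.
  const candidates: Array<string | false> = [
    'commander',
    useEditor && 'editor',
    mode === 'max' && 'editor-best-of-n-max',
  ]

  return {
    displayName: 'Base2 sketch',
    spawnableAgents: candidates.filter(
      (agent): agent is string => typeof agent === 'string',
    ),
  }
}

// Keys written after the spread win, so id and displayName replace whatever
// the factory returned.
const editorVariant = {
  ...createBase2Sketch('default', { useEditor: true }),
  id: 'base2-editor',
  displayName: 'Buffy the Editor Orchestrator',
}
```

Because the flag defaults to `false`, existing callers of the factory keep their current behavior and only the new variant opts in.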

.agents/base2/base2.ts

Lines changed: 27 additions & 7 deletions
@@ -11,9 +11,14 @@ export function createBase2(
   options?: {
     hasNoValidation?: boolean
     planOnly?: boolean
+    useEditor?: boolean
   },
 ): Omit<SecretAgentDefinition, 'id'> {
-  const { hasNoValidation = mode === 'fast', planOnly = false } = options ?? {}
+  const {
+    hasNoValidation = mode === 'fast',
+    planOnly = false,
+    useEditor = false,
+  } = options ?? {}
   const isDefault = mode === 'default'
   const isFast = mode === 'fast'
   const isMax = mode === 'max'
@@ -65,6 +70,7 @@ export function createBase2(
       'researcher-docs',
       isLite ? 'commander-lite' : 'commander',
       isLite && 'editor-gpt-5',
+      useEditor && 'editor',
       isMax && 'editor-best-of-n-max',
       isMax && 'thinker-best-of-n-opus',
       !isLite && 'code-reviewer-opus',
@@ -119,6 +125,8 @@ Use the spawn_agents tool to spawn specialized agents to help you complete the u
     '- Spawn context-gathering agents (file pickers, code-searcher, directory-lister, glob-matcher, and web/docs researchers) before making edits.',
     isLite &&
       '- Spawn the editor-gpt-5 agent to implement the changes after you have gathered all the context you need.',
+    useEditor &&
+      '- Spawn the editor agent to implement the changes after you have gathered all the context you need.',
     isMax &&
       '- Spawn the thinker-best-of-n-opus after gathering context to solve complex problems.',
     isMax &&
@@ -171,11 +179,13 @@ ${buildArray(
 [ You read a few other relevant files using the read_files tool ]

 ${
-  isDefault || isFast
-    ? '[ You implement the changes using the str_replace or write_file tools ]'
-    : isLite
-      ? '[ You implement the changes using the editor-gpt-5 agent ]'
-      : '[ You implement the changes using the editor-best-of-n-max agent ]'
+  useEditor
+    ? `[ You implement the changes using the editor agent ]`
+    : isDefault || isFast
+      ? '[ You implement the changes using the str_replace or write_file tools ]'
+      : isLite
+        ? '[ You implement the changes using the editor-gpt-5 agent ]'
+        : '[ You implement the changes using the editor-best-of-n-max agent ]'
 }

 ${
@@ -225,6 +235,7 @@ ${PLACEHOLDER.GIT_CHANGES_PROMPT}
           isMax,
           isLite,
           hasNoValidation,
+          useEditor,
         }),
     stepPrompt: planOnly
       ? buildPlanOnlyStepPrompt({})
@@ -233,6 +244,7 @@ ${PLACEHOLDER.GIT_CHANGES_PROMPT}
           isMax,
           hasNoValidation,
           isSonnet,
+          useEditor,
         }),

     handleSteps: function* ({ params }) {
@@ -265,13 +277,15 @@ function buildImplementationInstructionsPrompt({
   isMax,
   isLite,
   hasNoValidation,
+  useEditor,
 }: {
   isSonnet: boolean
   isFast: boolean
   isDefault: boolean
   isMax: boolean
   isLite: boolean
   hasNoValidation: boolean
+  useEditor: boolean
 }) {
   return `Act as a helpful assistant and freely respond to the user's request however would be most helpful to the user. Use your judgement to orchestrate the completion of the user's request using your specialized sub-agents and tools as needed. Take your time and be comprehensive. Don't surprise the user. For example, don't modify files if the user has not asked you to do so at least implicitly.

@@ -287,8 +301,10 @@ ${buildArray(
   `- For any task requiring 3+ steps, use the write_todos tool to write out your step-by-step implementation plan. Include ALL of the applicable tasks in the list.${isFast ? '' : ' You should include a step to review the changes after you have implemented the changes.'}:${hasNoValidation ? '' : ' You should include at least one step to validate/test your changes: be specific about whether to typecheck, run tests, run lints, etc.'} You may be able to do reviewing and validation in parallel in the same step. Skip write_todos for simple tasks like quick edits or answering questions.`,
   isLite &&
     '- IMPORTANT: You must spawn the editor-gpt-5 agent to implement the changes after you have gathered all the context you need. This agent will do the best job of implementing the changes so you must spawn it for all changes.',
+  useEditor &&
+    '- IMPORTANT: You must spawn the editor agent to implement the changes after you have gathered all the context you need. This agent will do the best job of implementing the changes so you must spawn it for all non-trivial changes. Do not pass any prompt or params to the editor agent when spawning it. It will make its own best choices of what to do.',
   isMax &&
-    `- IMPORTANT: You must spawn the editor-best-of-n-max agent to implement non-trivial code changes, since it will generate the best code changes from multiple implementation proposals. This is the best way to make high quality code changes -- strongly prefer using this agent over the str_replace or write_file tools, unless the change is very straightforward and obvious.`,
+    `- IMPORTANT: You must spawn the editor-best-of-n-max agent to implement non-trivial code changes, since it will generate the best code changes from multiple implementation proposals. This is the best way to make high quality code changes -- strongly prefer using this agent over the str_replace or write_file tools, unless the change is very straightforward and obvious. Do not pass any prompt or params to the editor agent when spawning it. It will make its own best choices of what to do.`,
   (isDefault || isFast) &&
     '- Implement the changes using the str_replace or write_file tools.',
   isFast &&
@@ -308,15 +324,19 @@ function buildImplementationStepPrompt({
   isMax,
   hasNoValidation,
   isSonnet,
+  useEditor,
 }: {
   isFast: boolean
   isMax: boolean
   hasNoValidation: boolean
   isSonnet: boolean
+  useEditor: boolean
 }) {
   return buildArray(
     isMax &&
       `Keep working until the user's request is completely satisfied${!hasNoValidation ? ' and validated' : ''}, or until you require more information from the user.`,
+    useEditor &&
+      `You must spawn the 'editor' agent to implement code changes, since it will do the best job of implementing the changes.`,
     isMax &&
       `You must spawn the 'editor-best-of-n-max' agent to implement code changes, since it will generate the best code changes.`,
     isMax && 'Spawn the thinker-best-of-n-opus to solve complex problems.',
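
The reordered ternary in the example-response hunk makes `useEditor` take precedence over every mode flag. The sketch below restates that precedence as early returns so the order is easier to read. `Flags` and `implementationStepLine` are hypothetical names introduced here; the returned strings are copied from the diff.

```typescript
// Hypothetical helper restating the nested ternary from the diff.
interface Flags {
  useEditor: boolean
  isDefault: boolean
  isFast: boolean
  isLite: boolean
}

function implementationStepLine(flags: Flags): string {
  // useEditor is checked first, so it wins even when isDefault is also true
  // (which is the case for base2-editor, built from the 'default' mode).
  if (flags.useEditor) {
    return '[ You implement the changes using the editor agent ]'
  }
  if (flags.isDefault || flags.isFast) {
    return '[ You implement the changes using the str_replace or write_file tools ]'
  }
  if (flags.isLite) {
    return '[ You implement the changes using the editor-gpt-5 agent ]'
  }
  // Remaining case in the original ternary: max mode.
  return '[ You implement the changes using the editor-best-of-n-max agent ]'
}
```

Had the `useEditor` check been placed after `isDefault || isFast`, the `base2-editor` variant would still have been told to edit with `str_replace` or `write_file`, since it is created in `'default'` mode.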

.agents/editor/best-of-n/best-of-n-selector.ts

Lines changed: 3 additions & 10 deletions
@@ -26,11 +26,6 @@ export const createBestOfNSelector = (options: {
         effort: 'high',
       },
     }),
-    ...(isOpus && {
-      reasoningOptions: {
-        max_tokens: 4000,
-      },
-    }),
     displayName: isGpt5
       ? 'Best-of-N GPT-5 Implementation Selector'
       : isGemini
@@ -114,12 +109,10 @@ Try to select an implementation that fulfills all the requirements in the user's
 ## Response Format

 ${
-  isSonnet
-    ? `Use <think> tags to briefly consider the implementations as needed to pick the best implementation.
-
-If the best one is obvious or the implementations are very similar, you may not need to think very much (a few words suffice) or you may not need to use think tags at all, just pick the best one and output it. You have a dual goal of picking the best implementation and being fast (using as few words as possible).
+  isSonnet || isOpus
+    ? `Use <think> tags to write out your thoughts about the implementations as needed to pick the best implementation. IMPORTANT: You should think really really hard to make sure you pick the absolute best implementation! As soon as you know for sure which implementation is the best, you should output your choice.

-Then, do not write any other explanations AT ALL. You should directly output a single tool call to set_output with the selected implementationId.`
+Then, do not write any other explanations AT ALL. You should directly output a single tool call to set_output with the selected implementationId and short reason.`
     : `Output a single tool call to set_output with the selected implementationId. Do not write anything else.`
 }`,
 }
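
The first hunk in this file removes a conditional spread of the form `...(isOpus && { ... })`. As a reminder of how that idiom behaves, here is a minimal sketch: spreading `false` adds no keys, so the property exists only when the condition holds. `buildSelectorOptionsSketch`, `SelectorOptionsSketch`, and the `model` field are hypothetical names for illustration and are not taken from the selector's real definition.

```typescript
// Hypothetical option shape, for illustration only.
interface SelectorOptionsSketch {
  model: string
  reasoningOptions?: { max_tokens: number }
}

function buildSelectorOptionsSketch(isOpus: boolean): SelectorOptionsSketch {
  return {
    model: isOpus ? 'opus-sketch' : 'other-sketch',
    // When isOpus is false the expression is `false`, and spreading false
    // contributes nothing, so reasoningOptions is absent rather than undefined.
    ...(isOpus && {
      reasoningOptions: {
        max_tokens: 4000,
      },
    }),
  }
}
```

Deleting such a block, as the commit does for the Opus selector, means the key is never set and whatever default the runtime applies takes over.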

.agents/editor/best-of-n/editor-best-of-n.ts

Lines changed: 14 additions & 15 deletions
@@ -52,7 +52,7 @@ export function createBestOfNEditor(
         properties: {
           n: {
             type: 'number',
-            description: `Number of parallel implementor agents to spawn. Defaults to ${isDefault ? 4 : 5}. Use fewer for simple tasks and max of 10 for complex tasks.`,
+            description: `Number of parallel implementor agents to spawn. Defaults to ${isMax ? 4 : 3}. Use fewer for simple tasks and max of 10 for complex tasks.`,
           },
         },
       },
@@ -73,7 +73,7 @@ function* handleStepsDefault({
 }: AgentStepContext): ReturnType<
   NonNullable<SecretAgentDefinition['handleSteps']>
 > {
-  const DEFAULT_N = 4
+  const DEFAULT_N = 3
   const selectorAgent = 'best-of-n-selector'
   const n = Math.min(
     10,
@@ -235,7 +235,7 @@ function* handleStepsMax({
 }: AgentStepContext): ReturnType<
   NonNullable<SecretAgentDefinition['handleSteps']>
 > {
-  const MAX_N = 5
+  const MAX_N = 4
   const selectorAgent = 'best-of-n-selector-opus'
   const n = Math.min(
     10,
@@ -245,12 +245,14 @@ function* handleStepsMax({
   // Model selection pattern for max mode, using opus and gpt-5
   const MAX_MODEL_PATTERN = [
     'editor-implementor-opus',
-    'editor-implementor-gemini',
+    'editor-implementor-opus',
+    // 'editor-implementor-gemini',
     'editor-implementor-gpt-5',
     'editor-implementor-opus',
     'editor-implementor-opus',
     'editor-implementor-gpt-5',
-    'editor-implementor-gemini',
+    // 'editor-implementor-gemini',
+    'editor-implementor-opus',
     'editor-implementor-opus',
     'editor-implementor-opus',
     'editor-implementor-opus',
@@ -296,6 +298,10 @@ function* handleStepsMax({
     implementorResults,
   ) as any[]

+  logger.info(
+    { implementorResults, spawnedImplementations },
+    'spawnedImplementations',
+  )
   // Extract all the plans from the structured outputs
   const letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
   // Parse implementations from spawn results
@@ -304,14 +310,9 @@ function* handleStepsMax({
     content:
       'errorMessage' in result
         ? `Error: ${result.errorMessage}`
-        : extractLastMessageText(result),
+        : extractLastMessageText(result) ?? '',
   }))

-  logger.info(
-    { spawnedImplementations, implementations },
-    'spawnedImplementations',
-  )
-
   // Spawn selector with implementations as params
   const { toolResult: selectorResult, agentState: selectorAgentState } = yield {
     toolName: 'spawn_agents',
@@ -432,14 +433,14 @@ function* handleStepsOpus({
 }: AgentStepContext): ReturnType<
   NonNullable<SecretAgentDefinition['handleSteps']>
 > {
-  const DEFAULT_N = 5
+  const DEFAULT_N = 3
   const selectorAgent = 'best-of-n-selector-opus'
   const n = Math.min(
     10,
     Math.max(1, (params?.n as number | undefined) ?? DEFAULT_N),
   )

-  // Spawn implementor agents: 1 gemini + rest sonnet (if n >= 2)
+  // Spawn implementor agents
   const implementorAgents = []
   for (let i = 0; i < n; i++) {
     implementorAgents.push({
@@ -459,8 +460,6 @@ function* handleStepsOpus({
   const spawnedImplementations =
     extractSpawnResults<{ text: string }[]>(implementorResults)

-  logger.info({ spawnedImplementations }, 'spawnedImplementations')
-
   // Extract all the plans from the structured outputs
   const letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
   // Parse implementations from spawn results
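
The handlers in this file all clamp the requested `n` to the range 1 to 10 and fall back to a per-mode default, which this commit lowers (4 to 3 for default, 5 to 4 for max, 5 to 3 for opus). The sketch below isolates that clamp and shows one plausible way the max-mode pattern could be consumed. `clampN` and `pickImplementors` are hypothetical helpers, and taking the first `n` pattern entries is an assumption, since the diff does not show how `MAX_MODEL_PATTERN` is indexed.

```typescript
// Visible entries of the updated pattern from the diff (gemini entries are
// commented out there, so they are omitted here).
const MAX_MODEL_PATTERN_SKETCH = [
  'editor-implementor-opus',
  'editor-implementor-opus',
  'editor-implementor-gpt-5',
  'editor-implementor-opus',
  'editor-implementor-opus',
  'editor-implementor-gpt-5',
  'editor-implementor-opus',
  'editor-implementor-opus',
  'editor-implementor-opus',
  'editor-implementor-opus',
] as const

// Same expression shape as in the diff:
// Math.min(10, Math.max(1, requested ?? DEFAULT_N))
function clampN(requested: number | undefined, defaultN: number): number {
  return Math.min(10, Math.max(1, requested ?? defaultN))
}

// Assumption: the first n entries of the pattern are spawned in order.
function pickImplementors(requested: number | undefined): string[] {
  const MAX_N = 4 // new max-mode default from this commit
  const n = clampN(requested, MAX_N)
  return MAX_MODEL_PATTERN_SKETCH.slice(0, n)
}
```

Under that assumption, the new max-mode default of 4 would spawn three Opus implementors and one GPT-5 implementor, where the previous pattern's first entries included a Gemini implementor.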

.agents/editor/best-of-n/editor-implementor.ts

Lines changed: 5 additions & 3 deletions
@@ -72,12 +72,14 @@ ${
   isGpt5 || isGemini
     ? ``
     : `
-You can also use <think> tags interspersed between tool calls to think about the best way to implement the changes. Keep these thoughts very brief. You may not need to use think tags at all.
+IMPORTANT: Before you start writing your implementation, you should use <think> tags to think about the best way to implement the changes. You should think really really hard to make sure you implement the changes in the best way possible. Take as much time as you to think through all the cases to produce the best changes.
+
+You can also use <think> tags interspersed between tool calls to think about the best way to implement the changes.

 <example>

 <think>
-[ Thoughts about the best way to implement the feature ]
+[ Long think about the best way to implement the changes ]
 </think>

 <codebuff_tool_call>
@@ -99,7 +101,7 @@ You can also use <think> tags interspersed between tool calls to think about the
 </example>`
 }

-After the edit tool calls, you can optionally mention any follow-up steps to take, like deleting a file, or a sepcific way to validate the changes. There's no need to use the set_output tool as your entire response will be included in the output.
+After the edit tool calls, you can optionally mention any follow-up steps to take, like deleting a file, or a specific way to validate the changes. There's no need to use the set_output tool as your entire response will be included in the output.

 Your implementation should:
 - Be complete and comprehensive
