skill-as-tool: tell weak models not to retry failed calls by cjus · Pull Request #24 · cjus/solrac

cjus · 2026-05-16T00:24:00Z

Summary

Live v0.7.0 dogfooding under openai/gpt-oss-20b on LMStudio surfaced a tool-loop pathology: when a skill-as-tool call hits iteration_cap (or any deterministic failure), the parent model retries the same skill 3–4× before the loop detector intervenes. Skill execution is deterministic given (skill, args) — retries can't succeed; they waste rounds, accumulate noise in the parent's context, and produce confused final answers.

The fix expands the skill-tool error envelope with explicit non-retry signaling that weak local models can act on.

Before:

{"success":false,"error":"iteration_cap"}

After:

{
  "success": false,
  "error": "iteration_cap",
  "retryable": false,
  "hint": "Do not call 'skills__tldr' again this turn — same input produces the same result. Continue without this skill and answer the user with whatever information you already have."
}

The raw error string is preserved verbatim so operator log-grepping continues to work. retryable and hint are additive fields a parent model reads to abandon the skill on first failure.
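
To illustrate why additive fields are safe for existing readers, here is a minimal TypeScript sketch of a reader that accepts either envelope shape. The `SkillErrorEnvelope` type and the `isExplicitlyNonRetryable` helper are hypothetical names chosen for this example, not code from the PR; the PR itself relies on the parent model reading the fields, not on a harness-side check.

```typescript
// Hypothetical reader for the skill-tool error envelope (illustrative names only).
interface SkillErrorEnvelope {
  success: false;
  error: string; // raw error string, preserved verbatim
  retryable?: boolean; // additive: absent in the pre-fix envelope
  hint?: string; // additive: absent in the pre-fix envelope
}

// True only when the envelope explicitly says a retry cannot help.
// A pre-fix envelope (no `retryable` field) yields false, meaning "unknown".
function isExplicitlyNonRetryable(rawEnvelope: string): boolean {
  const parsed: unknown = JSON.parse(rawEnvelope);
  if (typeof parsed !== "object" || parsed === null) return false;
  const envelope = parsed as Partial<SkillErrorEnvelope>;
  return envelope.success === false && envelope.retryable === false;
}

const beforeEnvelope = '{"success":false,"error":"iteration_cap"}';
const afterEnvelope = JSON.stringify({
  success: false,
  error: "iteration_cap",
  retryable: false,
  hint: "Do not call 'skills__tldr' again this turn", // hint abbreviated for the example
});

console.log(isExplicitlyNonRetryable(beforeEnvelope)); // false
console.log(isExplicitlyNonRetryable(afterEnvelope)); // true
// The substring an operator would grep for is unchanged in the new shape.
console.log(afterEnvelope.includes('"error":"iteration_cap"')); // true
```

Because the new fields are optional in the type, code written against the old two-field envelope keeps working unchanged.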

Symptom this fixes

Production audit chain (auditId 220) under LMStudio + gpt-oss-20b, query "list my gmail accounts":

1. Model fires gmail_list_accounts ✓
2. Then, unprompted, gmail_search_messages
3. Then 4 attempts at skills__tldr; each spawns a nested loop that hits the maxIterations:1 cap and returns an iteration_cap error
4. Loop detector eventually fires at threshold:3
5. Final user-facing response: "I'm not sure what you'd like me to log…", which is the parent's confused interpretation of the repeated tldr failures rather than an answer to the actual query

With this fix the parent sees retryable:false on the first tldr failure and produces a final answer instead of cycling.

Implementation

src/skill-tools.ts::buildSkillErrorPayload(skillName, errorMessage) — centralized payload builder, exported for test + reuse
Both error sites in buildOneSkillTool route through it (skill execution failure, missing skillToolCtx defensive path)
No behavior change for successful skill calls — only the error envelope shape changes
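
The points above can be sketched roughly as follows. This is an approximation under stated assumptions, not the actual contents of src/skill-tools.ts: the MCP-style `content` array is inferred from the "MCP content shape" test, and `SkillToolCtx`, `runSkill`, `callSkillTool`, the missing-context message, and the success payload are all invented for illustration. Only the `buildSkillErrorPayload(skillName, errorMessage)` signature and the four envelope fields come from the PR.

```typescript
// Sketch only: the real buildSkillErrorPayload in src/skill-tools.ts may differ.
interface TextContent {
  type: "text";
  text: string;
}
interface SkillToolResult {
  content: TextContent[]; // assumed MCP-style content array
}
// Hypothetical context type; the PR only names `skillToolCtx`.
interface SkillToolCtx {
  runSkill: (skillName: string) => string;
}

function buildSkillErrorPayload(skillName: string, errorMessage: string): SkillToolResult {
  const envelope = {
    success: false,
    error: errorMessage, // passed through verbatim for log-grepping
    retryable: false, // invariant: never true at this boundary
    // Wording follows the PR's "After" example; \u2014 is its em dash.
    hint:
      `Do not call '${skillName}' again this turn \u2014 same input produces the same result. ` +
      "Continue without this skill and answer the user with whatever information you already have.",
  };
  return { content: [{ type: "text", text: JSON.stringify(envelope) }] };
}

// Both error sites route through the one builder, so the parent sees one shape.
function callSkillTool(skillName: string, ctx?: SkillToolCtx): SkillToolResult {
  if (!ctx) {
    // Defensive path; this message text is invented for the sketch.
    return buildSkillErrorPayload(skillName, "missing skillToolCtx");
  }
  try {
    const output = ctx.runSkill(skillName);
    // Success shape is an assumption; the PR leaves successful calls unchanged.
    return { content: [{ type: "text", text: JSON.stringify({ success: true, output }) }] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return buildSkillErrorPayload(skillName, message);
  }
}
```

Keeping the hint text in one function means a wording change made for weak models only has to happen in one place, and the skill name in the hint stays in sync with the tool that actually failed.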

Why not retry-classify (transient vs permanent)?

Considered marking some errors retryable (e.g. transient network) but rejected: skill execution is purely deterministic for the same (skill, args). Network errors inside the skill's tool loop are already retried internally by runToolLoop. Anything that escapes to the skill-tool boundary has exhausted its internal retries and won't recover on a second call from the parent.
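
The reasoning above can be made concrete with a small sketch. `withInternalRetries` is a hypothetical stand-in for the retry behavior the PR attributes to runToolLoop, and the attempt budget of 3 is an assumption for the example; the real loop is presumably asynchronous and more involved.

```typescript
// Hypothetical stand-in for the internal retry the PR attributes to runToolLoop.
function withInternalRetries<T>(attempt: () => T, maxAttempts: number): T {
  let lastError: unknown = new Error("no attempts made");
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return attempt();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // budget exhausted: the failure escapes to the boundary
}

type SkillOutcome =
  | { success: true; output: string }
  | { success: false; error: string; retryable: false };

// Skill-tool boundary: anything caught here has already used its internal
// retries, so it is always reported as non-retryable.
function runSkillAtBoundary(attempt: () => string): SkillOutcome {
  try {
    return { success: true, output: withInternalRetries(attempt, 3) }; // 3 is assumed
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { success: false, error, retryable: false };
  }
}

// A transient fault that clears on the third attempt never reaches the parent.
let flakyCalls = 0;
const flaky = runSkillAtBoundary(() => {
  flakyCalls += 1;
  if (flakyCalls < 3) throw new Error("network blip");
  return "summary text";
});

// A persistent fault surfaces once, after all internal attempts are spent.
let brokenCalls = 0;
const broken = runSkillAtBoundary(() => {
  brokenCalls += 1;
  throw new Error("network unreachable");
});

console.log(flaky.success, flakyCalls); // true 3
console.log(broken.success, brokenCalls); // false 3
```

Under this model a parent-level retry would only repeat the same three internal attempts, which is why the failure type carries `retryable: false` as a literal rather than a boolean.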

Test plan

npm run typecheck — clean
bun test — 759/759 pass (+4 vs v0.7.0)
New tests in src/skill-tools.test.ts: MCP content shape, retryable:false invariant, per-skill hint identifier (so multi-skill turns disambiguate), arbitrary error string passthrough
Live retest: LOCAL_BACKEND=lmstudio LOCAL_MODEL=openai/gpt-oss-20b + reproduce the gmail/tldr cycle, confirm tldr is abandoned on first failure

No anti-goal reversals

No SDK pin bump. No new runtime deps. Additive envelope fields only.

Under openai/gpt-oss-20b on LMStudio, a skill-as-tool call that hits iteration_cap (or any deterministic failure) gets retried 3–4× by the parent model before the loop detector intervenes. The bare envelope {success:false, error:"iteration_cap"} reads as transient to a weak model, which doesn't know skill execution is deterministic given (skill, args). Expanded the error envelope with an explicit retryable:false plus a plain-prose hint that names the specific skill the parent should stop calling. Centralized in `buildSkillErrorPayload` so both error sites (execution failure, missing-context defensive path) emit the same shape. +4 unit tests pin the MCP content shape, retryable invariant, per-skill hint identifier, and arbitrary-error-string passthrough. Error string preserved verbatim, so operator log-grep still works. retryable and hint are additive; no breaking changes.

cjus wants to merge 1 commit into main from carlos/solrac-skill-error-hint
