
fix(phone/voice): stop spammy Allow prompts AND signature-loop kills #59

Merged
1bcMax merged 2 commits into main from fix/phone-voice-permissions
May 18, 2026

Conversation

@KillerQueen-Z
Collaborator

@KillerQueen-Z KillerQueen-Z commented May 18, 2026

Summary

PR #58 added 8 typed Phone/Voice tools but missed two integration points. Both surface as agent-killing bugs the moment a real user (especially on cheap models routed by clawrouter) tries to make a call. This PR fixes both in one go so the reviewer sees the full picture.

Bug 1 — "Allow?" prompt spam

PR #58 never wired the 8 tools into `PermissionManager`. They all fell through to `decide()`'s default branch and triggered an interactive "Allow?" prompt on every single call — including the side-effect-free polling tool VoiceStatus. Users saw 5+ "Allow VoiceStatus?" prompts per minute during a single call.

Bug 2 — Signature-loop guard kills polling turns

Even with permissions fixed, VoiceStatus shipped as a naked one-shot GET. The agent has to drive the poll cadence itself, calling VoiceStatus(call_id=X) repeatedly until status flips to terminal. The signature counter — `turnSignatureCounts.get('VoiceStatus:{call_id:"X"}')` — climbs by one per poll. At 5, `loop.ts` kills the turn with `Loop stopped: ... repeated the same input 5×`. Confirmed in a user session 2026-05-18: turn 1 fired 1 VoiceCall + 5 VoiceStatus before being killed; the call kept running upstream but the agent never saw the transcript.
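To make the failure mode concrete, here is a minimal sketch of how a per-turn signature counter reaches the threshold under manual polling. The class name, the signature format, and the `record` method are assumptions for illustration; only the threshold of 5 and the idea of counting identical tool inputs come from the description above, and the real logic in `loop.ts` may differ.

```typescript
// Hypothetical sketch of a per-turn signature guard. Names and the
// signature format are assumptions, not the actual loop.ts code.
const MAX_REPEATS = 5; // threshold quoted in the PR description

class SignatureGuard {
  private readonly counts = new Map<string, number>();

  /** Records one tool call; returns true when the turn should be stopped. */
  record(tool: string, input: Record<string, unknown>): boolean {
    // Identical tool name + identical input => identical signature.
    const signature = `${tool}:${JSON.stringify(input)}`;
    const next = (this.counts.get(signature) ?? 0) + 1;
    this.counts.set(signature, next);
    return next >= MAX_REPEATS;
  }
}

// Five manual polls of the same call_id: the fifth one trips the guard.
const guard = new SignatureGuard();
const polls: boolean[] = [];
for (let i = 0; i < 5; i++) {
  polls.push(guard.record("VoiceStatus", { call_id: "X" }));
}
```

Because the poll input never changes while the call is in progress, a guard of this shape cannot tell a legitimate wait from a stuck loop, which is why the fix moves the waiting inside the tool.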

Fix 1 — Classify Phone & Voice tools in permissions.ts

Splits the 8 tools by side-effect (matching the rule the rest of the permission system uses — READ_ONLY = "doesn't change the world outside the gateway", price is orthogonal):

| Tool | Category | Price | Rationale |
| --- | --- | --- | --- |
| `ListPhoneNumbers` | READ_ONLY | $0.001 | Cached inventory read |
| `PhoneLookup` | READ_ONLY | $0.01 | Carrier + line-type query |
| `PhoneFraudCheck` | READ_ONLY | $0.05 | SIM-swap signals query |
| `VoiceStatus` | READ_ONLY | free | GET poll on existing call |
| `VoiceCall` | ASK | $0.54 | Dials a real human, irreversible |
| `BuyPhoneNumber` | ASK | $5 | Holds a number for 30 days |
| `RenewPhoneNumber` | ASK | $5 | Extends a held number |
| `ReleasePhoneNumber` | ASK | free | Permanently returns number to pool |

`WebSearch` and `ImageGen` also charge USDC but live in READ_ONLY because they don't dial anyone or permanently mutate gateway state. Same logic here.
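The classification can be pictured as a lookup keyed by tool name. The sketch below is illustrative only: the `Category` type, the map, and the `classify` helper are hypothetical names, and the actual layout of `permissions.ts` is not shown in this PR description. The fallback to `ASK` for unlisted tools reflects the default-prompt behaviour described under Bug 1.

```typescript
// Hypothetical sketch of the side-effect classification. The type,
// map, and helper names are assumptions, not the real permissions.ts.
type Category = "READ_ONLY" | "ASK";

const PHONE_VOICE_CATEGORIES = new Map<string, Category>([
  // No effect outside the gateway: auto-allow, regardless of price.
  ["ListPhoneNumbers", "READ_ONLY"],
  ["PhoneLookup", "READ_ONLY"],
  ["PhoneFraudCheck", "READ_ONLY"],
  ["VoiceStatus", "READ_ONLY"],
  // Dials a human or changes number ownership: ask every time.
  ["VoiceCall", "ASK"],
  ["BuyPhoneNumber", "ASK"],
  ["RenewPhoneNumber", "ASK"],
  ["ReleasePhoneNumber", "ASK"],
]);

/** Unlisted tools fall back to ASK, i.e. the interactive prompt. */
function classify(tool: string): Category {
  return PHONE_VOICE_CATEGORIES.get(tool) ?? "ASK";
}
```

Note that price never appears in the lookup: `PhoneFraudCheck` costs more than `ReleasePhoneNumber` yet lands in the auto-allow group, because only the side effect decides the category.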

Fix 2 — VoiceStatus internal poll-until-terminal

Refactor VoiceStatus to mirror the pattern videogen.ts and imagegen.ts already use:

  • Tool blocks internally, polling every 5 s until terminal status (completed / failed / no-answer / busy / voicemail / cancelled).
  • 35-min ceiling — Bland.ai caps a single call at 30 min, +5 min headroom for upstream settlement.
  • Each poll iteration writes the latest snapshot to the local CallLog (panel Calls tab stays live even while agent is blocked).
  • `ctx.abortSignal` honored — Ctrl-C cancels the poll cleanly.
  • Tool description updated to explicitly say "CALL THIS ONCE" — even cheap pattern-matching models (deepseek, haiku) won't try a manual loop.

Agent emits exactly one VoiceStatus tool_use per call and gets back the final transcript when the call ends. Signature counter stays at 1, guard never trips.
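The bullets above can be sketched roughly as follows. This is a simplified illustration, not the shipped implementation: `pollUntilTerminal`, `PollOptions`, `fetchStatus`, `onSnapshot`, and the `AbortLike` interface are hypothetical names, and the injected `sleep` and `now` exist only to make the sketch testable. The 5 s interval, the 35-minute ceiling, and the terminal status list are taken from the description.

```typescript
// Hypothetical sketch of poll-until-terminal. All names are assumptions;
// only the interval, ceiling, and status list come from the PR text.
const TERMINAL = new Set([
  "completed", "failed", "no-answer", "busy", "voicemail", "cancelled",
]);

interface CallSnapshot { status: string; transcript?: string }
interface AbortLike { readonly aborted: boolean } // stand-in for ctx.abortSignal

interface PollOptions {
  fetchStatus: (callId: string) => Promise<CallSnapshot>; // one-shot GET
  onSnapshot?: (snapshot: CallSnapshot) => void;          // e.g. write to the call log
  signal?: AbortLike;
  intervalMs?: number;
  ceilingMs?: number;
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

async function pollUntilTerminal(
  callId: string,
  opts: PollOptions,
): Promise<CallSnapshot> {
  const intervalMs = opts.intervalMs ?? 5_000;
  const ceilingMs = opts.ceilingMs ?? 35 * 60_000; // 30 min call cap + 5 min headroom
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const now = opts.now ?? Date.now;
  const deadline = now() + ceilingMs;

  while (true) {
    // Abort is checked between polls, not during the sleep itself.
    if (opts.signal?.aborted) throw new Error("VoiceStatus poll aborted");
    const snapshot = await opts.fetchStatus(callId);
    opts.onSnapshot?.(snapshot); // keep the local log live on every iteration
    if (TERMINAL.has(snapshot.status)) return snapshot;
    if (now() + intervalMs > deadline) {
      throw new Error("VoiceStatus poll hit the time ceiling");
    }
    await sleep(intervalMs);
  }
}
```

From the agent's point of view this is a single tool call that resolves once, so the input signature is recorded only one time per call regardless of how many upstream requests the loop makes.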

Test plan

  • `npm run build` → clean
  • `npm test` → 405/405 pass on both fixes
  • Manual e2e (boss can do this after merge):
    • `franklin --model deepseek/deepseek-v4-pro`
    • Ask agent to make a 30-second test call to a number you own
    • Observe: one "Allow VoiceCall?" prompt, zero subsequent prompts during the call, agent returns full transcript in a single turn without "Loop stopped"

Why both fixes belong in one PR

They're the same regression — "PR #58 shipped typed Phone/Voice tools but didn't wire them into the surrounding agent infrastructure." Reviewing them together makes the failure mode obvious. Splitting them would leave the agent half-broken on whichever side merged first.

🤖 Generated with Claude Code

KillerQueen-Z and others added 2 commits May 18, 2026 13:06
…" prompts

PR #58 introduced 8 typed Phone/Voice capabilities (ListPhoneNumbers,
PhoneLookup, PhoneFraudCheck, VoiceStatus, BuyPhoneNumber,
RenewPhoneNumber, ReleasePhoneNumber, VoiceCall) but never wired them
into the permissions system. They all fell through to the default
fall-back branch in PermissionManager.check() and triggered an
interactive "Allow?" prompt on every single call.

Real-world impact today: agent placed one outbound call (correct,
should ask), then polled VoiceStatus every few seconds while the call
was running. Each VoiceStatus poll prompted the user — five+ prompts
per minute during a 1-min call. Confirmed from a user session today
(2026-05-18) showing 11 separate VoiceStatus tool_use entries against
one call_id.

Classification follows the same side-effect rule the rest of the
permission system uses (READ_ONLY = "doesn't change the world outside
the gateway", regardless of price):

  READ_ONLY (auto-allow):
    - ListPhoneNumbers   — cached wallet inventory read ($0.001)
    - PhoneLookup        — carrier + line-type query ($0.01)
    - PhoneFraudCheck    — SIM-swap signals ($0.05)
    - VoiceStatus        — free GET poll on existing call

  ASK (explicit user consent every call, matches Write/Edit/Bash):
    - VoiceCall          — dials a real human, $0.54, can't be undone
    - BuyPhoneNumber     — holds a Twilio number for 30 days, $5
    - RenewPhoneNumber   — extends a held number, $5
    - ReleasePhoneNumber — permanently returns number to pool, irreversible

Pricing is intentionally orthogonal — WebSearch and ImageGen also
charge USDC but live in READ_ONLY because they don't dial anyone /
permanently mutate gateway state.

Does NOT touch the separate signature-loop-guard bug (VoiceStatus is a
poll-style tool but the guard treats repeated identical inputs as a
stuck loop — that's a follow-up PR that needs a polling-tool whitelist
in src/agent/loop.ts).

Test plan:
  npm test  → 405/405 pass (no permission test references this set
              directly; classification is a pure additive change)
  Manual:   start a session, call VoiceCall, observe one prompt; agent
            then polls VoiceStatus 5x with no further prompts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion fix to the permissions classification in the same PR. Without
this, even after we stop spamming "Allow?" prompts, the agent still
trips Franklin's signature-loop guard the moment it has to wait for a
real call to finish.

Background: VoiceStatus ships as a naked one-shot GET — agent calls it,
gets `in-progress`, has to decide to call it again, gets `in-progress`,
again, again... Same call_id every time, so the input signature is
literally identical. `loop.ts:turnSignatureCounts` triggers at 5 and
kills the turn with "Loop stopped: ... repeated the same input 5×".

Confirmed in a user session (2026-05-18, screenshot shared): turn 1
fired 1 VoiceCall + 5 VoiceStatus polls, all hit while the call was
still ringing/in-progress, then died at the 5th. Call kept running in
the cloud; agent never saw the transcript on the original turn. User
had to ask "give me the transcript" in a second turn to recover.

The fix: mirror the pattern videogen.ts and imagegen.ts already use.
VoiceStatus blocks internally, polling every 5 s until a terminal
status (completed / failed / no-answer / busy / voicemail / cancelled)
or the 35-min ceiling. Agent emits exactly one VoiceStatus tool_use
per call and gets back the final transcript when the call ends.

Side benefits:
- Each poll iteration writes the latest snapshot to the local call log
  (so the panel Calls tab updates live even though the agent is blocked
  in the tool).
- ctx.abortSignal is honored — Ctrl-C cancels the poll cleanly.
- 35 min ceiling = 30 min Bland max_duration + 5 min headroom for the
  upstream to settle / mark final status.

Updated tool description tells the model explicitly: "CALL THIS ONCE.
The tool blocks internally" — so even cheap models that pattern-match
on description (deepseek, haiku) won't try to poll in a loop.

Test plan:
  npm test  → 405/405 pass (same baseline as the permissions change;
              no test references VoiceStatus directly)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@1bcMax 1bcMax merged commit 7ce2037 into main May 18, 2026
@KillerQueen-Z KillerQueen-Z changed the title fix(permissions): classify Phone & Voice tools to stop spammy "Allow?" prompts fix(phone/voice): stop spammy Allow prompts AND signature-loop kills May 18, 2026