
fix(tts): route MiMo Pro as chat #1682

Merged
zerob13 merged 1 commit into dev from fix/mimo-v-model on May 27, 2026

Conversation

@zerob13 (Collaborator) commented May 27, 2026

Summary

  • tighten MiMo chat-audio TTS detection so MiMo Pro chat models do not enter TTS runtime
  • keep actual MiMo TTS variants on chat-audio TTS handling
  • guard chat-audio response content extraction against non-array content shapes
  • add regression coverage and archive the SDD notes

Tests

  • pnpm exec vitest --config vitest.config.ts test/main/shared/ttsSettings.test.ts test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts
  • pnpm run format
  • pnpm run i18n
  • pnpm run lint
  • pnpm run typecheck:node
  • commit hook: pnpm run typecheck

Summary by CodeRabbit

  • Bug Fixes

    • Improved chat-audio TTS response handling to correctly extract audio data from chat completions.
    • Enhanced TTS model detection logic to accurately identify supported models.
  • Tests

    • Added regression tests for chat-audio TTS routing and response validation.

@coderabbitai (Bot, Contributor) commented May 27, 2026

📝 Walkthrough

This PR refines chat-audio TTS routing for MiMo models by centralizing model detection logic, expanding prefix support for Xiaomi variants, and implementing defensive audio extraction that safely handles different response content shapes while preserving error behavior for missing audio.

Changes

Chat Audio TTS Routing Fix

| Layer / File(s) | Summary |
| --- | --- |
| Architecture & Routing Plan: docs/archives/chat-audio-tts-routing/plan.md, docs/archives/chat-audio-tts-routing/spec.md, docs/archives/chat-audio-tts-routing/tasks.md | Documents the routing refinement: tightened MiMo TTS detection requiring both a known prefix and a tts segment marker, safer message.content handling, audio extraction prioritization, error preservation, and regression test requirements. |
| TTS Model Detection & Prefix Expansion: src/shared/ttsSettings.ts, test/main/shared/ttsSettings.test.ts | CHAT_AUDIO_TTS_MODEL_PREFIXES now includes xiaomi-mimo-v and mimo-v; isChatAudioTtsModel uses unified prefix matching and a marker-pattern regex instead of hardcoded checks. Regression tests verify TTS classification for multiple variants and non-TTS models. |
| Audio Extraction & Pattern B Runtime: src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts, test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts | New extractChatAudioContentData helper safely traverses content arrays to locate audio parts; Pattern B parsing prioritizes direct message.audio.data over content extraction and preserves missing-audio errors. Tests cover non-TTS model fallback to standard streaming, TTS audio extraction from content parts, and error handling for missing audio. |
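
The detection rule described above (a known prefix plus a `tts` segment marker) can be sketched roughly as follows. This is an illustration under assumptions, not the code in src/shared/ttsSettings.ts: the function name `isChatAudioTtsModelSketch`, the regex, the provider-prefix stripping, and the TTS model IDs in the usage note are all hypothetical. Only the two prefixes and the non-TTS IDs `mimo-v2.5-pro` and `xiaomimimo/mimo-v2.5-pro` come from this PR.

```typescript
// Prefixes named in the walkthrough; the real constant may hold more entries.
const CHAT_AUDIO_TTS_MODEL_PREFIXES = ['xiaomi-mimo-v', 'mimo-v'] as const

// Assumed marker pattern: "tts" as its own segment, bounded by '-', '_', '.'
// or the start/end of the ID. The actual regex is not shown in the PR page.
const TTS_SEGMENT_PATTERN = /(^|[-_.])tts($|[-_.])/

// Assumption: IDs may carry a provider prefix such as "xiaomimimo/", so only
// the part after the last slash is matched.
function normalizeModelId(modelId: string): string {
  const lower = modelId.trim().toLowerCase()
  const slash = lower.lastIndexOf('/')
  return slash >= 0 ? lower.slice(slash + 1) : lower
}

// Hypothetical stand-in for isChatAudioTtsModel: both conditions must hold.
function isChatAudioTtsModelSketch(modelId: string): boolean {
  const id = normalizeModelId(modelId)
  const hasKnownPrefix = CHAT_AUDIO_TTS_MODEL_PREFIXES.some((prefix) => id.startsWith(prefix))
  return hasKnownPrefix && TTS_SEGMENT_PATTERN.test(id)
}
```

With this sketch, `mimo-v2.5-pro` and `xiaomimimo/mimo-v2.5-pro` are rejected because they lack a `tts` segment, while a hypothetical ID such as `mimo-v2.5-tts` is accepted. Requiring both conditions is what keeps MiMo Pro chat models on the standard streaming path.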

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • ThinkInAIXYZ/deepchat#1632: Directly related TTS model-level routing work that also modifies src/shared/ttsSettings.ts chat-audio detection constants and Pattern B handling.
  • ThinkInAIXYZ/deepchat#1633: Related TTS execution logic updates to src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts for Pattern A/B response handling.

Poem

🐰 Hops through the audio routing with care,
TTS models now detected fair,
Content arrays? No TypeErrors here—
Extract with grace, and hold the sphere!
Xiaomi whispers, mimo-v rings clear.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(tts): route MiMo Pro as chat' is concise and directly addresses a core part of the changeset, though it represents only one aspect of the multi-faceted changes (which also include tightening TTS detection, guarding response content extraction, and adding regression tests). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


@coderabbitai (Bot, Contributor) left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/archives/chat-audio-tts-routing/plan.md (1)

1-22: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move SDD docs from docs/archives/ to the required SDD location.

For this bug-fix PR, these new SDD artifacts should live under docs/issues/<goal>/ rather than docs/archives/.... Please relocate this doc set (plan/spec/tasks) to the required folder convention.

As per coding guidelines docs/**/*.md: Create specification-driven development documentation in kebab-case folders: docs/features/<goal>/ for new features, docs/issues/<goal>/ for bug fixes, docs/architecture/<goal>/ for refactors

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/archives/chat-audio-tts-routing/plan.md` around lines 1 - 22, Move the
SDD markdown from docs/archives/chat-audio-tts-routing/plan.md into the proper
bug-fix docs folder using kebab-case (e.g.,
docs/issues/chat-audio-tts-routing/plan.md), updating any internal links if
present; keep the file name but relocate the folder to docs/issues/<goal>/ per
guidelines and ensure the spec references the implementation symbols
isChatAudioTtsModel and executeTtsPatternB so readers can find the related code.
🧹 Nitpick comments (3)
test/main/shared/ttsSettings.test.ts (1)

10-13: ⚡ Quick win

Add a direct isTtsModelId negative for non-TTS xiaomi-mimo-v IDs.

You already cover chat-audio classification for xiaomi variants; adding isTtsModelId('xiaomi-mimo-v2.5-pro') === false will lock the newly added prefix path end-to-end.

Suggested test addition
     expect(isChatAudioTtsModel('mimo-v2.5-pro')).toBe(false)
     expect(isChatAudioTtsModel('xiaomimimo/mimo-v2.5-pro')).toBe(false)
+    expect(isTtsModelId('xiaomi-mimo-v2.5-pro')).toBe(false)
     expect(isTtsModelId('mimo-v2.5-pro')).toBe(false)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/main/shared/ttsSettings.test.ts` around lines 10 - 13, Add a negative
assertion for the non-TTS `xiaomi-mimo-v` prefix to ensure the `isTtsModelId`
path is covered: update the test block to call
isTtsModelId('xiaomi-mimo-v2.5-pro') and expect false so both
`isChatAudioTtsModel` and `isTtsModelId` reject non-TTS `xiaomi-mimo-v` IDs
(referencing the existing expectations around isChatAudioTtsModel and
isTtsModelId in this test).
test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts (1)

490-564: ⚡ Quick win

Add a regression for multiple audio parts with a later valid payload.

Please add a case where the first type: 'audio' part has empty/missing audio.data and a later one is valid, then assert successful extraction. This locks fallback behavior for mixed content arrays.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts` around lines
490 - 564, Add a new test variant for runAiSdkCoreStream that returns a content
array where the first content entry of type 'audio' has empty or missing
audio.data and a later content entry of type 'audio' contains a valid base64
payload (e.g., 'ZmFrZS1hdWRpby1wYXJ0'); stub fetch the same way
(vi.stubGlobal('fetch', fetchMock)) and call runAiSdkCoreStream with the same
context and params, then assert the emitted events include the image_data using
the later valid audio payload (mimeType 'audio/wav') and a final stop event,
verifying the function falls back to later audio parts when earlier ones lack
data.
src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts (1)

411-419: ⚡ Quick win

Search for the first valid audio payload, not just the first audio part.

Current logic stops at the first type === 'audio' entry even if its audio.data is empty. Iterating until the first valid non-empty string avoids false “missing audio” errors on mixed content arrays.

Suggested robustness patch
-  const audioPart = content.find(
-    (item) => item && typeof item === 'object' && 'type' in item && item.type === 'audio'
-  )
-  const audioData =
-    audioPart && typeof audioPart === 'object' && 'audio' in audioPart
-      ? (audioPart.audio as { data?: unknown } | undefined)?.data
-      : undefined
-
-  return typeof audioData === 'string' && audioData ? audioData : undefined
+  for (const item of content) {
+    if (!item || typeof item !== 'object') continue
+    if (!('type' in item) || item.type !== 'audio') continue
+
+    const audioData =
+      'audio' in item ? (item.audio as { data?: unknown } | undefined)?.data : undefined
+    if (typeof audioData === 'string' && audioData) {
+      return audioData
+    }
+  }
+
+  return undefined
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts` around lines 411 -
419, The current use of content.find to get audioPart stops at the first item
with type==='audio' even if its audio.data is empty; modify the logic that
derives audioPart/audioData (the content.find call and the audioData extraction)
to search for the first item where item.type === 'audio' and item.audio?.data is
a non-empty string (or loop through content until you find such an item), then
return that data string or undefined; update the code paths referencing
audioPart and audioData to use this validated audio payload check so empty audio
entries are skipped.
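
To make the suggested behavior concrete, here is a minimal self-contained sketch that combines the loop-based extraction with the Pattern B priority order from the walkthrough (direct message.audio.data first, then content parts, otherwise an error). The names `ChatAudioMessage`, `extractAudioFromContent`, and `resolveChatAudioData`, as well as the error message, are hypothetical; the real helper is extractChatAudioContentData in runtime.ts and may differ in detail.

```typescript
// Hypothetical message shape: only the fields this sketch reads.
interface ChatAudioMessage {
  audio?: { data?: unknown } | null
  content?: unknown
}

// Return the first non-empty audio payload found in the content parts.
function extractAudioFromContent(content: unknown): string | undefined {
  // Guard: content can be a string, null, or an object instead of an array.
  if (!Array.isArray(content)) return undefined

  for (const item of content) {
    if (!item || typeof item !== 'object') continue
    const part = item as { type?: unknown; audio?: { data?: unknown } | null }
    if (part.type !== 'audio') continue

    // Skip audio parts whose data is empty or missing and keep scanning.
    const data = part.audio?.data
    if (typeof data === 'string' && data.length > 0) return data
  }

  return undefined
}

// Prefer message.audio.data, fall back to content parts, else keep the
// missing-audio error (message text is an assumption).
function resolveChatAudioData(message: ChatAudioMessage | undefined): string {
  const direct = message?.audio?.data
  if (typeof direct === 'string' && direct.length > 0) return direct

  const fromContent = extractAudioFromContent(message?.content)
  if (fromContent !== undefined) return fromContent

  throw new Error('Chat-audio TTS response did not contain audio data')
}
```

In this sketch a content array whose first audio part has an empty `data` string still yields the later valid payload, and a plain-string `content` returns `undefined` rather than raising a TypeError, so the caller's missing-audio error stays the only failure mode.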

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc587677-fef5-49d2-83ee-253aa2bf22c5

📥 Commits

Reviewing files that changed from the base of the PR and between 9d81d6f and 4747657.

📒 Files selected for processing (7)
  • docs/archives/chat-audio-tts-routing/plan.md
  • docs/archives/chat-audio-tts-routing/spec.md
  • docs/archives/chat-audio-tts-routing/tasks.md
  • src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts
  • src/shared/ttsSettings.ts
  • test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts
  • test/main/shared/ttsSettings.test.ts

@zerob13 merged commit 3908493 into dev on May 27, 2026 (3 checks passed)
@zhangmo8 deleted the fix/mimo-v-model branch on May 27, 2026, 07:15