feat(dashscope): Alibaba Bailian DashScope module — CosyVoice TTS + voice cloning + Wanx #17

Merged
RoboZephyr merged 2 commits into main from feat/dashscope-module on May 17, 2026

Conversation

@RoboZephyr

Owner

Summary

Adds `library/dashscope/` covering the non-LLM half of Alibaba's DashScope platform — CosyVoice TTS, voice cloning, Wanx image gen. Qwen LLM continues to live in the separate `qwen` module (same physical API key, different module per SPEC §0).

Live verification — tier `partial`

| Step | Result |
| --- | --- |
| Install `dashscope==1.25.18` in tmp venv | |
| Submit CosyVoice v3-flash TTS via official Python SDK | ✅ WebSocket opened, task accepted |
| Bearer auth header validated | ✅ task_id issued |
| Audio bytes returned | ❌ `task-failed` event: `error_code: "Arrearage"` (account balance ≤ 0) |

Same tier as kling: auth + contract verified, runtime blocked by billing. `last_verified` records this faithfully.
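
The partial tier depends on telling a billing failure apart from an auth or schema failure. A minimal sketch of that check, assuming the server event is JSON with a `header` object carrying `event` and `error_code`; that nesting is an assumption, since the PR only quotes the `task-failed` event name and the `Arrearage` code. `classify_event` is a hypothetical helper, not part of the SDK.

```python
import json


def classify_event(raw: str) -> str:
    """Return 'billing', 'failed', or 'ok' for one server event.

    Assumes the event looks like {"header": {"event": ..., "error_code": ...}}.
    That shape is an assumption for illustration.
    """
    header = json.loads(raw).get("header", {})
    if header.get("event") != "task-failed":
        return "ok"
    # "Arrearage" is the code observed during the smoke test.
    if header.get("error_code") == "Arrearage":
        return "billing"
    return "failed"


sample = json.dumps({
    "header": {
        "event": "task-failed",
        "error_code": "Arrearage",
        "error_message": "Access denied, please make sure your account is in good standing.",
    }
})
print(classify_event(sample))  # prints: billing
```

A `billing` result with a task_id already issued is what justifies recording the module as `partial` rather than unverified.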

Module highlights

10 Critical Constraints (gotchas-first per SPEC §2.1):

  • WebSocket-only protocol (not REST — debunks confusing CosyVoice v1 batch REST examples in older docs)
  • Region split (`dashscope.aliyuncs.com` vs `dashscope-intl.aliyuncs.com`) with misleading InvalidApiKey error
  • Voice ID is model-version-locked (`*_v3` ids only on v3 models)
  • Character billing counts full-width punctuation; cost meter is `usage.input_tokens` (misleading field name for TTS)
  • No official Node SDK — raw WebSocket for non-Python runtimes
  • Arrearage gate (discovered live during smoke)
  • 4 more
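
The region split above is easy to get wrong because the failure surfaces as an API-key error rather than a host error. A small sketch of picking the endpoint explicitly: the two hostnames come from this PR, while the `cn`/`intl` labels and the idea that the international host uses the same WebSocket path as the China host are assumptions.

```python
# Hostnames are quoted in the PR. The region labels are hypothetical, and
# the intl WebSocket path is assumed to mirror the China one.
WS_PATH = "/api-ws/v1/inference"
HOSTS = {
    "cn": "dashscope.aliyuncs.com",
    "intl": "dashscope-intl.aliyuncs.com",
}


def websocket_url(region: str) -> str:
    """Build the inference WebSocket URL for a region ('cn' or 'intl')."""
    try:
        host = HOSTS[region]
    except KeyError:
        raise ValueError(
            f"unknown region {region!r}; expected one of {sorted(HOSTS)}"
        ) from None
    return f"wss://{host}{WS_PATH}"


print(websocket_url("cn"))  # prints: wss://dashscope.aliyuncs.com/api-ws/v1/inference
```

Failing fast on an unknown region keeps a typo from turning into a misleading InvalidApiKey response later.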

Sections: Setup, Python SDK quickstart, voice catalogue (representative 7 voices, generic framing, link to full 150+), voice cloning recipe, Wanx image gen example, raw WebSocket sketch for Node/Edge, cost table, 6-row error reference, cross-module pointer to `qwen`, source-of-truth URLs.

Bookkeeping

  • 20 → 21 modules
  • 5 prod · 14 verified · 1 partial → 5 prod · 14 verified · 2 partial (dashscope joins kling at partial tier)
  • Site module grid: `dashscope` added under `media-generation` next to seedance/seedream/kling/fal-ai

Privacy

An earlier private fork of this module body had `Moment Stream` + `ADR-0021` references (maintainer's downstream project + internal ADR). Both were stripped. Pre-commit hook scan: clean on staged diff.

🤖 Generated with Claude Code

RoboZephyr and others added 2 commits May 15, 2026 17:58
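The CF Pages deployment API (error code 8000111) does not accept commit messages that contain newlines.
By default wrangler sends the whole subject+body of git log -1, so any commit that has a body
fails to deploy. The error message says "not valid UTF-8", which is misleading: what is actually rejected is the newline character.

Added a full write-up of pitfall 3 plus two fixes (pass --commit-message "$(git log -1 --pretty=%s)" by hand,
or wrap it in a scripts/deploy.sh in the project). The error-debugging quick-reference table also gained 8000111 + 8000007.

Where it was found: hit while deploying the classics-learning project. Single-line Chinese commits were fine;
the problem only surfaced because this commit carried a body.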
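
The second fix in the commit message above (wrapping the deploy in a script) could look roughly like this in Python. It is a sketch only: the function names and the `dist` output directory are hypothetical, and it assumes `wrangler pages deploy` is the command being wrapped.

```python
import subprocess


def commit_subject(message: str) -> str:
    """Return only the first line of a commit message, so no newline survives."""
    lines = message.strip().splitlines()
    return lines[0].strip() if lines else ""


def deploy_argv(output_dir: str, message: str) -> list[str]:
    """Build a wrangler deploy command that carries a single-line message."""
    return ["wrangler", "pages", "deploy", output_dir,
            "--commit-message", commit_subject(message)]


def last_commit_message() -> str:
    """Read the full message (subject + body) of HEAD via git."""
    return subprocess.run(
        ["git", "log", "-1", "--pretty=%B"],
        check=True, capture_output=True, text=True,
    ).stdout


# Demonstration with a literal message; a real script would pass
# last_commit_message() and hand the result to subprocess.run.
print(deploy_argv("dist", "fix: deploy\n\nbody text"))
```

Trimming to the subject in one place means a commit with a body no longer changes deploy behavior.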
… + voice cloning + Wanx

Adds `library/dashscope/` covering the non-LLM half of Alibaba's
DashScope platform (Qwen LLM is intentionally NOT in scope here — that
lives in the separate `qwen` module which targets the OpenAI-compatible
chat-completions front; same physical API key, different module per
SPEC §0 flat-no-inheritance rule).

Live verification (2026-05-17) — tier `partial`
- Installed dashscope==1.25.18 in a tmp venv
- DASHSCOPE_API_KEY (sk-...) sourced from ~/.trove/dashscope/credentials.json
- Submitted CosyVoice v3-flash TTS task via official Python SDK
  (`dashscope.audio.tts_v2.SpeechSynthesizer`) with voice=longxing_v3,
  text="trove smoke 你好"
- WebSocket connection opened to wss://dashscope.aliyuncs.com/api-ws/v1/inference
- Bearer header accepted; run-task event acknowledged; task_id issued
- Runtime then blocked: `task-failed` event with
  `error_code: "Arrearage"`, message `Access denied, please make sure
  your account is in good standing.` Account balance ≤ 0
- Auth + endpoint + request schema verified end to end; only the
  billing gate stops actual audio bytes. Same partial-tier shape as
  the existing kling module entry

Module shape (10 Critical Constraints up front, gotchas-first)
1. Account funding gate (Arrearage discovered during smoke)
2. CosyVoice is WebSocket-only, not REST sync — debunks confusing
   REST-style examples in older docs which were CosyVoice v1 batch
3. Region split (China dashscope.aliyuncs.com vs intl
   dashscope-intl.aliyuncs.com) with misleading InvalidApiKey error
4. Auth header is `Authorization: Bearer`, NOT legacy
   `X-DashScope-API-Key:` (pre-2024 form rejected on v3)
5. Voice ID is model-version-locked (longxing_v3 only on v3-*;
   longxiaochun only on v2)
6. Character billing counts full-width punctuation; cost meter is
   in `usage.input_tokens` (misleading field name for a TTS product)
7. Voice cloning is free; only synthesis charges
8. Watermark setting is account-wide (console), not per-call
9. Wanx image gen uses task-poll pattern (different from CosyVoice's
   WebSocket pattern), same auth
10. No official Node SDK — raw WebSocket for non-Python runtimes
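
Constraints 4 and 5 lend themselves to a pre-flight check before any connection is opened. The sketch below is illustrative: the helper names are hypothetical, and it assumes v3 model names contain `-v3` (for example `cosyvoice-v3-flash`) and that catalogue voices follow the `*_v3` suffix convention. Cloned voice ids may not follow that convention.

```python
def voice_matches_model(voice: str, model: str) -> bool:
    """Check the version lock: `*_v3` voices pair only with v3 models.

    Assumes v3 model names contain "-v3"; that naming is an assumption.
    """
    voice_is_v3 = voice.endswith("_v3")
    model_is_v3 = "-v3" in model
    return voice_is_v3 == model_is_v3


def auth_headers(api_key: str) -> dict[str, str]:
    """Bearer form from constraint 4; the legacy X-DashScope-API-Key header is not sent."""
    return {"Authorization": f"Bearer {api_key}"}


def preflight(voice: str, model: str, api_key: str) -> dict[str, str]:
    """Validate the voice/model pair, then return the request headers."""
    if not voice_matches_model(voice, model):
        raise ValueError(f"voice {voice!r} is not available on model {model!r}")
    return auth_headers(api_key)


print(voice_matches_model("longxing_v3", "cosyvoice-v3-flash"))   # prints: True
print(voice_matches_model("longxiaochun", "cosyvoice-v3-flash"))  # prints: False
```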

Body sections
- Setup (`pip install 'dashscope>=1.25'`)
- Quickstart: CosyVoice TTS via Python SDK with usage.input_tokens
  inspection
- Voice catalogue (representative 7 voices, generic gender/style
  framing, link to full 150+ list — earlier private-fork's
  "短剧专用 (short drama)" framing genericized)
- Voice cloning recipe (VoiceEnrollmentService → custom voice_id →
  reuse in any SpeechSynthesizer call)
- Wanx image gen example (ImageSynthesis.call, presigned 24h URL)
- Raw WebSocket sketch for Node/Edge/Deno (no official SDK exists)
- Cost estimation table (TTS/clone/Wanx prices, char-counting rule)
- 6-row error reference incl. the Arrearage we discovered live
- Cross-module pointer to `qwen` for LLM access via the same key
- Source of truth (9 upstream URLs + lastmod)
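
The task-poll pattern mentioned for Wanx can be sketched independently of the SDK. Here the status lookup is injected as a callable so the loop stays testable; the status names (`PENDING`, `RUNNING`, `SUCCEEDED`, `FAILED`), the default interval, and the attempt cap are assumptions, and `poll_task` is a hypothetical helper rather than a DashScope API.

```python
import time
from typing import Callable

# Assumed terminal status names; check them against the upstream docs.
TERMINAL = {"SUCCEEDED", "FAILED"}


def poll_task(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it reports a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL:
            return status
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task not finished after {max_attempts} attempts")


# Demonstration with a fake status source and a no-op sleep.
statuses = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(poll_task(lambda: next(statuses), sleep=lambda _: None))  # prints: SUCCEEDED
```

Because the result URL is described as presigned for 24h, the caller would want to download the image soon after the loop returns.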

Library bookkeeping
- 20 → 21 modules
- 5 prod · 14 verified · 1 partial → 5 prod · 14 verified · 2 partial
  (dashscope joins kling at the partial tier)
- Site module grid: dashscope added under media-generation alongside
  seedance / seedream / kling / fal-ai

Privacy
- Earlier private-fork of this module had `Moment Stream` and
  `ADR-0021` references in the description (maintainer's downstream
  project + internal architecture decision record). Both stripped
  before OSS commit. Description reframed around the actual user-
  facing scope: CosyVoice TTS + voice cloning + Wanx image gen
- Pre-commit hook PRIVATE_RE scan: clean on staged diff
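
A scan of this kind might look like the sketch below. The PR names `PRIVATE_RE` but does not show it, so the pattern here is a hypothetical stand-in built from the two references mentioned above, and the sample diff is made up for illustration.

```python
import re

# Hypothetical pattern; the real PRIVATE_RE is not shown in the PR.
PRIVATE_RE = re.compile(r"Moment Stream|ADR-\d{4}")


def private_hits(diff_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for added diff lines that match PRIVATE_RE."""
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; skip the "+++ b/file" header line.
        if line.startswith("+") and not line.startswith("+++"):
            if PRIVATE_RE.search(line):
                hits.append((number, line))
    return hits


# Made-up staged diff for demonstration.
diff = (
    "+++ b/library/dashscope/example.md\n"
    "+CosyVoice TTS quickstart\n"
    "+See ADR-0021 for context\n"
)
print(private_hits(diff))  # prints: [(3, '+See ADR-0021 for context')]
```

A hook would feed the output of `git diff --cached` to `private_hits` and exit non-zero when the list is not empty.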

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@RoboZephyr RoboZephyr merged commit 8672ff2 into main May 17, 2026
1 check passed
@RoboZephyr RoboZephyr deleted the feat/dashscope-module branch May 17, 2026 12:58