diff --git a/README.md b/README.md
index 6539984..212ec25 100644
--- a/README.md
+++ b/README.md
@@ -81,7 +81,7 @@ The `${credential.X}` substitution resolves to the field's value (string fields)

 ## Status

-**v0.2.4** — 20 modules in `library/` gated by `last_verified` (5 production · 14 verified · 1 partial). New since v0.2.3: `seedance` (Doubao Seedance 2.0 video), `ngrok` (dev tunneling), `seedream` (Doubao Seedream image gen — incl. multi-image fusion / group output / streaming / web-search). Plus CI on every PR (`.github/workflows/ci.yml`), `SECURITY.md` vuln reporting policy, SPEC.md §0–§4 English translation. Format spec is stable; AI-assisted module authoring (v0.3) in progress.
+**v0.2.4** — 21 modules in `library/` gated by `last_verified` (5 production · 14 verified · 2 partial). New since v0.2.3: `seedance` (Doubao Seedance 2.0 video), `ngrok` (dev tunneling), `seedream` (Doubao Seedream image gen — incl. multi-image fusion / group output / streaming / web-search), `dashscope` (Alibaba CosyVoice TTS + voice cloning + Wanx image gen). Plus CI on every PR (`.github/workflows/ci.yml`), `SECURITY.md` vuln reporting policy, SPEC.md §0–§4 English translation. Format spec is stable; AI-assisted module authoring (v0.3) in progress.

 See:
 - [SPEC.md](./SPEC.md) — full format specification (Chinese, English translation forthcoming)
diff --git a/ROADMAP.md b/ROADMAP.md
index 6deb050..4177c58 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -56,7 +56,7 @@ Stack: Bun + Hono + HTMX + Tailwind CDN, server-side rendered.
 - [x] **npm module** (registry + publish workflow — dogfood from shipping `@robozephyr/trove` itself); covers token types, scoped-package private-by-default, bare-name squat, double-shebang trap, Bypass-2FA Granular Token, `NPM_CONFIG_USERCONFIG=` for non-interactive publish. `last_verified: production`
 - [x] `trove install ...` CLI sidecar — copy library modules into `~/.trove/`; `--list` shows available + installed status; `--force` to overwrite; idempotent
 - [ ] `trove install ` — install from arbitrary git repo (community modules); needed for the marketplace story but not for v1.0 launch
-- [ ] Re-verify the rest of the modules to production-grade `last_verified` — happens organically as maintainer (or contributors) use modules in real projects. Currently **5 production · 14 verified · 1 partial** out of 20
+- [ ] Re-verify the rest of the modules to production-grade `last_verified` — happens organically as maintainer (or contributors) use modules in real projects. Currently **5 production · 14 verified · 2 partial** out of 21

 ## v0.2.x → OSS launch prep (active)
diff --git a/library/cloudflare/module.md b/library/cloudflare/module.md
index 979e3e6..5b25993 100644
--- a/library/cloudflare/module.md
+++ b/library/cloudflare/module.md
@@ -94,6 +94,36 @@ curl -X POST "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/pages
   -d '{"name":"my-subdomain.example.com"}'
 ```

+**Pitfall 3: by default `wrangler pages deploy` sends a multi-line commit message to CF, and CF rejects it.** The CF Pages deployment API does not accept commit messages that contain newlines, but wrangler by default runs `git log -1` and sends the subject and body as one block. **Whenever the current commit has a body** (standard git format: subject + blank line + body), the final step of the deploy fails with:
+
+```
+✘ A request to the Cloudflare API failed.
+  Invalid commit message, it must be a valid UTF-8 string. [code: 8000111]
+```
+
+**The error message is misleading**: the content itself is valid UTF-8 (Chinese and emoji are fine); what CF actually rejects is the **newline character**.
+
+**Fixes** (pick one):
+
+```bash
+# Option A: pass only the subject line manually on every deploy
+npx wrangler pages deploy . --project-name my-site \
+  --commit-message "$(git log -1 --pretty=%s)" --commit-dirty=false
+
+# Option B: wrap this once in a scripts/deploy.sh inside the project (recommended)
+cat > scripts/deploy.sh <<'EOF'
+#!/bin/sh
+set -e
+cd "$(dirname "$0")/.."
+SUBJECT=$(git log -1 --pretty=%s)
+npx wrangler pages deploy . \
+  --project-name my-site \
+  --commit-message "$SUBJECT" --commit-dirty=false
+EOF
+chmod +x scripts/deploy.sh
+```
+
+**Note that wrangler 4.x has not fixed this yet.** Sanitizing the message (stripping newlines or keeping only the subject) is wrangler's job, and it doesn't do it. The files have already been uploaded to blob storage by the time this fails; only the deployment record is not created. So **a retry does not re-upload anything**, it just fails again at the last step.
+
 **Common pitfalls**:
 - The first deploy creates the project automatically, but binds it to the `main` branch by default; if you are not on main locally, pass `--branch ` explicitly
 - Pages limits: ≤ 25 MB per file and ≤ 20000 files per deploy; exceeding either fails silently, with no up-front validation
@@ -411,5 +441,7 @@ npx wrangler kv namespace create MY_KV
 | 7000 | No route for requested host | wrong zone_id in the path |
 | 81044 | Record name conflict | a DNS record with the same name and type already exists |
 | 7003 | Could not route to /xxx | typo in the API path (a missing /accounts segment is the most common) |
+| 8000007 | Project not found | the Pages project has not been created yet; POST `/accounts/{aid}/pages/projects` first |
+| 8000111 | "Invalid commit message, it must be a valid UTF-8 string" | **Not a UTF-8 problem**: the commit message contains newlines (subject + body). When deploying Pages, pass only the subject to `--commit-message`; see "Pitfall 3" under Pages deployment |

 **The first debugging step is always**: print the full response body; CF error messages are very detailed.
diff --git a/library/dashscope/credentials.example.json b/library/dashscope/credentials.example.json
new file mode 100644
index 0000000..8257788
--- /dev/null
+++ b/library/dashscope/credentials.example.json
@@ -0,0 +1,3 @@
+{
+  "DASHSCOPE_API_KEY": ""
+}
diff --git a/library/dashscope/module.md b/library/dashscope/module.md
new file mode 100644
index 0000000..d0d463c
--- /dev/null
+++ b/library/dashscope/module.md
@@ -0,0 +1,235 @@
+---
+name: dashscope
+version: 0.1.0
+category: media-generation
+description: Alibaba Cloud Bailian / DashScope — CosyVoice TTS (150+ Chinese voice presets + free voice cloning), Wanx image generation. WebSocket-based async API, character-billed. Native DashScope endpoint (Qwen LLM is handled separately by the `qwen` module, which uses the OpenAI-compatible front)
+homepage: https://bailian.console.aliyun.com/
+tags: [tts, voice-cloning, image-gen, alibaba, cosyvoice, wanx, websocket-api]
+applies_to:
+  - "CosyVoice TTS: synthesize speech from text in Mandarin / English / Cantonese / Japanese / Korean — 150+ named voice presets, character-billed (~¥0.06 per 1k chars on v3-flash, ~¥0.2 on v3-plus)"
+  - "CosyVoice voice cloning: clone a voice from a short audio sample, then pay only the per-char TTS charge for playback (cloning itself is free)"
+  - "Wanx image generation: Alibaba's text-to-image model (Stable Diffusion-variant), async task pattern"
+  - "Use when you need Chinese-native voice quality (CosyVoice outperforms western TTS on Mandarin) or when you're already on Alibaba Cloud and want one platform key"
+trove_spec: "0.1"
+lastmod: "2026-05-17"
+last_verified: "2026-05-17 · WebSocket connection + task submit + auth + Bearer header contract verified end to end via official Python SDK (dashscope==1.25.18, model=cosyvoice-v3-flash, voice=longxing_v3). task_id issued; runtime then blocked by `Arrearage / Access denied, please make sure your account is in good standing` — billing gate hit before audio bytes returned. Same tier as kling: auth + contract OK, runtime needs account funding"
+
+credentials:
+  DASHSCOPE_API_KEY:
+    type: password
+    required: true
+    help: "Get from https://bailian.console.aliyun.com/?apiKey=1 (Bailian console 控制台 → My API-KEY 我的 API-KEY → Create 创建). Format: `sk-...`. Same physical key works for the qwen module (which calls the OpenAI-compatible chat endpoint); SPEC §0's flat-no-inheritance rule means you paste it twice — once as `QWEN_API_KEY`, once as `DASHSCOPE_API_KEY`."
+---
+
+# Alibaba DashScope (Bailian) Usage Guide
+
+## ⚠️ Critical Constraints (read before writing code)
+
+1. **Account funding gate** — calls work while the account has a positive balance; once it drops to zero or goes into arrears, every call fails with `error_code: "Arrearage"` and message `Access denied, please make sure your account is in good standing`. Auth and the request contract still validate (you DO get a `task_id`), then the WebSocket closes with the arrearage event. Top up at https://billing-cost.console.aliyun.com/ before integrating.
+2. **CosyVoice is WebSocket-only, not a sync REST POST** — text is submitted over `wss://dashscope.aliyuncs.com/api-ws/v1/inference`. The SDK abstracts this; **if you're writing raw fetch / curl, you cannot just POST a request and read audio bytes — you must speak the WebSocket protocol** (run-task event → audio-frame events → task-finished event). Older docs that show REST `/services/audio/tts` endpoints refer to **CosyVoice v1 batch**, which is deprecated for v2/v3.
+3. **Region split: China vs International keys are not interchangeable** — China keys hit `dashscope.aliyuncs.com`; international keys hit `dashscope-intl.aliyuncs.com`. A China-issued key on the intl endpoint (or vice versa) returns `InvalidApiKey` with a misleading "key not found" message — the key is fine, the endpoint mismatches your account region.
+4. **Auth header**: `Authorization: Bearer ${DASHSCOPE_API_KEY}`. Some old samples show `X-DashScope-API-Key:` — that's pre-2024 and no longer accepted on the v3 endpoints.
+5. **Voice ID is model-version-locked** — `longxing_v3` only works on `cosyvoice-v3-*`; `longxiaochun` only works on `cosyvoice-v2`. Mixing them returns a runtime "voice not supported by this model" error AFTER the task is submitted (not at validation). Always pair the suffix with the model version.
+6. **Character billing counts EACH Chinese character as 1, but ALSO charges full-width punctuation** — `今天天气真好。` is 7 billable chars (6 hanzi + 1 full-width period). English: roughly 1 char per letter. Mixed CJK+English strings are billed the sum. **The cost meter is `usage.input_tokens`, returned with task-finished** (despite this being TTS, the field is named `input_tokens`; that's Alibaba's naming).
+7. **Voice cloning is free up to the per-voice quota; only synthesis is charged** — record / submit the source audio once (free), receive a custom `voice_id`, then every TTS call using that voice is charged at the standard per-char rate. The catalogue of your cloned voices lives at https://bailian.console.aliyun.com/?tab=app#/voice-list
+8. **The `watermark` setting is account-wide, not per-call** — DashScope's audio watermark (a silent metadata flag for AIGC compliance) is set in the console, not through the API. If you need watermark-free output for downstream re-mixing, toggle it off in the console BEFORE running the call.
+9. **Wanx image gen is a separate model family with a task-poll pattern** — `wanx-v1` and related models follow submit → `task_id` → poll `GET /api/v1/tasks/{task_id}`. Same DashScope auth, different endpoint shape. Not WebSocket-based.
+10. **No official `dashscope` Node SDK exists** — the official SDKs are Python, Java, Android and iOS. For Node/Edge/Deno runtimes, raw WebSocket against `wss://dashscope.aliyuncs.com/api-ws/v1/inference` is the only path. See "Raw WebSocket" section below.
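
Constraints 3, 5 and 6 all bite late: a misleading `InvalidApiKey`, a "voice not supported" failure after submit, or a surprise on the bill. A cheap client-side pre-flight can catch the detectable cases before any WebSocket is opened. This is a minimal sketch with hypothetical helper names (none of this is part of the `dashscope` SDK). Assumptions: the intl WebSocket path mirrors the China one; the only voice/model signal is the `_v3` suffix rule from constraint 5, so unsuffixed voices such as `longanhuan` pass unchecked; spaces count as billable characters; prices are the approximate RMB figures from the cost table.

```python
"""Hypothetical pre-flight checks for CosyVoice calls (not part of the dashscope SDK)."""

# Constraint 3: the endpoint must match the region the key was issued in.
# Assumption: the intl path mirrors the China path.
WS_ENDPOINTS = {
    "cn": "wss://dashscope.aliyuncs.com/api-ws/v1/inference",
    "intl": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference",
}

# Approximate RMB per 1k billable characters, taken from the cost table.
PRICE_PER_1K_CHARS = {
    "cosyvoice-v3-flash": 0.06,
    "cosyvoice-v3-plus": 0.2,
}


def ws_endpoint(region: str) -> str:
    """Return the WebSocket endpoint for the key's region ("cn" or "intl")."""
    try:
        return WS_ENDPOINTS[region]
    except KeyError:
        raise ValueError(f"unknown region {region!r}; expected one of {sorted(WS_ENDPOINTS)}") from None


def check_voice_model(model: str, voice: str) -> None:
    """Constraint 5: reject a `_v3` voice on a non-v3 model.

    Voices without a version suffix are not checked; the server still has the final say.
    """
    if voice.endswith("_v3") and not model.startswith("cosyvoice-v3"):
        raise ValueError(f"voice {voice!r} needs a cosyvoice-v3-* model, got {model!r}")


def billable_chars(text: str) -> int:
    """Constraint 6: each hanzi, full-width punctuation mark or Latin letter counts as 1.

    Assumption: spaces count too, so this is simply the code-point length.
    """
    return len(text)


def estimate_cost_rmb(text: str, model: str) -> float:
    """Rough estimate only; the authoritative number comes back in the task-finished usage."""
    return billable_chars(text) / 1000 * PRICE_PER_1K_CHARS[model]


if __name__ == "__main__":
    check_voice_model("cosyvoice-v3-flash", "longxing_v3")  # compatible pair: no exception
    print(ws_endpoint("cn"))
    print(billable_chars("今天天气真好。"))  # 7, the example from constraint 6
    print(f"{estimate_cost_rmb('今天天气真好。', 'cosyvoice-v3-flash'):.6f}")
```

Raising `ValueError` early keeps the failure next to the call site, instead of surfacing as a closed WebSocket after the task has already been submitted.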
+
+---
+
+## Setup
+
+```bash
+# Trove pattern — pull the key on demand (exported so the Python SDK can read it from the environment)
+export DASHSCOPE_API_KEY=$(jq -r .DASHSCOPE_API_KEY ~/.trove/dashscope/credentials.json)
+```
+
+Install the SDK (Python is the smoothest path):
+
+```bash
+pip install 'dashscope>=1.25'
+```
+
+---
+
+## Quickstart: CosyVoice TTS (Python SDK)
+
+```python
+import os
+from dashscope.audio.tts_v2 import SpeechSynthesizer
+
+# SDK reads DASHSCOPE_API_KEY from environment by default
+synthesizer = SpeechSynthesizer(
+    model="cosyvoice-v3-flash",  # or "cosyvoice-v3-plus" for highest quality
+    voice="longxing_v3",         # see voice catalogue below
+    format="mp3",                # mp3 (default) / wav / pcm
+    sample_rate=22050,           # 8000 / 16000 / 22050 / 24000 / 44100 / 48000
+)
+
+audio_bytes = synthesizer.call("欢迎使用 trove,本地优先的 AI agent 凭证管理器。")  # "Welcome to trove, the local-first AI agent credential manager."
+
+with open("out.mp3", "wb") as f:
+    f.write(audio_bytes)
+
+# Usage info is on the synthesizer instance after the call
+print(f"billed chars: {synthesizer.last_response.usage.input_tokens}")
+```
+
+`SpeechSynthesizer.call(text)` is synchronous from the caller's perspective — internally it opens a WebSocket, streams the text, collects audio frames, and returns the joined bytes when the `task-finished` event arrives. Wall-clock for ~50 chars: 1–2 seconds.
+
+**On failure** (e.g. Arrearage), the SDK raises `WebSocketConnectionClosedException` and emits a stderr line like `websocket closed due to TaskFailed: {"header": {"error_code": "Arrearage", "error_message": "..."}}`. Wrap the call in try/except and parse the `error_code` out of that JSON.
+
+---
+
+## CosyVoice voice catalogue
+
+DashScope ships 150+ named voice presets. Pick by **gender × age × style** rather than memorizing IDs. Full list: https://help.aliyun.com/zh/model-studio/cosyvoice-voice-list
+
+Representative IDs (all `cosyvoice-v3-*` compatible):
+
+| voice_id | name | gender | notes |
+|---|---|---|---|
+| `longxing_v3` | 龙星 | F | warm, neighborly young-woman tone |
+| `longanhuan` | 龙安欢 | F | upbeat, energetic young-woman tone |
+| `longanzhi_v3` | 龙安智 | M | calm, mature mid-30s male |
+| `longanyun_v3` | 龙安昀 | M | warm, gentle mid-30s male |
+| `longfeifei_v3` | 龙菲菲 | F | sweet, slightly higher pitch |
+| `longanwen_v3` | 龙安温 | F | elegant, mature female |
+| `longanya_v3` | 龙安雅 | F | refined, formal female |
+
+For non-Mandarin: append the language tag. Cantonese (`-yue`), English (`-en`), Japanese (`-ja`), Korean (`-ko`) variants exist for popular voices.
+
+---
+
+## Voice cloning
+
+CosyVoice supports custom voice creation from a short audio sample. Upload once → receive a `voice_id` → use that ID in subsequent TTS calls.
+
+```python
+from dashscope.audio.tts_v2 import SpeechSynthesizer, VoiceEnrollmentService
+
+vs = VoiceEnrollmentService()
+
+# Step 1: create a voice from a source audio file (15-60 seconds of clean speech recommended)
+voice_id = vs.create_voice(  # returns the new voice_id as a plain string
+    target_model="cosyvoice-v3-flash",
+    prefix="my-clone",  # prefix for the auto-generated voice_id
+    url="https://your-public-bucket.example.com/source.wav",  # OR pass a local file
+)
+print(f"created voice_id: {voice_id}")
+
+# Step 2: use it in any subsequent SpeechSynthesizer call
+synth = SpeechSynthesizer(model="cosyvoice-v3-flash", voice=voice_id)
+audio = synth.call("Now speaking in my cloned voice.")
+```
+
+**Cloning is free**; you pay per-char only on the resulting TTS calls. The cloned voice persists across sessions and is tied to your DashScope account.
+
+**Source audio requirements**: 15–60 s of clean speech, single speaker, no background music, recommended 16/22.05/24/44.1 kHz WAV. Public URL (model server pulls) or local file (SDK uploads).
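
The source-audio requirements above are easy to check locally before a clip is uploaded. This is a minimal sketch using only the standard-library `wave` module, assuming the clip is an uncompressed PCM WAV file; it checks the 15–60 s duration window and the recommended sample rates listed above. It cannot judge "single speaker" or "no background music", and the helper name `check_clone_sample` is hypothetical.

```python
import wave

# Recommended rates from the source-audio requirements (16 / 22.05 / 24 / 44.1 kHz).
RECOMMENDED_RATES = {16000, 22050, 24000, 44100}
MIN_SECONDS, MAX_SECONDS = 15.0, 60.0


def check_clone_sample(path: str) -> list[str]:
    """Return problems found in a PCM WAV clip; an empty list means it looks usable.

    Only duration and sample rate are checked; speaker count and background
    noise still have to be judged by ear.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.1f} s is outside {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s")
    if rate not in RECOMMENDED_RATES:
        problems.append(f"sample rate {rate} Hz is not one of {sorted(RECOMMENDED_RATES)}")
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for issue in check_clone_sample(sys.argv[1]) or ["looks OK"]:
            print(issue)
```

Returning a list rather than raising lets a script report every problem with a clip in one pass.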
+
+---
+
+## Wanx image generation (task-poll pattern)
+
+```python
+from dashscope import ImageSynthesis
+
+response = ImageSynthesis.call(
+    model="wanx-v1",
+    prompt="a snowy rooftop with a single red origami crane, golden hour, photorealistic",
+    n=1,
+    size="1024*1024",
+)
+
+# Synchronous from the caller's perspective; SDK polls the task internally
+for result in response.output.results:
+    print(result.url)  # presigned URL, 24h expiry
+```
+
+Wanx is **task-polled**, not WebSocket. Same DashScope auth + key.
+
+---
+
+## Raw WebSocket (no Python SDK available, e.g. Node / Deno / Edge)
+
+For Node / Edge runtimes without an official SDK, hand-roll the WebSocket protocol against `wss://dashscope.aliyuncs.com/api-ws/v1/inference`.
+
+The protocol is event-based — the client sends a `run-task` event with the synthesizer config; the server streams `task-started`, multiple `result-generated` events (each carrying an audio frame), and a final `task-finished` event with usage. On error, a `task-failed` event with `error_code` / `error_message` (e.g. `Arrearage`) replaces `task-finished`.
+
+```typescript
+// Sketch — see https://help.aliyun.com/zh/model-studio/cosyvoice-websocket-api for the full event schema
+const ws = new WebSocket("wss://dashscope.aliyuncs.com/api-ws/v1/inference", {
+  headers: { Authorization: `Bearer ${process.env.DASHSCOPE_API_KEY}` }, // non-standard option: works with the `ws` package and Bun; browsers cannot set headers
+});
+
+ws.onopen = () => {
+  ws.send(JSON.stringify({
+    header: { action: "run-task", task_id: crypto.randomUUID(), streaming: "duplex" },
+    payload: {
+      task_group: "audio", task: "tts", function: "SpeechSynthesizer",
+      model: "cosyvoice-v3-flash",
+      parameters: { voice: "longxing_v3", format: "mp3", sample_rate: 22050 },
+      input: {},
+    },
+  }));
+};
+
+// Then send the text in `continue-task` events, followed by a `finish-task` event; collect the binary audio
+// frames until `task-finished` arrives. Errors arrive as `task-failed` with `header.error_code`.
+```
+
+Full event schema + a clean reference impl: https://help.aliyun.com/zh/model-studio/cosyvoice-websocket-api
+
+---
+
+## Cost estimation
+
+| product | unit | price (approx, RMB) |
+|---|---|---|
+| CosyVoice v3-flash TTS | per 1k chars | ¥0.06 |
+| CosyVoice v3-plus TTS | per 1k chars | ¥0.2 |
+| CosyVoice voice cloning | per clone (one-time) | free |
+| Wanx-v1 image gen | per 1024×1024 image | ~¥0.16 |
+
+CJK chars count 1 per char (including full-width punctuation). English counts ~1 per letter. Prices accurate as of 2026-05; check https://help.aliyun.com/zh/model-studio/billing-for-model-services for current rates.
+
+---
+
+## Error reference
+
+| symptom | cause | fix |
+|---|---|---|
+| `task-failed` with `error_code: Arrearage` | account balance ≤ 0 / has unpaid bill | top up at https://billing-cost.console.aliyun.com/ |
+| `InvalidApiKey` or "key not found" on first call | China key on intl endpoint (or vice versa), OR key was regenerated and the old one invalidated | confirm the key region matches `dashscope.aliyuncs.com` vs `dashscope-intl.aliyuncs.com` |
+| `voice not supported by this model` after task submit | voice ID and model version don't match (e.g. `longxiaochun` on `cosyvoice-v3-*`) | use the `_v3` suffix variants for v3 models; check voice catalogue compatibility |
+| `WebSocketConnectionClosedException: Connection is already closed` (Python SDK) | an underlying task-failed event happened; the actual reason is in the stderr `TaskFailed` JSON | parse the JSON to find the real `error_code` |
+| Audio plays but with watermark "for AIGC compliance" | account-level watermark enabled in console | toggle off at Bailian console 控制台 → Apps 应用 → Output settings 输出设置 |
+| TTS returns 0 bytes silently | input text is empty / only punctuation / only whitespace | validate that the text is non-empty before submitting |
+
+---
+
+## When to pick dashscope vs other Trove modules
+
+- **dashscope (this module)** → Chinese-native voice quality is the killer reason. CosyVoice presets + cloning are best-in-class for Mandarin/Cantonese; English voices are good but not better than ElevenLabs.
+- **qwen** → for Qwen LLM calls. Same physical DashScope key, different endpoint (OpenAI-compatible front instead of native DashScope), different module.
+- **seedance / seedream** → Volcengine-side (different vendor, different platform). Use seedance for video, seedream for image when you want ByteDance's models specifically.
+- **minimax / kling / fal-ai** → for video / image gen with non-Alibaba style biases.
+
+Rule of thumb: if you need Mandarin TTS or want to clone a Chinese voice, dashscope is the answer. If you need an LLM, use qwen.
+
+---
+
+## Source of truth (refresh when these change)
+
+- Bailian (DashScope) docs index — https://help.aliyun.com/zh/model-studio/
+- CosyVoice overview — https://help.aliyun.com/zh/model-studio/cosyvoice-large-model-for-speech-synthesis/
+- CosyVoice WebSocket API reference — https://help.aliyun.com/zh/model-studio/cosyvoice-websocket-api
+- CosyVoice voice catalogue (150+ presets) — https://help.aliyun.com/zh/model-studio/cosyvoice-voice-list
+- Wanx image gen — https://help.aliyun.com/zh/model-studio/text-to-image
+- Pricing — https://help.aliyun.com/zh/model-studio/billing-for-model-services
+- API Key console — https://bailian.console.aliyun.com/?apiKey=1
+- Billing top-up — https://billing-cost.console.aliyun.com/
+- Cross-references: `library/qwen/module.md` for Qwen LLM via the OpenAI-compat front
+
+Last upstream-docs sync: see `lastmod` in frontmatter. Last live-API verification: see `last_verified`.
diff --git a/site/index.html b/site/index.html
index e6734c9..652917d 100644
--- a/site/index.html
+++ b/site/index.html
@@ -134,7 +134,7 @@
     _
     | |_ ___ _____   _____
     |  _|  _|  _  |_|   -|
-    |_| |_| |_____|_|___/   v0.2.4 — 20 modules, live-verified
+    |_| |_| |_____|_|___/   v0.2.4 — 21 modules, live-verified
 

Trove

@@ -147,7 +147,7 @@

Trove

- 20 modules in library
+ 21 modules in library
 · 5 production · 14 verified · 1 partial ·
@@ -179,7 +179,7 @@

Quick start

02

Install a module + open the Web UI to fill credentials

- Pick from 20 bundled modules (or trove install --list to browse). The UI binds to 127.0.0.1:7821 only — never public.
+ Pick from 21 bundled modules (or trove install --list to browse). The UI binds to 127.0.0.1:7821 only — never public.

trove install stripe
 trove ui     # → http://127.0.0.1:7821
@@ -264,7 +264,7 @@

Web UI

  • Modules — your installed modules grouped by category with credential-status indicators
-  • Library — 20 bundled module templates, one-click Install copies module.md into ~/.trove/
+  • Library — 21 bundled module templates, one-click Install copies module.md into ~/.trove/
  • Credentials form — masked password fields with reveal toggle, file-type fields with present/replace/delete widget, inline save via HTMX
  • Module detail — frontmatter + rendered skill markdown side-by-side, with last_verified tier dot
@@ -286,7 +286,7 @@

MCP support (optional)

What's in the library

- Every module carries a last_verified field — what was actually tested, by whom, when. Dot color reflects current state. We'd rather ship 20 honest modules than 50 LLM-hallucinated ones.
+ Every module carries a last_verified field — what was actually tested, by whom, when. Dot color reflects current state. We'd rather ship 21 honest modules than 50 LLM-hallucinated ones.

production · daily-use
@@ -311,6 +311,7 @@

What's in the library

seedream kling fal-ai
+ dashscope
payments
stripe
@@ -337,7 +338,7 @@

Documentation

  • SPEC — the format definition (frontmatter schema, reference syntax, runtime conventions). Includes a living convention adherence log in §10 of real dogfood lessons from production.
-  • library/ — the 20 bundled modules listed above
+  • library/ — the 21 bundled modules listed above
  • ROADMAP — phases and explicit non-goals (no trove init, no inject step, no SaaS — ever)
  • CONTRIBUTING — module quality bar
  • design-v0.2.md — why the Web UI dropped AI-chat features (chat IS the entry interface; UI is the visualization)
@@ -345,7 +346,7 @@

    Documentation

    Status

-    v0.2.4 — the format spec is stable, all 20 modules are gated by last_verified, and the maintainer dogfoods trove daily across personal projects. AI-assisted module authoring (v0.3) and a marketplace for community modules (v1.0) are next.
+    v0.2.4 — the format spec is stable, all 21 modules are gated by last_verified, and the maintainer dogfoods trove daily across personal projects. AI-assisted module authoring (v0.3) and a marketplace for community modules (v1.0) are next.

    The repo is github.com/RoboZephyr/trove — issues, PRs, and module additions welcome (see CONTRIBUTING.md for the quality bar).