diff --git a/README.md b/README.md index e85bfe8..9f8a404 100644 --- a/README.md +++ b/README.md @@ -81,7 +81,7 @@ The `${credential.X}` substitution resolves to the field's value (string fields) ## Status -**v0.2.4** — 22 modules in `library/` gated by `last_verified` (5 production · 15 verified · 2 partial). New since v0.2.3: `seedance` (Doubao Seedance 2.0 video), `ngrok` (dev tunneling), `seedream` (Doubao Seedream image gen — incl. multi-image fusion / group output / streaming / web-search), `dashscope` (Alibaba CosyVoice TTS + voice cloning + Wanx image gen), `volcengine-tos` (S3-compatible object storage — the bridge for hosting Seedance / Seedream reference images at public URLs). Plus CI on every PR (`.github/workflows/ci.yml`), `SECURITY.md` vuln reporting policy, SPEC.md §0–§4 English translation. Format spec is stable; AI-assisted module authoring (v0.3) in progress. +**v0.2.4** — 24 modules in `library/` gated by `last_verified` (5 production · 16 verified · 3 partial). New since v0.2.3: `seedance` (Doubao Seedance 2.0 video), `ngrok` (dev tunneling), `seedream` (Doubao Seedream image gen — incl. multi-image fusion / group output / streaming / web-search), `dashscope` (Alibaba CosyVoice TTS + voice cloning + Wanx image gen), `volcengine-tos` (S3-compatible object storage — the bridge for hosting Seedance / Seedream reference images at public URLs), `volcengine-speech` (ByteDance Seed-ASR 2.0 — batch transcription + utterance / word timestamps for QA-vs-script diff), `aitoearn` (one-call multi-platform social publishing MCP across 14 channels). Plus CI on every PR (`.github/workflows/ci.yml`), `SECURITY.md` vuln reporting policy, SPEC.md §0–§4 English translation. Format spec is stable; AI-assisted module authoring (v0.3) in progress. 
See: - [SPEC.md](./SPEC.md) — full format specification (Chinese, English translation forthcoming) diff --git a/ROADMAP.md b/ROADMAP.md index 8423221..c707acb 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -56,7 +56,7 @@ Stack: Bun + Hono + HTMX + Tailwind CDN, server-side rendered. - [x] **npm module** (registry + publish workflow — dogfood from shipping `@robozephyr/trove` itself); covers token types, scoped-package private-by-default, bare-name squat, double-shebang trap, Bypass-2FA Granular Token, `NPM_CONFIG_USERCONFIG=` for non-interactive publish. `last_verified: production` - [x] `trove install ...` CLI sidecar — copy library modules into `~/.trove/`; `--list` shows available + installed status; `--force` to overwrite; idempotent - [ ] `trove install ` — install from arbitrary git repo (community modules); needed for the marketplace story but not for v1.0 launch -- [ ] Re-verify the rest of the modules to production-grade `last_verified` — happens organically as maintainer (or contributors) use modules in real projects. Currently **5 production · 15 verified · 2 partial** out of 22 +- [ ] Re-verify the rest of the modules to production-grade `last_verified` — happens organically as maintainer (or contributors) use modules in real projects. 
Currently **5 production · 16 verified · 3 partial** out of 24 ## v0.2.x → OSS launch prep (active) diff --git a/library/aitoearn/credentials.example.json b/library/aitoearn/credentials.example.json new file mode 100644 index 0000000..e16f6de --- /dev/null +++ b/library/aitoearn/credentials.example.json @@ -0,0 +1,5 @@ +{ + "AITOEARN_API_KEY": "", + "AITOEARN_ENV": "ai", + "AITOEARN_MCP_URL": "https://aitoearn.ai/api/unified/mcp" +} diff --git a/library/aitoearn/module.md b/library/aitoearn/module.md new file mode 100644 index 0000000..3699f21 --- /dev/null +++ b/library/aitoearn/module.md @@ -0,0 +1,211 @@ +--- +name: aitoearn +version: 0.1.0 +category: social-publishing +description: AiToEarn — one-call multi-platform publishing for 14 channels (TikTok, YouTube, X, Instagram, Threads, Pinterest, Facebook, LinkedIn, Bilibili, Douyin, Kwai, Xiaohongshu, WeChat Channels, WeChat Gzh) via unified MCP server +homepage: https://aitoearn.ai +tags: [publishing, social, mcp, multi-platform, oauth-relay] +applies_to: + - publishing one piece of content to many social platforms in one call + - getting platform-specific publishing rules (char limits, media specs) before composing + - polling publish task status by flowId + - reading cross-platform account/post analytics via data-cube REST + - avoiding the cost of applying for 14 platform developer accounts (Relay borrows official OAuth credentials) +trove_spec: "0.1" +lastmod: "2026-05-25" +last_verified: "2026-05-16 · module scaffolded, MCP endpoint reachable, not yet smoke-tested with real publish (pending first OAuth on a sacrificial account). Tier: partial — auth + endpoint reachability OK, runtime not exercised" + +credentials: + AITOEARN_API_KEY: + type: password + required: true + help: "https://aitoearn.ai (international) or https://aitoearn.cn (China) → Settings → API Key → Create. Key is environment-scoped: .ai key cannot use .cn MCP and vice versa (401)." 
+ AITOEARN_ENV: + type: select + options: [ai, cn] + required: true + default: ai + help: "ai = international (TikTok / YouTube / Meta / X / Pinterest / LinkedIn dominant). cn = China (Douyin / Xiaohongshu / Kwai / Bilibili / WeChat Channels dominant). Must match the host where the API key was created." + AITOEARN_MCP_URL: + type: url + required: false + default: "https://aitoearn.ai/api/unified/mcp" + help: "Override if self-hosted via docker-compose (then use http://localhost:8080/api/unified/mcp) or if AITOEARN_ENV is cn. The mcp block below reads only this field, so switching AITOEARN_ENV alone leaves calls on the .ai host (401 with a .cn key); cn users must set this to the .cn host's unified MCP endpoint." + +mcp: + type: http + url: "${credential.AITOEARN_MCP_URL}" +--- + +# AiToEarn Usage Guide + +## ⚠️ Critical Constraints (read first) + +1. **Environment / API key must match** — `.ai` key against `.cn` MCP endpoint → 401. Pick one host when creating the key and stick with it. Cross-region accounts need two separate trove project-level installs. +2. **Account ownership vs token custody** — when authorizing a social account, you go through the real OAuth flow with your own credentials, but the `access_token` is custodied on aitoearn's server (only `relayAccountRef` is stored locally). **Account is yours, but every publish call routes through aitoearn's infrastructure.** They can in principle throttle, charge, or shut down the Relay at any time — design for replaceability (see Risk Isolation below). +3. **Publish itself is currently free; AI Create costs credits.** `publishing.service.ts` has zero credit deduction. `aitoearn-ai/image|video|chat` services deduct credits ($1 = 100 credits). If you only use the publish surface, you can run on a free account indefinitely (as of 2026-05) — **no contractual guarantee this stays free**. +4. **Hashtag format must NOT use inline `#tag1#tag2`** — pass as `topics: string[]` array. The `publishRestrictions` tool returns per-platform exact rules; **always call it before composing**. +5.
**`flowId` is the only handle to a publish task** — return value of `publishPostTo*` is `{flowId}`; you must call `getPublishingTaskStatus({flowId})` to poll for `published | failed | processing | waiting`. Don't assume sync success. +6. **Media must be uploaded first** — if you have a local file, the docker-compose stack provides RustFS at `localhost:9000` (S3-compatible). For hosted use, upload via the web UI's asset manager or use `createMedia` MCP tool to register a public URL. +7. **Cover image required for video on most platforms** — if missing, use `createThumbnailTask` + `getThumbnailTaskStatus` MCP tools to auto-generate before calling publish. +8. **The "Gold Rush Square" / Monetize marketplace is irrelevant to this module's purpose.** That's the creator side (you accept brand tasks). The publish MCP serves the creator-as-self-publisher use case. Don't conflate. + +--- + +## MCP tool inventory (unified-mcp) + +### Account +- `getAccountGroupList` — list your account groups +- `getAccountListByGroupId({groupId})` — list authorized accounts in a group, returns `[{accountId, accountType, nickname, ...}]` + +### Publish (one per platform) +- `publishPostToTiktok` / `publishPostToYoutube` / `publishPostToTwitter` / `publishPostToInstagram` / `publishPostToFacebook` / `publishPostToThreads` / `publishPostToPinterest` / `publishPostToBilibili` / `publishPostToKwai` / `publishPostToWxGzh` +- All take `{accountId, title, desc, videoUrl?, coverUrl?, imgUrlList?, topics?, publishTime?, option?}` (option is platform-specific, e.g. Bilibili `tid` category, YouTube `categoryId` + `privacyStatus`) +- Returns `{flowId, ...platform-specific extras (e.g. Douyin returns shortLink + permalink QR)}` + +### Status & rules +- `getPublishingTaskStatus({flowId})` — poll task state (`waiting | processing | published | failed`) +- `publishRestrictions({platforms: [...]})` — returns char / size / duration limits per platform. **Call this first**, especially for batch fan-out. 
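Because `publishPostTo*` hands back only a `flowId`, every caller needs a polling loop over `getPublishingTaskStatus`. A minimal sketch, assuming a hypothetical `call_tool(name, args)` wrapper around your MCP client that returns the tool result as a dict with a `status` key (the real result shape may differ; check it via `tools/list`):

```python
import time
from typing import Any, Callable, Dict

# Terminal states from the module text; "waiting" and "processing" mean keep polling.
TERMINAL_STATES = {"published", "failed"}


def wait_for_publish(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    flow_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll getPublishingTaskStatus until the task is published or failed.

    call_tool(name, args) is a hypothetical wrapper around your MCP client that
    returns the tool result as a dict with a "status" key (assumed shape).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = call_tool("getPublishingTaskStatus", {"flowId": flow_id})
        status = result.get("status")
        if status in TERMINAL_STATES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"flow {flow_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A `failed` result is returned rather than raised, so a batch caller can record it next to the other platforms; only a task that never leaves `waiting` / `processing` raises.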
+ +### Content / media management +- `getMediaGroupInfoByName({title})` — find/create media group +- `createMedia({groupId, type, mediaUrl, thumbUrl?, title?, desc?})` — register a public-URL asset +- `getDraftGroupInfoByName({title})` — find/create draft group +- `createDraft({groupId, ...})` — store a draft for later publish + +--- + +## Quick start (Claude Code / Cursor) + +This module ships an `mcp:` block in its frontmatter — `trove install aitoearn` then ask your AI agent to merge the block into its MCP config (`~/.claude.json`, `~/.cursor/mcp.json`, etc.) per the trove SPEC §3 MCP-configuration pattern. After that, in any project: + +``` +List my authorized aitoearn accounts. +``` + +→ resolves via `getAccountGroupList` + `getAccountListByGroupId`. + +``` +Publish "Testing MCP integration" to my Twitter via aitoearn. +``` + +→ resolves via `publishPostToTwitter({accountId, desc: "..."})` → returns `flowId` → polls `getPublishingTaskStatus`. + +--- + +## Composing for many platforms (the right way) + +Cross-platform publish is NOT "same payload, N calls" — each platform has hard constraints. The disciplined flow: + +1. Call `publishRestrictions({platforms: [...selected]})` once +2. Have the AI / your code generate **per-platform variants** of title / desc / topics / media (Twitter 280 char vs YouTube 5000 char vs Pinterest needs board) +3. Fan-out: call `publishPostTo*` per platform with adapted payload +4. Collect `flowId[]`, poll each with `getPublishingTaskStatus` +5. Aggregate `{platform, status, publishedUrl, errorMsg}` into a structured launch-bundle result + +--- + +## Authorizing accounts (one-time per platform) + +The MCP tools assume accounts are already authorized. To authorize: + +1. Log into your aitoearn account (web UI at https://aitoearn.ai or your self-hosted `http://localhost:8080`) +2. Account Management → pick platform → click "Authorize" +3. 
Browser redirects to platform's official OAuth page → log in with **your** social account → grant permission +4. Redirected back; account now appears in `getAccountListByGroupId` + +Note: TikTok / YouTube / Twitter / Pinterest / Meta auth flows are fast (minutes). **Bilibili / Douyin / Xiaohongshu / WeChat Channels** require platform-side review of the relay app's developer credentials — sometimes these auth links break temporarily when aitoearn's official dev account hits a review cycle. If a platform auth fails repeatedly, try the other env (e.g. switch from .cn to .ai) or wait a day. + +--- + +## Data-cube (REST, not MCP) + +For published-post analytics, the `/dataCube/*` REST endpoints (not exposed as MCP tools, hit them directly with `x-api-key`): + +- `GET /accountDataCube/:accountId` — account-level stats +- `GET /getAccountDataBulk/:accountId` — bulk recent stats +- `GET /getArcDataCube/:accountId/:dataId` — single post stats +- `GET /getArcDataBulk/:accountId/:dataId` — bulk post stats over time + +Supports 9 platforms (Bilibili, Facebook, Instagram, Kwai, Pinterest, Threads, WeChat Gzh, Xiaohongshu, YouTube). Use these for cross-platform analytics ingest in your project rather than scraping individual platform dashboards. 
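The five-step fan-out in "Composing for many platforms" can be sketched as below. This assumes a hypothetical `call_tool(name, args)` MCP wrapper, that each `publishPostTo*` result carries `flowId`, that limits from `publishRestrictions` have already been read into plain integers (its response shape is not documented here), and that the final status dict exposes `publishedUrl` / `errorMsg`; `fit_desc` and `fan_out` are illustrative names, not aitoearn API.

```python
from typing import Any, Callable, Dict, List

ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def fit_desc(desc: str, max_chars: int) -> str:
    """Trim a description to a per-platform limit, ending with an ellipsis when cut."""
    if len(desc) <= max_chars:
        return desc
    if max_chars <= 0:
        return ""
    return desc[: max_chars - 1] + "…"


def fan_out(
    call_tool: ToolCall,
    variants: Dict[str, Dict[str, Any]],
    wait: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Publish one adapted payload per tool, poll each flowId, return one row per platform."""
    flows: Dict[str, str] = {}
    rows: List[Dict[str, Any]] = []
    for tool_name, payload in variants.items():
        platform = tool_name.removeprefix("publishPostTo")
        try:
            flows[platform] = call_tool(tool_name, payload)["flowId"]
        except Exception as exc:  # record the failure and keep publishing the rest
            rows.append({"platform": platform, "status": "failed", "publishedUrl": None, "errorMsg": str(exc)})
    for platform, flow_id in flows.items():
        final = wait(flow_id)  # any poller over getPublishingTaskStatus
        rows.append({
            "platform": platform,
            "status": final.get("status"),
            "publishedUrl": final.get("publishedUrl"),  # assumed field name
            "errorMsg": final.get("errorMsg"),  # assumed field name
        })
    return rows
```

Submitting everything before polling keeps the wall-clock close to the slowest platform rather than the sum. `len()` counts code points, which only approximates limits on platforms such as X that weight some characters double.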
+ +--- + +## Risk isolation (recommended pattern) + +Because publish-pricing and Relay availability are not contractually guaranteed, wrap the MCP behind a thin client layer in your project so business code never imports aitoearn directly: + +``` +your-project/ + libs/aitoearn-mcp-client/ # the only place that knows aitoearn exists + config: { mode: 'relay' | 'self-oauth' | 'hybrid' } + social/ # business code calls libs/, not aitoearn +``` + +When threats materialize: +- aitoearn adds publish fees → switch `mode: relay` → `mode: self-oauth` for platforms you've separately obtained dev credentials for +- aitoearn relay outage → fallback to direct platform SDK (need your own client_id / secret) +- aitoearn shuts down → swap the whole client to a different provider, business code unchanged + +Approximate self-OAuth difficulty: easy (Twitter, Pinterest, LinkedIn) → medium (TikTok, YouTube, Meta) → hard (Bilibili, Kwai) → very hard (Douyin, Xiaohongshu, WeChat Channels — often need company entity). + +--- + +## Self-hosted deployment (Docker) + +```bash +mkdir -p ~/infra && cd ~/infra +git clone https://github.com/yikart/AiToEarn.git aitoearn-stack +cd aitoearn-stack +# write docker-compose.override.yml with RELAY_* env vars and your own MongoDB / Redis / JWT secrets +# point OPENAI_BASE_URL / API_KEY at your own LLM gateway to avoid burning aitoearn credits +docker compose up -d +open http://localhost:8080 +``` + +Stack: Nginx :8080 → Web (Next.js) :3000 / Server (Nest) :3002 / AI (Nest) :3010 → MongoDB / Redis / RustFS (S3-compatible). Need ≥4 GB RAM, ≥20 GB disk. Detailed env table in `DOCKER_DEPLOYMENT_CN.md` in the upstream repo. + +After self-hosting, set `AITOEARN_MCP_URL=http://localhost:8080/api/unified/mcp` in trove credentials. 
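The `libs/aitoearn-mcp-client/` layer from the Risk isolation pattern can be sketched as a small router keyed on `mode`. This is illustrative only: the `Publisher` protocol, `SocialClient`, and the hybrid rule (use your own OAuth adapter where one exists, otherwise the relay) are assumptions, not part of aitoearn or trove.

```python
from dataclasses import dataclass, field
from typing import Dict, Literal, Protocol

Mode = Literal["relay", "self-oauth", "hybrid"]


class Publisher(Protocol):
    def publish(self, platform: str, payload: dict) -> str:
        """Publish and return an opaque task handle (a flowId for the relay backend)."""
        ...


@dataclass
class SocialClient:
    """The only object business code talks to; it never imports aitoearn directly."""

    mode: Mode
    relay: Publisher
    self_oauth: Dict[str, Publisher] = field(default_factory=dict)  # platform -> direct SDK adapter

    def __post_init__(self) -> None:
        if self.mode not in ("relay", "self-oauth", "hybrid"):
            raise ValueError(f"unknown mode {self.mode!r}")

    def backend_for(self, platform: str) -> Publisher:
        if self.mode == "relay":
            return self.relay
        if self.mode == "self-oauth":
            if platform not in self.self_oauth:
                raise LookupError(f"no self-OAuth adapter configured for {platform}")
            return self.self_oauth[platform]
        # hybrid: prefer your own credentials where you have them, fall back to the relay
        return self.self_oauth.get(platform, self.relay)

    def publish(self, platform: str, payload: dict) -> str:
        return self.backend_for(platform).publish(platform, payload)
```

Swapping providers then means writing one new `Publisher` and changing `mode`; business code keeps calling `SocialClient.publish`.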
+ +--- + +## Pricing reality check (as of 2026-05) + +| Surface | Cost on hosted aitoearn.ai | Cost when self-hosted | +|---|---|---| +| Publish (MCP `publishPostTo*`) | $0 currently — no contract | $0 always (assuming relay borrows official OAuth) | +| Data-cube (REST analytics) | $0 currently | $0 always | +| AI Create (image / video / LLM) | 100 credits = $1, marked up over raw provider rates | Pay your own model provider directly (cheaper) | +| OAuth Relay (borrow their dev creds) | Free with any API key | Still free with Relay env vars | + +**Bottom line**: if you only consume publish + data-cube, hosted free tier is fine for MVP; self-host when scale demands. If you ever need their AI Create, self-host immediately to bypass the markup. + +--- + +## Error reference + +| symptom | meaning | fix | +|---|---|---| +| `401 Unauthorized` on MCP call | API key / env mismatch | confirm key was created on the same host as `AITOEARN_MCP_URL` (ai vs cn) | +| `Account not found` in publish response | `accountId` doesn't belong to current user | re-list via `getAccountListByGroupId`; accounts may have been deauthorized | +| `Platform XXX restrictions not found` | typo in platform name | platform values are the exact `AccountType` enum: `bilibili, facebook, instagram, threads, pinterest, youtube, tiktok, twitter, kwai, xhs, douyin, wxGzh` | +| Publish stuck in `waiting` for >10 min | platform-side queue or auth invalidated | check account status in web UI; the relay's `credential-invalidation.service.ts` will eventually mark it expired | +| `RelayServerUnavailable` in self-host logs | RELAY_API_KEY wrong or expired | re-create the API key, restart `aitoearn-server` | + +--- + +## Why this module exists in trove + +aitoearn replaces what would otherwise be a multi-month integration project per platform (OAuth + media upload + publish API + webhook + token refresh + content review handling, × 14 platforms, many of which require corporate developer accounts you may not be able 
to get). The trade-off is operational dependency on a single vendor — mitigated by the Risk Isolation pattern above and the option to self-host the entire stack under MIT license. + +--- + +## Source of truth (refresh when these change) + +- AiToEarn product site (international) — https://aitoearn.ai +- AiToEarn (China) — https://aitoearn.cn +- Self-host repo — https://github.com/yikart/AiToEarn +- API key console — https://aitoearn.ai/settings/api-key +- MCP endpoint catalogue — https://aitoearn.ai/api/unified/mcp (live MCP server, list tools via the standard MCP `tools/list` introspection) + +Last upstream-docs sync: see `lastmod`. Last live-API verification: see `last_verified`. diff --git a/library/cloudflare/module.md b/library/cloudflare/module.md index 5b25993..97960f9 100644 --- a/library/cloudflare/module.md +++ b/library/cloudflare/module.md @@ -129,6 +129,89 @@ chmod +x scripts/deploy.sh - Pages: single file ≤ 25 MB, ≤ 20000 files per deploy; exceeding either limit fails silently, with no upfront validation - Pages enables HTTPS automatically, but custom domains have to be added in the dashboard (API path: `/accounts/{aid}/pages/projects/{name}/domains`) +### Switching the source repo (existing project, reusing name + domains) + +**⚠️ Critical trap**: sending `source.config.repo_name` / `repo_id` in the body of `PATCH /pages/projects/{name}` makes **CF return `success: true` without updating the source fields**. A subsequent GET still shows the old repo. Fields such as `build_config` do take effect via PATCH; only the `source` block is silently read-only (even an explicit `repo_id` + `owner_id` has no effect). The dashboard likewise has **no** "change source / disconnect" button (Settings → General only offers Rename / Notifications / Preview access / Delete). + +The only working path is **delete + recreate** (`source` is writable on POST CREATE). Follow the steps strictly in the order below, or the process gets stuck: + +```bash +PROJECT="my-site" +BASE="https://api.cloudflare.com/client/v4" +AUTH="Authorization: Bearer $CF_API_TOKEN" + +# 0) Probe with a throwaway name first to confirm CREATE really accepts the source field +# (one minute spent here insures against CREATE silently ignoring source as well) +curl -X POST -H "$AUTH" -H "Content-Type: application/json" \ + "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects" \ + -d '{ + "name": "'"$PROJECT"'-probe", + "production_branch": "main", + "source": { +
"type": "github", + "config": {"owner": "new-org", "repo_name": "new-repo", "production_branch": "main"} + }, + "build_config": {"destination_dir": "dist"} + }' | jq '.result.source.config.repo_name' # should print "new-repo" +curl -X DELETE -H "$AUTH" "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects/$PROJECT-probe" + +# 1) Detach every custom domain first; DELETE on a project that still has domains fails with 8000028 +# "To delete your project, you must first delete all custom domains" +curl -X DELETE -H "$AUTH" \ + "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects/$PROJECT/domains/example.com" +curl -X DELETE -H "$AUTH" \ + "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects/$PROJECT/domains/www.example.com" + +# 2) DELETE project +curl -X DELETE -H "$AUTH" \ + "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects/$PROJECT" + +# 3) CREATE with the new source; reusing the same name keeps the subdomain, so the DNS CNAME needs no change +curl -X POST -H "$AUTH" -H "Content-Type: application/json" \ + "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects" \ + -d '{ + "name": "'"$PROJECT"'", + "production_branch": "main", + "source": { + "type": "github", + "config": { + "owner": "new-org", + "repo_name": "new-repo", + "production_branch": "main", + "deployments_enabled": true, + "production_deployments_enabled": true, + "pr_comments_enabled": true, + "preview_deployment_setting": "all", + "preview_branch_includes": ["*"], + "preview_branch_excludes": [], + "path_includes": ["*"], + "path_excludes": [] + } + }, + "build_config": {"build_command": "", "destination_dir": "dist", "root_dir": ""}, + "deployment_configs": { + "production": {"compatibility_date": "2026-04-26", "fail_open": true}, + "preview": {"compatibility_date": "2026-04-26", "fail_open": true} + } + }' + +# 4) Re-attach custom domains +curl -X POST -H "$AUTH" -H "Content-Type: application/json" \ + "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects/$PROJECT/domains" \ + -d '{"name":"example.com"}' +curl -X POST -H "$AUTH" -H "Content-Type: application/json" \ + "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects/$PROJECT/domains" \ + -d
'{"name":"www.example.com"}' + +# 5) ⚠️ CREATE does not trigger the first deploy automatically; trigger it explicitly +curl -X POST -H "$AUTH" \ + "$BASE/accounts/$CF_ACCOUNT_ID/pages/projects/$PROJECT/deployments?branch=main" +``` + +**Downtime window**: from detaching the first domain in step 1 until every domain is re-attached in step 4 and SSL is active. With the name reused and the API calls run back to back, the actual unreachable window ≈ SSL provisioning time (typically 3–10 min). Domain-level settings such as a Cloudflare Access auth gate are preserved automatically when the domain is re-attached. + +**Why not create a new project in parallel and then move the domains**: a Pages project's `pages.dev` subdomain is bound to the project name. Same name = same subdomain = no DNS CNAME change. A new name forces the DNS CNAME target to change (see pitfall 2, "坑 2", above), which is slower, not faster. + +### Direct API (CI use, without wrangler) ```typescript diff --git a/library/volcengine-speech/credentials.example.json b/library/volcengine-speech/credentials.example.json new file mode 100644 index 0000000..f567bd0 --- /dev/null +++ b/library/volcengine-speech/credentials.example.json @@ -0,0 +1,6 @@ +{ + "VOLC_SPEECH_APP_KEY": "", + "VOLC_SPEECH_ACCESS_KEY": "", + "VOLC_SPEECH_SECRET_KEY": "", + "VOLC_SPEECH_RESOURCE_ID": "volc.seedasr.auc" +} diff --git a/library/volcengine-speech/module.md b/library/volcengine-speech/module.md new file mode 100644 index 0000000..5eddc68 --- /dev/null +++ b/library/volcengine-speech/module.md @@ -0,0 +1,244 @@ +--- +name: volcengine-speech +version: 0.1.0 +category: speech +description: Volcengine / ByteDance Seed-ASR 2.0 Standard — batch audio transcription with utterance and word-level timestamps.
The diagnostic / QA companion to Seedance generated videos (verify the produced audio actually matches your target script) and to TTS pipelines (validate intelligibility before shipping) +homepage: https://www.volcengine.com/docs/6561/1631584 +tags: [asr, speech-to-text, volcengine, seed-asr, subtitles, timestamps, qa] +applies_to: + - "batch transcribing produced video / podcast / dialogue audio at utterance and word-level timestamp granularity" + - "QA loop: ASR the produced audio, diff against your target script — catches Seedance / TTS pipelines that pronounced different words than the prompt" + - "subtitle timing: get utterance timestamps for SRT generation; render with your target subtitle text (not the ASR transcription) if dialogue intent matters more than literal audio match" + - "speaker turn segmentation via `enable_speaker_info`" +trove_spec: "0.1" +lastmod: "2026-05-25" +last_verified: "2026-05-22 · live submit + query cycle succeeded against `volc.seedasr.auc` resource on Seed-ASR 2.0 Standard endpoint, returned 4-utterance transcription with timestamps. Same account hit `45000030 requested resource not granted` for the legacy `volc.bigasr.auc` resource ID — dogfood-confirmed the resource-ID migration" + +credentials: + VOLC_SPEECH_APP_KEY: + type: password + required: true + help: "App Key from https://console.volcengine.com/speech/app. Sent as `X-Api-App-Key` header. Independent of the AK/SK used by TOS — Speech console has its own credential namespace." + VOLC_SPEECH_ACCESS_KEY: + type: password + required: true + help: "Access Token from the same Speech console. Sent as `X-Api-Access-Key` header." + VOLC_SPEECH_SECRET_KEY: + type: password + required: false + help: "Optional Secret Key. Stored here for completeness; the header-auth Standard endpoint path does NOT use it. Required only for the signature-auth WebSocket realtime path (not covered by this module)." 
+ VOLC_SPEECH_RESOURCE_ID: + type: text + required: false + default: "volc.seedasr.auc" + help: "Seed-ASR 2.0 Standard resource ID. Older docs / examples reference `volc.bigasr.auc` — that resource returns `45000030 not granted` for accounts onboarded in 2026-05+ (dogfood-verified). Stick with the default." +--- + +# Volcengine Speech / Seed-ASR 2.0 Standard + +## ⚠️ Critical Constraints (read before writing code) + +1. **Use `volc.seedasr.auc` for Seed-ASR 2.0 Standard. NOT `volc.bigasr.auc`** — the legacy resource ID returns `45000030 requested resource not granted` on accounts onboarded in 2026-05 or later, while `volc.seedasr.auc` succeeds with the same App Key on the same endpoint. Dogfood-verified 2026-05-22. Old AI-generated code and stale tutorials still suggest the `bigasr` form; ignore them. +2. **Header auth is separate from ARK / TOS credentials** — Speech console has its own App Key + Access Token namespace. **Do not reuse `~/.trove/seedance` (ARK_API_KEY) or `~/.trove/volcengine-tos` (AK/SK)** — they will not authenticate against the speech endpoint. Create dedicated App Key + Access Token at https://console.volcengine.com/speech/app. +3. **ASR is diagnostic, not authoritative** — when ASR output differs from your target subtitle text, that means **the audio diverged from your script**, NOT that ASR was wrong. Common causes: (a) Seedance pronounced different words than the prompt, (b) the audio file is partial/old / re-generated content, (c) TTS substituted a homophone. **Never silently overwrite target subtitles with ASR text** — surface the mismatch for review. +4. **Audio URL must be publicly fetchable** — the model server pulls the audio from the URL you submit. Same constraint family as Seedance reference images. The canonical Trove pattern: upload mp3 / wav to TOS public-read (`library/volcengine-tos/module.md`), then submit the resulting `https://.tos-.volces.com/` URL. +5. 
**Two-step async pattern**: `POST /submit` queues the task and returns immediately → `POST /query` polls until the task is done. The request ID is generated client-side (e.g. a UUID), and both calls must send the SAME `X-Api-Request-Id` header so the query resolves to the submitted task. Typical wall-clock: 5–30 seconds depending on audio length. +6. **Audio limits** — supported formats: mp3 / wav / m4a / ogg / flac / pcm / opus. Max duration: 8 hours per file (Standard tier). Sample rate auto-detected. Mono recommended; stereo is auto-mixed-down before ASR. +7. **`show_utterances: true` is what unlocks timestamps** — without it, you only get a flat transcription string with no timing info. For subtitle production or QA-vs-target-text diffing, set this to `true`. Costs no extra credits. +8. **Speaker info is approximate** — `enable_speaker_info: true` returns `speaker_id` per utterance based on voice fingerprinting. Reliable for distinct voices (m/f / age gap); flaky for similar voices speaking quickly back-and-forth. Don't rely on it for legally-binding speaker attribution. +9. **Pricing is per audio minute, not per request** — Standard tier is priced at the per-minute audio rate (current rates at https://www.volcengine.com/pricing). A 5-second clip bills as roughly 5/60 of a minute. Multiple submissions of the same audio bill multiple times — cache results in your DB. +10. **`enable_itn: true` normalizes numbers and dates** — without ITN ("Inverse Text Normalization"), the transcription stays verbatim with spelled-out numbers ("两千零二十六年", the year 2026 written out in words); with ITN you get `2026年`. For subtitle rendering you usually want ITN on; for raw QA-against-prompt you might want it off (to match the prompt's actual phrasing). + +--- + +## Setup + +```bash +# Trove pattern — pull credentials on demand +# export, so child processes (e.g. the Python quickstart reading os.environ) can see them +export VOLC_SPEECH_APP_KEY=$(jq -r .VOLC_SPEECH_APP_KEY ~/.trove/volcengine-speech/credentials.json) +export VOLC_SPEECH_ACCESS_KEY=$(jq -r .VOLC_SPEECH_ACCESS_KEY ~/.trove/volcengine-speech/credentials.json) +``` + +No SDK required — direct REST is the cleanest path (only 2 endpoints, no streaming).
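If you would rather not depend on exported shell variables, Python can read the same trove credentials file directly and fail fast on empty keys. A minimal sketch, assuming the file layout of `credentials.example.json`; `load_speech_credentials` is a hypothetical helper, not part of trove:

```python
import json
from pathlib import Path
from typing import Dict, Optional

REQUIRED_KEYS = ("VOLC_SPEECH_APP_KEY", "VOLC_SPEECH_ACCESS_KEY")
DEFAULT_RESOURCE_ID = "volc.seedasr.auc"  # the module's recommended Seed-ASR 2.0 resource


def load_speech_credentials(path: Optional[Path] = None) -> Dict[str, str]:
    """Read ~/.trove/volcengine-speech/credentials.json and validate the header-auth keys."""
    path = Path(path) if path else Path.home() / ".trove" / "volcengine-speech" / "credentials.json"
    creds = json.loads(path.read_text(encoding="utf-8"))
    missing = [k for k in REQUIRED_KEYS if not creds.get(k)]
    if missing:
        raise KeyError(f"missing or empty in {path}: {', '.join(missing)}")
    # An empty resource ID would send a blank X-Api-Resource-Id header; use the default instead
    if not creds.get("VOLC_SPEECH_RESOURCE_ID"):
        creds["VOLC_SPEECH_RESOURCE_ID"] = DEFAULT_RESOURCE_ID
    return creds
```

The returned values map one-to-one onto the `X-Api-App-Key`, `X-Api-Access-Key`, and `X-Api-Resource-Id` headers; `VOLC_SPEECH_SECRET_KEY` is passed through untouched since the header-auth path ignores it.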
+ +--- + +## API surface + +| operation | method | path | +|---|---|---| +| Submit transcription task | `POST` | `https://openspeech.bytedance.com/api/v3/auc/bigmodel/submit` | +| Query task result | `POST` | `https://openspeech.bytedance.com/api/v3/auc/bigmodel/query` | + +Both endpoints take the same auth headers: + +```http +X-Api-App-Key: +X-Api-Access-Key: +X-Api-Resource-Id: volc.seedasr.auc +X-Api-Request-Id: # SAME id for the submit/query pair +Content-Type: application/json +``` + +--- + +## Quickstart: submit + poll (Python, no SDK) + +```python +import os, time, uuid, requests + +APP_KEY = os.environ["VOLC_SPEECH_APP_KEY"] +ACCESS_KEY = os.environ["VOLC_SPEECH_ACCESS_KEY"] +RESOURCE_ID = "volc.seedasr.auc" + +def headers(request_id: str) -> dict: + return { + "X-Api-App-Key": APP_KEY, + "X-Api-Access-Key": ACCESS_KEY, + "X-Api-Resource-Id": RESOURCE_ID, + "X-Api-Request-Id": request_id, + "Content-Type": "application/json", + } + +def transcribe(audio_url: str, audio_format: str = "mp3") -> dict: + request_id = str(uuid.uuid4()) + + # 1. Submit + submit_body = { + "user": {"uid": "trove-user"}, + "audio": {"format": audio_format, "url": audio_url}, + "request": { + "model_name": "bigmodel", + "model_version": "400", + "enable_itn": True, + "enable_punc": True, + "show_utterances": True, + "enable_speaker_info": True, + }, + } + r = requests.post( + "https://openspeech.bytedance.com/api/v3/auc/bigmodel/submit", + headers=headers(request_id), + json=submit_body, + timeout=10, + ) + r.raise_for_status() + + # 2. 
Poll + deadline = time.monotonic() + 600 # stop polling after 10 min; increase for very long audio + while time.monotonic() < deadline: + q = requests.post( + "https://openspeech.bytedance.com/api/v3/auc/bigmodel/query", + headers=headers(request_id), # SAME request_id + json={}, + timeout=10, + ) + q.raise_for_status() + body = q.json() if q.content else {} + # Task status comes back in the X-Api-Status-Code response header; fall back to a body field if present + status_code = int(q.headers.get("X-Api-Status-Code") or body.get("status_code") or 0) + if status_code == 20000000: # complete + return body + if status_code in (20000001, 20000002): # still running / queued + time.sleep(2) + continue + message = q.headers.get("X-Api-Message") or body.get("message") + raise RuntimeError(f"ASR failed: {status_code} {message}") + raise TimeoutError(f"ASR task {request_id} not finished before the polling deadline") + +# Use it +result = transcribe("https://yourbucket.tos-cn-beijing.volces.com/audio.mp3") +print(result["result"]["text"]) # full transcription +for utt in result["result"]["utterances"]: + print(f"[{utt['start_time']}-{utt['end_time']}ms] (speaker {utt.get('speaker_id', '?')}) {utt['text']}") +``` + +--- + +## Cross-module recipe: ASR-verify a Seedance / CosyVoice produced clip + +The canonical Trove pipeline — generate a video with Seedance (with synced audio per Critical Constraint #9 of seedance module) OR a voice clip with CosyVoice (dashscope module), upload to TOS public-read, then ASR-verify the audio matches the target script: + +```python +# 1. Generate the audio (Seedance with audio-on, OR CosyVoice TTS direct) +# ... (see library/seedance/module.md or library/dashscope/module.md) +# → final mp3/wav bytes in `audio_bytes` + +# 2. Upload to TOS public-read (see library/volcengine-tos/module.md) +import tos +tos_client = tos.TosClientV2( + ak=os.environ["VOLC_ACCESS_KEY_ID"], + sk=os.environ["VOLC_SECRET_ACCESS_KEY"], + endpoint="tos-cn-beijing.volces.com", + region="cn-beijing", +) +key = f"asr-input/{uuid.uuid4()}.mp3" +tos_client.put_object( + bucket="yourbucket", + key=key, + content=audio_bytes, + acl=tos.ACLType.ACL_Public_Read, + content_type="audio/mpeg", +) +audio_url = f"https://yourbucket.tos-cn-beijing.volces.com/{key}" + +# 3.
ASR + diff against target text +import unicodedata + +def normalize(text: str) -> str: + # enable_punc adds punctuation the script may lack; compare only non-space, non-punctuation characters + return "".join(ch for ch in text if not ch.isspace() and not unicodedata.category(ch).startswith("P")) + +asr_result = transcribe(audio_url, "mp3") +asr_text = asr_result["result"]["text"] +# target_subtitle_text: the script line this clip was supposed to say + +if normalize(asr_text) != normalize(target_subtitle_text): + # Don't silently overwrite — surface for review (Critical Constraint #3) + print("WARN: ASR text differs from target subtitle.") + print(f" ASR: {asr_text}") + print(f" TARGET: {target_subtitle_text}") + # Decide downstream: trust ASR (literal audio) OR trust target (script intent) +``` + +For subtitle rendering, use **ASR utterance timestamps + target subtitle text** unless your editorial process explicitly accepts whatever the audio actually said. + +--- + +## Recommended status labels for QA workflows + +When wiring this into a content pipeline, capture the ASR outcome as one of these states (the labels themselves are content-team convention, but the underlying signals are robust): + +| label | meaning | +|---|---| +| `asr_ok_text_match` | ASR succeeded; transcription matches target subtitle within tolerance | +| `asr_ok_text_mismatch` | ASR succeeded; spoken words diverge from target subtitle text — needs editorial review | +| `video_incomplete_for_subtitle_check` | source audio/video is partial / old / re-rendered — full-script match rate is not meaningful | +| `asr_unreliable` | audio exists and should match, but ASR output is obviously garbled or missing likely speech (e.g.
clear human speech transcribed as 3 random characters) | +| `asr_no_speech` | ASR succeeded but returned empty / very short — input audio is silent / music-only / non-speech | + +--- + +## Error reference + +| status_code / symptom | meaning | fix | +|---|---|---| +| `45000030` "requested resource not granted" | using `volc.bigasr.auc` on a 2026-05+ account | switch `X-Api-Resource-Id` to `volc.seedasr.auc` | +| `45000001` "invalid access key" | wrong App Key / Access Token, OR mixed up the two | reverify at https://console.volcengine.com/speech/app — App Key goes in `X-Api-App-Key`, Access Token in `X-Api-Access-Key` | +| `45000003` "audio url unreachable" | URL needs auth, OR returns non-2xx, OR has CORS / GET method block | curl it from outside your network; the TOS public-read URL pattern is the canonical fix | +| `45000010` "audio format unsupported" | uncommon container (e.g. webm-audio, .amr) | re-encode to mp3 / wav via ffmpeg before upload | +| `45000020` "audio too long" | > 8 hours total | split into segments and submit each | +| Long delay (>2 min) with status_code `20000002` | task queued behind others | normal under load; keep polling. If >5min, consider re-submitting with a fresh request_id | +| ASR returns 0 utterances on speech-containing audio | wrong sample rate / corrupted audio / language mismatch | confirm with `ffprobe`; Seed-ASR works best on Mandarin and English; for other languages check supported list at https://www.volcengine.com/docs/6561 | + +--- + +## When to pick volcengine-speech vs alternatives + +- **volcengine-speech (this module)** → Mandarin / Chinese-accent English. Strong on timestamps + speaker info. Best when you're already on Volcengine (same-region TOS pull = fast + free). +- **OpenAI Whisper API** — strong all-language coverage, well-known. Slower, no native speaker-info, USD-billed. +- **Deepgram / AssemblyAI** — English-first, real-time streaming surface, premium pricing. 
+- **Local Whisper (`whisper.cpp` / faster-whisper)** — free, offline, no API. Use when audio is sensitive or batch is huge. + +Rule of thumb: Chinese audio + already on Volcengine → this module. English-only batch in a non-Volc stack → Whisper API or local. + +--- + +## Source of truth (refresh when these change) + +- Seed-ASR 2.0 Standard docs — https://www.volcengine.com/docs/6561/1631584 +- Speech console (App Key + Access Token management) — https://console.volcengine.com/speech/app +- Resource ID catalogue — https://www.volcengine.com/docs/6561 +- Pricing — https://www.volcengine.com/pricing +- Cross-modules: `library/volcengine-tos/module.md` for hosting audio, `library/seedance/module.md` for the video producer this ASR-verifies, `library/dashscope/module.md` for CosyVoice TTS + +Last upstream-docs sync: see `lastmod`. Last live-API verification: see `last_verified`. diff --git a/site/index.html b/site/index.html index 5f6a493..32e8860 100644 --- a/site/index.html +++ b/site/index.html @@ -134,7 +134,7 @@
     _
     | |_ ___ _____   _____
     |  _|  _|  _  |_|   -|
-    |_| |_| |_____|_|___/   v0.2.4 — 22 modules, live-verified
+    |_| |_| |_____|_|___/   v0.2.4 — 24 modules, live-verified
 

Trove

@@ -147,7 +147,7 @@

Trove

- 22 modules in library
+ 24 modules in library · 5 production · 16 verified · 3 partial ·
@@ -179,7 +179,7 @@

Quick start

02

Install a module + open the Web UI to fill credentials

- Pick from 22 bundled modules (or trove install --list to browse). The UI binds to 127.0.0.1:7821 only — never public.
+ Pick from 24 bundled modules (or trove install --list to browse). The UI binds to 127.0.0.1:7821 only — never public.

trove install stripe
 trove ui     # → http://127.0.0.1:7821
@@ -264,7 +264,7 @@

Web UI

  • Modules — your installed modules grouped by category with credential-status indicators
- • Library — 22 bundled module templates, one-click Install copies module.md into ~/.trove/
+ • Library — 24 bundled module templates, one-click Install copies module.md into ~/.trove/
  • Credentials form — masked password fields with reveal toggle, file-type fields with present/replace/delete widget, inline save via HTMX
  • Module detail — frontmatter + rendered skill markdown side-by-side, with last_verified tier dot
@@ -286,7 +286,7 @@

MCP support (optional)

What's in the library

- Every module carries a last_verified field — what was actually tested, by whom, when. Dot color reflects current state. We'd rather ship 22 honest modules than 50 LLM-hallucinated ones.
+ Every module carries a last_verified field — what was actually tested, by whom, when. Dot color reflects current state. We'd rather ship 24 honest modules than 50 LLM-hallucinated ones.

production · daily-use
@@ -331,6 +331,12 @@

What's in the library

github-account npm ngrok
+ speech · transcription
+ volcengine-speech
+ social-publishing
+ aitoearn

Each module ships with a gotchas-first skill body — auth header quirks, billing pitfalls, error-code tables — so the AI doesn't have to rediscover the same trap that bit the last engineer.

@@ -339,7 +345,7 @@

Documentation

  • SPEC — the format definition (frontmatter schema, reference syntax, runtime conventions). Includes a living convention adherence log in §10 of real dogfood lessons from production.
- • library/ — the 22 bundled modules listed above
+ • library/ — the 24 bundled modules listed above
  • ROADMAP — phases and explicit non-goals (no trove init, no inject step, no SaaS — ever)
  • CONTRIBUTING — module quality bar
  • design-v0.2.md — why the Web UI dropped AI-chat features (chat IS the entry interface; UI is the visualization)
@@ -347,7 +353,7 @@

    Documentation

    Status

- v0.2.4 — the format spec is stable, all 22 modules are gated by last_verified, and the maintainer dogfoods trove daily across personal projects. AI-assisted module authoring (v0.3) and a marketplace for community modules (v1.0) are next.
+ v0.2.4 — the format spec is stable, all 24 modules are gated by last_verified, and the maintainer dogfoods trove daily across personal projects. AI-assisted module authoring (v0.3) and a marketplace for community modules (v1.0) are next.

    The repo is github.com/RoboZephyr/trove — issues, PRs, and module additions welcome (see CONTRIBUTING.md for the quality bar).