The CapCut/JianYing toolkit that survives the next ByteDance update — and auto-captions with real caption objects.
A pure CLI any LLM can drive: no MCP server, no HTTP daemon, no state. Inspect drafts, build from scratch, add media, modify subtitles, auto-caption with whisper, translate to N languages, cut long-form to shorts. Zero runtime deps, both CapCut and JianYing namespaces in one binary, JSON output by default.
New in v0.4 — caption (whisper → real caption objects, not the import-srt text-mimics), migrate (mask ↔ common_masks across CapCut/JianYing version jumps), lint (schema-aware checks: overlaps, line length, missing files), version (detect support status), translate (Anthropic-API multi-language draft clone), add-sfx, chroma, serve (stateless JSONL queue runner for n8n/Coze/Make), and export --batch (EXPERIMENTAL macOS UI-automated render queue).
How capcut-cli fits into a typical viral-shorts pipeline. Steps 2 and 3 are LLM-driven (any model that returns JSON); steps 1, 4, and 5 are deterministic CLI calls. Step 6 stays human — every short-video platform forbids automated upload, so the publish click is yours.
flowchart LR
A[Long video<br/>or CapCut project] --> B[capcut cut<br/>→ 60s candidate]
B --> C[Claude / DeepSeek<br/>/ GLM / Kimi<br/>→ hook + script JSON]
C --> D[capcut-cli<br/>add-text · add-audio<br/>apply-template]
D --> E[CapCut / JianYing<br/>review + render MP4]
E --> F[Publish<br/>YouTube Shorts · Reels · TikTok]
How capcut-cli differs from the other CapCut / JianYing tooling:
| Capability | pyJianYingDraft (Python, JianYing) | pyCapCut (Python, CapCut) | CapCutAPI (Python, HTTP server) | cutcli (Go, closed) | capcut-cli (Node, this repo) |
|---|---|---|---|---|---|
| Inspect drafts (info / tracks / materials / segments / texts) | partial | partial | ❌ | ❌ | ✅ |
| Create drafts from scratch | ✅ | ✅ | ✅ | ✅ | ✅ |
| Decorators (keyframe / transition / mask / text-anim / image-anim) | ✅ | ✅ | ✅ | ✅ | ✅ (v0.3.0) |
| SRT import → per-cue text segments | ❌ | ❌ | ✅ | ❌ | ✅ (v0.3.0) |
| Multi-style text (word-level highlight captions) | partial | partial | ❌ | ❌ | ✅ (v0.3.0) |
| Enum discovery for AI agents | ❌ | ❌ | partial | ❌ | ✅ (12 categories × 2 namespaces) |
| CapCut + JianYing namespaces in one binary | JianYing only | CapCut only | both | partial | both via --jianying |
| Templates (save/apply) | partial | partial | ❌ | ❌ | ✅ (3 shipped templates) |
| Schema docs | partial | partial | minimal | none | full (docs/draft-schema/) |
| Wikimedia Commons URLs with license gate | ❌ | ❌ | ❌ | ❌ | ✅ (v0.3.0) |
| Runtime deps | several Python deps | several Python deps | Flask + Python | none (Go binary) | zero (Node ≥ 18 built-ins only) |
| AI-tool integration | none | none | HTTP | none | Claude Code plugin |
| Install | pip install -r requirements.txt | pip install pyCapCut | clone + run server | binary download | npm install -g capcut-cli |
| License | none | none | none | unclear | MIT |
Status of every feature shipped. ✅ = implemented, ⬜ = roadmap. Section anchors link to the relevant command docs further down.
- ✅ init: create a new draft from scratch
- ✅ info · tracks · materials: overview
- ✅ segments · texts: list, filterable by track type
- ✅ segment / material <id>: progressive disclosure for AI agents
- ✅ export-srt: dump captions to SRT
- ✅ cut: extract a time range into a standalone short
- ✅ add-video · add-audio · add-text: local files
- ✅ add-video / add-audio: Wikimedia Commons URLs with license gate
- ✅ add-sticker: sticker track + transform
- ✅ add-effect: scene effect on its own track (vhs, shake, cinematic, vignette, …)
- ✅ set-text · shift · shift-all · speed · volume · opacity · trim
- ✅ batch: multiple edits, one JSON parse, one file write
- ✅ keyframe: position, scale, rotation, alpha, colour-adjust, volume (single + --batch JSONL on stdin)
- ✅ transition: 8 starter slugs + the full enum catalogue
- ✅ mask: linear / mirror / circle / rectangle / heart / star + geometry flags + --off
- ✅ bg-blur: levels 1–4 + --off
- ✅ text-style: alpha · shadow · border · bg-box (26 flags)
- ✅ text-anim · image-anim: intro / outro / combo from CapCut's library
- ✅ text-ranges: multi-style text, byte-accurate (unlocks word-level highlight captions)
- ✅ save-template · apply-template: extract any segment as reusable JSON; restamp with new timing / position / text
- ✅ 3 templates ship in templates/: gold-title, end-card, subscribe-cta
- ✅ import-srt: one cue per text segment; file, stdin, or --style-ref mirror
- ✅ enums: 12 categories × 2 namespaces from a committed enums.json (no network)
- ✅ Local files: mp4, mov, m4v, mp3, wav, aac, png, jpg, gif (any extension CapCut accepts)
- ✅ Wikimedia Commons URLs: page URL, /wiki/File: URL, direct CDN URL, or api.php?prop=pageimages query. License classifier refuses restrictive without --force-license.
- ✅ CapCut and JianYing: same binary, --jianying flag switches the enum namespace
- ✅ macOS · Windows · Linux: pure Node ≥ 18, no native modules
- ✅ JSON (default; pipeable to jq)
- ✅ -H / --human table mode (human-readable)
- ✅ -q / --quiet mode (exit code only)
- ✅ 60+ tests in a node:test suite (test/) running against test/draft_content.json
- ✅ Husky pre-commit hook: Biome lint on staged files + full test run
- ✅ Schema reference in docs/draft-schema/ (7 files, ~3,700 lines)
- ✅ Version support matrix: tested CapCut/JianYing versions, known-broken set, encryption status
- ✅ Claude Code plugin (/plugin marketplace add https://github.com/renezander030/capcut-cli)
- ✅ version: detect CapCut/JianYing version + schema flags (mask_field, text_ranges, audio_fades) + support status
- ✅ lint: schema-aware checks for caption overlaps (error), line length, cue duration, missing material refs, missing local files. Exit codes 0/1/2 for CI
- ✅ migrate: apply known migrations (mask ↔ common_masks across the JianYing 5.9 / CapCut 9.6 boundary)
- ✅ decrypt: JianYing 6.0+ encryption detection + clear workaround UX (decryption algorithm intentionally not bundled)
- ✅ caption: whisper shell-out (openai-whisper / whisper.cpp / faster-whisper) → real caption-track segments with sub_type + caption_template_info (addresses pyJianYingDraft #148; no more text-segment mimics)
- ✅ translate: Anthropic-API multi-language draft clone, zero runtime deps (uses built-in fetch). --dry-run for safe inspection. Original stays untouched
- ✅ add-sfx: first-class sound effects on a dedicated track (15+ CapCut SFX slugs via enums --audio-effects)
- ✅ chroma: green-screen / chroma key on video segments (--color + --intensity, or --off)
- ✅ serve: stateless JSONL queue runner (read from stdin or --queue file, dispatch to existing CLI, write JSONL results). No daemon, no port, no shared state; unlocks n8n / Coze / Make / cron without becoming a service
- ✅ export --batch: EXPERIMENTAL UI-automated render queue (macOS AppleScript; Windows path sketched). --dry-run for safe exploration on any OS
- ⬜ Audio fade-in / fade-out command (workaround: volume keyframes)
- ⬜ Text bubble effects / 花字 (workaround: hand-set bubble_* fields on the text material)
- ⬜ Filter-chain command + enums --filters discovery flag (no workaround; add-effect handles VFX/scene effects, not colour filters)
- ⬜ Drag-and-drop GIF demos in this README
- ⬜ JianYing 6.0+ decryption (currently only detection; see decrypt workaround docs)
- ⬜ Windows path for export --batch (currently only macOS via AppleScript)
- 🚫 HTTP server / cloud rendering / MCP server: explicitly out of scope per PLAN.md. serve ships as a stateless JSONL runner instead (no port, no daemon).
CapCut stores projects as draft_content.json -- deeply nested, undocumented, with timing in microseconds and text buried inside escaped JSON-in-JSON. Every manual edit means: find the right segment ID, trace it to the material, figure out the content format, convert your timestamp, edit, pray you didn't break the structure. 15 seconds per change, minimum.
capcut-cli already knows the schema. One command, one change, 5 seconds.
$ capcut texts ./project
[{"id":"a1b2c3d4-...","start_us":500000,"duration_us":2500000,"text":"Welcome to the video"}]
$ capcut set-text ./project a1b2c3 "Fixed subtitle"
{"ok":true,"id":"a1b2c3d4-...","old":"Welcome to the video","new":"Fixed subtitle"}
Zero dependencies. JSON output by default. Pipeable. Works with CapCut and JianYing.
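To make the "escaped JSON-in-JSON" and microsecond timing concrete, here is a minimal JavaScript sketch of the kind of manual edit that set-text replaces. The draft shape used here (materials.texts entries whose content field is a JSON string holding a text key) is a trimmed-down assumption for illustration, not the full schema; docs/draft-schema/ is the reference. The helper names setText and secondsToUs are hypothetical and are not part of the CLI.

```javascript
// Hypothetical, heavily trimmed draft: real files carry many more fields.
const draft = {
  materials: {
    texts: [
      // The visible text lives inside a JSON string, not in a plain field.
      { id: "m1", content: JSON.stringify({ text: "Welcome to the video" }) },
    ],
  },
};

// Replace the text of one text material without touching its other keys.
function setText(draftObj, materialId, newText) {
  const material = draftObj.materials.texts.find((m) => m.id === materialId);
  if (!material) throw new Error(`no text material ${materialId}`);
  const inner = JSON.parse(material.content); // unwrap the inner JSON
  const old = inner.text;
  inner.text = newText;
  material.content = JSON.stringify(inner); // re-wrap it as a string
  return { old, new: newText };
}

// Timing is stored in microseconds: 1 s = 1,000,000 µs.
const secondsToUs = (seconds) => Math.round(seconds * 1_000_000);

console.log(setText(draft, "m1", "Fixed subtitle"));
console.log(secondsToUs(2.5)); // 2500000
```

Forgetting either the unwrap or the re-wrap step is the usual way a hand edit breaks the file, which is the failure mode the CLI is meant to remove.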
npm install -g capcut-cli
Or run directly:
npx capcut-cli info ./my-project/
Add the marketplace, then enable the plugin:
/plugin marketplace add https://github.com/renezander030/capcut-cli
/plugin enable capcut-cli
This gives Claude Code the /capcut-cli:capcut-edit skill -- it learns every command, the progressive disclosure navigation pattern, and how to find your CapCut projects on macOS/Windows. Auto-installs the CLI on first enable.
Other CapCut / JianYing tooling exposes an HTTP API or MCP server. capcut-cli deliberately does not:
- No state to corrupt. Each command is JSON-in, JSON-out. The agent can interleave commands, retry safely, and exit any time. Versioning is npm install -g capcut-cli@x.y.z.
- No second tool to install. If the user has Node ≥ 18 they already have the runtime; npx capcut-cli works without a global install. No daemon, no port, no auth surface.
- Works in every agent surface. Claude Code via the plugin, plus bash, make, GitHub Actions, cron, any wrapper that can exec. MCP ties you to one host; a CLI runs anywhere sh runs.
The tradeoff is live streams — no progress events, no long-running render. That's intentional: every short-video platform requires a human render-and-publish step regardless (see PLAN.md).
Real transcript — Claude Code with the capcut-cli plugin, no server in the loop:
> Cut a 30-second teaser from this project starting at the climax around 4:12, add a hook title.
I'll inspect first.
→ capcut info ./project
{ "duration_us": 612000000, "segments": 24, "materials": 18 }
→ capcut segments ./project --track video
[{"id":"f1e2d3...","start_us":252000000,"duration_us":48000000,"speed":1.0}, ...]
Climax runs 4:12 → 5:00. Cutting a 30-second window centered there.
→ capcut cut ./project 4:00 4:30 --out ./teaser.json
{ "ok": true, "out": "./teaser.json", "duration_us": 30000000, "segments": 4 }
→ capcut add-text ./teaser.json 0s 3s "WAIT FOR IT" --font-size 28 --color "#FFD700"
{ "ok": true, "id": "a1b2c3...", "track": "text" }
Teaser ready at ./teaser.json. Open in CapCut to review and render.
The agent navigates info → segments → cut → add-text without holding any state. The CLI is the protocol — no second-tier server to maintain, version, or authenticate against.
JSON (default) -- pipe to jq, feed to scripts, consume from agents:
capcut texts ./project | jq '.[].text'
capcut info ./project | jq '.duration_us'
Human-readable (-H / --human):
capcut texts ./project -H
ID        Start   - End      Text
a1b2c3d4  0:00.50 - 0:03.00  Welcome to the video
Quiet (-q / --quiet) -- exit code only, zero stdout on writes:
capcut set-text ./project a1b2c3 "New text" -q && echo "done"

capcut info ./project # Project overview + material summary
capcut tracks ./project # List all tracks
capcut materials ./project # List all material types + counts
capcut materials ./project --type audios # List items of one material type
capcut segments ./project # List all segments with timing
capcut segments ./project --track text # Filter by track type
capcut texts ./project # List all text/subtitle content
capcut export-srt ./project > subs.srt # Export subtitles to SRT
capcut segment ./project a1b2c3 # Full detail for one segment + its material
capcut material ./project a1b2c3 # Full detail for one material
Progressive disclosure: info shows the shape, materials shows what's available, segment/material shows everything about one item. An AI agent navigates overview → list → detail and never gets more data than it needs.
No need to open CapCut first. Create a draft, add media, then open in CapCut.
# Create an empty draft
capcut init "My Short" --drafts ~/Movies/CapCut/User\ Data/Projects/com.lveditor.draft
# Add media
capcut add-video ./my-short ./clip.mp4 0s 10s
capcut add-audio ./my-short ./voiceover.wav 0s 10s --volume 0.9
capcut add-audio ./my-short ./music.mp3 0s 30s --volume 0.3
# Add titles
capcut add-text ./my-short 0s 5s "My Short" --font-size 24 --color "#FFD700"
init creates a valid draft_content.json from a built-in template. add-video and add-audio copy the file into the draft's assets directory so CapCut can find it. Open the project in CapCut and everything links up.
Options for add-video / add-audio: --volume <0-1>, --template <path> (custom draft template).
capcut add-text ./project 0s 5s "Title" --font-size 24 --color "#FFD700" --y -0.4
capcut add-text ./project 55s 5s "Subscribe!" --font-size 14 --align 1
Options: --font-size <n>, --color <hex>, --align <0|1|2> (left/center/right), --x <n> --y <n> (position, -1 to 1), --track-name <name>.
Every write command creates a .bak backup before modifying the file.
capcut set-text ./project a1b2c3 "New subtitle"
capcut shift ./project a1b2c3 +0.5s
capcut shift ./project a1b2c3 -200ms
capcut shift-all ./project +1s
capcut shift-all ./project -0.5s --track text
capcut speed ./project a1b2c3 1.5
capcut volume ./project a1b2c3 0.8
capcut opacity ./project a1b2c3 0.5
capcut trim ./project a1b2c3 2s 5s
Extract any element from a project as a reusable template, then stamp it into other projects. Works with text, stickers, shapes, video, audio -- anything that exists as a segment.
# Save a styled text element as a template
capcut save-template ./project a1b2c3 "gold-title" --out gold-title.json
# Apply it to another project with new timing
capcut apply-template ./other-project gold-title.json 0s 5s
# Override the text content (keeps all styling -- font, color, size)
capcut apply-template ./project gold-title.json 5:00 4s "Chapter 3: The Forge"
# Save a sticker and reuse it
capcut save-template ./project d4e5f6 "subscribe-btn" --out subscribe.json
capcut apply-template ./project subscribe.json 9:50 5s --x 0.35 --y -0.35
Templates preserve everything: styling, colors, font size, scale, resource IDs, shadow settings, shape params. Only the ID, timing, and optionally position/text get changed on apply.
Workflow: build a template library
# Create elements in CapCut, then extract them
mkdir -p ~/.capcut-templates
capcut save-template ./project abc123 "lower-third" --out ~/.capcut-templates/lower-third.json
capcut save-template ./project def456 "end-card" --out ~/.capcut-templates/end-card.json
capcut save-template ./project ghi789 "subscribe-cta" --out ~/.capcut-templates/subscribe-cta.json
# Stamp them into every new project
capcut apply-template ./new-project ~/.capcut-templates/lower-third.json 0s 5s "New Episode"
capcut apply-template ./new-project ~/.capcut-templates/end-card.json 9:55 5s
capcut apply-template ./new-project ~/.capcut-templates/subscribe-cta.json 9:50 5s
Phase 1 / 2 / 4 -- write to materials on existing segments:
capcut keyframe ./project a1b2c3 uniform_scale 0s 1.0
capcut keyframe ./project a1b2c3 uniform_scale 3s 1.2
capcut transition ./project a1b2c3 dissolve --duration 0.4s
capcut mask ./project a1b2c3 heart --size 0.6 --feather 20
capcut bg-blur ./project a1b2c3 2
capcut text-style ./project c1c1c1 --shadow --border-width 0.1 --border-color "#000000"
capcut text-anim ./project c1c1c1 --intro typewriter --outro fade-out
capcut image-anim ./project a1b2c3 --intro fade-in --outro fade-out
capcut add-sticker ./project 7089817320127663629 2s 4s --x 0.3 --y -0.3
capcut add-effect ./project vhs 0s 5s --params '[80]'
capcut text-ranges ./project c1c1c1 --styles '[
{"start":0,"end":5,"font_color":"#FFD700","bold":true},
{"start":6,"end":14,"font_color":"#FFFFFF"}
]'
See skills/capcut-edit/references/api-reference.md for every flag and value
format.
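text-ranges is described above as byte-accurate, so offsets for non-ASCII captions cannot simply be character indexes. The sketch below computes the range of a word under the assumption that offsets are UTF-8 byte positions with an exclusive end. That encoding, and the helper name byteRange, are assumptions of this example rather than documented behaviour, so check the API reference before relying on them.

```javascript
const encoder = new TextEncoder(); // available as a global in Node >= 11

// Length of a string in bytes once encoded as UTF-8.
const utf8Length = (s) => encoder.encode(s).length;

// Find the first occurrence of `word` in `text` and return its range as
// UTF-8 byte offsets (start inclusive, end exclusive), or null if absent.
function byteRange(text, word) {
  const index = text.indexOf(word); // index in UTF-16 code units
  if (index === -1) return null;
  const start = utf8Length(text.slice(0, index));
  return { start, end: start + utf8Length(word) };
}

console.log(byteRange("Hello captions", "captions")); // { start: 6, end: 14 }
console.log(byteRange("Café time", "time")); // { start: 6, end: 10 }
```

In the second call the character index of "time" is 5, but "é" takes two bytes in UTF-8, so the byte offset is 6.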
capcut enums --transitions -H # 116 CapCut transitions
capcut enums --masks # JSON
capcut enums --scene-effects --jianying # switch namespace (912 slugs)
capcut enums --text-intros | jq '.[] | select(.slug | startswith("fade"))'
Categories: --transitions, --masks, --image-intros, --image-outros,
--image-combos, --text-intros, --text-outros, --text-loop-anims,
--scene-effects, --character-effects, --audio-effects, --fonts.
add-video and add-audio accept a Wikimedia URL anywhere they accept a file
path. The CLI fetches through the Commons imageinfo API, license-checks, and
streams the file into the draft's assets dir.
# pageimages API — the official "give me the image for this page" call
capcut add-video ./project \
"https://en.wikipedia.org/w/api.php?action=query&titles=Barcelona&prop=pageimages&piprop=original&format=json" \
0s 5s
# /wiki/File: page
capcut add-audio ./project \
"https://commons.wikimedia.org/wiki/File:Wind_and_rain.ogg" \
0s 30s
# Direct CDN (still license-checks)
capcut add-video ./project \
"https://upload.wikimedia.org/wikipedia/commons/a/ab/Some_image.jpg" \
5s 5s
# Bypass refusal on restrictive/unknown license (you take responsibility)
capcut add-video ./project "https://en.wikipedia.org/wiki/File:Copyright_logo.svg" 10s 3s --force-license
Output JSON includes a wikimedia block: file_title, license,
license_class (permissive / fair-use / restrictive / unknown), artist,
credit, description_url, width, height, mime. The CC-BY family requires
attribution: use artist + description_url in your YouTube description.
Non-Wikimedia HTTPS URLs are refused before any network call. Download separately and pass a local path.
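The two gates described above (refuse non-Wikimedia URLs before any network call, and refuse restrictive or unknown licenses without --force-license) can be sketched as pure functions. This is an illustration, not the CLI's code: the host suffix list is inferred from the example URLs, the function names are hypothetical, and letting fair-use through without a flag is an assumption because this README does not say how that class is treated.

```javascript
// Hosts treated as Wikimedia here are an assumption based on the URLs above.
const WIKIMEDIA_SUFFIXES = [".wikipedia.org", ".wikimedia.org"];

// Decide how an input should be handled without touching the network.
function classifyInput(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return "local-path"; // not an absolute URL, so treat it as a file path
  }
  // Windows paths such as C:\clip.mp4 can parse as a URL with scheme "c:".
  if (url.protocol !== "https:" && url.protocol !== "http:") return "local-path";
  const isWikimedia = WIKIMEDIA_SUFFIXES.some((s) => url.hostname.endsWith(s));
  return isWikimedia ? "wikimedia" : "refused";
}

// License classes that need an explicit override (assumed: fair-use passes).
function needsForce(licenseClass) {
  return licenseClass === "restrictive" || licenseClass === "unknown";
}

console.log(classifyInput("./clip.mp4")); // local-path
console.log(classifyInput("https://commons.wikimedia.org/wiki/File:Wind_and_rain.ogg")); // wikimedia
console.log(classifyInput("https://example.com/clip.mp4")); // refused
console.log(needsForce("unknown")); // true
```

Matching on a dot-prefixed suffix keeps look-alike hosts such as evilwikimedia.org out of the allowed set.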
# From a file — one text segment per cue on a "captions" track
capcut import-srt ./project subs.srt --track-name captions --time-offset -120ms
# From stdin (Whisper output, etc.)
faster-whisper --output-format srt < audio.wav \
  | capcut import-srt ./project - --style-ref c1c1c1
--style-ref <seg-id> mirrors font/color/shadow/border/background from an
existing text segment onto every new cue.
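For readers wondering what "one text segment per cue" plus --time-offset amounts to, here is a minimal sketch of turning SRT text into cues with microsecond timing. It is not the CLI's parser: the function names are hypothetical, the output keys simply mirror the start_us / duration_us fields shown in the CLI's JSON, and clamping a negative start to 0 is an assumption.

```javascript
// Convert an SRT timestamp such as "00:01:02,500" to microseconds.
function srtTimeToUs(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`bad SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match.map(Number);
  return ((h * 60 + m) * 60 + s) * 1_000_000 + ms * 1000;
}

// Parse SRT text into cues, shifting every start by offsetUs.
function parseSrt(srt, offsetUs = 0) {
  return srt
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n\s*\n/) // cues are separated by blank lines
    .map((block) => {
      const lines = block.split("\n");
      // The timing line follows the optional numeric index line.
      const timeLine = lines.findIndex((l) => l.includes("-->"));
      if (timeLine === -1) throw new Error("cue without a timing line");
      const [from, to] = lines[timeLine].split("-->");
      const start = srtTimeToUs(from);
      const end = srtTimeToUs(to);
      return {
        start_us: Math.max(0, start + offsetUs), // assumed clamp at zero
        duration_us: end - start,
        text: lines.slice(timeLine + 1).join("\n"),
      };
    });
}

const sample = [
  "1", "00:00:00,500 --> 00:00:03,000", "Welcome to the video", "",
  "2", "00:00:03,200 --> 00:00:05,000", "Second line",
].join("\n");

// Equivalent of --time-offset -120ms: shift every cue 120,000 µs earlier.
console.log(parseSrt(sample, -120_000));
```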
Extract a time range from a project into a new file. Clips edge segments, rebases timing to zero, removes empty tracks, cleans up orphaned materials.
# 60-second teaser from a 10-minute video
capcut cut ./project 1:00 2:00 --out ./teaser.json
# 30-second highlight
capcut cut ./project 3:00 3:30 --out ./highlight.json
# Then add titles to the short
capcut add-text ./teaser.json 0s 5s "MYCENAE" --font-size 24 --color "#FFD700"
capcut add-text ./teaser.json 55s 5s "Full video in description" --font-size 14
Cutting long-form into viral Shorts is what I built this for. The full pipeline (picking the right 60-second story, writing hooks that hold attention, the Claude skill that orchestrates capcut-cli end-to-end) is the Viral Story Shorts Blueprint.
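The "clips edge segments, rebases timing to zero" part of cut can be pictured with a small sketch over timeline segments. It uses the start_us / duration_us keys from the CLI's JSON output; the function name cutWindow is hypothetical, and the sketch deliberately ignores what a real cut also has to do (adjusting each clipped segment's source offset, dropping empty tracks, removing orphaned materials).

```javascript
// Keep only the part of each segment inside [fromUs, toUs) and shift it so
// the window starts at 0.
function cutWindow(segments, fromUs, toUs) {
  const out = [];
  for (const seg of segments) {
    const segEnd = seg.start_us + seg.duration_us;
    const start = Math.max(seg.start_us, fromUs);
    const end = Math.min(segEnd, toUs);
    if (end <= start) continue; // entirely outside the window
    out.push({
      ...seg,
      start_us: start - fromUs, // rebase to zero
      duration_us: end - start, // clip at the window edges
    });
  }
  return out;
}

const S = 1_000_000; // one second in microseconds
const timeline = [
  { id: "a", start_us: 0, duration_us: 70 * S },
  { id: "b", start_us: 70 * S, duration_us: 100 * S },
  { id: "c", start_us: 200 * S, duration_us: 10 * S },
];

// Window 1:00 to 2:00: "a" keeps its last 10 s, "b" its first 50 s, "c" is dropped.
console.log(cutWindow(timeline, 60 * S, 120 * S));
```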
Multiple edits, one JSON parse, one file write:
echo '{"cmd":"set-text","id":"a1b2c3","text":"Line one"}
{"cmd":"set-text","id":"d4e5f6","text":"Line two"}
{"cmd":"shift","id":"a1b2c3","offset":"+0.3s"}
{"cmd":"volume","id":"g7h8i9","volume":0.5}' | capcut batch ./project
Output: {"ok":true,"succeeded":4,"failed":0}
Batch tolerates per-operation errors and continues processing. Operations: set-text, shift, shift-all, speed, volume, opacity, trim.
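When the operations come from a script or an LLM rather than a hand-typed echo, generating the JSONL programmatically avoids quoting mistakes. A minimal sketch, using only the operation shapes shown above; the helper name toJsonl and the file name in the usage note are hypothetical.

```javascript
// Turn an array of edit operations into JSONL: one JSON object per line.
function toJsonl(ops) {
  return ops.map((op) => JSON.stringify(op)).join("\n") + "\n";
}

const ops = [
  // JSON.stringify escapes quotes and newlines, so each op stays on one line.
  { cmd: "set-text", id: "a1b2c3", text: 'She said "go"\nthen left' },
  { cmd: "shift", id: "a1b2c3", offset: "+0.3s" },
  { cmd: "volume", id: "g7h8i9", volume: 0.5 },
];

process.stdout.write(toJsonl(ops));
```

Saved as, say, make-ops.js, this would be piped in the same way as the echo example: node make-ops.js | capcut batch ./project.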
Segment and material IDs are UUIDs. The first 6-8 characters work as prefix match:
$ capcut texts ./project | jq '.[0].id'
"a1b2c3d4-0000-0000-0000-000000000001"
$ capcut set-text ./project a1b2c3 "Hey everyone"
{"ok":true,"id":"a1b2c3d4-0000-0000-0000-000000000001","old":"Welcome","new":"Hey everyone"}
Time formats:
- 1.5s -- 1.5 seconds
- 500ms -- 500 milliseconds
- +0.5s / -1s -- relative offset
- 1:30 -- 1 minute 30 seconds
- 0:05.5 -- 5.5 seconds
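Since the draft itself stores timing in microseconds, each of the time formats above ultimately maps to a microsecond count. The following sketch shows one way that mapping could look. It is an illustration covering only the formats listed here, not the CLI's actual parser, and parseTimeToUs is a hypothetical name.

```javascript
// Convert a time string (1.5s, 500ms, +0.5s, -1s, 1:30, 0:05.5) to
// microseconds. A leading + or - yields a signed, relative value.
function parseTimeToUs(input) {
  const text = input.trim();
  const sign = text.startsWith("-") ? -1 : 1;
  const body = text.replace(/^[+-]/, "");
  let seconds;
  let m;
  if ((m = /^(\d+(?:\.\d+)?)ms$/.exec(body))) {
    seconds = Number(m[1]) / 1000; // milliseconds
  } else if ((m = /^(\d+(?:\.\d+)?)s$/.exec(body))) {
    seconds = Number(m[1]); // plain seconds
  } else if ((m = /^(\d+):(\d{1,2}(?:\.\d+)?)$/.exec(body))) {
    seconds = Number(m[1]) * 60 + Number(m[2]); // minutes:seconds
  } else {
    throw new Error(`unrecognised time: ${input}`);
  }
  // Round so that floating-point noise never leaks into the integer count.
  return sign * Math.round(seconds * 1_000_000);
}

console.log(parseTimeToUs("1.5s")); // 1500000
console.log(parseTimeToUs("500ms")); // 500000
console.log(parseTimeToUs("-1s")); // -1000000
console.log(parseTimeToUs("1:30")); // 90000000
console.log(parseTimeToUs("0:05.5")); // 5500000
```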
CapCut stores projects as JSON (draft_content.json on Windows, draft_info.json on macOS). This CLI reads and modifies that JSON directly. It preserves the original file's indentation style on save.
Typical project location:
- Windows: C:\Users\<you>\AppData\Local\CapCut\User Data\Projects\com.lveditor.draft\<id>\
- macOS: /Users/<you>/Movies/CapCut/User Data/Projects/com.lveditor.draft/<id>/
Close the project in CapCut before editing, reopen after. CapCut reads the JSON on project open.
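Preserving the original file's indentation style, as mentioned above, keeps diffs small when drafts are under version control. One simple way to do it is to read the indent off the first indented line and hand it back to JSON.stringify. This is a sketch of that idea under hypothetical names (detectIndent, saveLike), not the CLI's implementation; it does not attempt to preserve key spacing or a trailing newline.

```javascript
// Guess the indentation of a JSON document from its first indented line.
// Returns a string of spaces or tabs, or "" for minified single-line JSON.
function detectIndent(raw) {
  const match = /\n([ \t]+)\S/.exec(raw);
  return match ? match[1] : "";
}

// Re-serialize data with the same indentation the file arrived with.
function saveLike(raw, data) {
  return JSON.stringify(data, null, detectIndent(raw));
}

const pretty = '{\n    "duration": 1\n}';
const minified = '{"duration":1}';

console.log(saveLike(pretty, { duration: 2 })); // four-space indent kept
console.log(saveLike(minified, { duration: 2 })); // {"duration":2}
```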
# Get all subtitle IDs and text
capcut texts ./project | jq '.[] | "\(.id) \(.text)"'
# Fix 3 typos + sync timing in one shot
echo '{"cmd":"set-text","id":"a1b2c3","text":"Corrected line one"}
{"cmd":"set-text","id":"d4e5f6","text":"Corrected line two"}
{"cmd":"set-text","id":"g7h8i9","text":"Corrected line three"}
{"cmd":"shift-all","offset":"+0.3s","track":"text"}' | capcut batch ./project
Four changes, one file write. Done in under 5 seconds.
End-to-end recipes in examples/:
- Cut one long video into multiple shorts
- Batch-fix subtitles (typos + timing in one pass)
- Build a short from scratch — clip + VO + music + title, no GUI
- Translate subtitles via SRT round-trip
- Save a styled title once, reuse across many projects
- Programmatic Ken Burns zoom keyframes
- Unfinished-pan keyframe pattern for epilogue stills
- Pre-flight check on VO + word-level timestamps
- Want the full viral-shorts system, not just the CLI? Get the Viral Story Shorts Blueprint + Claude Skill — the complete pipeline I use to ship Shorts at volume.
- Author: I'm Rene Zander — I build AI-driven content automation systems. More guides at renezander.com.
- Hire me for AI/automation work: renezander.com/contact.
MIT