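From 323ef18b22944a654ee3985526f56f9f80f83663 Mon Sep 17 00:00:00 2001
From: Zack
Date: Thu, 28 May 2026 18:18:09 +0800
Subject: [PATCH] feat(Image_Prompt_Optimizer): refactor the image prompt optimization skill and extend the template system and conventions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Bump version to 2.2 and broadly expand the prompt optimization workflow and its level of detail
- Add detailed step-by-step guidance covering requirement analysis, scene classification, and structured output
- Add the eight core elements and a library of nine image scene templates to cover a range of image generation needs
- Standardize prompt writing: reject keyword stacking and vague wording, and require professional camera terminology
- Introduce anti-artifact constraints and automatic attachment of high-quality keywords to improve prompt quality and stability
- Explicitly forbid @图N markers; reference materials are cited in semantic natural language instead
- Add dedicated terminology and structure requirements for panoramas and cinematic scene images, and forbid mixing the two
- Use a multiple-choice interaction so that missing elements and conflicts are confirmed with the user, avoiding silent modification
- Include an extensive list of common mistakes and pitfalls to raise the standard of prompt engineering

feat(Seedance_Prompt_Optimizer): upgrade the Seedance 2.0 video prompt optimization framework

- Bump version to 2.1 and refine multimodal input support and the standard workflow
- Clarify the division of roles, tighten the prompt syntax conventions, and rely on Cinematic_Camera_Language for camera terminology
- List the supported formats and quantity limits for image, video, audio, and text inputs
- Strengthen the @ reference system for material usage: every material must be tagged with its purpose, with no orphaned uploads
- Spell out the system's hard constraints to guard against real human face materials and over-limit material counts affecting generation
- Refine the workflow steps, emphasizing requirement analysis, confirmation of mapping logic, and explicit material usage
- Focus on engineered syntax conventions and push prompt output toward storyboard-script precision
- Emphasize calling video_tools for final generation, with clearly separated responsibilities
---
 .../Cinematic_Camera_Language/SKILL.md | 0
 .../references/full_50_shots.md | 0
 .../Image_Prompt_Optimizer/SKILL.md | 502 +++++++++++++-----
 .../Seedance_Prompt_Optimizer/SKILL.md | 251 +++++++--
 .../active_skills/canvas_tools/SKILL.md | 258 ++++-----
 .../skills/active_skills/image_tools/SKILL.md | 32 +-
 .../skills/active_skills/music_tools/SKILL.md | 113 ++--
 .../skills/active_skills/video_tools/SKILL.md | 84 +--
 .../Cinematic_Camera_Language/SKILL.md | 199 +++++++
 .../references/full_50_shots.md | 205 +++++++
 .../Image_Prompt_Optimizer/SKILL.md | 463 ++++++++++++++++
 .../Seedance_Prompt_Optimizer/SKILL.md | 414 +++++++++++++++
 .../Image_Prompt_Optimizer/SKILL.md | 199 -------
 .../Seedance_Prompt_Optimizer/SKILL.md | 110 ----
 14 files changed, 2019 insertions(+), 811 deletions(-)
 rename backend/skills/{customized_skills =>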
active_skills}/Cinematic_Camera_Language/SKILL.md (100%) rename backend/skills/{customized_skills => active_skills}/Cinematic_Camera_Language/references/full_50_shots.md (100%) create mode 100644 backend/skills/builtin_skills/Cinematic_Camera_Language/SKILL.md create mode 100644 backend/skills/builtin_skills/Cinematic_Camera_Language/references/full_50_shots.md create mode 100644 backend/skills/builtin_skills/Image_Prompt_Optimizer/SKILL.md create mode 100644 backend/skills/builtin_skills/Seedance_Prompt_Optimizer/SKILL.md delete mode 100644 backend/skills/customized_skills/Image_Prompt_Optimizer/SKILL.md delete mode 100644 backend/skills/customized_skills/Seedance_Prompt_Optimizer/SKILL.md diff --git a/backend/skills/customized_skills/Cinematic_Camera_Language/SKILL.md b/backend/skills/active_skills/Cinematic_Camera_Language/SKILL.md similarity index 100% rename from backend/skills/customized_skills/Cinematic_Camera_Language/SKILL.md rename to backend/skills/active_skills/Cinematic_Camera_Language/SKILL.md diff --git a/backend/skills/customized_skills/Cinematic_Camera_Language/references/full_50_shots.md b/backend/skills/active_skills/Cinematic_Camera_Language/references/full_50_shots.md similarity index 100% rename from backend/skills/customized_skills/Cinematic_Camera_Language/references/full_50_shots.md rename to backend/skills/active_skills/Cinematic_Camera_Language/references/full_50_shots.md diff --git a/backend/skills/active_skills/Image_Prompt_Optimizer/SKILL.md b/backend/skills/active_skills/Image_Prompt_Optimizer/SKILL.md index 7d07427c..7721f397 100644 --- a/backend/skills/active_skills/Image_Prompt_Optimizer/SKILL.md +++ b/backend/skills/active_skills/Image_Prompt_Optimizer/SKILL.md @@ -1,199 +1,463 @@ --- -name: Image_Prompt_Optimizer -description: "Image prompt optimization expert for generation and editing. Use when the user asks to generate, edit, or optimize image prompts. 
Rewrites rough descriptions into high-quality engineered prompts based on photography terminology, scene narration, and professional templates." +description: 通用图像生成提示词工程化优化技能。专注于提示词本身的写作规范与质量提升,提供工作流、八大核心要素、9 类图像场景模板库(影视级宽画幅场景图/全景图/三视图/产品图/概念图/立绘/漫画/极简设计/插画)与编辑模板库。不涉及任何工具调用、API 参数、具体模型选择等执行层细节。 metadata: - builtin_skill_version: "1.0" + builtin_skill_version: '2.2' +name: Image_Prompt_Optimizer --- # Image Prompt Optimizer -**IMPORTANT**: This is a prompt optimization skill, NOT an image generation tool. After optimizing the prompt, you should call `generate_image` or `edit_image` (from `image_tools` skill) to actually create or edit images. +**IMPORTANT**: 本技能专注于**图像提示词的写作规范与工程化质量**,输出纯文本提示词文案。本技能**不包含工具调用、API 参数、具体模型能力差异、Provider 选择等执行层描述**——这些内容存放于各工具专用的 skill 中。 + +## 角色定位 + +你是图像生成提示词工程化专家。你的首要任务是拦截用户"形容词堆砌""仅一句话需求"的低质量提示词,将它们引导和重写为高质量的工程化提示词(叙事化语言、八大要素、场景模板、防崩约束)。 + +## 核心工作流 + +当用户输入粗略需求、提供参考素材,或**仅提出图像生成需求(如"帮我画一个赛博朋克街道全景")**时,按以下步骤执行: -## Core Principle +### Step 0: 需求分析与启发式提问 -**Describe the scene, not just list keywords.** Narrative, descriptive paragraphs almost always produce better, more coherent images than a string of unrelated words. +当用户仅给出高维度想法(如"我要一张场景图""画个角色")时,**主动进入引导模式**,通过提问帮助用户丰满细节,切忌直接生编硬造: -## Image Generation Prompt Templates +1. **询问图像类型**:是场景图(21:9 影视级宽画幅)/ 全景图(360 度)/ 角色立绘 / 产品图 / 概念图 / 海报 / 漫画? +2. **询问核心要素**:基于八大要素引导用户补充信息。 + *示例*:"关于这个赛博朋克街道全景,您可以补充:1. 时间是白天/黄昏/深夜?2. 视野中心是什么(一个角色/一个建筑/一辆车)?3. 镜头视角(平视/俯视/仰视)?4. 是否有参考图?" +3. **收集足够信息后转入 Step 1**。 + +### Step 1: 意图与场景判定 + +1. **生成类型判定**(提示词层,不论具体工具名): + - **全新生成**:纯文本无参考素材 + - **图像编辑**:有参考图 → 进一步判定属于哪种编辑模式(局部修改/风格迁移/角色置入新场景/多图合成/高保真细节迁移/草图细化/360 度一致性扩展) +2.
**图像场景类型判定**(决定使用哪个场景模板): + - **影视级宽画幅场景图(21:9)** ← 影视场景设计、场景初稿、全景图的前置设计稿、视频首帧 + - **全景图(360 度等距柱状投影 / 2:1)** ← VR/全景节点、沉浸式环境、360 度环绕浏览场景 + - **角色三视图**:正/侧/背三视,立绘标准 + - **产品广告图**:电商/商业摄影 + - **概念美术图**:游戏/影视前期 concept art + - **IP 立绘 / 海报**:单角色全身/半身、海报排版 + - **漫画分格 / 分镜插画**:故事化分格 + - **极简设计 / 负空间**:背景图、品牌物料 + - **风格化插画 / 表情包**:贴纸、icon + +### Step 2: 参考素材语义化梳理 + +1. **参考素材清点**:当用户提供多张参考图时,按出现顺序梳理为参考图 1、参考图 2…并向用户确认**每张图的语义角色**(角色形象 / 服装 / 产品 / 场景 / 风格基调 / 字体 / 构图参考)。 +2. **自然语言指代**:在最终提示词中,用语义化措辞引用参考图: + - 单图编辑:`"using the provided image"` / `"the reference photo"` + - 多图合成:`"the woman from the first reference image"` / `"the dress from the second reference image"` +3. **语义角色确认**:当多图未明确语义角色时(如:谁是主体、谁是元素来源),向用户提问要求明确,避免生成结果自由发挥。 +4. **写实人脸预检**:若参考图含可辨识真人面部,部分生成体系可能拦截或质量下降,需提醒用户改用风格化处理。 + +### Step 3: 要素审查与多选交互确认 + +1. 检查用户提示词是否包含**八大核心要素**: + - **主体(Subject)**:谁/什么是主体? + - **动作 / 表情(Action)**:在干什么?什么神态? + - **场景 / 环境(Setting)**:在哪?时间/天气? + - **光影色调(Lighting)**:什么光线?什么色温? + - **镜头 / 构图(Camera)**:什么视角?什么景别?什么焦段?— **必须使用专业的电影镜头/构图术语**(如 低角度仰拍、过肩拍摄、希区柯克变焦、升格摄影、三分法构图、黄金分割等) + - **视觉风格(Style)**:写实/插画/动漫/油画/3D…? + - **画质参数(Quality)**:分辨率/质感(8k、ultra-detailed、photorealistic) + - **约束条件(Constraints)**:防崩兜底(如"无穿模、五官清晰、构图稳定") + +2. **检查潜在冲突**: + - 风格冲突(如同时要"写实摄影"和"卡通风格") + - 视角冲突(如同时要"俯拍"和"低角度仰拍") + - 焦段冲突(如同时要"广角"和"长焦") + +3. **【关键:拒绝静默修改】**:发现要素缺失或冲突时,**必须**通过"多选检视意见交互"向用户展示具体建议,让用户选择: + + *多选交互模板示例:* + > 我收到了您的输入。检测到以下建议,请选择您接受的部分: + > 1. 【建议明确】场景中是黄昏还是深夜? + > 2. 【建议补充】视野中心放主角还是建筑? + > 3. 【风格冲突】当前提示词同时要求"写实摄影"和"赛博朋克霓虹",建议统一为"赛博朋克写实摄影"。 + > + > [多选框]: + > - [ ] 接受建议1,设定为:黄昏 + > - [ ] 接受建议2,设定为:主角作为视野中心 + > - [ ] 接受风格统一,设定为:赛博朋克写实摄影 + > - [ ] 其他修改(请补充) + +### Step 4: 结构化重写输出 + +按以下三大模块结构化输出: + +#### 优化后提示词 +(包含严格的**三段论**结构) +1. **全局基础设定**: + - 锁定主体、场景、风格基调 + - 多参考素材时用语义化措辞引用(如 "the character from the first reference image, the outfit from the second reference image"),**严禁使用 @图N 标记** +2. 
**主体提示词(英文叙事化)**: + - 按 *主体 → 动作 → 场景 → 光影 → 构图 → 风格 → 画质* 顺序展开 + - 每一层用完整句子叙述,不堆砌关键词 + - 镜头/构图必须使用专业电影术语 +3. **画质、风格与约束**:自动挂载画质增强(`8k, ultra-detailed, sharp focus`)与防崩兜底约束。 + +#### 优化问题 +针对原始提示词,指出存在的缺陷(要素缺失、冲突、关键词堆砌、英文表达不准确、误用 @图N 标记等)。 + +#### 画面语义补充建议(提示词层表达) +- 在提示词文案中明确画幅意图(如 `"360 degree equirectangular panorama, 2:1 aspect ratio"`、`"square 1:1 e-commerce composition"`、`"vertical 9:16 portrait framing"`、`"ultra-wide 21:9 establishing shot"`)——让生成体系从语义上理解构图。 + +**核心原则清单(内置原则库)**: +- **叙事化优先原则**:完整段落叙述 > 关键词堆砌 +- **参考图语义化原则**:使用自然语言描述参考图角色 +- **镜头语言专业化原则**:构图与镜头使用专业电影术语,拒绝口语化表述 +- **场景模板套用原则**:识别场景类型后必须套用对应模板,不要自由发挥 +- **语义负面替代原则**:用"安静空旷的街道"替代"没有人的街道"("no/without" 类指令理解差) +- **兜底强制原则**:必须挂载防崩约束与高画质词 + +## 图像场景类型模板库(9 大类) + +在 Step 4 重写时,根据 Step 1 判定的场景类型套用以下模板作为骨架,再叠加八大要素与镜头术语。 + +> 💡 **典型协作工作流**:21:9 影视场景图常作为视觉初稿先生成(建立场景的构图、光影、色调、风格);然后基于场景概念用 360 度全景模板扩展为环绕版本,用于沉浸式浏览或作为视频生成的首帧。**两者使用不同的提示词结构,不可混用**。 + +### 1. ⭐ 影视级宽画幅场景图(21:9) + +**适用**:影视场景设计、游戏地图初稿、长画幅环境叙事、**全景图的前置场景设计稿**、视频生成的首帧。 + +**核心要点**: +- **画幅语义**:21:9 超宽画幅是影视场景图的核心,提示词文案中需明确"ultra-wide 21:9 establishing shot"等措辞 +- **构图法则**: + - **三分法**:地平线/视野中心放在画面 1/3 或 2/3 处 + - **引导线**:道路、河流、建筑边缘形成视觉引导 + - **纵深层次**:前景 + 中景 + 远景三层,避免单层平铺 + - **视野中心(Focal Point)**:明确一个吸引视线的主体(角色 / 建筑 / 光源 / 异常元素) +- **可拼接性**:场景设计图常用于后续拼接为 360 度全景或视频,画面边缘避免突兀切割(如人物半截、文字断裂) +- **场景叙事节奏**:从一端到另一端的视觉故事(如左侧静谧、中段冲突、右侧远景留白) + +**核心模板**: +``` +A cinematic ultra-wide 21:9 establishing shot of [scene description]. +The composition follows the rule of thirds with [focal point] positioned at +[1/3 left | center | 2/3 right] of the frame. -### 1. Photorealistic Scenes +Foreground: [foreground element, close to camera, sharp detail]. +Mid-ground: [main subject / focal point, the visual anchor]. +Background: [distant element, atmospheric depth, soft focus]. -Use photography terminology: shooting angles, lens types, lighting, and details. +Lighting: [time of day, light direction, color temperature, e.g. 
+"golden-hour sunlight raking from camera-left, warm amber tones, +long shadows stretching across the ground"]. -``` -A photorealistic [shot type] of [subject], [action or expression], set in -[environment]. The scene is illuminated by [lighting description], creating -a [mood] atmosphere. Captured with a [camera/lens details], emphasizing -[key textures and details]. The image should be in a [aspect ratio] format. -``` +Camera: [shot type, e.g. "low-angle wide-angle lens, slight tilt to enhance +depth, anamorphic 2.39:1 framing"]. -### 2. Stylized Illustrations & Stickers +Atmosphere: [mood, weather, particles, e.g. "volumetric mist drifting between +buildings, dust motes catching the light, quiet tension"]. -Specify the style explicitly and request a white background. +Style: [photorealistic / matte painting / concept art / Studio Ghibli / +cyberpunk neon] with [rendering technique, e.g. "Unreal Engine 5 cinematic +render, hyper-detailed textures"]. +Technical: 8k ultra-detailed, sharp focus across all three depth layers, +no cropped subjects at frame edges, seamless horizontal continuity, +panoramic composition. ``` -A [style] sticker of a [subject], featuring [key characteristics] and a -[color palette]. The design should have [line style] and [shading style]. -The background must be white. + +**完整示例**(赛博朋克街道): ``` +A cinematic ultra-wide 21:9 establishing shot of a rain-soaked cyberpunk +street at midnight. The composition follows the rule of thirds with a +lone hooded figure positioned at 1/3 left of the frame, walking toward +the vanishing point in the right distance. -### 3. Text in Images +Foreground: glistening wet asphalt with neon reflections of pink and cyan +holographic billboards. Mid-ground: the hooded figure silhouetted against +a row of noodle stalls and steam-belching food carts. Background: towering +megacorp skyscrapers fading into purple haze, flying drones with red blinking +lights crossing between buildings. 
-Clearly state the text content, font style, and overall design. +Lighting: dominant cool cyan-magenta neon from signage, warm orange spill +from food stalls creating intimate pools of light, volumetric haze making +every light beam visible. -``` -Create a [image type] for [brand/concept] with the text "[text to render]" -in a [font style]. The design should be [style description], with a -[color scheme]. -``` +Camera: low-angle wide-angle lens at 24mm equivalent, slight upward tilt +to emphasize building height, anamorphic horizontal lens flares from neon. -### 4. Product & Commercial Photography +Atmosphere: heavy drizzle, steam from food vents, faint cherry blossom +petals drifting through frame, melancholic urban solitude. -For e-commerce, advertising, or branding — crisp, professional product shots. +Style: photorealistic cinematic, Blade Runner 2049 color grading, +Unreal Engine 5 hyper-realistic render with subsurface rain effects. -``` -A high-resolution, studio-lit product photograph of a [product description] -on a [background surface/description]. The lighting is a [lighting setup, -e.g., three-point softbox setup] to [lighting purpose]. The camera angle is -a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp -focus on [key detail]. [Aspect ratio]. +Technical: 8k ultra-detailed, sharp focus across foreground figure and +mid-ground stalls, soft atmospheric falloff in background, no cropped +subjects at frame edges, seamless horizontal continuity for panoramic use. ``` -### 5. Minimalist & Negative Space Design +**影视场景图禁忌**: +- ❌ 不要把主体居中(21:9 居中会浪费两侧大量空间) +- ❌ 不要堆砌过多焦点(一个主焦点 + 一个次焦点足够) +- ❌ 不要写"a wide shot"就完事,必须明确 21:9 + 三分构图 + 三层纵深 +- ❌ 不要在画面边缘放重要元素(拼接/裁切会丢失) +- ❌ 场景图(21:9)与全景图(360 度)模板**不可混用**:21:9 是影视级横构图,360 度是球面环绕 -Ideal for creating backgrounds for websites, presentations, or marketing materials. +### 2. 
⭐ 全景图(360 度等距柱状投影 / 2:1) +**适用**:全景节点、VR/AR 沉浸式场景、360 度环绕浏览(可在全景查看器中拖动环视)。 + +> ⚠️ **本套提示词是生成全景图的关键**——任何全景图需求都必须使用以下结构,关键术语不可替换、不可简化、不可删减。 + +**核心模板**: ``` -A minimalist composition featuring a single [subject] positioned in the -[bottom-right/top-left/etc.] of the frame. The background is a vast, empty -[color] canvas, creating significant negative space. Soft, subtle lighting. -[Aspect ratio]. +360 degree equirectangular panorama, seamless spherical projection, +2:1 aspect ratio, [主体描述]. The environment wraps fully 360 degrees +with consistent lighting and no visible seams. Style: photorealistic, +cinematic lighting, ultra detailed, 8K resolution ``` -### 6. Comic Panels / Storyboard +**关键术语解析**(缺一不可): +- `360 degree equirectangular panorama` ——声明全景投影类型,是生成体系识别全景的核心信号 +- `seamless spherical projection` ——强调球面无缝展开 +- `2:1 aspect ratio` ——等距柱状投影的标准比例(**与 21:9 影视宽画幅完全不同**) +- `wraps fully 360 degrees` ——强调环绕完整、首尾相接 +- `consistent lighting and no visible seams` ——防止首尾接缝处出现光影断层 +- `photorealistic, cinematic lighting, ultra detailed, 8K resolution` ——画质保障 -Create panels for visual storytelling based on character consistency and scene description. +**主体描述填法**: +- 将场景主体填入模板中部(如 `spaceship cockpit interior`、`medieval tavern at dusk`、`alien jungle with bioluminescent plants`) +- 可以包含光影、氛围、风格细节,但**不要描述固定镜头方向**(全景图无单一取景角度) +- **严禁**出现 `"left side"`、`"foreground"`、`"画面右侧"`、`"frame edge"` 等方位指代——全景图没有左右边缘 +**完整示例**(飞船驾驶舱): ``` -Make a 3 panel comic in a [style]. Put the character in a [type of scene]. +360 degree equirectangular panorama, seamless spherical projection, +2:1 aspect ratio, futuristic spaceship cockpit interior with curved +holographic displays surrounding the captain's chair, soft blue-cyan +ambient lighting from instrument panels, view of distant stars through +front viewport, clean metallic surfaces with subtle reflections. +The environment wraps fully 360 degrees with consistent lighting and +no visible seams. 
Style: photorealistic, cinematic lighting, ultra +detailed, 8K resolution ``` -## Image Editing Prompt Templates +**全景图禁忌**: +- ❌ 不要使用 `21:9`、`ultra-wide`、`cinematic establishing shot` 等横画幅术语(那是影视级横构图,不是全景图) +- ❌ 不要省略 `equirectangular` 关键字——这是生成体系识别全景投影的核心 +- ❌ 不要描述固定取景方向(如 `low-angle`、`over-the-shoulder`、`rule of thirds`) +- ❌ 主体描述中不要出现"画面左侧"、`foreground`、`frame edge` 等指代(全景没有边缘) +- ❌ 不要用 `panoramic shot`、`wide shot` 等模糊术语替代 `360 degree equirectangular panorama` -### 1. Adding & Removing Elements +### 3. 角色三视图(Character Sheet) -Provide the image and describe changes. The model will match the original style, lighting, and perspective. +**适用**:游戏角色、IP 设计、3D 建模参考。 +**模板**: ``` -Using the provided image of [subject], please [add/remove/modify] [element] -to/from the scene. Ensure the change is [description of how the change should integrate]. +A professional character reference sheet showing [character description] +in three views: front view, right side view (90° profile), and back view. +Neutral standing pose with arms slightly away from body. Pure white +background. Consistent proportions, lighting, and color palette across +all three views. Character design sheet style, soft even studio lighting, +no shadows, full body visible from head to feet. ``` -### 2. Semantic Inpainting +### 4. 产品广告图(Commercial Photography) -Define a conversational "mask" to modify specific parts while keeping the rest unchanged. +**适用**:电商主图、品牌广告、产品 PR。 +**模板**: ``` -Using the provided image, change only the [specific element] to [new -element/description]. Keep everything else in the image exactly the same, -preserving the original style, lighting, and composition. +A high-resolution, studio-lit commercial product photograph of +[product description]. Set on a [background surface, e.g. "polished black +marble" / "soft cream linen" / "floating in mid-air with depth-of-field +gradient"]. Three-point softbox lighting setup with [key light direction] +to [purpose, e.g. 
"highlight the curved bottle silhouette and create a +gentle gradient on the label"]. [Camera angle, e.g. "slight 15° tilt +above eye level"]. Ultra-realistic, sharp focus on [key detail, e.g. +"the embossed brand logo and condensation droplets"]. Color palette: +[brand colors]. Aspect ratio [1:1 / 4:5 / 3:4]. ``` -### 3. Style Transfer +### 5. 概念美术图(Concept Art) -Reproduce image content in a different artistic style. +**适用**:游戏/影视前期视觉开发、世界观设计。 +**模板**: ``` -Transform the provided photograph of [subject] into the artistic style of -[artist/art style]. Preserve the original composition but render it with -[description of stylistic elements]. +[World/scene name] concept art, [genre, e.g. "dark fantasy" / "post- +apocalyptic sci-fi" / "ethereal high fantasy"]. [Subject of focus, e.g. +"a lone knight standing before a cathedral of crystalline trees"]. +Painterly digital matte painting style, dramatic chiaroscuro lighting, +[color palette, e.g. "muted teal and burnt orange complementary scheme"], +visible brushwork. Composition: [composition technique]. Atmosphere: +[mood, environmental storytelling details]. Inspired by [reference artist +or studio, e.g. "Jakub Rozalski" / "Studio Trigger" / "Frazetta"]. +8k, intricate environmental details, story-rich background elements. ``` -### 4. Multi-Image Composition +### 6. IP 立绘 / 海报 -Combine multiple images into a new composite scene. Great for product mockups or creative collages. +**适用**:单角色全身/半身展示、宣传海报。 +**立绘模板**: ``` -Create a new image by combining the elements from the provided images. Take -the [element from image 1] and place it with/on the [element from image 2]. -The final image should be a [description of the final scene]. +Full-body character illustration of [character description] in a [pose +description, e.g. "powerful three-quarter stance, weight on back foot, +weapon held diagonally across body"]. [Style, e.g. "Genshin Impact-style +anime illustration" / "cel-shaded with bold linework"]. [Background, e.g. 
+"transparent background" / "soft gradient background, character separated +from background by subtle rim lighting"]. Detailed costume rendering with +[material details, e.g. "metal armor reflections, cloth folds, jewelry +sparkle"]. Eye-level camera, full-body framing with slight headroom. ``` -### 5. High-Fidelity Detail Preservation - -Preserve critical details (faces, logos) during editing by describing them thoroughly. - +**海报模板**(含文字): ``` -Using the provided images, place [element from image 2] onto [element from -image 1]. Ensure that the features of [element from image 1] remain -completely unchanged. The added element should [description of how the -element should integrate]. +A cinematic movie poster for "[title]". Central image: [main visual]. +Title "[exact title text]" rendered in [font style, e.g. "bold serif +weathered metallic gold"] positioned at [bottom center / top]. Tagline +"[tagline]" in smaller [font] below title. Color grading: [palette]. +Aspect ratio 2:3 (poster standard). 8k, professional theatrical poster +composition. ``` -### 6. Sketch to Image +### 7. 漫画分格 / 分镜插画 -Upload a sketch or doodle and have the model refine it into a finished image. +**适用**:分镜脚本可视化、漫画创作。 +**模板**: ``` -Turn this rough [medium] sketch of a [subject] into a [style description] -photo. Keep the [specific features] from the sketch but add [new details/materials]. +A [n]-panel comic page in [style, e.g. "Japanese seinen manga" / +"American superhero ink"]. Panel layout: [layout description, e.g. +"3 horizontal strips, 2 panels each"]. + +Panel 1 [size]: [scene description]. Camera: [shot type]. +Dialogue: "[dialogue]". + +Panel 2 [size]: [scene description]. ... + +Consistent character design across all panels, dynamic panel transitions +following [eye-flow direction, left-to-right top-to-bottom for Western, +right-to-left for manga]. Black ink linework with [shading style, e.g. +"halftone screentones" / "dramatic shadow blocks"]. ``` -### 7. 
360-Degree Character Consistency +### 8. 极简设计 / 负空间 -Iteratively prompt different angles to generate a 360-degree view of a character. Include previously generated images in follow-up prompts to maintain consistency. +**适用**:网站背景、营销物料、品牌简约视觉。 +**模板**: ``` -Generate a [character description] from a [angle] view. Maintain consistent -appearance with the provided reference image(s). For complex poses, include -a reference image of the desired pose. +A minimalist composition featuring a single [subject] positioned at the +[bottom-right / top-left / golden ratio point] of the frame. Vast empty +[color] background creating significant negative space (approximately 80% +of frame). Soft, subtle [lighting direction] casting a delicate shadow. +[Optional: a single accent element at opposite corner for visual balance]. +Clean, uncluttered, breathable composition. ``` -## Optimization Workflow +### 9. 风格化插画 / 表情包 -When the user provides a rough description, follow these steps to optimize: +**适用**:贴纸、icon、表情包、UI 装饰。 -### Step 1: Analyze User Intent -Determine whether this is "new generation" or "image editing", then select the corresponding template category. +**模板**: +``` +A [style, e.g. "kawaii cartoon" / "flat vector" / "3D clay render"] +sticker of [subject], featuring [key characteristics, e.g. "oversized +sparkly eyes, blushing cheeks, holding a tiny coffee cup"] and a +[color palette, e.g. "pastel pink and mint green"]. [Line style, e.g. +"thick bold black outlines" / "no outlines, soft gradient edges"] and +[shading style, e.g. "cel-shading with hard shadows" / "soft airbrush +gradients"]. The background must be pure white (or transparent for +sticker use). Centered composition, full subject visible. +``` -### Step 2: Element Check -Verify the user's description includes these key elements: -- **Subject**: Who or what? -- **Action / Expression**: What are they doing? -- **Environment**: Where is this set? -- **Lighting**: What mood or atmosphere? 
-- **Composition / Camera**: How is it framed? -- **Style**: What visual style? +## 图像编辑模板库 -### Step 3: Enrich & Optimize -- Fill in missing elements using natural narrative language -- Write the final prompt in English for best quality -- Be extremely specific (use "ornate elven plate armor etched with silver leaf patterns" instead of "fantasy armor") -- Provide context and intent (state what the image is for) -- Use "semantic negative prompts" (use "an empty, desolate street" instead of "no cars") -- Use photography and cinematic language to control composition (wide-angle shot, macro shot, low-angle perspective) +当用户提供参考图并需要对图像进行修改时,使用以下编辑模板。 -### Step 4: Output Optimized Result -Present to the user: -1. **Optimized Prompt** — the complete English prompt -2. **Optimization Notes** — issues found in the original description and improvements made -3. **Suggested Parameters** — recommended aspect_ratio and n values +### Pattern 1: 局部修改 / Inpainting +保留原图大部分内容,仅修改特定元素: +``` +Using the provided image, change only the [specific element] to [new +description]. Keep everything else exactly the same, preserving the +original style, lighting, composition, and all other details. +``` -## Integration with image_tools +### Pattern 2: 风格迁移 +保留构图但改变艺术风格: +``` +Transform the provided photograph of [subject] into the artistic style +of [target style / artist]. Preserve the original composition and subject +identity, but render it with [stylistic elements description]. +``` -After optimization, pass the prompt to the corresponding tool: -- **New Generation** → `generate_image(prompt=..., aspect_ratio=..., n=...)` -- **Image Editing** → `edit_image(image_url=..., prompt=...)` +### Pattern 3: 角色置入新场景 +将参考图中的角色放入新环境: +``` +The same [character description from reference] from the reference image, +now [action / pose] in [new environment]. Preserve the character's +[key features to keep, e.g. "facial features, hair color, outfit details"]. 
+[Style and technical details, lighting, camera angle]. +``` -## Best Practices +### Pattern 4: 多图合成 +组合多张图的元素: +``` +Create a new image by combining elements from the provided images. Take +the [element from image 1] and [action] with the [element from image 2]. +The final image should be [description of final scene]. Adjust lighting +and shadows to create a cohesive, naturally integrated result. +``` + +### Pattern 5: 高保真细节保留 +关键细节(人脸/logo/文字)必须像素级保留: +``` +Using the provided image(s), [edit description]. Ensure that +[critical element, e.g. "the woman's face, hair, and skin tone"] remains +completely unchanged, pixel-perfect identical to the reference. The +[modified element] should [integration description, e.g. "appear naturally +printed on the fabric, following the cloth folds and lighting"]. +``` -- **English Prompts**: Always write the final prompt in English for best quality -- **Be Specific**: The more detail you provide, the more control over the output -- **Iterate**: Leverage the conversational nature for incremental adjustments ("make the lighting warmer", "make the expression more serious") -- **Step-by-Step Instructions**: Break complex scenes into multiple steps (background first, then foreground, then details) -- **Positive Descriptions**: Describe the desired scene to exclude unwanted elements, rather than saying what should not be there +### Pattern 6: 草图细化 +将草图/线稿转为成品图: +``` +Turn this rough [sketch / line art] of [subject] into a [target style, +e.g. "photorealistic 8k photograph" / "polished anime illustration"]. +Keep the [specific features from sketch, e.g. "pose, composition, +character proportions"] but add [new details, e.g. "realistic skin +texture, fabric materials, environmental lighting"]. +``` -## Limitations +### Pattern 7: 360 度角色一致性 +迭代生成角色不同角度: +``` +Generate the same [character description] from a [angle, e.g. +"three-quarter back view"]. 
Maintain consistent appearance with the +provided reference image(s) — same outfit, same hairstyle, same proportions, +same color palette. [Pose / action description]. +``` -- Best performance languages: English, zh-CN, ja-JP, ko-KR, fr-FR, de-DE, es-MX, pt-BR, ru-RU, it-IT, ar-EG, hi-IN, id-ID, vi-VN, ua-UA -- Audio or video inputs are not supported -- The model may not generate the exact number of images explicitly requested by the user +## 强制约束 + +- **拒绝静默修改**:未与用户确认前,不要自动猜测并填充缺失要素或修改冲突。 +- **强制兜底**:最终提示词必须包含防崩约束(`sharp focus`, `no cropped subjects`, `consistent proportions`)与高画质词(`8k ultra-detailed`)。 +- **英文优先原则**:默认所有最终提示词写为英文(跨体系最稳);仅在中式国漫/仙侠等中文语境强相关场景下可保留关键中文词。 +- **严禁 @图N / @视频N 标记**:图像提示词中不使用这类引用语法,多图引用一律使用自然语言(如 `"the character from the first reference image"`)。 +- **专业镜头术语强制**:构图、视角、焦段、光线方向必须使用专业电影术语,禁止使用 `"a nice angle"`、`"good lighting"` 等模糊表达。 +- **全景图(360 度等距柱状投影)特殊约束**:提示词必须包含 *`360 degree equirectangular panorama` + `seamless spherical projection` + `2:1 aspect ratio` + `wraps fully 360 degrees` + `consistent lighting and no visible seams`* 五个关键术语,缺一不可。主体描述中严禁出现 `left/right side`、`foreground`、`frame edge` 等方位指代。 +- **影视场景图(21:9)特殊约束**:提示词必须包含 *三分构图 + 三层纵深(前/中/远景) + 视野中心明确 + 边缘无截断*,主体不要居中。 +- **语义负面替代原则**:用 `"an empty desolate street"` 替代 `"no cars on the street"`;用 `"clean uncluttered desk"` 替代 `"desk without items"`。 +- **职责边界原则**:本技能输出中不出现具体模型名、Provider 名、工具参数名、API 签名与调用顺序。这些信息存放于各工具专用 skill。 + +## 常见错误与避坑指南 + +在 Step 3 要素审查阶段依据以下清单对提示词进行体检,发现问题在多选交互中列出: + +1. **关键词堆砌**:`"fisherman, dock, sunset, oil painting, 8k"` 这种关键词列表会被误解为"无关元素并列"。必须改为完整段落叙述。 +2. **模糊表达**:`"美一点"`、`"好看的角度"`、`"那种感觉"`,必须替换为专业电影镜头/构图术语。 +3. **指令冲突**:风格冲突(写实+卡通)、视角冲突(仰+俯)、焦段冲突(广角+长焦)—— 多选交互中必须让用户选定一种。 +4. **画幅语义未在 prompt 中表述**:提示词文案中应明确画幅语义(全景图用 `"360 degree equirectangular panorama, 2:1"`;横版用 `"21:9 ultra-wide"`;方图用 `"square 1:1"`),让生成体系从语义上理解构图。 +5. **参考素材无语义角色**:上传了 N 张参考图,每一张都必须在 prompt 中用自然语言点明语义角色("角色形象""服装""背景")。 +6. **写实人脸滥用**:部分生成体系对真人脸敏感,如需写实人物建议改为"风格化角色"避免拦截。 +7. 
**全景图使用错误术语**:生成 360 度全景时误用 `21:9`、`ultra-wide`、`panoramic shot` 等横画幅术语;反之,生成影视级场景图时误用 `equirectangular` 语法。两者术语体系不可混用。 +8. **多图合成无主次声明**:多图合成时必须用 prompt 明确"哪张是主体、哪张是元素来源",否则生成结果会自由发挥。 +9. **负面提示用 "no/without"**:生成体系对否定指令理解差,改用"语义负面替代"(描述目标状态而非排除项)。 diff --git a/backend/skills/active_skills/Seedance_Prompt_Optimizer/SKILL.md b/backend/skills/active_skills/Seedance_Prompt_Optimizer/SKILL.md index 648229c0..ded6ce01 100644 --- a/backend/skills/active_skills/Seedance_Prompt_Optimizer/SKILL.md +++ b/backend/skills/active_skills/Seedance_Prompt_Optimizer/SKILL.md @@ -1,23 +1,71 @@ --- -description: Seedance2.0视频生成模型专用的提示词优化技能 +description: Seedance 2.0 多模态AI视频生成模型专用的提示词优化技能。当用户提供视频生成提示词、多媒体素材(图片/视频/音频),或明确请求优化提示词时调用。提供三段式结构、八大核心要素、多模态参考控制框架与 12 类场景模板库;镜头术语词典依赖 `Cinematic_Camera_Language` skill 协同提供。本技能专注于工程化语法规范,将粗略描述重写为分镜台本级精度的提示词。 metadata: - builtin_skill_version: '1.0' + builtin_skill_version: '2.1' name: Seedance_Prompt_Optimizer --- ---- -name: "Seedance 2.0 提示词优化专家" -description: "Seedance 2.0 提示词优化专家。当用户提供视频生成提示词、多媒体素材,或明确请求优化提示词时调用。基于三段式结构、八大核心要素和多模态参考控制框架,将粗略描述重写为高质量工程化提示词。" -metadata: - builtin_skill_version: "1.0" ---- - # Seedance 2.0 Prompt Optimizer -**IMPORTANT**: This is a prompt optimization skill, NOT a video generation tool. After optimizing the prompt, you should call `generate_video` or `edit_video` (from `video_tools` skill) to actually generate the video. 
+**IMPORTANT**: 本技能是提示词优化器,**不是**视频生成工具。优化完成后,请调用 `video_tools` 技能中的 `generate_video` 或 `edit_video` 实际生成视频。 + +## 协同 Skill 依赖 + +本技能与以下 skill 形成明确职责分工,**使用时请同时启用**: + +| Skill | 角色 | 提供内容 | +|---|---|---| +| **Seedance_Prompt_Optimizer**(本技能) | 语法规范层 | 工作流、三段式结构、@引用语法、八大要素、场景模板、防崩约束 | +| **Cinematic_Camera_Language** | 镜头术语词典 | 50 种专业镜头枚举、场景-镜头映射表、完整镜头参考手册 | +| **video_tools** | 执行生产层 | `generate_video` / `edit_video` 实际调用 Seedance API | + +**分工原则**:本技能负责“怎么造句”(提示词工程化框架),Cinematic_Camera_Language 负责“用什么词”(镜头术语权威词典)。下文内嵌的最小词汇表仅为核心语法层术语,进阶镜头请查阅 Cinematic_Camera_Language 词典。 ## 角色定位 -你是 Seedance 2.0 多模态 AI 导演和提示词优化专家。你的首要任务是拦截用户"纯文案堆砌形容词"的低质量提示词,并基于《Seedance 2.0 提示词工程化优化框架》将它们引导和重写为高质量的工程化提示词(三段式结构、八大核心要素、多模态参考控制)。 +你是 Seedance 2.0 多模态 AI 导演和提示词优化专家。你的首要任务是拦截用户"纯文案堆砌形容词"的低质量提示词,并基于《Seedance 2.0 提示词工程化优化框架》将它们引导和重写为分镜台本级精度的工程化提示词(三段式结构、八大核心要素、多模态参考控制、电影镜头语言、场景模板库)。 + +## Seedance 2.0 模型能力规格 + +### 输入支持矩阵 +| 输入类型 | 数量上限 | 支持格式 | 大小限制 | +|---|---|---|---| +| 图片 | ≤ 9 张 | jpeg、png、webp、bmp、tiff、gif | 每张 < 30 MB | +| 视频 | ≤ 3 个 | mp4、mov | 每个 < 50 MB,总时长 2–15s | +| 音频 | ≤ 3 个 | mp3、wav | 每个 < 15 MB,总时长 ≤ 15s | +| 文本 | 自然语言提示词 | — | — | +| **总文件数** | **≤ 12 个** | — | — | + +### 输出参数 +- 生成时长:4–15 秒(按需选择,建议与提示词复杂度匹配) +- 自带音效/配乐能力(在提示词中显式指导音频) +- 视频分辨率:480p(640×640)至 720p(834×1112) + +### 系统硬性约束(拦截规则) +- **不支持写实真人脸部素材**(图片和视频均不可),系统会自动拦截 → 必须在 Step 2 提前识别并劝阻用户。 +- **有参考视频时生成费用略高** → 在交付前提醒用户。 +- **总文件数超 12 个时**必须协助用户裁剪,优先保留对画面或节奏影响最大的素材。 + +### @ 引用系统全用途映射表(核心语法) +Seedance 2.0 通过 `@` 来指定每个素材的用途,**这是提示词撰写最关键的部分**。务必明确说明**每个引用的作用**: + +| 用途分类 | 标准写法示例 | +|---|---| +| 首帧约束 | `@图1 作为首帧` | +| 尾帧约束 | `@图2 作为尾帧` | +| 人物形象 | `参考 @图1 的人物形象` | +| 场景/背景 | `场景参考 @图3` | +| 运镜复刻 | `参考 @视频1 的运镜效果` | +| 动作编排 | `参考 @视频1 的动作编排` | +| 特效/转场 | `完全参考 @视频1 的特效和转场` | +| 节奏/节拍 | `视频节奏参考 @视频1` | +| 音色/语气 | `旁白音色参考 @视频1` | +| 背景音乐 | `背景BGM参考 @音频1` | +| 音效采样 | `音效参考 @视频3 的音效` | +| 服装参考 | `穿着 @图2 的服装` | +| 产品外观 | `产品细节参考 @图3` | +| 字体/文字 | `字体参考 @图2 的字体` | + +**多引用组合范例**:`@图1 的人物作为主体,参考 @视频1 
的运镜和动作编排,背景BGM参考 @音频1,场景参考 @图2`。 ## 核心工作流 当用户输入粗略的提示词、提供多模态素材(图片/视频),或**仅仅提出视频生成需求(如"帮我生成一个狗跑的视频")**时,请严格按照以下步骤执行: @@ -37,8 +85,14 @@ metadata: - 画布图像节点 → 通过 `get_canvas_node` 获取 `data.imageUrl` → 按传入顺序编号为 图片1/图片2 - 画布视频节点 → 通过 `get_canvas_node` 获取 `data.videoUrl` → 按传入顺序编号为 视频1/视频2 - 编号规则与 `generate_video` 工具的数组顺序一致:`reference_images[0]=图片1`, `reference_videos[0]=视频1` -2. **长图/九宫格确认**:询问用户上传的素材是否为长图或九宫格。拆分为单图后再使用。 -3. **映射逻辑确认**:当存在多图但未明确映射逻辑时(如:谁是左边谁是右边,谁是首帧谁是尾帧),向用户提问并要求明确。 +2. **用途明确化**:上传了 N 张图片,每一张都必须用 `@` 标注清楚用途(参考上方 *@ 引用系统全用途映射表*)。**不允许出现未被 @ 引用的孤立素材**。 +3. **长图/九宫格确认**:询问用户上传的素材是否为长图或九宫格。拆分为单图后再使用。 +4. **映射逻辑确认**:当存在多图但未明确映射逻辑时(如:谁是左边谁是右边,谁是首帧谁是尾帧),向用户提问并要求明确。 +5. **硬性约束预检**(拦截优先): + - **写实人脸检查**:若用户上传的图片/视频含可辨识的真人面部,立即提醒“Seedance 2.0 不支持写实真人脸部素材”,并提供替代方案(改用插画风格、转为荷兰画、马赛克处理等)。 + - **总文件数检查**:计算 `图片数 + 视频数 + 音频数 ≤ 12`。超过上限时与用户协商裁剪优先级。 + - **总时长检查**:参考视频总时长 ≤ 15s、参考音频总时长 ≤ 15s,超出需提醒裁剪。 + - **费用预告**:若使用了 `@视频N`,在交付提示词前提醒“含参考视频生成费用略高”。 ### Step 3: 要素审查与多选交互确认 1. 检查用户的提示词是否包含以下"八大核心要素": @@ -88,22 +142,17 @@ metadata: - `降格拍摄`:低帧率拍摄后常速播放,产生快进效果。 - 其余常规运镜:`推镜头`、`拉镜头`、`摇镜头`、`移镜头`、`升镜头`、`降镜头`、`环绕镜头`(Orbit)、`手持镜头`(Handheld)。 -**F. 进阶运镜与镜头手法(Advanced Shots)——决定叙事张力与节奏** -- `前推揭示`(Push-in Reveal):推镜头强调细节或介绍人物,剔除不重要信息,表示进入新空间;会拖慢节奏,谨慎使用。 -- `角色推进`:向角色缓慢推进,放大情绪、产生认同感;对无表情/无台词角色推进暗示"思考";角色走向镜头并同步推进可增强动感;可与角色主观视角镜头反复切换以表达"专注凝视"。 -- `反打镜头`(Shot/Reverse Shot):对话场景中 A、B 双方正反切换,分 `内反打`(越轴越肩正对人物)与 `外反打`(过肩带前景人物);外反打前推可逐渐转为内反打。 -- `跳切变焦`(Jump Zoom):省略变焦过程直接切换景别,产生强烈跳跃感;推式强调主体,拉式强调环境;反复跳切产生节奏韵律感。 +**F. 
进阶运镜与镜头手法(Advanced Shots)——详见 `Cinematic_Camera_Language` 词典** + +本区仅保留 Seedance 工程化中高频使用、且与语法规范强相关的几项进阶镜头,全集50 项进阶手法(包括子弹时间、鱼眼、身体安装镜头、分相器、镜子倒影、Tilt-Shift 等)请调用 `Cinematic_Camera_Language` skill 获取完整词典。 + +- `希区柯克变焦`(Hitchcock Zoom ≈ Cinematic #4):推拉 + 变焦反向补偿,产生背景扭曲、主体不变的眩晕效果;适用于“心理冲击”、“顿悟瞬间”。 +- `跳切变焦`(Jump Zoom ≈ Cinematic #26):省略变焦过程直接切换景别,推式强调主体、拉式强调环境;反复跳切产生节奏韵律感。 - `两极镜头`(Extreme Cut):直接从远景/全景切到特写/近景,制造冲击力、惊吓感或紧张刺激感。 -- `后跟镜头`:空间感强,带监视/跟踪意味,暗示新事件即将发生,观众更在意空间变化。 -- `侧跟镜头`:加强主体运动感,强调动作过程与空间关系,适合长焦镜头拍摄。 +- `反打镜头`(Shot/Reverse Shot):对话场景 A、B 双方正反切换,分 `内反打` 与 `外反打`;外反打前推可逐渐转为内反打。 - `定格镜头`(Freeze Frame):画面瞬间凝固,常用于人物介绍、转场或结束。 -- `主观镜头`(POV)进阶:最好用标准焦段,持续时间不宜过长,可加前景遮罩(门缝、树叶等)增强代入感。 -- `跟焦`(Rack Focus):前后景焦点转移,引导观众视线。 -- `建置镜头 / 定场镜头`(Establishing Shot)进阶:作为影片首镜头,确立影调与色彩风格,介绍环境与气氛,一般用全景或远景选最能体现环境特点的视角。 -- `镜像镜头`:角色面对镜子展现真实自我,制造额外空间与反打变化,可代表多重人格或额外身份;`镜子破裂` 具不祥寓意。 -- `传声镜头`:角色向天空呼喊时镜头向上拉;角色喊话后镜头沿喊话方向移动,形似"传声筒"。 -- `升高跟进`:角色进入宽广/线索较多的新场景时常用,先跟进后上升,兼顾角色与环境。 -- `手持摇晃镜头`(Handheld Shake):增强纪实感与临场感;战争/动作片常用以强化动感、混乱与紧张;也可暗示角色精神不稳定。 + +> 其余进阶手法(前推揭示、角色推进、后跟镜头、侧跟镜头、主观镜头进阶、跟焦、建置镜头进阶、镜像镜头、传声镜头、升高跟进、手持摇晃镜头、鱼眼镜头、Tilt-Shift、分相器、Anamorphic Flare 等) → 调用 `Cinematic_Camera_Language` 词典。 **G. 
场景叙事镜头模板(Scene Templates)——特定场景专用镜头组合** - `追逐镜头`(Chase Shot):必须交代 ①追逐者与被追者的空间关系(大景别);②双方状态细节(小景别);③画面运动感效果(如侧跟 + 广角 + 升格切换)。 @@ -194,7 +243,7 @@ metadata: - **断句防歧义原则**:`@图N` 之后必须紧跟指代词或名词。 - **Asset ID 屏蔽原则**:禁止在动作描述中直接使用 `[asset-xxx]`,必须通过 `@图N` 桥接。 - **运镜限制规范**:单个时间切片只允许 1 种运镜。 -- **镜头语言标准化原则**:**所有分镜必须使用"电影镜头语言词汇库"中的专业术语(景别 + 角度 + 运镜 三层元数据),严禁出现"拍一下""看一眼""拍得好看"等模糊口语化表述**。该原则的目标是让提示词达到分镜台本级别的精度,使 Seedance 2.0 能够准确解码导演的拍摄意图(例如:用 `低角度仰拍 + 广角镜头` 而非"从下往上拍";用 `升格摄影` 而非"慢镜头";用 `过肩拍摄` 而非"在他后面拍对面的人")。 +- **镜头语言标准化原则**:**所有分镜必须使用专业术语(景别 + 角度 + 运镜 三层元数据),严禁出现"拍一下""看一眼""拍得好看"等模糊口语化表述**。**镜头术语以 `Cinematic_Camera_Language` skill 为权威词典**——本技能内嵌仅为最小语法层词表,进阶术语请查阅 Cinematic 词典。该原则的目标是让提示词达到分镜台本级别的精度,使 Seedance 2.0 能够准确解码导演的拍摄意图(例如:用 `低角度仰拍 + 广角镜头` 而非"从下往上拍";用 `升格摄影` 而非"慢镜头";用 `过肩拍摄` 而非"在他后面拍对面的人")。 - **轴线原则(180° Rule)**:多角色/双方互动场景必须保持轴线一致性,若越轴须显式标注"越轴镜头"或插入中性过渡镜头,禁止默认越轴造成空间错乱。 - **蒙太奇结构显式声明原则**:当分镜涉及多线索、跳跃时空或剪辑强调时,必须在分镜脚本头部显式标注蒙太奇类型(如"采用交叉蒙太奇"、"采用平行蒙太奇"),并在对应时间切片中清晰拆分 A、B 线索。 - **场景模板套用原则**:`追逐镜头 / 打斗镜头` 等场景必须按模板要素完整交代(空间关系 + 状态细节 + 运动感效果 / 反应镜头 + 借位长焦 + 强化结果 等组合),禁止只写"他们在打架/追逐"。 @@ -204,10 +253,10 @@ metadata: ## 与 generate_video 工具的协同 -优化完成后,需要将提示词和素材传递给 `generate_video` 工具: +优化完成后,**推荐协同链路**:Seedance_Prompt_Optimizer → Cinematic_Camera_Language(查询镜头术语)→ video_tools.generate_video(执行生成)。需要将提示词和素材传递给 `generate_video` 工具: 1. **提示词** → `prompt` 参数 - - **镜头术语必须原样保留**:优化后提示词中的"电影镜头语言词汇库"术语(如 `低角度仰拍`、`升格摄影`、`过肩拍摄`、`定场镜头` 等)必须完整、逐字地写入 `prompt`,禁止在传递前被替换成口语化描述。 + - **镜头术语必须原样保留**:优化后提示词中的专业镜头术语(来自本技能内嵌词表或 `Cinematic_Camera_Language` 词典,如 `低角度仰拍`、`升格摄影`、`过肩拍摄`、`子弹时间`、`希区柯克变焦` 等)必须完整、逐字地写入 `prompt`,禁止在传递前被替换成口语化描述。 - **分镜格式保留**:时间片分镜脚本应以 `[时间段] | [景别] + [角度] + [运镜] | [动作描述]` 的结构写入 `prompt`,让模型逐切片解码。 2. **参考图片** → `reference_images` 数组(顺序与 图片1/图片2 编号一致) 3. 
**参考视频** → `reference_videos` 数组(顺序与 视频1/视频2 编号一致) @@ -222,4 +271,144 @@ metadata: - **复杂场景处理**:针对复杂的多人正面动态视频,**必须使用强方位约束**(如"左侧角色穿灰蓝色作训服"),并辅以固定机位控制,以避免穿模或跳脸。 - **Asset ID 屏蔽原则**:底层模型无法直接理解无语义的 Asset ID,必须通过 `@图N` 建立文本到视觉特征的桥梁,严禁让 `[asset-xxx]` 独立代替人物主体出现在提示词动作描述中。 - **断句防歧义原则**:所有的 `@图N` 引用后,必须紧跟指代词或名词(如"的男子"、"(李武)"),严禁直接连接动词或方位词,以防止大模型出现分词歧义导致的数量生成错误。 -- **镜头语言标准化强制约束**:最终交付给 `generate_video` 的 `prompt` 中,**每个时间切片都必须完整包含"景别 + 角度 + 运镜"三层专业术语**。若原始需求缺失任一层,必须通过 Step 3 多选交互补齐,禁止使用"拍摄""镜头对准""拍到"等模糊动词替代标准术语。 \ No newline at end of file +- **镜头语言标准化强制约束**:最终交付给 `generate_video` 的 `prompt` 中,**每个时间切片都必须完整包含"景别 + 角度 + 运镜"三层专业术语**。若原始需求缺失任一层,必须通过 Step 3 多选交互补齐,禁止使用"拍摄""镜头对准""拍到"等模糊动词替代标准术语。 + +## 场景模板库(12 类高频场景) + +在 Step 4 重写提示词时,根据 Step 1 判定的场景类型套用以下模板作为骨架,再叠加三段式结构与镜头语言。 + +### 1. 人物一致性场景 +通过锚定参考图片保持角色统一: +``` +男人 @图1 下班后疲惫的走在走廊,脚步变缓,最后停在家门口, +脸部特写镜头,男人深呼吸,调整情绪,收起了负面情绪,变得轻松, +然后特写翻找出钥匙,插入门锁,进入家里后,他的小女儿和一只宠物狗 +欢快的跑过来迎接拥抱,室内非常的温馨,全程自然对话 +``` +**核心禁忌**:不要多处描述人物外貌,让 `@图1` 独立接管人物形象。 + +### 2. 运镜精准复刻场景 +``` +参考 @图1 的男人形象,他在 @图2 的电梯中,完全参考 @视频1 +的所有运镜效果还有主角的面部表情,主角在惊恐时希区柯克变焦, +然后几个环绕镜头展示电梯内视角,电梯门打开,跟随镜头走出电梯, +电梯外场景参考 @图3,男人环顾四周,参考 @视频1 用机械臂多角度跟随人物的视线 +``` + +### 3. 创意模板 / 特效复刻场景 +``` +将 @视频1 的人物换成 @图1,@图1 为首帧,人物带上虚拟科幻眼镜, +参考 @视频1 的运镜及近的环绕镜头,从第三人称变为主观视角,在AI虚拟眼镜中穿梭, +来到 @图2 的深邈蓝色宇宙,出现几架飞船穿梭向远方,镜头跟随飞船穿梭到 @图3 的像素世界 +``` + +### 4. 视频延长场景【特殊说明】 +``` +将 @视频1 延长15秒。 +1-5秒:光影透过百叶窗在木桌、杯身上缓缓滑过,树枝伴随轻微呼吸般的晃动。 +6-10秒:一粒咖啡豆从画面上方轻轻飘落,镜头向咖啡豆推进至画面黑屏。 +11-15秒:英文渐显"Lucky Coffee"、"Breakfast"、"AM 7:00-10:00" +``` +**关键规则**:延长视频时,`generate_video` 的 `duration` 应选 **“新增部分”的时长**(如延长5秒,生成长度也选 5秒)。 + +### 5. 视频编辑(修改已有视频) +保留原视频大部分内容,定向修改特定元素: +``` +颠覆 @视频1 里的剧情,男人眼神从温柔瞬间转为冰冷狠厉, +在露丝毫无防备的瞬间,猛地将女主从桥上往外推。动作干脆利落,带着 +蓄谋已久的决绝,没有丝毫犹豫 +``` +**角色替换**:`@视频1 中的女主唱换成 @图1 的男主唱,动作完全模仿原视频,不要出现切镜`。 +**元素添加**:`将 @视频1 女人发型变成红色长发,@图1 中的大白鲨缓缓浮出半个脑袋,在她身后`。 + +### 6. 音乐卡点场景 +画面与音频节奏精确同步: +``` +@图1 @图2 @图3 @图4 @图5 @图6 @图7 的图片根据 @视频 中的画面关键帧位置 +和整体节奏进行卡点,画面中的人物更有动感,整体画面风格更梦幻,画面张力强 +``` + +### 7. 
对话与声音演绎场景 +``` +在“猫狗吐槽间”的一段吐槽对话,要求情感丰沛,符合脱口秀表演: +喵酱(猫主持,舔毛翻眼):“家人们谁懂啊,我身边这位,每天除了摇尾巴、拆沙发…” +旺仔(狗主持,歪头晃尾巴):“你还好意思说我?你每天睡18个小时…” +``` +**推荐搭配**:`旁白音色参考 @视频1`,使用 *声画合一 / 声画分离 / 声画对立* 明确声画关系。 + +### 8. 一镜到底场景 +``` +谍战片风格,@图1 作为首帧画面,镜头正面跟拍穿着红风衣的女特工向前走, +镜头全景跟随,不断有路人遮挡红衣女子,走到一个拐角处,参考 @图2 的拐角建筑, +固定镜头红衣女子离开画面,走在拐角处消失,一个戴面具的女孩在拐角处躲着恶狠狠的盯着她, +面具女孩形象参考 @图3。全程不要切镜头,一镜到底 +``` + +### 9. 电商 / 产品展示场景 +``` +将参考图进行一个拆解,镜头保持静止,汉堡悬浮在空中开始旋转,食材轻柔而精准地分离, +保持形状和比例,动作流畅,汉堡向两边分开,包括顶部金黄色带芝麻面包盖、鲜翠绿生菜叶、 +带有水珠的鲜红番茄切片、两层厚实多汁且夹着融化金黄切达芝士的烤牛肉饼,以及最底部的松软面包底座 +``` + +### 10. 科普 / 教育场景 +``` +15秒健康科普短片。 +0–5秒:透明蓝色人体上半身,镜头从胸腔缓慢推进到一条清晰的动脉,血液流动顺畅、颜色干净偏蓝。 +5–10秒:象征性的奶茶糖分与脂肪颗粒进入血液,镜头跟随血流前进,血液逐渐变稠,血管内壁开始附着浅黄色脂质。 +10–15秒:血管内腔明显变窄,流速下降,对比画面形成"之前vs现在"的状态差异。 +``` + +### 11. AI 短剧 / 漫改场景 +``` +将 @图1 以从左到右从上到下的顺序进行漫画演绎,保持人物说的台词与图片上的一致, +分镜切换以及重点的情节演绎加入特殊音效,整体风格诙谐幽默;演绎方式参考 @视频1 +``` + +### 12. 视频融合 / 续写场景 +``` +视频1中由粒子组成的马逐渐具象化,粒子变密,逐渐过渡到视频2, +视频2中的马在奔跑过程中逐渐变为视频3,并逐渐消散,画面唯美, +背景音是马蹄声和科技感粒子音效 +``` + +## 风格与质感修饰词库 + +在提示词末尾添加以提升输出质量,Step 4 的"画质、风格与约束"模块优先从本库选取: + +### 画面风格 +- `电影级质感,胶片颗粒,浅景深` / `2.35:1 宽银幕,24fps` +- `黑白水墨风格` / `动漫风格` / `超写实风格` +- `高饱和霓虹色调,冷暖对比` / `赛博朋克霓虹灯色温` +- `超逼真4K医学CGI,半透明可视化` / `超精细CG动画技术` + +### 氛围 / 情绪 +- `紧张悬疑` / `温暖治愈` / `史诗恢宏` +- `喜剧风格,表情夸张` / `纪录片风格,旁白克制` +- `暗黑奇幻` / `仙侠高燃` / `圣诞节梦幻色调` + +### 音频指导 +- `背景音乐:恢宏大气` / `背景BGM参考 @音频1` +- `音效:走路声、人群声、汽车声、脚步声、呼吸声` +- `转场画面与音乐节奏卡点` / `脚步声、衣料摩擦声与节拍贴合` +- `旁白音色参考 @视频1,声画分离` + +### 防崩坏兜底约束词(必须挂载) +- `人物面部稳定不变形、五官清晰、无穿模` +- `4K 高清,细节丰富,构图稳定` +- `摆造型与服装与参考图一致,不出现多余肢体` + +## 常见错误与避坑指南 + +在 Step 3 要素审查阶段依据以下清单对提示词进行体检,发现问题在多选交互中列出: + +1. **引用模糊**:禁止只写“参考 @视频1”,必须说清参考什么(运镜?动作?特效?节奏?)。 +2. **指令冲突**:不要在同一时间切片中同时要求“固定镜头”和“环绕镜头”,或同时“推镜头”与“平移”。 +3. **内容过载**:不要在 4-5 秒内塞入太多场景,要符合物理可行性;建议按 0-3s / 3-7s / 7-10s 裁切。 +4. **素材无归属**:上传了 5 张图片,每一张都必须用 `@` 标注清楚用途。 +5. **忽视音频**:音效设计能大幅提升输出质量,一定要写音频指导(BGM/音效/旁白音色/声画关系)。 +6. **时长不匹配**:提示词的复杂度要与选定的生成时长匹配;8秒以上必须分时段描述。 +7. **写实人脸**:不要上传包含真人清晰可辨识面部的素材,必被系统拦截。 +8. **延长时长错配**:延长视频时,`duration` 应选“新增部分”的时长而非原视频总时长。 +9. 
**Asset ID 裸奔**:动作描述中呈现 `[asset-xxx]`,该 ID 必须被 `@图 N` 桥接。 +10. **@ 引用后接动词**:`@图1跑向` 会被分词器误解为“@图1跑”,必须加括号报幕:`@图1(李武)跑向`。 \ No newline at end of file diff --git a/backend/skills/active_skills/canvas_tools/SKILL.md b/backend/skills/active_skills/canvas_tools/SKILL.md index 44497fea..6596d518 100644 --- a/backend/skills/active_skills/canvas_tools/SKILL.md +++ b/backend/skills/active_skills/canvas_tools/SKILL.md @@ -1,206 +1,140 @@ --- name: canvas_tools -description: "Canvas node and edge CRUD operations. Provides tools to manage theater canvas nodes and connections between them." +description: "Canvas node CRUD operations. Provides tools to list, get, create, update, and delete theater canvas nodes." metadata: - builtin_skill_version: "1.4" + builtin_skill_version: "1.0" --- # Canvas Tools -Use this skill when the user asks to view, create, update, or delete content on the theater canvas (nodes and edges). +Use this skill when the user asks to view, create, update, or delete content on the theater canvas (nodes). -Loading this skill activates 8 tools: `list_canvas_nodes`, `get_canvas_node`, `create_canvas_node`, `update_canvas_node`, `delete_canvas_node`, `list_canvas_edges`, `create_canvas_edge`, `delete_canvas_edge`. +Loading this skill activates the following tools: +- `list_canvas_nodes` — List all nodes on the canvas +- `get_canvas_node` — Get full details of a specific node +- `create_canvas_node` — Create a new node +- `update_canvas_node` — Update an existing node +- `delete_canvas_node` — Delete a node ---- +**Note:** Loading this skill grants access to all node types (text, image, video, storyboard). A theater (canvas) must be active in the current conversation for these tools to work. -## Node Types & Data Fields +## Node Types -### text — 文本节点 -For scripts, copy, and written content. Supports Markdown. 
+The canvas supports these node types (available types depend on agent configuration): -| Field | Type | Notes | -|-------|------|-------| -| `title` | string | Required | -| `content` | string | Markdown body (`#`, `**bold**`, code blocks, lists). Required on create; **omit on update if unchanged**. | -| `tags` | string[] | Optional, for categorization | +### text +Text nodes for scripts, copy, ads, and other written content. Supports rich text (Markdown). -### image — 图像节点 -For character designs, scenes, posters. +Fields: +- `title` (string, required) — Node title +- `content` (string, Markdown) — Body text. Supports headings (#/##/###), paragraphs, bold (**text**), italic (*text*), code blocks, etc. Required when creating; omit when updating if unchanged. +- `tags` (array, optional) — Tags for categorization -| Field | Type | Notes | -|-------|------|-------| -| `name` | string | Required | -| `description` | string | Scene/character description | -| `imageUrl` | string | `/api/media/xxx.jpg` — JPEG/PNG/WebP | -| `fitMode` | string | `"cover"` or `"contain"` | +### image +Image nodes for character designs, scenes, posters, and visual content. -### video — 视频节点 +Fields: +- `name` (string) — Image name +- `description` (string) — Image description (scene, character info, etc.) +- `imageUrl` (string) — Image URL path (e.g. `/media/xxx.jpg`), supports JPEG/PNG/JPG +- `fitMode` (string) — "cover" (fill) or "contain" (fit) -| Field | Type | Notes | -|-------|------|-------| -| `name` | string | Required | -| `description` | string | Scene/duration description | -| `videoUrl` | string | `/api/media/xxx.mp4` — MP4 | -| `fitMode` | string | `"cover"` or `"contain"` | +### video +Video nodes for animations, short films, and motion content. -### audio — 音频节点 +Fields: +- `name` (string) — Video name +- `description` (string) — Video description (scene, duration, etc.) +- `videoUrl` (string) — Video URL path (e.g. 
`/media/xxx.mp4`), supports MP4 +- `fitMode` (string) — "cover" (fill) or "contain" (fit) -| Field | Type | Notes | -|-------|------|-------| -| `name` | string | Required | -| `description` | string | Style/purpose description | -| `audioUrl` | string | `/api/media/xxx.mp3` — MP3/WAV/OGG | -| `lyrics` | string | Optional lyrics text | +### storyboard +Storyboard nodes for shot breakdowns and multi-dimensional table content. -### storyboard — 分镜/多维表格节点 -For shot breakdowns and table content. **Supports embedding media in cells.** +Fields: +- `shotNumber` (string) — Shot number (e.g. "1-1", "2-3") +- `description` (string) — Shot description +- `duration` (integer) — Duration in seconds +- `pivotConfig` (JSON) — Multi-dimensional table config with custom field types -| Field | Type | Notes | -|-------|------|-------| -| `shotNumber` | string | e.g. `"1-1"` | -| `description` | string | Shot description | -| `duration` | integer | Seconds | -| `tableColumns` | array | `[{key, label, type}]` — type: `"text"`, `"number"`, `"image"`, `"video"`, `"audio"` | -| `tableData` | array | `[{key: value, ...}]` — media cells use `/api/media/xxx.ext` paths | +## Tool: list_canvas_nodes ---- +List all nodes on the canvas, optionally filtered by type. + +Parameters: +- `node_type` (string, optional) — Filter by node type (e.g. "text", "image", "video", "storyboard") + +Returns a list of node summaries (id, type, position, key fields). -## Positioning & Layout +## Tool: get_canvas_node -Nodes live on an infinite 2D canvas. Each node has `position_x` (horizontal) and `position_y` (vertical), where **right = +X**, **down = +Y**. +Get full details of a specific node. -### Auto-placement -When creating nodes **without** specifying position, the system places them automatically to the right of existing nodes, wrapping to new rows when space runs out. **Use auto-placement by default** unless the user requests a specific layout. 
+Parameters: +- `node_id` (string, required) — ID of the node to retrieve -### Manual positioning -Both `create_canvas_node` and `update_canvas_node` accept optional `position_x` and `position_y` parameters (top-level, not inside `data`). +Returns complete node data including all fields, position, and metadata. -**Typical node sizes for spacing reference:** -- Standard node: ~420×300 px -- Horizontal gap: ~40–80 px -- Vertical gap: ~60–100 px +## Tool: create_canvas_node -### Layout patterns -When the user asks to "arrange" or "rearrange" nodes: -1. Call `list_canvas_nodes` to get current positions and node list -2. Calculate new positions based on desired layout (grid, tree, flow, etc.) -3. Call `update_canvas_node` for each node with new `position_x` and `position_y` +Create a new node on the canvas. -**Common layouts:** -- **Horizontal flow:** nodes in a row, X increments by ~500, same Y -- **Grid:** rows of 3–4 nodes, X increments by ~500, Y increments by ~400 per row -- **Tree/hierarchy:** parent centered on top, children spread below +Parameters: +- `node_type` (string, required) — Type of node to create +- `data` (object, required) — Node data matching the type's field schema +- `position_x` (number, optional) — X position. Auto-calculated if omitted. +- `position_y` (number, optional) — Y position. Auto-calculated if omitted. 
-Example — move a node to a new position: +Example — create a text node: ``` -update_canvas_node(node_id="uuid", position_x=800, position_y=300) +create_canvas_node( + node_type="text", + data={ + "title": "Chapter 1 Outline", + "content": "# Chapter 1\n\nThe story begins...\n\n## Scene 1\n\nThe protagonist appears.", + "tags": ["outline", "chapter1"] + } +) ``` -Example — update both data and position: +Example — create an image node: ``` -update_canvas_node(node_id="uuid", data={"title": "New Title"}, position_x=100, position_y=200) +create_canvas_node( + node_type="image", + data={ + "name": "Hero Portrait", + "description": "Main character, age 18, cheerful personality", + "imageUrl": "/media/generated-image.jpg", + "fitMode": "cover" + } +) ``` ---- - -## Edge Conventions - -Edges connect nodes left-to-right. Always use the standard direction: -- `source_handle`: `"right-source"` (default) -- `target_handle`: `"left-target"` (default) - -Only deviate if the user explicitly requests a different flow direction. +## Tool: update_canvas_node -### Edge Compatibility Matrix (Source → Target) +Update an existing node's data. -This matrix is the **single source of truth** for edge legality. -It is mirrored on the frontend at `frontend/src/lib/canvas/edgeRules.md`. -Both `create_canvas_edge` and the frontend `onConnect` handler MUST validate against it. 
+Parameters: +- `node_id` (string, required) — ID of the node to update +- `data` (object, required) — Fields to update (partial update supported) -| Source \\ Target | text | image | video | audio | storyboard | -|---|---|---|---|---|---| -| **text** | allow (append/continue) | allow (fill prompt) | allow (fill prompt) | allow (fill lyrics/TTS) | allow (append row / column text) | -| **image** | deferred (OCR/caption) | allow (image-to-image ref) | allow (first-frame / ref) | allow (reference image) | allow (fill media column) | -| **video** | deferred (subtitle) | allow (frame extract) | allow (style/continuation) | deferred (audio extract) | allow (fill media column) | -| **audio** | deferred (ASR) | forbid | allow (voiceover input) | deferred (mix) | allow (fill media column) | -| **storyboard** | allow (flatten rows) | allow (batch generate) | allow (batch generate) | allow (batch generate) | allow (append/merge rows) | - -Legend: -- **allow** — create the edge. -- **forbid** — reject the edge and return an error with reason `"forbidden_type_combination"`. -- **deferred** — phase-1 not supported; reject with reason `"not_supported_yet"` (UI tooltip: "coming soon"). - -### Hard Constraints - -Always reject when any of these hold: -1. Self-loop: `source_node_id == target_node_id`. -2. Duplicate edge: same `(source_node_id, source_handle, target_node_id, target_handle)` already exists. -3. Same-polarity handles: both endpoints are `*-source` or both are `*-target`. -4. Matrix entry is `forbid` or `deferred`. - -### Content Injection Semantics (for reference) - -`create_canvas_edge` itself does NOT perform content injection — that is a frontend UX concern. -However, when planning a workflow, keep the semantics in mind: -- text → image/video: upstream text becomes the downstream generation prompt. -- image → image/video: upstream media URL is appended as a reference image. -- any media → storyboard: URL is written into the matching media column. 
-- storyboard → image/video/audio: each row triggers one generation task (downstream app logic). - ---- - -## Return Values - -| Tool | Returns | -|------|---------| -| `list_canvas_nodes` | `{count, nodes: [{id, node_type, position: {x, y}, ...key_fields}]}` | -| `get_canvas_node` | Full node object with all fields, position, dimensions | -| `create_canvas_node` | `{success: true, node: {full node object}}` | -| `update_canvas_node` | `{success: true, node: {full node object}}` | -| `delete_canvas_node` | `{success: true, deleted_node_id}` | -| `list_canvas_edges` | `{count, edges: [{id, source_node_id, target_node_id, ...}]}` | -| `create_canvas_edge` | `{success: true, edge: {edge object}}` | -| `delete_canvas_edge` | `{success: true, deleted_edge: {source, target}}` | - ---- - -## Workflow Patterns - -### Creating a set of connected nodes -``` -1. create_canvas_node(type="text", data={...}) → get node_id_A -2. create_canvas_node(type="image", data={...}) → get node_id_B -3. create_canvas_edge(source=node_id_A, target=node_id_B) +Example: ``` - -### Rebuilding canvas (delete all, recreate) -``` -1. list_canvas_nodes() → get all node IDs -2. delete_canvas_node(node_id=...) × N → edges auto-deleted -3. create_canvas_node(...) × N → new nodes -4. create_canvas_edge(...) × M → new connections +update_canvas_node( + node_id="node-uuid-here", + data={"title": "Updated Title", "tags": ["revised"]} +) ``` -### Rearranging existing nodes -``` -1. list_canvas_nodes() → get IDs and current positions -2. update_canvas_node(node_id=..., position_x=, position_y=) × N -``` +## Tool: delete_canvas_node -### Referencing media across nodes -To embed an existing image/video/audio node's media in a storyboard table: -``` -1. get_canvas_node(node_id="image-node-uuid") → extract imageUrl -2. Use that URL as the cell value in storyboard tableData -``` +Delete a node from the canvas. 
---- +Parameters: +- `node_id` (string, required) — ID of the node to delete -## Best Practices +## Tips -1. **List before mutate** — always call `list_canvas_nodes` first to understand current state. -2. **Auto-place by default** — omit position unless the user specifies coordinates or requests a layout. -3. **Minimal updates** — only include changed fields in `update_canvas_node`. Never re-send `content` on text nodes unless it changed. -4. **Check edges before connecting** — use `list_canvas_edges` to avoid duplicate connections. -5. **Position is top-level** — pass `position_x`/`position_y` as top-level parameters, not inside `data`. -6. **Batch awareness** — when creating multiple nodes, the system auto-places them in a grid. For custom layouts, create first, then rearrange with update calls. -7. **Node types are restricted** — you can only access node types allowed by the agent configuration. +- Always use `list_canvas_nodes` first to see what exists before creating or modifying. +- When creating nodes, omit position to let the system auto-place them. +- Only include fields you want to change in `update_canvas_node`. +- Node types are restricted by agent configuration — you can only create/access allowed types. diff --git a/backend/skills/active_skills/image_tools/SKILL.md b/backend/skills/active_skills/image_tools/SKILL.md index 97ff576c..c6cc32f0 100644 --- a/backend/skills/active_skills/image_tools/SKILL.md +++ b/backend/skills/active_skills/image_tools/SKILL.md @@ -2,7 +2,7 @@ name: image_tools description: "AI image generation and editing. Provides generate_image and edit_image tools for creating and modifying images." metadata: - builtin_skill_version: "1.2" + builtin_skill_version: "1.1" --- # Image Tools @@ -51,7 +51,7 @@ Edit or generate an image **using one or more reference images as the visual bas - **Character consistency**: User wants to maintain a character's appearance across different scenes or poses. 
- **Style transfer**: Transform an image into a different art style while preserving the content. - **Inpainting / Partial edit**: Modify a specific region of an image while keeping everything else unchanged (e.g. "change the sofa color", "remove the person in the background"). -- **Multi-image composition**: Combine elements from multiple reference images (up to 10) into a new scene (e.g. "put the dress from image 1 on the person in image 2"). +- **Multi-image composition**: Combine elements from multiple reference images into a new scene (e.g. "put the dress from image 1 on the person in image 2"). - **High-fidelity preservation**: Preserve critical details (face, logo, text) while making other changes. **Key decision rule**: Whenever a reference image exists (from canvas, chat history, or user upload) and the user wants the output to visually relate to it, use `edit_image`. Only use `generate_image` when creating from pure text with no visual reference. @@ -61,7 +61,7 @@ Edit or generate an image **using one or more reference images as the visual bas | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `image_url` | string | No* | Single reference image URL/path (e.g. `/api/media/filename.jpg`). Do NOT pass base64. | -| `image_urls` | string[] | No* | Multiple reference image URLs/paths (up to 10). Use for multi-image composition. | +| `image_urls` | string[] | No* | Multiple reference image URLs/paths (up to 5). Use for multi-image composition. | | `prompt` | string | Yes | Description of the desired output. See [Edit Prompt Patterns](#edit-prompt-patterns) below for templates. | | `aspect_ratio` | string | No | Output aspect ratio. Single-image edit follows input; multi-image defaults to first image (can be overridden). | | `quality` | string | No | Output quality ("standard" or "hd"). Default uses global config. | @@ -126,30 +126,6 @@ The tool returns the edited/generated image URL in markdown format. 
--- -## Reference Image Categories (Gemini 3) - -Gemini 3 image models support mixing up to 10 reference images in a single `edit_image` call. Reference images fall into two semantic categories, and the model handles each differently: - -| Category | Purpose | Best For | Examples | -|----------|---------|----------|----------| -| **Object reference (high-fidelity)** | Preserve the exact appearance of a specific object | Products, logos, clothing, props, text, branding | "use the bag from image 1", "put this logo on the shirt", "place the product in the scene" | -| **Character reference (consistency)** | Keep a character's identity consistent across scenes/poses | People, creatures, stylized characters — faces, hair, outfits | "the same woman from image 2 now in a forest", "these 3 people making funny faces" | - -**Recommended mix in a single call** (aligned with Gemini 3's internal routing): - -- Up to ~6 object-reference images for high-fidelity embedding (logos, products, outfits to preserve pixel-accurately) -- Up to ~5 character-reference images for identity consistency (faces and features that must stay recognizable) -- Combined total must stay within the `image_urls` cap of 10 - -### Referencing Tips - -- **Be explicit about roles**: in the prompt, refer to each image by order, e.g. *"Take the dress from image 1, the handbag from image 2, and put them on the woman from image 3"*. -- **Declare intent per image**: object-reference → use verbs like "use", "include", "place"; character-reference → use "the same person/character from image X". -- **Group photos of multiple characters**: pass each character's reference separately (e.g. person1.png … person5.png) and describe the scene — e.g. *"An office group photo of these 5 people making funny faces"*. -- **Aspect ratio**: defaults to the first image; pass `aspect_ratio` explicitly to override. 
- ---- - ## Prompt Writing Guide **Core principle: Describe the scene narratively, don't just list keywords.** @@ -269,7 +245,7 @@ Generate a character reference sheet from a single image. - **Always write prompts in English** for best quality, even when the user speaks another language. - For `edit_image`, use the file path from canvas nodes or previous generations — never paste base64 data. - When multiple images are needed from the same prompt, set `n` parameter on `generate_image` instead of calling multiple times. -- **Multi-image editing**: When the user references multiple images (e.g. "refer to the two character images I sent earlier"), use `image_urls` array. Up to 10 images supported. +- **Multi-image editing**: When the user references multiple images (e.g. "refer to the two character images I sent earlier"), use `image_urls` array. Up to 5 images supported. - **Single vs multi aspect ratio**: Single-image edit always follows the input image's aspect ratio (cannot be overridden). Multi-image edit defaults to the first image but can be overridden with `aspect_ratio`. - **Iterative refinement**: Use each edit output as the input for the next edit to progressively refine — describe only the incremental changes needed. - **Accurate text in images**: To render text in generated images, put the desired text in quotes and describe its placement clearly (e.g. 'the word "HELLO" in bold serif font centered on the banner'). diff --git a/backend/skills/active_skills/music_tools/SKILL.md b/backend/skills/active_skills/music_tools/SKILL.md index dbffbaad..c3256dea 100644 --- a/backend/skills/active_skills/music_tools/SKILL.md +++ b/backend/skills/active_skills/music_tools/SKILL.md @@ -1,8 +1,8 @@ --- name: music_tools -description: "AI music generation. Provides the generate_music tool for creating music clips and full songs from text prompts with structure tags, timestamps, optional lyrics, and image references using Google Lyria 3 models." 
+description: "AI music generation. Provides the generate_music tool for creating music clips and full songs from text prompts with optional image references using Google Lyria 3 models." metadata: - builtin_skill_version: "1.1" + builtin_skill_version: "1.0" --- # Music Tools @@ -10,9 +10,9 @@ Use this skill when the user asks to create, generate, compose, or produce music Loading this skill activates the `generate_music` tool. -**IMPORTANT**: After loading this skill, you MUST call the `generate_music` tool to perform music operations. Do NOT call `music_tools` directly — it is NOT a tool name. +**IMPORTANT**: After loading this skill, you MUST call the `generate_music` tool to perform music operations. Do NOT call `music_tools` directly - it is NOT a tool name. -**Important:** Music generation is asynchronous and takes 30–120 seconds. The tool returns a task ID immediately; the user will be notified when the result is ready. +**Important:** Music generation is asynchronous and takes 30-120 seconds. The tool returns a task ID immediately; the user will be notified when the result is ready. ## Tool: generate_music @@ -22,117 +22,73 @@ Generate a music clip or full song from a text prompt, with optional reference i - User asks to create, generate, compose, or produce music, a song, an audio track, or a soundtrack. - User wants background music for a video, scene, or project. -- User wants to generate music inspired by reference images (scene → soundtrack). +- User wants to generate music inspired by reference images. ### Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| -| `prompt` | string | Yes | Detailed musical description. Include **genre**, **instruments**, **BPM**, **key/scale**, **mood**, and **structure**. Supports section tags (`[Verse]`, `[Chorus]`, `[Bridge]`, `[Intro]`, `[Outro]`), timestamps (`[0:00-0:30]`), and inline lyrics. For instrumental tracks append `Instrumental only, no vocals`. 
**Prompt language determines the vocal language.** | -| `output_format` | string | No | Audio output format: `mp3` (default, Clip & Pro) or `wav` (Pro only, 48 kHz stereo lossless). | -| `reference_images` | string[] | No | Up to **10** image URLs for multimodal inspiration (mood, colors, atmosphere). Canvas node URLs (e.g. `/api/media/scene.jpg`) or external URLs. | +| `prompt` | string | Yes | Detailed description of the music to generate. Include genre, mood, instruments, tempo, and any specific musical direction. For songs with lyrics, include the lyrics or lyrical theme in the prompt. | +| `output_format` | string | No | Audio output format: "mp3" (default, Clip & Pro) or "wav" (Pro only, higher quality). | +| `reference_images` | string[] | No | Array of reference image URLs to guide the musical style or mood. Images can be canvas node URLs (e.g. `/api/media/scene.jpg`) or external URLs. | ### Model Capabilities -| Model | Duration | Formats | Notes | -|-------|----------|---------|-------| -| `lyria-3-clip-preview` | Fixed ~30 s clip | MP3 | Fast short clips; lyrics & structure tags supported. | -| `lyria-3-pro-preview` | Full song (~1–2 min, steerable via prompt) | MP3, WAV | Higher quality; best for full songs with verse/chorus structure. | - -### Prompt Writing Patterns - -**1. Instrumental (no vocals)** -``` -Upbeat synthwave instrumental, 120 BPM, A minor, pulsing analog synths, -retro drum machine, warm pads. Instrumental only, no vocals. -``` - -**2. Structured song with lyrics** -``` -Indie folk ballad, 75 BPM, G major, acoustic guitar + soft piano + light strings, melancholic. - -[Intro] -(soft fingerpicked guitar, 8 bars) - -[Verse 1] -Walking through the autumn leaves, -memories fall like rain. - -[Chorus] -And I remember you, -in every shade of blue. - -[Bridge] -(strings swell, drums enter) - -[Outro] -(fade out with piano) -``` - -**3. Timestamp-based structure** -``` -Cinematic orchestral piece, 90 BPM, D minor, epic and mysterious. 
-[0:00-0:15] Sparse piano motif, distant strings. -[0:15-0:45] Add low brass swell, timpani hits. -[0:45-1:30] Full orchestra climax with choir. -``` +| Model | Duration | Formats | Features | +|-------|----------|---------|----------| +| `lyria-3-clip-preview` | ~30 seconds | MP3 only | Short music clips, fast generation | +| `lyria-3-pro-preview` | Full songs (3-5 min) | MP3, WAV | Full songs with lyrics, higher quality | ### Examples -Short instrumental clip (Clip model): +Generate a short music clip: ``` generate_music( - prompt="Upbeat electronic dance track, 128 BPM, F# minor, pulsing saw-lead synths, heavy sub bass, four-on-the-floor kick. Instrumental only, no vocals." + prompt="An upbeat electronic dance track with pulsing synths, heavy bass drops, and energetic drums at 128 BPM" ) ``` -Full song with structured lyrics (Pro model, lossless): +Generate a full song with lyrics (Pro model): ``` generate_music( - prompt="Melancholic indie folk ballad, 75 BPM, G major, acoustic guitar and soft vocals.\n\n[Verse 1]\nWalking through the autumn leaves,\nmemories fall like rain.\n\n[Chorus]\nAnd I remember you,\nin every shade of blue.\n\n[Outro]\n(fade out)", + prompt="A melancholic indie folk ballad with acoustic guitar and soft vocals. Lyrics: 'Walking through the autumn leaves, memories fall like rain...'", output_format="wav" ) ``` -Soundtrack inspired by a canvas scene: +Generate music inspired by an image: ``` generate_music( - prompt="Cinematic orchestral piece matching this scene — epic and mysterious. [0:00-0:20] Sparse strings. [0:20-1:00] Full orchestra with timpani and choir. Instrumental only, no vocals.", + prompt="Compose a cinematic orchestral piece that matches the mood of this scene - epic and mysterious", reference_images=["/api/media/dark_forest_scene.jpg"] ) ``` ## Tips -- Be **specific** about genre, instruments, BPM, key/scale, and mood — vague prompts produce generic music. 
-- Use **section tags** (`[Verse]`, `[Chorus]`, `[Bridge]`, `[Intro]`, `[Outro]`) to define song structure. -- Use **timestamps** (`[0:00-0:30]`) for precise transition control; works best with the Pro model. -- Write the prompt in the **language you want the lyrics in** (write Chinese prompt → Chinese vocals). -- For instrumental tracks, explicitly append **`Instrumental only, no vocals`**. -- Use `output_format="wav"` with the Pro model when the user needs lossless / studio-grade audio. -- Reference images influence **mood and atmosphere**, not literal content — best for scene-to-soundtrack. -- Tell the user generation takes **30–120 seconds**; the audio node appears on the canvas when ready. +- Write detailed prompts describing genre, mood, instruments, tempo, and style for best results. +- For songs with lyrics, include the full lyrics in the prompt. The model will generate vocals. +- Use `output_format="wav"` with Pro model for higher audio quality (lossless). +- Reference images can influence the musical mood and style - use scene images for soundtrack generation. +- Music generation is async - inform the user it will take 30-120 seconds. +- The generated audio will automatically be added as an audio node on the canvas. ## Canvas Integration -Generated music is automatically added as an audio node on the active canvas. To use canvas images as references: +Generated music is automatically created as an audio node on the active canvas. To use canvas images as references: -**Step 1**: Discover available image nodes: +**Step 1**: Call `list_canvas_nodes` to discover available image nodes: ``` -list_canvas_nodes(node_type="image") -→ [{id: "uuid-a", name: "Forest Scene"}, ...] +list_canvas_nodes(node_type="image") -> returns [{id: "uuid-a", name: "Scene"}, ...] 
``` -**Step 2**: Fetch a node's media URL: -``` -get_canvas_node(node_id="uuid-a") -→ data.imageUrl = "/api/media/scene.jpg" -``` +**Step 2**: Call `get_canvas_node(node_id="uuid-a")` to get the media URL: +- Image nodes -> `data.imageUrl` (e.g. `/api/media/scene.jpg`) -**Step 3**: Pass URL(s) to generate_music (max 10): +**Step 3**: Pass URLs to generate_music: ``` generate_music( - prompt="Compose ambient background music matching this scene's atmosphere. Instrumental only, no vocals.", + prompt="Compose background music matching the atmosphere of this scene", reference_images=["/api/media/scene.jpg"] ) ``` @@ -141,7 +97,6 @@ generate_music( | Error | Meaning | How to Handle | |-------|---------|---------------| -| Safety filter triggered | Content violates safety policies | Tell the user the prompt was rejected. Suggest rephrasing without sensitive content. | -| API timeout | Generation took too long | Inform the user and suggest retrying with a simpler prompt. | -| Empty response | Model returned no audio | Suggest simplifying the prompt or reducing structural complexity. | -| `wav` not supported | `output_format="wav"` requested on Clip model | Retry with `output_format="mp3"`, or switch to the Pro model. | +| Safety filter triggered | Content violates safety policies | Tell the user the prompt was rejected due to content policy. Suggest rephrasing. | +| API timeout | Generation took too long | Inform the user and suggest retrying. | +| Empty response | Model returned no audio | Suggest simplifying or rephrasing the prompt. 
| diff --git a/backend/skills/active_skills/video_tools/SKILL.md b/backend/skills/active_skills/video_tools/SKILL.md index 1ee42bab..e1bcccaf 100644 --- a/backend/skills/active_skills/video_tools/SKILL.md +++ b/backend/skills/active_skills/video_tools/SKILL.md @@ -210,7 +210,7 @@ When the video generation API returns an error, the error message will be passed | Error Code | Meaning | How to Handle | |---|---|---| -| `InputImageSensitiveContentDetected.PrivacyInformation` | Input image contains a real person's face | **STOP immediately.** Tell the user: "The image contains a real person's face, which is rejected by the platform's content safety policy." Then call `list_virtual_human_presets` to offer preset virtual humans as alternatives. Do NOT retry with the same image. | +| `InputImageSensitiveContentDetected.PrivacyInformation` | Input image contains a real person's face | **STOP immediately.** Tell the user: "The image contains a real person's face, which is rejected by the platform's content safety policy. Please use a non-real-person image (e.g. AI-generated, illustrated, or cartoon style) and try again." Do NOT retry with the same image. | | `InputImageSensitiveContentDetected` | Input image has sensitive content | **STOP immediately.** Tell the user the image was rejected due to content policy. Do NOT retry. | | Other 400 errors | Various API validation failures | Tell the user the specific error and suggest corrections. Do NOT blindly retry more than once. | @@ -220,88 +220,6 @@ When the video generation API returns an error, the error message will be passed 3. Suggest alternatives (e.g. use AI-generated character images instead of real photos) 4. Wait for the user to provide new input before trying again -## Virtual Human Presets (Seedance 2.0 Only) - -Seedance 2.0 **does NOT support uploading real human face images/videos** — they will be rejected by content safety review. 
To generate realistic human videos, use the platform's **preset virtual human library**. - -### Tool: list_virtual_human_presets - -Lists available virtual human presets with their asset URIs. - -**Parameters:** - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `gender` | string | No | Filter by gender: "male" or "female". Omit to list all. | -| `style` | string | No | Filter by style tag (e.g. "realistic"). Omit to list all. | - -### When to Use - -When the user asks to generate a video featuring a **realistic human character** using Seedance 2.0: - -1. Call `list_virtual_human_presets` to show available virtual humans -2. Present the options to the user (show preview images and descriptions) -3. After the user selects a virtual human, use its `asset_uri` (e.g. `asset://asset-20260401123823-6d4x2`) as an element in `reference_images` - -### Usage Example - -``` -# Step 1: List available virtual humans -list_virtual_human_presets(gender="female") - -# Step 2: User selects one, then generate video -generate_video( - prompt="图片1中的女生面带笑容,向镜头介绍图片2中的产品,说'这款面霜真的超好用',自然光线,近景镜头", - video_mode="reference_images", - reference_images=["asset://asset-20260401123823-6d4x2", "/api/media/product.jpg"], - duration=8, - aspect_ratio="9:16" -) -``` - -### Important Rules - -- **Asset URI format**: Always use `asset://` — do NOT modify or truncate the asset ID. -- **Prompt referencing**: In the prompt, reference virtual humans using the standard numbering format (图片1, 图片2, etc.) based on their position in the `reference_images` array. **NEVER use asset IDs in the prompt.** -- **Combining with other images**: You can mix virtual human asset URIs with regular image URLs in `reference_images`. For example, `reference_images=["asset://...", "/api/media/product.jpg"]` — the virtual human is 图片1, the product is 图片2. 
-- **Content safety**: Virtual human presets are pre-approved and will NOT trigger face detection rejection, unlike uploaded real human photos. - -## Script and Storyboard Workflow - -When the user provides a script file or storyboard description, follow these principles: - -### Core Principles -- **One script generates one video by default** (not one video per shot) -- Describe the complete shot sequence in a single `generate_video` call through the prompt - -### Multi-Shot Prompt Construction -Combine multiple shots into a coherent prompt using shot transition descriptions: - -``` -[Opening shot] → [Transition/camera movement] → [Middle shot] → [Transition/camera movement] → [Ending shot] - -Example: -"A woman walks into a coffee shop (opening shot), - the camera follows her as she approaches the counter (tracking shot), - she smiles and orders a latte (close-up), - camera pulls back to show the bustling cafe atmosphere (wide shot)" -``` - -### When to Split Into Multiple Videos - -| Scenario | Approach | Notes | -|----------|----------|-------| -| Continuous action in same time/space | **Single video** | Use prompt to describe shot transitions | -| Different scenes/time periods | **Multiple videos** | One video per scene, concatenate later with `edit_video` | -| Need precise control per shot parameters | **Multiple videos** | Generate separately then concatenate | - -### Strict Script Adherence -- **Do NOT add content outside the script** — only generate what is explicitly described -- **Do NOT expand the plot** — even if the script is short, do not add your own storyline -- **Maintain character/scene consistency** — use `reference_images` to provide character/scene references - ---- - ## Canvas Node → Multimodal Reference Workflow Canvas nodes are identified by UUIDs. 
To use canvas media as multimodal references: diff --git a/backend/skills/builtin_skills/Cinematic_Camera_Language/SKILL.md b/backend/skills/builtin_skills/Cinematic_Camera_Language/SKILL.md new file mode 100644 index 00000000..9d608d2c --- /dev/null +++ b/backend/skills/builtin_skills/Cinematic_Camera_Language/SKILL.md @@ -0,0 +1,199 @@ +--- +name: Cinematic_Camera_Language +description: "50 professional cinematic camera techniques with AI prompt templates. Provides expert camera movement and shot descriptions to enhance visual quality and narrative tension when generating images or videos. Applicable to image generation, video generation, and storyboard design." +metadata: + builtin_skill_version: "1.0" +--- + +# Cinematic Camera Language (50 Shot Types) + +**Purpose**: Inject professional camera language into image/video generation prompts to enhance cinematic quality, narrative tension, and visual impact. + +## When to Use + +- User requests cinematic / blockbuster-style images or videos +- User describes specific camera movement needs (e.g. "push in", "orbit shot", "overhead view") +- Designing camera motion for storyboards +- User wants to elevate visual expressiveness + +## Core Principles + +1. **One motion per shot**: Use only one primary camera movement per shot/clip to avoid conflicts +2. **Serve the narrative**: Camera language should serve emotion and storytelling, not show off +3. **Scene matching**: Automatically match the most suitable shot type to the content +4. **Prompt integration**: Weave camera descriptions naturally into generation prompts rather than stacking keywords + +## Shot Type Quick Reference + +### I. 
Dolly / Track Shots (Depth Movement) + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 1 | Slow Dolly In | Enhances depth, amplifies emotion | Solitude, contemplation, rising tension | +| 2 | Slow Dolly Out | Reveals vast environment | Loneliness, insignificance, full reveal | +| 3 | Crash Dolly In | Compresses space, instant urgency | Critical moments, confrontation | + +### II. Zoom & Focus Techniques + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 4 | Hitchcock Zoom (Dolly Zoom) | Background warps, vertigo/unease | Fear, epiphany, psychological distortion | +| 5 | Macro Zoom | Seamless macro-to-micro transition | Detail close-ups, sci-fi traversal | +| 6 | Extreme Seamless Zoom | Single-take from space to ground | Epic openings, spatial leap | +| 11 | Rack Focus Reveal | Bokeh to sharp, suspense reveal | Mysterious character entrance | +| 12 | Pull Focus (Focus Shift) | Guides viewer's gaze | Dialogue, subject priority switch | +| 24 | Smooth Optical Push (Static) | Subject enlarges, depth compresses | Portrait close-up, emotional focus | +| 25 | Smooth Optical Pull (Static) | Reveals environment, centered subject | Environment reveal, spatial expansion | +| 26 | Snap Zoom (Crash Zoom) | Ultra-fast impact, shock | Key twists, emotional explosion | + +### III. Pan & Tilt (Fixed Position, Lens Rotates) + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 13 | Tilt Up | Feet to face, builds presence | Character entrance, authority | +| 14 | Tilt Down | Reveals full-body styling | Fashion, character showcase | +| 15 | Dolly Left | Parallax enhances depth | Lateral scene reveal | +| 16 | Dolly Right | Parallax enhances depth | Lateral scene reveal | +| 33 | Whip Pan (Swish Pan) | Motion blur, rapid transition | Scene switch, pacing acceleration | + +### IV. 
Orbit & Arc Movements + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 17 | 180° Semi-Orbit | Front to back, intensifies emotion | Isolation, contemplation, atmosphere | +| 18 | 360° Speed Orbit | Background streaks into light trails | Combat, epic moments | +| 19 | Slow Arc | Gentle side-profile reveal | Quiet contemplation, elegance | +| 42 | Bullet Time | Time freezes + orbit continues | Peak action freeze-frame | + +### V. Vertical & Crane Movements + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 20 | Vertical Descent | Sinking perspective, immersion | Oppression, character focus | +| 21 | Vertical Ascent | Opens space, uplifting emotion | Awakening, empowerment | +| 22 | Crane Up + Pull Back | Small scene to grand vista, epic | Grand reveal, finale elevation | +| 23 | Crane Down | God's eye to eye-level | Opening descent, locking on subject | +| 41 | Crane Sweep | Mid-height lateral sweep | Battle formations, crowds, panorama | + +### VI. Drone / Aerial Shots + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 27 | High Flyover | Overhead vast landscape | Nature scenery, travel | +| 28 | Epic Reveal (Rise + Tilt Down) | Ascending reveal of grand vista | Spectacular opening | +| 29 | Wide Orbit | Emphasizes massive environment scale | Landmarks, natural wonders | +| 30 | Top-Down (God's Eye) | 90° overhead + slow rotation | Ritual, fate, ceremony | +| 31 | FPV Dive | High-speed building-face descent | Intense action, thrill | + +### VII. 
Tracking & POV Shots + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 7 | Over-the-Shoulder (OTS) | Layered dialogue framing | Conversation, interrogation | +| 35 | Reverse Tracking | Locks on subject's face while retreating | Determined walk, dialogue | +| 36 | Following Shot (Behind) | Immersion, suspense | Trailing, mystery | +| 37 | Parallel Side Track | Natural side-angle tracking | Walking, running | +| 38 | First-Person POV (Walk) | Walking sway, deep immersion | Exploration, horror, empathy | +| 32 | Documentary Handheld | Micro-shake, breathing feel, raw | Documentary, realism | +| 39 | SnorriCam (Body-Mount) | Subject static, background shakes wildly | Panic, intoxication, intense focus | + +### VIII. Special Angles & Perspective + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 8 | Fisheye Lens | Distorted peephole feel | Surveillance, tension, claustrophobia | +| 9 | Foreground Reveal (Lateral Wipe) | Subject emerges from obstruction | Suspense reveal, character entrance | +| 10 | Through-Shot (Passthrough) | Camera passes through narrow space | Space transition, entering new world | +| 34 | Dutch Angle (Canted Frame) | Tilted horizon, imbalance | Mental breakdown, madness | +| 45 | Worm's Eye View (Extreme Low) | Subject appears enormous | Giants, intimidation, power | + +### IX. Speed & Time Manipulation + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 43 | Slow Motion (High-Speed) | Stretches an instant, magnifies detail | Explosions, splashes, emotional peak | +| 44 | Hyperlapse | Extreme time compression with spatial movement | Cityscapes, day-night cycle, flowing clouds | + +### X. 
Optical & Lens Effects + +| # | Shot | Effect | Best For | +|---|------|--------|----------| +| 40 | Z-Axis Roll (Barrel Roll) | Zero-gravity, dreamlike feel | Dreams, weightlessness, surreal | +| 46 | Reflection Reveal | Shows reflection first, then reality | Poetic transition, mood | +| 47 | Split Shot (Over-Under) | Above and below water simultaneously | Ocean, contrasting narratives | +| 48 | Tilt-Shift Lens | Miniature model effect | City overhead, whimsy | +| 49 | Split Diopter | Near and far subjects both sharp | Tension standoff, dual narratives | +| 50 | Anamorphic Lens Flare | Horizontal blue streak flare | Blockbuster feel, light impact | + +--- + +## Prompt Template Library (By Scenario) + +### Emotional Focus (Character Inner World) +``` +The camera slowly dollies in toward a person sitting alone at a dining table, warm window light behind them; as the camera advances, the room's depth unfolds, amplifying the subtle emotional shifts of solitude. +``` + +### Environment Reveal (Small to Grand) +``` +Starting from a couple standing side by side, the crane smoothly rises and pulls back, gradually revealing the endless beach stretching beneath their feet. +``` + +### Tense Confrontation (Interrogation / Dialogue) +``` +Framed over the blurred shoulder of the detective, inside a dim interrogation room, harsh overhead light falls on the suspect, strong chiaroscuro emphasizes the subject, paired with shallow depth of field to intensify the atmosphere. +``` + +### Dynamic Impact (Action / Chase) +``` +An FPV drone dives at extreme speed along the facade of a city skyscraper, racing toward the runner about to reach the building's base. +``` + +### Epic Opening (Grand Reveal) +``` +At sunrise, the drone rises steadily from behind a mountain ridge while tilting down, unveiling the majestic valley panorama, settling on a solitary figure standing at the cliff's edge. 
+``` + +### Time Freeze (Bullet Time) +``` +Time instantly freezes — raindrops hover in mid-air — the camera orbits at normal speed through a full 360° around a black-clad warrior mid-flying-kick, showcasing this intensely dynamic moment from every angle. +``` + +### Surreal (Dream / Weightlessness) +``` +The camera pushes smoothly forward down a narrow hotel corridor while the frame slowly rotates clockwise 180°, ceiling and floor gradually inverting, creating an Inception-like sense of broken gravity. +``` + +### Micro Traversal +``` +A human eye stares directly into the lens without blinking. (Macro zoom penetrates from the iris into microscopic cellular structure.) +``` + +--- + +## Integration with generate_image / generate_video + +When weaving camera language into prompts: + +1. **Image generation**: Describe the "frozen frame" camera state (angle, composition, perspective) + - Example: `Extreme low worm's-eye view, a giant mech strides forward, its boot nearly crushing the lens, cyberpunk skyscrapers tower behind` + +2. **Video generation**: Describe camera motion trajectory and timeline + - Example: `Starting from a high overhead shot of the city plaza, the crane descends smoothly, finally settling on the lone traveler waiting at the center` + +3. 
**Never combine conflicting motions**: Only one primary movement type per shot + +## Quick Intent Matching Guide + +| User Intent | Recommended Shots | +|-------------|-------------------| +| "Cinematic / blockbuster feel" | Hitchcock Zoom, Crane Up, Anamorphic Flare | +| "Tense / suspenseful" | Crash Dolly, Dutch Angle, Fisheye, Foreground Reveal | +| "Lonely / insignificant" | Dolly Out, Top-Down, Wide Orbit | +| "Powerful / majestic" | Tilt Up, Worm's Eye, 360° Orbit | +| "Gentle / beautiful" | Slow Arc, Rack Focus Reveal, Slow Dolly In | +| "Dynamic / thrilling" | FPV Dive, Whip Pan, Bullet Time | +| "Epic / spectacular" | Crane Up, Epic Reveal, Extreme Seamless Zoom | +| "Realistic / documentary" | Handheld, First-Person POV | +| "Surreal / dreamlike" | Z-Axis Roll, Tilt-Shift, Split Shot | +| "Time manipulation" | Slow Motion, Hyperlapse, Bullet Time | diff --git a/backend/skills/builtin_skills/Cinematic_Camera_Language/references/full_50_shots.md b/backend/skills/builtin_skills/Cinematic_Camera_Language/references/full_50_shots.md new file mode 100644 index 00000000..3299b42f --- /dev/null +++ b/backend/skills/builtin_skills/Cinematic_Camera_Language/references/full_50_shots.md @@ -0,0 +1,205 @@ +# 50 Cinematic Camera Techniques — Full Reference + +Each shot includes: technical description + AI prompt example. + +--- + +## 1. Slow Dolly In (Track Forward) +**Technique**: Camera moves smoothly forward on a track, physically approaching the subject, enhancing spatial depth. +**Prompt**: At dusk, the camera slowly dollies in toward a person sitting alone at a dining table, warm window light behind them; as the camera advances, the room's depth gradually unfolds, amplifying the subtle emotional shifts of solitude. + +## 2. Slow Dolly Out (Track Backward) +**Technique**: Camera pulls back smoothly on a track, shrinking the subject's proportion to reveal the environment. 
+**Prompt**: In twilight, an astronaut stands motionless on a barren alien plain, the camera slowly pulls back, gradually revealing the vast desolation beneath a dark purple sky. + +## 3. Crash Dolly In (Rush Push) +**Technique**: Camera charges forward on track toward the subject's face, compressing space, instantly maximizing urgency. +**Prompt**: A rain-soaked alley at night, sirens echoing, neon reflections flickering on the wet pavement, the camera crashes in, locking onto the determined detective's face. + +## 4. Hitchcock Zoom (Dolly Zoom / Vertigo Effect) +**Technique**: Camera moves backward while lens zooms in — subject stays the same size but background perspective warps dramatically, creating vertigo and unease. +**Prompt**: The character stands frozen in absolute terror, the camera moves backward while zooming forward, the background warps and compresses violently, yet the character remains the same size throughout — suffocating cinematic tension. + +## 5. Macro Zoom +**Technique**: Extreme macro focus transition, seamlessly shifting from a human portrait into microscopic cellular structures. +**Prompt**: A human eye stares directly into the lens, unblinking throughout. + +## 6. Extreme Seamless Zoom +**Technique**: Single continuous zoom from deep space through cloud layers and city blocks down to a ground-level scene. +**Prompt**: Single uninterrupted zoom shot starting from Earth in deep space, piercing through cloud layers and city towers, finally settling smoothly onto a lively sidewalk café terrace. + +## 7. Over-the-Shoulder Shot (OTS) +**Technique**: Camera positioned behind a secondary character's shoulder, foreground blurred, focusing on the facing subject — standard for dialogue and interrogation. +**Prompt**: Framed over the blurred detective's shoulder, inside a dim interrogation room, harsh overhead light falls on the suspect, strong chiaroscuro emphasizes the subject with shallow depth of field intensifying the atmosphere. + +## 8. 
Fisheye Lens (Peephole / Surveillance View) +**Technique**: Ultra-wide-angle distortion, center bulging with edges curving — simulates peeping or surveillance, inherently tense. +**Prompt**: Fisheye lens framing — a nervous man stands alone in a dimly lit apartment hallway, walls bending abnormally toward the frame edges, harsh overhead light, lens slightly smudged, recreating the oppressive realism of surveillance footage. + +## 9. Foreground Reveal (Lateral Wipe) +**Technique**: Opening frame blocked by foreground, camera moves laterally to progressively reveal the subject — builds suspense. +**Prompt**: An empty underground parking garage, dense concrete pillars, harsh overhead light, the camera slides laterally from behind a pillar, gradually revealing a figure half-hidden in shadow, leaning against the wall. + +## 10. Through-Shot (Passthrough) +**Technique**: Camera passes smoothly through a window, doorway, or narrow gap, seamlessly transitioning to the other side. +**Prompt**: The camera passes through a narrow window into a sun-drenched courtyard, finally settling on a woman standing quietly at the center of the frame. + +## 11. Rack Focus Reveal +**Technique**: Opens fully defocused with soft bokeh, lens slowly racks into focus, subject gradually sharpens. +**Prompt**: A late-night café — opening on nothing but circular bokeh light spots, the lens slowly racks into focus, finally settling sharply on a solitary man with an unreadable expression, shallow depth of field throughout, maximum atmospheric tension. + +## 12. Pull Focus (Focus Shift) +**Technique**: Foreground sharp, background blurred — focus smoothly shifts to redirect the viewer's gaze. +**Prompt**: A dining table scene — opening with a foreground water glass in sharp focus, the tense conversation in the background completely blurred; then focus smoothly shifts, the glass softens while the background characters and dialogue come into crisp clarity. + +## 13. 
Tilt Up +**Technique**: Fixed camera position, lens tilts vertically upward from feet to face — builds commanding presence. +**Prompt**: City street lined with skyscrapers, the camera starts at polished leather shoes, slowly tilts up, finally locking on the sharply dressed figure's face radiating authority. + +## 14. Tilt Down +**Technique**: Fixed camera position, lens tilts vertically downward — reveals full-body styling. +**Prompt**: Inside a minimalist studio, the model gazes directly into the lens, the camera tilts down from face, fully revealing the entire outfit down to the shoes. + +## 15. Dolly Left (Lateral Track Left) +**Technique**: Camera moves horizontally left, foreground-background parallax enhances spatial dynamics. +**Prompt**: At sunset, a woman stands on a balcony overlooking layered city views, bathed in warm light, the camera tracks smoothly left, strong parallax between foreground and background enhances depth. + +## 16. Dolly Right (Lateral Track Right) +**Technique**: Camera moves horizontally right, foreground-background parallax enhances spatial dynamics. +**Prompt**: At sunset, a woman stands on a balcony overlooking layered city views, the camera tracks smoothly right, the frame filled with cinematic depth and dynamic movement. + +## 17. 180° Semi-Orbit +**Technique**: Camera arcs around the subject in a half-circle, transitioning from front to back — intensifies emotion. +**Prompt**: In an endless desert, the camera orbits a man with his head slightly bowed, completing a 180° arc from front to back, amplifying the weight of his solitude. + +## 18. 360° Speed Orbit +**Technique**: Camera rapidly orbits the subject 360°, background streaks into light trails — intensely dynamic. +**Prompt**: Inside a circular arena, a gladiator stands firm at center, the camera whips around in a full 360° orbit, surrounding lights streaking into flowing trails. + +## 19. 
Slow Arc +**Technique**: Camera moves in a gentle arc, small-range orbit revealing the subject's profile — atmosphere is tender and tranquil. +**Prompt**: A forest clearing in soft backlight, the camera moves in a slow arc, gradually revealing a woman gazing off-frame, lost in thought. + +## 20. Vertical Descent (Camera Lowers) +**Technique**: Camera body descends vertically — sinking perspective, deeper immersion. +**Prompt**: Inside a minimalist modern office, the camera descends smoothly from eye level, settling on a standing figure slowly exhaling. + +## 21. Vertical Ascent (Camera Rises) +**Technique**: Camera body rises vertically — opens space, lifts emotional tone. +**Prompt**: On a crowded street, the camera rises smoothly from waist height, the frame gradually opens up, amplifying the character's sense of awakening and empowerment. + +## 22. Crane Up + Pull Back (High Reveal) +**Technique**: Crane rises and pulls back — small scene expands to grand environment, creating epic scale. +**Prompt**: Starting from a couple standing side by side, the crane smoothly rises and pulls back, gradually revealing the endless beach stretching beneath their feet. + +## 23. Crane Down (Descending Lock) +**Technique**: Crane descends from high altitude to eye level — transitions from god's-eye to subjective view. +**Prompt**: Starting from a high overhead shot of a city plaza, the crane descends smoothly, finally settling on the lone traveler waiting at the center. + +## 24. Smooth Optical Zoom In (Static Position) +**Technique**: Camera stays fixed, only the lens zooms in — compresses depth of field, subject fills frame. +**Prompt**: Soft-lit interior, background is blurred light spots, the camera holds position while smoothly zooming in, the character's calm face gradually filling the entire frame. + +## 25. Smooth Optical Zoom Out (Static Position) +**Technique**: Camera stays fixed, only the lens zooms out — reveals environment while subject remains centered. 
+**Prompt**: Opening on a centered seated figure, the camera holds position while smoothly zooming out, gradually revealing the complete spacious room surrounding them. + +## 26. Snap Zoom (Crash Zoom) +**Technique**: Instantaneous extreme zoom — smashes into the subject's eyes with maximum impact force, expressing epiphany or shock. +**Prompt**: Background completely still and chaotic, the lens snap-zooms directly into the character's eyes at the instant of epiphany. + +## 27. Drone High Flyover +**Technique**: High-altitude drone flies steadily forward, passing over the subject, surveying open landscapes. +**Prompt**: Rolling hills bathed in soft morning light, a small group walks slowly, the high-altitude drone flies steadily forward, fully revealing the expansive mountain wilderness. + +## 28. Drone Epic Reveal +**Technique**: Drone rises from behind an obstruction while tilting down — instantly reveals a grand panorama. +**Prompt**: At sunrise, the drone rises steadily from behind a mountain ridge while tilting down, unveiling the majestic valley panorama, settling on a solitary figure standing at the cliff's edge. + +## 29. Wide Drone Orbit +**Technique**: Drone orbits the subject at distance — emphasizes the massive scale of the environment. +**Prompt**: Wide-angle drone orbit capturing a landmark building at the center of a vast natural landscape, maximizing the sense of scale and awe. + +## 30. Top-Down (God's Eye) +**Technique**: 90° vertical overhead shot, optionally with slow rotation — strong sense of ritual and fate. +**Prompt**: Drone shoots vertically downward with slow rotation, capturing a lone figure wearing a red cloak at the center of an ancient circular stone plaza, concentric ring carvings on the ground, thin mist drifting, light and shadow flowing. + +## 31. FPV Dive +**Technique**: High-speed dive along a building facade — explosive dynamic energy. 
+**Prompt**: An FPV drone dives at extreme speed along the facade of a city skyscraper, racing toward the runner about to reach the building's base. + +## 32. Documentary Handheld +**Technique**: No stabilizer, retaining micro-shake and breathing rhythm — raw, present, authentic. +**Prompt**: Handheld camera follows the subject, capturing candid conversation in a real interior setting, using only natural available light throughout, no staged lighting. + +## 33. Whip Pan (Swish Pan) +**Technique**: Ultra-fast horizontal pan with heavy motion blur — rapid scene transition. +**Prompt**: The camera whip-pans left at extreme speed, using flowing motion blur to transition instantly to a new subject in the adjacent space. + +## 34. Dutch Angle (Canted Frame) +**Technique**: Tilted frame with skewed horizon — creates imbalance, tension, madness. +**Prompt**: Dutch angle framing — a character losing emotional control inside a narrow, oppressive interior, amplifying the sense of psychological imbalance. + +## 35. Reverse Tracking Shot +**Technique**: Camera retreats in sync with the walking subject — locks on the subject's front, stable tracking. +**Prompt**: The character strides determinedly through a long corridor, the camera retreats in sync, maintaining fixed distance and composition throughout. + +## 36. Following Shot (From Behind) +**Technique**: Camera follows behind the subject moving forward — strong sense of immersion and suspense. +**Prompt**: An oppressive forest trail, the character walks forward alone, the camera follows closely from behind. + +## 37. Parallel Side Track +**Technique**: Camera moves parallel to the subject, tracking their profile — standard for walking scenes. +**Prompt**: The camera moves parallel, tracking a woman walking briskly from left to right on a city street — natural and authentic. + +## 38. First-Person Walking POV +**Technique**: First-person perspective retaining natural walking sway and bob — maximum immersion. 
+**Prompt**: First-person perspective with natural slight walking sway, moving through a lived-in room, glimpses of hair, hat brim, or shadow at frame edges hint at the character's presence. + +## 39. SnorriCam (Body-Mount) +**Technique**: Camera rigidly mounted on the actor's body (typically facing their face) — subject stays absolutely still in frame while background shakes violently. Used for vertigo, panic, intoxication, or extreme focus. +**Prompt**: Camera fixed to the runner's chest, the character's face and upper body remain absolutely still at center frame while the neon-lit streets behind them shake violently and rush backward — conveying extreme panic and psychological derailment. + +## 40. Z-Axis Roll (Barrel Roll) +**Technique**: Camera rotates 360° along the Z-axis (the direction the lens points) — creates weightlessness or dream-breaking unreality. +**Prompt**: The camera pushes smoothly forward down a narrow hotel corridor while the frame slowly rotates clockwise 180°, ceiling and floor gradually inverting, creating an Inception-like sense of broken gravity. + +## 41. Crane Sweep +**Technique**: Crane moves laterally at mid-height (no rise/fall), sweeping from left to right — used for battle formations, crowds, or wide terrain. +**Prompt**: At medium height, the camera sweeps rapidly from left to right like a crane across an ancient battlefield formation, passing rows of spear-bearing soldiers — majestic and imposing. + +## 42. Bullet Time +**Technique**: Time appears to freeze or move extremely slowly in the frame while the camera travels at normal speed through 3D space around the subject. +**Prompt**: Time instantly freezes — raindrops hover in mid-air — the camera orbits at normal speed through a full 360° around a black-clad warrior mid-flying-kick, showcasing this intensely dynamic moment from every angle. + +## 43. 
Slow Motion (High-Speed Camera) +**Technique**: Shot at extremely high frame rate, stretching a brief instant, magnifying detail and emotional impact. +**Prompt**: 1000fps extreme slow motion, close-up of a water-filled glass dropping onto a marble floor and shattering — water splashes bloom like crystals in slow motion, fragments hover and scatter in mid-air. + +## 44. Hyperlapse +**Technique**: Camera moves through large distances in space while time is extremely compressed — shows rapid passage of time with spatial movement. +**Prompt**: Hyperlapse — the camera rushes forward along a busy city boulevard, clouds churn like flowing water overhead, day and night alternate in seconds, pedestrians and vehicles dissolve into streaming light trails. + +## 45. Worm's Eye View (Extreme Low Angle) +**Technique**: Ultra-low angle near the ground shooting upward — makes the subject appear enormous and imposing. +**Prompt**: Extreme low worm's-eye perspective, looking up at a giant in heavy mech armor striding forward, the massive boot nearly crushing the lens, cyberpunk skyscrapers tower into the clouds behind — overwhelming presence. + +## 46. Reflection Reveal +**Technique**: First captures a reflection in water/mirror/glass, then camera tilts up or moves to reveal the real physical world. +**Prompt**: Close-up on a rain puddle reflecting an inverted cyberpunk neon street scene with pedestrians; then the camera slowly tilts up, transitioning from the water surface to the real bustling nightscape above. + +## 47. Split Shot (Over-Under) +**Technique**: Half the lens above water, half below — simultaneously shows the calm surface and the activity beneath. +**Prompt**: Split shot — the upper half shows a sunny tropical sea surface with a small boat, the lower half reveals the deep blue underwater world with a massive shark silently approaching, waterline ripples across the center of frame. + +## 48. 
Tilt-Shift Lens +**Technique**: Tilted optical axis creates an extremely narrow band of focus with heavy blur above and below — makes the real world look like a miniature model. +**Prompt**: Tilt-shift lens overhead shot of a busy city intersection, strong blur at top and bottom edges, only the crossing pedestrians and yellow taxis in the center strip remain sharp — creating a fascinating miniature toy diorama effect. + +## 49. Split Diopter +**Technique**: Breaks physical depth-of-field limits — both an extreme foreground subject and a distant background subject remain absolutely sharp simultaneously. +**Prompt**: High-tension split diopter shot — left foreground is a close-up of the protagonist's terrified profile, right background shows the villain slowly pushing open a door to enter the room — both near and far planes maintain razor-sharp focus. + +## 50. Anamorphic Lens Flare +**Technique**: Shot with anamorphic widescreen lenses — when hitting bright light sources, produces signature horizontal blue streak flares (JJ Abrams / Michael Bay style). +**Prompt**: Anamorphic widescreen lens shot — inside a dim hangar, a sports car suddenly switches on its blindingly bright headlights aimed directly at the lens, instantly producing a dramatic horizontal blue streak flare across the frame. 
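The reference entries above share a regular layout: a `## N. Name` heading followed by a `**Technique**` line and a `**Prompt**` line. As a minimal sketch of how such a file could be loaded into records for lookup, assuming the text is read as plain Markdown with the leading `+` diff markers already stripped, the following may help. The names `Shot` and `parse_shots` are hypothetical and are not part of the skill itself.

```python
import re
from typing import List, NamedTuple


class Shot(NamedTuple):
    number: int
    name: str
    technique: str
    prompt: str


# Assumed entry layout, inferred from the reference file:
#   ## 50. Anamorphic Lens Flare
#   **Technique**: ...
#   **Prompt**: ...
ENTRY_RE = re.compile(
    r"^## (\d+)\. (.+?)\n"
    r"\*\*Technique\*\*: (.+?)\n"
    r"\*\*Prompt\*\*: (.+?)$",
    re.MULTILINE,
)


def parse_shots(markdown: str) -> List[Shot]:
    """Hypothetical helper: extract numbered shot entries from the reference text."""
    return [
        Shot(int(num), name.strip(), technique.strip(), prompt.strip())
        for num, name, technique, prompt in ENTRY_RE.findall(markdown)
    ]


if __name__ == "__main__":
    sample = (
        "## 49. Split Diopter\n"
        "**Technique**: Near and far subjects both stay sharp.\n"
        "**Prompt**: High-tension split diopter shot.\n"
        "\n"
        "## 50. Anamorphic Lens Flare\n"
        "**Technique**: Horizontal blue streak flares.\n"
        "**Prompt**: A sports car switches on its headlights.\n"
    )
    for shot in parse_shots(sample):
        print(shot.number, shot.name)
```

Entries whose technique or prompt spans several lines would not match this pattern, so the sketch only fits files that keep each field on a single line.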
diff --git a/backend/skills/builtin_skills/Image_Prompt_Optimizer/SKILL.md b/backend/skills/builtin_skills/Image_Prompt_Optimizer/SKILL.md new file mode 100644 index 00000000..7721f397 --- /dev/null +++ b/backend/skills/builtin_skills/Image_Prompt_Optimizer/SKILL.md @@ -0,0 +1,463 @@ +--- +description: 通用图像生成提示词工程化优化技能。专注于提示词本身的写作规范与质量提升,提供工作流、八大核心要素、9 类图像场景模板库(影视级宽画幅场景图/全景图/三视图/产品图/概念图/立绘/漫画/极简设计/插画)与编辑模板库。不涉及任何工具调用、API 参数、具体模型选择等执行层细节。 +metadata: + builtin_skill_version: '2.2' +name: Image_Prompt_Optimizer +--- + +# Image Prompt Optimizer + +**IMPORTANT**: 本技能专注于**图像提示词的写作规范与工程化质量**,输出纯文本提示词文案。本技能**不包含工具调用、API 参数、具体模型能力差异、Provider 选择等执行层描述**;这些内容存放于各工具专用的 skill 中。 + +## 角色定位 + +你是图像生成提示词工程化专家。你的首要任务是拦截用户"形容词堆砌""仅一句话需求"的低质量提示词,将它们引导和重写为高质量的工程化提示词(叙事化语言、八大要素、场景模板、防崩约束)。 + +## 核心工作流 + +当用户输入粗略需求、提供参考素材,或**仅提出图像生成需求(如"帮我画一个赛博朋克街道全景")**时,按以下步骤执行: + +### Step 0: 需求分析与启发式提问 + +当用户仅给出高维度想法(如"我要一张场景图""画个角色")时,**主动进入引导模式**,通过提问帮助用户丰满细节,切忌直接生编硬造: + +1. **询问图像类型**:是场景图(21:9 影视级宽画幅)/ 全景图(360 度)/ 角色立绘 / 产品图 / 概念图 / 海报 / 漫画? +2. **询问核心要素**:基于八大要素引导用户补充信息。 + *示例*:"关于这个赛博朋克街道全景,您可以补充:1. 时间是白天/黄昏/深夜?2. 视野中心是什么(一个角色/一个建筑/一辆车)?3. 镜头视角(平视/俯视/仰视)?4. 是否有参考图?" +3. **收集足够信息后转入 Step 1**。 + +### Step 1: 意图与场景判定 + +1. **生成类型判定**(提示词层,不论具体工具名): + - **全新生成**:纯文本无参考素材 + - **图像编辑**:有参考图 → 进一步判定属于哪种编辑模式(局部修改/风格迁移/角色置入新场景/多图合成/高保真细节迁移/草图细化/360 度一致性扩展) +2. **图像场景类型判定**(决定使用哪个场景模板): + - **影视级宽画幅场景图(21:9)** ← 影视场景设计、场景初稿、全景图的前置设计稿、视频首帧 + - **全景图(360 度等距柱状投影 / 2:1)** ← VR/全景节点、沉浸式环境、360 度环绕浏览场景 + - **角色三视图**:正/侧/背三视,立绘标准 + - **产品广告图**:电商/商业摄影 + - **概念美术图**:游戏/影视前期 concept art + - **IP 立绘 / 海报**:单角色全身/半身、海报排版 + - **漫画分格 / 分镜插画**:故事化分格 + - **极简设计 / 负空间**:背景图、品牌物料 + - **风格化插画 / 表情包**:贴纸、icon + +### Step 2: 参考素材语义化梳理 + +1. **参考素材清点**:当用户提供多张参考图时,按出现顺序梳理为参考图 1、参考图 2…并向用户确认**每张图的语义角色**(角色形象 / 服装 / 产品 / 场景 / 风格基调 / 字体 / 构图参考)。 +2. 
**自然语言指代**:在最终提示词中,用语义化措辞引用参考图: + - 单图编辑:`"using the provided image"` / `"the reference photo"` + - 多图合成:`"the woman from the first reference image"` / `"the dress from the second reference image"` +3. **语义角色确认**:当多图未明确语义角色时(如:谁是主体、谁是元素来源),向用户提问要求明确,避免生成结果自由发挥。 +4. **写实人脸预检**:若参考图含可辨识真人面部,部分生成体系可能拦截或质量下降,需提醒用户改用风格化处理。 + +### Step 3: 要素审查与多选交互确认 + +1. 检查用户提示词是否包含**八大核心要素**: + - **主体(Subject)**:谁/什么是主体? + - **动作 / 表情(Action)**:在干什么?什么神态? + - **场景 / 环境(Setting)**:在哪?时间/天气? + - **光影色调(Lighting)**:什么光线?什么色温? + - **镜头 / 构图(Camera)**:什么视角?什么景别?什么焦段?**必须使用专业的电影镜头/构图术语**(如 低角度仰拍、过肩拍摄、希区柯克变焦、升格摄影、三分法构图、黄金分割等) + - **视觉风格(Style)**:写实/插画/动漫/油画/3D…? + - **画质参数(Quality)**:分辨率/质感(8k、ultra-detailed、photorealistic) + - **约束条件(Constraints)**:防崩兜底(如"无穿模、五官清晰、构图稳定") + +2. **检查潜在冲突**: + - 风格冲突(如同时要"写实摄影"和"卡通风格") + - 视角冲突(如同时要"俯拍"和"低角度仰拍") + - 焦段冲突(如同时要"广角"和"长焦") + +3. **【关键:拒绝静默修改】**:发现要素缺失或冲突时,**必须**通过"多选检视意见交互"向用户展示具体建议,让用户选择: + + *多选交互模板示例:* + > 我收到了您的输入。检测到以下建议,请选择您接受的部分: + > 1. 【建议明确】场景中是黄昏还是深夜? + > 2. 【建议补充】视野中心放主角还是建筑? + > 3. 【风格冲突】当前提示词同时要求"写实摄影"和"赛博朋克霓虹",建议统一为"赛博朋克写实摄影"。 + > + > [多选框]: + > - [ ] 接受建议1,设定为:黄昏 + > - [ ] 接受建议2,设定为:主角作为视野中心 + > - [ ] 接受风格统一,设定为:赛博朋克写实摄影 + > - [ ] 其他修改(请补充) + +### Step 4: 结构化重写输出 + +按以下三大模块结构化输出: + +#### 优化后提示词 +(包含严格的**三段式**结构) +1. **全局基础设定**: + - 锁定主体、场景、风格基调 + - 多参考素材时用语义化措辞引用(如 "the character from the first reference image, the outfit from the second reference image"),**严禁使用 @图N 标记** +2. **主体提示词(英文叙事化)**: + - 按 *主体 → 动作 → 场景 → 光影 → 构图 → 风格 → 画质* 顺序展开 + - 每一层用完整句子叙述,不堆砌关键词 + - 镜头/构图必须使用专业电影术语 +3. 
**画质、风格与约束**:自动挂载画质增强(`8k, ultra-detailed, sharp focus`)与防崩兜底约束。 + +#### 优化问题 +针对原始提示词,指出存在的缺陷(要素缺失、冲突、关键词堆砌、英文表达不准确、误用 @图N 标记等)。 + +#### 画面语义补充建议(提示词层表达) +- 在提示词文案中明确画幅意图(如 `"360 degree equirectangular panorama, 2:1 aspect ratio"`、`"square 1:1 e-commerce composition"`、`"vertical 9:16 portrait framing"`、`"ultra-wide 21:9 establishing shot"`)——让生成体系从语义上理解构图。 + +**核心原则清单(内置原则库)**: +- **叙事化优先原则**:完整段落叙述 > 关键词堆砌 +- **参考图语义化原则**:使用自然语言描述参考图角色 +- **镜头语言专业化原则**:构图与镜头使用专业电影术语,拒绝口语化表述 +- **场景模板套用原则**:识别场景类型后必须套用对应模板,不要自由发挥 +- **语义负面替代原则**:用"安静空旷的街道"替代"没有人的街道"("no/without" 类指令理解差) +- **兜底强制原则**:必须挂载防崩约束与高画质词 + +## 图像场景类型模板库(9 大类) + +在 Step 4 重写时,根据 Step 1 判定的场景类型套用以下模板作为骨架,再叠加八大要素与镜头术语。 + +> 💡 **典型协作工作流**:21:9 影视场景图常作为视觉初稿先生成(建立场景的构图、光影、色调、风格);然后基于场景概念用 360 度全景模板扩展为环绕版本,用于沉浸式浏览或作为视频生成的首帧。**两者使用不同的提示词结构,不可混用**。 + +### 1. ⭐ 影视级宽画幅场景图(21:9) + +**适用**:影视场景设计、游戏地图初稿、长画幅环境叙事、**全景图的前置场景设计稿**、视频生成的首帧。 + +**核心要点**: +- **画幅语义**:21:9 超宽画幅是影视场景图的核心,提示词文案中需明确"ultra-wide 21:9 establishing shot"等措辞 +- **构图法则**: + - **三分法**:地平线/视野中心放在画面 1/3 或 2/3 处 + - **引导线**:道路、河流、建筑边缘形成视觉引导 + - **纵深层次**:前景 + 中景 + 远景三层,避免单层平铺 + - **视野中心(Focal Point)**:明确一个吸引视线的主体(角色 / 建筑 / 光源 / 异常元素) +- **可拼接性**:场景设计图常用于后续拼接为 360 度全景或视频,画面边缘避免突兀切割(如人物半截、文字断裂) +- **场景叙事节奏**:从一端到另一端的视觉故事(如左侧静谧、中段冲突、右侧远景留白) + +**核心模板**: +``` +A cinematic ultra-wide 21:9 establishing shot of [scene description]. +The composition follows the rule of thirds with [focal point] positioned at +[1/3 left | center | 2/3 right] of the frame. + +Foreground: [foreground element, close to camera, sharp detail]. +Mid-ground: [main subject / focal point, the visual anchor]. +Background: [distant element, atmospheric depth, soft focus]. + +Lighting: [time of day, light direction, color temperature, e.g. +"golden-hour sunlight raking from camera-left, warm amber tones, +long shadows stretching across the ground"]. + +Camera: [shot type, e.g. "low-angle wide-angle lens, slight tilt to enhance +depth, anamorphic 2.39:1 framing"]. 
+ +Atmosphere: [mood, weather, particles, e.g. "volumetric mist drifting between +buildings, dust motes catching the light, quiet tension"]. + +Style: [photorealistic / matte painting / concept art / Studio Ghibli / +cyberpunk neon] with [rendering technique, e.g. "Unreal Engine 5 cinematic +render, hyper-detailed textures"]. + +Technical: 8k ultra-detailed, sharp focus across all three depth layers, +no cropped subjects at frame edges, seamless horizontal continuity, +panoramic composition. +``` + +**完整示例**(赛博朋克街道): +``` +A cinematic ultra-wide 21:9 establishing shot of a rain-soaked cyberpunk +street at midnight. The composition follows the rule of thirds with a +lone hooded figure positioned at 1/3 left of the frame, walking toward +the vanishing point in the right distance. + +Foreground: glistening wet asphalt with neon reflections of pink and cyan +holographic billboards. Mid-ground: the hooded figure silhouetted against +a row of noodle stalls and steam-belching food carts. Background: towering +megacorp skyscrapers fading into purple haze, flying drones with red blinking +lights crossing between buildings. + +Lighting: dominant cool cyan-magenta neon from signage, warm orange spill +from food stalls creating intimate pools of light, volumetric haze making +every light beam visible. + +Camera: low-angle wide-angle lens at 24mm equivalent, slight upward tilt +to emphasize building height, anamorphic horizontal lens flares from neon. + +Atmosphere: heavy drizzle, steam from food vents, faint cherry blossom +petals drifting through frame, melancholic urban solitude. + +Style: photorealistic cinematic, Blade Runner 2049 color grading, +Unreal Engine 5 hyper-realistic render with subsurface rain effects. + +Technical: 8k ultra-detailed, sharp focus across foreground figure and +mid-ground stalls, soft atmospheric falloff in background, no cropped +subjects at frame edges, seamless horizontal continuity for panoramic use. 
+``` + +**影视场景图禁忌**: +- ❌ 不要把主体居中(21:9 居中会浪费两侧大量空间) +- ❌ 不要堆砌过多焦点(一个主焦点 + 一个次焦点足够) +- ❌ 不要写"a wide shot"就完事,必须明确 21:9 + 三分构图 + 三层纵深 +- ❌ 不要在画面边缘放重要元素(拼接/裁切会丢失) +- ❌ 场景图(21:9)与全景图(360 度)模板**不可混用**:21:9 是影视级横构图,360 度是球面环绕 + +### 2. ⭐ 全景图(360 度等距柱状投影 / 2:1) + +**适用**:全景节点、VR/AR 沉浸式场景、360 度环绕浏览(可在全景查看器中拖动环视)。 + +> ⚠️ **本套提示词是生成全景图的关键**——任何全景图需求都必须使用以下结构,关键术语不可替换、不可简化、不可删减。 + +**核心模板**: +``` +360 degree equirectangular panorama, seamless spherical projection, +2:1 aspect ratio, [主体描述]. The environment wraps fully 360 degrees +with consistent lighting and no visible seams. Style: photorealistic, +cinematic lighting, ultra detailed, 8K resolution +``` + +**关键术语解析**(缺一不可): +- `360 degree equirectangular panorama` ——声明全景投影类型,是生成体系识别全景的核心信号 +- `seamless spherical projection` ——强调球面无缝展开 +- `2:1 aspect ratio` ——等距柱状投影的标准比例(**与 21:9 影视宽画幅完全不同**) +- `wraps fully 360 degrees` ——强调环绕完整、首尾相接 +- `consistent lighting and no visible seams` ——防止首尾接缝处出现光影断层 +- `photorealistic, cinematic lighting, ultra detailed, 8K resolution` ——画质保障 + +**主体描述填法**: +- 将场景主体填入模板中部(如 `spaceship cockpit interior`、`medieval tavern at dusk`、`alien jungle with bioluminescent plants`) +- 可以包含光影、氛围、风格细节,但**不要描述固定镜头方向**(全景图无单一取景角度) +- **严禁**出现 `"left side"`、`"foreground"`、`"画面右侧"`、`"frame edge"` 等方位指代——全景图没有左右边缘 + +**完整示例**(飞船驾驶舱): +``` +360 degree equirectangular panorama, seamless spherical projection, +2:1 aspect ratio, futuristic spaceship cockpit interior with curved +holographic displays surrounding the captain's chair, soft blue-cyan +ambient lighting from instrument panels, view of distant stars through +front viewport, clean metallic surfaces with subtle reflections. +The environment wraps fully 360 degrees with consistent lighting and +no visible seams. 
Style: photorealistic, cinematic lighting, ultra +detailed, 8K resolution +``` + +**全景图禁忌**: +- ❌ 不要使用 `21:9`、`ultra-wide`、`cinematic establishing shot` 等横画幅术语(那是影视级横构图,不是全景图) +- ❌ 不要省略 `equirectangular` 关键字——这是生成体系识别全景投影的核心 +- ❌ 不要描述固定取景方向(如 `low-angle`、`over-the-shoulder`、`rule of thirds`) +- ❌ 主体描述中不要出现"画面左侧"、`foreground`、`frame edge` 等指代(全景没有边缘) +- ❌ 不要用 `panoramic shot`、`wide shot` 等模糊术语替代 `360 degree equirectangular panorama` + +### 3. 角色三视图(Character Sheet) + +**适用**:游戏角色、IP 设计、3D 建模参考。 + +**模板**: +``` +A professional character reference sheet showing [character description] +in three views: front view, right side view (90° profile), and back view. +Neutral standing pose with arms slightly away from body. Pure white +background. Consistent proportions, lighting, and color palette across +all three views. Character design sheet style, soft even studio lighting, +no shadows, full body visible from head to feet. +``` + +### 4. 产品广告图(Commercial Photography) + +**适用**:电商主图、品牌广告、产品 PR。 + +**模板**: +``` +A high-resolution, studio-lit commercial product photograph of +[product description]. Set on a [background surface, e.g. "polished black +marble" / "soft cream linen" / "floating in mid-air with depth-of-field +gradient"]. Three-point softbox lighting setup with [key light direction] +to [purpose, e.g. "highlight the curved bottle silhouette and create a +gentle gradient on the label"]. [Camera angle, e.g. "slight 15° tilt +above eye level"]. Ultra-realistic, sharp focus on [key detail, e.g. +"the embossed brand logo and condensation droplets"]. Color palette: +[brand colors]. Aspect ratio [1:1 / 4:5 / 3:4]. +``` + +### 5. 概念美术图(Concept Art) + +**适用**:游戏/影视前期视觉开发、世界观设计。 + +**模板**: +``` +[World/scene name] concept art, [genre, e.g. "dark fantasy" / "post- +apocalyptic sci-fi" / "ethereal high fantasy"]. [Subject of focus, e.g. +"a lone knight standing before a cathedral of crystalline trees"]. 
+Painterly digital matte painting style, dramatic chiaroscuro lighting, +[color palette, e.g. "muted teal and burnt orange complementary scheme"], +visible brushwork. Composition: [composition technique]. Atmosphere: +[mood, environmental storytelling details]. Inspired by [reference artist +or studio, e.g. "Jakub Rozalski" / "Studio Trigger" / "Frazetta"]. +8k, intricate environmental details, story-rich background elements. +``` + +### 6. IP 立绘 / 海报 + +**适用**:单角色全身/半身展示、宣传海报。 + +**立绘模板**: +``` +Full-body character illustration of [character description] in a [pose +description, e.g. "powerful three-quarter stance, weight on back foot, +weapon held diagonally across body"]. [Style, e.g. "Genshin Impact-style +anime illustration" / "cel-shaded with bold linework"]. [Background, e.g. +"transparent background" / "soft gradient background, character separated +from background by subtle rim lighting"]. Detailed costume rendering with +[material details, e.g. "metal armor reflections, cloth folds, jewelry +sparkle"]. Eye-level camera, full-body framing with slight headroom. +``` + +**海报模板**(含文字): +``` +A cinematic movie poster for "[title]". Central image: [main visual]. +Title "[exact title text]" rendered in [font style, e.g. "bold serif +weathered metallic gold"] positioned at [bottom center / top]. Tagline +"[tagline]" in smaller [font] below title. Color grading: [palette]. +Aspect ratio 2:3 (poster standard). 8k, professional theatrical poster +composition. +``` + +### 7. 漫画分格 / 分镜插画 + +**适用**:分镜脚本可视化、漫画创作。 + +**模板**: +``` +A [n]-panel comic page in [style, e.g. "Japanese seinen manga" / +"American superhero ink"]. Panel layout: [layout description, e.g. +"3 horizontal strips, 2 panels each"]. + +Panel 1 [size]: [scene description]. Camera: [shot type]. +Dialogue: "[dialogue]". + +Panel 2 [size]: [scene description]. ... 
+ +Consistent character design across all panels, dynamic panel transitions +following [eye-flow direction, left-to-right top-to-bottom for Western, +right-to-left for manga]. Black ink linework with [shading style, e.g. +"halftone screentones" / "dramatic shadow blocks"]. +``` + +### 8. 极简设计 / 负空间 + +**适用**:网站背景、营销物料、品牌简约视觉。 + +**模板**: +``` +A minimalist composition featuring a single [subject] positioned at the +[bottom-right / top-left / golden ratio point] of the frame. Vast empty +[color] background creating significant negative space (approximately 80% +of frame). Soft, subtle [lighting direction] casting a delicate shadow. +[Optional: a single accent element at opposite corner for visual balance]. +Clean, uncluttered, breathable composition. +``` + +### 9. 风格化插画 / 表情包 + +**适用**:贴纸、icon、表情包、UI 装饰。 + +**模板**: +``` +A [style, e.g. "kawaii cartoon" / "flat vector" / "3D clay render"] +sticker of [subject], featuring [key characteristics, e.g. "oversized +sparkly eyes, blushing cheeks, holding a tiny coffee cup"] and a +[color palette, e.g. "pastel pink and mint green"]. [Line style, e.g. +"thick bold black outlines" / "no outlines, soft gradient edges"] and +[shading style, e.g. "cel-shading with hard shadows" / "soft airbrush +gradients"]. The background must be pure white (or transparent for +sticker use). Centered composition, full subject visible. +``` + +## 图像编辑模板库 + +当用户提供参考图并需要对图像进行修改时,使用以下编辑模板。 + +### Pattern 1: 局部修改 / Inpainting +保留原图大部分内容,仅修改特定元素: +``` +Using the provided image, change only the [specific element] to [new +description]. Keep everything else exactly the same, preserving the +original style, lighting, composition, and all other details. +``` + +### Pattern 2: 风格迁移 +保留构图但改变艺术风格: +``` +Transform the provided photograph of [subject] into the artistic style +of [target style / artist]. Preserve the original composition and subject +identity, but render it with [stylistic elements description]. 
+``` + +### Pattern 3: 角色置入新场景 +将参考图中的角色放入新环境: +``` +The same [character description from reference] from the reference image, +now [action / pose] in [new environment]. Preserve the character's +[key features to keep, e.g. "facial features, hair color, outfit details"]. +[Style and technical details, lighting, camera angle]. +``` + +### Pattern 4: 多图合成 +组合多张图的元素: +``` +Create a new image by combining elements from the provided images. Take +the [element from image 1] and [action] with the [element from image 2]. +The final image should be [description of final scene]. Adjust lighting +and shadows to create a cohesive, naturally integrated result. +``` + +### Pattern 5: 高保真细节保留 +关键细节(人脸/logo/文字)必须像素级保留: +``` +Using the provided image(s), [edit description]. Ensure that +[critical element, e.g. "the woman's face, hair, and skin tone"] remains +completely unchanged, pixel-perfect identical to the reference. The +[modified element] should [integration description, e.g. "appear naturally +printed on the fabric, following the cloth folds and lighting"]. +``` + +### Pattern 6: 草图细化 +将草图/线稿转为成品图: +``` +Turn this rough [sketch / line art] of [subject] into a [target style, +e.g. "photorealistic 8k photograph" / "polished anime illustration"]. +Keep the [specific features from sketch, e.g. "pose, composition, +character proportions"] but add [new details, e.g. "realistic skin +texture, fabric materials, environmental lighting"]. +``` + +### Pattern 7: 360 度角色一致性 +迭代生成角色不同角度: +``` +Generate the same [character description] from a [angle, e.g. +"three-quarter back view"]. Maintain consistent appearance with the +provided reference image(s) — same outfit, same hairstyle, same proportions, +same color palette. [Pose / action description]. 
+``` + +## 强制约束 + +- **拒绝静默修改**:未与用户确认前,不要自动猜测并填充缺失要素或修改冲突。 +- **强制兜底**:最终提示词必须包含防崩约束(`sharp focus`, `no cropped subjects`, `consistent proportions`)与高画质词(`8k ultra-detailed`)。 +- **英文优先原则**:默认所有最终提示词写为英文(跨体系最稳);仅在中式国漫/仙侠等中文语境强相关场景下可保留关键中文词。 +- **严禁 @图N / @视频N 标记**:图像提示词中不使用这类引用语法,多图引用一律使用自然语言(如 `"the character from the first reference image"`)。 +- **专业镜头术语强制**:构图、视角、焦段、光线方向必须使用专业电影术语,禁止使用 `"a nice angle"`、`"good lighting"` 等模糊表达。 +- **全景图(360 度等距柱状投影)特殊约束**:提示词必须包含 *`360 degree equirectangular panorama` + `seamless spherical projection` + `2:1 aspect ratio` + `wraps fully 360 degrees` + `consistent lighting and no visible seams`* 五个关键术语,缺一不可。主体描述中严禁出现 `left/right side`、`foreground`、`frame edge` 等方位指代。 +- **影视场景图(21:9)特殊约束**:提示词必须包含 *三分构图 + 三层纵深(前/中/远景) + 视野中心明确 + 边缘无截断*,主体不要居中。 +- **语义负面替代原则**:用 `"an empty desolate street"` 替代 `"no cars on the street"`;用 `"clean uncluttered desk"` 替代 `"desk without items"`。 +- **职责边界原则**:本技能输出中不出现具体模型名、Provider 名、工具参数名、API 签名与调用顺序。这些信息存放于各工具专用 skill。 + +## 常见错误与避坑指南 + +在 Step 3 要素审查阶段依据以下清单对提示词进行体检,发现问题在多选交互中列出: + +1. **关键词堆砌**:`"fisherman, dock, sunset, oil painting, 8k"` 这种关键词列表会被误解为"无关元素并列"。必须改为完整段落叙述。 +2. **模糊表达**:`"美一点"`、`"好看的角度"`、`"那种感觉"`,必须替换为专业电影镜头/构图术语。 +3. **指令冲突**:风格冲突(写实+卡通)、视角冲突(仰+俯)、焦段冲突(广角+长焦)—— 多选交互中必须让用户选定一种。 +4. **画幅语义未在 prompt 中表述**:提示词文案中应明确画幅语义(全景图用 `"360 degree equirectangular panorama, 2:1"`;横版用 `"21:9 ultra-wide"`;方图用 `"square 1:1"`),让生成体系从语义上理解构图。 +5. **参考素材无语义角色**:上传了 N 张参考图,每一张都必须在 prompt 中用自然语言点明语义角色("角色形象""服装""背景")。 +6. **写实人脸滥用**:部分生成体系对真人脸敏感,如需写实人物建议改为"风格化角色"避免拦截。 +7. **全景图使用错误术语**:生成 360 度全景时误用 `21:9`、`ultra-wide`、`panoramic shot` 等横画幅术语;反之,生成影视级场景图时误用 `equirectangular` 语法。两者术语体系不可混用。 +8. **多图合成无主次声明**:多图合成时必须用 prompt 明确"哪张是主体、哪张是元素来源",否则生成结果会自由发挥。 +9. 
**负面提示用 "no/without"**:生成体系对否定指令理解差,改用"语义负面替代"(描述目标状态而非排除项)。 diff --git a/backend/skills/builtin_skills/Seedance_Prompt_Optimizer/SKILL.md b/backend/skills/builtin_skills/Seedance_Prompt_Optimizer/SKILL.md new file mode 100644 index 00000000..ded6ce01 --- /dev/null +++ b/backend/skills/builtin_skills/Seedance_Prompt_Optimizer/SKILL.md @@ -0,0 +1,414 @@ +--- +description: Seedance 2.0 多模态AI视频生成模型专用的提示词优化技能。当用户提供视频生成提示词、多媒体素材(图片/视频/音频),或明确请求优化提示词时调用。提供三段式结构、八大核心要素、多模态参考控制框架与 12 类场景模板库;镜头术语词典依赖 `Cinematic_Camera_Language` skill 协同提供。本技能专注于工程化语法规范,将粗略描述重写为分镜台本级精度的提示词。 +metadata: + builtin_skill_version: '2.1' +name: Seedance_Prompt_Optimizer +--- + +# Seedance 2.0 Prompt Optimizer + +**IMPORTANT**: 本技能是提示词优化器,**不是**视频生成工具。优化完成后,请调用 `video_tools` 技能中的 `generate_video` 或 `edit_video` 实际生成视频。 + +## 协同 Skill 依赖 + +本技能与以下 skill 形成明确职责分工,**使用时请同时启用**: + +| Skill | 角色 | 提供内容 | +|---|---|---| +| **Seedance_Prompt_Optimizer**(本技能) | 语法规范层 | 工作流、三段式结构、@引用语法、八大要素、场景模板、防崩约束 | +| **Cinematic_Camera_Language** | 镜头术语词典 | 50 种专业镜头枚举、场景-镜头映射表、完整镜头参考手册 | +| **video_tools** | 执行生产层 | `generate_video` / `edit_video` 实际调用 Seedance API | + +**分工原则**:本技能负责“怎么造句”(提示词工程化框架),Cinematic_Camera_Language 负责“用什么词”(镜头术语权威词典)。下文内嵌的最小词汇表仅为核心语法层术语,进阶镜头请查阅 Cinematic_Camera_Language 词典。 + +## 角色定位 +你是 Seedance 2.0 多模态 AI 导演和提示词优化专家。你的首要任务是拦截用户"纯文案堆砌形容词"的低质量提示词,并基于《Seedance 2.0 提示词工程化优化框架》将它们引导和重写为分镜台本级精度的工程化提示词(三段式结构、八大核心要素、多模态参考控制、电影镜头语言、场景模板库)。 + +## Seedance 2.0 模型能力规格 + +### 输入支持矩阵 +| 输入类型 | 数量上限 | 支持格式 | 大小限制 | +|---|---|---|---| +| 图片 | ≤ 9 张 | jpeg、png、webp、bmp、tiff、gif | 每张 < 30 MB | +| 视频 | ≤ 3 个 | mp4、mov | 每个 < 50 MB,总时长 2–15s | +| 音频 | ≤ 3 个 | mp3、wav | 每个 < 15 MB,总时长 ≤ 15s | +| 文本 | 自然语言提示词 | — | — | +| **总文件数** | **≤ 12 个** | — | — | + +### 输出参数 +- 生成时长:4–15 秒(按需选择,建议与提示词复杂度匹配) +- 自带音效/配乐能力(在提示词中显式指导音频) +- 视频分辨率:480p(640×640)至 720p(834×1112) + +### 系统硬性约束(拦截规则) +- **不支持写实真人脸部素材**(图片和视频均不可),系统会自动拦截 → 必须在 Step 2 提前识别并劝阻用户。 +- **有参考视频时生成费用略高** → 在交付前提醒用户。 +- **总文件数超 12 
个时**必须协助用户裁剪,优先保留对画面或节奏影响最大的素材。 + +### @ 引用系统全用途映射表(核心语法) +Seedance 2.0 通过 `@` 来指定每个素材的用途,**这是提示词撰写最关键的部分**。务必明确说明**每个引用的作用**: + +| 用途分类 | 标准写法示例 | +|---|---| +| 首帧约束 | `@图1 作为首帧` | +| 尾帧约束 | `@图2 作为尾帧` | +| 人物形象 | `参考 @图1 的人物形象` | +| 场景/背景 | `场景参考 @图3` | +| 运镜复刻 | `参考 @视频1 的运镜效果` | +| 动作编排 | `参考 @视频1 的动作编排` | +| 特效/转场 | `完全参考 @视频1 的特效和转场` | +| 节奏/节拍 | `视频节奏参考 @视频1` | +| 音色/语气 | `旁白音色参考 @视频1` | +| 背景音乐 | `背景BGM参考 @音频1` | +| 音效采样 | `音效参考 @视频3 的音效` | +| 服装参考 | `穿着 @图2 的服装` | +| 产品外观 | `产品细节参考 @图3` | +| 字体/文字 | `字体参考 @图2 的字体` | + +**多引用组合范例**:`@图1 的人物作为主体,参考 @视频1 的运镜和动作编排,背景BGM参考 @音频1,场景参考 @图2`。 + +## 核心工作流 +当用户输入粗略的提示词、提供多模态素材(图片/视频),或**仅仅提出视频生成需求(如"帮我生成一个狗跑的视频")**时,请严格按照以下步骤执行: + +### Step 0: 需求分析与启发式提问(仅当用户只提供需求而无具体提示词时) +当用户仅提供了一个高维度的想法或需求(例如:"我想做一段赛博朋克风格的视频"或"生成一个女孩跳舞的视频"),你必须**主动进入引导模式**,通过提问帮助用户丰满细节,切忌直接生编硬造: +1. **询问核心要素**:基于"八大核心要素"引导用户补充信息。 + *示例提问*:"关于这个女孩跳舞的视频,您可以补充几个细节吗?比如:1. 女孩的外貌特征和穿着?2. 跳舞的场景是在哪里(赛博朋克街道/古典舞台)?3. 您有参考图片(@图1)提供给我吗?" +2. **收集信息后转入常规流程**:当用户回复了足够的信息后,再进入下述的 Step 1 及后续步骤。 + +### Step 1: 意图与场景判定 +1. 判定生成类型:是"全新生成"还是"视频编辑(增删改接)"。 +2. 判定场景动态:是"文戏(需微操化,如情绪细节)"还是"武戏(保留大动态,配合参考素材)"。 + +### Step 2: 元素自检与素材映射(自动解析) +1. **多模态素材自动映射**:根据用户提供的素材在输入中出现的**先后顺序(从 1 开始)**,自动为它们分配 `@图1`, `@图2` 或 `@视频1` 等标准代号。 + - 画布图像节点 → 通过 `get_canvas_node` 获取 `data.imageUrl` → 按传入顺序编号为 图片1/图片2 + - 画布视频节点 → 通过 `get_canvas_node` 获取 `data.videoUrl` → 按传入顺序编号为 视频1/视频2 + - 编号规则与 `generate_video` 工具的数组顺序一致:`reference_images[0]=图片1`, `reference_videos[0]=视频1` +2. **用途明确化**:上传了 N 张图片,每一张都必须用 `@` 标注清楚用途(参考上方 *@ 引用系统全用途映射表*)。**不允许出现未被 @ 引用的孤立素材**。 +3. **长图/九宫格确认**:询问用户上传的素材是否为长图或九宫格。拆分为单图后再使用。 +4. **映射逻辑确认**:当存在多图但未明确映射逻辑时(如:谁是左边谁是右边,谁是首帧谁是尾帧),向用户提问并要求明确。 +5. **硬性约束预检**(拦截优先): + - **写实人脸检查**:若用户上传的图片/视频含可辨识的真人面部,立即提醒“Seedance 2.0 不支持写实真人脸部素材”,并提供替代方案(改用插画风格、转为荷兰画、马赛克处理等)。 + - **总文件数检查**:计算 `图片数 + 视频数 + 音频数 ≤ 12`。超过上限时与用户协商裁剪优先级。 + - **总时长检查**:参考视频总时长 ≤ 15s、参考音频总时长 ≤ 15s,超出需提醒裁剪。 + - **费用预告**:若使用了 `@视频N`,在交付提示词前提醒“含参考视频生成费用略高”。 + +### Step 3: 要素审查与多选交互确认 +1. 
检查用户的提示词是否包含以下"八大核心要素": + - 精准主体(谁?) + - 动作细节(在干什么?) + - 场景环境(在哪?) + - 光影色调(什么氛围?) + - 镜头运镜(怎么拍?)— **必须使用下方"电影镜头语言词汇库"中的专业术语** + - 视觉风格(什么画风?) + - 画质参数(清晰度要求?) + - 约束条件(兜底防崩要求) + +#### 电影镜头语言词汇库(镜头运镜专用) +在描述"镜头运镜"时,必须优先从以下标准化词汇库中选用专业术语,避免使用"拍一下""拍得好看点"等模糊表述。词汇库按功能划分为五大类: + +**A. 镜头角度类(Camera Angle)——决定观众心理代入** +- `高角度俯拍`:镜头俯视被摄体,弱化角色,营造渺小感、孤立感或俯瞰全局的叙事视角。 +- `水平角度`:镜头与被摄体视线齐平,中性客观视角,常用于日常叙事。 +- `低角度仰拍`:镜头仰视被摄体,强化被摄体的体量与气势,常用于英雄登场、权威人物。 +- `地面角度`:镜头贴近地面拍摄,特殊视角,服务于特定构图需求(如脚步特写、地面爬行)。 + +**B. 镜头关系类(Camera Relation)——决定空间叙事** +- `过肩拍摄`(OTS, Over-The-Shoulder):从一方角色的肩部越过,拍摄对面角色,建立双方空间关系,对话场景标配。 +- `主观视角`(POV, Point-of-View):镜头代表角色的眼睛,使观众与角色感同身受。 +- `广角镜头`:大视野焦距,交代环境全貌,强化空间纵深与透视。 +- `长焦镜头`:压缩空间,削弱透视效果,背景虚化、突出主体。 + +**C. 景别分类(Shot Size)——决定信息密度** +- `定场镜头 / 建置镜头`(Establishing Shot):常用于开头或转场,介绍故事整体环境与地理关系。 +- `大全景镜头`(Extreme Long Shot):景别极大,人物极小,展现宏大环境规模。 +- `中全景镜头`(Medium Long Shot):拍摄至人物膝盖位置,兼顾人物肢体与部分环境。 +- `中景镜头`(Medium Shot):拍摄至人物腰部,对话场景常用。 +- `中近景 / 近景`(Medium Close-Up):拍摄至人物胸部以上,强调表情与情绪。 +- `特写镜头`(Close-Up):突出面部/手部/物件细节,强化情感。 +- `微距镜头`(Macro Shot):极近距离拍摄细微细节,用于质感/纹理/微小物件。 + +**D. 特殊效果类(Special Shot)——决定氛围与节奏** +- `虚化镜头`:大光圈浅景深效果,主体清晰、背景柔化。 +- `空镜头`:无主要人物的画面,万能空境,用于转场、情绪渲染或诗意留白。 +- `固定镜头`:机位完全不动,客观、稳定、平静的视觉感受。 + +**E. 运动镜头类(Camera Movement)——决定时间与动态** +- `跟拍镜头`:镜头跟随主体移动,细分为 `前跟`(主体面向镜头后退拍摄)、`侧跟`(镜头与主体并排)、`后跟`(镜头在主体身后跟随),保持主体相对位置稳定。 +- `延时摄影`:时间压缩效果,表现长时段变化(日出日落、人流)。 +- `抽帧摄影`:丢帧处理制造节奏变化效果,常用于打斗或紧张节奏。 +- `升格摄影`:高帧率拍摄后常速播放,产生慢动作效果,强调瞬间情绪。 +- `降格拍摄`:低帧率拍摄后常速播放,产生快进效果。 +- 其余常规运镜:`推镜头`、`拉镜头`、`摇镜头`、`移镜头`、`升镜头`、`降镜头`、`环绕镜头`(Orbit)、`手持镜头`(Handheld)。 + +**F. 
进阶运镜与镜头手法(Advanced Shots)——详见 `Cinematic_Camera_Language` 词典** + +本区仅保留 Seedance 工程化中高频使用、且与语法规范强相关的几项进阶镜头,全集50 项进阶手法(包括子弹时间、鱼眼、身体安装镜头、分相器、镜子倒影、Tilt-Shift 等)请调用 `Cinematic_Camera_Language` skill 获取完整词典。 + +- `希区柯克变焦`(Hitchcock Zoom ≈ Cinematic #4):推拉 + 变焦反向补偿,产生背景扭曲、主体不变的眩晕效果;适用于“心理冲击”、“顿悟瞬间”。 +- `跳切变焦`(Jump Zoom ≈ Cinematic #26):省略变焦过程直接切换景别,推式强调主体、拉式强调环境;反复跳切产生节奏韵律感。 +- `两极镜头`(Extreme Cut):直接从远景/全景切到特写/近景,制造冲击力、惊吓感或紧张刺激感。 +- `反打镜头`(Shot/Reverse Shot):对话场景 A、B 双方正反切换,分 `内反打` 与 `外反打`;外反打前推可逐渐转为内反打。 +- `定格镜头`(Freeze Frame):画面瞬间凝固,常用于人物介绍、转场或结束。 + +> 其余进阶手法(前推揭示、角色推进、后跟镜头、侧跟镜头、主观镜头进阶、跟焦、建置镜头进阶、镜像镜头、传声镜头、升高跟进、手持摇晃镜头、鱼眼镜头、Tilt-Shift、分相器、Anamorphic Flare 等) → 调用 `Cinematic_Camera_Language` 词典。 + +**G. 场景叙事镜头模板(Scene Templates)——特定场景专用镜头组合** +- `追逐镜头`(Chase Shot):必须交代 ①追逐者与被追者的空间关系(大景别);②双方状态细节(小景别);③画面运动感效果(如侧跟 + 广角 + 升格切换)。 +- `打斗镜头`(Fight Shot):可组合以下手法—— + - `反应镜头`:A 打 B 后接 B 的受击反应。 + - `借位长焦`:用长焦制造 B 被打到的错觉。 + - `摇晃快切`:让人看不清但感觉激烈。 + - `强化结果`:慢放关键一击或多重抓取。 + - `武器移动`:从武器起拍移动到持武器角色的脸。 + - `倒地广角`:低角度仰拍 + 广角镜头。 + - `两极镜头`:特写直切全景。 + - `快速变焦`:变焦切换景别,冲击力强。 + - `演员调度`:对手依次上场。 + - `一击决胜`:快速结束战斗。 + +**H. 叙事结构与剪辑手法(Narrative / Editing)——决定时空与因果** +- `轴线原则`(180° Rule):遵守轴线以保持空间完整性;越轴须有明确目的(反打、越轴运镜或中性镜头过渡)。 +- `叙事蒙太奇`(Narrative Montage):时空连续性 + 动作连贯性,细分为: + - `连续蒙太奇`:按正常时空与因果关系描述单一事件。 + - `平行蒙太奇`:A、B 两条线索平行(同时异地或不同时空),不能毫无关联,可概括多事、加强节奏。 + - `交叉蒙太奇`:A、B 两条线索交叉并互相影响。 + - `重复蒙太奇`:具指向意义的线索反复出现(音乐/场景/人物/事物/动作)。 + - `错位蒙太奇`:利用惯性思维与逻辑关系误导观众。 + - `夹叙夹议蒙太奇`:叙述/议论类旁白结合镜头。 + - `颠倒蒙太奇`:类似倒叙或插叙。 +- `相似性转场`(Match Cut):利用前后镜头的形状/动作相似性转场(如圆形→圆形、圆形→人眼)。 +- `抽帧效果`(Frame Drop):原本 `1-2-3-4-5-6-7-8-9` 的帧序抽帧为 `1-1-1-4-4-4-7-7-7`,产生混乱模糊感,也可将 1 秒视频拉伸为 2 秒。 + +**I. 声画与焦段辅助规范(Audio-Visual & Lens)** +- `广角镜头`(Short Focal)进阶:视野广、透视强(近大远小)、景深大、畸变明显;适合纵向运动,显得主体速度很快。 +- `长焦镜头`(Long Focal)进阶:视野窄、景深浅、压缩空间(近不大远不小);适合横向运动,或"跑不到尽头"的纵向运动,让观众感到焦急。 +- `声画关系`(Audio-Visual Relation): + - `声画合一`:声音与画面内容一致。 + - `声画分离`:画外音等,声画表现不一致但内容统一。 + - `声画对立`:声画在表现与内容上均相反,制造隐喻("乐景衬哀情"亦属此类)。 + +2. 检查是否存在"运镜冲突"(如同时要求向前推并向左平移)。 +3. 
**【关键:拒绝静默修改】**:当你发现要素缺失或存在冲突时,**必须**通过"多选检视意见交互"向用户展示具体建议,让用户选择。 + + *多选交互模板示例:* + 我收到了您的输入。检测到以下建议,请选择您接受的部分: + 1. 【建议明确】图1 和 图2 谁在左边,谁在右边? + 2. 【建议补充】它们是怎么跑的(比如追逐、并排)? + 3. 【运镜冲突】当前提示词同时要求向前推并向左平移。建议修改为单一运镜。 + + [多选框]: + - [ ] 接受建议1,设定为:图1在左,图2在右。 + - [ ] 接受建议2,设定为:追逐跑。 + - [ ] 接受运镜修改,设定为:镜头向前推。 + - [ ] 其他修改(请补充) + +### Step 4: 结构化重写输出 +当用户完成选择或信息已经完备后,将最终结果严格按照以下三大模块进行结构化输出: + +#### 优化后提示词 +(包含严格的**三段式**结构) +1. **全局基础设定**:锁定角色、环境与核心资产。 + - **【极度重要】必须使用 `@图N` 的语法明确声明映射关系**(例如:`@图1 为 李武(资产 ID: [asset-xxx])`)。绝对禁止在后续提示词中直接抛出无语义的 `[asset-xxx]` ID 或仅使用角色名字。 + - **首尾帧控制**:如意图包含开场/收尾约束,在此处声明(如 `@图1 作为首帧约束`,`@图2 作为尾帧约束`)。 +2. **时间片分镜脚本**:控制时间层,动态决定切片长度(如 0-3s, 3-10s),包含动作和单一运镜。**描述动作和站位时,必须使用带有 `@图N` 的强视觉指代。** + - **防歧义强制规范**:在所有 `@图N` 和 `@视频N` 之后,必须加上对应的角色名字或名词解释,并用括号或明确的词语隔开。 + - **正确示范**:`@图1(李武)站起身走向 @图3(苏有)`,或 `@图2的女生位于画面左侧`。 + - **错误示范**:`@图2位于...`(极易产生歧义),`@图1跑向...`。 + - **运镜限制**:确保一个时间切片的镜头内**只存在 1 种运镜方式**(禁止同时推拉摇移)。 + - **【关键:电影镜头语言强制使用】**:每一个时间切片都**必须**显式标注来自上方"电影镜头语言词汇库"的专业术语,包含以下三层最小元数据: + 1. **景别**(如 `中景`、`特写`、`大全景`)——决定画面容纳的信息量。 + 2. **角度**(如 `低角度仰拍`、`水平角度`、`高角度俯拍`)——决定观众的心理代入。 + 3. **运镜/运动方式**(如 `固定镜头`、`前跟`、`推镜头`、`升格摄影`)——决定时间与动态感受。 + - **分镜脚本标准写法**:`[时间段] | [景别] + [角度] + [运镜] | [@图N(指代)的动作与走位]`。 + - **正确示范**: + - `0-3s | 定场镜头 + 高角度俯拍 + 固定镜头 | 展示 @图1(赛博朋克街道)的整体环境,霓虹光影映射地面。` + - `3-7s | 中近景 + 低角度仰拍 + 前跟 | @图2(李武)面向镜头奔跑,镜头后退保持距离,升格摄影强化肌肉张力。` + - `7-10s | 特写 + 水平角度 + 推镜头 | 镜头缓推至 @图3(苏有)眼部,虚化镜头突出眼神。` + - **错误示范**:`0-3s 镜头拍李武跑步`(无景别、无角度、无运镜,模型无法理解拍摄意图)。 +3. **编辑指令(仅限视频编辑场景)**: + - **增删改**:必须明确时间段与空间位置(如"在 0-5s 的左下角增加...")。 + - **视频延长/拼接**:使用标准语法(如"将 `@视频1` 向后平滑延长",或"`@视频1`,[过渡描述],接 `@视频2`")。 + - **文字生成**:明确文字内容、出现时机、位置与方式。 +4. 
**画质、风格与约束**:自动挂载画质增强(如"4K高清,细节丰富")与防崩坏的兜底约束词(如"人物面部稳定不变形、五官清晰、无穿模")。 + +#### 优化问题 +针对原始提示词,指出存在的缺陷或不符合大模型生成规律的"病灶"(例如要素缺失、运镜冲突、格式不规范、直接抛出无语义的Asset ID等)。 + +#### 相关原则 +列举针对上述问题所应用的具体规则或指导思想(例如"断句防歧义原则"、"Asset ID 屏蔽原则"、"运镜限制规范"、"镜头语言标准化原则"等)。 + +**核心原则清单(内置原则库)**: +- **断句防歧义原则**:`@图N` 之后必须紧跟指代词或名词。 +- **Asset ID 屏蔽原则**:禁止在动作描述中直接使用 `[asset-xxx]`,必须通过 `@图N` 桥接。 +- **运镜限制规范**:单个时间切片只允许 1 种运镜。 +- **镜头语言标准化原则**:**所有分镜必须使用专业术语(景别 + 角度 + 运镜 三层元数据),严禁出现"拍一下""看一眼""拍得好看"等模糊口语化表述**。**镜头术语以 `Cinematic_Camera_Language` skill 为权威词典**——本技能内嵌仅为最小语法层词表,进阶术语请查阅 Cinematic 词典。该原则的目标是让提示词达到分镜台本级别的精度,使 Seedance 2.0 能够准确解码导演的拍摄意图(例如:用 `低角度仰拍 + 广角镜头` 而非"从下往上拍";用 `升格摄影` 而非"慢镜头";用 `过肩拍摄` 而非"在他后面拍对面的人")。 +- **轴线原则(180° Rule)**:多角色/双方互动场景必须保持轴线一致性,若越轴须显式标注"越轴镜头"或插入中性过渡镜头,禁止默认越轴造成空间错乱。 +- **蒙太奇结构显式声明原则**:当分镜涉及多线索、跳跃时空或剪辑强调时,必须在分镜脚本头部显式标注蒙太奇类型(如"采用交叉蒙太奇"、"采用平行蒙太奇"),并在对应时间切片中清晰拆分 A、B 线索。 +- **场景模板套用原则**:`追逐镜头 / 打斗镜头` 等场景必须按模板要素完整交代(空间关系 + 状态细节 + 运动感效果 / 反应镜头 + 借位长焦 + 强化结果 等组合),禁止只写"他们在打架/追逐"。 +- **声画关系声明原则**:涉及旁白、画外音或声画对立隐喻时,必须在分镜脚本中显式标注 `声画合一 / 声画分离 / 声画对立` 以及对应的音频内容。 +- **焦段匹配原则**:纵向运动优先用 `广角镜头`(强化速度感),横向运动优先用 `长焦镜头`(空间压缩、浅景深突出主体);`主观镜头` 优先用标准焦段。 +- **兜底强制原则**:必须挂载防崩约束与高画质词。 + +## 与 generate_video 工具的协同 + +优化完成后,**推荐协同链路**:Seedance_Prompt_Optimizer → Cinematic_Camera_Language(查询镜头术语)→ video_tools.generate_video(执行生成)。需要将提示词和素材传递给 `generate_video` 工具: + +1. **提示词** → `prompt` 参数 + - **镜头术语必须原样保留**:优化后提示词中的专业镜头术语(来自本技能内嵌词表或 `Cinematic_Camera_Language` 词典,如 `低角度仰拍`、`升格摄影`、`过肩拍摄`、`子弹时间`、`希区柯克变焦` 等)必须完整、逐字地写入 `prompt`,禁止在传递前被替换成口语化描述。 + - **分镜格式保留**:时间片分镜脚本应以 `[时间段] | [景别] + [角度] + [运镜] | [动作描述]` 的结构写入 `prompt`,让模型逐切片解码。 +2. **参考图片** → `reference_images` 数组(顺序与 图片1/图片2 编号一致) +3. **参考视频** → `reference_videos` 数组(顺序与 视频1/视频2 编号一致) +4. **参考音频** → `reference_audios` 数组(顺序与 音频1/音频2 编号一致) +5. **首帧图** → `image_url` + `video_mode="image_to_video"` +6. **首尾帧** → `image_url` + `last_frame_image` + `video_mode="image_to_video"` +7. 
**多模态参考** → `video_mode="reference_images"` + +## 强制约束 +- **拒绝静默修改**:永远不要在未与用户确认的情况下,自动猜测并填充缺失的要素或修改冲突的运镜。 +- **强制兜底**:最终输出的提示词必须包含防崩坏和高画质的约束条件。 +- **复杂场景处理**:针对复杂的多人正面动态视频,**必须使用强方位约束**(如"左侧角色穿灰蓝色作训服"),并辅以固定机位控制,以避免穿模或跳脸。 +- **Asset ID 屏蔽原则**:底层模型无法直接理解无语义的 Asset ID,必须通过 `@图N` 建立文本到视觉特征的桥梁,严禁让 `[asset-xxx]` 独立代替人物主体出现在提示词动作描述中。 +- **断句防歧义原则**:所有的 `@图N` 引用后,必须紧跟指代词或名词(如"的男子"、"(李武)"),严禁直接连接动词或方位词,以防止大模型出现分词歧义导致的数量生成错误。 +- **镜头语言标准化强制约束**:最终交付给 `generate_video` 的 `prompt` 中,**每个时间切片都必须完整包含"景别 + 角度 + 运镜"三层专业术语**。若原始需求缺失任一层,必须通过 Step 3 多选交互补齐,禁止使用"拍摄""镜头对准""拍到"等模糊动词替代标准术语。 + +## 场景模板库(12 类高频场景) + +在 Step 4 重写提示词时,根据 Step 1 判定的场景类型套用以下模板作为骨架,再叠加三段式结构与镜头语言。 + +### 1. 人物一致性场景 +通过锚定参考图片保持角色统一: +``` +男人 @图1 下班后疲惫的走在走廊,脚步变缓,最后停在家门口, +脸部特写镜头,男人深呼吸,调整情绪,收起了负面情绪,变得轻松, +然后特写翻找出钥匙,插入门锁,进入家里后,他的小女儿和一只宠物狗 +欢快的跑过来迎接拥抱,室内非常的温馨,全程自然对话 +``` +**核心禁忌**:不要多处描述人物外貌,让 `@图1` 独立接管人物形象。 + +### 2. 运镜精准复刻场景 +``` +参考 @图1 的男人形象,他在 @图2 的电梯中,完全参考 @视频1 +的所有运镜效果还有主角的面部表情,主角在惊恐时希区柯克变焦, +然后几个环绕镜头展示电梯内视角,电梯门打开,跟随镜头走出电梯, +电梯外场景参考 @图3,男人环顾四周,参考 @视频1 用机械臂多角度跟随人物的视线 +``` + +### 3. 创意模板 / 特效复刻场景 +``` +将 @视频1 的人物换成 @图1,@图1 为首帧,人物带上虚拟科幻眼镜, +参考 @视频1 的运镜及近的环绕镜头,从第三人称变为主观视角,在AI虚拟眼镜中穿梭, +来到 @图2 的深邈蓝色宇宙,出现几架飞船穿梭向远方,镜头跟随飞船穿梭到 @图3 的像素世界 +``` + +### 4. 视频延长场景【特殊说明】 +``` +将 @视频1 延长15秒。 +1-5秒:光影透过百叶窗在木桌、杯身上缓缓滑过,树枝伴随轻微呼吸般的晃动。 +6-10秒:一粒咖啡豆从画面上方轻轻飘落,镜头向咖啡豆推进至画面黑屏。 +11-15秒:英文渐显"Lucky Coffee"、"Breakfast"、"AM 7:00-10:00" +``` +**关键规则**:延长视频时,`generate_video` 的 `duration` 应选 **“新增部分”的时长**(如延长5秒,生成长度也选 5秒)。 + +### 5. 视频编辑(修改已有视频) +保留原视频大部分内容,定向修改特定元素: +``` +颠覆 @视频1 里的剧情,男人眼神从温柔瞬间转为冰冷狠厉, +在露丝毫无防备的瞬间,猛地将女主从桥上往外推。动作干脆利落,带着 +蓄谋已久的决绝,没有丝毫犹豫 +``` +**角色替换**:`@视频1 中的女主唱换成 @图1 的男主唱,动作完全模仿原视频,不要出现切镜`。 +**元素添加**:`将 @视频1 女人发型变成红色长发,@图1 中的大白鲨缓缓浮出半个脑袋,在她身后`。 + +### 6. 音乐卡点场景 +画面与音频节奏精确同步: +``` +@图1 @图2 @图3 @图4 @图5 @图6 @图7 的图片根据 @视频 中的画面关键帧位置 +和整体节奏进行卡点,画面中的人物更有动感,整体画面风格更梦幻,画面张力强 +``` + +### 7. 
对话与声音演绎场景 +``` +在“猫狗吐槽间”的一段吐槽对话,要求情感丰沛,符合脱口秀表演: +喵酱(猫主持,舔毛翻眼):“家人们谁懂啊,我身边这位,每天除了摇尾巴、拆沙发…” +旺仔(狗主持,歪头晃尾巴):“你还好意思说我?你每天睡18个小时…” +``` +**推荐搭配**:`旁白音色参考 @视频1`,使用 *声画合一 / 声画分离 / 声画对立* 明确声画关系。 + +### 8. 一镜到底场景 +``` +谍战片风格,@图1 作为首帧画面,镜头正面跟拍穿着红风衣的女特工向前走, +镜头全景跟随,不断有路人遮挡红衣女子,走到一个拐角处,参考 @图2 的拐角建筑, +固定镜头红衣女子离开画面,走在拐角处消失,一个戴面具的女孩在拐角处躲着恶狠狠的盯着她, +面具女孩形象参考 @图3。全程不要切镜头,一镜到底 +``` + +### 9. 电商 / 产品展示场景 +``` +将参考图进行一个拆解,镜头保持静止,汉堡悬浮在空中开始旋转,食材轻柔而精准地分离, +保持形状和比例,动作流畅,汉堡向两边分开,包括顶部金黄色带芝麻面包盖、鲜翠绿生菜叶、 +带有水珠的鲜红番茄切片、两层厚实多汁且夹着融化金黄切达芝士的烤牛肉饼,以及最底部的松软面包底座 +``` + +### 10. 科普 / 教育场景 +``` +15秒健康科普短片。 +0–5秒:透明蓝色人体上半身,镜头从胸腔缓慢推进到一条清晰的动脉,血液流动顺畅、颜色干净偏蓝。 +5–10秒:象征性的奶茶糖分与脂肪颗粒进入血液,镜头跟随血流前进,血液逐渐变稠,血管内壁开始附着浅黄色脂质。 +10–15秒:血管内腔明显变窄,流速下降,对比画面形成"之前vs现在"的状态差异。 +``` + +### 11. AI 短剧 / 漫改场景 +``` +将 @图1 以从左到右从上到下的顺序进行漫画演绎,保持人物说的台词与图片上的一致, +分镜切换以及重点的情节演绎加入特殊音效,整体风格诙谐幽默;演绎方式参考 @视频1 +``` + +### 12. 视频融合 / 续写场景 +``` +视频1中由粒子组成的马逐渐具象化,粒子变密,逐渐过渡到视频2, +视频2中的马在奔跑过程中逐渐变为视频3,并逐渐消散,画面唯美, +背景音是马蹄声和科技感粒子音效 +``` + +## 风格与质感修饰词库 + +在提示词末尾添加以提升输出质量,Step 4 的"画质、风格与约束"模块优先从本库选取: + +### 画面风格 +- `电影级质感,胶片颗粒,浅景深` / `2.35:1 宽银幕,24fps` +- `黑白水墨风格` / `动漫风格` / `超写实风格` +- `高饱和霓虹色调,冷暖对比` / `赛博朋克霓虹灯色温` +- `超逼真4K医学CGI,半透明可视化` / `超精细CG动画技术` + +### 氛围 / 情绪 +- `紧张悬疑` / `温暖治愈` / `史诗恢宏` +- `喜剧风格,表情夸张` / `纪录片风格,旁白克制` +- `暗黑奇幻` / `仙侠高燃` / `圣诞节梦幻色调` + +### 音频指导 +- `背景音乐:恢宏大气` / `背景BGM参考 @音频1` +- `音效:走路声、人群声、汽车声、脚步声、呼吸声` +- `转场画面与音乐节奏卡点` / `脚步声、衣料摩擦声与节拍贴合` +- `旁白音色参考 @视频1,声画分离` + +### 防崩坏兜底约束词(必须挂载) +- `人物面部稳定不变形、五官清晰、无穿模` +- `4K 高清,细节丰富,构图稳定` +- `摆造型与服装与参考图一致,不出现多余肢体` + +## 常见错误与避坑指南 + +在 Step 3 要素审查阶段依据以下清单对提示词进行体检,发现问题在多选交互中列出: + +1. **引用模糊**:禁止只写“参考 @视频1”,必须说清参考什么(运镜?动作?特效?节奏?)。 +2. **指令冲突**:不要在同一时间切片中同时要求“固定镜头”和“环绕镜头”,或同时“推镜头”与“平移”。 +3. **内容过载**:不要在 4-5 秒内塞入太多场景,要符合物理可行性;建议按 0-3s / 3-7s / 7-10s 裁切。 +4. **素材无归属**:上传了 5 张图片,每一张都必须用 `@` 标注清楚用途。 +5. **忽视音频**:音效设计能大幅提升输出质量,一定要写音频指导(BGM/音效/旁白音色/声画关系)。 +6. **时长不匹配**:提示词的复杂度要与选定的生成时长匹配;8秒以上必须分时段描述。 +7. **写实人脸**:不要上传包含真人清晰可辨识面部的素材,必被系统拦截。 +8. **延长时长错配**:延长视频时,`duration` 应选“新增部分”的时长而非原视频总时长。 +9. 
**Asset ID 裸奔**:动作描述中不得直接出现 `[asset-xxx]`,该 ID 必须通过 `@图N` 桥接。 +10. **@ 引用后接动词**:`@图1跑向` 会被分词器误解为“@图1跑”,必须加括号报幕:`@图1(李武)跑向`。 \ No newline at end of file diff --git a/backend/skills/customized_skills/Image_Prompt_Optimizer/SKILL.md b/backend/skills/customized_skills/Image_Prompt_Optimizer/SKILL.md deleted file mode 100644 index 7d07427c..00000000 --- a/backend/skills/customized_skills/Image_Prompt_Optimizer/SKILL.md +++ /dev/null @@ -1,199 +0,0 @@ ---- -name: Image_Prompt_Optimizer -description: "Image prompt optimization expert for generation and editing. Use when the user asks to generate, edit, or optimize image prompts. Rewrites rough descriptions into high-quality engineered prompts based on photography terminology, scene narration, and professional templates." -metadata: - builtin_skill_version: "1.0" ---- - -# Image Prompt Optimizer - -**IMPORTANT**: This is a prompt optimization skill, NOT an image generation tool. After optimizing the prompt, you should call `generate_image` or `edit_image` (from `image_tools` skill) to actually create or edit images. - -## Core Principle - -**Describe the scene, not just list keywords.** Narrative, descriptive paragraphs almost always produce better, more coherent images than a string of unrelated words. - -## Image Generation Prompt Templates - -### 1. Photorealistic Scenes - -Use photography terminology: shooting angles, lens types, lighting, and details. - -``` -A photorealistic [shot type] of [subject], [action or expression], set in -[environment]. The scene is illuminated by [lighting description], creating -a [mood] atmosphere. Captured with a [camera/lens details], emphasizing -[key textures and details]. The image should be in a [aspect ratio] format. -``` - -### 2. Stylized Illustrations & Stickers - -Specify the style explicitly and request a white background. - -``` -A [style] sticker of a [subject], featuring [key characteristics] and a -[color palette]. The design should have [line style] and [shading style]. 
-The background must be white. -``` - -### 3. Text in Images - -Clearly state the text content, font style, and overall design. - -``` -Create a [image type] for [brand/concept] with the text "[text to render]" -in a [font style]. The design should be [style description], with a -[color scheme]. -``` - -### 4. Product & Commercial Photography - -For e-commerce, advertising, or branding — crisp, professional product shots. - -``` -A high-resolution, studio-lit product photograph of a [product description] -on a [background surface/description]. The lighting is a [lighting setup, -e.g., three-point softbox setup] to [lighting purpose]. The camera angle is -a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp -focus on [key detail]. [Aspect ratio]. -``` - -### 5. Minimalist & Negative Space Design - -Ideal for creating backgrounds for websites, presentations, or marketing materials. - -``` -A minimalist composition featuring a single [subject] positioned in the -[bottom-right/top-left/etc.] of the frame. The background is a vast, empty -[color] canvas, creating significant negative space. Soft, subtle lighting. -[Aspect ratio]. -``` - -### 6. Comic Panels / Storyboard - -Create panels for visual storytelling based on character consistency and scene description. - -``` -Make a 3 panel comic in a [style]. Put the character in a [type of scene]. -``` - -## Image Editing Prompt Templates - -### 1. Adding & Removing Elements - -Provide the image and describe changes. The model will match the original style, lighting, and perspective. - -``` -Using the provided image of [subject], please [add/remove/modify] [element] -to/from the scene. Ensure the change is [description of how the change should integrate]. -``` - -### 2. Semantic Inpainting - -Define a conversational "mask" to modify specific parts while keeping the rest unchanged. - -``` -Using the provided image, change only the [specific element] to [new -element/description]. 
Keep everything else in the image exactly the same, -preserving the original style, lighting, and composition. -``` - -### 3. Style Transfer - -Reproduce image content in a different artistic style. - -``` -Transform the provided photograph of [subject] into the artistic style of -[artist/art style]. Preserve the original composition but render it with -[description of stylistic elements]. -``` - -### 4. Multi-Image Composition - -Combine multiple images into a new composite scene. Great for product mockups or creative collages. - -``` -Create a new image by combining the elements from the provided images. Take -the [element from image 1] and place it with/on the [element from image 2]. -The final image should be a [description of the final scene]. -``` - -### 5. High-Fidelity Detail Preservation - -Preserve critical details (faces, logos) during editing by describing them thoroughly. - -``` -Using the provided images, place [element from image 2] onto [element from -image 1]. Ensure that the features of [element from image 1] remain -completely unchanged. The added element should [description of how the -element should integrate]. -``` - -### 6. Sketch to Image - -Upload a sketch or doodle and have the model refine it into a finished image. - -``` -Turn this rough [medium] sketch of a [subject] into a [style description] -photo. Keep the [specific features] from the sketch but add [new details/materials]. -``` - -### 7. 360-Degree Character Consistency - -Iteratively prompt different angles to generate a 360-degree view of a character. Include previously generated images in follow-up prompts to maintain consistency. - -``` -Generate a [character description] from a [angle] view. Maintain consistent -appearance with the provided reference image(s). For complex poses, include -a reference image of the desired pose. 
-``` - -## Optimization Workflow - -When the user provides a rough description, follow these steps to optimize: - -### Step 1: Analyze User Intent -Determine whether this is "new generation" or "image editing", then select the corresponding template category. - -### Step 2: Element Check -Verify the user's description includes these key elements: -- **Subject**: Who or what? -- **Action / Expression**: What are they doing? -- **Environment**: Where is this set? -- **Lighting**: What mood or atmosphere? -- **Composition / Camera**: How is it framed? -- **Style**: What visual style? - -### Step 3: Enrich & Optimize -- Fill in missing elements using natural narrative language -- Write the final prompt in English for best quality -- Be extremely specific (use "ornate elven plate armor etched with silver leaf patterns" instead of "fantasy armor") -- Provide context and intent (state what the image is for) -- Use "semantic negative prompts" (use "an empty, desolate street" instead of "no cars") -- Use photography and cinematic language to control composition (wide-angle shot, macro shot, low-angle perspective) - -### Step 4: Output Optimized Result -Present to the user: -1. **Optimized Prompt** — the complete English prompt -2. **Optimization Notes** — issues found in the original description and improvements made -3. 
**Suggested Parameters** — recommended aspect_ratio and n values - -## Integration with image_tools - -After optimization, pass the prompt to the corresponding tool: -- **New Generation** → `generate_image(prompt=..., aspect_ratio=..., n=...)` -- **Image Editing** → `edit_image(image_url=..., prompt=...)` - -## Best Practices - -- **English Prompts**: Always write the final prompt in English for best quality -- **Be Specific**: The more detail you provide, the more control over the output -- **Iterate**: Leverage the conversational nature for incremental adjustments ("make the lighting warmer", "make the expression more serious") -- **Step-by-Step Instructions**: Break complex scenes into multiple steps (background first, then foreground, then details) -- **Positive Descriptions**: Describe the desired scene to exclude unwanted elements, rather than saying what should not be there - -## Limitations - -- Best performance languages: English, zh-CN, ja-JP, ko-KR, fr-FR, de-DE, es-MX, pt-BR, ru-RU, it-IT, ar-EG, hi-IN, id-ID, vi-VN, ua-UA -- Audio or video inputs are not supported -- The model may not generate the exact number of images explicitly requested by the user diff --git a/backend/skills/customized_skills/Seedance_Prompt_Optimizer/SKILL.md b/backend/skills/customized_skills/Seedance_Prompt_Optimizer/SKILL.md deleted file mode 100644 index 9ea78748..00000000 --- a/backend/skills/customized_skills/Seedance_Prompt_Optimizer/SKILL.md +++ /dev/null @@ -1,110 +0,0 @@ ---- -description: Seedance2.0视频生成模型专用的提示词优化技能 -metadata: - builtin_skill_version: '1.0' -name: Seedance_Prompt_Optimizer ---- - ---- -name: "Seedance 2.0 提示词优化专家" -description: "Seedance 2.0 提示词优化专家。当用户提供视频生成提示词、多媒体素材,或明确请求优化提示词时调用。基于三段式结构、八大核心要素和多模态参考控制框架,将粗略描述重写为高质量工程化提示词。" -metadata: - builtin_skill_version: "1.0" ---- - -# Seedance 2.0 Prompt Optimizer - -**IMPORTANT**: This is a prompt optimization skill, NOT a video generation tool. 
After optimizing the prompt, you should call `generate_video` or `edit_video` (from `video_tools` skill) to actually generate the video. - -## 角色定位 -你是 Seedance 2.0 多模态 AI 导演和提示词优化专家。你的首要任务是拦截用户"纯文案堆砌形容词"的低质量提示词,并基于《Seedance 2.0 提示词工程化优化框架》将它们引导和重写为高质量的工程化提示词(三段式结构、八大核心要素、多模态参考控制)。 - -## 核心工作流 -当用户输入粗略的提示词、提供多模态素材(图片/视频),或**仅仅提出视频生成需求(如"帮我生成一个狗跑的视频")**时,请严格按照以下步骤执行: - -### Step 0: 需求分析与启发式提问(仅当用户只提供需求而无具体提示词时) -当用户仅提供了一个高维度的想法或需求(例如:"我想做一段赛博朋克风格的视频"或"生成一个女孩跳舞的视频"),你必须**主动进入引导模式**,通过提问帮助用户丰满细节,切忌直接生编硬造: -1. **询问核心要素**:基于"八大核心要素"引导用户补充信息。 - *示例提问*:"关于这个女孩跳舞的视频,您可以补充几个细节吗?比如:1. 女孩的外貌特征和穿着?2. 跳舞的场景是在哪里(赛博朋克街道/古典舞台)?3. 您有参考图片(@图1)提供给我吗?" -2. **收集信息后转入常规流程**:当用户回复了足够的信息后,再进入下述的 Step 1 及后续步骤。 - -### Step 1: 意图与场景判定 -1. 判定生成类型:是"全新生成"还是"视频编辑(增删改接)"。 -2. 判定场景动态:是"文戏(需微操化,如情绪细节)"还是"武戏(保留大动态,配合参考素材)"。 - -### Step 2: 元素自检与素材映射(自动解析) -1. **多模态素材自动映射**:根据用户提供的素材在输入中出现的**先后顺序(从 1 开始)**,自动为它们分配 `@图1`, `@图2` 或 `@视频1` 等标准代号。 - - 画布图像节点 → 通过 `get_canvas_node` 获取 `data.imageUrl` → 按传入顺序编号为 图片1/图片2 - - 画布视频节点 → 通过 `get_canvas_node` 获取 `data.videoUrl` → 按传入顺序编号为 视频1/视频2 - - 编号规则与 `generate_video` 工具的数组顺序一致:`reference_images[0]=图片1`, `reference_videos[0]=视频1` -2. **长图/九宫格确认**:询问用户上传的素材是否为长图或九宫格。拆分为单图后再使用。 -3. **映射逻辑确认**:当存在多图但未明确映射逻辑时(如:谁是左边谁是右边,谁是首帧谁是尾帧),向用户提问并要求明确。 - -### Step 3: 要素审查与多选交互确认 -1. 检查用户的提示词是否包含以下"八大核心要素": - - 精准主体(谁?) - - 动作细节(在干什么?) - - 场景环境(在哪?) - - 光影色调(什么氛围?) - - 镜头运镜(怎么拍?) - - 视觉风格(什么画风?) - - 画质参数(清晰度要求?) - - 约束条件(兜底防崩要求) -2. 检查是否存在"运镜冲突"(如同时要求向前推并向左平移)。 -3. **【关键:拒绝静默修改】**:当你发现要素缺失或存在冲突时,**必须**通过"多选检视意见交互"向用户展示具体建议,让用户选择。 - - *多选交互模板示例:* - 我收到了您的输入。检测到以下建议,请选择您接受的部分: - 1. 【建议明确】图1 和 图2 谁在左边,谁在右边? - 2. 【建议补充】它们是怎么跑的(比如追逐、并排)? - 3. 【运镜冲突】当前提示词同时要求向前推并向左平移。建议修改为单一运镜。 - - [多选框]: - - [ ] 接受建议1,设定为:图1在左,图2在右。 - - [ ] 接受建议2,设定为:追逐跑。 - - [ ] 接受运镜修改,设定为:镜头向前推。 - - [ ] 其他修改(请补充) - -### Step 4: 结构化重写输出 -当用户完成选择或信息已经完备后,将最终结果严格按照以下三大模块进行结构化输出: - -#### 优化后提示词 -(包含严格的**三段论**结构) -1. 
**全局基础设定**:锁定角色、环境与核心资产。 - - **【极度重要】必须使用 `@图N` 的语法明确声明映射关系**(例如:`@图1 为 李武(资产 ID: [asset-xxx])`)。绝对禁止在后续提示词中直接抛出无语义的 `[asset-xxx]` ID 或仅使用角色名字。 - - **首尾帧控制**:如意图包含开场/收尾约束,在此处声明(如 `@图1 作为首帧约束`,`@图2 作为尾帧约束`)。 -2. **时间片分镜脚本**:控制时间层,动态决定切片长度(如 0-3s, 3-10s),包含动作和单一运镜。**描述动作和站位时,必须使用带有 `@图N` 的强视觉指代。** - - **防歧义强制规范**:在所有 `@图N` 和 `@视频N` 之后,必须加上对应的角色名字或名词解释,并用括号或明确的词语隔开。 - - **正确示范**:`@图1(李武)站起身走向 @图3(苏有)`,或 `@图2的女生位于画面左侧`。 - - **错误示范**:`@图2位于...`(极易产生歧义),`@图1跑向...`。 - - **运镜限制**:确保一个时间切片的镜头内**只存在 1 种运镜方式**(禁止同时推拉摇移)。 -3. **编辑指令(仅限视频编辑场景)**: - - **增删改**:必须明确时间段与空间位置(如"在 0-5s 的左下角增加...")。 - - **视频延长/拼接**:使用标准语法(如"将 `@视频1` 向后平滑延长",或"`@视频1`,[过渡描述],接 `@视频2`")。 - - **文字生成**:明确文字内容、出现时机、位置与方式。 -4. **画质、风格与约束**:自动挂载画质增强(如"4K高清,细节丰富")与防崩坏的兜底约束词(如"人物面部稳定不变形、五官清晰、无穿模")。 - -#### 优化问题 -针对原始提示词,指出存在的缺陷或不符合大模型生成规律的"病灶"(例如要素缺失、运镜冲突、格式不规范、直接抛出无语义的Asset ID等)。 - -#### 相关原则 -列举针对上述问题所应用的具体规则或指导思想(例如"断句防歧义原则"、"Asset ID 屏蔽原则"、"运镜限制规范"等)。 - -## 与 generate_video 工具的协同 - -优化完成后,需要将提示词和素材传递给 `generate_video` 工具: - -1. **提示词** → `prompt` 参数 -2. **参考图片** → `reference_images` 数组(顺序与 图片1/图片2 编号一致) -3. **参考视频** → `reference_videos` 数组(顺序与 视频1/视频2 编号一致) -4. **参考音频** → `reference_audios` 数组(顺序与 音频1/音频2 编号一致) -5. **首帧图** → `image_url` + `video_mode="image_to_video"` -6. **首尾帧** → `image_url` + `last_frame_image` + `video_mode="image_to_video"` -7. **多模态参考** → `video_mode="reference_images"` - -## 强制约束 -- **拒绝静默修改**:永远不要在未与用户确认的情况下,自动猜测并填充缺失的要素或修改冲突的运镜。 -- **强制兜底**:最终输出的提示词必须包含防崩坏和高画质的约束条件。 -- **复杂场景处理**:针对复杂的多人正面动态视频,**必须使用强方位约束**(如"左侧角色穿灰蓝色作训服"),并辅以固定机位控制,以避免穿模或跳脸。 -- **Asset ID 屏蔽原则**:底层模型无法直接理解无语义的 Asset ID,必须通过 `@图N` 建立文本到视觉特征的桥梁,严禁让 `[asset-xxx]` 独立代替人物主体出现在提示词动作描述中。 -- **断句防歧义原则**:所有的 `@图N` 引用后,必须紧跟指代词或名词(如"的男子"、"(李武)"),严禁直接连接动词或方位词,以防止大模型出现分词歧义导致的数量生成错误。 \ No newline at end of file
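
Note on the 断句防歧义原则 stated above: the rule that every `@图N` / `@视频N` reference must be followed by a name or noun phrase can be checked mechanically before a prompt is handed to `generate_video`. The snippet below is a minimal Python sketch under stated assumptions. The function name `find_ambiguous_references`, the regular expression, and the list of accepted continuations are hypothetical and chosen only for illustration; none of them come from the skill files or from Seedance itself.

```python
import re

# Hypothetical pattern for the reference tokens described in the skill text
# (@图N, @视频N, @音频N, where N is a number).
REFERENCE = re.compile(r"@(?:图|视频|音频)\d+")

# Assumed "safe" continuations: a bracketed name such as (李武), a 的-phrase
# such as 的女生, or a mapping declaration such as "为 李武" / "作为首帧约束".
SAFE_PREFIXES = ("(", "(", "的", "为", "作为")


def find_ambiguous_references(prompt: str) -> list[str]:
    """Return each reference that is not followed by a safe continuation.

    A reference at the very end of the prompt is flagged as well, because
    nothing follows it to disambiguate the subject.
    """
    flagged = []
    for match in REFERENCE.finditer(prompt):
        # Ignore whitespace between the reference and what follows it.
        rest = prompt[match.end():].lstrip()
        if not rest.startswith(SAFE_PREFIXES):
            flagged.append(match.group(0))
    return flagged
```

Run against the examples in the skill text, the positive sample `@图1(李武)站起身走向 @图3(苏有)` passes and the negative sample `@图1跑向...` is flagged. This is a heuristic only: it inspects the characters after each reference and does not verify that the bracketed name matches the mapping declared in the global settings.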