spec: Veo Foundation — shared lib, cross-cutting params, audio context-aware#1
spec: Veo Foundation — shared lib, cross-cutting params, audio context-aware#1GiuseppeIuculano wants to merge 26 commits into
Conversation
Foundation is sub-project 1 of 5 in the Veo improvements roadmap. Covers shared library extraction (skills/_shared/veo-core/), cross-cutting API parameters (negativePrompt, enhancePrompt, storageUri, personGeneration), extended model support (Veo 3.1 Lite, Veo 3 stable, Veo 2), 4K resolution, audio context-aware default derived from use case, and centralized cross-parameter validation. Other sub-projects (separate specs): - /veo-animate (image-to-video) - /veo-interpolate (first+last frame) - /veo-extend (video extension) - /veo-multi-shot v2 (referenceImages integration) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive design specification for the 'Veo Foundation' sub-project, aimed at refactoring the veo-tools plugin. The proposal includes creating a shared veo-core library to handle authentication, API requests, validation, and pricing, while expanding support for new models and parameters like 4K resolution and negative prompts. Review feedback identifies several areas for refinement: inconsistencies in the pricing calculation factors (specifically the omission of sampleCount and duration), a contradiction between the default model selection and the stated stability preference, and the use of parameters in validation rules that are missing from the configuration table. There is also a recommendation to adjust the token counting logic to prevent false rejections of prompts near the limit.
| const PRICING = { ... } as const | ||
| ``` | ||
|
|
||
| The source URL and review-on-release invariant are encoded as file-level comments. Pricing is a manually-maintained lookup table per model × resolution × audio. Estimate displayed in Phase 4 PRESENT with breakdown. |
There was a problem hiding this comment.
The pricing strategy description is inconsistent and incomplete. Line 82 mentions duration as a factor, but line 336 omits it. Additionally, both sections fail to account for sampleCount, which is a critical multiplier for cost in the Veo API (since users pay for each generated video). The design should specify that estimateCost uses model × resolution × duration × audio × sampleCount.
| | `model` | enum | expanded list (see below) | `veo-3.1-generate-preview` (pending empirical verification) | | | ||
|
|
||
| #### Model expansion | ||
|
|
||
| | Model ID | Variant | Use case | Audio | Max resolution | | ||
| |---|---|---|---|---| | ||
| | `veo-3.1-generate-preview` | Veo 3.1 quality (preview) | Default quality | yes | 4K | | ||
| | `veo-3.1-fast-generate-preview` | Veo 3.1 fast | Fast iteration | yes | 4K | | ||
| | `veo-3.1-lite-generate-preview` | Veo 3.1 lite | Lowest cost, text/image-to-video only | yes | 1080p | | ||
| | `veo-3.0-generate-001` | Veo 3 stable | Production with audio | yes | 4K | | ||
| | `veo-3.0-fast-generate-001` | Veo 3 fast stable | Production fast iteration | yes | 4K | | ||
| | `veo-2.0-generate-001` | Veo 2 | Silent video, multi-sample | **no** | 720p | | ||
|
|
||
| The current default `veo-3.1-generate-001` does not appear in official docs and may be invalid. Foundation includes empirical verification: during implementation, test which IDs return successfully from `predictLongRunning`. Default falls back to the most-recent confirmed-working ID, with preference for stable over preview when both work. |
There was a problem hiding this comment.
There is a contradiction between the proposed default model and the stated selection criteria. Line 142 sets the default to veo-3.1-generate-preview, but line 155 establishes a "preference for stable over preview when both work." Since veo-3.0-generate-001 is the designated stable version (line 151), it should be the default to remain consistent with the stated preference, unless the rule is intended to prioritize the latest version (3.1) regardless of its preview status.
| | 6 | `model ∈ veo-2.*` ⇒ `resolution == 720p` | "Veo 2 max resolution is 720p" | | ||
| | 7 | `model ∈ veo-3.1-lite-*` ⇒ no `referenceImages`, no extension | "Lite model doesn't support reference images or extension" | | ||
| | 8 | `referenceImages.length ∈ [1, 3]` | "Reference images must be 1-3" | | ||
| | 9 | `prompt.tokens > 1024` (approx: chars / 3.5) | "Prompt exceeds 1024 tokens" | |
There was a problem hiding this comment.
The token counting approximation of chars / 3.5 (Rule 9) is risky for prompts near the 1024-token limit. An inaccurate estimate could lead to false rejections of valid prompts. It is recommended to implement this as a warning for prompts estimated to be near the limit (e.g., > 900 tokens), reserving hard rejection for cases where the estimate is significantly over or when the API explicitly returns a token limit error.
| |---|---|---| | ||
| | 1 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | | ||
| | 2 | `referenceImages` present ⇒ `durationSeconds == 8` | "Reference images require duration=8" | | ||
| | 3 | `lastFrame` present ⇒ `durationSeconds == 8` AND `image` present | "Interpolation requires both image and lastFrame, duration=8" | |
There was a problem hiding this comment.
Validation Rule 3 references lastFrame and image parameters, but these are missing from the VeoConfig parameter table at line 134. Even if these are implemented in later sub-projects, the shared VeoConfig type and validation.ts logic defined in this Foundation project must include these fields to avoid type errors and ensure the proposed validation rules are implementable.
Applied 10 review findings from two parallel review agents (technical correctness + scope decomposition). Technical fixes: - Resolve 80 vs 150 lines contradiction (Goals vs Success criteria) - Remove validation rules referencing out-of-scope parameters (referenceImages, lastFrame, video extension); document them as forward references owned by their sub-projects - Clarify Veo 3.1 Lite model description (supports text-to-video in Foundation; image-to-video added by /veo-animate) - Remove undocumented "Audio Layer added" auto-adjustment from Phase 4 PRESENT example - Resolve region detection contradiction with explicit VEO_REGION env var mechanism Decomposition fixes: - Foundation now ships shared image input plumbing (ImageInput type, image-helpers.ts) so downstream sub-projects don't reinvent it - Sequencing clarified: /veo-extend serialized after /veo-animate + /veo-interpolate (riskier operation-chain semantics) - Edit operations (insert/remove/upscale) formally out of roadmap - Added Risks & contingency section with Foundation-A/B split as authorized fallback if scope grows - Added Pricing & constants maintenance protocol to prevent cross- project drift Renumbered validation rules and updated cross-references accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Agent review applied (10 findings, 1 follow-up commit)Two review agents ran in parallel on the initial spec — a code-reviewer for technical consistency and a general-purpose agent for scope/decomposition. All findings were addressed in commit Technical fixes (5)
Decomposition fixes (5)
What didn't change
Ready for team review. |
|
/gemini review |
There was a problem hiding this comment.
Pull request overview
Adds a design spec document for the “Foundation” sub-project in the Veo improvements roadmap, describing the planned shared veo-core library extraction, cross-cutting parameter support, audio-default policy, and centralized validation strategy.
Changes:
- Introduces a comprehensive Foundation design spec covering scope, architecture, validation rules, audio-default policy, and testing strategy.
- Defines proposed module boundaries and a migration plan for extracting shared Veo infrastructure into
skills/_shared/veo-core/. - Documents backwards-compatibility expectations and a staged roadmap for follow-on skills (
/veo-animate,/veo-interpolate,/veo-extend,/veo-multi-shot v2).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Veo Foundation — Design Spec | ||
|
|
||
| **Date**: 2026-05-23 | ||
| **Status**: Approved (pending user review) |
| - Missing API parameters: `negativePrompt`, `enhancePrompt`, `storageUri`, `personGeneration` | ||
| - Missing models: Veo 3.1 Lite, Veo 3 stable, Veo 2 | ||
| - Missing resolution: 4K | ||
| - Audio nativo treated as second-class (default off, no prompt guidance) despite being a killer feature of Veo 3+ |
| - Foundation must reduce per-script line count from ~600 to <150 by extracting shared code (target aligned with Success criteria). | ||
| - New skills (`/veo-animate` etc.) must be implementable by writing only a CLI + workflow, not new auth/polling/validation code. | ||
| - Audio nativo becomes a first-class option when use case warrants it, never silently lost when use case warrants it. | ||
|
|
|
|
||
| ## Risks & contingency | ||
|
|
||
| **Scope risk — Foundation accorpa 3 concern**: refactor lib, params+models+validation, audio context-aware system. Agent review flagged this as a possible scope-creep risk. Mitigation: if during implementation any of these sub-systems drifts past the planned size (e.g., audio context-aware requires more workflow rewriting than estimated), the implementer is authorized to split Foundation into two PRs without re-running this brainstorming step: |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive design specification for the 'Veo Foundation' sub-project, which centralizes core functionality for Google Veo tools into a shared library. The proposal includes support for new models, 4K resolution, expanded API parameters, and a context-aware audio system. Feedback focused on improving the token counting heuristic for non-English languages to prevent unexpected API errors and expanding the supported aspect ratios to include the 1:1 format for social media compatibility.
| | 1 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | Foundation | | ||
| | 2 | `model ∈ veo-2.*` ⇒ `generateAudio == false` | "Veo 2 doesn't support audio" | Foundation | | ||
| | 3 | `model ∈ veo-2.*` ⇒ `resolution == 720p` | "Veo 2 max resolution is 720p" | Foundation | | ||
| | 4 | `prompt.tokens > 1024` (approx: chars / 3.5) | "Prompt exceeds 1024 tokens" | Foundation | |
There was a problem hiding this comment.
The token counting approximation chars / 3.5 is a rough heuristic that varies significantly by language. For CJK languages (Chinese, Japanese, Korean), where a single character often maps to multiple tokens, this formula will significantly underestimate the count. This could lead to hard API errors in Phase 5 instead of the intended proactive validation in Phase 3. Consider refining this heuristic or adding a note that validation is approximate for non-English prompts.
| | 4 | `prompt.tokens > 1024` (approx: chars / 3.5) | "Prompt exceeds 1024 tokens" | Foundation | | ||
| | 5 | `personGeneration == allow_all` in EU/UK/CH/MENA region (see Open Question #2 for detection mechanism) | Auto-correct + warning: "Region restriction: falling back to allow_adult" | Foundation | | ||
| | 6 | `sampleCount ∈ [1, model-max]` (see Open Question #4) | "sampleCount out of range for selected model" | Foundation | | ||
| | 7 | `aspectRatio ∈ {16:9, 9:16}` only | "Invalid aspect ratio" | Foundation | |
There was a problem hiding this comment.
The aspectRatio validation rule excludes the 1:1 (square) format, which is a native capability of Google Veo 3.1 and a common requirement for social media platforms (e.g., Instagram, LinkedIn). Including it would better align with the "marketing" and "social" use cases identified in Phase 1.
| | 7 | `aspectRatio ∈ {16:9, 9:16}` only | "Invalid aspect ratio" | Foundation | | |
| | 7 | aspectRatio ∈ {16:9, 9:16, 1:1} only | "Invalid aspect ratio" | Foundation | |
There was a problem hiding this comment.
Suggestion declined — claim contradicted by official documentation.
I verified the aspectRatio valid values across three official Google sources:
- Vertex AI — Veo 3.1 model page (
docs.cloud.google.com/vertex-ai/generative-ai/docs/models/veo/3-1-generate-preview): "Supported aspect ratios: 9:16, 16:9" - Gemini API — Veo docs (
ai.google.dev/gemini-api/docs/video): "aspectRatio: The video's aspect ratio.\"16:9\"(default),\"9:16\"" - Vertex AI — Veo 3.0 model page: "Supported aspect ratios: 9:16, 16:9" (both for standard and fast-generate variants)
All three explicitly enumerate exactly two valid values: 16:9 and 9:16. 1:1 is not a supported aspect ratio for any current Veo model. Including it in validation would either fail at API call (false positive in our validation) or silently downscale/crop (worse).
If/when Google adds 1:1 support to Veo, we'll add it via a constants.ts update following the maintenance protocol defined in §6. Rule #7 stays at {16:9, 9:16} for now.
Accepted 7 findings, rejected 1 (aspect ratio 1:1 — see below).
Substantive fixes:
- Pricing formula: add sampleCount as strict multiplier; align all
occurrences to model × resolution × duration × audio × sampleCount
- Default model selection rule: keep veo-3.1-generate-preview (latest
generation) as default with veo-3.0-generate-001 as fallback if
preview unavailable. Stable-preference applies within same generation,
not across generations.
- VeoConfig type schema: declare image, lastFrame, referenceImages,
videoExtensionInput as optional forward declarations in Foundation
so downstream sub-projects don't have to modify Foundation types.
Foundation's validateConfig() ignores these; sub-projects register
rules via registerRule() API.
- Token counting: rule #4 changes from hard-reject at 1024 to soft
warning at >900 / hard reject at >1024. Added CJK detection branch
using chars/1.5 heuristic plus note that estimate is approximate.
API-side token errors are surfaced verbatim for heuristic refinement.
Trivial fixes:
- Status frontmatter: "Approved (pending user review)" → "Proposed
(under review)" (resolve self-contradiction)
- "nativo" → "native" (two occurrences)
- "Foundation accorpa 3 concern" → "Foundation bundles 3 concerns"
Rejected:
- aspectRatio 1:1: bot claim contradicted by three official Google
doc sources (Vertex AI Veo 3.1, Veo 3.0, Gemini API docs) which
all list exactly {16:9, 9:16}. Suggestion declined with citation
in PR reply.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bot review applied (commit
|
| Finding | Source | Resolution |
|---|---|---|
Pricing formula missed sampleCount and inconsistent duration mention |
gemini | estimateCost keyed by model × resolution × duration × audio × sampleCount everywhere. sampleCount is a strict multiplier (users pay per generated video). |
| Default model contradicted "preference for stable" | gemini | Decision: keep veo-3.1-generate-preview as default (latest generation gives users access to newest features). Reformulated selection rule: stable-preference applies within the same generation, not across. Fallback to veo-3.0-generate-001 if preview unavailable. |
Validation rules referenced lastFrame/image/referenceImages missing from VeoConfig table |
gemini | Added forward declarations in VeoConfig: image?, lastFrame?, referenceImages?, videoExtensionInput? are optional from day one. Foundation's validateConfig() ignores them; sub-projects register rules via registerRule() API. Downstream sub-projects don't have to modify Foundation types. |
Token counting chars/3.5 risks false rejections + significantly underestimates CJK languages |
gemini (twice) | Rule #4 now: soft warning at >900 tokens, hard reject only at >1024. CJK detection branch uses chars/1.5. API-side token errors surfaced verbatim for heuristic refinement. |
| Status frontmatter self-contradictory ("Approved (pending user review)") | Copilot | Changed to "Proposed (under review)". |
| "nativo" should be "native" (two occurrences) | Copilot | Fixed. |
| "Foundation accorpa 3 concern" mixed Italian/English | Copilot | Rewritten as "Foundation bundles 3 concerns". |
Declined (1)
| Finding | Source | Reason |
|---|---|---|
Add 1:1 (square) to aspectRatio |
gemini | Contradicted by 3 official Google doc sources (Vertex AI Veo 3.1, Veo 3.0, Gemini API docs) all listing exactly {16:9, 9:16}. Full citation in inline reply. If/when Google adds 1:1 support, we'll update via the constants.ts maintenance protocol (§6). |
Diff summary
6de17b0: 43 insertions, 11 deletions. Net effect: spec is now internally consistent, all type/validation forward declarations explicit, pricing model accurate, default selection rule unambiguous.
Ready for the next round of review.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive design specification for the 'Veo Foundation' sub-project, which aims to centralize core logic into a shared library and expand the capabilities of the Veo plugin. Key changes include refactoring existing skills to use a unified API, adding support for new parameters like 4K resolution and negative prompts, and implementing an audio context-aware system. Feedback from the review highlights opportunities to improve the robustness of the shared library, such as adding explicit HTTP status validation for downloads, making the download step conditional on storage settings, and ensuring the type system and validation rules are fully forward-compatible with future roadmap items like video extension and social-media-friendly aspect ratios. Additionally, refinements were suggested for region detection and token counting heuristics for non-Latin scripts.
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): string` | gcloud CLI invocation, error normalization | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(url, path, token): Promise<void>` | URL building, HTTPS request handling, redirect following | |
There was a problem hiding this comment.
The downloadFile function in the shared api.ts should include explicit validation for HTTP status codes. The current implementation in veo-generate.ts (which this spec refactors) only handles redirects and pipes the response body regardless of other status codes. Ensure the shared version throws an error for non-200/30x responses (e.g., 404, 403, 500) to prevent the creation of empty or corrupted video files.
| ├─> getAccessToken() | ||
| ├─> submitGeneration(config, token) → operationName | ||
| ├─> pollOperation(operationName, token) → status | ||
| └─> downloadFile(status.videoUrl, outputPath, token) |
There was a problem hiding this comment.
The data flow diagram indicates that downloadFile is a mandatory step. However, the storageUri parameter (Line 143) allows users to store the video directly on GCS instead of downloading it locally. The orchestration logic in generate.ts should be updated to make the download step conditional on the absence of a storageUri in the configuration.
| // Foundation-owned (validated and consumed here) | ||
| prompt: string | ||
| model?: string | ||
| aspectRatio?: '16:9' | '9:16' |
There was a problem hiding this comment.
The aspectRatio type is currently limited to 16:9 and 9:16. Since Veo 3.1 natively supports a 1:1 (square) aspect ratio, which is highly relevant for social media use cases mentioned in the spec (Line 217), consider adding '1:1' to the allowed values.
| aspectRatio?: '16:9' | '9:16' | |
| aspectRatio?: '16:9' | '9:16' | '1:1' |
There was a problem hiding this comment.
Suggestion declined (second time) — same rationale as previous reply on aspectRatio 1:1: verified against three official Google sources, all listing exactly {16:9, 9:16}. See prior reply for full citation.
| prompt: string | ||
| model?: string | ||
| aspectRatio?: '16:9' | '9:16' | ||
| durationSeconds?: 4 | 6 | 8 |
There was a problem hiding this comment.
The durationSeconds type is restricted to the literal values 4 | 6 | 8. However, the roadmap (Line 28) includes the /veo-extend sub-project which supports durations up to 148s. To fulfill the design goal of forward-declaring fields to avoid future interface modifications (Line 164), durationSeconds should be typed as number. The 4/6/8 restriction for standard models should be enforced via validation rules in validation.ts instead of the type system.
| durationSeconds?: 4 | 6 | 8 | |
| durationSeconds?: number |
|
|
||
| Anti-patterns rejected by Phase 3 VALIDATE: | ||
| - Audio on but no audio descriptors in prompt → warning | ||
| - Comic-book onomatopoeia (`BAM!`, `WHOOSH!`) → rejected; suggest realistic description |
There was a problem hiding this comment.
There is an inconsistency between the hard rejection of onomatopoeia (e.g., 'BAM!', 'WHOOSH!') here and the softening of text-related rules in Phase 3 (Line 306), where text requests are downgraded to warnings. Given that Veo 3+ supports native audio, users might use these terms to describe sound effects. Unless onomatopoeia is known to cause severe artifacts in the audio engine, consider treating it as a warning (suggesting descriptive Foley terms instead) to remain consistent with the overall prompt validation strategy.
| ## Open questions | ||
|
|
||
| 1. **Default model ID**: `veo-3.1-generate-001` (current) vs `veo-3.1-generate-preview` (docs) — to be resolved empirically during implementation. If both work, prefer current for backwards compatibility; if only one works, choose that. | ||
| 2. **Region detection**: rule #5 (`personGeneration` regional restriction) requires knowing the user's region for proactive auto-fix. Resolution: Foundation reads `VEO_REGION` env var (values: `us`, `eu`, `uk`, `ch`, `mena`, `other`). If set, auto-fix applies proactively in Phase 4 PRESENT. If unset, no auto-fix; the API error (if any) is surfaced verbatim in Phase 5 GENERATE. This eliminates the contradiction between proactive auto-fix and "delegate to API error" — both paths exist and the env var decides which. |
There was a problem hiding this comment.
The proposed region detection relies solely on a manual VEO_REGION environment variable. To improve the user experience and the reliability of proactive auto-fixes (Rule #5), the veo-core library should attempt to infer the region from the GOOGLE_CLOUD_LOCATION environment variable (e.g., mapping us-central1 to us, europe-west1 to eu) when VEO_REGION is not explicitly set.
|
|
||
| 1. **Default model ID**: `veo-3.1-generate-001` (current) vs `veo-3.1-generate-preview` (docs) — to be resolved empirically during implementation. If both work, prefer current for backwards compatibility; if only one works, choose that. | ||
| 2. **Region detection**: rule #5 (`personGeneration` regional restriction) requires knowing the user's region for proactive auto-fix. Resolution: Foundation reads `VEO_REGION` env var (values: `us`, `eu`, `uk`, `ch`, `mena`, `other`). If set, auto-fix applies proactively in Phase 4 PRESENT. If unset, no auto-fix; the API error (if any) is surfaced verbatim in Phase 5 GENERATE. This eliminates the contradiction between proactive auto-fix and "delegate to API error" — both paths exist and the env var decides which. | ||
| 3. **Token counting** (rule #4): `chars / 3.5` is a Latin-script heuristic. For CJK languages (Chinese, Japanese, Korean) one character often maps to multiple tokens, so the heuristic significantly **underestimates** token count and risks letting prompts through that the API later rejects. For non-Latin scripts (Cyrillic, Arabic, Hebrew, Devanagari) the ratio also differs. Foundation mitigation: (a) rule #4 acts as a **soft warning** at >900 estimated tokens, **hard reject** only at >1024 estimated; (b) for prompts with significant CJK content (detected via Unicode range scan), the heuristic switches to `chars / 1.5` (rough average) and the warning surfaces a note that the estimate is approximate; (c) if Phase 5 returns an API-side token error, the error is surfaced verbatim and tagged for the maintainer to revisit the heuristic. Accurate token counting via an extra API call remains deferred until evidence justifies it. |
There was a problem hiding this comment.
The token counting heuristic for non-Latin scripts currently only specifies a multiplier for CJK content (chars / 1.5). Other non-Latin scripts mentioned (Cyrillic, Arabic, Hebrew, Devanagari) also have significantly different character-to-token ratios compared to Latin. The 'significant non-Latin content' detection should apply a safer, more conservative multiplier to all non-Latin Unicode ranges to prevent unexpected API rejections due to underestimation.
|
|
||
| ``` | ||
| skills/ | ||
| _shared/ # NEW — non-skill code (underscore prefix excludes from skill loader) |
| SKILL.md # minimal update (params pass-through) | ||
| ``` | ||
|
|
||
| The `_shared/` directory uses an underscore prefix so the Claude Code skill loader does not treat it as a skill (skill loader scans `*/SKILL.md`). Each consuming skill references it via relative imports plus a tsconfig path mapping (`@veo-core/*`) to keep imports readable and refactor-safe. |
| ├─> validateConfig(config) → may throw or auto-fix | ||
| ├─> getAccessToken() | ||
| ├─> submitGeneration(config, token) → operationName | ||
| ├─> pollOperation(operationName, token) → status | ||
| └─> downloadFile(status.videoUrl, outputPath, token) |
| | New `validateConfig()` rules with auto-fix | No | Auto-fix is additive; rejects only what API would reject anyway | | ||
| | Extended `VeoConfig` type | No | All new fields optional | | ||
|
|
||
| Plugin version in `.claude-plugin/plugin.json` bumps minor (0.x → 0.(x+1).0). New `CHANGELOG.md` documents each behavioral change. |
|
|
||
| ### Unit tests (free, deterministic) | ||
|
|
||
| Run on every PR; mock or pure functions only. |
Resolved 11 findings from gemini-code-assist and Copilot on commit 6de17b0. Aspect ratio 1:1 again declined with same citation. Architectural: - api.ts: explicit HTTP status validation in downloadFile; partial-file cleanup on error - generate.ts: outputPath now optional; downloadFile skipped when storageUri is set; GenerationResult carries either videoPath or gcsUri depending on path - VeoConfig.durationSeconds: type widened from literal 4|6|8 to number; Foundation enforces 4/6/8 via validation.ts so /veo-extend can later use larger values without modifying the type - _shared/ exclusion rationale corrected: loader scans for SKILL.md files (the operative mechanism); underscore prefix is defensive convention. Added note to verify loader behavior at implementation. - Imports: relative imports chosen for Foundation (no root tsconfig exists; @veo-core/* alias wouldn't resolve under npx ts-node). Root tsconfig deferred until a sub-project requires it. Refinements: - Onomatopoeia and unquoted dialogue downgraded from reject to warning for consistency with rest of Phase 3 softening - Region detection now two-tier: VEO_REGION env var first, then inference from GOOGLE_CLOUD_LOCATION (mapping table in spec) - Token counting heuristic expanded with per-script multipliers (Latin 3.5, CJK 1.5, Cyrillic 2.0, Arabic 2.0, Hebrew 2.0, Devanagari 1.8) detected via Unicode range scan Infrastructure: - Version bump: plugin.json 1.0.0 → 1.1.0 (semver minor, additive features with backwards-compat overrides). Corrects prior 0.x text. - Test runner + CI added as Foundation deliverable: vitest + workflow on PR. Without this, unit-test success criteria was aspirational. Migration plan step 1 is now infrastructure setup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second-round bot review applied (commit
|
| Source | Finding | Resolution |
|---|---|---|
| gemini #1 | downloadFile missing HTTP status validation |
api.ts public API now requires explicit non-2xx/3xx → throw + partial-file cleanup |
| gemini #2 + Copilot | storageUri makes download conditional but data flow ignores it |
generate.ts: outputPath now optional; download skipped when storageUri set; GenerationResult.gcsUri vs .videoPath branch |
| gemini #4 | durationSeconds literal 4|6|8 blocks /veo-extend (148s) |
Type widened to number; Foundation enforces 4/6/8 via validation.ts, sub-projects can extend |
| Copilot #8 | _shared/ underscore convention unverified |
Rationale corrected: operative mechanism is absence of SKILL.md; underscore is defensive convention; note added to verify loader behavior at impl |
| Copilot #9 | TS path alias @veo-core/* won't work without root tsconfig |
Decision: relative imports for Foundation. Root tsconfig deferred until a sub-project genuinely needs it. |
Refinements (3)
| Source | Finding | Resolution |
|---|---|---|
| gemini #5 | Onomatopoeia hard-reject inconsistent with Phase 3 softening | Downgraded to warning with Foley-style suggestion. Dialogue without quotes also downgraded to warning. |
| gemini #6 | Region detection should fall back to GOOGLE_CLOUD_LOCATION |
Two-tier: VEO_REGION first, then mapping from GOOGLE_CLOUD_LOCATION (us-, europe-, me-*, etc.) |
| gemini #7 | Token counting only handles CJK, not Cyrillic/Arabic/Hebrew/Devanagari | Per-script multiplier table: Latin 3.5, CJK 1.5, Cyrillic/Arabic/Hebrew 2.0, Devanagari 1.8. Dominant range >30% determines multiplier. |
Infrastructure (2)
| Source | Finding | Resolution |
|---|---|---|
| Copilot #11 | Version bump policy says "0.x → 0.(x+1).0" but plugin.json is 1.0.0 |
Corrected: bump 1.0.0 → 1.1.0 (semver minor; behavioral changes have override flags) |
| Copilot #12 | "Unit tests run on every PR" but no CI/test runner exists | Added as Foundation deliverable: vitest + package.json at root + .github/workflows/test.yml. Migration plan step #1 is now CI setup; without it, unit-test success criteria was aspirational. |
Declined (1)
| Source | Finding | Reason |
|---|---|---|
| gemini #3 | Add 1:1 to aspectRatio (third occurrence) |
Same as before — contradicts all three official Google sources. Inline reply citation. |
Diff summary
bc82e80: 72 insertions, 30 deletions. Spec now has explicit HTTP error semantics, conditional download path, forward-compatible duration typing, Latin/non-Latin token heuristics, CI infrastructure as deliverable, and corrected versioning.
Ready for the next review round.
|
/gemini review |
| ``` | ||
| CLI (veo-generate.ts) | ||
| └─> parseArgs → VeoConfig | ||
| └─> generateVideo(config, outputPath?) | ||
| ├─> validateConfig(config) → may throw or auto-fix | ||
| ├─> getAccessToken() |
| | `resolution=1080p/4k` + `duration<8` (user didn't explicitly set duration) | Force `duration=8` | "Bumped duration to 8s to enable 1080p/4K" | | ||
| | Region=EU + `personGeneration=allow_all` | Force `allow_adult` | "Region restriction: personGeneration set to allow_adult" | | ||
| | `model=veo-2.*` + `audio=on` | Force `audio=off` | "Veo 2 doesn't support audio, disabled" | | ||
|
|
||
| Auto-fixes apply only when the corrected value is unambiguous. Ambiguous combinations (e.g., user explicitly set duration=6 AND resolution=1080p) fail with error instead. |
|
|
||
| - **Single owner per release**: `pricing.ts` and `constants.ts` are owned by Foundation. Any sub-project that needs to add a model, cost vector, or constant lands a Foundation-touching change in its own PR, not an inline duplicate in its skill folder. | ||
| - **Dated header is the audit trail**: every PR that touches `pricing.ts` updates the `// Last updated: YYYY-MM-DD` comment and the `// Source:` URL if applicable. | ||
| - **No skill-local pricing**: skills must `import { estimateCost } from '@veo-core/pricing'`. Hardcoded `~$0.50` strings in SKILL.md examples are allowed (they're guidance), but generated cost estimates must come from `estimateCost()`. |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive design specification for the 'Veo Foundation' sub-project, which aims to refactor the veo-tools plugin by centralizing core logic into a shared library, expanding support for new API parameters and models, and establishing a robust validation and testing framework. The reviewer feedback focuses on improving project maintainability through TypeScript path aliases, ensuring the API submission logic is sufficiently dynamic to handle future modalities, and refining validation rules—specifically by adding explicit checks for supported video durations and softening the token count heuristic to avoid false positives.
|
|
||
| The `_shared/` directory is excluded from skill discovery because it contains **no `SKILL.md`** — the Claude Code skill loader scans for `SKILL.md` files, not directory naming. The underscore prefix is a defensive convention (makes the intent visually obvious and avoids ambiguity with future loader rules), but the operative exclusion mechanism is the absence of `SKILL.md`. *To verify before implementation: confirm the current loader behavior in Claude Code; if loader rules ever change to scan by directory name, revisit.* | ||
|
|
||
| Consuming skills reference `_shared/` via **relative imports** (e.g., `import { generateVideo } from '../../_shared/veo-core/generate'`). The original draft proposed a tsconfig path alias `@veo-core/*`, but the repository has no root `tsconfig.json` and the scripts run via `npx ts-node` without `--project`, so path aliases wouldn't resolve reliably. Relative imports are chosen for Foundation. If later sub-projects need cleaner imports, the team can introduce a root `tsconfig.json` + `TS_NODE_PROJECT` env var as a follow-up. |
There was a problem hiding this comment.
Since the migration plan (step 1) already includes adding a root package.json, it is highly recommended to also include a root tsconfig.json and use path aliases (e.g., @veo-core/*) instead of relative imports. This avoids brittle paths like ../../_shared/veo-core/ and improves maintainability as the project structure grows with the four planned sub-projects.
| Anti-patterns in Phase 3 VALIDATE: | ||
| - Audio on but no audio descriptors in prompt → **warning** | ||
| - Comic-book onomatopoeia (`BAM!`, `WHOOSH!`) → **warning** with suggestion to swap for Foley-style description (e.g., `metallic clang, dust settling`). Not rejected, since Veo 3+ may interpret onomatopoeia as legitimate SFX cues; we surface a softer recommendation instead. | ||
| - Dialogue without quotes → **warning** (downgraded from reject to align with the rest of Phase 3 softening; users may want stylized speech) | ||
|
|
||
| New file `skills/veo/references/audio-lexicon.md` contains: | ||
| - Professional SFX vocabulary (Foley terms) | ||
| - Dialogue patterns that fit 8s (~25 words max) | ||
| - Ambient patterns per mood (industrial, organic, cinematic) | ||
| - Known limit: voice does not extend if absent in last 1s (relevant to future `/veo-extend`) | ||
|
|
||
| ### 3. Cross-parameter validation rules | ||
|
|
There was a problem hiding this comment.
The submitGeneration implementation example is missing several cross-cutting parameters that this Foundation project is intended to introduce, such as negativePrompt, enhancePrompt, personGeneration, and storageUri. Additionally, the instances array is hardcoded for text-to-video, which contradicts the goal of unblocking downstream sub-projects like /veo-animate (image-to-video). The design should ensure submitGeneration dynamically constructs the request body based on all Foundation-owned parameters and input modalities.
|
|
||
| | # | Rule | Error if violated | Owned by | | ||
| |---|---|---|---| | ||
| | 1 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | Foundation | |
There was a problem hiding this comment.
While Rule 1 covers the duration requirement for 1080p/4K, the validation table should also include a general rule enforcing that durationSeconds must be one of the API-supported values {4, 6, 8} for all models (as mentioned in the VeoConfig interface notes on line 182). This ensures consistency across all generation types and prevents invalid API calls.
| | 1 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | Foundation | | ||
| | 2 | `model ∈ veo-2.*` ⇒ `generateAudio == false` | "Veo 2 doesn't support audio" | Foundation | | ||
| | 3 | `model ∈ veo-2.*` ⇒ `resolution == 720p` | "Veo 2 max resolution is 720p" | Foundation | | ||
| | 4 | `prompt.tokens > 1024` (approx: chars / 3.5 for Latin-script; see note below) | **Warning** when estimated tokens > 900 (soft); **Reject** only when estimated > 1024 (hard ceiling) — see token counting note | Foundation | |
There was a problem hiding this comment.
Rule 4 specifies a "Hard reject" when the estimated token count exceeds 1024. Given that the estimation is based on a character-count heuristic (chars / 3.5), it is prone to false positives. It would be safer to treat this as a "Warning" and allow the request to proceed to the API, which provides the definitive token validation. A hard rejection should only be used if an accurate tokenizer is implemented to avoid blocking valid user prompts.
Resolved 6 findings from Copilot (3) and gemini-code-assist (3). Reversed prior decision on imports per user direction (option B): switching from relative imports to @veo-core/* path aliases now that root infrastructure (package.json, vitest, CI) is already in scope. Contract clarifications: - validateConfig() now explicitly never-throws; returns discriminated union. autoFixed and autoFixMessages are first-class fields. generateVideo() is the only function that throws on invalid config. Test code can introspect ValidationResult without exception handling. - Option provenance: VeoConfig fields are undefined when user did NOT pass the flag. Defaults applied inside validateConfig, not in CLI parser. Auto-fix triggers only on undefined fields; explicit conflicting values produce hard errors with remedial suggestions. Imports / infrastructure (decision reversal): - Switched from relative imports to @veo-core/* path alias - Added root tsconfig.json + vitest.config.ts to Foundation infrastructure setup (migration step 1) - TS_NODE_PROJECT exported by each script that invokes ts-node - Rationale: marginal cost over already-planned root infrastructure; scales to 4 downstream sub-projects without import path duplication API request construction: - Added "API request construction in submitGeneration" subsection showing dynamic body building from VeoConfig fields - Forward-declared modalities (image, lastFrame, referenceImages, videoExtensionInput) flow through to the API even before their owning sub-projects ship — Foundation does not client-gate; the API rejects calls with its own error if a model doesn't support a field Validation rules: - Added explicit rule #1: durationSeconds ∈ {4, 6, 8} (was implicit in VeoConfig note; bot caught the gap) - Renumbered remaining Foundation rules 2-8 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Third-round bot review applied (commit
|
| Source | Finding | Resolution |
|---|---|---|
| Copilot #1 | validateConfig contract ambiguous (return type says no throw, data flow says "may throw") |
Never-throws contract: discriminated union return. autoFixed and autoFixMessages are first-class. generateVideo() is the only function that throws on invalid config. Test code can call validateConfig and inspect results without try/catch. |
| Copilot #2 | Auto-fix can't distinguish user-set vs default values | Option provenance: VeoConfig fields are undefined when user didn't pass the flag. Defaults applied inside validateConfig, not in CLI parser. Auto-fix triggers only on undefined fields; explicit conflicts (e.g., --duration 6 --resolution 1080p) produce hard errors with suggestions. |
Imports decision reversal (Copilot #3 + gemini #4)
Previous decision: relative imports for Foundation, deferred root tsconfig.
This round: switched to @veo-core/* path aliases per user direction.
Rationale: commit bc82e80 already added root package.json, vitest, and CI workflow to Foundation. The marginal cost of also adding root tsconfig.json + vitest.config.ts is small, and the 4 downstream sub-projects will repeatedly import from _shared/ — relative paths like ../../_shared/veo-core/pricing repeated everywhere are brittle.
Migration plan step 1 now includes: root tsconfig.json with @veo-core/* → skills/_shared/veo-core/* alias, vitest.config.ts mirroring it, TS_NODE_PROJECT exported by each script. Pricing protocol's @veo-core/pricing import is now consistent with architecture (it was the contradiction that surfaced this).
API request construction (gemini #5)
Added "API request construction in submitGeneration" subsection showing the function builds the Vertex AI request body dynamically from VeoConfig fields. Forward-declared modalities (image, lastFrame, referenceImages, videoExtensionInput) flow through to the API even before their owning sub-projects ship — Foundation does not client-gate them; the API rejects with its own error if a model doesn't support a field. This means a power user can already exercise image-to-video against Foundation's library before /veo-animate ships (just without SKILL.md workflow guidance).
Missing validation rule (gemini #6)
Added explicit rule #1: durationSeconds ∈ {4, 6, 8} as a Foundation-owned rule. Previously implicit in VeoConfig notes. Foundation rules renumbered 1-8.
Diff summary
0ac09e6: 73 insertions, 21 deletions. Spec is now consistent on import strategy, has unambiguous validation contracts, makes option provenance explicit (mirrored in both data flow and auto-fix tables), and shows how submitGeneration builds requests dynamically.
Ready for next review round.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request proposes a design specification for the 'Veo Foundation' sub-project, aimed at centralizing core logic for the veo-tools plugin into a shared library. The plan includes refactoring existing skills, expanding support for new models and parameters, and introducing a robust validation and testing framework. Review feedback suggests several technical improvements: adopting tsconfig-paths for more maintainable alias resolution, switching to the official google-auth-library for authentication, ensuring validation rules are model-specific (particularly for Veo 2), and utilizing the countTokens API to avoid false positives in prompt length enforcement.
|
|
||
| The `_shared/` directory is excluded from skill discovery because it contains **no `SKILL.md`** — the Claude Code skill loader scans for `SKILL.md` files, not directory naming. The underscore prefix is a defensive convention (makes the intent visually obvious and avoids ambiguity with future loader rules), but the operative exclusion mechanism is the absence of `SKILL.md`. *To verify before implementation: confirm the current loader behavior in Claude Code; if loader rules ever change to scan by directory name, revisit.* | ||
|
|
||
| Consuming skills reference `_shared/` via the **`@veo-core/*` path alias**, configured in a root `tsconfig.json` added as part of Foundation's infrastructure setup (alongside the root `package.json` and CI workflow — see migration plan step 1). The alias resolves under `ts-node` via the `TS_NODE_PROJECT` env var (exported by each skill script that uses it) and under `vitest` via `vitest.config.ts` referencing the same `tsconfig.json`. Example: `import { generateVideo } from '@veo-core/generate'`. |
There was a problem hiding this comment.
Using TS_NODE_PROJECT to resolve path aliases is brittle as it requires manual environment variable management for every script execution. A more maintainable approach is to use the tsconfig-paths package. By registering it (e.g., via NODE_OPTIONS="-r tsconfig-paths/register"), aliases defined in the root tsconfig.json will be resolved automatically across all sub-projects without requiring explicit project path exports in every file.
|
|
||
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): string` | gcloud CLI invocation, error normalization | |
There was a problem hiding this comment.
Relying on gcloud CLI invocations for authentication is less robust than using the official google-auth-library. The latter handles various credential types (Service Accounts, ADC, Workload Identity) natively and removes the dependency on an external CLI being installed and configured in the environment. This is particularly important for ensuring the shared library works consistently across different execution contexts, such as CI/CD runners or containerized environments.
|
|
||
| | # | Rule | Error if violated | Owned by | | ||
| |---|---|---|---| | ||
| | 1 | `durationSeconds ∈ {4, 6, 8}` for all standard models | "durationSeconds must be 4, 6, or 8" | Foundation | |
There was a problem hiding this comment.
Rule 1 enforces a duration of {4, 6, 8} seconds for "all standard models," but the spec also includes support for veo-2.0-generate-001 (Line 172). Veo 2 often has different duration constraints (e.g., 5 or 10 seconds). This rule should be made model-aware to prevent the validation logic from incorrectly rejecting valid configurations for non-Veo 3.1 models.
| | 2 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | Foundation | | ||
| | 3 | `model ∈ veo-2.*` ⇒ `generateAudio == false` | "Veo 2 doesn't support audio" | Foundation | | ||
| | 4 | `model ∈ veo-2.*` ⇒ `resolution == 720p` | "Veo 2 max resolution is 720p" | Foundation | | ||
| | 5 | `prompt.tokens > 1024` (approx: chars / 3.5 for Latin-script; see note below) | **Warning** when estimated tokens > 900 (soft); **Reject** only when estimated > 1024 (hard ceiling) — see token counting note | Foundation | |
There was a problem hiding this comment.
Implementing a "Hard Reject" based on a character-to-token heuristic is likely to result in false positives, blocking valid prompts that the API would otherwise accept. Since Vertex AI provides a countTokens API, it should be used for accurate validation. If an extra API call is undesirable, the "Hard Reject" should be downgraded to a "Warning," allowing the actual API call to proceed and letting the backend return a definitive error if the limit is truly exceeded.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces the "Veo Foundation" design spec, which centralizes core functionality for the veo-tools plugin into a shared library. Key changes include support for expanded model variants, 4K resolution, and advanced parameters like negativePrompt and storageUri, alongside a new testing infrastructure and an audio context-aware default system. Feedback focuses on enhancing security by precisely defining "origin" for header stripping during redirects, improving reliability with a total timeout for file downloads, and ensuring the documentation explicitly lists the change in default video duration as a breaking change.
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** rather than total-request, since large video downloads — especially 4K 8s clips — can legitimately exceed 30s of wall-clock; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. | |
There was a problem hiding this comment.
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** rather than total-request, since large video downloads — especially 4K 8s clips — can legitimately exceed 30s of wall-clock; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. | |
There was a problem hiding this comment.
While a 30-second socket idle timeout is appropriate for large video downloads, the design should also specify a maximum total duration (e.g., 15 minutes) for the downloadFile operation. This prevents the process from hanging indefinitely in environments like CI/CD if a connection remains active but extremely slow.
| | Default model `veo-3.1-generate-001` → `veo-3.1-generate-preview` (and `veo-3.0-generate-001` fallback) | **Yes, behavioral** | Documented in CHANGELOG; the legacy ID `veo-3.1-generate-001` is dropped from the default chain because it doesn't appear in current docs. Users who hardcoded the legacy ID in their own scripts can keep doing so — the spec only changes the *default*. Aligned with the "Resolved decisions" section. | | ||
| | New `validateConfig()` rules with auto-fix | No | Auto-fix is additive; rejects only what API would reject anyway | | ||
| | Extended `VeoConfig` type — most new fields are truly optional | No | — | | ||
| | `VeoConfig.outputPath` is **additive-required** when `storageUri` is unset | **Yes, behavioral** | Rule #9 hard-errors if neither `outputPath` nor `storageUri` is set. Callers using the previous `generateVideo(config, outputPath)` signature must now pass `outputPath` (or `storageUri`) **inside** the config object. Migration is mechanical and the error message is explicit, but it's not source-compatible with the previous shape. | |
There was a problem hiding this comment.
|
@claude review |
|
Claude finished @GiuseppeIuculano's task in 3m 19s —— View job Tasks
The spec is in very good shape after 19 rounds. Architecture decisions are all sound. Three findings remain — one is a concrete backwards-compatibility gap verified against the current codebase, two are precision nits from the final gemini round (17:47 UTC) that weren't captured in the 19th-round summary. High — Unresolved from final gemini review (17:47 UTC)1. Verified against the current codebase: The audio-default change has its own row in the backwards-compatibility table (line 727). This change needs one too. The 19th-round summary resolved the separate "document what the default is" finding from my prior round 3 review, but the gemini reviewer's distinct finding — "this default change should be in the backwards-compat table" — wasn't addressed. Suggested table row: Medium — Two precision gaps from final gemini review2. Line 186: "uses a socket idle timeout of 30s rather than total-request." A 30s socket-idle timeout correctly handles large downloads but doesn't protect against an extremely slow connection that keeps sending bytes (never idle). The gemini review at 17:47 UTC explicitly asked for "a maximum total duration (e.g., 15 minutes) for the 3. "Origin" is undefined in the Authorization-stripping security note Line 186: "if the target origin differs from the current request's origin, the Low — Previously flagged, still unaddressed4. Two choices in the
A maintainer reviewing I flagged this in my prior review (round 3); it wasn't addressed. Confirming three previous verdicts
Overall verdict: After 19 rounds this spec is implementation-ready. Finding #1 is the only item that should be fixed before coding starts, as it directly affects the CHANGELOG content and user communication. Findings #2 and #3 are implementation-precision items cheap to close now. Finding #4 is a documentation nit that, if left unaddressed, becomes a maintenance liability once sub-projects start questioning the model-routing logic. |
…GGESTIONS + precision fixes Resolved 4 review findings + corrected a misinterpretation from round 17. Critical correction: - The 19th-round fix for durationSeconds default incorrectly set the spec default to 4s. The current veo-generate.ts line 84 and README documentation both pin the CLI/library default at 8s. The 4s in the manual integration test #1 was a SKILL.md Phase 1 recommendation for hero-background/loop use cases (smoother loops), not a global default. - Restored CLI/library default to 8s (matches existing behavior; no breaking change). Added DURATION_SUGGESTIONS constant (analogous to AUDIO_DEFAULTS) for use-case-aware Phase 1 recommendations: hero-background/loop/ambient → 4s (smoother loops) social/marketing/product/storytelling → 8s (max content) - Manual checklist expanded from 8 to 9 items: added a "default bare CLI invocation" regression to verify no use-case override has leaked into the library default chain. Success criteria updated to 9/9. Precision fixes (gemini + claude review): - downloadFile: added max total duration cap of 15 minutes alongside the 30s socket-idle timeout. Belt-and-suspenders against near-zero-throughput connections that never trigger idle but also never finish. - "Origin" defined inline per RFC 6454: scheme + host + port. Prevents an implementer from comparing only hostnames and missing HTTPS→HTTP cross-scheme redirects. - MODEL_SUGGESTIONS now has asymmetry rationale inline: social uses preview (latest features for short attention spans), marketing/product/storytelling use stable (production reliability), hero/ambient/loop include lite (cost optimization for high-volume landing-page usage; not relevant for pure-text social ads). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Twentieth-round review applied (commit
|
| # | Finding | Resolution |
|---|---|---|
| 1 | durationSeconds default change 8→4 — actually a misinterpretation, not a real breaking change |
Restored default to 8s (matches existing behavior; no breaking change). Added DURATION_SUGGESTIONS constant analogous to AUDIO_DEFAULTS: hero/loop/ambient → 4s, social/marketing/product/storytelling → 8s. Phase 1 SKILL.md applies these as use-case-aware overrides; user's explicit --duration always wins. Manual checklist expanded from 8 to 9 items (added bare-CLI regression test) |
Precision fixes (3)
| # | Finding | Resolution |
|---|---|---|
| 2 | downloadFile missing max total duration — slow connections at near-zero throughput hang indefinitely |
Added 15-minute max total duration alongside the 30s socket-idle timeout. Belt-and-suspenders |
| 3 | "Origin" undefined in Authorization-strip note | Inline definition added: scheme + host + port per RFC 6454. HTTPS→HTTP redirects on same host are cross-origin |
| 4 | MODEL_SUGGESTIONS asymmetry had no documented rationale |
Inline comments explaining: social uses preview (latest features for short attention), marketing/product/storytelling use stable (production reliability), hero/ambient/loop include lite (cost optimization for landing-page volume; not relevant for pure-text social) |
Diff summary
931397f: 36 insertions, 12 deletions. The durationSeconds correction is the most consequential — confirms no breaking change to that default and adds the proper SKILL.md mechanism for use-case-aware overrides.
|
/gemini review |
|
@claude review |
|
Claude finished @GiuseppeIuculano's task in 2m 50s —— View job Tasks
The spec is in excellent shape after 20 rounds. All previously flagged issues (never-throws contract, REGIONS shape, GenerationResult type, MODEL_SAMPLE_MAX exports, audio reason-string provenance, durationSeconds default correction, DURATION_SUGGESTIONS, downloadFile max total duration, Medium —
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive design specification for the Veo Foundation project, which establishes a shared library and infrastructure for Google Veo video generation. The proposal includes a refactored architecture with centralized authentication, API handling, and validation, alongside support for new parameters, models, and audio context-aware defaults. Review feedback focuses on hardening the proposed downloadFile implementation, specifically by recommending the prevention of HTTPS-to-HTTP downgrade redirects, capping error response body sizes to avoid memory issues, and utilizing temporary files for atomic downloads. Additionally, it was suggested to increase the maximum redirect depth to better support complex CDN chains.
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. | |
There was a problem hiding this comment.
The proposed downloadFile implementation for https:// targets should explicitly disallow redirects from HTTPS to HTTP (downgrade redirects). Even if the Authorization header is stripped, the content itself (the video) is sensitive and should not be transmitted over an unencrypted channel. Most modern HTTP clients allow configuring this security policy.
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. | |
There was a problem hiding this comment.
When downloadFile throws an error for non-2xx/3xx responses, it is proposed to include the response body in the error. To prevent potential Denial of Service (DoS) via Out-of-Memory (OOM) if a malicious or misconfigured server returns an extremely large error body, the amount of data read from the response stream for error reporting should be strictly capped (e.g., to the first 1KB).
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. | |
There was a problem hiding this comment.
The downloadFile logic mentions cleaning up partial files on error using fs.unlink. However, if the process is killed or crashes, a corrupt partial file will remain at the outputPath. A more robust approach is to download to a temporary file (e.g., ${outputPath}.tmp) and only rename it to the final outputPath upon successful completion and validation of the download.
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. | |
There was a problem hiding this comment.
The proposed max-depth limit of 5 for redirects is somewhat low. While sufficient for most cases, complex CDN chains or authentication flows (like those involving GCS signed URLs) can sometimes exceed this. Increasing the limit to 10 would align better with common defaults in libraries like axios or follow-redirects and reduce the risk of spurious failures.
rule #10 forward-declared warning
Resolved 7 findings: 4 gemini (1 security-high + 3 medium) + 3 claude
bot (1 medium + 2 low).
Security (High + Medium):
- downloadFile rejects HTTPS→HTTP redirects outright. Even with the
Authorization header stripped, the video payload itself is
sensitive and must not transit cleartext. Defense in depth.
- Error response body capped at 1 KB before being included in
thrown Error. Prevents OOM DoS from a malicious or misconfigured
server returning a huge error payload.
Reliability (Medium):
- Atomic write pattern: download streams to ${outputPath}.tmp and
renames to outputPath only on full success. A crashed or killed
process leaves a stranded .tmp instead of a corrupt final file
that downstream code might mistake for valid output.
- Redirect max-depth raised from 5 to 10, aligning with
follow-redirects and axios defaults. CDN chains and GCS
signed-URL auth flows can legitimately exceed 5 hops.
Type completeness (Medium):
- MODEL_DURATIONS explicit shape: ReadonlyMap<string, ReadonlySet<
number>>. Veo 3.x maps to {4,6,8}; Veo 2 to {5,6,7,8} (PROVISIONAL
per Open Question #4).
- MODEL_SAMPLE_MAX explicit shape: Readonly<Record<string, number>>.
Defensive initial values from current docs; refined per Open
Question #3 during the probe pass.
- Both maps return undefined for unknown model keys; rules #1 and #7
skip with soft warnings as documented.
Coherence (Low):
- MODEL_SUGGESTIONS rationale comment corrected. The previous
comment said "lite is image-to-video only (not text-to-video)" to
explain why social uses no lite suggestion — but the model table
in §1 explicitly says Lite supports text-to-video. The actual
rationale is the quality argument already in the comment;
removed the factual error and reframed as "lite is omitted on
quality grounds, not capability."
Architecture (Low):
- Rule #10 added: forward-declared VeoConfig fields without
Foundation-level semantics (currently videoExtensionInput; future
additions inherit automatically) trigger a warning when set, then
are dropped from the request body so the API receives a clean
call. Previously the spec asserted "validateConfig emits a
warning" without grounding the warning in a rule — now it's
rule #10 in the table. The buildRequestBody pseudocode comment
is updated to reference the rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Twenty-first-round review applied (commit
|
| # | Severity | Finding | Resolution |
|---|---|---|---|
| 1 | High | downloadFile could follow HTTPS→HTTP redirects even with Authorization stripped — video payload in cleartext |
Rejected outright. Defense in depth: even without credentials, sensitive content must not transit cleartext |
| 2 | Medium | Error body included verbatim in thrown Error → OOM DoS if server returns huge payload | Capped at 1 KB before inclusion |
Reliability (2)
| # | Severity | Finding | Resolution |
|---|---|---|---|
| 3 | Medium | fs.unlink cleanup leaves corrupt partial file if process killed mid-download |
Atomic write: streams to ${outputPath}.tmp, renames to outputPath only on full success |
| 4 | Medium | Max redirect depth 5 too low for CDN chains and GCS signed-URL flows | Raised to 10 (aligned with follow-redirects and axios defaults) |
Type completeness (1)
| # | Severity | Finding | Resolution |
|---|---|---|---|
| 5 | Medium | MODEL_DURATIONS and MODEL_SAMPLE_MAX lacked TypeScript shapes — implementer trap |
Both fully typed: ReadonlyMap<string, ReadonlySet<number>> and Readonly<Record<string, number>> with provisional values + Open Question references |
Coherence + architecture (2)
| # | Severity | Finding | Resolution |
|---|---|---|---|
| 6 | Low | MODEL_SUGGESTIONS lite comment said "lite is image-to-video only" — contradicting model table which says lite supports text-to-video |
Comment corrected: "omitted on quality grounds, not capability" |
| 7 | Low | videoExtensionInput warning attributed to validateConfig but no rule triggered it |
Rule #10 added: forward-declared VeoConfig fields → warning + dropped from request body. Future forward declarations inherit automatically |
Diff summary
b9d3b1f: 41 insertions, 8 deletions. The security findings (HTTPS downgrade + body cap) closed real attack vectors. The atomic-write pattern is a clear robustness improvement. The type shapes close a concrete implementer trap. Rule #10 properly grounds the previously-floating warning in the validation table.
PR #2 had no new reviews this round.
|
/gemini review |
|
@claude review |
|
Claude finished @GiuseppeIuculano's task in 3m 15s —— View job Tasks
The spec is in excellent shape after 21 rounds of review. Architecture decisions, security hardening, type contracts, validation composition, and migration plan are all well-specified. Four narrow issues remain — all Low severity, all cheap to fix before implementation starts. Low — Data flow diagram: redirect max depth is
|
There was a problem hiding this comment.
Code Review
This pull request introduces a design specification for the Veo Foundation sub-project, which refactors the veo-tools plugin into a shared library architecture. The proposal includes support for new API parameters, expanded model options, and a context-aware audio system. Review feedback suggests applying atomic write/rename logic consistently across both HTTPS and GCS download methods and using unique temporary filenames to prevent potential race conditions during file operations.
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 10** — aligned with `follow-redirects` and `axios` defaults; CDN chains and GCS signed-URL flows can exceed 5; validates non-2xx/3xx → throws with status + body **capped at 1 KB** to prevent OOM if a misbehaving server returns a huge error payload; **writes to `${outputPath}.tmp` and atomically renames to `outputPath` only on full success**, so a crashed or killed process leaves a stranded `.tmp` instead of a corrupt final file; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token; **HTTPS→HTTP redirects are rejected outright** regardless of header handling — the video payload itself is sensitive and must not transit cleartext even with credentials stripped) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. | |
There was a problem hiding this comment.
The atomic write/rename logic (${outputPath}.tmp -> outputPath) is described for the manual HTTPS download implementation but not explicitly for the GCS client download. For consistency and robustness, both download methods should use the same temporary file pattern to ensure that a crashed or killed process does not leave a corrupt or partial file at the final destination.
| | Module | Public API | Internal | | ||
| |---|---|---| | ||
| | `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. | | ||
| | `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 10** — aligned with `follow-redirects` and `axios` defaults; CDN chains and GCS signed-URL flows can exceed 5; validates non-2xx/3xx → throws with status + body **capped at 1 KB** to prevent OOM if a misbehaving server returns a huge error payload; **writes to `${outputPath}.tmp` and atomically renames to `outputPath` only on full success**, so a crashed or killed process leaves a stranded `.tmp` instead of a corrupt final file; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token; **HTTPS→HTTP redirects are rejected outright** regardless of header handling — the video payload itself is sensitive and must not transit cleartext even with credentials stripped) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. | |
There was a problem hiding this comment.
Using a fixed .tmp suffix for atomic writes can lead to race conditions or collisions if multiple processes attempt to write to the same output path simultaneously. It is safer to use a unique temporary filename (e.g., by appending a UUID or a random string) within the same directory before the final atomic rename.
Resolved 6 review findings (2 gemini + 4 claude bot). Atomic write robustness: - gemini #1: gs:// branch now uses the same atomic-write pattern as HTTPS. The @google-cloud/storage downloads also go through temp file + rename, not direct to outputPath. Consistent crash safety. - gemini #2: temp file naming now includes a random suffix: ${outputPath}.${randomSuffix}.tmp instead of fixed .tmp. Prevents collision if two processes target the same output path concurrently (low likelihood in practice but trivially correct). Coherence cleanup: - claude #3: data flow diagram updated to "max 10" matching the module boundary (was still showing the pre-round-21 "max 5"). Diagram also now shows the cross-origin auth strip, HTTPS→HTTP rejection, and atomic write — full alignment with the module boundary description. - claude #4: orphan "#### `REGIONS` constant" heading removed. Round 21 inserted MODEL_DURATIONS shapes between the heading and its content, leaving an empty section visible in the rendered outline. - claude #5: rule #10 row moved to its correct position after rules #8 and #9 in the validation table. Numeric sequence is now #1-#10 followed by F1-F5. - claude #6: DURATION_SUGGESTIONS test coverage added to audio-default.test.ts (co-located since both are pure lookup constants with identical test shape — avoids trivial file proliferation). Test plan now explicitly references both. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Twenty-second-round review applied (commit
|
| # | Source | Finding | Resolution |
|---|---|---|---|
| 1 | gemini | Atomic write pattern documented only for HTTPS branch, not GCS | Now consistent: @google-cloud/storage downloads also use .tmp + rename, not direct write |
| 2 | gemini | Fixed .tmp suffix → race condition for concurrent processes |
Random suffix: ${outputPath}.${randomSuffix}.tmp |
Coherence cleanup (4)
| # | Source | Finding | Resolution |
|---|---|---|---|
| 3 | claude | Data flow diagram still said "max 5"; module boundary updated to 10 in round 21 | Diagram aligned to 10, also added cross-origin auth strip, HTTPS→HTTP rejection, atomic write |
| 4 | claude | Orphan #### REGIONS constant heading with no content |
Removed (was leftover from round 21 insertion of MODEL_DURATIONS section in between) |
| 5 | claude | Rule #10 row out of order — appeared before #8/#9 in the table | Moved to correct sequential position after #9 |
| 6 | claude | DURATION_SUGGESTIONS had no test file in plan |
Coverage added to audio-default.test.ts (co-located since both are pure lookup constants with identical test shape) |
Diff summary
6376f41: 6 insertions, 7 deletions. All findings closed — none architectural, all cleanup of stragglers from prior rounds.
Summary
Design spec for Foundation (sub-project 1 of 5) in the Veo improvements roadmap. Foundation is the enabling block: it extracts shared library code, adds the cross-cutting Veo API parameters currently missing, introduces audio context-aware defaults derived from the use case, and centralizes cross-parameter validation.
Spec file:
docs/superpowers/specs/2026-05-23-veo-foundation-design.mdRoadmap context
Approach A (staged by capability). Five sub-projects:
/veo-animate— image-to-video/veo-interpolate— first+last frame/veo-multi-shot v2—referenceImagesintegration/veo-extend— video extension up to ~148s (serialized last)Foundation must land first because the other four import from
skills/_shared/veo-core/.Key design decisions (current state after 6 review rounds)
skills/_shared/veo-core/— excluded from skill loader (noSKILL.md). CLI scripts target ~150 lines, accept up to ~200.USE CASE. Hero/loop/ambient → off. Marketing/social/product/storytelling → on. Unspecified → on (matches Veo 3.1 API native default). Explicit--audio/--no-audioalways wins.createValidator()factory —FOUNDATION_RULESexported as array; each skill composes its own validator with extra rules. No global registry, no test-order dependence.veo-3.1-generate-preview(latest gen) withveo-3.0-generate-001as fallback. Legacyveo-3.1-generate-001dropped from default chain (still usable via explicit--model).countTokensendpoint, so any heuristic produces false positives. Soft warning at >900 estimated tokens; the API enforces the 1024 ceiling.google-auth-library, not shelling out togcloudCLI. Critical for CI/CD and containers.@veo-core/*path alias withtsconfig-pathsregistered throughNODE_OPTIONS. Roottsconfig.json+vitest.config.ts+package.json+.github/workflows/test.ymlintroduced as part of Foundation.Out of scope
Open questions (resolved during implementation)
sampleCountupper bound per model (empirically verified)Test plan
This PR ships only the spec — no code changes yet. Validation here is reviewer alignment.
Implementation follows in a separate PR after spec sign-off.
🤖 Generated with Claude Code