Skip to content

spec: Veo Foundation — shared lib, cross-cutting params, audio context-aware#1

Open
GiuseppeIuculano wants to merge 26 commits into
mainfrom
feat/veo-foundation-spec
Open

spec: Veo Foundation — shared lib, cross-cutting params, audio context-aware#1
GiuseppeIuculano wants to merge 26 commits into
mainfrom
feat/veo-foundation-spec

Conversation

@GiuseppeIuculano

@GiuseppeIuculano GiuseppeIuculano commented May 23, 2026

Copy link
Copy Markdown
Member

Summary

Design spec for Foundation (sub-project 1 of 5) in the Veo improvements roadmap. Foundation is the enabling block: it extracts shared library code, adds the cross-cutting Veo API parameters currently missing, introduces audio context-aware defaults derived from the use case, and centralizes cross-parameter validation.

Spec file: docs/superpowers/specs/2026-05-23-veo-foundation-design.md

Roadmap context

Approach A (staged by capability). Five sub-projects:

  1. Foundation (this PR — spec only)
  2. /veo-animate — image-to-video
  3. /veo-interpolate — first+last frame
  4. /veo-multi-shot v2referenceImages integration
  5. /veo-extend — video extension up to ~148s (serialized last)

Foundation must land first because the other four import from skills/_shared/veo-core/.

Key design decisions (current state after 6 review rounds)

  • Shared lib at skills/_shared/veo-core/ — excluded from skill loader (no SKILL.md). CLI scripts target ~150 lines, accept up to ~200.
  • Audio context-aware default — derived from Phase 1 UNDERSTAND USE CASE. Hero/loop/ambient → off. Marketing/social/product/storytelling → on. Unspecified → on (matches Veo 3.1 API native default). Explicit --audio/--no-audio always wins.
  • Validation: createValidator() factoryFOUNDATION_RULES exported as array; each skill composes its own validator with extra rules. No global registry, no test-order dependence.
  • Default model selection: veo-3.1-generate-preview (latest gen) with veo-3.0-generate-001 as fallback. Legacy veo-3.1-generate-001 dropped from default chain (still usable via explicit --model).
  • Token validation: warning-only — verified Veo has no countTokens endpoint, so any heuristic produces false positives. Soft warning at >900 estimated tokens; the API enforces the 1024 ceiling.
  • Auth via google-auth-library, not shelling out to gcloud CLI. Critical for CI/CD and containers.
  • Imports via @veo-core/* path alias with tsconfig-paths registered through NODE_OPTIONS. Root tsconfig.json + vitest.config.ts + package.json + .github/workflows/test.yml introduced as part of Foundation.

Out of scope

  • Direct Gemini API endpoint (user explicitly excluded — remain on Vertex AI)
  • Image-to-video, extension, interpolation, referenceImages workflows (separate sub-projects)
  • Insert/remove/upscale operations (formally out of roadmap)

Open questions (resolved during implementation)

  1. Region detection mechanism (env var + GOOGLE_CLOUD_LOCATION inference)
  2. Token counting heuristic for non-Latin scripts (multipliers per script range)
  3. sampleCount upper bound per model (empirically verified)
  4. Veo 2 allowed durations (empirically verified — official doc says 5-8s range)

Test plan

This PR ships only the spec — no code changes yet. Validation here is reviewer alignment.

  • Engineering team review of architecture & module boundaries
  • Product/design review of audio-default policy and workflow changes
  • Agent review (Claude Code review agents) — 6 rounds completed, all findings either applied or declined with documented rationale
  • Confirm sub-project decomposition matches team's preferred review/release cadence

Implementation follows in a separate PR after spec sign-off.

🤖 Generated with Claude Code

Foundation is sub-project 1 of 5 in the Veo improvements roadmap.
Covers shared library extraction (skills/_shared/veo-core/), cross-cutting
API parameters (negativePrompt, enhancePrompt, storageUri, personGeneration),
extended model support (Veo 3.1 Lite, Veo 3 stable, Veo 2), 4K resolution,
audio context-aware default derived from use case, and centralized
cross-parameter validation.

Other sub-projects (separate specs):
- /veo-animate (image-to-video)
- /veo-interpolate (first+last frame)
- /veo-extend (video extension)
- /veo-multi-shot v2 (referenceImages integration)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive design specification for the 'Veo Foundation' sub-project, aimed at refactoring the veo-tools plugin. The proposal includes creating a shared veo-core library to handle authentication, API requests, validation, and pricing, while expanding support for new models and parameters like 4K resolution and negative prompts. Review feedback identifies several areas for refinement: inconsistencies in the pricing calculation factors (specifically the omission of sampleCount and duration), a contradiction between the default model selection and the stated stability preference, and the use of parameters in validation rules that are missing from the configuration table. There is also a recommendation to adjust the token counting logic to prevent false rejections of prompts near the limit.

const PRICING = { ... } as const
```

The source URL and review-on-release invariant are encoded as file-level comments. Pricing is a manually-maintained lookup table per model × resolution × audio. Estimate displayed in Phase 4 PRESENT with breakdown.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pricing strategy description is inconsistent and incomplete. Line 82 mentions duration as a factor, but line 336 omits it. Additionally, both sections fail to account for sampleCount, which is a critical multiplier for cost in the Veo API (since users pay for each generated video). The design should specify that estimateCost uses model × resolution × duration × audio × sampleCount.

Comment on lines +142 to +155
| `model` | enum | expanded list (see below) | `veo-3.1-generate-preview` (pending empirical verification) | |

#### Model expansion

| Model ID | Variant | Use case | Audio | Max resolution |
|---|---|---|---|---|
| `veo-3.1-generate-preview` | Veo 3.1 quality (preview) | Default quality | yes | 4K |
| `veo-3.1-fast-generate-preview` | Veo 3.1 fast | Fast iteration | yes | 4K |
| `veo-3.1-lite-generate-preview` | Veo 3.1 lite | Lowest cost, text/image-to-video only | yes | 1080p |
| `veo-3.0-generate-001` | Veo 3 stable | Production with audio | yes | 4K |
| `veo-3.0-fast-generate-001` | Veo 3 fast stable | Production fast iteration | yes | 4K |
| `veo-2.0-generate-001` | Veo 2 | Silent video, multi-sample | **no** | 720p |

The current default `veo-3.1-generate-001` does not appear in official docs and may be invalid. Foundation includes empirical verification: during implementation, test which IDs return successfully from `predictLongRunning`. Default falls back to the most-recent confirmed-working ID, with preference for stable over preview when both work.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a contradiction between the proposed default model and the stated selection criteria. Line 142 sets the default to veo-3.1-generate-preview, but line 155 establishes a "preference for stable over preview when both work." Since veo-3.0-generate-001 is the designated stable version (line 151), it should be the default to remain consistent with the stated preference, unless the rule is intended to prioritize the latest version (3.1) regardless of its preview status.

| 6 | `model ∈ veo-2.*` ⇒ `resolution == 720p` | "Veo 2 max resolution is 720p" |
| 7 | `model ∈ veo-3.1-lite-*` ⇒ no `referenceImages`, no extension | "Lite model doesn't support reference images or extension" |
| 8 | `referenceImages.length ∈ [1, 3]` | "Reference images must be 1-3" |
| 9 | `prompt.tokens > 1024` (approx: chars / 3.5) | "Prompt exceeds 1024 tokens" |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The token counting approximation of chars / 3.5 (Rule 9) is risky for prompts near the 1024-token limit. An inaccurate estimate could lead to false rejections of valid prompts. It is recommended to implement this as a warning for prompts estimated to be near the limit (e.g., > 900 tokens), reserving hard rejection for cases where the estimate is significantly over or when the API explicitly returns a token limit error.

|---|---|---|
| 1 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" |
| 2 | `referenceImages` present ⇒ `durationSeconds == 8` | "Reference images require duration=8" |
| 3 | `lastFrame` present ⇒ `durationSeconds == 8` AND `image` present | "Interpolation requires both image and lastFrame, duration=8" |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Validation Rule 3 references lastFrame and image parameters, but these are missing from the VeoConfig parameter table at line 134. Even if these are implemented in later sub-projects, the shared VeoConfig type and validation.ts logic defined in this Foundation project must include these fields to avoid type errors and ensure the proposed validation rules are implementable.

Applied 10 review findings from two parallel review agents (technical
correctness + scope decomposition).

Technical fixes:
- Resolve 80 vs 150 lines contradiction (Goals vs Success criteria)
- Remove validation rules referencing out-of-scope parameters
  (referenceImages, lastFrame, video extension); document them as
  forward references owned by their sub-projects
- Clarify Veo 3.1 Lite model description (supports text-to-video in
  Foundation; image-to-video added by /veo-animate)
- Remove undocumented "Audio Layer added" auto-adjustment from
  Phase 4 PRESENT example
- Resolve region detection contradiction with explicit VEO_REGION
  env var mechanism

Decomposition fixes:
- Foundation now ships shared image input plumbing (ImageInput type,
  image-helpers.ts) so downstream sub-projects don't reinvent it
- Sequencing clarified: /veo-extend serialized after /veo-animate +
  /veo-interpolate (riskier operation-chain semantics)
- Edit operations (insert/remove/upscale) formally out of roadmap
- Added Risks & contingency section with Foundation-A/B split as
  authorized fallback if scope grows
- Added Pricing & constants maintenance protocol to prevent cross-
  project drift

Renumbered validation rules and updated cross-references accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

Agent review applied (10 findings, 1 follow-up commit)

Two review agents ran in parallel on the initial spec — a code-reviewer for technical consistency and a general-purpose agent for scope/decomposition. All findings were addressed in commit 430f1c9.

Technical fixes (5)

# Severity Finding Resolution
1 Major Validation rules #2/#3/#4/#7/#8 referenced parameters out of Foundation scope (referenceImages, lastFrame, video extension) Rules removed from Foundation table; documented as forward references owned by each sub-project. Added registerRule() API for downstream sub-projects to extend validation.
2 Major Contradiction "~80 lines" (Goals) vs "<150 lines" (Success criteria) Unified to <150 lines (realistic target).
3 Major Veo 3.1 Lite description ambiguous ("text/image-to-video only") Clarified: Lite supports text-to-video in Foundation; image-to-video added by /veo-animate. No referenceImages, no extension.
4 Minor Region detection contradiction (auto-fix proactive vs delegate-to-API-error) Resolved with VEO_REGION env var: set → proactive auto-fix; unset → API error path. Both coexist.
5 Minor Phase 4 PRESENT example showed "Audio Layer added" as auto-adjustment but feature undocumented Removed. Audio Layer absence is a warning, not an auto-add.

Decomposition fixes (5)

# Type Finding Resolution
D1 Decomposition /veo-animate, /veo-interpolate, /veo-multi-shot v2 all need image input plumbing Foundation now ships ImageInput type and image-helpers.ts (MIME validation, base64, GCS upload) even though no Foundation parameter consumes them.
D2 Sequencing /veo-extend has riskier operation-chain semantics than animate/interpolate Execution order updated: animate + interpolate + multi-shot v2 parallel after Foundation; extend serialized last.
D3 Missing Insert/remove/upscale labeled "possible stretch" Formally out of roadmap; documented as demand-gated future work, not a deferred sub-project.
D4 Scope risk Foundation accorpa 3 concern (refactor, params+models, audio system) — possible scope creep Added "Risks & contingency" section: implementer authorized to split into Foundation-A (refactor+params+validation+image plumbing) and Foundation-B (audio context-aware system) if PR size exceeds expectations. Default remains single PR.
D5 Decomposition pricing.ts + model constants could drift across 4 sub-projects Added "Pricing & constants maintenance protocol": single Foundation-owned source, dated header audit trail, mandatory estimateCost() import for all generated cost output.

What didn't change

  • Audio default policy (unspecified → on, derived per use case otherwise) — agent review didn't challenge this.
  • 6-phase workflow structure — only individual phase contents updated.
  • Out-of-scope items (Gemini API direct endpoint, web UI, multi-provider abstraction).

Ready for team review.

@GiuseppeIuculano GiuseppeIuculano marked this pull request as ready for review May 23, 2026 19:26
Copilot AI review requested due to automatic review settings May 23, 2026 19:26
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

/gemini review

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a design spec document for the “Foundation” sub-project in the Veo improvements roadmap, describing the planned shared veo-core library extraction, cross-cutting parameter support, audio-default policy, and centralized validation strategy.

Changes:

  • Introduces a comprehensive Foundation design spec covering scope, architecture, validation rules, audio-default policy, and testing strategy.
  • Defines proposed module boundaries and a migration plan for extracting shared Veo infrastructure into skills/_shared/veo-core/.
  • Documents backwards-compatibility expectations and a staged roadmap for follow-on skills (/veo-animate, /veo-interpolate, /veo-extend, /veo-multi-shot v2).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# Veo Foundation — Design Spec

**Date**: 2026-05-23
**Status**: Approved (pending user review)
- Missing API parameters: `negativePrompt`, `enhancePrompt`, `storageUri`, `personGeneration`
- Missing models: Veo 3.1 Lite, Veo 3 stable, Veo 2
- Missing resolution: 4K
- Audio nativo treated as second-class (default off, no prompt guidance) despite being a killer feature of Veo 3+
Comment on lines +64 to +67
- Foundation must reduce per-script line count from ~600 to <150 by extracting shared code (target aligned with Success criteria).
- New skills (`/veo-animate` etc.) must be implementable by writing only a CLI + workflow, not new auth/polling/validation code.
- Audio nativo becomes a first-class option when use case warrants it, never silently lost when use case warrants it.


## Risks & contingency

**Scope risk — Foundation accorpa 3 concern**: refactor lib, params+models+validation, audio context-aware system. Agent review flagged this as a possible scope-creep risk. Mitigation: if during implementation any of these sub-systems drifts past the planned size (e.g., audio context-aware requires more workflow rewriting than estimated), the implementer is authorized to split Foundation into two PRs without re-running this brainstorming step:

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive design specification for the 'Veo Foundation' sub-project, which centralizes core functionality for Google Veo tools into a shared library. The proposal includes support for new models, 4K resolution, expanded API parameters, and a context-aware audio system. Feedback focused on improving the token counting heuristic for non-English languages to prevent unexpected API errors and expanding the supported aspect ratios to include the 1:1 format for social media compatibility.

| 1 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | Foundation |
| 2 | `model ∈ veo-2.*` ⇒ `generateAudio == false` | "Veo 2 doesn't support audio" | Foundation |
| 3 | `model ∈ veo-2.*` ⇒ `resolution == 720p` | "Veo 2 max resolution is 720p" | Foundation |
| 4 | `prompt.tokens > 1024` (approx: chars / 3.5) | "Prompt exceeds 1024 tokens" | Foundation |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The token counting approximation chars / 3.5 is a rough heuristic that varies significantly by language. For CJK languages (Chinese, Japanese, Korean), where a single character often maps to multiple tokens, this formula will significantly underestimate the count. This could lead to hard API errors in Phase 5 instead of the intended proactive validation in Phase 3. Consider refining this heuristic or adding a note that validation is approximate for non-English prompts.

| 4 | `prompt.tokens > 1024` (approx: chars / 3.5) | "Prompt exceeds 1024 tokens" | Foundation |
| 5 | `personGeneration == allow_all` in EU/UK/CH/MENA region (see Open Question #2 for detection mechanism) | Auto-correct + warning: "Region restriction: falling back to allow_adult" | Foundation |
| 6 | `sampleCount ∈ [1, model-max]` (see Open Question #4) | "sampleCount out of range for selected model" | Foundation |
| 7 | `aspectRatio ∈ {16:9, 9:16}` only | "Invalid aspect ratio" | Foundation |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The aspectRatio validation rule excludes the 1:1 (square) format, which is a native capability of Google Veo 3.1 and a common requirement for social media platforms (e.g., Instagram, LinkedIn). Including it would better align with the "marketing" and "social" use cases identified in Phase 1.

Suggested change
| 7 | `aspectRatio ∈ {16:9, 9:16}` only | "Invalid aspect ratio" | Foundation |
| 7 | aspectRatio ∈ {16:9, 9:16, 1:1} only | "Invalid aspect ratio" | Foundation |

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion declined — claim contradicted by official documentation.

I verified the aspectRatio valid values across three official Google sources:

  1. Vertex AI — Veo 3.1 model page (docs.cloud.google.com/vertex-ai/generative-ai/docs/models/veo/3-1-generate-preview): "Supported aspect ratios: 9:16, 16:9"
  2. Gemini API — Veo docs (ai.google.dev/gemini-api/docs/video): "aspectRatio: The video's aspect ratio. \"16:9\" (default), \"9:16\""
  3. Vertex AI — Veo 3.0 model page: "Supported aspect ratios: 9:16, 16:9" (both for standard and fast-generate variants)

All three explicitly enumerate exactly two valid values: 16:9 and 9:16. 1:1 is not a supported aspect ratio for any current Veo model. Including it in validation would either fail at API call (false positive in our validation) or silently downscale/crop (worse).

If/when Google adds 1:1 support to Veo, we'll add it via a constants.ts update following the maintenance protocol defined in §6. Rule #7 stays at {16:9, 9:16} for now.

Accepted 7 findings, rejected 1 (aspect ratio 1:1 — see below).

Substantive fixes:
- Pricing formula: add sampleCount as strict multiplier; align all
  occurrences to model × resolution × duration × audio × sampleCount
- Default model selection rule: keep veo-3.1-generate-preview (latest
  generation) as default with veo-3.0-generate-001 as fallback if
  preview unavailable. Stable-preference applies within same generation,
  not across generations.
- VeoConfig type schema: declare image, lastFrame, referenceImages,
  videoExtensionInput as optional forward declarations in Foundation
  so downstream sub-projects don't have to modify Foundation types.
  Foundation's validateConfig() ignores these; sub-projects register
  rules via registerRule() API.
- Token counting: rule #4 changes from hard-reject at 1024 to soft
  warning at >900 / hard reject at >1024. Added CJK detection branch
  using chars/1.5 heuristic plus note that estimate is approximate.
  API-side token errors are surfaced verbatim for heuristic refinement.

Trivial fixes:
- Status frontmatter: "Approved (pending user review)" → "Proposed
  (under review)" (resolve self-contradiction)
- "nativo" → "native" (two occurrences)
- "Foundation accorpa 3 concern" → "Foundation bundles 3 concerns"

Rejected:
- aspectRatio 1:1: bot claim contradicted by three official Google
  doc sources (Vertex AI Veo 3.1, Veo 3.0, Gemini API docs) which
  all list exactly {16:9, 9:16}. Suggestion declined with citation
  in PR reply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

Bot review applied (commit 6de17b0)

Reviewed and resolved 10 findings from @gemini-code-assist and @copilot-pull-request-reviewer.

Accepted (7)

Finding Source Resolution
Pricing formula missed sampleCount and inconsistent duration mention gemini estimateCost keyed by model × resolution × duration × audio × sampleCount everywhere. sampleCount is a strict multiplier (users pay per generated video).
Default model contradicted "preference for stable" gemini Decision: keep veo-3.1-generate-preview as default (latest generation gives users access to newest features). Reformulated selection rule: stable-preference applies within the same generation, not across. Fallback to veo-3.0-generate-001 if preview unavailable.
Validation rules referenced lastFrame/image/referenceImages missing from VeoConfig table gemini Added forward declarations in VeoConfig: image?, lastFrame?, referenceImages?, videoExtensionInput? are optional from day one. Foundation's validateConfig() ignores them; sub-projects register rules via registerRule() API. Downstream sub-projects don't have to modify Foundation types.
Token counting chars/3.5 risks false rejections + significantly underestimates CJK languages gemini (twice) Rule #4 now: soft warning at >900 tokens, hard reject only at >1024. CJK detection branch uses chars/1.5. API-side token errors surfaced verbatim for heuristic refinement.
Status frontmatter self-contradictory ("Approved (pending user review)") Copilot Changed to "Proposed (under review)".
"nativo" should be "native" (two occurrences) Copilot Fixed.
"Foundation accorpa 3 concern" mixed Italian/English Copilot Rewritten as "Foundation bundles 3 concerns".

Declined (1)

Finding Source Reason
Add 1:1 (square) to aspectRatio gemini Contradicted by 3 official Google doc sources (Vertex AI Veo 3.1, Veo 3.0, Gemini API docs) all listing exactly {16:9, 9:16}. Full citation in inline reply. If/when Google adds 1:1 support, we'll update via the constants.ts maintenance protocol (§6).

Diff summary

6de17b0: 43 insertions, 11 deletions. Net effect: spec is now internally consistent, all type/validation forward declarations explicit, pricing model accurate, default selection rule unambiguous.

Ready for the next round of review.

@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive design specification for the 'Veo Foundation' sub-project, which aims to centralize core logic into a shared library and expand the capabilities of the Veo plugin. Key changes include refactoring existing skills to use a unified API, adding support for new parameters like 4K resolution and negative prompts, and implementing an audio context-aware system. Feedback from the review highlights opportunities to improve the robustness of the shared library, such as adding explicit HTTP status validation for downloads, making the download step conditional on storage settings, and ensuring the type system and validation rules are fully forward-compatible with future roadmap items like video extension and social-media-friendly aspect ratios. Additionally, refinements were suggested for region detection and token counting heuristics for non-Latin scripts.

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): string` | gcloud CLI invocation, error normalization |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(url, path, token): Promise<void>` | URL building, HTTPS request handling, redirect following |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The downloadFile function in the shared api.ts should include explicit validation for HTTP status codes. The current implementation in veo-generate.ts (which this spec refactors) only handles redirects and pipes the response body regardless of other status codes. Ensure the shared version throws an error for non-200/30x responses (e.g., 404, 403, 500) to prevent the creation of empty or corrupted video files.

├─> getAccessToken()
├─> submitGeneration(config, token) → operationName
├─> pollOperation(operationName, token) → status
└─> downloadFile(status.videoUrl, outputPath, token)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The data flow diagram indicates that downloadFile is a mandatory step. However, the storageUri parameter (Line 143) allows users to store the video directly on GCS instead of downloading it locally. The orchestration logic in generate.ts should be updated to make the download step conditional on the absence of a storageUri in the configuration.

// Foundation-owned (validated and consumed here)
prompt: string
model?: string
aspectRatio?: '16:9' | '9:16'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The aspectRatio type is currently limited to 16:9 and 9:16. Since Veo 3.1 natively supports a 1:1 (square) aspect ratio, which is highly relevant for social media use cases mentioned in the spec (Line 217), consider adding '1:1' to the allowed values.

Suggested change
aspectRatio?: '16:9' | '9:16'
aspectRatio?: '16:9' | '9:16' | '1:1'

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion declined (second time) — same rationale as previous reply on aspectRatio 1:1: verified against three official Google sources, all listing exactly {16:9, 9:16}. See prior reply for full citation.

prompt: string
model?: string
aspectRatio?: '16:9' | '9:16'
durationSeconds?: 4 | 6 | 8

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The durationSeconds type is restricted to the literal values 4 | 6 | 8. However, the roadmap (Line 28) includes the /veo-extend sub-project which supports durations up to 148s. To fulfill the design goal of forward-declaring fields to avoid future interface modifications (Line 164), durationSeconds should be typed as number. The 4/6/8 restriction for standard models should be enforced via validation rules in validation.ts instead of the type system.

Suggested change
durationSeconds?: 4 | 6 | 8
durationSeconds?: number


Anti-patterns rejected by Phase 3 VALIDATE:
- Audio on but no audio descriptors in prompt → warning
- Comic-book onomatopoeia (`BAM!`, `WHOOSH!`) → rejected; suggest realistic description

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is an inconsistency between the hard rejection of onomatopoeia (e.g., 'BAM!', 'WHOOSH!') here and the softening of text-related rules in Phase 3 (Line 306), where text requests are downgraded to warnings. Given that Veo 3+ supports native audio, users might use these terms to describe sound effects. Unless onomatopoeia is known to cause severe artifacts in the audio engine, consider treating it as a warning (suggesting descriptive Foley terms instead) to remain consistent with the overall prompt validation strategy.

## Open questions

1. **Default model ID**: `veo-3.1-generate-001` (current) vs `veo-3.1-generate-preview` (docs) — to be resolved empirically during implementation. If both work, prefer current for backwards compatibility; if only one works, choose that.
2. **Region detection**: rule #5 (`personGeneration` regional restriction) requires knowing the user's region for proactive auto-fix. Resolution: Foundation reads `VEO_REGION` env var (values: `us`, `eu`, `uk`, `ch`, `mena`, `other`). If set, auto-fix applies proactively in Phase 4 PRESENT. If unset, no auto-fix; the API error (if any) is surfaced verbatim in Phase 5 GENERATE. This eliminates the contradiction between proactive auto-fix and "delegate to API error" — both paths exist and the env var decides which.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The proposed region detection relies solely on a manual VEO_REGION environment variable. To improve the user experience and the reliability of proactive auto-fixes (Rule #5), the veo-core library should attempt to infer the region from the GOOGLE_CLOUD_LOCATION environment variable (e.g., mapping us-central1 to us, europe-west1 to eu) when VEO_REGION is not explicitly set.


1. **Default model ID**: `veo-3.1-generate-001` (current) vs `veo-3.1-generate-preview` (docs) — to be resolved empirically during implementation. If both work, prefer current for backwards compatibility; if only one works, choose that.
2. **Region detection**: rule #5 (`personGeneration` regional restriction) requires knowing the user's region for proactive auto-fix. Resolution: Foundation reads `VEO_REGION` env var (values: `us`, `eu`, `uk`, `ch`, `mena`, `other`). If set, auto-fix applies proactively in Phase 4 PRESENT. If unset, no auto-fix; the API error (if any) is surfaced verbatim in Phase 5 GENERATE. This eliminates the contradiction between proactive auto-fix and "delegate to API error" — both paths exist and the env var decides which.
3. **Token counting** (rule #4): `chars / 3.5` is a Latin-script heuristic. For CJK languages (Chinese, Japanese, Korean) one character often maps to multiple tokens, so the heuristic significantly **underestimates** token count and risks letting prompts through that the API later rejects. For non-Latin scripts (Cyrillic, Arabic, Hebrew, Devanagari) the ratio also differs. Foundation mitigation: (a) rule #4 acts as a **soft warning** at >900 estimated tokens, **hard reject** only at >1024 estimated; (b) for prompts with significant CJK content (detected via Unicode range scan), the heuristic switches to `chars / 1.5` (rough average) and the warning surfaces a note that the estimate is approximate; (c) if Phase 5 returns an API-side token error, the error is surfaced verbatim and tagged for the maintainer to revisit the heuristic. Accurate token counting via an extra API call remains deferred until evidence justifies it.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The token counting heuristic for non-Latin scripts currently only specifies a multiplier for CJK content (chars / 1.5). Other non-Latin scripts mentioned (Cyrillic, Arabic, Hebrew, Devanagari) also have significantly different character-to-token ratios compared to Latin. The 'significant non-Latin content' detection should apply a safer, more conservative multiplier to all non-Latin Unicode ranges to prevent unexpected API rejections due to underestimation.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.


```
skills/
_shared/ # NEW — non-skill code (underscore prefix excludes from skill loader)
SKILL.md # minimal update (params pass-through)
```

The `_shared/` directory uses an underscore prefix so the Claude Code skill loader does not treat it as a skill (skill loader scans `*/SKILL.md`). Each consuming skill references it via relative imports plus a tsconfig path mapping (`@veo-core/*`) to keep imports readable and refactor-safe.
Comment on lines +127 to +131
├─> validateConfig(config) → may throw or auto-fix
├─> getAccessToken()
├─> submitGeneration(config, token) → operationName
├─> pollOperation(operationName, token) → status
└─> downloadFile(status.videoUrl, outputPath, token)
| New `validateConfig()` rules with auto-fix | No | Auto-fix is additive; rejects only what API would reject anyway |
| Extended `VeoConfig` type | No | All new fields optional |

Plugin version in `.claude-plugin/plugin.json` bumps minor (0.x → 0.(x+1).0). New `CHANGELOG.md` documents each behavioral change.

### Unit tests (free, deterministic)

Run on every PR; mock or pure functions only.
Resolved 11 findings from gemini-code-assist and Copilot on commit
6de17b0. Aspect ratio 1:1 again declined with same citation.

Architectural:
- api.ts: explicit HTTP status validation in downloadFile; partial-file
  cleanup on error
- generate.ts: outputPath now optional; downloadFile skipped when
  storageUri is set; GenerationResult carries either videoPath or
  gcsUri depending on path
- VeoConfig.durationSeconds: type widened from literal 4|6|8 to
  number; Foundation enforces 4/6/8 via validation.ts so /veo-extend
  can later use larger values without modifying the type
- _shared/ exclusion rationale corrected: loader scans for SKILL.md
  files (the operative mechanism); underscore prefix is defensive
  convention. Added note to verify loader behavior at implementation.
- Imports: relative imports chosen for Foundation (no root tsconfig
  exists; @veo-core/* alias wouldn't resolve under npx ts-node). Root
  tsconfig deferred until a sub-project requires it.

Refinements:
- Onomatopoeia and unquoted dialogue downgraded from reject to warning
  for consistency with rest of Phase 3 softening
- Region detection now two-tier: VEO_REGION env var first, then
  inference from GOOGLE_CLOUD_LOCATION (mapping table in spec)
- Token counting heuristic expanded with per-script multipliers
  (Latin 3.5, CJK 1.5, Cyrillic 2.0, Arabic 2.0, Hebrew 2.0,
  Devanagari 1.8) detected via Unicode range scan

Infrastructure:
- Version bump: plugin.json 1.0.0 → 1.1.0 (semver minor, additive
  features with backwards-compat overrides). Corrects prior 0.x text.
- Test runner + CI added as Foundation deliverable: vitest + workflow
  on PR. Without this, unit-test success criteria was aspirational.
  Migration plan step 1 is now infrastructure setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

Second-round bot review applied (commit bc82e80)

Resolved 11 new findings from @gemini-code-assist and @copilot-pull-request-reviewer. One declined (aspect ratio 1:1 — same citation as before).

Architectural fixes (5)

Source Finding Resolution
gemini #1 downloadFile missing HTTP status validation api.ts public API now requires explicit non-2xx/3xx → throw + partial-file cleanup
gemini #2 + Copilot storageUri makes download conditional but data flow ignores it generate.ts: outputPath now optional; download skipped when storageUri set; GenerationResult.gcsUri vs .videoPath branch
gemini #4 durationSeconds literal 4|6|8 blocks /veo-extend (148s) Type widened to number; Foundation enforces 4/6/8 via validation.ts, sub-projects can extend
Copilot #8 _shared/ underscore convention unverified Rationale corrected: operative mechanism is absence of SKILL.md; underscore is defensive convention; note added to verify loader behavior at impl
Copilot #9 TS path alias @veo-core/* won't work without root tsconfig Decision: relative imports for Foundation. Root tsconfig deferred until a sub-project genuinely needs it.

Refinements (3)

Source Finding Resolution
gemini #5 Onomatopoeia hard-reject inconsistent with Phase 3 softening Downgraded to warning with Foley-style suggestion. Dialogue without quotes also downgraded to warning.
gemini #6 Region detection should fall back to GOOGLE_CLOUD_LOCATION Two-tier: VEO_REGION first, then mapping from GOOGLE_CLOUD_LOCATION (us-, europe-, me-*, etc.)
gemini #7 Token counting only handles CJK, not Cyrillic/Arabic/Hebrew/Devanagari Per-script multiplier table: Latin 3.5, CJK 1.5, Cyrillic/Arabic/Hebrew 2.0, Devanagari 1.8. Dominant range >30% determines multiplier.

Infrastructure (2)

Source Finding Resolution
Copilot #11 Version bump policy says "0.x → 0.(x+1).0" but plugin.json is 1.0.0 Corrected: bump 1.0.0 → 1.1.0 (semver minor; behavioral changes have override flags)
Copilot #12 "Unit tests run on every PR" but no CI/test runner exists Added as Foundation deliverable: vitest + package.json at root + .github/workflows/test.yml. Migration plan step #1 is now CI setup; without it, unit-test success criteria was aspirational.

Declined (1)

Source Finding Reason
gemini #3 Add 1:1 to aspectRatio (third occurrence) Same as before — contradicts all three official Google sources. Inline reply citation.

Diff summary

bc82e80: 72 insertions, 30 deletions. Spec now has explicit HTTP error semantics, conditional download path, forward-compatible duration typing, Latin/non-Latin token heuristics, CI infrastructure as deliverable, and corrected versioning.

Ready for the next review round.

@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

/gemini review

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment on lines +125 to +130
```
CLI (veo-generate.ts)
└─> parseArgs → VeoConfig
└─> generateVideo(config, outputPath?)
├─> validateConfig(config) → may throw or auto-fix
├─> getAccessToken()
Comment on lines +286 to +290
| `resolution=1080p/4k` + `duration<8` (user didn't explicitly set duration) | Force `duration=8` | "Bumped duration to 8s to enable 1080p/4K" |
| Region=EU + `personGeneration=allow_all` | Force `allow_adult` | "Region restriction: personGeneration set to allow_adult" |
| `model=veo-2.*` + `audio=on` | Force `audio=off` | "Veo 2 doesn't support audio, disabled" |

Auto-fixes apply only when the corrected value is unambiguous. Ambiguous combinations (e.g., user explicitly set duration=6 AND resolution=1080p) fail with error instead.

- **Single owner per release**: `pricing.ts` and `constants.ts` are owned by Foundation. Any sub-project that needs to add a model, cost vector, or constant lands a Foundation-touching change in its own PR, not an inline duplicate in its skill folder.
- **Dated header is the audit trail**: every PR that touches `pricing.ts` updates the `// Last updated: YYYY-MM-DD` comment and the `// Source:` URL if applicable.
- **No skill-local pricing**: skills must `import { estimateCost } from '@veo-core/pricing'`. Hardcoded `~$0.50` strings in SKILL.md examples are allowed (they're guidance), but generated cost estimates must come from `estimateCost()`.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive design specification for the 'Veo Foundation' sub-project, which aims to refactor the veo-tools plugin by centralizing core logic into a shared library, expanding support for new API parameters and models, and establishing a robust validation and testing framework. The reviewer feedback focuses on improving project maintainability through TypeScript path aliases, ensuring the API submission logic is sufficiently dynamic to handle future modalities, and refining validation rules—specifically by adding explicit checks for supported video durations and softening the token count heuristic to avoid false positives.


The `_shared/` directory is excluded from skill discovery because it contains **no `SKILL.md`** — the Claude Code skill loader scans for `SKILL.md` files, not directory naming. The underscore prefix is a defensive convention (makes the intent visually obvious and avoids ambiguity with future loader rules), but the operative exclusion mechanism is the absence of `SKILL.md`. *To verify before implementation: confirm the current loader behavior in Claude Code; if loader rules ever change to scan by directory name, revisit.*

Consuming skills reference `_shared/` via **relative imports** (e.g., `import { generateVideo } from '../../_shared/veo-core/generate'`). The original draft proposed a tsconfig path alias `@veo-core/*`, but the repository has no root `tsconfig.json` and the scripts run via `npx ts-node` without `--project`, so path aliases wouldn't resolve reliably. Relative imports are chosen for Foundation. If later sub-projects need cleaner imports, the team can introduce a root `tsconfig.json` + `TS_NODE_PROJECT` env var as a follow-up.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since the migration plan (step 1) already includes adding a root package.json, it is highly recommended to also include a root tsconfig.json and use path aliases (e.g., @veo-core/*) instead of relative imports. This avoids brittle paths like ../../_shared/veo-core/ and improves maintainability as the project structure grows with the four planned sub-projects.

Comment on lines +246 to +258
Anti-patterns in Phase 3 VALIDATE:
- Audio on but no audio descriptors in prompt → **warning**
- Comic-book onomatopoeia (`BAM!`, `WHOOSH!`) → **warning** with suggestion to swap for Foley-style description (e.g., `metallic clang, dust settling`). Not rejected, since Veo 3+ may interpret onomatopoeia as legitimate SFX cues; we surface a softer recommendation instead.
- Dialogue without quotes → **warning** (downgraded from reject to align with the rest of Phase 3 softening; users may want stylized speech)

New file `skills/veo/references/audio-lexicon.md` contains:
- Professional SFX vocabulary (Foley terms)
- Dialogue patterns that fit 8s (~25 words max)
- Ambient patterns per mood (industrial, organic, cinematic)
- Known limit: voice does not extend if absent in last 1s (relevant to future `/veo-extend`)

### 3. Cross-parameter validation rules

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The submitGeneration implementation example is missing several cross-cutting parameters that this Foundation project is intended to introduce, such as negativePrompt, enhancePrompt, personGeneration, and storageUri. Additionally, the instances array is hardcoded for text-to-video, which contradicts the goal of unblocking downstream sub-projects like /veo-animate (image-to-video). The design should ensure submitGeneration dynamically constructs the request body based on all Foundation-owned parameters and input modalities.


| # | Rule | Error if violated | Owned by |
|---|---|---|---|
| 1 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | Foundation |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While Rule 1 covers the duration requirement for 1080p/4K, the validation table should also include a general rule enforcing that durationSeconds must be one of the API-supported values {4, 6, 8} for all models (as mentioned in the VeoConfig interface notes on line 182). This ensures consistency across all generation types and prevents invalid API calls.

| 1 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | Foundation |
| 2 | `model ∈ veo-2.*` ⇒ `generateAudio == false` | "Veo 2 doesn't support audio" | Foundation |
| 3 | `model ∈ veo-2.*` ⇒ `resolution == 720p` | "Veo 2 max resolution is 720p" | Foundation |
| 4 | `prompt.tokens > 1024` (approx: chars / 3.5 for Latin-script; see note below) | **Warning** when estimated tokens > 900 (soft); **Reject** only when estimated > 1024 (hard ceiling) — see token counting note | Foundation |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Rule 4 specifies a "Hard reject" when the estimated token count exceeds 1024. Given that the estimation is based on a character-count heuristic (chars / 3.5), it is prone to false positives. It would be safer to treat this as a "Warning" and allow the request to proceed to the API, which provides the definitive token validation. A hard rejection should only be used if an accurate tokenizer is implemented to avoid blocking valid user prompts.

Resolved 6 findings from Copilot (3) and gemini-code-assist (3).
Reversed prior decision on imports per user direction (option B):
switching from relative imports to @veo-core/* path aliases now that
root infrastructure (package.json, vitest, CI) is already in scope.

Contract clarifications:
- validateConfig() now explicitly never-throws; returns discriminated
  union. autoFixed and autoFixMessages are first-class fields.
  generateVideo() is the only function that throws on invalid config.
  Test code can introspect ValidationResult without exception handling.
- Option provenance: VeoConfig fields are undefined when user did NOT
  pass the flag. Defaults applied inside validateConfig, not in CLI
  parser. Auto-fix triggers only on undefined fields; explicit
  conflicting values produce hard errors with remedial suggestions.

Imports / infrastructure (decision reversal):
- Switched from relative imports to @veo-core/* path alias
- Added root tsconfig.json + vitest.config.ts to Foundation
  infrastructure setup (migration step 1)
- TS_NODE_PROJECT exported by each script that invokes ts-node
- Rationale: marginal cost over already-planned root infrastructure;
  scales to 4 downstream sub-projects without import path duplication

API request construction:
- Added "API request construction in submitGeneration" subsection
  showing dynamic body building from VeoConfig fields
- Forward-declared modalities (image, lastFrame, referenceImages,
  videoExtensionInput) flow through to the API even before their
  owning sub-projects ship — Foundation does not client-gate; the API
  rejects calls with its own error if a model doesn't support a field

Validation rules:
- Added explicit rule #1: durationSeconds ∈ {4, 6, 8} (was implicit
  in VeoConfig note; bot caught the gap)
- Renumbered remaining Foundation rules 2-8

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

Third-round bot review applied (commit 0ac09e6)

Resolved 6 new findings from @copilot-pull-request-reviewer and @gemini-code-assist. Notable: this round includes a reversal of an earlier decision on imports.

Contract clarifications (2)

Source Finding Resolution
Copilot #1 validateConfig contract ambiguous (return type says no throw, data flow says "may throw") Never-throws contract: discriminated union return. autoFixed and autoFixMessages are first-class. generateVideo() is the only function that throws on invalid config. Test code can call validateConfig and inspect results without try/catch.
Copilot #2 Auto-fix can't distinguish user-set vs default values Option provenance: VeoConfig fields are undefined when user didn't pass the flag. Defaults applied inside validateConfig, not in CLI parser. Auto-fix triggers only on undefined fields; explicit conflicts (e.g., --duration 6 --resolution 1080p) produce hard errors with suggestions.

Imports decision reversal (Copilot #3 + gemini #4)

Previous decision: relative imports for Foundation, deferred root tsconfig.
This round: switched to @veo-core/* path aliases per user direction.

Rationale: commit bc82e80 already added root package.json, vitest, and CI workflow to Foundation. The marginal cost of also adding root tsconfig.json + vitest.config.ts is small, and the 4 downstream sub-projects will repeatedly import from _shared/ — relative paths like ../../_shared/veo-core/pricing repeated everywhere are brittle.

Migration plan step 1 now includes: root tsconfig.json with @veo-core/*skills/_shared/veo-core/* alias, vitest.config.ts mirroring it, TS_NODE_PROJECT exported by each script. Pricing protocol's @veo-core/pricing import is now consistent with architecture (it was the contradiction that surfaced this).

API request construction (gemini #5)

Added "API request construction in submitGeneration" subsection showing the function builds the Vertex AI request body dynamically from VeoConfig fields. Forward-declared modalities (image, lastFrame, referenceImages, videoExtensionInput) flow through to the API even before their owning sub-projects ship — Foundation does not client-gate them; the API rejects with its own error if a model doesn't support a field. This means a power user can already exercise image-to-video against Foundation's library before /veo-animate ships (just without SKILL.md workflow guidance).

Missing validation rule (gemini #6)

Added explicit rule #1: durationSeconds ∈ {4, 6, 8} as a Foundation-owned rule. Previously implicit in VeoConfig notes. Foundation rules renumbered 1-8.

Diff summary

0ac09e6: 73 insertions, 21 deletions. Spec is now consistent on import strategy, has unambiguous validation contracts, makes option provenance explicit (mirrored in both data flow and auto-fix tables), and shows how submitGeneration builds requests dynamically.

Ready for next review round.

@GiuseppeIuculano GiuseppeIuculano requested a review from Copilot May 24, 2026 07:36
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request proposes a design specification for the 'Veo Foundation' sub-project, aimed at centralizing core logic for the veo-tools plugin into a shared library. The plan includes refactoring existing skills, expanding support for new models and parameters, and introducing a robust validation and testing framework. Review feedback suggests several technical improvements: adopting tsconfig-paths for more maintainable alias resolution, switching to the official google-auth-library for authentication, ensuring validation rules are model-specific (particularly for Veo 2), and utilizing the countTokens API to avoid false positives in prompt length enforcement.


The `_shared/` directory is excluded from skill discovery because it contains **no `SKILL.md`** — the Claude Code skill loader scans for `SKILL.md` files, not directory naming. The underscore prefix is a defensive convention (makes the intent visually obvious and avoids ambiguity with future loader rules), but the operative exclusion mechanism is the absence of `SKILL.md`. *To verify before implementation: confirm the current loader behavior in Claude Code; if loader rules ever change to scan by directory name, revisit.*

Consuming skills reference `_shared/` via the **`@veo-core/*` path alias**, configured in a root `tsconfig.json` added as part of Foundation's infrastructure setup (alongside the root `package.json` and CI workflow — see migration plan step 1). The alias resolves under `ts-node` via the `TS_NODE_PROJECT` env var (exported by each skill script that uses it) and under `vitest` via `vitest.config.ts` referencing the same `tsconfig.json`. Example: `import { generateVideo } from '@veo-core/generate'`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using TS_NODE_PROJECT to resolve path aliases is brittle as it requires manual environment variable management for every script execution. A more maintainable approach is to use the tsconfig-paths package. By registering it (e.g., via NODE_OPTIONS="-r tsconfig-paths/register"), aliases defined in the root tsconfig.json will be resolved automatically across all sub-projects without requiring explicit project path exports in every file.


| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): string` | gcloud CLI invocation, error normalization |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Relying on gcloud CLI invocations for authentication is less robust than using the official google-auth-library. The latter handles various credential types (Service Accounts, ADC, Workload Identity) natively and removes the dependency on an external CLI being installed and configured in the environment. This is particularly important for ensuring the shared library works consistently across different execution contexts, such as CI/CD runners or containerized environments.


| # | Rule | Error if violated | Owned by |
|---|---|---|---|
| 1 | `durationSeconds ∈ {4, 6, 8}` for all standard models | "durationSeconds must be 4, 6, or 8" | Foundation |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Rule 1 enforces a duration of {4, 6, 8} seconds for "all standard models," but the spec also includes support for veo-2.0-generate-001 (Line 172). Veo 2 often has different duration constraints (e.g., 5 or 10 seconds). This rule should be made model-aware to prevent the validation logic from incorrectly rejecting valid configurations for non-Veo 3.1 models.

| 2 | `resolution ∈ {1080p, 4k}` ⇒ `durationSeconds == 8` | "1080p/4K require duration=8" | Foundation |
| 3 | `model ∈ veo-2.*` ⇒ `generateAudio == false` | "Veo 2 doesn't support audio" | Foundation |
| 4 | `model ∈ veo-2.*` ⇒ `resolution == 720p` | "Veo 2 max resolution is 720p" | Foundation |
| 5 | `prompt.tokens > 1024` (approx: chars / 3.5 for Latin-script; see note below) | **Warning** when estimated tokens > 900 (soft); **Reject** only when estimated > 1024 (hard ceiling) — see token counting note | Foundation |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Implementing a "Hard Reject" based on a character-to-token heuristic is likely to result in false positives, blocking valid prompts that the API would otherwise accept. Since Vertex AI provides a countTokens API, it should be used for accurate validation. If an extra API call is undesirable, the "Hard Reject" should be downgraded to a "Warning," allowing the actual API call to proceed and letting the backend return a definitive error if the limit is truly exceeded.

@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the "Veo Foundation" design spec, which centralizes core functionality for the veo-tools plugin into a shared library. Key changes include support for expanded model variants, 4K resolution, and advanced parameters like negativePrompt and storageUri, alongside a new testing infrastructure and an audio context-aware default system. Feedback focuses on enhancing security by precisely defining "origin" for header stripping during redirects, improving reliability with a total timeout for file downloads, and ensuring the documentation explicitly lists the change in default video duration as a breaking change.

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** rather than total-request, since large video downloads — especially 4K 8s clips — can legitimately exceed 30s of wall-clock; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The security policy for stripping the Authorization header on redirects should explicitly define "origin" as the combination of scheme, host, and port. This ensures that tokens are not leaked if a redirect occurs from HTTPS to HTTP on the same host, or to a different port.

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** rather than total-request, since large video downloads — especially 4K 8s clips — can legitimately exceed 30s of wall-clock; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While a 30-second socket idle timeout is appropriate for large video downloads, the design should also specify a maximum total duration (e.g., 15 minutes) for the downloadFile operation. This prevents the process from hanging indefinitely in environments like CI/CD if a connection remains active but extremely slow.

| Default model `veo-3.1-generate-001` → `veo-3.1-generate-preview` (and `veo-3.0-generate-001` fallback) | **Yes, behavioral** | Documented in CHANGELOG; the legacy ID `veo-3.1-generate-001` is dropped from the default chain because it doesn't appear in current docs. Users who hardcoded the legacy ID in their own scripts can keep doing so — the spec only changes the *default*. Aligned with the "Resolved decisions" section. |
| New `validateConfig()` rules with auto-fix | No | Auto-fix is additive; rejects only what API would reject anyway |
| Extended `VeoConfig` type — most new fields are truly optional | No | — |
| `VeoConfig.outputPath` is **additive-required** when `storageUri` is unset | **Yes, behavioral** | Rule #9 hard-errors if neither `outputPath` nor `storageUri` is set. Callers using the previous `generateVideo(config, outputPath)` signature must now pass `outputPath` (or `storageUri`) **inside** the config object. Migration is mechanical and the error message is explicit, but it's not source-compatible with the previous shape. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The change in the default value for durationSeconds from 8s (current implementation in veo-generate.ts) to 4s (proposed in §1) is a behavioral breaking change. It should be explicitly listed in the "Backwards compatibility" table to ensure users are aware of the shift in default output length.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

@claude review

@claude

claude Bot commented May 24, 2026

Copy link
Copy Markdown

Claude finished @GiuseppeIuculano's task in 3m 19s —— View job


Tasks

  • Load tools
  • Read spec file (current state after 19 rounds, commit 621d31f)
  • Analyze for remaining issues
  • Post review findings

The spec is in very good shape after 19 rounds. Architecture decisions are all sound. Three findings remain — one is a concrete backwards-compatibility gap verified against the current codebase, two are precision nits from the final gemini round (17:47 UTC) that weren't captured in the 19th-round summary.


High — Unresolved from final gemini review (17:47 UTC)

1. durationSeconds default change (8 → 4) is not in the backwards-compatibility table

Verified against the current codebase: skills/veo/scripts/veo-generate.ts line 84 has durationSeconds: 8 as the current default, and the CLI help text (line 442) says "default: 8". The spec changes this to 4 (documented in VeoConfig line 330). Users who never pass --duration will get 4-second videos instead of 8-second videos after Foundation lands — a real behavioral change with cost and quality implications (shorter clips, lower cost).

The audio-default change has its own row in the backwards-compatibility table (line 727). This change needs one too.

The 19th-round summary resolved the separate "document what the default is" finding from my prior round 3 review, but the gemini reviewer's distinct finding — "this default change should be in the backwards-compat table" — wasn't addressed.

Suggested table row:

| `durationSeconds` default 8s → 4s | **Yes, behavioral** | Users who never passed `--duration` will get shorter videos. `--duration 8` restores the previous behavior. Documented in CHANGELOG. |

Fix this →


Medium — Two precision gaps from final gemini review

2. downloadFile has no maximum total duration — only socket-idle timeout

Line 186: "uses a socket idle timeout of 30s rather than total-request." A 30s socket-idle timeout correctly handles large downloads but doesn't protect against an extremely slow connection that keeps sending bytes (never idle). The gemini review at 17:47 UTC explicitly asked for "a maximum total duration (e.g., 15 minutes) for the downloadFile operation" to prevent indefinite hangs in CI/CD on a connection that stays technically active but delivers at near-zero throughput. This was not captured in the 19th-round resolutions — the round addressed the Copilot finding (clarify idle vs total) but not this additive request.

Fix this →


3. "Origin" is undefined in the Authorization-stripping security note

Line 186: "if the target origin differs from the current request's origin, the Authorization header is stripped." The spec never defines "origin." Per RFC 6454, origin = scheme + host + port. Without an explicit definition, an implementer might compare only hostnames — failing to strip the header on HTTPS → HTTP redirects to the same host (different scheme = different origin). Signed GCS URLs are always HTTPS, so this particular edge case may not matter in practice, but the ambiguity could cause security issues if the API ever redirects across schemes.

Fix this →


Low — Previously flagged, still unaddressed

4. MODEL_SUGGESTIONS asymmetry has no documented rationale

Two choices in the MODEL_SUGGESTIONS constant (lines 405–413) are inconsistent with each other and with the rest of the table, with no rationale documented:

  • social suggests veo-3.1-generate-preview (preview) as the quality model, while marketing/product/storytelling suggest veo-3.0-generate-001 (stable). The implicit reasoning — "social media tolerates preview; production brand work prefers stability" — isn't stated anywhere.
  • hero-background/ambient/loop include a lite option; social does not, despite Lite supporting text-to-video and being relevant for high-volume social content generation.

A maintainer reviewing model-routing.test.ts assertions will be unable to tell whether these are intentional policy decisions or accidental omissions. One inline comment per inconsistency is sufficient.

I flagged this in my prior review (round 3); it wasn't addressed.


Confirming three previous verdicts


Overall verdict: After 19 rounds this spec is implementation-ready. Finding #1 is the only item that should be fixed before coding starts, as it directly affects the CHANGELOG content and user communication. Findings #2 and #3 are implementation-precision items cheap to close now. Finding #4 is a documentation nit that, if left unaddressed, becomes a maintenance liability once sub-projects start questioning the model-routing logic.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…GGESTIONS + precision fixes

Resolved 4 review findings + corrected a misinterpretation from
round 17.

Critical correction:
- The 19th-round fix for durationSeconds default incorrectly set
  the spec default to 4s. The current veo-generate.ts line 84 and
  README documentation both pin the CLI/library default at 8s.
  The 4s in the manual integration test #1 was a SKILL.md
  Phase 1 recommendation for hero-background/loop use cases
  (smoother loops), not a global default.
- Restored CLI/library default to 8s (matches existing behavior;
  no breaking change). Added DURATION_SUGGESTIONS constant
  (analogous to AUDIO_DEFAULTS) for use-case-aware Phase 1
  recommendations:
    hero-background/loop/ambient → 4s (smoother loops)
    social/marketing/product/storytelling → 8s (max content)
- Manual checklist expanded from 8 to 9 items: added a "default
  bare CLI invocation" regression to verify no use-case override
  has leaked into the library default chain. Success criteria
  updated to 9/9.

Precision fixes (gemini + claude review):
- downloadFile: added max total duration cap of 15 minutes
  alongside the 30s socket-idle timeout. Belt-and-suspenders against
  near-zero-throughput connections that never trigger idle but
  also never finish.
- "Origin" defined inline per RFC 6454: scheme + host + port.
  Prevents an implementer from comparing only hostnames and
  missing HTTPS→HTTP cross-scheme redirects.
- MODEL_SUGGESTIONS now has asymmetry rationale inline: social
  uses preview (latest features for short attention spans),
  marketing/product/storytelling use stable (production
  reliability), hero/ambient/loop include lite (cost optimization
  for high-volume landing-page usage; not relevant for pure-text
  social ads).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

Twentieth-round review applied (commit 931397f)

Important correction: the 19th-round fix for durationSeconds default was wrong — I had documented the default as 4s, but the existing veo-generate.ts (line 84) and README both pin the CLI/library default at 8s. The 4s in the manual integration test #1 was a SKILL.md Phase 1 recommendation for hero-background/loop use cases (smoother loops), not a global default.

Critical correction (1)

# Finding Resolution
1 durationSeconds default change 8→4 — actually a misinterpretation, not a real breaking change Restored default to 8s (matches existing behavior; no breaking change). Added DURATION_SUGGESTIONS constant analogous to AUDIO_DEFAULTS: hero/loop/ambient → 4s, social/marketing/product/storytelling → 8s. Phase 1 SKILL.md applies these as use-case-aware overrides; user's explicit --duration always wins. Manual checklist expanded from 8 to 9 items (added bare-CLI regression test)

Precision fixes (3)

# Finding Resolution
2 downloadFile missing max total duration — slow connections at near-zero throughput hang indefinitely Added 15-minute max total duration alongside the 30s socket-idle timeout. Belt-and-suspenders
3 "Origin" undefined in Authorization-strip note Inline definition added: scheme + host + port per RFC 6454. HTTPS→HTTP redirects on same host are cross-origin
4 MODEL_SUGGESTIONS asymmetry had no documented rationale Inline comments explaining: social uses preview (latest features for short attention), marketing/product/storytelling use stable (production reliability), hero/ambient/loop include lite (cost optimization for landing-page volume; not relevant for pure-text social)

Diff summary

931397f: 36 insertions, 12 deletions. The durationSeconds correction is the most consequential — confirms no breaking change to that default and adds the proper SKILL.md mechanism for use-case-aware overrides.

@GiuseppeIuculano GiuseppeIuculano requested a review from Copilot May 24, 2026 18:04
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

/gemini review

@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

@claude review

@claude

claude Bot commented May 24, 2026

Copy link
Copy Markdown

Claude finished @GiuseppeIuculano's task in 2m 50s —— View job


Tasks

  • Load tools
  • Read spec file (current state after 20 rounds, commit 931397f)
  • Analyze for remaining issues
  • Post review findings

The spec is in excellent shape after 20 rounds. All previously flagged issues (never-throws contract, REGIONS shape, GenerationResult type, MODEL_SAMPLE_MAX exports, audio reason-string provenance, durationSeconds default correction, DURATION_SUGGESTIONS, downloadFile max total duration, resolveDefaultModel error propagation) are correctly addressed. Three issues remain.


Medium — MODEL_DURATIONS and MODEL_SAMPLE_MAX have no TypeScript shape defined

Every other constant in constants.ts that has behavioral implications has an explicit TypeScript shape in the spec body: AVAILABLE_MODELS (ReadonlySet<string>), DEFAULT_MODEL_CHAIN (readonly string[]), AUDIO_DEFAULTS (Record<string, boolean>), DURATION_SUGGESTIONS (Record<string, number>), MODEL_SUGGESTIONS (Record<string, { quality: string; fast: string; lite?: string }>), REGIONS (ordered array with discriminated-union entries). Both MODEL_DURATIONS and MODEL_SAMPLE_MAX appear only by name in rule descriptions and the constants.ts directory layout line — their types are never defined.

MODEL_DURATIONS[model] (rule #1) must return something a rule can check membership in ({4,6,8}) — Set<number> or number[] or ReadonlySet<number>. MODEL_SAMPLE_MAX[model] (rule #7) returns a number ceiling. An implementer following the spec must guess both, and the wrong choice propagates into validation.ts and validation.test.ts.

Fix this →


Low — MODEL_SUGGESTIONS comment contains a factual error about Veo 3.1 Lite

The rationale comment (lines 430–431) for why social has no lite option reads:

"social does not because lite is image-to-video only (not text-to-video) — out of scope for pure social ad text generation."

But the model table at line 256 says Veo 3.1 Lite "Supports text-to-video (in scope here) and image-to-video (added by /veo-animate)." Lite is not image-only. A maintainer reading both sections in the same file will see a direct contradiction. The actual rationale for excluding social from lite suggestions is probably the first clause of the same comment: "social media tolerates latest features and benefits from highest quality for short attention spans" — the quality argument, not a capability argument. The second clause should be removed or corrected.

Fix this →


Low — videoExtensionInput warning has no home in the formal rule table

buildRequestBody pseudocode (lines 569–576) states: "validateConfig adds an explicit warning 'videoExtensionInput is not yet supported by Foundation; this field will be picked up when /veo-extend ships' so the field isn't silently ignored."

Foundation rules #1–#9 (§3) cover the nine Foundation-owned constraints. None of them covers "forward-declared field present → emit warning." Rules F1–F4 are sub-project-owned and also don't cover this. The warning is described as coming from validateConfig, but there's no rule that triggers it.

Either:

  • Option A: add an implicit rule #10 to the Foundation rules table: "any forward-declared field set on VeoConfig (videoExtensionInput, and any future additions) → warning 'this field has no implementation in Foundation and will be ignored'" — this is general and handles future fields too.
  • Option B: document that this warning is emitted by buildRequestBody / generateVideo (outside validateConfig), not by the validation layer — which would mean GenerationResult.warnings can come from two sources (validation + generation), and the spec should say so.

Option A keeps all warnings in the validation layer (consistent with existing design); Option B requires documenting a second warning source.


Confirming round-20 fixes are correct

  • durationSeconds default restored to 8s (CLI/library default) with DURATION_SUGGESTIONS as SKILL.md-facing hints — correct and consistent with integration checklist item 2 (bare CLI → 8s). ✓
  • downloadFile has both socket-idle timeout (30s) and max total duration (15 min) — belt-and-suspenders correctly documented. ✓
  • "Origin = scheme + host + port per RFC 6454" — cross-origin Authorization-stripping now unambiguous. ✓
  • MODEL_SUGGESTIONS rationale comments added for both stable-vs-preview and lite asymmetries — the quality rationale for social is correct; only the "lite is image-only" clause (above) is wrong. ✓ (except for the Low finding above)

Overall verdict: The spec is implementation-ready after 20 rounds. The Medium finding (missing MODEL_DURATIONS / MODEL_SAMPLE_MAX shapes) is the only item that will surface as a concrete implementer blocker within the first hour of coding — all other constants have explicit shapes. The two Low items are precision corrections worth capturing now. Architecture, validation design, bootstrap mechanism, module contracts, and migration plan are all sound.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive design specification for the Veo Foundation project, which establishes a shared library and infrastructure for Google Veo video generation. The proposal includes a refactored architecture with centralized authentication, API handling, and validation, alongside support for new parameters, models, and audio context-aware defaults. Review feedback focuses on hardening the proposed downloadFile implementation, specifically by recommending the prevention of HTTPS-to-HTTP downgrade redirects, capping error response body sizes to avoid memory issues, and utilizing temporary files for atomic downloads. Additionally, it was suggested to increase the maximum redirect depth to better support complex CDN chains.

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The proposed downloadFile implementation for https:// targets should explicitly disallow redirects from HTTPS to HTTP (downgrade redirects). Even if the Authorization header is stripped, the content itself (the video) is sensitive and should not be transmitted over an unencrypted channel. Most modern HTTP clients allow configuring this security policy.

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When downloadFile throws an error for non-2xx/3xx responses, it is proposed to include the response body in the error. To prevent potential Denial of Service (DoS) via Out-of-Memory (OOM) if a malicious or misconfigured server returns an extremely large error body, the amount of data read from the response stream for error reporting should be strictly capped (e.g., to the first 1KB).

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The downloadFile logic mentions cleaning up partial files on error using fs.unlink. However, if the process is killed or crashes, a corrupt partial file will remain at the outputPath. A more robust approach is to download to a temporary file (e.g., ${outputPath}.tmp) and only rename it to the final outputPath upon successful completion and validation of the download.

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 5**, validates non-2xx/3xx → throws with status + body, cleans up partial files on error; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The proposed max-depth limit of 5 for redirects is somewhat low. While sufficient for most cases, complex CDN chains or authentication flows (like those involving GCS signed URLs) can sometimes exceed this. Increasing the limit to 10 would align better with common defaults in libraries like axios or follow-redirects and reduce the risk of spurious failures.

rule #10 forward-declared warning

Resolved 7 findings: 4 gemini (1 security-high + 3 medium) + 3 claude
bot (1 medium + 2 low).

Security (High + Medium):
- downloadFile rejects HTTPS→HTTP redirects outright. Even with the
  Authorization header stripped, the video payload itself is
  sensitive and must not transit cleartext. Defense in depth.
- Error response body capped at 1 KB before being included in
  thrown Error. Prevents OOM DoS from a malicious or misconfigured
  server returning a huge error payload.

Reliability (Medium):
- Atomic write pattern: download streams to ${outputPath}.tmp and
  renames to outputPath only on full success. A crashed or killed
  process leaves a stranded .tmp instead of a corrupt final file
  that downstream code might mistake for valid output.
- Redirect max-depth raised from 5 to 10, aligning with
  follow-redirects and axios defaults. CDN chains and GCS
  signed-URL auth flows can legitimately exceed 5 hops.

Type completeness (Medium):
- MODEL_DURATIONS explicit shape: ReadonlyMap<string, ReadonlySet<
  number>>. Veo 3.x maps to {4,6,8}; Veo 2 to {5,6,7,8} (PROVISIONAL
  per Open Question #4).
- MODEL_SAMPLE_MAX explicit shape: Readonly<Record<string, number>>.
  Defensive initial values from current docs; refined per Open
  Question #3 during the probe pass.
- Both maps return undefined for unknown model keys; rules #1 and #7
  skip with soft warnings as documented.

Coherence (Low):
- MODEL_SUGGESTIONS rationale comment corrected. The previous
  comment said "lite is image-to-video only (not text-to-video)" to
  explain why social uses no lite suggestion — but the model table
  in §1 explicitly says Lite supports text-to-video. The actual
  rationale is the quality argument already in the comment;
  removed the factual error and reframed as "lite is omitted on
  quality grounds, not capability."

Architecture (Low):
- Rule #10 added: forward-declared VeoConfig fields without
  Foundation-level semantics (currently videoExtensionInput; future
  additions inherit automatically) trigger a warning when set, then
  are dropped from the request body so the API receives a clean
  call. Previously the spec asserted "validateConfig emits a
  warning" without grounding the warning in a rule — now it's
  rule #10 in the table. The buildRequestBody pseudocode comment
  is updated to reference the rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

Twenty-first-round review applied (commit b9d3b1f)

Resolved 7 findings (4 gemini + 3 claude bot).

Security hardening (2)

# Severity Finding Resolution
1 High downloadFile could follow HTTPS→HTTP redirects even with Authorization stripped — video payload in cleartext Rejected outright. Defense in depth: even without credentials, sensitive content must not transit cleartext
2 Medium Error body included verbatim in thrown Error → OOM DoS if server returns huge payload Capped at 1 KB before inclusion

Reliability (2)

# Severity Finding Resolution
3 Medium fs.unlink cleanup leaves corrupt partial file if process killed mid-download Atomic write: streams to ${outputPath}.tmp, renames to outputPath only on full success
4 Medium Max redirect depth 5 too low for CDN chains and GCS signed-URL flows Raised to 10 (aligned with follow-redirects and axios defaults)

Type completeness (1)

# Severity Finding Resolution
5 Medium MODEL_DURATIONS and MODEL_SAMPLE_MAX lacked TypeScript shapes — implementer trap Both fully typed: ReadonlyMap<string, ReadonlySet<number>> and Readonly<Record<string, number>> with provisional values + Open Question references

Coherence + architecture (2)

# Severity Finding Resolution
6 Low MODEL_SUGGESTIONS lite comment said "lite is image-to-video only" — contradicting model table which says lite supports text-to-video Comment corrected: "omitted on quality grounds, not capability"
7 Low videoExtensionInput warning attributed to validateConfig but no rule triggered it Rule #10 added: forward-declared VeoConfig fields → warning + dropped from request body. Future forward declarations inherit automatically

Diff summary

b9d3b1f: 41 insertions, 8 deletions. The security findings (HTTPS downgrade + body cap) closed real attack vectors. The atomic-write pattern is a clear robustness improvement. The type shapes close a concrete implementer trap. Rule #10 properly grounds the previously-floating warning in the validation table.

PR #2 had no new reviews this round.

@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

/gemini review

@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

@claude review

@claude

claude Bot commented May 24, 2026

Copy link
Copy Markdown

Claude finished @GiuseppeIuculano's task in 3m 15s —— View job


Tasks

  • Load tools
  • Read spec file (current state at commit b9d3b1f after 21 rounds)
  • Analyze for remaining issues
  • Post review findings

The spec is in excellent shape after 21 rounds of review. Architecture decisions, security hardening, type contracts, validation composition, and migration plan are all well-specified. Four narrow issues remain — all Low severity, all cheap to fix before implementation starts.


Low — Data flow diagram: redirect max depth is 5, module boundary says 10

Round 21 raised downloadFile's max redirect depth from 5 to 10 in the module boundary table (line 186: "max-depth limit of 10"). The data flow diagram at line 213 still reads:

HTTPS request with 30s socket-idle timeout; follow redirects (max 5)

The two values now disagree. An implementer reading only the data flow would implement 5.

Fix this →


Low — Orphan #### REGIONS constant section header with no content

There are two #### \REGIONS` constantheadings in the spec. The first (line 442) has no content — it's immediately followed by#### `MODEL_DURATIONS` and `MODEL_SAMPLE_MAX` shapes. The actual REGIONS content appears under the second occurrence (line 473). The first heading is a leftover placeholder from when the MODEL_DURATIONS/MODEL_SAMPLE_MAX` section was inserted between the header and its content.

This won't affect an implementer much, but it creates an empty ghost section visible in any rendered markdown outline.

Fix this →


Low — Rule table ordering anomaly: rule #10 appears before rules #8 and #9

The Foundation rules table in §3 lists rules in this order: 1, 2, 3, 4, 5, 6, 7, 10, 8, 9, F1…F5. Rule #10 (forward-declared field warning) was added late in the review process and landed in the wrong row position — before rules #8 (aspectRatio) and #9 (output destination). An implementer reading the table sequentially would naturally expect rules 8 and 9 to precede 10. The FOUNDATION_RULES array comment [/* rules #1–#10 */] at line 662 also implies sequential ordering.

Fix this →


Low — DURATION_SUGGESTIONS has no test file in the test plan

The test plan (§ "Unit tests") includes audio-default.test.ts for AUDIO_DEFAULTS and model-routing.test.ts for MODEL_SUGGESTIONS. DURATION_SUGGESTIONS (added in round 20, an analogous constant mapping use case → duration hint) has no corresponding test file. The constant follows the same pattern as the others and is equally easy to test.

Either add duration-suggestions.test.ts to the test plan, or fold its coverage into audio-default.test.ts with a note (since both test pure lookup constants).

Fix this →


Not a bug — confirming round-21 fixes

  • HTTPS→HTTP redirect rejection: correctly documented in line 186: "HTTPS→HTTP redirects are rejected outright regardless of header handling". ✓
  • Error body cap at 1 KB: line 186 specifies "status + body capped at 1 KB". ✓
  • Atomic write via .tmp: line 186 specifies "writes to ${outputPath}.tmp and atomically renames to outputPath only on full success". ✓
  • Max redirect depth raised to 10: line 186 correctly says "max-depth limit of 10". ✓ (only the data flow diagram inconsistency noted above)
  • MODEL_DURATIONS and MODEL_SAMPLE_MAX TypeScript shapes: fully typed at lines 449–468. ✓
  • Rule #10 for forward-declared fields: correctly added to the table (out of order — see above — but semantically complete). ✓
  • MODEL_SUGGESTIONS lite comment corrected: no longer claims Lite is "image-to-video only"; correctly says "omitted on quality grounds, not capability". ✓

Overall verdict: The spec is implementation-ready. All four Low findings are spec hygiene issues that are cheap to fix now; none are design or contract gaps. The data flow / module-boundary redirect-depth inconsistency is the one a careful implementer might act on incorrectly. After 21 rounds, no unresolved architectural, security, or contract issues remain.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a design specification for the Veo Foundation sub-project, which refactors the veo-tools plugin into a shared library architecture. The proposal includes support for new API parameters, expanded model options, and a context-aware audio system. Review feedback suggests applying atomic write/rename logic consistently across both HTTPS and GCS download methods and using unique temporary filenames to prevent potential race conditions during file operations.

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 10** — aligned with `follow-redirects` and `axios` defaults; CDN chains and GCS signed-URL flows can exceed 5; validates non-2xx/3xx → throws with status + body **capped at 1 KB** to prevent OOM if a misbehaving server returns a huge error payload; **writes to `${outputPath}.tmp` and atomically renames to `outputPath` only on full success**, so a crashed or killed process leaves a stranded `.tmp` instead of a corrupt final file; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token; **HTTPS→HTTP redirects are rejected outright** regardless of header handling — the video payload itself is sensitive and must not transit cleartext even with credentials stripped) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The atomic write/rename logic (${outputPath}.tmp -> outputPath) is described for the manual HTTPS download implementation but not explicitly for the GCS client download. For consistency and robustness, both download methods should use the same temporary file pattern to ensure that a crashed or killed process does not leave a corrupt or partial file at the final destination.

| Module | Public API | Internal |
|---|---|---|
| `auth.ts` | `getAccessToken(): Promise<string>` | **Uses `google-auth-library`** npm package (not shelled-out `gcloud` CLI). Supports Service Accounts, ADC, and Workload Identity natively. Removes dependency on gcloud CLI being installed in the execution context — important for CI/CD and containerized usage. The previous `gcloud auth print-access-token` call in `veo-generate.ts` is replaced. |
| `api.ts` | `submitGeneration(config, token): Promise<operationName>`, `pollOperation(opName, token): Promise<status>`, `downloadFile(target, path, token): Promise<void>` (where `target` is either an `https://…` URL or a `gs://…` URI) | URL building, request handling. `makeRequest` (internal) handles HTTPS API calls — does **not** follow redirects (the predict/poll endpoints are not expected to return 3xx) — and uses a **30-second total-request timeout** appropriate for short metadata calls. `downloadFile` accepts both `https://` (downloads via HTTPS, follows redirects manually with a **max-depth limit of 10** — aligned with `follow-redirects` and `axios` defaults; CDN chains and GCS signed-URL flows can exceed 5; validates non-2xx/3xx → throws with status + body **capped at 1 KB** to prevent OOM if a misbehaving server returns a huge error payload; **writes to `${outputPath}.tmp` and atomically renames to `outputPath` only on full success**, so a crashed or killed process leaves a stranded `.tmp` instead of a corrupt final file; uses a **socket idle timeout of 30s** plus a **max total duration of 15 minutes** as a belt-and-suspenders guard against connections that keep sending bytes at near-zero throughput — neither limit alone protects against the other; **on each redirect, if the target origin differs from the current request's origin, the `Authorization` header is stripped before following** — where *origin* is `scheme + host + port` per RFC 6454, so HTTPS→HTTP redirects on the same host are cross-origin even though hostname matches — this prevents Vertex bearer tokens leaking to e.g. signed GCS URLs on `storage.googleapis.com`, which already carry their own auth and would also reject a bearer token; **HTTPS→HTTP redirects are rejected outright** regardless of header handling — the video payload itself is sensitive and must not transit cleartext even with credentials stripped) and `gs://` (downloads via `@google-cloud/storage` client — that library handles its own retries and timeouts). The Vertex AI Veo API can return either form in `status.videoUrl` depending on project configuration. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a fixed .tmp suffix for atomic writes can lead to race conditions or collisions if multiple processes attempt to write to the same output path simultaneously. It is safer to use a unique temporary filename (e.g., by appending a UUID or a random string) within the same directory before the final atomic rename.

Resolved 6 review findings (2 gemini + 4 claude bot).

Atomic write robustness:
- gemini #1: gs:// branch now uses the same atomic-write pattern as
  HTTPS. The @google-cloud/storage downloads also go through temp
  file + rename, not direct to outputPath. Consistent crash safety.
- gemini #2: temp file naming now includes a random suffix:
  ${outputPath}.${randomSuffix}.tmp instead of fixed .tmp. Prevents
  collision if two processes target the same output path
  concurrently (low likelihood in practice but trivially correct).

Coherence cleanup:
- claude #3: data flow diagram updated to "max 10" matching the
  module boundary (was still showing the pre-round-21 "max 5").
  Diagram also now shows the cross-origin auth strip, HTTPS→HTTP
  rejection, and atomic write — full alignment with the module
  boundary description.
- claude #4: orphan "#### `REGIONS` constant" heading removed.
  Round 21 inserted MODEL_DURATIONS shapes between the heading
  and its content, leaving an empty section visible in the
  rendered outline.
- claude #5: rule #10 row moved to its correct position after
  rules #8 and #9 in the validation table. Numeric sequence
  is now #1-#10 followed by F1-F5.
- claude #6: DURATION_SUGGESTIONS test coverage added to
  audio-default.test.ts (co-located since both are pure lookup
  constants with identical test shape — avoids trivial file
  proliferation). Test plan now explicitly references both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GiuseppeIuculano

Copy link
Copy Markdown
Member Author

Twenty-second-round review applied (commit 6376f41)

Resolved 6 findings (2 gemini Medium + 4 claude bot Low).

Reliability (2)

# Source Finding Resolution
1 gemini Atomic write pattern documented only for HTTPS branch, not GCS Now consistent: @google-cloud/storage downloads also use .tmp + rename, not direct write
2 gemini Fixed .tmp suffix → race condition for concurrent processes Random suffix: ${outputPath}.${randomSuffix}.tmp

Coherence cleanup (4)

# Source Finding Resolution
3 claude Data flow diagram still said "max 5"; module boundary updated to 10 in round 21 Diagram aligned to 10, also added cross-origin auth strip, HTTPS→HTTP rejection, atomic write
4 claude Orphan #### REGIONS constant heading with no content Removed (was leftover from round 21 insertion of MODEL_DURATIONS section in between)
5 claude Rule #10 row out of order — appeared before #8/#9 in the table Moved to correct sequential position after #9
6 claude DURATION_SUGGESTIONS had no test file in plan Coverage added to audio-default.test.ts (co-located since both are pure lookup constants with identical test shape)

Diff summary

6376f41: 6 insertions, 7 deletions. All findings closed — none architectural, all cleanup of stragglers from prior rounds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants