Skip to content

feat(scripts): pnpm smoke — batch model/prompt API tester#108

Open
hqhq1025 wants to merge 6 commits intomainfrom
worktree-feat-smoke-models-cli
Open

feat(scripts): pnpm smoke — batch model/prompt API tester#108
hqhq1025 wants to merge 6 commits intomainfrom
worktree-feat-smoke-models-cli

Conversation

@hqhq1025
Copy link
Copy Markdown
Collaborator

Summary

  • New `scripts/smoke-models.ts` runs a (provider × model × prompt) matrix through the same `generate()` code path the desktop app uses — same prompt assembly, same reasoning self-healing, same parser.
  • Saves each artifact to `/tmp/smoke/___.html` so they can be opened in a browser side by side.
  • Quality checks per artifact: acorn JS syntax (catches the kind of typos we hit yesterday), multiple `` elements, emoji-as-icon usage, missing `TWEAK_DEFAULTS` EDITMODE block.
  • API keys come from environment variables (`OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, etc.) — never stored in repo.

CLI flags:
```
pnpm smoke
pnpm smoke --model openai/gpt-oss-120b:free
pnpm smoke --prompt "数据看板"
pnpm smoke --only-failed
```

The `scripts/smoke-models.toml` ships with 5 default models × 4 default prompts (dashboard, mobile onboarding, marketing landing, settings) — easy to extend.

Why

Manual loop of "type prompt in app → wait → look at result → switch model in Settings → repeat" was the bottleneck for debugging API issues (today's reasoning-mandatory bug, prompt quality gaps). `pnpm smoke` collapses that into one command across the whole model lineup.

Test plan

  • Typechecks (`pnpm exec tsc --noEmit ... scripts/smoke-models.ts`)
  • Lints clean (`pnpm exec biome check scripts/ package.json`)
  • Dry-run with no API keys exits cleanly with `0/0 passed` and per-model "skipped (no $KEY)" message
  • Bogus `--model` errors out with exit 2 and red message
  • End-to-end run with real keys (deferred — requires user's keys; `pnpm smoke` after merge will validate)

🤖 Generated with Claude Code

Comment thread scripts/smoke-models.ts
// distinguish module from script (Babel handles JSX in the artifact at
// runtime) — just check it's lexically clean enough that tools like esbuild
// wouldn't choke.
const scripts = [...html.matchAll(/<script\b[^>]*>([\s\S]*?)<\/script>/gi)];
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Unknown providers are silently treated as missing API keys, which can hide config typos and produce false-green smoke runs — this weakens the harness signal and violates the no-silent-fallback expectation for error handling, evidence scripts/smoke-models.ts:212
    Suggested fix:
    const envName = ENV_KEY[m.provider];
    if (!envName) {
      throw new Error(
        `Unsupported provider in smoke config: ${m.provider}. ` +
          `Add it to ENV_KEY or fix scripts/smoke-models.toml.`,
      );
    }
    
    const apiKey = process.env[envName];
    if (!apiKey) {
      console.log(
        `${COLOR.dim}${m.provider}/${m.modelId}  (no $${envName}; skipped)${COLOR.reset}`,
      );
      continue;
    }

Summary

  • Review mode: initial
  • 1 Major issue found in the newly added smoke harness.
  • Residual risk: no automated tests were added for CLI behavior (--only-failed, unknown provider handling, and exit codes), so regressions here may only be caught manually.

Testing

  • Not run (automation)

open-codesign Bot

Comment thread scripts/smoke-models.ts Outdated
const results: RunResult[] = [];
for (const m of models) {
const envName = ENV_KEY[m.provider];
const apiKey = envName ? process.env[envName] : undefined;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unknown providers currently fall through as envName undefined and get reported as "no $; skipped". That makes provider typos in smoke-models.toml look like normal skips and can produce false-green runs. Please fail fast for unsupported providers before checking env vars.

const envName = ENV_KEY[m.provider];
if (!envName) {
  throw new Error(
    `Unsupported provider in smoke config: ${m.provider}. ` +
      `Add it to ENV_KEY or fix scripts/smoke-models.toml.`,
  );
}

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Unsupported providers still fall through as a normal "missing key" skip, which can hide config typos and produce false-green smoke runs; this remains a silent fallback path, evidence scripts/smoke-models.ts:211
    Suggested fix:

    const envName = ENV_KEY[m.provider];
    if (!envName) {
      throw new Error(
        `Unsupported provider in smoke config: ${m.provider}. ` +
          `Add it to ENV_KEY or fix scripts/smoke-models.toml.`,
      );
    }
    const apiKey = process.env[envName];
  • [Minor] New CLI behavior (--model, --prompt, --only-failed, exit codes) has no Vitest coverage in this PR, so regressions in selection and status handling may ship unnoticed, evidence scripts/smoke-models.ts:241
    Suggested fix:

    // scripts/smoke-models.test.ts
    it('fails fast on unsupported provider', async () => {
      await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
        .rejects.toThrow(/Unsupported provider/);
    });
    
    it('throws when --model does not match', () => {
      expect(() => selectModels(models, 'does/not-exist')).toThrow();
    });

Summary

  • Review mode: follow-up after new commits
  • 2 issues found (1 Major, 1 Minor).
  • docs/VISION.md and docs/PRINCIPLES.md: Not found in repo/docs.

Testing

  • Not run (automation)

open-codesign Bot

Comment thread scripts/smoke-models.ts
): Promise<RunResult[]> {
const results: RunResult[] = [];
for (const m of models) {
const envName = ENV_KEY[m.provider];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsupported providers still degrade into a normal skipped run (no $<unknown env>). That masks provider typos in config and creates false-green results. Please fail fast before env lookup.

const envName = ENV_KEY[m.provider];
if (!envName) {
  throw new Error(
    `Unsupported provider in smoke config: ${m.provider}. ` +
      `Add it to ENV_KEY or fix scripts/smoke-models.toml.`,
  );
}
const apiKey = process.env[envName];

Comment thread scripts/smoke-models.ts
process.exit(passed === results.length ? 0 : 1);
}

async function main(): Promise<void> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a non-trivial CLI surface (--model, --prompt, --only-failed, exit codes), but there is no Vitest coverage in the diff for behavior and failure modes. Please add focused tests for unsupported-provider handling and selection/exit-code paths.

it('fails fast on unsupported provider', async () => {
  await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
    .rejects.toThrow(/Unsupported provider/);
});

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Unsanitized prompt.name is embedded in the output filename, allowing path traversal via TOML config (../) and writes outside /tmp/smoke; this can overwrite arbitrary files the current user can access, evidence scripts/smoke-models.ts:83, scripts/smoke-models.ts:152
    Suggested fix:

    function slug(model: SmokeModel, prompt: SmokePrompt): string {
      const safeModel = `${model.provider}_${model.modelId}`.replace(/[^a-zA-Z0-9._-]/g, '_');
      const safePrompt = prompt.name.replace(/[^a-zA-Z0-9._-]/g, '_');
      return `${safeModel}__${safePrompt}`;
    }
    
    const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
    if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) {
      throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`);
    }
    writeFileSync(artifactPath, html);
  • [Minor] New CLI behavior (--model, --prompt, --only-failed, exit semantics) still has no Vitest coverage in this PR, so regressions in argument selection and failure handling can slip through, evidence scripts/smoke-models.ts:176, scripts/smoke-models.ts:188, scripts/smoke-models.ts:193, scripts/smoke-models.ts:247
    Suggested fix:

    // scripts/smoke-models.test.ts
    it('throws on unsupported provider in config', async () => {
      await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
        .rejects.toThrow(/Unsupported provider/);
    });
    
    it('fails when --model has no matches', () => {
      expect(() => selectModels(models, 'does/not-exist')).toThrow();
    });

Summary

  • Review mode: follow-up after new commits
  • 2 issues found (1 Major, 1 Minor).
  • docs/VISION.md and docs/PRINCIPLES.md: Not found in repo/docs.

Testing

  • Not run (automation)

open-codesign Bot

Comment thread scripts/smoke-models.ts Outdated

function slug(model: SmokeModel, prompt: SmokePrompt): string {
const safeModel = `${model.provider}_${model.modelId}`.replace(/[/:]/g, '_');
return `${safeModel}__${prompt.name}`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prompt.name is used verbatim in the generated filename, so a config value like ../foo can escape /tmp/smoke and write outside the intended directory. Please sanitize prompt.name and enforce resolve(...).startsWith(${SMOKE_DIR}/) before writeFileSync.

Comment thread scripts/smoke-models.ts
process.exit(passed === results.length ? 0 : 1);
}

async function main(): Promise<void> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script introduces a meaningful CLI surface (--model, --prompt, --only-failed, exit behavior), but the PR still has no Vitest coverage for selection and failure paths. Please add focused tests for unsupported-provider, no-model-match, and only-failed behavior.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Path-boundary check is platform-fragile and can reject valid writes on Windows — resolve() returns platform-native separators, but startsWith("/tmp/smoke/") assumes POSIX; this can throw Refusing to write outside /tmp/smoke for legitimate paths, breaking pnpm smoke on Windows, evidence scripts/smoke-models.ts:154
    Suggested fix:

    import { isAbsolute, relative, resolve } from 'node:path';
    
    const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
    const rel = relative(SMOKE_DIR, artifactPath);
    if (rel.startsWith('..') || isAbsolute(rel)) {
      throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`);
    }
    writeFileSync(artifactPath, html);
  • [Minor] New CLI behaviors still have no automated tests — --model, --prompt, --only-failed, and unsupported-provider paths were added/changed without Vitest coverage, increasing regression risk in selection/exit semantics, evidence scripts/smoke-models.ts:251
    Suggested fix:

    // scripts/smoke-models.test.ts
    it('throws on unsupported provider in config', async () => {
      await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
        .rejects.toThrow(/Unsupported provider/);
    });
    
    it('fails when --model has no matches', () => {
      expect(() => selectModels(models, 'does/not-exist')).toThrow();
    });

Summary

  • Review mode: follow-up after new commits
  • 2 issues found (1 Major, 1 Minor).
  • docs/VISION.md and docs/PRINCIPLES.md: Not found in repo/docs.

Testing

  • Not run (automation)

open-codesign Bot

Comment thread scripts/smoke-models.ts Outdated
const ms = Date.now() - t0;
const html = result.artifacts[0]?.content ?? '';
const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startsWith(${SMOKE_DIR}/) is POSIX-specific after resolve(). On Windows this can reject valid in-dir paths due to \ separators. Please switch to relative() + isAbsolute() boundary checks to keep traversal protection cross-platform.

Comment thread scripts/smoke-models.ts
process.exit(passed === results.length ? 0 : 1);
}

async function main(): Promise<void> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new CLI surface (--model, --prompt, --only-failed, unsupported-provider handling) still lacks Vitest coverage. Please add focused tests for no-match model selection and unsupported provider failure to lock behavior.

Adds scripts/smoke-models.ts and scripts/smoke-models.toml for batch-testing
(provider × model × prompt) combinations through the exact same generate()
code path the desktop app uses. Saves each artifact to /tmp/smoke/ and runs
lightweight quality checks (acorn syntax, multiple <main>, emoji icons,
missing EDITMODE block).

API keys come from environment variables (OPENROUTER_API_KEY, etc.). CLI
flags: --model, --prompt, --only-failed, --config.

Cuts the manual click-and-wait debug loop down to 'pnpm smoke' for quick
checks across the model lineup.

Signed-off-by: hqhq1025 <1506751656@qq.com>
Adds 8 free OpenRouter models worth comparing for design-generation quality:
qwen3-coder, deepseek-r1, nemotron-3-super, llama-3.3-70b, gpt-oss-120b,
minimax-m2.5, devstral-2, mimo-v2-flash. Two of them (deepseek-r1, minimax)
exercise the reasoning self-healing path.

Documents the gitignored scripts/.smoke-keys.env pattern for stashing keys
without exporting them every shell session.

Signed-off-by: hqhq1025 <1506751656@qq.com>
Signed-off-by: hqhq1025 <1506751656@qq.com>
Signed-off-by: hqhq1025 <1506751656@qq.com>
@hqhq1025 hqhq1025 force-pushed the worktree-feat-smoke-models-cli branch from 128bae7 to 8221dbc Compare April 19, 2026 13:45
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Path-boundary guard is platform-fragile and can reject valid in-dir writes on Windows — resolve() returns platform-native separators, but startsWith("/tmp/smoke/") assumes POSIX separators; this can trigger a false Refusing to write outside /tmp/smoke and break pnpm smoke, evidence scripts/smoke-models.ts:154
    Suggested fix:

    import { isAbsolute, relative, resolve } from 'node:path';
    
    const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
    const rel = relative(SMOKE_DIR, artifactPath);
    if (rel.startsWith('..') || isAbsolute(rel)) {
      throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`);
    }
    writeFileSync(artifactPath, html);
  • [Minor] New smoke CLI behavior still lacks automated coverage for regression-prone paths — newly added --model, --prompt, --only-failed, and unsupported-provider failure behavior has no Vitest guard, evidence scripts/smoke-models.ts:180, scripts/smoke-models.ts:192, scripts/smoke-models.ts:197, scripts/smoke-models.ts:216
    Suggested fix:

    // scripts/smoke-models.test.ts
    it('throws on unsupported provider in config', async () => {
      await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
        .rejects.toThrow(/Unsupported provider/);
    });
    
    it('fails when --model has no matches', () => {
      expect(() => selectModels(models, 'does/not-exist')).toThrow();
    });

Summary

  • Review mode: follow-up after new commits
  • 2 issues found (1 Major, 1 Minor).
  • docs/VISION.md: Not found in repo/docs.
  • docs/PRINCIPLES.md: Not found in repo/docs.

Testing

  • Not run (automation)

open-codesign Bot

Comment thread scripts/smoke-models.ts Outdated
const ms = Date.now() - t0;
const html = result.artifacts[0]?.content ?? '';
const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startsWith(${SMOKE_DIR}/) after resolve() is POSIX-specific. On Windows, path separators are \\, so valid in-dir artifact paths can fail the boundary check and throw. Please use relative() + isAbsolute() for cross-platform traversal protection.

Codex flagged startsWith('/tmp/smoke/') as POSIX-only — Windows path
separators would trigger a false 'Refusing to write outside' rejection.
Switch to relative() + isAbsolute() checks which are platform-agnostic.

Signed-off-by: hqhq1025 <1506751656@qq.com>
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Empty-artifact generations are counted as success, creating false-green smoke results — runOne writes result.artifacts[0]?.content ?? '' and returns ok: true even when no artifact is produced, so failed provider/model outputs can be reported as passed, evidence scripts/smoke-models.ts:152, scripts/smoke-models.ts:161
    Suggested fix:

    const artifact = result.artifacts[0];
    if (!artifact?.content?.trim()) {
      throw new Error('No HTML artifact returned from generate()');
    }
    const html = artifact.content;
  • [Minor] New smoke CLI logic is still untested in Vitest for key regression paths — changed behavior like unsupported-provider fail-fast and empty-artifact handling can regress silently without tests, evidence scripts/smoke-models.ts:211, scripts/smoke-models.ts:220, scripts/smoke-models.ts:152
    Suggested fix:

    // scripts/smoke-models.test.ts
    it('throws on unsupported provider in config', async () => {
      await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
        .rejects.toThrow(/Unsupported provider/);
    });
    
    it('fails when generate returns no artifact', async () => {
      await expect(runOne(model, prompt, 'k')).resolves.toMatchObject({ ok: false });
    });

Summary

  • Review mode: follow-up after new commits
  • 2 issues found (1 Major, 1 Minor).

Testing

  • Not run (automation)

open-codesign Bot

Comment thread scripts/smoke-models.ts Outdated
},
});
const ms = Date.now() - t0;
const html = result.artifacts[0]?.content ?? '';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result.artifacts[0]?.content ?? allows empty/missing artifacts to pass as success. This can hide real model/provider failures in smoke runs. Please fail this run when no non-empty artifact content exists before writing the file.

Comment thread scripts/smoke-models.ts
for (const m of models) {
const envName = ENV_KEY[m.provider];
if (!envName) {
throw new Error(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavioral changes in this script (unsupported-provider fail-fast and success criteria) should have Vitest coverage to prevent regressions. Please add focused tests for these paths.

Codex flagged false-green smoke results when generate() returns no artifact:
runOne previously coerced missing content to '' and reported ok: true. Now
throw when artifact is missing or whitespace-only — error message surfaces
in the report and the run is correctly counted as failed.

Signed-off-by: hqhq1025 <1506751656@qq.com>
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Minor] Missing Vitest coverage for new fail-fast behavior in smoke harness — this PR introduces unsupported-provider failure logic without tests, so later refactors can silently regress back to false-green smoke runs, evidence scripts/smoke-models.ts:223.
    Suggested fix:
    // scripts/smoke-models.test.ts
    import { describe, expect, it } from 'vitest';
    import { runMatrix } from './smoke-models';
    
    it('throws when config contains an unsupported provider', async () => {
      await expect(
        runMatrix([{ provider: 'typo', modelId: 'm' }], [{ name: 'p', text: 't' }], null),
      ).rejects.toThrow(/Unsupported provider/);
    });

Summary

  • Review mode: follow-up after new commits
  • 1 issue found (Minor).
  • docs/VISION.md and docs/PRINCIPLES.md: Not found in repo/docs in this runner (only CLAUDE.md and docs/research/* available).

Testing

  • Not run (automation)

open-codesign Bot

Comment thread scripts/smoke-models.ts
const results: RunResult[] = [];
for (const m of models) {
const envName = ENV_KEY[m.provider];
if (!envName) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a Vitest regression test for this fail-fast branch. This behavior is new in the PR and currently untested, which makes it easy to regress back to silent skips in later edits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants