feat(scripts): pnpm smoke — batch model/prompt API tester by hqhq1025 · Pull Request #108 · OpenCoworkAI/open-codesign

hqhq1025 · 2026-04-19T10:35:04Z

Summary

New `scripts/smoke-models.ts` runs a (provider × model × prompt) matrix through the same `generate()` code path the desktop app uses — same prompt assembly, same reasoning self-healing, same parser.
Saves each artifact to `/tmp/smoke/___.html` so they can be opened in a browser side by side.
Quality checks per artifact: acorn JS syntax (catches the kind of typos we hit yesterday), multiple `` elements, emoji-as-icon usage, missing `TWEAK_DEFAULTS` EDITMODE block.
API keys come from environment variables (`OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, etc.) — never stored in repo.

CLI flags:
```
pnpm smoke
pnpm smoke --model openai/gpt-oss-120b:free
pnpm smoke --prompt "数据看板"
pnpm smoke --only-failed
```

The `scripts/smoke-models.toml` ships with 5 default models × 4 default prompts (dashboard, mobile onboarding, marketing landing, settings) — easy to extend.

Why

Manual loop of "type prompt in app → wait → look at result → switch model in Settings → repeat" was the bottleneck for debugging API issues (today's reasoning-mandatory bug, prompt quality gaps). `pnpm smoke` collapses that into one command across the whole model lineup.

Test plan

Typechecks (`pnpm exec tsc --noEmit ... scripts/smoke-models.ts`)
Lints clean (`pnpm exec biome check scripts/ package.json`)
Dry-run with no API keys exits cleanly with `0/0 passed` and per-model "skipped (no $KEY)" message
Bogus `--model` errors out with exit 2 and red message
End-to-end run with real keys (deferred — requires user's keys; `pnpm smoke` after merge will validate)

🤖 Generated with Claude Code

+  // distinguish module from script (Babel handles JSX in the artifact at
+  // runtime) — just check it's lexically clean enough that tools like esbuild
+  // wouldn't choke.
+  const scripts = [...html.matchAll(/<script\b[^>]*>([\s\S]*?)<\/script>/gi)];


github-actions

Findings

[Major] Unknown providers are silently treated as missing API keys, which can hide config typos and produce false-green smoke runs — this weakens the harness signal and violates the no-silent-fallback expectation for error handling, evidence scripts/smoke-models.ts:212
Suggested fix:

const envName = ENV_KEY[m.provider];
if (!envName) {
  throw new Error(
    `Unsupported provider in smoke config: ${m.provider}. ` +
      `Add it to ENV_KEY or fix scripts/smoke-models.toml.`,
  );
}

const apiKey = process.env[envName];
if (!apiKey) {
  console.log(
    `${COLOR.dim}— ${m.provider}/${m.modelId}  (no $${envName}; skipped)${COLOR.reset}`,
  );
  continue;
}

Summary

Review mode: initial
1 Major issue found in the newly added smoke harness.
Residual risk: no automated tests were added for CLI behavior (--only-failed, unknown provider handling, and exit codes), so regressions here may only be caught manually.

Testing

Not run (automation)

open-codesign Bot

github-actions · 2026-04-19T10:37:28Z

+  const results: RunResult[] = [];
+  for (const m of models) {
+    const envName = ENV_KEY[m.provider];
+    const apiKey = envName ? process.env[envName] : undefined;


Unknown providers currently fall through as envName undefined and get reported as "no $; skipped". That makes provider typos in smoke-models.toml look like normal skips and can produce false-green runs. Please fail fast for unsupported providers before checking env vars.

const envName = ENV_KEY[m.provider]; if (!envName) { throw new Error( `Unsupported provider in smoke config: ${m.provider}. ` + `Add it to ENV_KEY or fix scripts/smoke-models.toml.`, ); }

github-actions

Findings

[Major] Unsupported providers still fall through as a normal "missing key" skip, which can hide config typos and produce false-green smoke runs; this remains a silent fallback path, evidence scripts/smoke-models.ts:211
Suggested fix:

const envName = ENV_KEY[m.provider];
if (!envName) {
  throw new Error(
    `Unsupported provider in smoke config: ${m.provider}. ` +
      `Add it to ENV_KEY or fix scripts/smoke-models.toml.`,
  );
}
const apiKey = process.env[envName];

[Minor] New CLI behavior (--model, --prompt, --only-failed, exit codes) has no Vitest coverage in this PR, so regressions in selection and status handling may ship unnoticed, evidence scripts/smoke-models.ts:241
Suggested fix:

// scripts/smoke-models.test.ts
it('fails fast on unsupported provider', async () => {
  await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
    .rejects.toThrow(/Unsupported provider/);
});

it('throws when --model does not match', () => {
  expect(() => selectModels(models, 'does/not-exist')).toThrow();
});

Summary

Review mode: follow-up after new commits
2 issues found (1 Major, 1 Minor).
docs/VISION.md and docs/PRINCIPLES.md: Not found in repo/docs.

Testing

Not run (automation)

open-codesign Bot

github-actions · 2026-04-19T10:47:56Z

+): Promise<RunResult[]> {
+  const results: RunResult[] = [];
+  for (const m of models) {
+    const envName = ENV_KEY[m.provider];


Unsupported providers still degrade into a normal skipped run (no $<unknown env>). That masks provider typos in config and creates false-green results. Please fail fast before env lookup.

const envName = ENV_KEY[m.provider]; if (!envName) { throw new Error( `Unsupported provider in smoke config: ${m.provider}. ` + `Add it to ENV_KEY or fix scripts/smoke-models.toml.`, ); } const apiKey = process.env[envName];

github-actions · 2026-04-19T10:47:56Z

+  process.exit(passed === results.length ? 0 : 1);
+}
+
+async function main(): Promise<void> {


This PR adds a non-trivial CLI surface (--model, --prompt, --only-failed, exit codes), but there is no Vitest coverage in the diff for behavior and failure modes. Please add focused tests for unsupported-provider handling and selection/exit-code paths.

it('fails fast on unsupported provider', async () => { await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null)) .rejects.toThrow(/Unsupported provider/); });

github-actions

Findings

[Major] Unsanitized prompt.name is embedded in the output filename, allowing path traversal via TOML config (../) and writes outside /tmp/smoke; this can overwrite arbitrary files the current user can access, evidence scripts/smoke-models.ts:83, scripts/smoke-models.ts:152
Suggested fix:

function slug(model: SmokeModel, prompt: SmokePrompt): string {
  const safeModel = `${model.provider}_${model.modelId}`.replace(/[^a-zA-Z0-9._-]/g, '_');
  const safePrompt = prompt.name.replace(/[^a-zA-Z0-9._-]/g, '_');
  return `${safeModel}__${safePrompt}`;
}

const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) {
  throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`);
}
writeFileSync(artifactPath, html);

[Minor] New CLI behavior (--model, --prompt, --only-failed, exit semantics) still has no Vitest coverage in this PR, so regressions in argument selection and failure handling can slip through, evidence scripts/smoke-models.ts:176, scripts/smoke-models.ts:188, scripts/smoke-models.ts:193, scripts/smoke-models.ts:247
Suggested fix:

// scripts/smoke-models.test.ts
it('throws on unsupported provider in config', async () => {
  await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
    .rejects.toThrow(/Unsupported provider/);
});

it('fails when --model has no matches', () => {
  expect(() => selectModels(models, 'does/not-exist')).toThrow();
});

Summary

Review mode: follow-up after new commits
2 issues found (1 Major, 1 Minor).
docs/VISION.md and docs/PRINCIPLES.md: Not found in repo/docs.

Testing

Not run (automation)

open-codesign Bot

github-actions · 2026-04-19T11:41:34Z

+
+function slug(model: SmokeModel, prompt: SmokePrompt): string {
+  const safeModel = `${model.provider}_${model.modelId}`.replace(/[/:]/g, '_');
+  return `${safeModel}__${prompt.name}`;


prompt.name is used verbatim in the generated filename, so a config value like ../foo can escape /tmp/smoke and write outside the intended directory. Please sanitize prompt.name and enforce resolve(...).startsWith(${SMOKE_DIR}/) before writeFileSync.

github-actions · 2026-04-19T11:41:34Z

+  process.exit(passed === results.length ? 0 : 1);
+}
+
+async function main(): Promise<void> {


This script introduces a meaningful CLI surface (--model, --prompt, --only-failed, exit behavior), but the PR still has no Vitest coverage for selection and failure paths. Please add focused tests for unsupported-provider, no-model-match, and only-failed behavior.

github-actions

Findings

[Major] Path-boundary check is platform-fragile and can reject valid writes on Windows — resolve() returns platform-native separators, but startsWith("/tmp/smoke/") assumes POSIX; this can throw Refusing to write outside /tmp/smoke for legitimate paths, breaking pnpm smoke on Windows, evidence scripts/smoke-models.ts:154
Suggested fix:

import { isAbsolute, relative, resolve } from 'node:path';

const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
const rel = relative(SMOKE_DIR, artifactPath);
if (rel.startsWith('..') || isAbsolute(rel)) {
  throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`);
}
writeFileSync(artifactPath, html);

[Minor] New CLI behaviors still have no automated tests — --model, --prompt, --only-failed, and unsupported-provider paths were added/changed without Vitest coverage, increasing regression risk in selection/exit semantics, evidence scripts/smoke-models.ts:251
Suggested fix:

// scripts/smoke-models.test.ts
it('throws on unsupported provider in config', async () => {
  await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
    .rejects.toThrow(/Unsupported provider/);
});

it('fails when --model has no matches', () => {
  expect(() => selectModels(models, 'does/not-exist')).toThrow();
});

Summary

Review mode: follow-up after new commits
2 issues found (1 Major, 1 Minor).
docs/VISION.md and docs/PRINCIPLES.md: Not found in repo/docs.

Testing

Not run (automation)

open-codesign Bot

github-actions · 2026-04-19T13:44:35Z

+    const ms = Date.now() - t0;
+    const html = result.artifacts[0]?.content ?? '';
+    const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
+    if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) {


startsWith(${SMOKE_DIR}/) is POSIX-specific after resolve(). On Windows this can reject valid in-dir paths due to \ separators. Please switch to relative() + isAbsolute() boundary checks to keep traversal protection cross-platform.

github-actions · 2026-04-19T13:44:35Z

+  process.exit(passed === results.length ? 0 : 1);
+}
+
+async function main(): Promise<void> {


The new CLI surface (--model, --prompt, --only-failed, unsupported-provider handling) still lacks Vitest coverage. Please add focused tests for no-match model selection and unsupported provider failure to lock behavior.

Adds scripts/smoke-models.ts and scripts/smoke-models.toml for batch-testing (provider × model × prompt) combinations through the exact same generate() code path the desktop app uses. Saves each artifact to /tmp/smoke/ and runs lightweight quality checks (acorn syntax, multiple <main>, emoji icons, missing EDITMODE block). API keys come from environment variables (OPENROUTER_API_KEY, etc.). CLI flags: --model, --prompt, --only-failed, --config. Cuts the manual click-and-wait debug loop down to 'pnpm smoke' for quick checks across the model lineup. Signed-off-by: hqhq1025 <1506751656@qq.com>

Adds 8 free OpenRouter models worth comparing for design-generation quality: qwen3-coder, deepseek-r1, nemotron-3-super, llama-3.3-70b, gpt-oss-120b, minimax-m2.5, devstral-2, mimo-v2-flash. Two of them (deepseek-r1, minimax) exercise the reasoning self-healing path. Documents the gitignored scripts/.smoke-keys.env pattern for stashing keys without exporting them every shell session. Signed-off-by: hqhq1025 <1506751656@qq.com>

Signed-off-by: hqhq1025 <1506751656@qq.com>

github-actions

Findings

[Major] Path-boundary guard is platform-fragile and can reject valid in-dir writes on Windows — resolve() returns platform-native separators, but startsWith("/tmp/smoke/") assumes POSIX separators; this can trigger a false Refusing to write outside /tmp/smoke and break pnpm smoke, evidence scripts/smoke-models.ts:154
Suggested fix:

import { isAbsolute, relative, resolve } from 'node:path';

const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
const rel = relative(SMOKE_DIR, artifactPath);
if (rel.startsWith('..') || isAbsolute(rel)) {
  throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`);
}
writeFileSync(artifactPath, html);

[Minor] New smoke CLI behavior still lacks automated coverage for regression-prone paths — newly added --model, --prompt, --only-failed, and unsupported-provider failure behavior has no Vitest guard, evidence scripts/smoke-models.ts:180, scripts/smoke-models.ts:192, scripts/smoke-models.ts:197, scripts/smoke-models.ts:216
Suggested fix:

// scripts/smoke-models.test.ts
it('throws on unsupported provider in config', async () => {
  await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
    .rejects.toThrow(/Unsupported provider/);
});

it('fails when --model has no matches', () => {
  expect(() => selectModels(models, 'does/not-exist')).toThrow();
});

Summary

Review mode: follow-up after new commits
2 issues found (1 Major, 1 Minor).
docs/VISION.md: Not found in repo/docs.
docs/PRINCIPLES.md: Not found in repo/docs.

Testing

Not run (automation)

open-codesign Bot

github-actions · 2026-04-19T13:48:39Z

+    const ms = Date.now() - t0;
+    const html = result.artifacts[0]?.content ?? '';
+    const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`);
+    if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) {


startsWith(${SMOKE_DIR}/) after resolve() is POSIX-specific. On Windows, path separators are \\, so valid in-dir artifact paths can fail the boundary check and throw. Please use relative() + isAbsolute() for cross-platform traversal protection.

Codex flagged startsWith('/tmp/smoke/') as POSIX-only — Windows path separators would trigger a false 'Refusing to write outside' rejection. Switch to relative() + isAbsolute() checks which are platform-agnostic. Signed-off-by: hqhq1025 <1506751656@qq.com>

github-actions

Findings

[Major] Empty-artifact generations are counted as success, creating false-green smoke results — runOne writes result.artifacts[0]?.content ?? '' and returns ok: true even when no artifact is produced, so failed provider/model outputs can be reported as passed, evidence scripts/smoke-models.ts:152, scripts/smoke-models.ts:161
Suggested fix:
```
const artifact = result.artifacts[0];
if (!artifact?.content?.trim()) {
  throw new Error('No HTML artifact returned from generate()');
}
const html = artifact.content;
```

[Minor] New smoke CLI logic is still untested in Vitest for key regression paths — changed behavior like unsupported-provider fail-fast and empty-artifact handling can regress silently without tests, evidence scripts/smoke-models.ts:211, scripts/smoke-models.ts:220, scripts/smoke-models.ts:152
Suggested fix:

// scripts/smoke-models.test.ts
it('throws on unsupported provider in config', async () => {
  await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
    .rejects.toThrow(/Unsupported provider/);
});

it('fails when generate returns no artifact', async () => {
  await expect(runOne(model, prompt, 'k')).resolves.toMatchObject({ ok: false });
});

Summary

Review mode: follow-up after new commits
2 issues found (1 Major, 1 Minor).

Testing

Not run (automation)

open-codesign Bot

github-actions · 2026-04-19T16:56:34Z

+      },
+    });
+    const ms = Date.now() - t0;
+    const html = result.artifacts[0]?.content ?? '';


result.artifacts[0]?.content ?? allows empty/missing artifacts to pass as success. This can hide real model/provider failures in smoke runs. Please fail this run when no non-empty artifact content exists before writing the file.

github-actions · 2026-04-19T16:56:34Z

+  for (const m of models) {
+    const envName = ENV_KEY[m.provider];
+    if (!envName) {
+      throw new Error(


Behavioral changes in this script (unsupported-provider fail-fast and success criteria) should have Vitest coverage to prevent regressions. Please add focused tests for these paths.

Codex flagged false-green smoke results when generate() returns no artifact: runOne previously coerced missing content to '' and reported ok: true. Now throw when artifact is missing or whitespace-only — error message surfaces in the report and the run is correctly counted as failed. Signed-off-by: hqhq1025 <1506751656@qq.com>

github-actions

Findings

[Minor] Missing Vitest coverage for new fail-fast behavior in smoke harness — this PR introduces unsupported-provider failure logic without tests, so later refactors can silently regress back to false-green smoke runs, evidence scripts/smoke-models.ts:223.
Suggested fix:

// scripts/smoke-models.test.ts
import { describe, expect, it } from 'vitest';
import { runMatrix } from './smoke-models';

it('throws when config contains an unsupported provider', async () => {
  await expect(
    runMatrix([{ provider: 'typo', modelId: 'm' }], [{ name: 'p', text: 't' }], null),
  ).rejects.toThrow(/Unsupported provider/);
});

Summary

Review mode: follow-up after new commits
1 issue found (Minor).
docs/VISION.md and docs/PRINCIPLES.md: Not found in repo/docs in this runner (only CLAUDE.md and docs/research/* available).

Testing

Not run (automation)

open-codesign Bot

github-actions · 2026-04-19T17:08:24Z

+  const results: RunResult[] = [];
+  for (const m of models) {
+    const envName = ENV_KEY[m.provider];
+    if (!envName) {


Please add a Vitest regression test for this fail-fast branch. This behavior is new in the PR and currently untested, which makes it easy to regress back to silent skips in later edits.

github-advanced-security AI found potential problems Apr 19, 2026

View reviewed changes

Comment thread scripts/smoke-models.ts

// distinguish module from script (Babel handles JSX in the artifact at

// runtime) — just check it's lexically clean enough that tools like esbuild

// wouldn't choke.

const scripts = [...html.matchAll(/<script\b[^>]*>([\s\S]*?)<\/script>/gi)];

github-actions bot reviewed Apr 19, 2026

View reviewed changes

hqhq1025 added 4 commits April 19, 2026 21:45

fix(scripts): fail fast on unknown provider in smoke config

d5b6f88

Signed-off-by: hqhq1025 <1506751656@qq.com>

fix(scripts): sanitize TOML prompt names against path traversal

8221dbc

Signed-off-by: hqhq1025 <1506751656@qq.com>

hqhq1025 force-pushed the worktree-feat-smoke-models-cli branch from 128bae7 to 8221dbc Compare April 19, 2026 13:45

github-actions bot reviewed Apr 19, 2026

View reviewed changes

Conversation

hqhq1025 commented Apr 19, 2026

Summary

Why

Test plan

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants