feat(scripts): pnpm smoke — batch model/prompt API tester#108
feat(scripts): pnpm smoke — batch model/prompt API tester#108
Conversation
| // distinguish module from script (Babel handles JSX in the artifact at | ||
| // runtime) — just check it's lexically clean enough that tools like esbuild | ||
| // wouldn't choke. | ||
| const scripts = [...html.matchAll(/<script\b[^>]*>([\s\S]*?)<\/script>/gi)]; |
There was a problem hiding this comment.
Findings
- [Major] Unknown providers are silently treated as missing API keys, which can hide config typos and produce false-green smoke runs — this weakens the harness signal and violates the no-silent-fallback expectation for error handling, evidence
scripts/smoke-models.ts:212
Suggested fix:const envName = ENV_KEY[m.provider]; if (!envName) { throw new Error( `Unsupported provider in smoke config: ${m.provider}. ` + `Add it to ENV_KEY or fix scripts/smoke-models.toml.`, ); } const apiKey = process.env[envName]; if (!apiKey) { console.log( `${COLOR.dim}— ${m.provider}/${m.modelId} (no $${envName}; skipped)${COLOR.reset}`, ); continue; }
Summary
- Review mode: initial
- 1 Major issue found in the newly added smoke harness.
- Residual risk: no automated tests were added for CLI behavior (
--only-failed, unknown provider handling, and exit codes), so regressions here may only be caught manually.
Testing
- Not run (automation)
open-codesign Bot
| const results: RunResult[] = []; | ||
| for (const m of models) { | ||
| const envName = ENV_KEY[m.provider]; | ||
| const apiKey = envName ? process.env[envName] : undefined; |
There was a problem hiding this comment.
Unknown providers currently fall through as envName undefined and get reported as "no $; skipped". That makes provider typos in smoke-models.toml look like normal skips and can produce false-green runs. Please fail fast for unsupported providers before checking env vars.
const envName = ENV_KEY[m.provider];
if (!envName) {
throw new Error(
`Unsupported provider in smoke config: ${m.provider}. ` +
`Add it to ENV_KEY or fix scripts/smoke-models.toml.`,
);
}There was a problem hiding this comment.
Findings
-
[Major] Unsupported providers still fall through as a normal "missing key" skip, which can hide config typos and produce false-green smoke runs; this remains a silent fallback path, evidence
scripts/smoke-models.ts:211
Suggested fix:const envName = ENV_KEY[m.provider]; if (!envName) { throw new Error( `Unsupported provider in smoke config: ${m.provider}. ` + `Add it to ENV_KEY or fix scripts/smoke-models.toml.`, ); } const apiKey = process.env[envName];
-
[Minor] New CLI behavior (
--model,--prompt,--only-failed, exit codes) has no Vitest coverage in this PR, so regressions in selection and status handling may ship unnoticed, evidencescripts/smoke-models.ts:241
Suggested fix:// scripts/smoke-models.test.ts it('fails fast on unsupported provider', async () => { await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null)) .rejects.toThrow(/Unsupported provider/); }); it('throws when --model does not match', () => { expect(() => selectModels(models, 'does/not-exist')).toThrow(); });
Summary
- Review mode: follow-up after new commits
- 2 issues found (1 Major, 1 Minor).
docs/VISION.mdanddocs/PRINCIPLES.md: Not found in repo/docs.
Testing
- Not run (automation)
open-codesign Bot
| ): Promise<RunResult[]> { | ||
| const results: RunResult[] = []; | ||
| for (const m of models) { | ||
| const envName = ENV_KEY[m.provider]; |
There was a problem hiding this comment.
Unsupported providers still degrade into a normal skipped run (no $<unknown env>). That masks provider typos in config and creates false-green results. Please fail fast before env lookup.
const envName = ENV_KEY[m.provider];
if (!envName) {
throw new Error(
`Unsupported provider in smoke config: ${m.provider}. ` +
`Add it to ENV_KEY or fix scripts/smoke-models.toml.`,
);
}
const apiKey = process.env[envName];| process.exit(passed === results.length ? 0 : 1); | ||
| } | ||
|
|
||
| async function main(): Promise<void> { |
There was a problem hiding this comment.
This PR adds a non-trivial CLI surface (--model, --prompt, --only-failed, exit codes), but there is no Vitest coverage in the diff for behavior and failure modes. Please add focused tests for unsupported-provider handling and selection/exit-code paths.
it('fails fast on unsupported provider', async () => {
await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null))
.rejects.toThrow(/Unsupported provider/);
});There was a problem hiding this comment.
Findings
-
[Major] Unsanitized
prompt.nameis embedded in the output filename, allowing path traversal via TOML config (../) and writes outside/tmp/smoke; this can overwrite arbitrary files the current user can access, evidencescripts/smoke-models.ts:83,scripts/smoke-models.ts:152
Suggested fix:function slug(model: SmokeModel, prompt: SmokePrompt): string { const safeModel = `${model.provider}_${model.modelId}`.replace(/[^a-zA-Z0-9._-]/g, '_'); const safePrompt = prompt.name.replace(/[^a-zA-Z0-9._-]/g, '_'); return `${safeModel}__${safePrompt}`; } const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`); if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) { throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`); } writeFileSync(artifactPath, html);
-
[Minor] New CLI behavior (
--model,--prompt,--only-failed, exit semantics) still has no Vitest coverage in this PR, so regressions in argument selection and failure handling can slip through, evidencescripts/smoke-models.ts:176,scripts/smoke-models.ts:188,scripts/smoke-models.ts:193,scripts/smoke-models.ts:247
Suggested fix:// scripts/smoke-models.test.ts it('throws on unsupported provider in config', async () => { await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null)) .rejects.toThrow(/Unsupported provider/); }); it('fails when --model has no matches', () => { expect(() => selectModels(models, 'does/not-exist')).toThrow(); });
Summary
- Review mode: follow-up after new commits
- 2 issues found (1 Major, 1 Minor).
docs/VISION.mdanddocs/PRINCIPLES.md: Not found in repo/docs.
Testing
- Not run (automation)
open-codesign Bot
|
|
||
| function slug(model: SmokeModel, prompt: SmokePrompt): string { | ||
| const safeModel = `${model.provider}_${model.modelId}`.replace(/[/:]/g, '_'); | ||
| return `${safeModel}__${prompt.name}`; |
There was a problem hiding this comment.
prompt.name is used verbatim in the generated filename, so a config value like ../foo can escape /tmp/smoke and write outside the intended directory. Please sanitize prompt.name and enforce resolve(...).startsWith(${SMOKE_DIR}/) before writeFileSync.
| process.exit(passed === results.length ? 0 : 1); | ||
| } | ||
|
|
||
| async function main(): Promise<void> { |
There was a problem hiding this comment.
This script introduces a meaningful CLI surface (--model, --prompt, --only-failed, exit behavior), but the PR still has no Vitest coverage for selection and failure paths. Please add focused tests for unsupported-provider, no-model-match, and only-failed behavior.
There was a problem hiding this comment.
Findings
-
[Major] Path-boundary check is platform-fragile and can reject valid writes on Windows —
resolve()returns platform-native separators, butstartsWith("/tmp/smoke/")assumes POSIX; this can throwRefusing to write outside /tmp/smokefor legitimate paths, breakingpnpm smokeon Windows, evidencescripts/smoke-models.ts:154
Suggested fix:import { isAbsolute, relative, resolve } from 'node:path'; const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`); const rel = relative(SMOKE_DIR, artifactPath); if (rel.startsWith('..') || isAbsolute(rel)) { throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`); } writeFileSync(artifactPath, html);
-
[Minor] New CLI behaviors still have no automated tests —
--model,--prompt,--only-failed, and unsupported-provider paths were added/changed without Vitest coverage, increasing regression risk in selection/exit semantics, evidencescripts/smoke-models.ts:251
Suggested fix:// scripts/smoke-models.test.ts it('throws on unsupported provider in config', async () => { await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null)) .rejects.toThrow(/Unsupported provider/); }); it('fails when --model has no matches', () => { expect(() => selectModels(models, 'does/not-exist')).toThrow(); });
Summary
- Review mode: follow-up after new commits
- 2 issues found (1 Major, 1 Minor).
docs/VISION.mdanddocs/PRINCIPLES.md: Not found in repo/docs.
Testing
- Not run (automation)
open-codesign Bot
| const ms = Date.now() - t0; | ||
| const html = result.artifacts[0]?.content ?? ''; | ||
| const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`); | ||
| if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) { |
There was a problem hiding this comment.
startsWith(${SMOKE_DIR}/) is POSIX-specific after resolve(). On Windows this can reject valid in-dir paths due to \ separators. Please switch to relative() + isAbsolute() boundary checks to keep traversal protection cross-platform.
| process.exit(passed === results.length ? 0 : 1); | ||
| } | ||
|
|
||
| async function main(): Promise<void> { |
There was a problem hiding this comment.
The new CLI surface (--model, --prompt, --only-failed, unsupported-provider handling) still lacks Vitest coverage. Please add focused tests for no-match model selection and unsupported provider failure to lock behavior.
Adds scripts/smoke-models.ts and scripts/smoke-models.toml for batch-testing (provider × model × prompt) combinations through the exact same generate() code path the desktop app uses. Saves each artifact to /tmp/smoke/ and runs lightweight quality checks (acorn syntax, multiple <main>, emoji icons, missing EDITMODE block). API keys come from environment variables (OPENROUTER_API_KEY, etc.). CLI flags: --model, --prompt, --only-failed, --config. Cuts the manual click-and-wait debug loop down to 'pnpm smoke' for quick checks across the model lineup. Signed-off-by: hqhq1025 <1506751656@qq.com>
Adds 8 free OpenRouter models worth comparing for design-generation quality: qwen3-coder, deepseek-r1, nemotron-3-super, llama-3.3-70b, gpt-oss-120b, minimax-m2.5, devstral-2, mimo-v2-flash. Two of them (deepseek-r1, minimax) exercise the reasoning self-healing path. Documents the gitignored scripts/.smoke-keys.env pattern for stashing keys without exporting them every shell session. Signed-off-by: hqhq1025 <1506751656@qq.com>
Signed-off-by: hqhq1025 <1506751656@qq.com>
Signed-off-by: hqhq1025 <1506751656@qq.com>
128bae7 to
8221dbc
Compare
There was a problem hiding this comment.
Findings
-
[Major] Path-boundary guard is platform-fragile and can reject valid in-dir writes on Windows —
resolve()returns platform-native separators, butstartsWith("/tmp/smoke/")assumes POSIX separators; this can trigger a falseRefusing to write outside /tmp/smokeand breakpnpm smoke, evidencescripts/smoke-models.ts:154
Suggested fix:import { isAbsolute, relative, resolve } from 'node:path'; const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`); const rel = relative(SMOKE_DIR, artifactPath); if (rel.startsWith('..') || isAbsolute(rel)) { throw new Error(`Refusing to write outside ${SMOKE_DIR}: ${artifactPath}`); } writeFileSync(artifactPath, html);
-
[Minor] New smoke CLI behavior still lacks automated coverage for regression-prone paths — newly added
--model,--prompt,--only-failed, and unsupported-provider failure behavior has no Vitest guard, evidencescripts/smoke-models.ts:180,scripts/smoke-models.ts:192,scripts/smoke-models.ts:197,scripts/smoke-models.ts:216
Suggested fix:// scripts/smoke-models.test.ts it('throws on unsupported provider in config', async () => { await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null)) .rejects.toThrow(/Unsupported provider/); }); it('fails when --model has no matches', () => { expect(() => selectModels(models, 'does/not-exist')).toThrow(); });
Summary
- Review mode: follow-up after new commits
- 2 issues found (1 Major, 1 Minor).
docs/VISION.md: Not found in repo/docs.docs/PRINCIPLES.md: Not found in repo/docs.
Testing
- Not run (automation)
open-codesign Bot
| const ms = Date.now() - t0; | ||
| const html = result.artifacts[0]?.content ?? ''; | ||
| const artifactPath = resolve(SMOKE_DIR, `${slug(model, prompt)}.html`); | ||
| if (!artifactPath.startsWith(`${SMOKE_DIR}/`)) { |
There was a problem hiding this comment.
startsWith(${SMOKE_DIR}/) after resolve() is POSIX-specific. On Windows, path separators are \\, so valid in-dir artifact paths can fail the boundary check and throw. Please use relative() + isAbsolute() for cross-platform traversal protection.
Codex flagged startsWith('/tmp/smoke/') as POSIX-only — Windows path
separators would trigger a false 'Refusing to write outside' rejection.
Switch to relative() + isAbsolute() checks which are platform-agnostic.
Signed-off-by: hqhq1025 <1506751656@qq.com>
There was a problem hiding this comment.
Findings
-
[Major] Empty-artifact generations are counted as success, creating false-green smoke results —
runOnewritesresult.artifacts[0]?.content ?? ''and returnsok: trueeven when no artifact is produced, so failed provider/model outputs can be reported as passed, evidencescripts/smoke-models.ts:152,scripts/smoke-models.ts:161
Suggested fix:const artifact = result.artifacts[0]; if (!artifact?.content?.trim()) { throw new Error('No HTML artifact returned from generate()'); } const html = artifact.content;
-
[Minor] New smoke CLI logic is still untested in Vitest for key regression paths — changed behavior like unsupported-provider fail-fast and empty-artifact handling can regress silently without tests, evidence
scripts/smoke-models.ts:211,scripts/smoke-models.ts:220,scripts/smoke-models.ts:152
Suggested fix:// scripts/smoke-models.test.ts it('throws on unsupported provider in config', async () => { await expect(runMatrix([{ provider: 'typo', modelId: 'x' }], prompts, null)) .rejects.toThrow(/Unsupported provider/); }); it('fails when generate returns no artifact', async () => { await expect(runOne(model, prompt, 'k')).resolves.toMatchObject({ ok: false }); });
Summary
- Review mode: follow-up after new commits
- 2 issues found (1 Major, 1 Minor).
Testing
- Not run (automation)
open-codesign Bot
| }, | ||
| }); | ||
| const ms = Date.now() - t0; | ||
| const html = result.artifacts[0]?.content ?? ''; |
There was a problem hiding this comment.
result.artifacts[0]?.content ?? allows empty/missing artifacts to pass as success. This can hide real model/provider failures in smoke runs. Please fail this run when no non-empty artifact content exists before writing the file.
| for (const m of models) { | ||
| const envName = ENV_KEY[m.provider]; | ||
| if (!envName) { | ||
| throw new Error( |
There was a problem hiding this comment.
Behavioral changes in this script (unsupported-provider fail-fast and success criteria) should have Vitest coverage to prevent regressions. Please add focused tests for these paths.
Codex flagged false-green smoke results when generate() returns no artifact: runOne previously coerced missing content to '' and reported ok: true. Now throw when artifact is missing or whitespace-only — error message surfaces in the report and the run is correctly counted as failed. Signed-off-by: hqhq1025 <1506751656@qq.com>
There was a problem hiding this comment.
Findings
- [Minor] Missing Vitest coverage for new fail-fast behavior in smoke harness — this PR introduces unsupported-provider failure logic without tests, so later refactors can silently regress back to false-green smoke runs, evidence
scripts/smoke-models.ts:223.
Suggested fix:// scripts/smoke-models.test.ts import { describe, expect, it } from 'vitest'; import { runMatrix } from './smoke-models'; it('throws when config contains an unsupported provider', async () => { await expect( runMatrix([{ provider: 'typo', modelId: 'm' }], [{ name: 'p', text: 't' }], null), ).rejects.toThrow(/Unsupported provider/); });
Summary
- Review mode: follow-up after new commits
- 1 issue found (Minor).
docs/VISION.mdanddocs/PRINCIPLES.md: Not found in repo/docs in this runner (onlyCLAUDE.mdanddocs/research/*available).
Testing
- Not run (automation)
open-codesign Bot
| const results: RunResult[] = []; | ||
| for (const m of models) { | ||
| const envName = ENV_KEY[m.provider]; | ||
| if (!envName) { |
There was a problem hiding this comment.
Please add a Vitest regression test for this fail-fast branch. This behavior is new in the PR and currently untested, which makes it easy to regress back to silent skips in later edits.
Summary
CLI flags:
```
pnpm smoke
pnpm smoke --model openai/gpt-oss-120b:free
pnpm smoke --prompt "数据看板"
pnpm smoke --only-failed
```
The `scripts/smoke-models.toml` ships with 5 default models × 4 default prompts (dashboard, mobile onboarding, marketing landing, settings) — easy to extend.
Why
Manual loop of "type prompt in app → wait → look at result → switch model in Settings → repeat" was the bottleneck for debugging API issues (today's reasoning-mandatory bug, prompt quality gaps). `pnpm smoke` collapses that into one command across the whole model lineup.
Test plan
🤖 Generated with Claude Code