diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 70a20c4a2..fbb09d90f 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -108,3 +108,12 @@ jobs:
       - name: Smoke test interactive TTY prompt
         if: ${{ matrix.os == 'ubuntu-latest' && matrix.node-version == '22.18.0' }}
         run: pnpm smoke:tty-prompt
+
+      # SlopBench task self-test: every benchmark task's clean reference solution
+      # must still pass its functional gate AND score reward > 0. Guards the
+      # corpus against drift in the verifier, the scoring profile, or React
+      # Doctor. The prior `pnpm build` step left react-doctor + the verifier
+      # built; React-component tasks install their own dev deps during grading.
+      - name: Validate SlopBench task reference solutions
+        if: ${{ matrix.os == 'ubuntu-latest' && matrix.node-version == '22.18.0' }}
+        run: pnpm benchmark:validate
diff --git a/docs/SLOPBENCH.md b/docs/SLOPBENCH.md
new file mode 100644
index 000000000..e219db136
--- /dev/null
+++ b/docs/SLOPBENCH.md
@@ -0,0 +1,105 @@
+# SlopBench — methodology
+
+SlopBench (in [`packages/benchmark`](../packages/benchmark)) measures how good a
+model is at frontend engineering, with a deliberate focus on **how much React /
+TypeScript slop it emits**. It extends the DeepSWE / Harbor approach with a
+second, continuous quality axis.
+
+## Why two axes
+
+Correctness-only benchmarks reward a working feature regardless of how it was
+built. Real frontend review cares about both: does it work, _and_ is it clean?
+SlopBench keeps a hard **functional gate** (hidden behavioral tests) and adds a
+**slop score** computed purely by static analysis on the diff:
+
+```
+reward = functional_pass × (slop_score / 100)
+```
+
+- `functional_pass ∈ {0,1}` — the DeepSWE-style gate.
+- `slop_score ∈ [0,100]` — higher = cleaner.
+
+Reporting both separately (plus per-dimension subscores) lets a leaderboard rank
+by correctness, by cleanliness, or by the product.
+Setting the slop weight to zero (i.e. treating the `slop_score / 100` factor as
+1) recovers a pure correctness benchmark.
+
+## How the slop score is computed
+
+The verifier (`slop-verify`, the `@react-doctor/benchmark` package) runs
+**offline** over the agent's diff against the task's base commit:
+
+1. **React Doctor** (`--json --no-score --no-dead-code`) — the canonical React
+   diagnostic engine, scoped to the files the agent changed. Its five categories
+   map onto four dimensions (Security and Bugs → `react-correctness`,
+   Performance → `react-performance`, Accessibility → `accessibility`,
+   Maintainability → `maintainability`); specific bundle/waterfall rules are
+   rerouted to the `bundle` and `async-waterfall` dimensions.
+2. **TypeScript strictness** (AST, no type-checker needed) — explicit `any`,
+   `as` casts, non-null `!`, and `@ts-ignore`/`@ts-nocheck`/`@ts-expect-error`.
+3. **Composition** (AST, distilled from Vercel's composition-patterns) —
+   boolean-prop soup and function-valued render props.
+4. **deslop heuristic** — nested ternaries.
+
+Each finding is weighted `severity × category × rule-impact`, the per-dimension
+penalty is **size-normalized** by the diff's added lines (so large legitimate
+features are not punished as hard as the same violations in a tiny diff), and
+each dimension scores `clamp(100 − penalty, 0, 100)`. The composite is the
+profile-weighted mean across dimensions.
+
+Every number lives in [`scoring-profiles/default.json`](../packages/benchmark/scoring-profiles/default.json)
+(mirrored by `src/constants.ts`); the `scoringVersion` is stamped into every
+report so scores are reproducible and comparable.
+
+### Why local scoring (not the react.doctor score API)
+
+React Doctor's canonical 0–100 score is a remote API call. Benchmark grading is
+**air-gapped** (`allow_internet = false`), so SlopBench computes its own
+deterministic score from the offline `diagnostics[]`. The remote API is never on
+the grading path.
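The scoring steps described above can be sketched in TypeScript as follows. This is a minimal, hypothetical illustration, not the verifier's code. The numbers are copied from `scoring-profiles/default.json`, but the `Finding`/`ScoringProfile` shapes, the helper names (`findingWeight`, `scoreDiff`, `reward`), and the default multiplier of 1 for findings without a category or rule entry are assumptions. The size-normalization formula in particular (`diffSizeNormalizerLines / max(addedLines, minNormalizerLines)`) is a guess consistent with the prose, not the documented formula.

```typescript
// Hypothetical sketch of SlopBench-style scoring; NOT the verifier's actual code.
type Severity = "error" | "warning";

interface Finding {
  ruleId: string;
  severity: Severity;
  dimension: string;
  category?: string; // React Doctor category; assumed absent for custom AST checks
}

interface ScoringProfile {
  severityWeights: Record<Severity, number>;
  categoryMultipliers: Record<string, number>;
  ruleImpactMultipliers: Record<string, number>;
  dimensionWeights: Record<string, number>;
  diffSizeNormalizerLines: number;
  minNormalizerLines: number;
}

// Numbers copied from scoring-profiles/default.json ("version" omitted).
const defaultProfile: ScoringProfile = {
  severityWeights: { error: 5, warning: 2 },
  categoryMultipliers: { Security: 3, Bugs: 2, Performance: 1.5, Accessibility: 1.2, Maintainability: 1 },
  ruleImpactMultipliers: {
    "ts/ban-ts-comment": 2.5,
    "ts/no-explicit-any": 2,
    "ts/no-non-null-assertion": 1.5,
    "ts/no-type-assertion": 1.5,
    "vercel/architecture-boolean-prop-soup": 1.8,
    "vercel/patterns-render-prop": 1.3,
    "deslop/nested-ternary": 1.2,
  },
  dimensionWeights: {
    "react-correctness": 1.5,
    "ts-strictness": 1.5,
    "react-performance": 1.2,
    composition: 1,
    "async-waterfall": 1,
    bundle: 1,
    maintainability: 1,
    accessibility: 0.8,
  },
  diffSizeNormalizerLines: 40,
  minNormalizerLines: 25,
};

const clamp = (value: number, low: number, high: number): number =>
  Math.min(high, Math.max(low, value));

// severity × category × rule-impact; missing multipliers default to 1 (assumption).
const findingWeight = (finding: Finding, profile: ScoringProfile): number =>
  profile.severityWeights[finding.severity] *
  (profile.categoryMultipliers[finding.category ?? ""] ?? 1) *
  (profile.ruleImpactMultipliers[finding.ruleId] ?? 1);

const scoreDiff = (findings: Finding[], addedLines: number, profile: ScoringProfile) => {
  // ASSUMED size normalization: diffs shorter than minNormalizerLines count as
  // that long, and longer diffs scale every penalty down proportionally.
  const sizeFactor =
    profile.diffSizeNormalizerLines / Math.max(addedLines, profile.minNormalizerLines);
  const rawPenalty = new Map<string, number>();
  for (const finding of findings) {
    const current = rawPenalty.get(finding.dimension) ?? 0;
    rawPenalty.set(finding.dimension, current + findingWeight(finding, profile));
  }
  // Each dimension scores clamp(100 - penalty, 0, 100); the composite is the
  // profile-weighted mean over every dimension listed in the profile.
  const dimensionScores: Record<string, number> = {};
  let weightedSum = 0;
  let weightTotal = 0;
  for (const [dimension, weight] of Object.entries(profile.dimensionWeights)) {
    const score = clamp(100 - (rawPenalty.get(dimension) ?? 0) * sizeFactor, 0, 100);
    dimensionScores[dimension] = score;
    weightedSum += weight * score;
    weightTotal += weight;
  }
  const slopScore = weightTotal === 0 ? 100 : weightedSum / weightTotal;
  return { dimensionScores, slopScore };
};

// reward = functional_pass × (slop_score / 100)
const reward = (functionalPass: boolean, slopScore: number): number =>
  (functionalPass ? 1 : 0) * (slopScore / 100);

// Usage: a single explicit `any` (an error) in a 40-line diff.
const example = scoreDiff(
  [{ ruleId: "ts/no-explicit-any", severity: "error", dimension: "ts-strictness" }],
  40,
  defaultProfile,
);
console.log(
  example.dimensionScores["ts-strictness"],
  example.slopScore.toFixed(2),
  reward(true, example.slopScore).toFixed(4),
); // prints: 90 98.33 0.9833
```

Under these assumptions the same single finding costs 16 points in a 10-line diff but only 1 point in a 400-line one, which is the "large legitimate features are not punished as hard" behavior the text describes.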
+
+## Reference influences
+
+The dimensions and checks are grounded in:
+
+- **React Doctor rules** — the React correctness/performance/a11y/security engine.
+- **deslop skill** — indirection, dead code, nested ternaries, near-duplicates.
+- **Vercel [react-best-practices]** — waterfalls, bundle, re-render, rendering tiers.
+- **Vercel [composition-patterns]** — boolean-prop soup, render-props, compound components.
+- **Vercel [next-best-practices]** — RSC boundaries, async APIs, `next/image`, bundling.
+
+To avoid double-counting, [`rule-overlap.md`](../packages/benchmark/rule-overlap.md)
+records which tool owns each signal; SlopBench only adds checks for gaps React
+Doctor does not already cover (TS strictness, composition, and the nested-ternary
+heuristic).
+
+[react-best-practices]: https://github.com/vercel-labs/agent-skills/tree/main/skills/react-best-practices
+[composition-patterns]: https://github.com/vercel-labs/agent-skills/tree/main/skills/composition-patterns
+[next-best-practices]: https://github.com/vercel-labs/next-skills#next-best-practices
+
+## Task families
+
+- **produce-clean** — implement a working feature; slop is measured on the diff.
+  Measures the slop a model emits _unprompted_ (the instruction never mentions
+  quality).
+- **handle-slop** — the seed ships working-but-sloppy code; a small change is
+  requested. Measures whether the model _adds_ slop or cleans what it touches.
+- **explicit-deslop** _(v2)_ — the instruction asks to clean up while preserving
+  behavior; isolates capability from inclination.
+
+## Anti-gaming
+
+- Scanners run over the whole diff, not a fixed file the agent can target.
+- Suppression escape hatches (`@ts-ignore`, eslint-disable-style comments) are
+  themselves scored as slop.
+- Tests, fixtures, generated files, and lockfiles are excluded from grading, so
+  an agent neither earns credit for tests nor is charged for vendored slop.
+- Hidden tests are applied only at grade time.
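The exclusion rule in the anti-gaming list can be pictured as a path filter applied to the changed files before any scanner runs. Below is a minimal sketch. Every regular expression in it is an illustrative assumption, because this document does not list the verifier's real exclusion patterns. `isGradedPath` and the sample file names are hypothetical as well.

```typescript
// Hypothetical path filter: which changed files count toward the slop score.
// Every pattern below is an ASSUMPTION for illustration only.
const EXCLUDED_PATTERNS: readonly RegExp[] = [
  /(^|\/)tests?\//, // test directories
  /\.(test|spec)\.[cm]?[jt]sx?$/, // *.test.ts, *.spec.tsx, ...
  /(^|\/)(__fixtures__|fixtures)\//, // fixture directories
  /\.generated\.[cm]?[jt]sx?$/, // generated sources (assumed naming convention)
  /(^|\/)(pnpm-lock\.yaml|package-lock\.json|yarn\.lock)$/, // lockfiles
];

// A path is graded only if no exclusion pattern matches it.
const isGradedPath = (filePath: string): boolean =>
  !EXCLUDED_PATTERNS.some((pattern) => pattern.test(filePath));

const changedFiles = [
  "src/NotificationList.tsx",
  "src/NotificationList.test.tsx",
  "tests/notification-list.test.ts",
  "pnpm-lock.yaml",
];
console.log(changedFiles.filter(isGradedPath)); // [ 'src/NotificationList.tsx' ]
```

Anchoring the patterns on `/` and `.` boundaries keeps ordinary source files whose names merely contain "test" (for example `latest.ts`) in scope, so the filter cannot be used to hide slop.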
+
+## Reproducibility
+
+- React Doctor + the verifier are installed from a single pinned checkout in the
+  base image (`tasks/_base/Dockerfile`); pin `REACT_DOCTOR_REF` for a release.
+- `doctorVersion` + `scoringVersion` are recorded in every `slop-report.json`.
+- `scripts/validate-all.sh` asserts every task's reference solution still passes
+  and scores `reward > 0` — run it before cutting a benchmark release.
+
+See [`packages/benchmark/README.md`](../packages/benchmark/README.md) for the run
+and authoring workflow.
diff --git a/package.json b/package.json
index 7f7b79a28..b2a5cdc56 100644
--- a/package.json
+++ b/package.json
@@ -26,6 +26,7 @@
     "release": "pnpm build && pnpm check:published-deps && node scripts/sentry-sourcemaps.mjs && changeset publish",
     "check:published-deps": "node --experimental-strip-types --no-warnings scripts/check-published-deps.ts",
     "smoke:json-report": "node --experimental-strip-types --no-warnings scripts/smoke-json-report.ts",
+    "benchmark:validate": "bash packages/benchmark/scripts/validate-all.sh",
     "smoke:tty-prompt": "python3 scripts/smoke-tty-prompt.py",
     "build:schema": "node --experimental-strip-types --no-warnings scripts/generate-config-schema.ts"
   },
diff --git a/packages/benchmark/README.md b/packages/benchmark/README.md
new file mode 100644
index 000000000..ee4caf52a
--- /dev/null
+++ b/packages/benchmark/README.md
@@ -0,0 +1,149 @@
+# SlopBench
+
+A benchmark for measuring how good individual models are at **frontend
+engineering — and specifically how much React/TypeScript "slop" they produce**.
+
+Unlike correctness-only SWE benchmarks, SlopBench scores **two axes** per task:
+
+1. **Functional correctness** (gate) — hidden behavioral tests, exactly like
+   [DeepSWE](https://github.com/datacurve-ai/deep-swe). If the feature does not
+   work, the task is failed.
+2. **Slop score** (0–100, continuous) — how clean the code the model wrote is,
+   measured **offline** by [React Doctor](https://react.doctor) plus a strict
+   TypeScript pass, Vercel-derived composition checks, and deslop heuristics.
+
+A model can make the feature work and **still score poorly** for shipping slop
+(inline components, array-index keys, `any`, type casts, `@ts-ignore`,
+boolean-prop soup, …). The headline **reward** combines them:
+
+```
+reward = functional_pass × (slop_score / 100)
+```
+
+## Task format
+
+SlopBench uses the [Harbor](https://www.harborframework.com/docs/tasks) task
+format (so it runs under [Pier](https://github.com/datacurve-ai/pier) /
+Harbor unchanged):
+
+```text
+tasks//
+  task.toml              metadata: family, target_dimensions, base commit, image, limits
+  instruction.md         the prompt the agent sees (no mention of "slop" / quality)
+  seed/                  the starting project (committed as the base commit)
+  environment/Dockerfile reproduces the env (FROM slopbench-base)
+  tests/
+    test.sh              thin wrapper -> `slopbench-grade` (functional gate + slop scan)
+    test.patch           hidden tests, applied at grade time
+  solution/              reference clean solution (reviewer aid; never used at grading)
+  _authoring/            human-readable source for the patches (solved/ + hidden/)
+```
+
+The verifier writes `reward.txt` (the composite float) and a rich
+`slop-report.json` artifact (per-dimension scores + every violation).
+
+## Quickstart (Pier — swappable harness)
+
+The task format is harness-agnostic. Pier drives `mini-swe-agent` (model-agnostic)
+**and** the CLI agents directly — pass `--agent` to switch:
+
+```bash
+git clone https://github.com/millionco/react-doctor
+uv tool install datacurve-pier
+
+# Build the shared base image once (provides react-doctor + slop-verify + grader)
+docker build -t slopbench-base:latest -f packages/benchmark/tasks/_base/Dockerfile .
+
+# Claude Code as the harness
+export ANTHROPIC_API_KEY=...
+pier run -p packages/benchmark/tasks --agent claude-code --model anthropic/claude-opus-4-7
+
+# Codex
+export OPENAI_API_KEY=...
+pier run -p packages/benchmark/tasks --agent codex --model openai/gpt-5.5
+
+# Other harnesses Pier drives directly:
+pier run -p packages/benchmark/tasks --agent gemini-cli --model google/gemini-2.5-pro
+pier run -p packages/benchmark/tasks --agent opencode --model anthropic/claude-opus-4-7
+
+# Model-agnostic harness (works with any provider)
+pier run -p packages/benchmark/tasks --agent mini-swe-agent --model anthropic/claude-opus-4-7
+```
+
+Single task or a deterministic subset:
+
+```bash
+pier run -p packages/benchmark/tasks/notification-list --agent claude-code
+pier run -p packages/benchmark/tasks --agent mini-swe-agent --n-tasks 3 --sample-seed 0
+```
+
+## Aggregating results into a scorecard
+
+After a run, turn the per-task reports into one model scorecard:
+
+```bash
+node packages/benchmark/scripts/aggregate-results.mjs \
+  --logs --model claude-opus-4-7 \
+  --out packages/benchmark/results/claude-opus-4-7.json
+```
+
+It reports `functionalPassRate`, `meanSlopScore`, `meanReward`, and per-dimension
+means — the shape a (v2) leaderboard renders. A web leaderboard is intentionally
+out of scope for v1.
+
+## Slop dimensions
+
+Each violation maps to exactly one dimension (no double-counting — see
+[`rule-overlap.md`](./rule-overlap.md)):
+
+| Dimension                                                                    | Owner                                                     |
+| ---------------------------------------------------------------------------- | --------------------------------------------------------- |
+| `react-correctness`, `react-performance`, `accessibility`, `maintainability` | React Doctor                                              |
+| `bundle`, `async-waterfall`                                                  | React Doctor (specific rules rerouted)                    |
+| `ts-strictness`                                                              | SlopBench TS checks (`any`, casts, `!`, `@ts-ignore`)     |
+| `composition`                                                                | SlopBench Vercel checks (boolean-prop soup, render props) |
+
+Weights live in [`scoring-profiles/default.json`](./scoring-profiles/default.json)
+(mirrored by `src/constants.ts`); the active scoring version is stamped into
+every report.
+
+## Authoring a new task
+
+```bash
+cd packages/benchmark
+# 1. scaffold boilerplate (task.toml, test.sh, Dockerfile, solve.sh)
+scripts/scaffold-task.sh my-task produce-clean "ts-strictness" \
+  "node --experimental-strip-types --test tests/my-task.test.ts" \
+  "My task title" "One-line description"
+# 2. author tasks/my-task/seed/, instruction.md,
+#    _authoring/solved/** (clean reference) and _authoring/hidden/** (hidden tests)
+# 3. format first, THEN generate the patches (patches embed seed context,
+#    so formatting the seed after generating would make them stale)
+pnpm format
+scripts/gen-task-patches.sh tasks/my-task
+# 4. validate end-to-end WITHOUT Docker (seed -> grade reference solution)
+scripts/validate-task.sh tasks/my-task --expect-pass
+```
+
+Validate the whole corpus (reference solutions must pass + score reward>0):
+
+```bash
+scripts/validate-all.sh      # from packages/benchmark
+pnpm benchmark:validate      # from the repo root (also run in CI)
+```
+
+Pure-TS tasks use Node's built-in test runner (`node --experimental-strip-types
+--test`) and need no dependency install; React tasks use `vitest` +
+`react-dom/server` (install happens at image-build time).
+Both run **air-gapped** at agent time.
+
+## The verifier CLI
+
+`slop-verify` scores a graded diff directly (used by the grader, handy in dev):
+
+```bash
+slop-verify --root --base --json
+```
+
+See `slop-verify --help` for all flags (`--profile`, `--functional-pass`,
+`--out`, `--fail-under`, …).
diff --git a/packages/benchmark/bin/slop-verify.js b/packages/benchmark/bin/slop-verify.js
new file mode 100755
index 000000000..0cc765eca
--- /dev/null
+++ b/packages/benchmark/bin/slop-verify.js
@@ -0,0 +1,4 @@
+#!/usr/bin/env node
+import { runCli } from "../dist/index.mjs";
+
+runCli(process.argv.slice(2));
diff --git a/packages/benchmark/package.json b/packages/benchmark/package.json
new file mode 100644
index 000000000..768b669a0
--- /dev/null
+++ b/packages/benchmark/package.json
@@ -0,0 +1,40 @@
+{
+  "name": "@react-doctor/benchmark",
+  "version": "0.4.2",
+  "private": true,
+  "description": "Internal: SlopBench — a Harbor/Pier-compatible benchmark measuring how much React/TypeScript slop a model produces, scored through React Doctor plus a strict TypeScript pass, Vercel-derived AST checks, and deslop heuristics. Not published.",
+  "license": "MIT",
+  "bin": {
+    "slop-verify": "./bin/slop-verify.js"
+  },
+  "files": [
+    "bin/**",
+    "dist/**/*.mjs",
+    "dist/**/*.d.mts",
+    "scoring-profiles/**"
+  ],
+  "type": "module",
+  "sideEffects": false,
+  "exports": {
+    ".": {
+      "types": "./dist/index.d.mts",
+      "default": "./dist/index.mjs"
+    }
+  },
+  "scripts": {
+    "build": "node -e \"require('node:fs').rmSync('dist', { recursive: true, force: true })\" && cross-env NODE_ENV=production vp pack",
+    "test": "vp test run tests",
+    "typecheck": "tsc --noEmit"
+  },
+  "dependencies": {
+    "@react-doctor/core": "workspace:*",
+    "oxc-parser": "^0.132.0"
+  },
+  "devDependencies": {
+    "@types/node": "^25.6.0",
+    "react-doctor": "workspace:*"
+  },
+  "engines": {
+    "node": "^20.19.0 || >=22.12.0"
+  }
+}
diff --git a/packages/benchmark/rule-overlap.md b/packages/benchmark/rule-overlap.md
new file mode 100644
index 000000000..7ee6b1928
--- /dev/null
+++ b/packages/benchmark/rule-overlap.md
@@ -0,0 +1,71 @@
+# Rule overlap & ownership
+
+SlopBench scores slop from multiple scanners. To avoid **double-counting** the
+same defect, every slop signal has exactly one owner. This table is the single
+source of truth: when adding a check, confirm React Doctor does not already
+cover it — if it does, **defer** and (optionally) route its rule id into a finer
+dimension instead of re-implementing detection.
+
+## Ownership by dimension
+
+| Dimension           | Owner                           | How                                                                                             |
+| ------------------- | ------------------------------- | ----------------------------------------------------------------------------------------------- |
+| `react-correctness` | React Doctor                    | categories **Security**, **Bugs**                                                               |
+| `react-performance` | React Doctor                    | category **Performance** (minus the rules rerouted below)                                       |
+| `accessibility`     | React Doctor                    | category **Accessibility**                                                                      |
+| `maintainability`   | React Doctor + deslop heuristic | category **Maintainability** (incl. the `ln`/deslop dead-code plugin) + `deslop/nested-ternary` |
+| `bundle`            | React Doctor (rerouted)         | specific Performance-category rule ids → `bundle`                                               |
+| `async-waterfall`   | React Doctor (rerouted)         | specific Performance-category rule ids → `async-waterfall`                                      |
+| `ts-strictness`     | SlopBench TS checks             | React Doctor does **not** cover generic TS slop                                                 |
+| `composition`       | SlopBench Vercel checks         | proliferation / render-prop not counted by React Doctor                                         |
+
+## React Doctor rules rerouted to finer dimensions
+
+React Doctor files these under the broad **Performance** category; SlopBench
+routes the exact rule ids into dedicated dimensions
+(`REACT_DOCTOR_RULE_TO_DIMENSION` in `src/constants.ts`) so the leaderboard can
+report them separately. Detection still belongs to React Doctor — we only
+relabel the dimension.
+
+- `react-doctor/no-barrel-import` → `bundle`
+- `react-doctor/no-full-lodash-import` → `bundle`
+- `react-doctor/no-moment` → `bundle`
+- `react-doctor/no-undeferred-third-party` → `bundle`
+- `react-doctor/prefer-dynamic-import` → `bundle`
+- `react-doctor/no-dynamic-import-path` → `bundle`
+- `react-doctor/use-lazy-motion` → `bundle`
+- `react-doctor/server-sequential-independent-await` → `async-waterfall`
+- `react-doctor/tanstack-start-loader-parallel-fetch` → `async-waterfall`
+
+## Vercel rules deliberately DEFERRED to React Doctor (no custom check)
+
+These Vercel best-practices map onto an existing React Doctor rule, so SlopBench
+does **not** add a duplicate detector:
+
+| Vercel rule                        | Covered by React Doctor                                                        |
+| ---------------------------------- | ------------------------------------------------------------------------------ |
+| `bundle-barrel-imports`            | `react-doctor/no-barrel-import`, `no-full-lodash-import`                       |
+| `bundle-dynamic-imports`           | `react-doctor/prefer-dynamic-import`, `no-dynamic-import-path`                 |
+| `async-parallel` / waterfalls      | `react-doctor/server-sequential-independent-await`                             |
+| `rerender-no-inline-components`    | `react-doctor/no-nested-component-definition`, `no-unstable-nested-components` |
+| `rerender-derived-state-no-effect` | React Doctor `state-and-effects` rules                                         |
+| `react19-no-forwardref`            | `react-doctor/forward-ref-uses-ref`, `no-react19-deprecated-apis`              |
+| `rendering-*` (img, etc.)          | `react-doctor/nextjs-no-img-element`, …                                        |
+
+## Signals SlopBench OWNS (custom checks — React Doctor gap)
+
+TypeScript strictness (`src/checks/ts-*.ts`, dimension `ts-strictness`):
+
+- `ts/no-explicit-any` — explicit `any` annotations
+- `ts/no-non-null-assertion` — the `!` operator
+- `ts/no-type-assertion` — `as Foo` / `x` casts (`as const` exempt)
+- `ts/ban-ts-comment` — `@ts-ignore` / `@ts-nocheck` / `@ts-expect-error` (scored as error)
+
+Composition (`src/checks/vercel-*.ts`, dimension `composition`):
+
+- `vercel/architecture-boolean-prop-soup` — `*Props` types with ≥ `BOOLEAN_PROP_SOUP_THRESHOLD` boolean flags
+- `vercel/patterns-render-prop` — function-valued `render` / `renderX` props
+
+deslop (`src/checks/deslop-*.ts`, dimension `maintainability`):
+
+- `deslop/nested-ternary` — nested conditional expressions (one finding per chain)
diff --git a/packages/benchmark/scoring-profiles/default.json b/packages/benchmark/scoring-profiles/default.json
new file mode 100644
index 000000000..cc6a0b74d
--- /dev/null
+++ b/packages/benchmark/scoring-profiles/default.json
@@ -0,0 +1,35 @@
+{
+  "version": "1.0.0",
+  "severityWeights": {
+    "error": 5,
+    "warning": 2
+  },
+  "categoryMultipliers": {
+    "Security": 3,
+    "Bugs": 2,
+    "Performance": 1.5,
+    "Accessibility": 1.2,
+    "Maintainability": 1
+  },
+  "ruleImpactMultipliers": {
+    "ts/ban-ts-comment": 2.5,
+    "ts/no-explicit-any": 2,
+    "ts/no-non-null-assertion": 1.5,
+    "ts/no-type-assertion": 1.5,
+    "vercel/architecture-boolean-prop-soup": 1.8,
+    "vercel/patterns-render-prop": 1.3,
+    "deslop/nested-ternary": 1.2
+  },
+  "dimensionWeights": {
+    "react-correctness": 1.5,
+    "ts-strictness": 1.5,
+    "react-performance": 1.2,
+    "composition": 1,
+    "async-waterfall": 1,
+    "bundle": 1,
+    "maintainability": 1,
+    "accessibility": 0.8
+  },
+  "diffSizeNormalizerLines": 40,
+  "minNormalizerLines": 25
+}
diff --git a/packages/benchmark/scripts/aggregate-results.mjs b/packages/benchmark/scripts/aggregate-results.mjs
new file mode 100644
index 000000000..43018c081
--- /dev/null
+++ b/packages/benchmark/scripts/aggregate-results.mjs
@@ -0,0 +1,117 @@
+#!/usr/bin/env node
+// Aggregate a model's per-task SlopBench reports into one scorecard.
+//
+// After a `pier run`, each task leaves a slop-report.json under the run's logs.
+// This walks a logs directory, collects every slop-report.json, and emits a
+// results JSON: functional pass-rate, mean slop score, mean reward, and
+// per-dimension means — the shape a (v2) leaderboard renders.
+//
+// Usage:
+//   node scripts/aggregate-results.mjs --logs --model [--out ]
+import * as fs from "node:fs";
+import * as path from "node:path";
+
+const parseArgs = (argv) => {
+  const args = {};
+  for (let index = 0; index < argv.length; index++) {
+    const token = argv[index];
+    if (!token.startsWith("--")) continue;
+    const key = token.slice(2);
+    const next = argv[index + 1];
+    if (next && !next.startsWith("--")) {
+      args[key] = next;
+      index++;
+    } else {
+      args[key] = true;
+    }
+  }
+  return args;
+};
+
+const findReports = (root) => {
+  const found = [];
+  const walk = (dir) => {
+    let entries;
+    try {
+      entries = fs.readdirSync(dir, { withFileTypes: true });
+    } catch {
+      return;
+    }
+    for (const entry of entries) {
+      const full = path.join(dir, entry.name);
+      if (entry.isDirectory()) walk(full);
+      else if (entry.name === "slop-report.json") found.push(full);
+    }
+  };
+  walk(root);
+  return found;
+};
+
+const mean = (values) =>
+  values.length === 0 ? null : values.reduce((total, value) => total + value, 0) / values.length;
+
+const main = () => {
+  const args = parseArgs(process.argv.slice(2));
+  const logsDir = args.logs;
+  const model = args.model ?? "unknown-model";
+  if (!logsDir) {
+    process.stderr.write(
+      "usage: aggregate-results.mjs --logs --model [--out ]\n",
+    );
+    process.exit(2);
+  }
+
+  const reportPaths = findReports(logsDir);
+  const tasks = [];
+  const dimensionScores = new Map();
+
+  for (const reportPath of reportPaths) {
+    let report;
+    try {
+      report = JSON.parse(fs.readFileSync(reportPath, "utf8"));
+    } catch {
+      continue;
+    }
+    const taskId = path.basename(path.dirname(path.dirname(reportPath)));
+    tasks.push({
+      task: taskId,
+      slopScore: report.slopScore,
+      functionalPass: report.functionalPass,
+      reward: report.reward,
+      violationCount: Array.isArray(report.violations) ? report.violations.length : 0,
+    });
+    for (const dimension of report.dimensions ?? []) {
+      const bucket = dimensionScores.get(dimension.dimension) ?? [];
+      bucket.push(dimension.score);
+      dimensionScores.set(dimension.dimension, bucket);
+    }
+  }
+
+  const passed = tasks.filter((task) => task.functionalPass === true).length;
+  const rewards = tasks.map((task) => task.reward).filter((value) => typeof value === "number");
+  // Skip reports without a numeric slopScore so one bad report cannot turn the mean into NaN.
+  const slopScores = tasks.map((task) => task.slopScore).filter((value) => typeof value === "number");
+  const perDimensionMean = {};
+  for (const [dimension, scores] of dimensionScores) perDimensionMean[dimension] = mean(scores);
+
+  const result = {
+    model,
+    generatedAt: new Date().toISOString(),
+    taskCount: tasks.length,
+    functionalPassRate: tasks.length === 0 ? null : passed / tasks.length,
+    meanSlopScore: mean(slopScores),
+    meanReward: mean(rewards),
+    perDimensionMean,
+    tasks: tasks.sort((left, right) => left.task.localeCompare(right.task)),
+  };
+
+  const output = `${JSON.stringify(result, null, 2)}\n`;
+  if (args.out) {
+    fs.mkdirSync(path.dirname(path.resolve(args.out)), { recursive: true });
+    fs.writeFileSync(args.out, output);
+    process.stderr.write(`wrote ${args.out} (${tasks.length} tasks)\n`);
+  } else {
+    process.stdout.write(output);
+  }
+};
+
+main();
diff --git a/packages/benchmark/scripts/gen-task-patches.sh b/packages/benchmark/scripts/gen-task-patches.sh
new file mode 100755
index 000000000..d0d52b03d
--- /dev/null
+++ b/packages/benchmark/scripts/gen-task-patches.sh
@@ -0,0 +1,44 @@
+#!/usr/bin/env bash
+#
+# Generate a task's solution.patch and test.patch from authoring inputs:
+#   tasks//seed/                the starting repo
+#   tasks//_authoring/solved/   files overwriting seed paths = the reference fix
+#   tasks//_authoring/hidden/   files ADDED (e.g. tests/*.test.ts) = hidden tests
+#
+# Produces solution/solution.patch (seed -> solved) and tests/test.patch (added
+# hidden files), as real git patches. The _authoring/ inputs stay in-tree as the
+# human-readable source for the (otherwise opaque) patches.
+#
+# Usage: scripts/gen-task-patches.sh
+set -euo pipefail
+
+TASK_DIR="$(cd "$1" && pwd)"
+SOLVED="$TASK_DIR/_authoring/solved"
+HIDDEN="$TASK_DIR/_authoring/hidden"
+WORK="$(mktemp -d)"
+trap 'rm -rf "$WORK"' EXIT
+
+cp -a "$TASK_DIR/seed/." "$WORK/"
+cd "$WORK"
+git init -q && git config user.email t@t.co && git config user.name t
+git add -A && git commit -qm base >/dev/null
+
+# solution.patch: overlay the solved files, diff against the seed.
+if [ -d "$SOLVED" ]; then
+  cp -a "$SOLVED/." "$WORK/"
+  git diff > "$TASK_DIR/solution/solution.patch"
+  git checkout -- . >/dev/null 2>&1
+  echo "wrote solution.patch ($(grep -c '^diff' "$TASK_DIR/solution/solution.patch") file(s))"
+fi
+
+# test.patch: add the hidden files (intent-to-add), diff just those.
+if [ -d "$HIDDEN" ]; then
+  cp -a "$HIDDEN/." "$WORK/"
+  ( cd "$HIDDEN" && find . -type f -printf '%P\n' ) | while IFS= read -r rel; do
+    git add -N -- "$rel"
+  done
+  HIDDEN_PATHS=$(cd "$HIDDEN" && find . -type f -printf '%P\n')
+  # shellcheck disable=SC2086
+  git -c core.quotepath=false diff -- $HIDDEN_PATHS > "$TASK_DIR/tests/test.patch"
+  echo "wrote test.patch ($(grep -c '^diff' "$TASK_DIR/tests/test.patch") file(s))"
+fi
diff --git a/packages/benchmark/scripts/scaffold-task.sh b/packages/benchmark/scripts/scaffold-task.sh
new file mode 100755
index 000000000..168366b31
--- /dev/null
+++ b/packages/benchmark/scripts/scaffold-task.sh
@@ -0,0 +1,105 @@
+#!/usr/bin/env bash
+#
+# Scaffold the boilerplate for a new SlopBench task (task.toml, tests/test.sh,
+# environment/Dockerfile, solution/solve.sh). You still author seed/,
+# instruction.md, the reference solution, and the hidden test — then run
+# scripts/gen-task-patches.sh to produce solution.patch + test.patch.
+#
+# Usage:
+#   scripts/scaffold-task.sh [--needs-install] "" "<desc>"
+set -euo pipefail
+
+BENCH_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+ID="$1"; FAMILY="$2"; DIMS_CSV="$3"; FUNC_CMD="$4"; shift 4
+NEEDS_INSTALL="no"
+if [ "${1:-}" = "--needs-install" ]; then NEEDS_INSTALL="yes"; shift; fi
+TITLE="${1:-$ID}"; DESC="${2:-$ID}"
+TASK_DIR="$BENCH_ROOT/tasks/$ID"
+DIMS_TOML="$(python3 -c "import sys;print(', '.join('\"%s\"'%d for d in sys.argv[1].split(',') for d in [d.strip()] if d))" "$DIMS_CSV")"
+
+mkdir -p "$TASK_DIR/tests" "$TASK_DIR/environment" "$TASK_DIR/solution"
+
+cat > "$TASK_DIR/task.toml" <<EOF
+schema_version = "1.1"
+artifacts = []
+
+[task]
+name = "slopbench/$ID"
+description = "$DESC"
+authors = []
+keywords = ["react", "typescript", "slop", "frontend"]
+
+[metadata]
+task_id = "$ID"
+display_title = "$TITLE"
+display_description = "$DESC"
+family = "$FAMILY"
+target_dimensions = [$DIMS_TOML]
+language = "typescript"
+repository_url = "in-tree"
+base_commit_hash = "root"
+slop_profile = ""
+
+[verifier]
+timeout_sec = 1200.0
+
+[verifier.env]
+
+[agent]
+timeout_sec = 3600.0
+
+[environment]
+build_timeout_sec = 1200.0
+docker_image = "slopbench-base:latest"
+os = "linux"
+cpus = 2
+memory_mb = 4096
+storage_mb = 10240
+gpus = 0
+allow_internet = false
+mcp_servers = []
+
+[environment.env]
+
+[solution.env]
+EOF
+
+cat > "$TASK_DIR/tests/test.sh" <<EOF
+#!/usr/bin/env bash
+set -euo pipefail
+export BASE_COMMIT="\$(git -C "\${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)"
+export FUNCTIONAL_TEST_CMD="$FUNC_CMD"
+exec slopbench-grade
+EOF
+
+if [ "$NEEDS_INSTALL" = "yes" ]; then
+  INSTALL_STEP='RUN pnpm install --frozen-lockfile --ignore-scripts || pnpm install --ignore-scripts'
+else
+  INSTALL_STEP='# Pure-TS task: no dependency install (functional test uses node --test).'
+fi
+
+cat > "$TASK_DIR/environment/Dockerfile" <<EOF
+FROM slopbench-base:latest
+
+WORKDIR /app
+
+COPY seed/ .
+$INSTALL_STEP
+RUN git init -q \\
+  && git add -A \\
+  && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \\
+  && git config --global --add safe.directory /app
+
+CMD ["/bin/bash"]
+EOF
+
+cat > "$TASK_DIR/solution/solve.sh" <<'EOF'
+#!/usr/bin/env bash
+# Reference solution applier (reviewer aid only — never used at grade time).
+set -euo pipefail
+cd /app
+git apply --whitespace=nowarn /solution/solution.patch
+EOF
+
+chmod +x "$TASK_DIR/tests/test.sh" "$TASK_DIR/solution/solve.sh"
+echo "scaffolded $TASK_DIR (author seed/, instruction.md, then gen-task-patches.sh)"
diff --git a/packages/benchmark/scripts/validate-all.sh b/packages/benchmark/scripts/validate-all.sh
new file mode 100755
index 000000000..3852b203a
--- /dev/null
+++ b/packages/benchmark/scripts/validate-all.sh
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+#
+# Validate every SlopBench task's reference solution end-to-end (no Docker):
+# each task's clean solution must pass its functional gate and earn reward > 0.
+# Run this before publishing a benchmark release (CI job with network, since
+# vitest-based tasks install their dev deps).
+#
+# Usage: scripts/validate-all.sh
+set -uo pipefail
+
+BENCH_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+FAILED=()
+COUNT=0
+
+for task_toml in "$BENCH_ROOT"/tasks/*/task.toml; do
+  task_dir="$(dirname "$task_toml")"
+  name="$(basename "$task_dir")"
+  case "$name" in
+    _template | _base) continue ;;
+  esac
+  COUNT=$((COUNT + 1))
+  echo "::: validating $name"
+  if bash "$BENCH_ROOT/scripts/validate-task.sh" "$task_dir" --expect-pass >/tmp/slopbench-validate-"$name".log 2>&1; then
+    tail -3 /tmp/slopbench-validate-"$name".log | sed 's/^/ /'
+  else
+    echo " FAILED — see /tmp/slopbench-validate-$name.log"
+    tail -6 /tmp/slopbench-validate-"$name".log | sed 's/^/ /'
+    FAILED+=("$name")
+  fi
+done
+
+echo
+if [ "${#FAILED[@]}" -ne 0 ]; then
+  echo "VALIDATE-ALL: ${#FAILED[@]}/$COUNT task(s) FAILED: ${FAILED[*]}"
+  exit 1
+fi
+echo "VALIDATE-ALL: all $COUNT task reference solutions pass + score reward>0"
diff --git a/packages/benchmark/scripts/validate-task.sh b/packages/benchmark/scripts/validate-task.sh
new file mode 100755
index 000000000..08c009b66
--- /dev/null
+++ b/packages/benchmark/scripts/validate-task.sh
@@ -0,0 +1,72 @@
+#!/usr/bin/env bash
+#
+# Locally validate one SlopBench task WITHOUT Docker, by simulating the sandbox:
+#   seed/ -> git repo (root commit = BASE) -> apply a patch (the "agent") ->
+#   run the task's tests/test.sh through the shared grader -> inspect reward.
+#
+# Usage:
+#   scripts/validate-task.sh <task-dir> [--patch solution|<path>] [--expect-pass|--expect-fail]
+#
+# Defaults to applying the task's reference solution and expecting a passing,
+# high-scoring run. Pass `--patch <file>` to grade an alternative (e.g. sloppy)
+# diff. Requires the workspace react-doctor + slop-verify to be built.
+set -euo pipefail
+
+BENCH_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+TASK_DIR="$(cd "$1" && pwd)"; shift
+PATCH="solution"
+EXPECT="pass"
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --patch) PATCH="$2"; shift 2 ;;
+    --expect-pass) EXPECT="pass"; shift ;;
+    --expect-fail) EXPECT="fail"; shift ;;
+    *) echo "unknown arg: $1"; exit 2 ;;
+  esac
+done
+
+RD_BIN="${RD_BIN:-$BENCH_ROOT/node_modules/.bin/react-doctor}"
+SV_BIN="${SV_BIN:-$BENCH_ROOT/bin/slop-verify.js}"
+[ -f "$BENCH_ROOT/dist/index.mjs" ] || { echo "build the verifier first: pnpm --filter @react-doctor/benchmark build"; exit 3; }
+
+WORK="$(mktemp -d)"
+trap 'rm -rf "$WORK"' EXIT
+APP="$WORK/app"; LOGS="$WORK/logs"; BIN="$WORK/bin"
+mkdir -p "$APP" "$LOGS" "$BIN"
+
+cp -a "$TASK_DIR/seed/." "$APP/"
+cd "$APP"
+git init -q && git config user.email t@t.co && git config user.name t
+git add -A && git commit -qm base >/dev/null
+
+if [ "${INSTALL:-auto}" != "skip" ] && [ -f package.json ] && grep -q '"vitest"' package.json; then
+  echo "[validate] installing seed deps (vitest)…"
+  pnpm install --silent >/dev/null 2>&1 || pnpm install >/dev/null
+fi
+
+PATCH_FILE="$PATCH"
+[ "$PATCH" = "solution" ] && PATCH_FILE="$TASK_DIR/solution/solution.patch"
+if [ -s "$PATCH_FILE" ] && ! grep -q "^# Replace" "$PATCH_FILE"; then
+  echo "[validate] applying patch: $PATCH_FILE"
+  git apply --whitespace=nowarn "$PATCH_FILE"
+else
+  echo "[validate] no usable patch ($PATCH_FILE) — grading the bare seed"
+fi
+
+# Install the shared grader as `slopbench-grade` on PATH.
+ln -s "$BENCH_ROOT/tasks/_base/run-verifier.sh" "$BIN/slopbench-grade" +chmod +x "$BENCH_ROOT/tasks/_base/run-verifier.sh" + +PATH="$BIN:$PATH" APP_DIR="$APP" TESTS_DIR="$TASK_DIR/tests" LOG_DIR="$LOGS" \ + SLOP_VERIFY="$SV_BIN" REACT_DOCTOR_BIN="$RD_BIN" \ + bash "$TASK_DIR/tests/test.sh" + +REWARD="$(cat "$LOGS/verifier/reward.txt")" +SCORE="$(python3 -c "import json;print(round(json.load(open('$LOGS/verifier/slop-report.json'))['slopScore'],2))")" +echo "[validate] reward=$REWARD slopScore=$SCORE expect=$EXPECT" +python3 -c "import json;r=json.load(open('$LOGS/verifier/slop-report.json'));print('[validate] violations:', sorted(set(v['ruleId'] for v in r['violations'])))" + +PASS_NUM="$(python3 -c "print(1 if float('$REWARD')>0 else 0)")" +if [ "$EXPECT" = "pass" ] && [ "$PASS_NUM" != "1" ]; then echo "[validate] FAIL: expected reward>0"; exit 1; fi +if [ "$EXPECT" = "fail" ] && [ "$PASS_NUM" != "0" ]; then echo "[validate] FAIL: expected reward==0"; exit 1; fi +echo "[validate] OK" diff --git a/packages/benchmark/src/checks/deslop-nested-ternary.ts b/packages/benchmark/src/checks/deslop-nested-ternary.ts new file mode 100644 index 000000000..d14872be9 --- /dev/null +++ b/packages/benchmark/src/checks/deslop-nested-ternary.ts @@ -0,0 +1,42 @@ +import type { AstCheck, ScanFinding } from "../types/index.js"; +import { makeAstFinding } from "../utils/make-ast-finding.js"; +import { walkAst } from "../utils/walk-ast.js"; + +const asNode = (value: unknown): { type?: string; start?: unknown } | null => + typeof value === "object" && value !== null ? (value as { type?: string }) : null; + +// Flags nested ternaries (the deslop skill calls them out explicitly): a +// `ConditionalExpression` whose consequent or alternate is itself a +// `ConditionalExpression`. Only the outermost of each chain is reported — inner +// conditionals reached as a parent's branch are tracked and skipped — so one +// `a ? b : c ? d : e` chain yields exactly one finding, not a cascade. 
+export const deslopNestedTernary: AstCheck = (file): ScanFinding[] => { + const nestedChildren = new Set<unknown>(); + const candidates: Array<{ type?: string; start?: unknown }> = []; + + walkAst(file.program, (node) => { + if (node.type !== "ConditionalExpression") return; + const consequent = asNode(node.consequent); + const alternate = asNode(node.alternate); + const consequentIsTernary = consequent?.type === "ConditionalExpression"; + const alternateIsTernary = alternate?.type === "ConditionalExpression"; + if (consequentIsTernary) nestedChildren.add(node.consequent); + if (alternateIsTernary) nestedChildren.add(node.alternate); + if (consequentIsTernary || alternateIsTernary) candidates.push(node); + }); + + return candidates + .filter((node) => !nestedChildren.has(node)) + .map((node) => + makeAstFinding({ + file, + scanner: "deslop-heuristics", + dimension: "maintainability", + ruleId: "deslop/nested-ternary", + severity: "warning", + offset: typeof node.start === "number" ? node.start : 0, + message: + "Nested ternary is hard to read; use an if/else chain, switch, or extracted helper.", + }), + ); +}; diff --git a/packages/benchmark/src/checks/index.ts b/packages/benchmark/src/checks/index.ts new file mode 100644 index 000000000..e3660406b --- /dev/null +++ b/packages/benchmark/src/checks/index.ts @@ -0,0 +1,21 @@ +import type { AstCheck } from "../types/index.js"; +import { deslopNestedTernary } from "./deslop-nested-ternary.js"; +import { tsBanTsComment } from "./ts-ban-ts-comment.js"; +import { tsNoExplicitAny } from "./ts-no-explicit-any.js"; +import { tsNoNonNullAssertion } from "./ts-no-non-null-assertion.js"; +import { tsNoTypeAssertion } from "./ts-no-type-assertion.js"; +import { vercelBooleanPropSoup } from "./vercel-boolean-prop-soup.js"; +import { vercelRenderProp } from "./vercel-render-prop.js"; + +// Every AST check, run once per parsed source file. 
These cover the slop React +// Doctor does not: TypeScript strictness, Vercel composition patterns, and the +// deslop nested-ternary heuristic. See `rule-overlap.md` for ownership. +export const AST_CHECKS: readonly AstCheck[] = [ + tsNoExplicitAny, + tsNoNonNullAssertion, + tsNoTypeAssertion, + tsBanTsComment, + vercelBooleanPropSoup, + vercelRenderProp, + deslopNestedTernary, +]; diff --git a/packages/benchmark/src/checks/ts-ban-ts-comment.ts b/packages/benchmark/src/checks/ts-ban-ts-comment.ts new file mode 100644 index 000000000..9fdec9f20 --- /dev/null +++ b/packages/benchmark/src/checks/ts-ban-ts-comment.ts @@ -0,0 +1,21 @@ +import type { AstCheck, ScanFinding } from "../types/index.js"; +import { offsetToLine } from "../utils/offset-to-line.js"; + +const TS_SUPPRESSION_PATTERN = /@ts-(ignore|nocheck|expect-error)\b/; + +// Flags `@ts-ignore` / `@ts-nocheck` / `@ts-expect-error` directives. These +// suppress type errors outright (on the next line, or file-wide for `@ts-nocheck`) +// and are the most severe TypeScript escape hatch, so they are scored as errors. +// Scans the comment stream rather than the AST (directives are comments, not nodes).
+export const tsBanTsComment: AstCheck = (file): ScanFinding[] => + file.comments + .filter((comment) => TS_SUPPRESSION_PATTERN.test(comment.value)) + .map((comment) => ({ + scanner: "typescript", + dimension: "ts-strictness", + ruleId: "ts/ban-ts-comment", + severity: "error", + filePath: file.filePath, + line: offsetToLine(file.sourceText, comment.start), + message: "TypeScript suppression directive hides real type errors; fix the underlying type.", + })); diff --git a/packages/benchmark/src/checks/ts-no-explicit-any.ts b/packages/benchmark/src/checks/ts-no-explicit-any.ts new file mode 100644 index 000000000..e346b9789 --- /dev/null +++ b/packages/benchmark/src/checks/ts-no-explicit-any.ts @@ -0,0 +1,26 @@ +import type { AstCheck, ScanFinding } from "../types/index.js"; +import { makeAstFinding } from "../utils/make-ast-finding.js"; +import { walkAst } from "../utils/walk-ast.js"; + +// Flags every explicit `any` type annotation. `any` opts a value out of the +// type system entirely — the single loudest TypeScript slop signal — so each +// occurrence is a finding. (Implicit `any` is a tsc concern; this catches the +// explicit, agent-authored kind without needing a type-checker.) +export const tsNoExplicitAny: AstCheck = (file): ScanFinding[] => { + const findings: ScanFinding[] = []; + walkAst(file.program, (node) => { + if (node.type !== "TSAnyKeyword") return; + findings.push( + makeAstFinding({ + file, + scanner: "typescript", + dimension: "ts-strictness", + ruleId: "ts/no-explicit-any", + severity: "warning", + offset: typeof node.start === "number" ? 
node.start : 0, + message: "Explicit `any` disables type checking for this value; give it a real type.", + }), + ); + }); + return findings; +}; diff --git a/packages/benchmark/src/checks/ts-no-non-null-assertion.ts b/packages/benchmark/src/checks/ts-no-non-null-assertion.ts new file mode 100644 index 000000000..366cf0fa5 --- /dev/null +++ b/packages/benchmark/src/checks/ts-no-non-null-assertion.ts @@ -0,0 +1,26 @@ +import type { AstCheck, ScanFinding } from "../types/index.js"; +import { makeAstFinding } from "../utils/make-ast-finding.js"; +import { walkAst } from "../utils/walk-ast.js"; + +// Flags the non-null assertion operator (`value!`). It silences the compiler's +// null/undefined check without proving the value is present, turning a +// would-be type error into a potential runtime crash. +export const tsNoNonNullAssertion: AstCheck = (file): ScanFinding[] => { + const findings: ScanFinding[] = []; + walkAst(file.program, (node) => { + if (node.type !== "TSNonNullExpression") return; + findings.push( + makeAstFinding({ + file, + scanner: "typescript", + dimension: "ts-strictness", + ruleId: "ts/no-non-null-assertion", + severity: "warning", + offset: typeof node.start === "number" ? node.start : 0, + message: + "Non-null assertion (`!`) hides a possible null/undefined; narrow the type instead.", + }), + ); + }); + return findings; +}; diff --git a/packages/benchmark/src/checks/ts-no-type-assertion.ts b/packages/benchmark/src/checks/ts-no-type-assertion.ts new file mode 100644 index 000000000..9cd135e10 --- /dev/null +++ b/packages/benchmark/src/checks/ts-no-type-assertion.ts @@ -0,0 +1,36 @@ +import type { AstCheck, AstVisitorNode, ScanFinding } from "../types/index.js"; +import { makeAstFinding } from "../utils/make-ast-finding.js"; +import { walkAst } from "../utils/walk-ast.js"; + +// `as const` is a readonly/literal-narrowing assertion, not a slop type-cast, +// so it is exempt. 
+const isAsConst = (node: AstVisitorNode): boolean => { + const annotation = node.typeAnnotation; + if (typeof annotation !== "object" || annotation === null) return false; + const reference = annotation as { type?: string; typeName?: { name?: string } }; + return reference.type === "TSTypeReference" && reference.typeName?.name === "const"; +}; + +// Flags type assertions (`value as Foo` and `<Foo>value`). A cast overrides the +// compiler's inferred type and is a frequent source of unsound code; `as const` +// is exempt because it narrows rather than overrides. +export const tsNoTypeAssertion: AstCheck = (file): ScanFinding[] => { + const findings: ScanFinding[] = []; + walkAst(file.program, (node) => { + if (node.type !== "TSAsExpression" && node.type !== "TSTypeAssertion") return; + if (isAsConst(node)) return; + findings.push( + makeAstFinding({ + file, + scanner: "typescript", + dimension: "ts-strictness", + ruleId: "ts/no-type-assertion", + severity: "warning", + offset: typeof node.start === "number" ? 
node.start : 0, + message: + "Type assertion overrides the inferred type; prefer a correct type or a runtime guard.", + }), + ); + }); + return findings; +}; diff --git a/packages/benchmark/src/checks/vercel-boolean-prop-soup.ts b/packages/benchmark/src/checks/vercel-boolean-prop-soup.ts new file mode 100644 index 000000000..34d308d9c --- /dev/null +++ b/packages/benchmark/src/checks/vercel-boolean-prop-soup.ts @@ -0,0 +1,72 @@ +import { BOOLEAN_PROP_SOUP_THRESHOLD } from "../constants.js"; +import type { AstCheck, AstVisitorNode, ParsedSourceFile, ScanFinding } from "../types/index.js"; +import { makeAstFinding } from "../utils/make-ast-finding.js"; +import { walkAst } from "../utils/walk-ast.js"; + +interface PropertySignature { + type?: string; + typeAnnotation?: { typeAnnotation?: { type?: string } }; +} + +const countBooleanMembers = (members: unknown): number => { + if (!Array.isArray(members)) return 0; + let count = 0; + for (const member of members) { + const signature = member as PropertySignature; + if ( + signature.type === "TSPropertySignature" && + signature.typeAnnotation?.typeAnnotation?.type === "TSBooleanKeyword" + ) { + count++; + } + } + return count; +}; + +const endsWithProps = (name: unknown): boolean => typeof name === "string" && /Props$/.test(name); + +const makeFinding = ( + file: ParsedSourceFile, + node: AstVisitorNode, + booleanCount: number, +): ScanFinding => + makeAstFinding({ + file, + scanner: "vercel-checks", + dimension: "composition", + ruleId: "vercel/architecture-boolean-prop-soup", + severity: "warning", + offset: typeof node.start === "number" ? node.start : 0, + message: `Props type declares ${booleanCount} boolean flags; prefer composition (variants / compound components) over boolean-prop soup.`, + }); + +// Flags a props type carrying many boolean flags (Vercel +// architecture-avoid-boolean-props). 
Each boolean doubles the component's +// possible states; past the threshold this is the classic boolean-prop soup +// that composition (variants / compound components) should replace. Scoped to +// `*Props` declarations so unrelated config types are not penalized. +export const vercelBooleanPropSoup: AstCheck = (file): ScanFinding[] => { + const findings: ScanFinding[] = []; + walkAst(file.program, (node) => { + if ( + node.type === "TSInterfaceDeclaration" && + endsWithProps((node.id as { name?: string })?.name) + ) { + const booleanCount = countBooleanMembers((node.body as { body?: unknown })?.body); + if (booleanCount >= BOOLEAN_PROP_SOUP_THRESHOLD) + findings.push(makeFinding(file, node, booleanCount)); + } + if ( + node.type === "TSTypeAliasDeclaration" && + endsWithProps((node.id as { name?: string })?.name) && + (node.typeAnnotation as { type?: string })?.type === "TSTypeLiteral" + ) { + const booleanCount = countBooleanMembers( + (node.typeAnnotation as { members?: unknown }).members, + ); + if (booleanCount >= BOOLEAN_PROP_SOUP_THRESHOLD) + findings.push(makeFinding(file, node, booleanCount)); + } + }); + return findings; +}; diff --git a/packages/benchmark/src/checks/vercel-render-prop.ts b/packages/benchmark/src/checks/vercel-render-prop.ts new file mode 100644 index 000000000..b9b3cdf95 --- /dev/null +++ b/packages/benchmark/src/checks/vercel-render-prop.ts @@ -0,0 +1,40 @@ +import type { AstCheck, ScanFinding } from "../types/index.js"; +import { makeAstFinding } from "../utils/make-ast-finding.js"; +import { walkAst } from "../utils/walk-ast.js"; + +const RENDER_PROP_NAME_PATTERN = /^render([A-Z].*)?$/; + +const keyName = (key: unknown): string | undefined => { + const identifier = key as { type?: string; name?: string; value?: string }; + if (identifier?.type === "Identifier") return identifier.name; + if (identifier?.type === "Literal") return identifier.value; + return undefined; +}; + +// Flags function-valued `render` / `renderX` props (Vercel 
+// patterns-children-over-render-props). A render prop threads JSX through a +// callback where `children` (or a compound component) would compose more +// cleanly and stay readable as the component grows. +export const vercelRenderProp: AstCheck = (file): ScanFinding[] => { + const findings: ScanFinding[] = []; + walkAst(file.program, (node) => { + if (node.type !== "TSPropertySignature") return; + const name = keyName(node.key); + if (!name || !RENDER_PROP_NAME_PATTERN.test(name)) return; + const annotationType = (node.typeAnnotation as { typeAnnotation?: { type?: string } }) + ?.typeAnnotation?.type; + if (annotationType !== "TSFunctionType") return; + findings.push( + makeAstFinding({ + file, + scanner: "vercel-checks", + dimension: "composition", + ruleId: "vercel/patterns-render-prop", + severity: "warning", + offset: typeof node.start === "number" ? node.start : 0, + message: `Render prop \`${name}\` passes JSX through a callback; prefer \`children\` / compound components for composition.`, + }), + ); + }); + return findings; +}; diff --git a/packages/benchmark/src/cli.ts b/packages/benchmark/src/cli.ts new file mode 100644 index 000000000..957b7c858 --- /dev/null +++ b/packages/benchmark/src/cli.ts @@ -0,0 +1,86 @@ +import * as fs from "node:fs"; +import * as path from "node:path"; +import { runSlopVerifier } from "./run-slop-verifier.js"; +import type { SlopReport } from "./types/index.js"; +import { parseCliArgs } from "./utils/parse-cli-args.js"; + +const USAGE = `slop-verify — score the React/TypeScript slop in a graded diff + +Usage: + slop-verify --root <dir> --base <ref> [options] + +Options: + --root <dir> Project the agent edited (default: cwd) + --base <ref> Git ref the agent started from (default: HEAD) + --doctor-bin <path> React Doctor CLI to invoke (default: react-doctor on PATH) + --profile <path> Scoring-profile JSON (default: built-in profile) + --functional-pass <b> Functional gate outcome: true|false (default: unknown) + --out <path> 
Write the full JSON SlopReport here + --json Print the JSON SlopReport to stdout (instead of a summary) + --fail-under <score> Exit non-zero if slopScore < <score> (default: never) + --quiet Suppress the human-readable summary`; + +const asBoolean = (value: string | boolean | undefined): boolean | null => { + if (value === undefined) return null; + if (value === true || value === "true" || value === "1") return true; + if (value === false || value === "false" || value === "0") return false; + return null; +}; + +const asString = (value: string | boolean | undefined): string | undefined => + typeof value === "string" ? value : undefined; + +const renderSummary = (report: SlopReport): string => { + const lines = [ + `SlopBench score: ${report.slopScore.toFixed(1)} / 100 (scoring ${report.scoringVersion})`, + `Changed files: ${report.diffStats.changedFileCount} Added lines: ${report.diffStats.addedLineCount} Violations: ${report.violations.length}`, + "Dimensions:", + ...report.dimensions.map( + (dimension) => + ` ${dimension.dimension.padEnd(20)} ${dimension.score.toFixed(1).padStart(6)} (${dimension.violationCount} findings)`, + ), + ]; + if (report.functionalPass !== null) { + lines.push( + `Functional gate: ${report.functionalPass ? "PASS" : "FAIL"} Reward: ${report.reward?.toFixed(3)}`, + ); + } + for (const error of report.scannerErrors) lines.push(`! scanner issue: ${error}`); + return lines.join("\n"); +}; + +// CLI entry. Runs the verifier and reports; only exits non-zero when an +// explicit `--fail-under` gate is set and missed, so a normal grading run +// always succeeds and lets `test.sh` own the reward. +export const runCli = (argv: string[]): void => { + const args = parseCliArgs(argv); + if (args.help || args.h) { + process.stdout.write(`${USAGE}\n`); + return; + } + + const report = runSlopVerifier({ + rootDirectory: path.resolve(asString(args.root) ?? process.cwd()), + baseRef: asString(args.base) ?? 
"HEAD", + reactDoctorBin: asString(args["doctor-bin"]), + profilePath: asString(args.profile), + functionalPass: asBoolean(args["functional-pass"]), + }); + + const outPath = asString(args.out); + if (outPath) { + fs.mkdirSync(path.dirname(path.resolve(outPath)), { recursive: true }); + fs.writeFileSync(outPath, `${JSON.stringify(report, null, 2)}\n`); + } + + if (args.json) { + process.stdout.write(`${JSON.stringify(report)}\n`); + } else if (!args.quiet) { + process.stdout.write(`${renderSummary(report)}\n`); + } + + const failUnder = asString(args["fail-under"]); + if (failUnder !== undefined && report.slopScore < Number.parseFloat(failUnder)) { + process.exitCode = 1; + } +}; diff --git a/packages/benchmark/src/constants.ts b/packages/benchmark/src/constants.ts new file mode 100644 index 000000000..981ce9da2 --- /dev/null +++ b/packages/benchmark/src/constants.ts @@ -0,0 +1,98 @@ +import type { ScoringProfile, SlopDimension } from "./types/index.js"; + +// Bump when the scoring formula or the built-in profile changes in a way that +// makes scores incomparable across versions. Stamped into every SlopReport. +export const SCORING_VERSION = "1.0.0"; + +export const SCORE_MAX = 100; +export const SCORE_MIN = 0; + +// Default CLI to invoke when a task does not pin one (resolved on PATH). +export const DEFAULT_REACT_DOCTOR_BIN = "react-doctor"; + +// React Doctor emits five user-facing categories; each maps to exactly one +// SlopBench dimension so a React Doctor finding lands in a single bucket. +export const REACT_DOCTOR_CATEGORY_TO_DIMENSION: Record<string, SlopDimension> = { + Security: "react-correctness", + Bugs: "react-correctness", + Performance: "react-performance", + Accessibility: "accessibility", + Maintainability: "maintainability", +}; + +// Where a React Doctor diagnostic falls when its category string is +// unrecognized (e.g. a newly added bucket): treated as a correctness signal. 
+export const REACT_DOCTOR_FALLBACK_DIMENSION: SlopDimension = "react-correctness"; + +// Specific React Doctor rules whose intent is finer than their category +// bucket. React Doctor files bundle- and waterfall-rules under the broad +// "Performance" category; routing those exact rule ids into the dedicated +// `bundle` / `async-waterfall` dimensions lets SlopBench report them +// separately without us re-implementing detection (we DEFER to React Doctor — +// see `rule-overlap.md`). Checked before the category mapping. +export const REACT_DOCTOR_RULE_TO_DIMENSION: Record<string, SlopDimension> = { + "react-doctor/no-barrel-import": "bundle", + "react-doctor/no-full-lodash-import": "bundle", + "react-doctor/no-moment": "bundle", + "react-doctor/no-undeferred-third-party": "bundle", + "react-doctor/prefer-dynamic-import": "bundle", + "react-doctor/no-dynamic-import-path": "bundle", + "react-doctor/use-lazy-motion": "bundle", + "react-doctor/server-sequential-independent-await": "async-waterfall", + "react-doctor/tanstack-start-loader-parallel-fetch": "async-waterfall", +}; + +// Threshold for the boolean-prop-soup composition check: a props type with at +// least this many boolean members is flagged (Vercel architecture-avoid- +// boolean-props). Below it, a few flags are normal and not slop. +export const BOOLEAN_PROP_SOUP_THRESHOLD = 4; + +// Nesting depth the deslop nested-ternary check targets: a ternary whose branch +// is itself a ternary. `deslop-nested-ternary.ts` hard-codes this depth today. +export const NESTED_TERNARY_DEPTH_THRESHOLD = 2; + +// The built-in fallback profile and single source of truth for default +// weights. `scoring-profiles/default.json` mirrors this object; a drift test +// keeps them identical. Tasks may override via `slop-verify --profile <path>`.
+export const DEFAULT_SCORING_PROFILE: ScoringProfile = { + version: SCORING_VERSION, + severityWeights: { + error: 5, + warning: 2, + }, + categoryMultipliers: { + Security: 3, + Bugs: 2, + Performance: 1.5, + Accessibility: 1.2, + Maintainability: 1, + }, + ruleImpactMultipliers: { + // TypeScript slop tiers — escape hatches that silence the compiler hurt most. + "ts/ban-ts-comment": 2.5, + "ts/no-explicit-any": 2, + "ts/no-non-null-assertion": 1.5, + "ts/no-type-assertion": 1.5, + // Composition gap-fillers (React Doctor does not count these). + "vercel/architecture-boolean-prop-soup": 1.8, + "vercel/patterns-render-prop": 1.3, + // deslop maintainability heuristic. + "deslop/nested-ternary": 1.2, + }, + dimensionWeights: { + "react-correctness": 1.5, + "ts-strictness": 1.5, + "react-performance": 1.2, + composition: 1, + "async-waterfall": 1, + bundle: 1, + maintainability: 1, + accessibility: 0.8, + }, + diffSizeNormalizerLines: 40, + minNormalizerLines: 25, +}; + +// Default multiplier for a finding whose category / rule is not in the +// profile's multiplier tables. 
+export const DEFAULT_WEIGHT_MULTIPLIER = 1; diff --git a/packages/benchmark/src/index.ts b/packages/benchmark/src/index.ts new file mode 100644 index 000000000..e47449901 --- /dev/null +++ b/packages/benchmark/src/index.ts @@ -0,0 +1,17 @@ +export { runSlopVerifier } from "./run-slop-verifier.js"; +export type { SlopVerifierOptions } from "./run-slop-verifier.js"; +export { runCli } from "./cli.js"; +export { computeSlopScore } from "./scoring/slop-score.js"; +export { loadScoringProfile } from "./scoring/load-scoring-profile.js"; +export { DEFAULT_SCORING_PROFILE, SCORING_VERSION } from "./constants.js"; +export type { + ScanFinding, + ScannerContext, + ScannerName, + ScoringProfile, + SlopDimension, + SlopDimensionScore, + SlopDiffStats, + SlopReport, + SlopViolation, +} from "./types/index.js"; diff --git a/packages/benchmark/src/run-slop-verifier.ts b/packages/benchmark/src/run-slop-verifier.ts new file mode 100644 index 000000000..3cfbca452 --- /dev/null +++ b/packages/benchmark/src/run-slop-verifier.ts @@ -0,0 +1,70 @@ +import { DEFAULT_REACT_DOCTOR_BIN } from "./constants.js"; +import { runAstChecks } from "./scanners/run-ast-checks.js"; +import { runReactDoctor } from "./scanners/run-react-doctor.js"; +import { loadScoringProfile } from "./scoring/load-scoring-profile.js"; +import { computeSlopScore } from "./scoring/slop-score.js"; +import type { ScannerContext, SlopReport } from "./types/index.js"; +import { collectDiff } from "./utils/collect-diff.js"; + +export interface SlopVerifierOptions { + // Absolute path to the project the agent edited. + rootDirectory: string; + // Git ref the agent started from; the diff is computed against it. + baseRef: string; + // React Doctor CLI to invoke; defaults to `react-doctor` on PATH. + reactDoctorBin?: string; + // Optional scoring-profile JSON path; defaults to the built-in profile. + profilePath?: string; + // The functional-test outcome, when known, so the report can carry the + // composite reward. 
`null`/omitted ⇒ quality-only run. + functionalPass?: boolean | null; +} + +const computeReward = (functionalPass: boolean | null, slopScore: number): number | null => { + if (functionalPass === null) return null; + return functionalPass ? slopScore / 100 : 0; +}; + +// Run the full slop verification pipeline over a graded diff and assemble the +// SlopReport: collect the diff, run React Doctor (offline) plus the AST checks, +// score deterministically, and combine with the functional gate. Pure of any +// process exit — the caller (CLI / test.sh) decides how to act on the report. +export const runSlopVerifier = (options: SlopVerifierOptions): SlopReport => { + const profile = loadScoringProfile(options.profilePath); + const diff = collectDiff(options.rootDirectory, options.baseRef); + const scannerErrors: string[] = []; + if (diff.error) scannerErrors.push(`diff: ${diff.error}`); + + const context: ScannerContext = { + rootDirectory: options.rootDirectory, + changedFiles: diff.changedFiles, + baseRef: options.baseRef, + addedLineCount: diff.addedLineCount, + reactDoctorBin: options.reactDoctorBin ?? DEFAULT_REACT_DOCTOR_BIN, + }; + + const reactDoctor = runReactDoctor(context); + if (reactDoctor.error) scannerErrors.push(`react-doctor: ${reactDoctor.error}`); + const astFindings = runAstChecks(context); + + const findings = [...reactDoctor.findings, ...astFindings]; + const scored = computeSlopScore(findings, diff.addedLineCount, profile); + const functionalPass = options.functionalPass ?? 
null; + + return { + scoringVersion: profile.version, + doctorVersion: reactDoctor.doctorVersion, + generatedAt: new Date().toISOString(), + diffStats: { + changedFileCount: diff.changedFiles.length, + addedLineCount: diff.addedLineCount, + normalizerLines: scored.normalizerLines, + }, + violations: scored.violations, + dimensions: scored.dimensions, + slopScore: scored.slopScore, + scannerErrors, + functionalPass, + reward: computeReward(functionalPass, scored.slopScore), + }; +}; diff --git a/packages/benchmark/src/scanners/run-ast-checks.ts b/packages/benchmark/src/scanners/run-ast-checks.ts new file mode 100644 index 000000000..ff6dff0bc --- /dev/null +++ b/packages/benchmark/src/scanners/run-ast-checks.ts @@ -0,0 +1,19 @@ +import { AST_CHECKS } from "../checks/index.js"; +import type { ScanFinding, ScannerContext } from "../types/index.js"; +import { parseSourceFile } from "../utils/parse-source-file.js"; + +// Parse each changed source file once and run every AST check over it. Covers +// the TypeScript-strictness, composition, and deslop dimensions that React +// Doctor does not. Unparsable / non-source files are silently skipped — a file +// the parser rejects cannot be fairly scored for AST-level slop. 
+export const runAstChecks = (context: ScannerContext): ScanFinding[] => { + const findings: ScanFinding[] = []; + for (const filePath of context.changedFiles) { + const parsed = parseSourceFile(context.rootDirectory, filePath); + if (!parsed) continue; + for (const check of AST_CHECKS) { + findings.push(...check(parsed)); + } + } + return findings; +}; diff --git a/packages/benchmark/src/scanners/run-react-doctor.ts b/packages/benchmark/src/scanners/run-react-doctor.ts new file mode 100644 index 000000000..07adc55af --- /dev/null +++ b/packages/benchmark/src/scanners/run-react-doctor.ts @@ -0,0 +1,98 @@ +import type { JsonReport } from "@react-doctor/core"; +import { + REACT_DOCTOR_CATEGORY_TO_DIMENSION, + REACT_DOCTOR_FALLBACK_DIMENSION, + REACT_DOCTOR_RULE_TO_DIMENSION, +} from "../constants.js"; +import type { ScanFinding, ScannerContext, SlopDimension } from "../types/index.js"; +import { resolveBinInvocation } from "../utils/resolve-bin-invocation.js"; +import { runCommand } from "../utils/run-command.js"; + +export interface ReactDoctorScanResult { + findings: ScanFinding[]; + // The CLI's reported version, for the SlopReport provenance field. + doctorVersion: string | null; + // Set when the CLI could not be run or its output was unparseable. A failed + // React Doctor scan must not silently score as "clean", so the orchestrator + // surfaces this rather than treating zero findings as success. + error: string | null; +} + +// React Doctor exits non-zero whenever it finds issues, so a clean JSON parse — +// not the exit code — is the success signal. 
+const parseReport = (stdout: string): JsonReport | null => { + const trimmed = stdout.trim(); + if (!trimmed) return null; + try { + const parsed: unknown = JSON.parse(trimmed); + if (parsed && typeof parsed === "object" && "diagnostics" in parsed) { + return parsed as JsonReport; + } + return null; + } catch { + return null; + } +}; + +const resolveDimension = (ruleId: string, category: string): SlopDimension => + REACT_DOCTOR_RULE_TO_DIMENSION[ruleId] ?? + REACT_DOCTOR_CATEGORY_TO_DIMENSION[category] ?? + REACT_DOCTOR_FALLBACK_DIMENSION; + +const toFinding = (diagnostic: JsonReport["diagnostics"][number]): ScanFinding => { + const ruleId = `${diagnostic.plugin}/${diagnostic.rule}`; + return { + scanner: "react-doctor", + dimension: resolveDimension(ruleId, diagnostic.category), + ruleId, + severity: diagnostic.severity, + filePath: diagnostic.filePath, + line: diagnostic.line, + message: diagnostic.message, + category: diagnostic.category, + }; +}; + +// Run React Doctor over the whole project (offline, no remote score), then keep +// only diagnostics in files the agent changed. Scoping by changed file, rather +// than React Doctor's own `--diff` git semantics, keeps grading deterministic +// and ensures slop in files the agent never touched is not charged to it; +// pre-existing findings inside a file the agent edited still count. +// +// Dead-code analysis is disabled (`--no-dead-code`): whole-file reachability +// needs a real application entry point, which a diff-scoped grade of an +// isolated change does not reliably provide, so it would false-fire +// "unused file" on legitimately clean new code. The deslop/maintainability +// signal is still covered by the AST `deslop/nested-ternary` check and React +// Doctor's other Maintainability rules.
+export const runReactDoctor = (context: ScannerContext): ReactDoctorScanResult => { + const changed = new Set(context.changedFiles); + const { command, prefixArgs } = resolveBinInvocation(context.reactDoctorBin); + const result = runCommand( + command, + [...prefixArgs, context.rootDirectory, "--json", "--no-score", "--no-dead-code"], + { cwd: context.rootDirectory }, + ); + + if (result.spawnFailed) { + return { + findings: [], + doctorVersion: null, + error: `react-doctor failed to start: ${result.stderr}`, + }; + } + + const report = parseReport(result.stdout); + if (!report) { + return { + findings: [], + doctorVersion: null, + error: `react-doctor produced no parseable JSON report (exit ${result.exitCode})`, + }; + } + + const findings = report.diagnostics + .filter((diagnostic) => changed.has(diagnostic.filePath)) + .map(toFinding); + return { findings, doctorVersion: report.version ?? null, error: null }; +}; diff --git a/packages/benchmark/src/scoring/compute-violation-weight.ts b/packages/benchmark/src/scoring/compute-violation-weight.ts new file mode 100644 index 000000000..6a327a939 --- /dev/null +++ b/packages/benchmark/src/scoring/compute-violation-weight.ts @@ -0,0 +1,30 @@ +import { DEFAULT_WEIGHT_MULTIPLIER } from "../constants.js"; +import type { ScanFinding, ScoringProfile, SlopViolation } from "../types/index.js"; + +// Turn a raw scanner finding into a weighted violation. Weight is the single +// place severity, React Doctor category, and per-rule Vercel/TS impact tiers +// combine, so every scanner is scored on the same scale: +// weight = severityBase × categoryMultiplier × ruleImpactMultiplier +export const computeViolationWeight = ( + finding: ScanFinding, + profile: ScoringProfile, +): SlopViolation => { + const severityBase = profile.severityWeights[finding.severity]; + const categoryMultiplier = + finding.category === undefined + ? DEFAULT_WEIGHT_MULTIPLIER + : (profile.categoryMultipliers[finding.category] ?? 
DEFAULT_WEIGHT_MULTIPLIER); + const ruleImpactMultiplier = + profile.ruleImpactMultipliers[finding.ruleId] ?? DEFAULT_WEIGHT_MULTIPLIER; + + return { + scanner: finding.scanner, + dimension: finding.dimension, + ruleId: finding.ruleId, + severity: finding.severity, + weight: severityBase * categoryMultiplier * ruleImpactMultiplier, + filePath: finding.filePath, + line: finding.line, + message: finding.message, + }; +}; diff --git a/packages/benchmark/src/scoring/load-scoring-profile.ts b/packages/benchmark/src/scoring/load-scoring-profile.ts new file mode 100644 index 000000000..2f3d275c0 --- /dev/null +++ b/packages/benchmark/src/scoring/load-scoring-profile.ts @@ -0,0 +1,24 @@ +import * as fs from "node:fs"; +import { DEFAULT_SCORING_PROFILE } from "../constants.js"; +import type { ScoringProfile } from "../types/index.js"; + +// A loaded profile is trusted shape-wise (it is repo-controlled config, not +// agent input), but we validate the few fields the scorer divides by so a +// malformed override fails loudly instead of producing NaN scores (`!(x > 0)` also rejects missing or NaN values, which `x <= 0` lets through). +const assertUsableProfile = (profile: ScoringProfile, source: string): void => { + if (!(profile.diffSizeNormalizerLines > 0) || !(profile.minNormalizerLines > 0)) { + throw new Error(`scoring profile ${source} must use positive normalizer line counts`); + } + if (!profile.severityWeights || !profile.dimensionWeights) { + throw new Error(`scoring profile ${source} is missing severity or dimension weights`); + } +}; + +// Resolve the scoring profile: the built-in default, or a JSON override when a +// task pins one via `--profile <path>`.
+export const loadScoringProfile = (profilePath?: string): ScoringProfile => { + if (!profilePath) return DEFAULT_SCORING_PROFILE; + const parsed: ScoringProfile = JSON.parse(fs.readFileSync(profilePath, "utf8")); + assertUsableProfile(parsed, profilePath); + return parsed; +}; diff --git a/packages/benchmark/src/scoring/slop-score.ts b/packages/benchmark/src/scoring/slop-score.ts new file mode 100644 index 000000000..e2afc9b0d --- /dev/null +++ b/packages/benchmark/src/scoring/slop-score.ts @@ -0,0 +1,80 @@ +import { SCORE_MAX, SCORE_MIN } from "../constants.js"; +import type { + ScanFinding, + ScoringProfile, + SlopDimension, + SlopDimensionScore, + SlopViolation, +} from "../types/index.js"; +import { clamp } from "../utils/clamp.js"; +import { computeViolationWeight } from "./compute-violation-weight.js"; + +export interface SlopScoreResult { + violations: SlopViolation[]; + dimensions: SlopDimensionScore[]; + slopScore: number; + normalizerLines: number; +} + +// Divisor that makes penalties "per reference unit of code" so a large +// legitimate feature is not punished as hard as the same violations in a tiny +// diff. Floored by `minNormalizerLines` so a one-line change can't divide by a +// near-zero size and crater the score on a single finding. 
+const computeNormalizer = (addedLineCount: number, profile: ScoringProfile): number => { + const effectiveLines = Math.max(addedLineCount, profile.minNormalizerLines); + return effectiveLines / profile.diffSizeNormalizerLines; +}; + +const dimensionScoreFrom = ( + dimension: SlopDimension, + dimensionViolations: SlopViolation[], + normalizer: number, +): SlopDimensionScore => { + const rawPenalty = dimensionViolations.reduce((total, violation) => total + violation.weight, 0); + const normalizedPenalty = rawPenalty / normalizer; + return { + dimension, + score: clamp(SCORE_MAX - normalizedPenalty, SCORE_MIN, SCORE_MAX), + violationCount: dimensionViolations.length, + weightedPenalty: normalizedPenalty, + }; +}; + +// Score a set of findings into per-dimension scores and one composite. A +// dimension with no findings scores a full 100 (you cannot be penalized for +// slop you had no opportunity to introduce); the composite is the +// profile-weighted mean across every dimension the profile defines. +export const computeSlopScore = ( + findings: ScanFinding[], + addedLineCount: number, + profile: ScoringProfile, +): SlopScoreResult => { + const violations = findings.map((finding) => computeViolationWeight(finding, profile)); + const normalizer = computeNormalizer(addedLineCount, profile); + + const dimensions = Object.keys(profile.dimensionWeights).map( + (dimensionKey): SlopDimensionScore => { + const dimension = dimensionKey as SlopDimension; + const dimensionViolations = violations.filter( + (violation) => violation.dimension === dimension, + ); + return dimensionScoreFrom(dimension, dimensionViolations, normalizer); + }, + ); + + let weightedScoreTotal = 0; + let weightTotal = 0; + for (const dimensionScore of dimensions) { + const dimensionWeight = profile.dimensionWeights[dimensionScore.dimension]; + weightedScoreTotal += dimensionScore.score * dimensionWeight; + weightTotal += dimensionWeight; + } + const slopScore = weightTotal === 0 ? 
SCORE_MAX : weightedScoreTotal / weightTotal; + + return { + violations, + dimensions, + slopScore: clamp(slopScore, SCORE_MIN, SCORE_MAX), + normalizerLines: normalizer * profile.diffSizeNormalizerLines, + }; +}; diff --git a/packages/benchmark/src/types/index.ts b/packages/benchmark/src/types/index.ts new file mode 100644 index 000000000..52a29d1a0 --- /dev/null +++ b/packages/benchmark/src/types/index.ts @@ -0,0 +1,12 @@ +export type { ScannerName, SlopDimension } from "./slop-dimension.js"; +export type { SlopViolation } from "./slop-violation.js"; +export type { ScanFinding } from "./scan-finding.js"; +export type { ScoringProfile } from "./scoring-profile.js"; +export type { SlopDiffStats, SlopDimensionScore, SlopReport } from "./slop-report.js"; +export type { ScannerContext } from "./scanner-context.js"; +export type { + AstCheck, + AstVisitorNode, + ParsedSourceFile, + SourceComment, +} from "./parsed-source-file.js"; diff --git a/packages/benchmark/src/types/parsed-source-file.ts b/packages/benchmark/src/types/parsed-source-file.ts new file mode 100644 index 000000000..f70a35681 --- /dev/null +++ b/packages/benchmark/src/types/parsed-source-file.ts @@ -0,0 +1,34 @@ +import type { ScanFinding } from "./scan-finding.js"; + +// A comment as reported by oxc-parser (byte-offset spans, no line info). +export interface SourceComment { + type: "Line" | "Block"; + value: string; + start: number; + end: number; +} + +// One changed source file, parsed once and shared by every AST check. `program` +// is the oxc ESTree `Program` node; it is intentionally untyped (`unknown`) +// because the checks walk it structurally by `type` rather than against a +// committed AST type surface. +export interface ParsedSourceFile { + // Repo-relative path, matching React Doctor's `filePath` convention. 
+ filePath: string; + sourceText: string; + program: unknown; + comments: SourceComment[]; +} + +// A structurally-typed AST node: anything with a string `type`, plus arbitrary +// child fields the checks read by name. The oxc AST is walked this way rather +// than against a committed, versioned AST type surface. +export interface AstVisitorNode { + type: string; + [key: string]: unknown; +} + +// An AST check: a pure function from one parsed file to its findings. Lives in +// `src/checks/<kebab-name>.ts`, one check per file, and is registered in +// `checks/index.ts`. +export type AstCheck = (file: ParsedSourceFile) => ScanFinding[]; diff --git a/packages/benchmark/src/types/scan-finding.ts b/packages/benchmark/src/types/scan-finding.ts new file mode 100644 index 000000000..261028777 --- /dev/null +++ b/packages/benchmark/src/types/scan-finding.ts @@ -0,0 +1,18 @@ +import type { ScannerName, SlopDimension } from "./slop-dimension.js"; + +// What a scanner emits before scoring. The orchestrator converts every +// `ScanFinding` into a weighted `SlopViolation` in one place +// (`scoring/compute-violation-weight.ts`), so scanners stay weight-agnostic. +export interface ScanFinding { + scanner: ScannerName; + dimension: SlopDimension; + ruleId: string; + severity: "error" | "warning"; + filePath: string; + line: number; + message: string; + // React Doctor's user-facing category, when the finding came from it. Used + // only to pick the profile's `categoryMultipliers` entry; absent for the + // custom scanners, which rely on `ruleImpactMultipliers` instead. + category?: string; +} diff --git a/packages/benchmark/src/types/scanner-context.ts b/packages/benchmark/src/types/scanner-context.ts new file mode 100644 index 000000000..d0e1d9f13 --- /dev/null +++ b/packages/benchmark/src/types/scanner-context.ts @@ -0,0 +1,16 @@ +// Shared, read-only input every scanner receives. Built once by the +// orchestrator so each scanner sees the same view of the graded diff. 
+export interface ScannerContext { + // Absolute path to the project under test (the repo the agent edited). + rootDirectory: string; + // Repo-relative paths of the files the agent changed, already filtered to + // gradable source (tests, fixtures, generated, and lockfiles removed). + changedFiles: string[]; + // Base git ref the agent started from, used for diff-scoped scans. + baseRef: string; + // Total added lines across `changedFiles`, the basis for size-normalization. + addedLineCount: number; + // React Doctor CLI to invoke: a path to a pinned entry (so the sandbox image + // can pin a binary) or a bare command name; the default `react-doctor` is resolved on PATH. + reactDoctorBin: string; +} diff --git a/packages/benchmark/src/types/scoring-profile.ts b/packages/benchmark/src/types/scoring-profile.ts new file mode 100644 index 000000000..77bb28088 --- /dev/null +++ b/packages/benchmark/src/types/scoring-profile.ts @@ -0,0 +1,29 @@ +import type { SlopDimension } from "./slop-dimension.js"; + +// A versioned, fully-declarative weight table. Every number that influences a +// score lives here (loaded from `scoring-profiles/<name>.json`) so a score is +// reproducible from its `version` alone. No weights are hard-coded in the +// scorer — `constants.ts` only carries the built-in fallback profile. +export interface ScoringProfile { + version: string; + // Base penalty per finding, before category/impact multipliers. + severityWeights: { + error: number; + warning: number; + }; + // React Doctor's five user-facing categories → penalty multiplier. + // Keyed by the exact category string React Doctor emits + // (Security, Bugs, Performance, Accessibility, Maintainability). + categoryMultipliers: Record<string, number>; + // Optional per-rule multiplier (e.g. derived from a Vercel rule's CRITICAL + // / HIGH impact tier). Keyed by fully-qualified `ruleId`. Missing ⇒ 1. + ruleImpactMultipliers: Record<string, number>; + // How much each dimension counts toward the composite slop score.
Need not + // sum to 1: the scorer divides by the total weight of every dimension the profile defines. + dimensionWeights: Record<SlopDimension, number>; + // Penalty is divided by `max(addedLineCount, minNormalizerLines) / + // diffSizeNormalizerLines`, so a large legitimate feature is not punished as + // hard as the same violation count in a tiny diff. + diffSizeNormalizerLines: number; + minNormalizerLines: number; +} diff --git a/packages/benchmark/src/types/slop-dimension.ts b/packages/benchmark/src/types/slop-dimension.ts new file mode 100644 index 000000000..a3891b51f --- /dev/null +++ b/packages/benchmark/src/types/slop-dimension.ts @@ -0,0 +1,17 @@ +// The eight slop dimensions SlopBench reports on. Each violation maps to +// exactly one dimension so penalties never double-count across scanners. +// Four are owned by React Doctor (mapped from its five user-facing +// categories); `bundle` and `async-waterfall` also take specific React Doctor rules rerouted there, and the rest come from SlopBench's own scanners (see `rule-overlap.md`). +export type SlopDimension = + | "react-correctness" + | "react-performance" + | "accessibility" + | "maintainability" + | "ts-strictness" + | "composition" + | "async-waterfall" + | "bundle"; + +// The scanner that produced a violation. Used for provenance in the report +// and to let reviewers trace a penalty back to its source tool. +export type ScannerName = "react-doctor" | "typescript" | "vercel-checks" | "deslop-heuristics"; diff --git a/packages/benchmark/src/types/slop-report.ts b/packages/benchmark/src/types/slop-report.ts new file mode 100644 index 000000000..ebba22753 --- /dev/null +++ b/packages/benchmark/src/types/slop-report.ts @@ -0,0 +1,44 @@ +import type { SlopDimension } from "./slop-dimension.js"; +import type { SlopViolation } from "./slop-violation.js"; + +// Per-dimension rollup. `score` is 0–100 (higher = cleaner); `weightedPenalty` +// is the size-normalized penalty that drove it down from 100.
+export interface SlopDimensionScore { + dimension: SlopDimension; + score: number; + violationCount: number; + weightedPenalty: number; +} + +// Size of the graded diff, used to normalize penalties. Tests and generated +// files are excluded upstream so they neither earn nor dodge penalties. +export interface SlopDiffStats { + changedFileCount: number; + addedLineCount: number; + // The effective divisor the scorer used (after clamping to the profile's + // min), recorded for auditability. + normalizerLines: number; +} + +// The machine-readable grading artifact every task emits. Consumed by the +// runner aggregation script and (v2) the leaderboard. +export interface SlopReport { + scoringVersion: string; + // React Doctor CLI version that produced the diagnostics, when detectable. + doctorVersion: string | null; + generatedAt: string; + diffStats: SlopDiffStats; + violations: SlopViolation[]; + dimensions: SlopDimensionScore[]; + // Composite 0–100 cleanliness score (higher = less slop). + slopScore: number; + // Non-fatal scanner problems (e.g. React Doctor failed to start). Empty on a + // clean run; a populated array means some dimensions may be under-reported, + // which reviewers and the aggregator can surface. + scannerErrors: string[]; + // Filled by the task's `test.sh` once the functional gate is known; `null` + // when the verifier runs standalone (quality-only). + functionalPass: boolean | null; + // `functionalPass ? slopScore / 100 : 0`, or `null` when the gate is unknown. + reward: number | null; +} diff --git a/packages/benchmark/src/types/slop-violation.ts b/packages/benchmark/src/types/slop-violation.ts new file mode 100644 index 000000000..a731023dd --- /dev/null +++ b/packages/benchmark/src/types/slop-violation.ts @@ -0,0 +1,19 @@ +import type { ScannerName, SlopDimension } from "./slop-dimension.js"; + +// A single penalized finding. Every scanner normalizes its native output into +// this shape so the scorer can treat all slop uniformly. 
+export interface SlopViolation { + scanner: ScannerName; + dimension: SlopDimension; + // Fully-qualified rule id, e.g. `react-doctor/no-nested-component-definition` + // or `ts/no-explicit-any`. Namespaced by scanner to stay collision-free. + ruleId: string; + severity: "error" | "warning"; + // The penalty this violation contributes before size-normalization. + weight: number; + // Repo-relative path. Empty string for project-wide findings (e.g. tsc + // config errors) that carry no single source location. + filePath: string; + line: number; + message: string; +} diff --git a/packages/benchmark/src/utils/clamp.ts b/packages/benchmark/src/utils/clamp.ts new file mode 100644 index 000000000..2a76a40c8 --- /dev/null +++ b/packages/benchmark/src/utils/clamp.ts @@ -0,0 +1,3 @@ +// Clamp a number into an inclusive range. +export const clamp = (value: number, minimum: number, maximum: number): number => + Math.min(Math.max(value, minimum), maximum); diff --git a/packages/benchmark/src/utils/collect-diff.ts b/packages/benchmark/src/utils/collect-diff.ts new file mode 100644 index 000000000..3b74b6efb --- /dev/null +++ b/packages/benchmark/src/utils/collect-diff.ts @@ -0,0 +1,49 @@ +import { isGradableFile } from "./is-gradable-file.js"; +import { runCommand } from "./run-command.js"; + +export interface DiffSummary { + changedFiles: string[]; + addedLineCount: number; + // Set when git could not produce a diff (not a repo, bad base ref). The + // caller decides whether to fall back to scanning the whole tree. + error: string | null; +} + +// Parse `git diff --numstat` output ("added<TAB>deleted<TAB>path") into the set +// of gradable changed files and their total added lines. Binary files report +// "-" for counts and contribute zero added lines. 
+const parseNumstat = (numstat: string): DiffSummary => { + const changedFiles: string[] = []; + let addedLineCount = 0; + for (const line of numstat.split("\n")) { + const trimmed = line.trim(); + if (!trimmed) continue; + const [addedRaw, , ...pathParts] = trimmed.split("\t"); + const filePath = pathParts.join("\t"); + if (!filePath || !isGradableFile(filePath)) continue; + changedFiles.push(filePath); + const added = Number.parseInt(addedRaw ?? "", 10); + if (Number.isFinite(added)) addedLineCount += added; + } + return { changedFiles, addedLineCount, error: null }; +}; + +// Compute the agent's graded diff against `baseRef`. Marks untracked files with +// intent-to-add first (`git add -A -N`) so brand-new files the agent created +// show up in `git diff` exactly like edits to tracked files. `--no-renames` keeps numstat paths literal (no `old => new` form), so renamed files are still graded. +export const collectDiff = (rootDirectory: string, baseRef: string): DiffSummary => { + runCommand("git", ["-C", rootDirectory, "add", "-A", "-N"], { cwd: rootDirectory }); + const result = runCommand( + "git", + ["-C", rootDirectory, "diff", "--numstat", "--no-color", "--no-renames", baseRef], + { cwd: rootDirectory }, + ); + if (result.spawnFailed || result.exitCode !== 0) { + return { + changedFiles: [], + addedLineCount: 0, + error: result.stderr.trim() || "git diff failed", + }; + } + return parseNumstat(result.stdout); +}; diff --git a/packages/benchmark/src/utils/is-gradable-file.ts b/packages/benchmark/src/utils/is-gradable-file.ts new file mode 100644 index 000000000..0e337c54d --- /dev/null +++ b/packages/benchmark/src/utils/is-gradable-file.ts @@ -0,0 +1,25 @@ +// Paths excluded from slop grading. Tests, stories, fixtures, generated output, +// and dependency/build directories are neither rewarded nor penalized: an agent +// should not earn credit for writing tests, nor be charged for slop in code it +// did not author (vendored / generated). The agent's *product* code is graded.
+const NON_GRADABLE_PATTERNS: readonly RegExp[] = [ + /(^|\/)node_modules\//, + /(^|\/)(dist|build|out|coverage|\.next|\.turbo)\//, + /(^|\/)__tests__\//, + /(^|\/)tests?\//, + /(^|\/)__mocks__\//, + /(^|\/)__fixtures__\//, + /(^|\/)fixtures?\//, + /\.(test|spec|stories)\.[mc]?[jt]sx?$/, + /\.d\.[mc]?ts$/, + /(^|\/)[^/]*\.(lock|lockb)$/, + /(^|\/)(pnpm-lock\.yaml|package-lock\.json|yarn\.lock|bun\.lockb?)$/, +]; + +// Only these extensions carry React/TS slop the scanners understand. +const GRADABLE_EXTENSION_PATTERN = /\.[mc]?[jt]sx?$/; + +export const isGradableFile = (filePath: string): boolean => { + if (!GRADABLE_EXTENSION_PATTERN.test(filePath)) return false; + return !NON_GRADABLE_PATTERNS.some((pattern) => pattern.test(filePath)); +}; diff --git a/packages/benchmark/src/utils/make-ast-finding.ts b/packages/benchmark/src/utils/make-ast-finding.ts new file mode 100644 index 000000000..50982cbf4 --- /dev/null +++ b/packages/benchmark/src/utils/make-ast-finding.ts @@ -0,0 +1,26 @@ +import type { ParsedSourceFile, ScanFinding, ScannerName, SlopDimension } from "../types/index.js"; +import { offsetToLine } from "./offset-to-line.js"; + +export interface MakeAstFindingInput { + file: ParsedSourceFile; + scanner: ScannerName; + dimension: SlopDimension; + ruleId: string; + severity: "error" | "warning"; + // Byte offset of the offending node (oxc `node.start`); converted to a line. + offset: number; + message: string; +} + +// Build a `ScanFinding` from an AST node offset, resolving the 1-based line +// from the file's source text. Keeps the individual checks free of +// line-bookkeeping boilerplate. 
+export const makeAstFinding = (input: MakeAstFindingInput): ScanFinding => ({ + scanner: input.scanner, + dimension: input.dimension, + ruleId: input.ruleId, + severity: input.severity, + filePath: input.file.filePath, + line: offsetToLine(input.file.sourceText, input.offset), + message: input.message, +}); diff --git a/packages/benchmark/src/utils/offset-to-line.ts b/packages/benchmark/src/utils/offset-to-line.ts new file mode 100644 index 000000000..1da198960 --- /dev/null +++ b/packages/benchmark/src/utils/offset-to-line.ts @@ -0,0 +1,11 @@ +// Convert a 0-based byte/char offset into a 1-based line number by counting +// newlines before it. oxc reports spans as offsets only, so checks use this to +// fill `ScanFinding.line`. +export const offsetToLine = (sourceText: string, offset: number): number => { + let line = 1; + const limit = Math.min(offset, sourceText.length); + for (let index = 0; index < limit; index++) { + if (sourceText.charCodeAt(index) === 10) line++; + } + return line; +}; diff --git a/packages/benchmark/src/utils/parse-cli-args.ts b/packages/benchmark/src/utils/parse-cli-args.ts new file mode 100644 index 000000000..74e0a58ba --- /dev/null +++ b/packages/benchmark/src/utils/parse-cli-args.ts @@ -0,0 +1,24 @@ +// Minimal `--flag value` / `--flag` parser. Avoids a CLI-framework dependency +// so the verifier bundles tiny and starts fast in the sandbox. Unknown flags +// are ignored; `--flag=value` and `--flag value` are both accepted. 
+export const parseCliArgs = (argv: string[]): Record<string, string | boolean> => { + const parsed: Record<string, string | boolean> = {}; + for (let index = 0; index < argv.length; index++) { + const token = argv[index]; + if (!token || !token.startsWith("--")) continue; + const body = token.slice(2); + const equalsIndex = body.indexOf("="); + if (equalsIndex !== -1) { + parsed[body.slice(0, equalsIndex)] = body.slice(equalsIndex + 1); + continue; + } + const next = argv[index + 1]; + if (next !== undefined && !next.startsWith("--")) { + parsed[body] = next; + index++; + } else { + parsed[body] = true; + } + } + return parsed; +}; diff --git a/packages/benchmark/src/utils/parse-source-file.ts b/packages/benchmark/src/utils/parse-source-file.ts new file mode 100644 index 000000000..7ec1481f0 --- /dev/null +++ b/packages/benchmark/src/utils/parse-source-file.ts @@ -0,0 +1,57 @@ +import * as fs from "node:fs"; +import * as path from "node:path"; +import { parseSync } from "oxc-parser"; +import type { ParsedSourceFile, SourceComment } from "../types/index.js"; + +const EXTENSION_TO_LANG: Record<string, "ts" | "tsx" | "js" | "jsx"> = { + ".ts": "ts", + ".tsx": "tsx", + ".mts": "ts", + ".cts": "ts", + ".js": "js", + ".jsx": "jsx", + ".mjs": "js", + ".cjs": "js", +}; + +// Extensions the AST checks understand. Declaration files are excluded — they +// are types-only and carry no slop the agent can be charged for. +export const isParsableSourcePath = (filePath: string): boolean => { + if (/\.d\.[mc]?ts$/.test(filePath)) return false; + return path.extname(filePath).toLowerCase() in EXTENSION_TO_LANG; +}; + +// Parse source text for a given (repo-relative) path into a `ParsedSourceFile`, +// or `null` when the path is not a source extension or the parser hits a fatal +// error. Pure (no disk access) so checks can be unit-tested from strings. 
+export const parseSourceText = (filePath: string, sourceText: string): ParsedSourceFile | null => { + if (!isParsableSourcePath(filePath)) return null; + const lang = EXTENSION_TO_LANG[path.extname(filePath).toLowerCase()] ?? "tsx"; + try { + const result = parseSync(filePath, sourceText, { astType: "ts", lang }); + if (result.errors.some((parseError) => parseError.severity === "Error")) return null; + return { + filePath, + sourceText, + program: result.program, + comments: result.comments as unknown as SourceComment[], + }; + } catch { + return null; + } +}; + +// Read and parse one repo-relative source file, or `null` when it is missing, +// unparsable, or not a source extension. +export const parseSourceFile = ( + rootDirectory: string, + filePath: string, +): ParsedSourceFile | null => { + if (!isParsableSourcePath(filePath)) return null; + try { + const sourceText = fs.readFileSync(path.join(rootDirectory, filePath), "utf8"); + return parseSourceText(filePath, sourceText); + } catch { + return null; + } +}; diff --git a/packages/benchmark/src/utils/resolve-bin-invocation.ts b/packages/benchmark/src/utils/resolve-bin-invocation.ts new file mode 100644 index 000000000..830384fb6 --- /dev/null +++ b/packages/benchmark/src/utils/resolve-bin-invocation.ts @@ -0,0 +1,10 @@ +// Resolve how to spawn a CLI entry. A bare command name (e.g. `react-doctor`) +// is invoked directly so the OS resolves it on PATH; a `.js`/`.mjs` file path +// is run through the current Node binary so it works without an executable bit +// (the common case when pointing at a monorepo's built `bin/*.js` in dev). 
+export const resolveBinInvocation = (bin: string): { command: string; prefixArgs: string[] } => { + if (/\.[mc]?js$/.test(bin)) { + return { command: process.execPath, prefixArgs: [bin] }; + } + return { command: bin, prefixArgs: [] }; +}; diff --git a/packages/benchmark/src/utils/run-command.ts b/packages/benchmark/src/utils/run-command.ts new file mode 100644 index 000000000..2ef43e5b6 --- /dev/null +++ b/packages/benchmark/src/utils/run-command.ts @@ -0,0 +1,43 @@ +import { spawnSync } from "node:child_process"; + +export interface CommandResult { + stdout: string; + stderr: string; + exitCode: number; + // True when spawnSync reported an error: the binary never started (ENOENT, EACCES) or was killed on timeout or maxBuffer overflow. + spawnFailed: boolean; +} + +// Run a command to completion, capturing output. Never throws and never treats +// a non-zero exit as an error — many tools the verifier drives (React Doctor, +// tsc) exit non-zero precisely when they have findings, which is the signal we +// want, not a failure. Callers inspect `spawnFailed` to distinguish a tool +// that ran-and-complained from one that never started or was cut off. +export const runCommand = ( + command: string, + args: string[], + options: { cwd: string; maxBufferBytes?: number; timeoutMs?: number } = { cwd: process.cwd() }, +): CommandResult => { + const result = spawnSync(command, args, { + cwd: options.cwd, + encoding: "utf8", + maxBuffer: options.maxBufferBytes ?? 64 * 1024 * 1024, + timeout: options.timeoutMs, + }); + + if (result.error) { + return { + stdout: result.stdout ?? "", + stderr: result.stderr || String(result.error), // an empty stderr (e.g. on timeout) still surfaces the error + exitCode: typeof result.status === "number" ? result.status : 1, + spawnFailed: true, + }; + } + + return { + stdout: result.stdout ?? "", + stderr: result.stderr ?? "", + exitCode: typeof result.status === "number" ?
result.status : 1, + spawnFailed: false, + }; +}; diff --git a/packages/benchmark/src/utils/walk-ast.ts b/packages/benchmark/src/utils/walk-ast.ts new file mode 100644 index 000000000..7d8966540 --- /dev/null +++ b/packages/benchmark/src/utils/walk-ast.ts @@ -0,0 +1,26 @@ +import type { AstVisitorNode } from "../types/index.js"; + +const isAstNode = (value: unknown): value is AstVisitorNode => + typeof value === "object" && + value !== null && + typeof (value as { type?: unknown }).type === "string"; + +// Depth-first walk over an oxc ESTree tree, invoking `visit` for every node +// that has a string `type`. The oxc AST has no parent back-references (we never +// attach them), so a plain recursive descent is cycle-free. Used by the AST +// checks, which match on `node.type` rather than a committed AST type surface. +export const walkAst = (root: unknown, visit: (node: AstVisitorNode) => void): void => { + const visitValue = (value: unknown): void => { + if (Array.isArray(value)) { + for (const element of value) visitValue(element); + return; + } + if (!isAstNode(value)) return; + visit(value); + for (const key of Object.keys(value)) { + if (key === "type" || key === "start" || key === "end") continue; + visitValue(value[key]); + } + }; + visitValue(root); +}; diff --git a/packages/benchmark/tasks/_base/Dockerfile b/packages/benchmark/tasks/_base/Dockerfile new file mode 100644 index 000000000..eef91cdce --- /dev/null +++ b/packages/benchmark/tasks/_base/Dockerfile @@ -0,0 +1,53 @@ +# SlopBench shared base image. +# +# Built once (internet IS allowed at image-build time); every task's +# environment/Dockerfile does `FROM slopbench-base`. The agent run itself stays +# air-gapped (`allow_internet = false` in task.toml) — both the React Doctor +# scan and the slop verifier run fully offline, so nothing here needs the +# network at grade time. 
+# + # It installs two pinned CLIs onto PATH: + # react-doctor — the offline diagnostic engine the verifier shells out to + # slop-verify — the SlopBench verifier (this repo's @react-doctor/benchmark) + # plus the shared grader `slopbench-grade` every task's test.sh execs. + # + # Both CLIs come from a single pinned checkout of the react-doctor monorepo so + # scoring is reproducible. Pin REACT_DOCTOR_REF to a tag or full SHA — never a + # moving branch — when cutting a benchmark release. +FROM node:22-bookworm-slim + +# Pin to an immutable ref for reproducible scores. Override at build: +# docker build --build-arg REACT_DOCTOR_REF=<sha> ... +ARG REACT_DOCTOR_REF=react-doctor@0.4.2 +ARG REACT_DOCTOR_REPO=https://github.com/millionco/react-doctor + +RUN apt-get update \ + && apt-get install -y --no-install-recommends git ca-certificates python3 \ + && rm -rf /var/lib/apt/lists/* + +RUN corepack enable + +# Build react-doctor + the slop verifier from a single pinned checkout. +RUN git clone "${REACT_DOCTOR_REPO}" /opt/react-doctor \ + && cd /opt/react-doctor \ + && git checkout "${REACT_DOCTOR_REF}" \ + && pnpm install --frozen-lockfile --ignore-scripts \ + && pnpm --filter react-doctor --filter @react-doctor/benchmark run build + +# Expose both CLIs by their bin names. The bin scripts resolve their own +# dist/* and node_modules relative to the real (symlink-resolved) path. +RUN ln -s /opt/react-doctor/packages/react-doctor/bin/react-doctor.js /usr/local/bin/react-doctor \ + && ln -s /opt/react-doctor/packages/benchmark/bin/slop-verify.js /usr/local/bin/slop-verify \ + && chmod +x /opt/react-doctor/packages/benchmark/bin/slop-verify.js + +# The shared grader every task's tests/test.sh execs. Lives in the image so the +# per-task test.sh stays a thin, Harbor-friendly wrapper (no shared files needed +# in the /tests context).
+RUN cp /opt/react-doctor/packages/benchmark/tasks/_base/run-verifier.sh /usr/local/bin/slopbench-grade + && chmod +x /usr/local/bin/slopbench-grade + +# Sanity-check that slop-verify is runnable. +RUN slop-verify --help >/dev/null + +WORKDIR /app +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/_base/run-verifier.sh b/packages/benchmark/tasks/_base/run-verifier.sh new file mode 100755 index 000000000..c18990b86 --- /dev/null +++ b/packages/benchmark/tasks/_base/run-verifier.sh @@ -0,0 +1,118 @@ +#!/usr/bin/env bash +# +# SlopBench shared grader (installed as `slopbench-grade` in the base image). +# +# A task's tests/test.sh is a thin wrapper that exports BASE_COMMIT + +# FUNCTIONAL_TEST_CMD and then `exec slopbench-grade`. This script: +# 0. Captures the agent's diff as model.patch (reviewer artifact). +# 1. Runs slop-verify offline to score React/TypeScript slop in the diff. +# 2. Resets the files the hidden test.patch touches, then applies it. +# 3. Runs the task's functional tests (the correctness GATE). +# 4. Writes the composite reward (functional_pass × slopScore/100) to +# reward.txt and saves the full slop-report.json artifact. +# +# Every path is overridable by env var so the same script runs unchanged in the +# Harbor sandbox (the defaults) and locally for development. +set -uo pipefail + +APP_DIR="${APP_DIR:-/app}" +TESTS_DIR="${TESTS_DIR:-/tests}" +LOG_DIR="${LOG_DIR:-/logs}" +ARTIFACT_DIR="${ARTIFACT_DIR:-${LOG_DIR}/artifacts}" +VERIFIER_DIR="${VERIFIER_DIR:-${LOG_DIR}/verifier}" +SLOP_VERIFY="${SLOP_VERIFY:-slop-verify}" +REACT_DOCTOR_BIN="${REACT_DOCTOR_BIN:-react-doctor}" +SLOP_PROFILE="${SLOP_PROFILE:-}" +# Optional hard floor: fail the task (reward 0) if slopScore drops below this, +# even when the functional tests pass. Default 0 = no floor.
+SLOP_MIN_SCORE="${SLOP_MIN_SCORE:-0}"
+
+log() { echo "[slopbench] $*"; }
+fail() { log "ERROR: $1"; exit "${2:-1}"; }
+
+[ -n "${BASE_COMMIT:-}" ] || fail "BASE_COMMIT is not set (task test.sh must export it)" 2
+command -v "$SLOP_VERIFY" >/dev/null 2>&1 || [ -x "$SLOP_VERIFY" ] || fail "slop-verify not found: $SLOP_VERIFY" 3
+
+mkdir -p "$ARTIFACT_DIR" "$VERIFIER_DIR" || fail "cannot create log dirs" 4
+cd "$APP_DIR" || fail "app dir missing: $APP_DIR" 5
+git config --global --add safe.directory "$APP_DIR" 2>/dev/null || true
+
+git rev-parse --verify "${BASE_COMMIT}^{commit}" >/dev/null 2>&1 \
+  || fail "base commit $BASE_COMMIT not present in repo" 6
+
+# --- Step 0: capture the agent's diff as model.patch (reviewer artifact) ---
+log "Step 0: capturing model.patch"
+git reset -q --soft "$BASE_COMMIT" && git add -A -- . \
+  && git diff --cached --binary > "${ARTIFACT_DIR}/model.patch" \
+  && git reset -q \
+  || log "warning: could not capture model.patch (continuing)"
+
+# --- Step 1: score slop on the agent's tree BEFORE hidden tests touch it ---
+# The hidden tests only add test files (filtered out of grading), so scoring
+# here vs. after is equivalent — doing it first keeps the scored tree purely the
+# agent's product code.
+log "Step 1: scoring slop" +slop_args=(--root "$APP_DIR" --base "$BASE_COMMIT" --doctor-bin "$REACT_DOCTOR_BIN" \ + --out "${VERIFIER_DIR}/slop-report.json" --quiet) +[ -n "$SLOP_PROFILE" ] && slop_args+=(--profile "$SLOP_PROFILE") +"$SLOP_VERIFY" "${slop_args[@]}" || log "warning: slop-verify exited non-zero" +[ -f "${VERIFIER_DIR}/slop-report.json" ] || fail "slop-report.json was not produced" 7 + +# --- Step 2: apply the hidden test patch (if any) --- +if [ -f "${TESTS_DIR}/test.patch" ] && [ -s "${TESTS_DIR}/test.patch" ]; then + log "Step 2: applying hidden test.patch" + python3 - "$APP_DIR" "${TESTS_DIR}/test.patch" <<'PY' | while IFS= read -r f; do +import re, sys +patch = open(sys.argv[2], encoding="utf-8").read() +files = set() +for line in patch.splitlines(): + m = re.match(r'^diff --git "?a/.+ "?b/(.+?)"?$', line) + if m: + files.add(m.group(1)) +for f in sorted(files): + print(f) +PY + git checkout HEAD -- "$f" 2>/dev/null || rm -rf "$f" 2>/dev/null || true + done + git apply --whitespace=nowarn "${TESTS_DIR}/test.patch" || fail "failed to apply test.patch" 8 +else + log "Step 2: no test.patch (skipping)" +fi + +# --- Step 3: functional correctness gate --- +log "Step 3: running functional tests" +FUNCTIONAL_PASS=0 +if [ -n "${FUNCTIONAL_TEST_CMD:-}" ]; then + if bash -c "$FUNCTIONAL_TEST_CMD"; then + FUNCTIONAL_PASS=1 + log "functional tests PASSED" + else + log "functional tests FAILED" + fi +else + log "warning: no FUNCTIONAL_TEST_CMD set — treating functional gate as failed" +fi + +# --- Step 4: combine into the composite reward + finalize the report --- +log "Step 4: computing reward" +REWARD=$(FUNCTIONAL_PASS="$FUNCTIONAL_PASS" SLOP_MIN_SCORE="$SLOP_MIN_SCORE" \ + python3 - "${VERIFIER_DIR}/slop-report.json" <<'PY' +import json, os, sys +path = sys.argv[1] +report = json.load(open(path)) +passed = os.environ.get("FUNCTIONAL_PASS") == "1" +floor = float(os.environ.get("SLOP_MIN_SCORE", "0")) +score = float(report.get("slopScore", 0.0)) +gated = 
passed and score >= floor
+reward = (score / 100.0) if gated else 0.0
+report["functionalPass"] = passed
+report["reward"] = reward
+json.dump(report, open(path, "w"), indent=2)
+print(f"{reward:.6f}")
+PY
+)
+echo "$REWARD" > "${VERIFIER_DIR}/reward.txt" || fail "could not write reward.txt" 9
+
+SCORE=$(python3 -c "import json;print(json.load(open('${VERIFIER_DIR}/slop-report.json')).get('slopScore', 0))")
+log "RESULT functional_pass=${FUNCTIONAL_PASS} slop_score=${SCORE} reward=${REWARD}"
+exit 0 diff --git a/packages/benchmark/tasks/_template/environment/Dockerfile b/packages/benchmark/tasks/_template/environment/Dockerfile new file mode 100644 index 000000000..d215e3e69 --- /dev/null +++ b/packages/benchmark/tasks/_template/environment/Dockerfile @@ -0,0 +1,18 @@ +# Reproduces this task's environment (fallback if the prebuilt image is absent).
+# FROM the shared SlopBench base, which already provides react-doctor +
+# slop-verify + the slopbench-grade script.
+FROM slopbench-base:latest
+
+WORKDIR /app
+
+# TODO: bring in the seed repo at the task's base commit. Either clone an
+# external repo:
+#   RUN git clone <repository_url> . \
+#    && git checkout <base_commit_hash> \
+#    && pnpm install --frozen-lockfile --ignore-scripts
+# or COPY an in-tree seed (see tasks that ship a `seed/` directory) and install BEFORE the base commit so deps stay out of the agent's diff:
+#   COPY seed/ .
+#   RUN pnpm install --ignore-scripts \
+#    && git init -q && git add -A && git -c user.email=t@t.co -c user.name=t commit -qm base
+
+CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/_template/instruction.md b/packages/benchmark/tasks/_template/instruction.md new file mode 100644 index 000000000..b124f3d3e --- /dev/null +++ b/packages/benchmark/tasks/_template/instruction.md @@ -0,0 +1,24 @@ +<!--
+SlopBench task instruction (what the agent sees).
+
+Write a normal feature/bug request. Do NOT mention React Doctor, "slop", code
+quality, lint, or any of the dimensions being scored — SlopBench measures the
+slop a model emits *unprompted*.
Specify only the observable behavior the hidden +tests verify (plus any error-message/contract requirements), exactly like a real +ticket. Delete this comment. +--> + +Implement the following feature. + +## Expected behavior + +TODO: Describe the behavior the hidden tests assert. Be precise about inputs, +outputs, edge cases, and any required error messages. + +## Where + +TODO: Point at the file(s) / component(s) to add or change. + +## Constraints + +TODO: Any API/contract the tests depend on (exported names, props, routes). diff --git a/packages/benchmark/tasks/_template/solution/solution.patch b/packages/benchmark/tasks/_template/solution/solution.patch new file mode 100644 index 000000000..a36e79a0a --- /dev/null +++ b/packages/benchmark/tasks/_template/solution/solution.patch @@ -0,0 +1,5 @@ +# Replace with a git patch implementing a CLEAN reference solution: it must make +# FUNCTIONAL_TEST_CMD pass AND score high on the slop verifier (no any/casts, no +# nested components, composition over boolean props, etc.). Never used at +# grading — it exists so reviewers can spot-check that the task is both solvable +# and that a clean solution is rewarded. diff --git a/packages/benchmark/tasks/_template/solution/solve.sh b/packages/benchmark/tasks/_template/solution/solve.sh new file mode 100755 index 000000000..bf0c911f6 --- /dev/null +++ b/packages/benchmark/tasks/_template/solution/solve.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — NEVER used at grade time). +# Applies a clean, high-scoring implementation so reviewers can confirm the task +# is solvable and that a good solution scores well on both axes. 
+set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/_template/task.toml b/packages/benchmark/tasks/_template/task.toml new file mode 100644 index 000000000..7310fa6df --- /dev/null +++ b/packages/benchmark/tasks/_template/task.toml @@ -0,0 +1,46 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/_template" +description = "Copy this directory to author a new SlopBench task." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "_template" +display_title = "SlopBench task template" +display_description = "Template task — replace every TODO before use." +# SlopBench taxonomy (informational; the verifier scores all dimensions). +family = "produce-clean" # produce-clean | handle-slop | explicit-deslop +target_dimensions = ["react-correctness", "react-performance"] +language = "typescript" +repository_url = "TODO: seed repo URL or 'in-tree'" +base_commit_hash = "TODO: base commit sha the agent starts from" +# Optional scoring-profile override (path inside the image); empty = built-in. +slop_profile = "" + +[verifier] +timeout_sec = 1800.0 + +[verifier.env] + +[agent] +timeout_sec = 5400.0 + +[environment] +build_timeout_sec = 1800.0 +# Prefer a prebuilt image for speed; environment/Dockerfile reproduces it. +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 8192 +storage_mb = 20480 +gpus = 0 +# Air-gapped at agent runtime: the slop verifier + React Doctor run offline. 
+allow_internet = false
+mcp_servers = []
+
+[environment.env]
+
+[solution.env] diff --git a/packages/benchmark/tasks/_template/tests/test.patch b/packages/benchmark/tasks/_template/tests/test.patch new file mode 100644 index 000000000..f0f40f5ea --- /dev/null +++ b/packages/benchmark/tasks/_template/tests/test.patch @@ -0,0 +1,5 @@ +# Replace this file with a real git patch (created with `git diff`) that ADDS
+# the hidden test file(s) for this task — e.g. tests/feature.test.ts. It is
+# applied at grade time so the agent never sees the tests. The patch must only
+# ADD test files: the grader resets every file it touches to the base commit
+# first, so a patch that modified product code would discard the agent's work. diff --git a/packages/benchmark/tasks/_template/tests/test.sh b/packages/benchmark/tasks/_template/tests/test.sh new file mode 100755 index 000000000..81f5b7487 --- /dev/null +++ b/packages/benchmark/tasks/_template/tests/test.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash
+# Thin wrapper — the shared `slopbench-grade` script (baked into the base image)
+# does the model.patch capture, slop scan, hidden-test apply, functional gate,
+# and reward.txt write. Just declare this task's specifics.
+set -euo pipefail
+
+# The commit the agent started from (matches task.toml base_commit_hash).
+export BASE_COMMIT="TODO: base commit sha"
+
+# Command that runs THIS task's functional tests (added by tests/test.patch).
+# Must exit 0 iff the implemented behavior is correct.
+export FUNCTIONAL_TEST_CMD="TODO: e.g.
pnpm exec vitest run tests/feature.test.ts" + +exec slopbench-grade diff --git a/packages/benchmark/tasks/avatar-initials-util/_authoring/hidden/tests/avatar-initials.test.ts b/packages/benchmark/tasks/avatar-initials-util/_authoring/hidden/tests/avatar-initials.test.ts new file mode 100644 index 000000000..b9087ec39 --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/_authoring/hidden/tests/avatar-initials.test.ts @@ -0,0 +1,21 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { avatarInitials } from "../src/avatar-initials.ts"; + +test("takes first and last initials, uppercased", () => { + assert.equal(avatarInitials("Ada Lovelace"), "AL"); + assert.equal(avatarInitials("grace hopper"), "GH"); +}); + +test("uses a single initial for one word", () => { + assert.equal(avatarInitials("Cher"), "C"); +}); + +test("ignores extra whitespace and middle words", () => { + assert.equal(avatarInitials(" Margaret Heafield Hamilton "), "MH"); +}); + +test("returns empty string for empty input", () => { + assert.equal(avatarInitials(""), ""); + assert.equal(avatarInitials(" "), ""); +}); diff --git a/packages/benchmark/tasks/avatar-initials-util/_authoring/solved/src/avatar-initials.ts b/packages/benchmark/tasks/avatar-initials-util/_authoring/solved/src/avatar-initials.ts new file mode 100644 index 000000000..5a493f9cc --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/_authoring/solved/src/avatar-initials.ts @@ -0,0 +1,13 @@ +// Up to two uppercase initials (first + last word) for an avatar badge. +export const avatarInitials = (fullName: string): string => { + const words = fullName + .trim() + .split(/\s+/) + .filter((word) => word.length > 0); + if (words.length === 0) return ""; + + const firstWord = words[0] ?? ""; + const lastWord = words[words.length - 1] ?? ""; + const initials = words.length === 1 ? 
firstWord[0] : `${firstWord[0]}${lastWord[0]}`; + return initials.toUpperCase(); +}; diff --git a/packages/benchmark/tasks/avatar-initials-util/environment/Dockerfile b/packages/benchmark/tasks/avatar-initials-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/avatar-initials-util/instruction.md b/packages/benchmark/tasks/avatar-initials-util/instruction.md new file mode 100644 index 000000000..fff473db7 --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/instruction.md @@ -0,0 +1,26 @@ +Implement `avatarInitials` in `src/avatar-initials.ts`. + +## Expected behavior + +`avatarInitials(fullName)` returns up to two uppercase initials for an avatar: + +- Split the name on whitespace, ignoring empty segments (so extra spaces are + fine). +- With two or more words: take the first letter of the **first** and **last** + word. +- With one word: take just its first letter. +- With no words (empty/whitespace-only): return `""`. +- Always uppercase the result. + +Examples: + +- `avatarInitials("Ada Lovelace")` → `"AL"` +- `avatarInitials("grace hopper")` → `"GH"` +- `avatarInitials("Cher")` → `"C"` +- `avatarInitials(" Margaret Heafield Hamilton ")` → `"MH"` +- `avatarInitials("")` → `""` + +## Constraints + +Keep the exported `avatarInitials(fullName: string): string` signature. Do not +change `src/avatar.tsx`. 
diff --git a/packages/benchmark/tasks/avatar-initials-util/seed/package.json b/packages/benchmark/tasks/avatar-initials-util/seed/package.json new file mode 100644 index 000000000..e47ac0be0 --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-avatar-initials-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/avatar-initials-util/seed/src/avatar-initials.ts b/packages/benchmark/tasks/avatar-initials-util/seed/src/avatar-initials.ts new file mode 100644 index 000000000..7ee3eed8e --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/seed/src/avatar-initials.ts @@ -0,0 +1,4 @@ +// TODO(agent): implement. See instruction.md. +export const avatarInitials = (_fullName: string): string => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/avatar-initials-util/seed/src/avatar.tsx b/packages/benchmark/tasks/avatar-initials-util/seed/src/avatar.tsx new file mode 100644 index 000000000..f3855eeab --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/seed/src/avatar.tsx @@ -0,0 +1,12 @@ +import { avatarInitials } from "./avatar-initials.ts"; + +interface AvatarProps { + fullName: string; +} + +// Existing consumer (keeps avatar-initials.ts reachable). Do not edit. 
+export const Avatar = ({ fullName }: AvatarProps) => ( + <span className="avatar" aria-label={fullName}> + {avatarInitials(fullName)} + </span> +); diff --git a/packages/benchmark/tasks/avatar-initials-util/seed/tsconfig.json b/packages/benchmark/tasks/avatar-initials-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/avatar-initials-util/solution/solution.patch b/packages/benchmark/tasks/avatar-initials-util/solution/solution.patch new file mode 100644 index 000000000..1455b5b7d --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/solution/solution.patch @@ -0,0 +1,21 @@ +diff --git a/src/avatar-initials.ts b/src/avatar-initials.ts +index 7ee3eed..5a493f9 100644 +--- a/src/avatar-initials.ts ++++ b/src/avatar-initials.ts +@@ -1,4 +1,13 @@ +-// TODO(agent): implement. See instruction.md. +-export const avatarInitials = (_fullName: string): string => { +- throw new Error("not implemented"); ++// Up to two uppercase initials (first + last word) for an avatar badge. ++export const avatarInitials = (fullName: string): string => { ++ const words = fullName ++ .trim() ++ .split(/\s+/) ++ .filter((word) => word.length > 0); ++ if (words.length === 0) return ""; ++ ++ const firstWord = words[0] ?? ""; ++ const lastWord = words[words.length - 1] ?? ""; ++ const initials = words.length === 1 ? 
firstWord[0] : `${firstWord[0]}${lastWord[0]}`; ++ return initials.toUpperCase(); + }; diff --git a/packages/benchmark/tasks/avatar-initials-util/solution/solve.sh b/packages/benchmark/tasks/avatar-initials-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/avatar-initials-util/task.toml b/packages/benchmark/tasks/avatar-initials-util/task.toml new file mode 100644 index 000000000..fa494f01b --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/avatar-initials-util" +description = "Implement avatarInitials(fullName) returning up to two uppercase initials." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "avatar-initials-util" +display_title = "Avatar initials" +display_description = "Implement avatarInitials(fullName) returning up to two uppercase initials." 
+family = "produce-clean" +target_dimensions = ["ts-strictness", "maintainability"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/avatar-initials-util/tests/test.patch b/packages/benchmark/tasks/avatar-initials-util/tests/test.patch new file mode 100644 index 000000000..4da1fe0cb --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/tests/test.patch @@ -0,0 +1,27 @@ +diff --git a/tests/avatar-initials.test.ts b/tests/avatar-initials.test.ts +new file mode 100644 +index 0000000..b9087ec +--- /dev/null ++++ b/tests/avatar-initials.test.ts +@@ -0,0 +1,21 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { avatarInitials } from "../src/avatar-initials.ts"; ++ ++test("takes first and last initials, uppercased", () => { ++ assert.equal(avatarInitials("Ada Lovelace"), "AL"); ++ assert.equal(avatarInitials("grace hopper"), "GH"); ++}); ++ ++test("uses a single initial for one word", () => { ++ assert.equal(avatarInitials("Cher"), "C"); ++}); ++ ++test("ignores extra whitespace and middle words", () => { ++ assert.equal(avatarInitials(" Margaret Heafield Hamilton "), "MH"); ++}); ++ ++test("returns empty string for empty input", () => { ++ assert.equal(avatarInitials(""), ""); ++ assert.equal(avatarInitials(" "), ""); ++}); diff --git a/packages/benchmark/tasks/avatar-initials-util/tests/test.sh b/packages/benchmark/tasks/avatar-initials-util/tests/test.sh new file mode 100755 index 000000000..903d4fec4 --- /dev/null +++ b/packages/benchmark/tasks/avatar-initials-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env 
bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/avatar-initials.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/chunk-util/_authoring/hidden/tests/chunk.test.ts b/packages/benchmark/tasks/chunk-util/_authoring/hidden/tests/chunk.test.ts new file mode 100644 index 000000000..8c2a566fe --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/_authoring/hidden/tests/chunk.test.ts @@ -0,0 +1,19 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { chunkize } from "../src/chunk.ts"; + +test("splits into chunks with a shorter final chunk", () => { + assert.deepEqual(chunkize([1, 2, 3, 4, 5], 2), [[1, 2], [3, 4], [5]]); +}); + +test("returns a single chunk when size >= length", () => { + assert.deepEqual(chunkize(["a", "b", "c"], 5), [["a", "b", "c"]]); +}); + +test("returns an empty array for empty input", () => { + assert.deepEqual(chunkize([], 3), []); +}); + +test("returns an empty array for size < 1", () => { + assert.deepEqual(chunkize([1, 2], 0), []); +}); diff --git a/packages/benchmark/tasks/chunk-util/_authoring/solved/src/chunk.ts b/packages/benchmark/tasks/chunk-util/_authoring/solved/src/chunk.ts new file mode 100644 index 000000000..254539e26 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/_authoring/solved/src/chunk.ts @@ -0,0 +1,10 @@ +// Splits an array into consecutive chunks of length `size`. Implemented inline +// (no utility-library dependency) to keep the bundle lean. 
+export const chunkize = <Item>(items: readonly Item[], size: number): Item[][] => { + if (size < 1) return []; + const chunks: Item[][] = []; + for (let start = 0; start < items.length; start += size) { + chunks.push(items.slice(start, start + size)); + } + return chunks; +}; diff --git a/packages/benchmark/tasks/chunk-util/environment/Dockerfile b/packages/benchmark/tasks/chunk-util/environment/Dockerfile new file mode 100644 index 000000000..fcbfdb374 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +RUN pnpm install --frozen-lockfile --ignore-scripts || pnpm install --ignore-scripts +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/chunk-util/instruction.md b/packages/benchmark/tasks/chunk-util/instruction.md new file mode 100644 index 000000000..af786a914 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/instruction.md @@ -0,0 +1,25 @@ +Implement `chunkize` in `src/chunk.ts`. + +## Expected behavior + +`chunkize(items, size)` splits an array into consecutive chunks of length +`size`: + +- The final chunk holds the remainder and may be shorter. +- If `size` is greater than or equal to the length, return a single chunk with + every item. +- An empty input returns `[]`. +- If `size` is less than 1, return `[]`. + +Examples: + +- `chunkize([1, 2, 3, 4, 5], 2)` → `[[1, 2], [3, 4], [5]]` +- `chunkize(["a", "b", "c"], 5)` → `[["a", "b", "c"]]` +- `chunkize([], 3)` → `[]` +- `chunkize([1, 2], 0)` → `[]` + +## Constraints + +Keep the exported generic signature +`chunkize<Item>(items: readonly Item[], size: number): Item[][]`. Do not change +`src/photo-grid.tsx`. 
diff --git a/packages/benchmark/tasks/chunk-util/seed/package.json b/packages/benchmark/tasks/chunk-util/seed/package.json new file mode 100644 index 000000000..f977827c5 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/seed/package.json @@ -0,0 +1,11 @@ +{ + "name": "slopbench-chunk-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "lodash": "^4.17.21", + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/chunk-util/seed/src/chunk.ts b/packages/benchmark/tasks/chunk-util/seed/src/chunk.ts new file mode 100644 index 000000000..376b16a9c --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/seed/src/chunk.ts @@ -0,0 +1,4 @@ +// TODO(agent): implement. See instruction.md. +export const chunkize = <Item>(_items: readonly Item[], _size: number): Item[][] => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/chunk-util/seed/src/photo-grid.tsx b/packages/benchmark/tasks/chunk-util/seed/src/photo-grid.tsx new file mode 100644 index 000000000..5f081ac81 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/seed/src/photo-grid.tsx @@ -0,0 +1,16 @@ +import { chunkize } from "./chunk.ts"; + +interface PhotoGridProps { + urls: string[]; +} + +// Existing consumer (keeps chunk.ts reachable). Do not edit. 
+export const PhotoGrid = ({ urls }: PhotoGridProps) => ( + <div className="grid"> + {chunkize(urls, 3).map((row, rowIndex) => ( + <div className="row" key={rowIndex}> + {row.length} + </div> + ))} + </div> +); diff --git a/packages/benchmark/tasks/chunk-util/seed/tsconfig.json b/packages/benchmark/tasks/chunk-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/chunk-util/solution/solution.patch b/packages/benchmark/tasks/chunk-util/solution/solution.patch new file mode 100644 index 000000000..98a44b980 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/solution/solution.patch @@ -0,0 +1,18 @@ +diff --git a/src/chunk.ts b/src/chunk.ts +index 376b16a..254539e 100644 +--- a/src/chunk.ts ++++ b/src/chunk.ts +@@ -1,4 +1,10 @@ +-// TODO(agent): implement. See instruction.md. +-export const chunkize = <Item>(_items: readonly Item[], _size: number): Item[][] => { +- throw new Error("not implemented"); ++// Splits an array into consecutive chunks of length `size`. Implemented inline ++// (no utility-library dependency) to keep the bundle lean. 
++export const chunkize = <Item>(items: readonly Item[], size: number): Item[][] => { ++ if (size < 1) return []; ++ const chunks: Item[][] = []; ++ for (let start = 0; start < items.length; start += size) { ++ chunks.push(items.slice(start, start + size)); ++ } ++ return chunks; + }; diff --git a/packages/benchmark/tasks/chunk-util/solution/solve.sh b/packages/benchmark/tasks/chunk-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/chunk-util/task.toml b/packages/benchmark/tasks/chunk-util/task.toml new file mode 100644 index 000000000..c31ff1ed7 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/chunk-util" +description = "Implement chunkize(items, size) inline (avoid a full utility-library import)." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "chunk-util" +display_title = "Array chunk utility" +display_description = "Implement chunkize(items, size) inline (avoid a full utility-library import)." 
+family = "produce-clean" +target_dimensions = ["bundle", "ts-strictness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/chunk-util/tests/test.patch b/packages/benchmark/tasks/chunk-util/tests/test.patch new file mode 100644 index 000000000..4b0b9c9ef --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/tests/test.patch @@ -0,0 +1,25 @@ +diff --git a/tests/chunk.test.ts b/tests/chunk.test.ts +new file mode 100644 +index 0000000..8c2a566 +--- /dev/null ++++ b/tests/chunk.test.ts +@@ -0,0 +1,19 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { chunkize } from "../src/chunk.ts"; ++ ++test("splits into chunks with a shorter final chunk", () => { ++ assert.deepEqual(chunkize([1, 2, 3, 4, 5], 2), [[1, 2], [3, 4], [5]]); ++}); ++ ++test("returns a single chunk when size >= length", () => { ++ assert.deepEqual(chunkize(["a", "b", "c"], 5), [["a", "b", "c"]]); ++}); ++ ++test("returns an empty array for empty input", () => { ++ assert.deepEqual(chunkize([], 3), []); ++}); ++ ++test("returns an empty array for size < 1", () => { ++ assert.deepEqual(chunkize([1, 2], 0), []); ++}); diff --git a/packages/benchmark/tasks/chunk-util/tests/test.sh b/packages/benchmark/tasks/chunk-util/tests/test.sh new file mode 100755 index 000000000..756de07d8 --- /dev/null +++ b/packages/benchmark/tasks/chunk-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test 
tests/chunk.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/comment-thread-extend/_authoring/hidden/tests/comment-thread.test.tsx b/packages/benchmark/tasks/comment-thread-extend/_authoring/hidden/tests/comment-thread.test.tsx new file mode 100644 index 000000000..06c73a48c --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/_authoring/hidden/tests/comment-thread.test.tsx @@ -0,0 +1,17 @@ +import { test, expect } from "vitest"; +import { renderToStaticMarkup } from "react-dom/server"; +import { CommentThread, type Comment } from "../src/comment-thread.tsx"; + +const COMMENTS: Comment[] = [ + { id: "c1", author: "Ada", text: "Hello", replies: 2 }, + { id: "c2", author: "Grace", text: "Nice", replies: 0 }, +]; + +test("renders each comment with its reply count, in order", () => { + const html = renderToStaticMarkup(<CommentThread comments={COMMENTS} />); + expect(html).toContain('<ul class="thread">'); + expect(html).toContain("Ada: Hello (2 replies)"); + expect(html).toContain("Grace: Nice (0 replies)"); + expect(html.indexOf("Ada")).toBeLessThan(html.indexOf("Grace")); + expect(html.match(/<li>/g) ?? 
[]).toHaveLength(2); +}); diff --git a/packages/benchmark/tasks/comment-thread-extend/_authoring/solved/src/comment-thread.tsx b/packages/benchmark/tasks/comment-thread-extend/_authoring/solved/src/comment-thread.tsx new file mode 100644 index 000000000..725165245 --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/_authoring/solved/src/comment-thread.tsx @@ -0,0 +1,24 @@ +export interface Comment { + id: string; + author: string; + text: string; + replies: number; +} + +export interface CommentThreadProps { + comments: Comment[]; +} + +const CommentRow = ({ comment }: { comment: Comment }) => ( + <li> + {comment.author}: {comment.text} ({comment.replies} replies) + </li> +); + +export const CommentThread = ({ comments }: CommentThreadProps) => ( + <ul className="thread"> + {comments.map((comment) => ( + <CommentRow key={comment.id} comment={comment} /> + ))} + </ul> +); diff --git a/packages/benchmark/tasks/comment-thread-extend/environment/Dockerfile b/packages/benchmark/tasks/comment-thread-extend/environment/Dockerfile new file mode 100644 index 000000000..fcbfdb374 --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +RUN pnpm install --frozen-lockfile --ignore-scripts || pnpm install --ignore-scripts +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/comment-thread-extend/instruction.md b/packages/benchmark/tasks/comment-thread-extend/instruction.md new file mode 100644 index 000000000..3b89902a6 --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/instruction.md @@ -0,0 +1,21 @@ +`src/comment-thread.tsx` renders a list of comments. Extend it. + +## Expected behavior + +Each comment list item must now also show its reply count. 
Render each comment's +text as exactly: + +``` +<author>: <text> (<replies> replies) +``` + +For example a comment `{ author: "Ada", text: "Hello", replies: 2 }` renders a +`<li>` whose text content is `Ada: Hello (2 replies)`. + +Keep the existing `<ul className="thread">` wrapper and render one `<li>` per +comment, in order. + +## Constraints + +Keep the exported `CommentThread` component and the `Comment` / +`CommentThreadProps` types. diff --git a/packages/benchmark/tasks/comment-thread-extend/seed/package.json b/packages/benchmark/tasks/comment-thread-extend/seed/package.json new file mode 100644 index 000000000..ba8bd8ddc --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/seed/package.json @@ -0,0 +1,13 @@ +{ + "name": "slopbench-comment-thread", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + }, + "devDependencies": { + "vitest": "^4.1.8" + } +} diff --git a/packages/benchmark/tasks/comment-thread-extend/seed/src/comment-thread.tsx b/packages/benchmark/tasks/comment-thread-extend/seed/src/comment-thread.tsx new file mode 100644 index 000000000..95d3dbea5 --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/seed/src/comment-thread.tsx @@ -0,0 +1,25 @@ +export interface Comment { + id: string; + author: string; + text: string; + replies: number; +} + +export interface CommentThreadProps { + comments: Comment[]; +} + +export const CommentThread = ({ comments }: CommentThreadProps) => { + const Item = (props: any) => ( + <li> + {props.author}: {props.text} + </li> + ); + return ( + <ul className="thread"> + {comments.map((comment, index) => ( + <Item key={index} author={comment.author} text={comment.text} /> + ))} + </ul> + ); +}; diff --git a/packages/benchmark/tasks/comment-thread-extend/seed/tsconfig.json b/packages/benchmark/tasks/comment-thread-extend/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ 
b/packages/benchmark/tasks/comment-thread-extend/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/comment-thread-extend/seed/vitest.config.ts b/packages/benchmark/tasks/comment-thread-extend/seed/vitest.config.ts new file mode 100644 index 000000000..8409b1f8e --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/seed/vitest.config.ts @@ -0,0 +1,9 @@ +import { defineConfig } from "vitest/config"; + +export default defineConfig({ + esbuild: { jsx: "automatic" }, + test: { + environment: "node", + include: ["tests/**/*.test.tsx"], + }, +}); diff --git a/packages/benchmark/tasks/comment-thread-extend/solution/solution.patch b/packages/benchmark/tasks/comment-thread-extend/solution/solution.patch new file mode 100644 index 000000000..4c272558c --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/solution/solution.patch @@ -0,0 +1,35 @@ +diff --git a/src/comment-thread.tsx b/src/comment-thread.tsx +index 95d3dbe..7251652 100644 +--- a/src/comment-thread.tsx ++++ b/src/comment-thread.tsx +@@ -9,17 +9,16 @@ export interface CommentThreadProps { + comments: Comment[]; + } + +-export const CommentThread = ({ comments }: CommentThreadProps) => { +- const Item = (props: any) => ( +- <li> +- {props.author}: {props.text} +- </li> +- ); +- return ( +- <ul className="thread"> +- {comments.map((comment, index) => ( +- <Item key={index} author={comment.author} text={comment.text} /> +- ))} +- </ul> +- ); +-}; ++const CommentRow = ({ comment }: { comment: Comment }) => ( ++ <li> ++ {comment.author}: {comment.text} ({comment.replies} replies) ++ </li> ++); ++ ++export const CommentThread = ({ comments }: CommentThreadProps) => ( ++ <ul className="thread"> ++ 
{comments.map((comment) => ( ++ <CommentRow key={comment.id} comment={comment} /> ++ ))} ++ </ul> ++); diff --git a/packages/benchmark/tasks/comment-thread-extend/solution/solve.sh b/packages/benchmark/tasks/comment-thread-extend/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/comment-thread-extend/task.toml b/packages/benchmark/tasks/comment-thread-extend/task.toml new file mode 100644 index 000000000..40c3b3347 --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/comment-thread-extend" +description = "Add reply counts to a comment list seeded with index keys + an inline component." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "comment-thread-extend" +display_title = "Extend comment thread" +display_description = "Add reply counts to a comment list seeded with index keys + an inline component." 
+family = "handle-slop" +target_dimensions = ["react-correctness", "react-performance"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/comment-thread-extend/tests/test.patch b/packages/benchmark/tasks/comment-thread-extend/tests/test.patch new file mode 100644 index 000000000..1a670b7e0 --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/tests/test.patch @@ -0,0 +1,23 @@ +diff --git a/tests/comment-thread.test.tsx b/tests/comment-thread.test.tsx +new file mode 100644 +index 0000000..06c73a4 +--- /dev/null ++++ b/tests/comment-thread.test.tsx +@@ -0,0 +1,17 @@ ++import { test, expect } from "vitest"; ++import { renderToStaticMarkup } from "react-dom/server"; ++import { CommentThread, type Comment } from "../src/comment-thread.tsx"; ++ ++const COMMENTS: Comment[] = [ ++ { id: "c1", author: "Ada", text: "Hello", replies: 2 }, ++ { id: "c2", author: "Grace", text: "Nice", replies: 0 }, ++]; ++ ++test("renders each comment with its reply count, in order", () => { ++ const html = renderToStaticMarkup(<CommentThread comments={COMMENTS} />); ++ expect(html).toContain('<ul class="thread">'); ++ expect(html).toContain("Ada: Hello (2 replies)"); ++ expect(html).toContain("Grace: Nice (0 replies)"); ++ expect(html.indexOf("Ada")).toBeLessThan(html.indexOf("Grace")); ++ expect(html.match(/<li>/g) ?? 
[]).toHaveLength(2); ++}); diff --git a/packages/benchmark/tasks/comment-thread-extend/tests/test.sh b/packages/benchmark/tasks/comment-thread-extend/tests/test.sh new file mode 100755 index 000000000..4003f69b2 --- /dev/null +++ b/packages/benchmark/tasks/comment-thread-extend/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="pnpm exec vitest run" +exec slopbench-grade diff --git a/packages/benchmark/tasks/dashboard-loader/_authoring/hidden/tests/load-dashboard.test.ts b/packages/benchmark/tasks/dashboard-loader/_authoring/hidden/tests/load-dashboard.test.ts new file mode 100644 index 000000000..6b8aa58a6 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/_authoring/hidden/tests/load-dashboard.test.ts @@ -0,0 +1,23 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { loadDashboard } from "../src/load-dashboard.ts"; + +test("combines the three sources into one object", async () => { + const data = await loadDashboard({ + fetchUser: async () => "Ada", + fetchStats: async () => 42, + fetchActivity: async () => ["login", "edit"], + }); + assert.deepEqual(data, { user: "Ada", stats: 42, activity: ["login", "edit"] }); +}); + +test("resolves every source value", async () => { + const data = await loadDashboard({ + fetchUser: async () => "Grace", + fetchStats: async () => 0, + fetchActivity: async () => [], + }); + assert.equal(data.user, "Grace"); + assert.equal(data.stats, 0); + assert.deepEqual(data.activity, []); +}); diff --git a/packages/benchmark/tasks/dashboard-loader/_authoring/solved/src/load-dashboard.ts b/packages/benchmark/tasks/dashboard-loader/_authoring/solved/src/load-dashboard.ts new file mode 100644 index 000000000..1afcde41a --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/_authoring/solved/src/load-dashboard.ts @@ -0,0 +1,22 @@ +export interface 
DashboardSources { + fetchUser: () => Promise<string>; + fetchStats: () => Promise<number>; + fetchActivity: () => Promise<string[]>; +} + +export interface DashboardData { + user: string; + stats: number; + activity: string[]; +} + +// The three sources are independent, so fetch them in parallel rather than +// awaiting each in sequence (which would serialize three round-trips). +export const loadDashboard = async (sources: DashboardSources): Promise<DashboardData> => { + const [user, stats, activity] = await Promise.all([ + sources.fetchUser(), + sources.fetchStats(), + sources.fetchActivity(), + ]); + return { user, stats, activity }; +}; diff --git a/packages/benchmark/tasks/dashboard-loader/environment/Dockerfile b/packages/benchmark/tasks/dashboard-loader/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/dashboard-loader/instruction.md b/packages/benchmark/tasks/dashboard-loader/instruction.md new file mode 100644 index 000000000..c61b4e933 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/instruction.md @@ -0,0 +1,22 @@ +Implement `loadDashboard` in `src/load-dashboard.ts`. + +## Expected behavior + +`loadDashboard(sources)` loads the three pieces of dashboard data from the +provided `sources` and returns them combined: + +- Calls `sources.fetchUser()`, `sources.fetchStats()`, and + `sources.fetchActivity()`. +- Returns `{ user, stats, activity }` with each field set to the resolved value + of the matching call. + +The three sources are independent of one another. 
+ +Example: if `fetchUser` resolves to `"Ada"`, `fetchStats` to `42`, and +`fetchActivity` to `["login"]`, then `loadDashboard(sources)` resolves to +`{ user: "Ada", stats: 42, activity: ["login"] }`. + +## Constraints + +Keep the exported `loadDashboard` signature and the `DashboardSources` / +`DashboardData` interfaces. Do not change `src/dashboard-page.tsx`. diff --git a/packages/benchmark/tasks/dashboard-loader/seed/package.json b/packages/benchmark/tasks/dashboard-loader/seed/package.json new file mode 100644 index 000000000..4dac0a54e --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/seed/package.json @@ -0,0 +1,11 @@ +{ + "name": "slopbench-dashboard-loader", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "next": "^15.0.0", + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/dashboard-loader/seed/src/dashboard-page.tsx b/packages/benchmark/tasks/dashboard-loader/seed/src/dashboard-page.tsx new file mode 100644 index 000000000..fe2fe1506 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/seed/src/dashboard-page.tsx @@ -0,0 +1,13 @@ +import { loadDashboard, type DashboardSources } from "./load-dashboard.ts"; + +// Existing server component that consumes the loader (keeps load-dashboard.ts +// reachable). Do not edit. 
+export default async function DashboardPage({ sources }: { sources: DashboardSources }) { + const data = await loadDashboard(sources); + return ( + <main> + <h1>{data.user}</h1> + <p>{data.stats}</p> + </main> + ); +} diff --git a/packages/benchmark/tasks/dashboard-loader/seed/src/load-dashboard.ts b/packages/benchmark/tasks/dashboard-loader/seed/src/load-dashboard.ts new file mode 100644 index 000000000..5d2d5add2 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/seed/src/load-dashboard.ts @@ -0,0 +1,16 @@ +export interface DashboardSources { + fetchUser: () => Promise<string>; + fetchStats: () => Promise<number>; + fetchActivity: () => Promise<string[]>; +} + +export interface DashboardData { + user: string; + stats: number; + activity: string[]; +} + +// TODO(agent): implement. See instruction.md. +export const loadDashboard = async (_sources: DashboardSources): Promise<DashboardData> => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/dashboard-loader/seed/tsconfig.json b/packages/benchmark/tasks/dashboard-loader/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/dashboard-loader/solution/solution.patch b/packages/benchmark/tasks/dashboard-loader/solution/solution.patch new file mode 100644 index 000000000..c20d706eb --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/solution/solution.patch @@ -0,0 +1,21 @@ +diff --git a/src/load-dashboard.ts b/src/load-dashboard.ts +index 5d2d5ad..1afcde4 100644 +--- a/src/load-dashboard.ts ++++ b/src/load-dashboard.ts +@@ -10,7 +10,13 @@ export interface 
DashboardData { + activity: string[]; + } + +-// TODO(agent): implement. See instruction.md. +-export const loadDashboard = async (_sources: DashboardSources): Promise<DashboardData> => { +- throw new Error("not implemented"); ++// The three sources are independent, so fetch them in parallel rather than ++// awaiting each in sequence (which would serialize three round-trips). ++export const loadDashboard = async (sources: DashboardSources): Promise<DashboardData> => { ++ const [user, stats, activity] = await Promise.all([ ++ sources.fetchUser(), ++ sources.fetchStats(), ++ sources.fetchActivity(), ++ ]); ++ return { user, stats, activity }; + }; diff --git a/packages/benchmark/tasks/dashboard-loader/solution/solve.sh b/packages/benchmark/tasks/dashboard-loader/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/dashboard-loader/task.toml b/packages/benchmark/tasks/dashboard-loader/task.toml new file mode 100644 index 000000000..6b66f08c4 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/dashboard-loader" +description = "Load three independent server resources without a request waterfall." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "dashboard-loader" +display_title = "Parallel dashboard loader" +display_description = "Load three independent server resources without a request waterfall." 
+family = "produce-clean" +target_dimensions = ["async-waterfall", "ts-strictness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/dashboard-loader/tests/test.patch b/packages/benchmark/tasks/dashboard-loader/tests/test.patch new file mode 100644 index 000000000..73979b56c --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/tests/test.patch @@ -0,0 +1,29 @@ +diff --git a/tests/load-dashboard.test.ts b/tests/load-dashboard.test.ts +new file mode 100644 +index 0000000..6b8aa58 +--- /dev/null ++++ b/tests/load-dashboard.test.ts +@@ -0,0 +1,23 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { loadDashboard } from "../src/load-dashboard.ts"; ++ ++test("combines the three sources into one object", async () => { ++ const data = await loadDashboard({ ++ fetchUser: async () => "Ada", ++ fetchStats: async () => 42, ++ fetchActivity: async () => ["login", "edit"], ++ }); ++ assert.deepEqual(data, { user: "Ada", stats: 42, activity: ["login", "edit"] }); ++}); ++ ++test("resolves every source value", async () => { ++ const data = await loadDashboard({ ++ fetchUser: async () => "Grace", ++ fetchStats: async () => 0, ++ fetchActivity: async () => [], ++ }); ++ assert.equal(data.user, "Grace"); ++ assert.equal(data.stats, 0); ++ assert.deepEqual(data.activity, []); ++}); diff --git a/packages/benchmark/tasks/dashboard-loader/tests/test.sh b/packages/benchmark/tasks/dashboard-loader/tests/test.sh new file mode 100755 index 000000000..a6df58e26 --- /dev/null +++ b/packages/benchmark/tasks/dashboard-loader/tests/test.sh @@ 
-0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/load-dashboard.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/format-duration-util/_authoring/hidden/tests/format-duration.test.ts b/packages/benchmark/tasks/format-duration-util/_authoring/hidden/tests/format-duration.test.ts new file mode 100644 index 000000000..58f20fc8f --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/_authoring/hidden/tests/format-duration.test.ts @@ -0,0 +1,25 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { formatDuration } from "../src/format-duration.ts"; + +test("returns 0s for zero and negative input", () => { + assert.equal(formatDuration(0), "0s"); + assert.equal(formatDuration(-10), "0s"); +}); + +test("renders seconds only under a minute", () => { + assert.equal(formatDuration(5_000), "5s"); +}); + +test("renders minutes and seconds", () => { + assert.equal(formatDuration(65_000), "1m 5s"); +}); + +test("drops trailing zero units", () => { + assert.equal(formatDuration(3_600_000), "1h"); +}); + +test("keeps a zero unit between two non-zero units", () => { + assert.equal(formatDuration(3_601_000), "1h 0m 1s"); + assert.equal(formatDuration(3_661_000), "1h 1m 1s"); +}); diff --git a/packages/benchmark/tasks/format-duration-util/_authoring/solved/src/format-duration.ts b/packages/benchmark/tasks/format-duration-util/_authoring/solved/src/format-duration.ts new file mode 100644 index 000000000..884e17253 --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/_authoring/solved/src/format-duration.ts @@ -0,0 +1,31 @@ +interface DurationUnit { + value: number; + suffix: string; +} + +// Compact "1h 0m 1s" label. 
Leading and trailing zero units are dropped, but a +// zero unit between two non-zero units is kept so the ordering stays readable. +export const formatDuration = (milliseconds: number): string => { + if (milliseconds <= 0) return "0s"; + + const totalSeconds = Math.floor(milliseconds / 1000); + const units: DurationUnit[] = [ + { value: Math.floor(totalSeconds / 3600), suffix: "h" }, + { value: Math.floor((totalSeconds % 3600) / 60), suffix: "m" }, + { value: totalSeconds % 60, suffix: "s" }, + ]; + + const firstNonZero = units.findIndex((unit) => unit.value > 0); + if (firstNonZero === -1) return "0s"; + + let lastNonZero = firstNonZero; + for (let index = firstNonZero; index < units.length; index++) { + const unit = units[index]; + if (unit && unit.value > 0) lastNonZero = index; + } + + return units + .slice(firstNonZero, lastNonZero + 1) + .map((unit) => `${unit.value}${unit.suffix}`) + .join(" "); +}; diff --git a/packages/benchmark/tasks/format-duration-util/environment/Dockerfile b/packages/benchmark/tasks/format-duration-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/format-duration-util/instruction.md b/packages/benchmark/tasks/format-duration-util/instruction.md new file mode 100644 index 000000000..15bc552c8 --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/instruction.md @@ -0,0 +1,25 @@ +Implement `formatDuration` in `src/format-duration.ts`. 
+ +## Expected behavior + +`formatDuration(milliseconds)` returns a compact human label built from hours, +minutes, and seconds (sub-second precision is dropped via truncation). + +- Units are space-separated, largest first, suffixed `h` / `m` / `s`: + `formatDuration(3_661_000)` → `"1h 1m 1s"`. +- Leading and trailing zero units are omitted, but a zero unit between two + non-zero units is kept: `formatDuration(3_600_000)` → `"1h"`, `formatDuration(3_601_000)` → + `"1h 0m 1s"`. +- Under a minute returns just seconds: `formatDuration(5_000)` → `"5s"`. +- Zero (and any negative input) returns `"0s"`. + +Examples: + +- `formatDuration(0)` → `"0s"` +- `formatDuration(65_000)` → `"1m 5s"` +- `formatDuration(-10)` → `"0s"` + +## Constraints + +Keep the exported `formatDuration(milliseconds: number): string` signature. Do +not change `src/elapsed-label.tsx`. diff --git a/packages/benchmark/tasks/format-duration-util/seed/package.json b/packages/benchmark/tasks/format-duration-util/seed/package.json new file mode 100644 index 000000000..eb61cf10e --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-format-duration", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/format-duration-util/seed/src/elapsed-label.tsx b/packages/benchmark/tasks/format-duration-util/seed/src/elapsed-label.tsx new file mode 100644 index 000000000..4da0bbaec --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/seed/src/elapsed-label.tsx @@ -0,0 +1,10 @@ +import { formatDuration } from "./format-duration.ts"; + +interface ElapsedLabelProps { + milliseconds: number; +} + +// Existing consumer (keeps format-duration.ts reachable). Do not edit.
+export const ElapsedLabel = ({ milliseconds }: ElapsedLabelProps) => ( + <span className="elapsed">{formatDuration(milliseconds)}</span> +); diff --git a/packages/benchmark/tasks/format-duration-util/seed/src/format-duration.ts b/packages/benchmark/tasks/format-duration-util/seed/src/format-duration.ts new file mode 100644 index 000000000..c76c6dd14 --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/seed/src/format-duration.ts @@ -0,0 +1,4 @@ +// TODO(agent): implement. See instruction.md. +export const formatDuration = (_milliseconds: number): string => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/format-duration-util/seed/tsconfig.json b/packages/benchmark/tasks/format-duration-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/format-duration-util/solution/solution.patch b/packages/benchmark/tasks/format-duration-util/solution/solution.patch new file mode 100644 index 000000000..be2a07b04 --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/solution/solution.patch @@ -0,0 +1,39 @@ +diff --git a/src/format-duration.ts b/src/format-duration.ts +index c76c6dd..884e172 100644 +--- a/src/format-duration.ts ++++ b/src/format-duration.ts +@@ -1,4 +1,31 @@ +-// TODO(agent): implement. See instruction.md. +-export const formatDuration = (_milliseconds: number): string => { +- throw new Error("not implemented"); ++interface DurationUnit { ++ value: number; ++ suffix: string; ++} ++ ++// Compact "1h 0m 1s" label. 
Leading and trailing zero units are dropped, but a ++// zero unit between two non-zero units is kept so the ordering stays readable. ++export const formatDuration = (milliseconds: number): string => { ++ if (milliseconds <= 0) return "0s"; ++ ++ const totalSeconds = Math.floor(milliseconds / 1000); ++ const units: DurationUnit[] = [ ++ { value: Math.floor(totalSeconds / 3600), suffix: "h" }, ++ { value: Math.floor((totalSeconds % 3600) / 60), suffix: "m" }, ++ { value: totalSeconds % 60, suffix: "s" }, ++ ]; ++ ++ const firstNonZero = units.findIndex((unit) => unit.value > 0); ++ if (firstNonZero === -1) return "0s"; ++ ++ let lastNonZero = firstNonZero; ++ for (let index = firstNonZero; index < units.length; index++) { ++ const unit = units[index]; ++ if (unit && unit.value > 0) lastNonZero = index; ++ } ++ ++ return units ++ .slice(firstNonZero, lastNonZero + 1) ++ .map((unit) => `${unit.value}${unit.suffix}`) ++ .join(" "); + }; diff --git a/packages/benchmark/tasks/format-duration-util/solution/solve.sh b/packages/benchmark/tasks/format-duration-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/format-duration-util/task.toml b/packages/benchmark/tasks/format-duration-util/task.toml new file mode 100644 index 000000000..f3ea7a24d --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/format-duration-util" +description = "Implement formatDuration(ms) producing a compact h/m/s label." 
+authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "format-duration-util" +display_title = "Format duration label" +display_description = "Implement formatDuration(ms) producing a compact h/m/s label." +family = "produce-clean" +target_dimensions = ["ts-strictness", "maintainability"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/format-duration-util/tests/test.patch b/packages/benchmark/tasks/format-duration-util/tests/test.patch new file mode 100644 index 000000000..8c461584f --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/tests/test.patch @@ -0,0 +1,31 @@ +diff --git a/tests/format-duration.test.ts b/tests/format-duration.test.ts +new file mode 100644 +index 0000000..58f20fc +--- /dev/null ++++ b/tests/format-duration.test.ts +@@ -0,0 +1,25 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { formatDuration } from "../src/format-duration.ts"; ++ ++test("returns 0s for zero and negative input", () => { ++ assert.equal(formatDuration(0), "0s"); ++ assert.equal(formatDuration(-10), "0s"); ++}); ++ ++test("renders seconds only under a minute", () => { ++ assert.equal(formatDuration(5_000), "5s"); ++}); ++ ++test("renders minutes and seconds", () => { ++ assert.equal(formatDuration(65_000), "1m 5s"); ++}); ++ ++test("drops trailing zero units", () => { ++ assert.equal(formatDuration(3_600_000), "1h"); ++}); ++ ++test("keeps a zero unit between two non-zero units", () => { ++ assert.equal(formatDuration(3_601_000), "1h 0m 1s"); ++ 
assert.equal(formatDuration(3_661_000), "1h 1m 1s"); ++}); diff --git a/packages/benchmark/tasks/format-duration-util/tests/test.sh b/packages/benchmark/tasks/format-duration-util/tests/test.sh new file mode 100755 index 000000000..338001146 --- /dev/null +++ b/packages/benchmark/tasks/format-duration-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/format-duration.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/format-list-extend/_authoring/hidden/tests/format-list.test.ts b/packages/benchmark/tasks/format-list-extend/_authoring/hidden/tests/format-list.test.ts new file mode 100644 index 000000000..fb30cd9d9 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/_authoring/hidden/tests/format-list.test.ts @@ -0,0 +1,19 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { formatList } from "../src/format-list.ts"; + +test("keeps existing default joining behavior", () => { + assert.equal(formatList([]), ""); + assert.equal(formatList(["a"]), "a"); + assert.equal(formatList(["a", "b"]), "a and b"); + assert.equal(formatList(["a", "b", "c"]), "a, b and c"); +}); + +test("adds an Oxford comma when requested for 3+ items", () => { + assert.equal(formatList(["a", "b", "c"], { oxford: true }), "a, b, and c"); +}); + +test("honors a custom conjunction", () => { + assert.equal(formatList(["a", "b", "c"], { conjunction: "or" }), "a, b or c"); + assert.equal(formatList(["a", "b"], { conjunction: "or" }), "a or b"); +}); diff --git a/packages/benchmark/tasks/format-list-extend/_authoring/solved/src/format-list.ts b/packages/benchmark/tasks/format-list-extend/_authoring/solved/src/format-list.ts new file mode 100644 index 000000000..6742df7e6 --- /dev/null +++ 
b/packages/benchmark/tasks/format-list-extend/_authoring/solved/src/format-list.ts @@ -0,0 +1,17 @@ +export interface FormatListOptions { + conjunction?: string; + oxford?: boolean; +} + +// Joins a list into a human sentence, with an optional Oxford comma. +export const formatList = (items: readonly string[], options: FormatListOptions = {}): string => { + const conjunction = options.conjunction ?? "and"; + if (items.length === 0) return ""; + if (items.length === 1) return items[0] ?? ""; + if (items.length === 2) return `${items[0]} ${conjunction} ${items[1]}`; + + const head = items.slice(0, -1).join(", "); + const last = items[items.length - 1]; + const oxfordComma = options.oxford ? "," : ""; + return `${head}${oxfordComma} ${conjunction} ${last}`; +}; diff --git a/packages/benchmark/tasks/format-list-extend/environment/Dockerfile b/packages/benchmark/tasks/format-list-extend/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/format-list-extend/instruction.md b/packages/benchmark/tasks/format-list-extend/instruction.md new file mode 100644 index 000000000..c79a292e9 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/instruction.md @@ -0,0 +1,24 @@ +`src/format-list.ts` joins a list of strings into a sentence. Extend it. + +## Expected behavior + +Change `formatList` to take an options object as its second argument: +`formatList(items, options?)` where `options` is +`{ conjunction?: string; oxford?: boolean }`. + +- `conjunction` defaults to `"and"`. 
+- Existing joining behavior is unchanged by default: + - `formatList([])` → `""` + - `formatList(["a"])` → `"a"` + - `formatList(["a", "b"])` → `"a and b"` + - `formatList(["a", "b", "c"])` → `"a, b and c"` +- New: when `options.oxford` is `true` and there are 3+ items, place a comma + before the conjunction (the Oxford comma): + - `formatList(["a", "b", "c"], { oxford: true })` → `"a, b, and c"` +- `conjunction` still applies: + - `formatList(["a", "b", "c"], { conjunction: "or" })` → `"a, b or c"` + +## Constraints + +Export `formatList` with the new `(items: string[], options?) => string` +signature. Do not change `src/attendees-label.tsx` (it calls `formatList(names)`). diff --git a/packages/benchmark/tasks/format-list-extend/seed/package.json b/packages/benchmark/tasks/format-list-extend/seed/package.json new file mode 100644 index 000000000..d4322f8e4 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-format-list-extend", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/format-list-extend/seed/src/attendees-label.tsx b/packages/benchmark/tasks/format-list-extend/seed/src/attendees-label.tsx new file mode 100644 index 000000000..6a69c1f98 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/seed/src/attendees-label.tsx @@ -0,0 +1,10 @@ +import { formatList } from "./format-list.ts"; + +interface AttendeesLabelProps { + names: string[]; +} + +// Existing consumer (keeps format-list.ts reachable). Do not edit. 
+export const AttendeesLabel = ({ names }: AttendeesLabelProps) => ( + <span className="attendees">{formatList(names)}</span> +); diff --git a/packages/benchmark/tasks/format-list-extend/seed/src/format-list.ts b/packages/benchmark/tasks/format-list-extend/seed/src/format-list.ts new file mode 100644 index 000000000..f64566516 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/seed/src/format-list.ts @@ -0,0 +1,11 @@ +// Joins a list of names into a human sentence, e.g. ["a","b","c"] -> "a, b and c". +export function formatList(items: any, conjunction?: any): any { + const c = conjunction ? conjunction : "and"; + return items.length === 0 + ? "" + : items.length === 1 + ? items[0] + : items.length === 2 + ? items[0] + " " + c + " " + items[1] + : items.slice(0, -1).join(", ") + " " + c + " " + items[items.length - 1]; +} diff --git a/packages/benchmark/tasks/format-list-extend/seed/tsconfig.json b/packages/benchmark/tasks/format-list-extend/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/format-list-extend/solution/solution.patch b/packages/benchmark/tasks/format-list-extend/solution/solution.patch new file mode 100644 index 000000000..c002142b0 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/solution/solution.patch @@ -0,0 +1,32 @@ +diff --git a/src/format-list.ts b/src/format-list.ts +index f645665..6742df7 100644 +--- a/src/format-list.ts ++++ b/src/format-list.ts +@@ -1,11 +1,17 @@ +-// Joins a list of names into a human sentence, e.g. ["a","b","c"] -> "a, b and c". 
+-export function formatList(items: any, conjunction?: any): any { +- const c = conjunction ? conjunction : "and"; +- return items.length === 0 +- ? "" +- : items.length === 1 +- ? items[0] +- : items.length === 2 +- ? items[0] + " " + c + " " + items[1] +- : items.slice(0, -1).join(", ") + " " + c + " " + items[items.length - 1]; ++export interface FormatListOptions { ++ conjunction?: string; ++ oxford?: boolean; + } ++ ++// Joins a list into a human sentence, with an optional Oxford comma. ++export const formatList = (items: readonly string[], options: FormatListOptions = {}): string => { ++ const conjunction = options.conjunction ?? "and"; ++ if (items.length === 0) return ""; ++ if (items.length === 1) return items[0] ?? ""; ++ if (items.length === 2) return `${items[0]} ${conjunction} ${items[1]}`; ++ ++ const head = items.slice(0, -1).join(", "); ++ const last = items[items.length - 1]; ++ const oxfordComma = options.oxford ? "," : ""; ++ return `${head}${oxfordComma} ${conjunction} ${last}`; ++}; diff --git a/packages/benchmark/tasks/format-list-extend/solution/solve.sh b/packages/benchmark/tasks/format-list-extend/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/format-list-extend/task.toml b/packages/benchmark/tasks/format-list-extend/task.toml new file mode 100644 index 000000000..f3c68a3b8 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/format-list-extend" +description = "Extend a sloppy formatList to support an Oxford-comma option while keeping behavior." 
+authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "format-list-extend" +display_title = "Extend formatList with Oxford comma" +display_description = "Extend a sloppy formatList to support an Oxford-comma option while keeping behavior." +family = "handle-slop" +target_dimensions = ["maintainability", "ts-strictness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/format-list-extend/tests/test.patch b/packages/benchmark/tasks/format-list-extend/tests/test.patch new file mode 100644 index 000000000..a3cfd2dc3 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/tests/test.patch @@ -0,0 +1,25 @@ +diff --git a/tests/format-list.test.ts b/tests/format-list.test.ts +new file mode 100644 +index 0000000..fb30cd9 +--- /dev/null ++++ b/tests/format-list.test.ts +@@ -0,0 +1,19 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { formatList } from "../src/format-list.ts"; ++ ++test("keeps existing default joining behavior", () => { ++ assert.equal(formatList([]), ""); ++ assert.equal(formatList(["a"]), "a"); ++ assert.equal(formatList(["a", "b"]), "a and b"); ++ assert.equal(formatList(["a", "b", "c"]), "a, b and c"); ++}); ++ ++test("adds an Oxford comma when requested for 3+ items", () => { ++ assert.equal(formatList(["a", "b", "c"], { oxford: true }), "a, b, and c"); ++}); ++ ++test("honors a custom conjunction", () => { ++ assert.equal(formatList(["a", "b", "c"], { conjunction: "or" }), "a, b or c"); ++ assert.equal(formatList(["a", "b"], { conjunction: "or" }), "a or b"); 
++}); diff --git a/packages/benchmark/tasks/format-list-extend/tests/test.sh b/packages/benchmark/tasks/format-list-extend/tests/test.sh new file mode 100755 index 000000000..658edfba1 --- /dev/null +++ b/packages/benchmark/tasks/format-list-extend/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/format-list.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/format-money-util/_authoring/hidden/tests/format-money.test.ts b/packages/benchmark/tasks/format-money-util/_authoring/hidden/tests/format-money.test.ts new file mode 100644 index 000000000..c330bb463 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/_authoring/hidden/tests/format-money.test.ts @@ -0,0 +1,34 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { formatMoney } from "../src/format-money.ts"; + +test("formats USD by default with two decimals", () => { + assert.equal(formatMoney(1234), "$12.34"); + assert.equal(formatMoney(0), "$0.00"); +}); + +test("supports known currency symbols", () => { + assert.equal(formatMoney(500, { currency: "EUR" }), "€5.00"); + assert.equal(formatMoney(500, { currency: "GBP" }), "£5.00"); +}); + +test("treats JPY as a zero-decimal currency", () => { + assert.equal(formatMoney(1200, { currency: "JPY" }), "¥1,200"); +}); + +test("falls back to an uppercased code prefix for unknown currencies", () => { + assert.equal(formatMoney(500, { currency: "chf" }), "CHF 5.00"); +}); + +test("renders negatives with a leading minus", () => { + assert.equal(formatMoney(-1234), "-$12.34"); +}); + +test("trims zero cents only for whole amounts when asked", () => { + assert.equal(formatMoney(1000, { trimZeroCents: true }), "$10"); + assert.equal(formatMoney(1050, { trimZeroCents: true }), "$10.50"); +}); + +test("groups thousands with 
commas", () => { + assert.equal(formatMoney(123456789), "$1,234,567.89"); +}); diff --git a/packages/benchmark/tasks/format-money-util/_authoring/solved/src/format-money.ts b/packages/benchmark/tasks/format-money-util/_authoring/solved/src/format-money.ts new file mode 100644 index 000000000..3e2fa0521 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/_authoring/solved/src/format-money.ts @@ -0,0 +1,47 @@ +export interface FormatMoneyOptions { + // ISO 4217 currency code, e.g. "USD", "EUR", "JPY". Defaults to "USD". + currency?: string; + // When true, drop the fractional part for whole amounts ($10 instead of $10.00). + trimZeroCents?: boolean; +} + +interface CurrencyFormat { + symbol: string; + fractionDigits: number; +} + +const CURRENCY_FORMATS: Record<string, CurrencyFormat> = { + USD: { symbol: "$", fractionDigits: 2 }, + EUR: { symbol: "€", fractionDigits: 2 }, + GBP: { symbol: "£", fractionDigits: 2 }, + JPY: { symbol: "¥", fractionDigits: 0 }, +}; + +const groupThousands = (digits: string): string => digits.replace(/\B(?=(\d{3})+(?!\d))/g, ","); + +const resolveFormat = (currency: string): CurrencyFormat => { + const known = CURRENCY_FORMATS[currency]; + if (known) return known; + return { symbol: `${currency} `, fractionDigits: 2 }; +}; + +export const formatMoney = (amountCents: number, options: FormatMoneyOptions = {}): string => { + const currency = (options.currency ?? "USD").toUpperCase(); + const format = resolveFormat(currency); + const isNegative = amountCents < 0; + const absoluteCents = Math.abs(amountCents); + + if (format.fractionDigits === 0) { + const whole = groupThousands(String(absoluteCents)); + return `${isNegative ? 
"-" : ""}${format.symbol}${whole}`; + } + + const divisor = 10 ** format.fractionDigits; + const major = Math.floor(absoluteCents / divisor); + const minor = absoluteCents % divisor; + const groupedMajor = groupThousands(String(major)); + const showDecimals = !(options.trimZeroCents && minor === 0); + const fraction = showDecimals ? `.${String(minor).padStart(format.fractionDigits, "0")}` : ""; + + return `${isNegative ? "-" : ""}${format.symbol}${groupedMajor}${fraction}`; +}; diff --git a/packages/benchmark/tasks/format-money-util/environment/Dockerfile b/packages/benchmark/tasks/format-money-util/environment/Dockerfile new file mode 100644 index 000000000..f0a7a2504 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/environment/Dockerfile @@ -0,0 +1,13 @@ +FROM slopbench-base:latest + +WORKDIR /app + +# In-tree seed committed as the base commit. No dependency install needed — +# the functional test uses Node's built-in test runner with type stripping. +COPY seed/ . +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/format-money-util/instruction.md b/packages/benchmark/tasks/format-money-util/instruction.md new file mode 100644 index 000000000..ca319aa52 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/instruction.md @@ -0,0 +1,28 @@ +Implement the `formatMoney` utility in `src/format-money.ts`. + +## Expected behavior + +`formatMoney(amountCents, options?)` converts an integer amount in **minor +units** (cents) into a display string. + +- The amount is always an integer number of cents. Divide by 100 for the major + unit. Example: `1234` → `"$12.34"`. +- `options.currency` is an ISO 4217 code (default `"USD"`). Render the correct + symbol for at least: `USD` → `$`, `EUR` → `€`, `GBP` → `£`, `JPY` → `¥`. 
+ For any other code, prefix the amount with the uppercased code and a space, + e.g. `formatMoney(500, { currency: "chf" })` → `"CHF 5.00"`. +- `JPY` has **no minor unit**: render no decimals and treat `amountCents` as + whole yen — `formatMoney(1200, { currency: "JPY" })` → `"¥1,200"`. +- Negative amounts render with a leading minus before the symbol: + `formatMoney(-1234)` → `"-$12.34"`. +- Always show exactly two decimals for minor-unit currencies, **unless** + `options.trimZeroCents` is `true` and the amount is a whole major unit, in + which case drop the decimals: `formatMoney(1000, { trimZeroCents: true })` + → `"$10"`, but `formatMoney(1050, { trimZeroCents: true })` → `"$10.50"`. +- Group the integer part with commas: `formatMoney(123456789)` → + `"$1,234,567.89"` (grouping applies to every currency, including `JPY`). + +## Constraints + +Keep the exported `formatMoney` signature and the `FormatMoneyOptions` +interface. Do not change `src/price-tag.tsx`. diff --git a/packages/benchmark/tasks/format-money-util/seed/package.json b/packages/benchmark/tasks/format-money-util/seed/package.json new file mode 100644 index 000000000..119c8d7c4 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-format-money", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/format-money-util/seed/src/format-money.ts b/packages/benchmark/tasks/format-money-util/seed/src/format-money.ts new file mode 100644 index 000000000..9f0e0ec7d --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/seed/src/format-money.ts @@ -0,0 +1,11 @@ +export interface FormatMoneyOptions { + // ISO 4217 currency code, e.g. "USD", "EUR", "JPY". Defaults to "USD". + currency?: string; + // When true, drop the fractional part for whole amounts ($10 instead of $10.00). 
+ trimZeroCents?: boolean; +} + +// TODO(agent): implement. See instruction.md for the exact contract. +export const formatMoney = (_amountCents: number, _options?: FormatMoneyOptions): string => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/format-money-util/seed/src/price-tag.tsx b/packages/benchmark/tasks/format-money-util/seed/src/price-tag.tsx new file mode 100644 index 000000000..84468ec1f --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/seed/src/price-tag.tsx @@ -0,0 +1,12 @@ +import { formatMoney } from "./format-money.ts"; + +interface PriceTagProps { + amountCents: number; + currency?: string; +} + +// Existing component that consumes the util (keeps format-money.ts reachable). +// Not part of the task — do not edit. +export const PriceTag = ({ amountCents, currency }: PriceTagProps) => ( + <span className="price-tag">{formatMoney(amountCents, { currency })}</span> +); diff --git a/packages/benchmark/tasks/format-money-util/seed/tsconfig.json b/packages/benchmark/tasks/format-money-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/format-money-util/solution/solution.patch b/packages/benchmark/tasks/format-money-util/solution/solution.patch new file mode 100644 index 000000000..2e1480b4d --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/solution/solution.patch @@ -0,0 +1,51 @@ +diff --git a/src/format-money.ts b/src/format-money.ts +index 9f0e0ec..3e2fa05 100644 +--- a/src/format-money.ts ++++ b/src/format-money.ts +@@ -5,7 +5,43 @@ export interface FormatMoneyOptions { + 
trimZeroCents?: boolean; + } + +-// TODO(agent): implement. See instruction.md for the exact contract. +-export const formatMoney = (_amountCents: number, _options?: FormatMoneyOptions): string => { +- throw new Error("not implemented"); ++interface CurrencyFormat { ++ symbol: string; ++ fractionDigits: number; ++} ++ ++const CURRENCY_FORMATS: Record<string, CurrencyFormat> = { ++ USD: { symbol: "$", fractionDigits: 2 }, ++ EUR: { symbol: "€", fractionDigits: 2 }, ++ GBP: { symbol: "£", fractionDigits: 2 }, ++ JPY: { symbol: "¥", fractionDigits: 0 }, ++}; ++ ++const groupThousands = (digits: string): string => digits.replace(/\B(?=(\d{3})+(?!\d))/g, ","); ++ ++const resolveFormat = (currency: string): CurrencyFormat => { ++ const known = CURRENCY_FORMATS[currency]; ++ if (known) return known; ++ return { symbol: `${currency} `, fractionDigits: 2 }; ++}; ++ ++export const formatMoney = (amountCents: number, options: FormatMoneyOptions = {}): string => { ++ const currency = (options.currency ?? "USD").toUpperCase(); ++ const format = resolveFormat(currency); ++ const isNegative = amountCents < 0; ++ const absoluteCents = Math.abs(amountCents); ++ ++ if (format.fractionDigits === 0) { ++ const whole = groupThousands(String(absoluteCents)); ++ return `${isNegative ? "-" : ""}${format.symbol}${whole}`; ++ } ++ ++ const divisor = 10 ** format.fractionDigits; ++ const major = Math.floor(absoluteCents / divisor); ++ const minor = absoluteCents % divisor; ++ const groupedMajor = groupThousands(String(major)); ++ const showDecimals = !(options.trimZeroCents && minor === 0); ++ const fraction = showDecimals ? `.${String(minor).padStart(format.fractionDigits, "0")}` : ""; ++ ++ return `${isNegative ? 
"-" : ""}${format.symbol}${groupedMajor}${fraction}`; + }; diff --git a/packages/benchmark/tasks/format-money-util/solution/solve.sh b/packages/benchmark/tasks/format-money-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/format-money-util/task.toml b/packages/benchmark/tasks/format-money-util/task.toml new file mode 100644 index 000000000..d00460b9c --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/format-money-util" +description = "Implement a currency-formatting utility used by a React price tag." +authors = [] +keywords = ["typescript", "utility", "formatting", "slop"] + +[metadata] +task_id = "format-money-util" +display_title = "Currency formatter utility" +display_description = "Implement formatMoney(amountCents, options) with currency, JPY, negatives, trimming, and grouping." 
+family = "produce-clean" +target_dimensions = ["ts-strictness", "maintainability"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/format-money-util/tests/test.patch b/packages/benchmark/tasks/format-money-util/tests/test.patch new file mode 100644 index 000000000..86cbc75b2 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/tests/test.patch @@ -0,0 +1,40 @@ +diff --git a/tests/format-money.test.ts b/tests/format-money.test.ts +new file mode 100644 +index 0000000..c330bb4 +--- /dev/null ++++ b/tests/format-money.test.ts +@@ -0,0 +1,34 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { formatMoney } from "../src/format-money.ts"; ++ ++test("formats USD by default with two decimals", () => { ++ assert.equal(formatMoney(1234), "$12.34"); ++ assert.equal(formatMoney(0), "$0.00"); ++}); ++ ++test("supports known currency symbols", () => { ++ assert.equal(formatMoney(500, { currency: "EUR" }), "€5.00"); ++ assert.equal(formatMoney(500, { currency: "GBP" }), "£5.00"); ++}); ++ ++test("treats JPY as a zero-decimal currency", () => { ++ assert.equal(formatMoney(1200, { currency: "JPY" }), "¥1,200"); ++}); ++ ++test("falls back to an uppercased code prefix for unknown currencies", () => { ++ assert.equal(formatMoney(500, { currency: "chf" }), "CHF 5.00"); ++}); ++ ++test("renders negatives with a leading minus", () => { ++ assert.equal(formatMoney(-1234), "-$12.34"); ++}); ++ ++test("trims zero cents only for whole amounts when asked", () => { ++ assert.equal(formatMoney(1000, { trimZeroCents: true }), "$10"); ++ 
assert.equal(formatMoney(1050, { trimZeroCents: true }), "$10.50"); ++}); ++ ++test("groups thousands with commas", () => { ++ assert.equal(formatMoney(123456789), "$1,234,567.89"); ++}); diff --git a/packages/benchmark/tasks/format-money-util/tests/test.sh b/packages/benchmark/tasks/format-money-util/tests/test.sh new file mode 100755 index 000000000..f650bab01 --- /dev/null +++ b/packages/benchmark/tasks/format-money-util/tests/test.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash +set -euo pipefail + +# In-tree seed: the base commit is the repo's root commit (created when the +# image seeds the project), resolved at runtime so no fixed sha is needed. +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" + +# Pure-TS task: run with Node's built-in test runner + type stripping (offline, +# no dependency install). +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/format-money.test.ts" + +exec slopbench-grade diff --git a/packages/benchmark/tasks/group-by-extend/_authoring/hidden/tests/group-by.test.ts b/packages/benchmark/tasks/group-by-extend/_authoring/hidden/tests/group-by.test.ts new file mode 100644 index 000000000..a08c1872a --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/_authoring/hidden/tests/group-by.test.ts @@ -0,0 +1,26 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { groupBy } from "../src/group-by.ts"; + +test("groups by a property name (existing behavior)", () => { + const result = groupBy([{ t: "a" }, { t: "b" }, { t: "a" }], "t"); + assert.deepEqual(result, { a: [{ t: "a" }, { t: "a" }], b: [{ t: "b" }] }); +}); + +test("groups by a selector function (new behavior)", () => { + const result = groupBy([1, 2, 3, 4], (n: number) => (n % 2 === 0 ? 
"even" : "odd")); + assert.deepEqual(result, { odd: [1, 3], even: [2, 4] }); +}); + +test("keeps first-seen order of items within a group", () => { + const items = [ + { id: 1, g: "x" }, + { id: 2, g: "x" }, + { id: 3, g: "x" }, + ]; + const result = groupBy(items, "g"); + assert.deepEqual( + result.x.map((item) => item.id), + [1, 2, 3], + ); +}); diff --git a/packages/benchmark/tasks/group-by-extend/_authoring/solved/src/group-by.ts b/packages/benchmark/tasks/group-by-extend/_authoring/solved/src/group-by.ts new file mode 100644 index 000000000..96a04dded --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/_authoring/solved/src/group-by.ts @@ -0,0 +1,16 @@ +export type GroupKey = string | number; + +// Groups a list by a property name or a selector function. The result maps each +// distinct key (stringified) to the items that produced it, in first-seen order. +export const groupBy = <Item>( + items: readonly Item[], + key: keyof Item | ((item: Item) => GroupKey), +): Record<string, Item[]> => { + const deriveKey = typeof key === "function" ? key : (item: Item) => String(item[key]); + const result: Record<string, Item[]> = {}; + for (const item of items) { + const groupKey = String(deriveKey(item)); + (result[groupKey] ??= []).push(item); + } + return result; +}; diff --git a/packages/benchmark/tasks/group-by-extend/environment/Dockerfile b/packages/benchmark/tasks/group-by-extend/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). 
+RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/group-by-extend/instruction.md b/packages/benchmark/tasks/group-by-extend/instruction.md new file mode 100644 index 000000000..8ac9441c8 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/instruction.md @@ -0,0 +1,25 @@ +`src/group-by.ts` groups a list of records by a property name. Extend it. + +## Expected behavior + +`groupBy(items, key)` must support **two** forms of `key`: + +1. A **property name** (existing behavior): `groupBy(items, "category")` groups + by `item.category`. +2. A **selector function**: `groupBy(items, (item) => …)` groups by the value the + function returns for each item. + +In both cases the result is an object mapping each distinct key (as a string) to +the array of items that produced it, in first-seen order. Existing callers that +pass a property name must keep working unchanged. + +Examples: + +- `groupBy([{ t: "a" }, { t: "b" }, { t: "a" }], "t")` → + `{ a: [{ t: "a" }, { t: "a" }], b: [{ t: "b" }] }` +- `groupBy([1, 2, 3, 4], (n) => (n % 2 === 0 ? "even" : "odd"))` → + `{ odd: [1, 3], even: [2, 4] }` + +## Constraints + +Keep the export named `groupBy`. Do not change `src/inventory-report.ts`. 
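This is a minimal usage sketch of the extended `groupBy` contract, exercised against the `InventoryItem` shape from `src/inventory-report.ts`. It assumes the reference behavior from the solution patch. The sketch inlines its own copy so it runs standalone, restructured slightly: the `toSelector` helper is hypothetical, and buckets are created without `??=`. The sample rows and the `"in-stock"`/`"sold-out"` labels are illustrative only and are not part of the task.

```typescript
type GroupKey = string | number;

// Hypothetical alias: a key is either a property name or a selector function.
type KeySelector<Item> = keyof Item | ((item: Item) => GroupKey);

// Hypothetical helper: normalize both key forms into one selector function.
const toSelector = <Item>(key: KeySelector<Item>): ((item: Item) => unknown) => {
  if (typeof key === "function") return key; // already a selector
  const property = key; // narrowed to keyof Item
  return (item) => item[property];
};

// Standalone copy of the reference behavior: stringified keys, first-seen order.
const groupBy = <Item>(items: readonly Item[], key: KeySelector<Item>): Record<string, Item[]> => {
  const select = toSelector(key);
  const result: Record<string, Item[]> = {};
  for (const item of items) {
    const groupKey = String(select(item));
    const bucket = result[groupKey];
    if (bucket) {
      bucket.push(item);
    } else {
      result[groupKey] = [item]; // first item seen for this key
    }
  }
  return result;
};

interface InventoryItem {
  sku: string;
  category: string;
  quantity: number;
}

// Illustrative sample data (not from the task).
const items: InventoryItem[] = [
  { sku: "A1", category: "tools", quantity: 3 },
  { sku: "B2", category: "paint", quantity: 0 },
  { sku: "C3", category: "tools", quantity: 12 },
];

// Property-name form, as the existing groupByCategory consumer uses it.
const byCategory = groupBy(items, "category");

// Selector form: group by a value derived from each item.
const byStock = groupBy(items, (item) => (item.quantity > 0 ? "in-stock" : "sold-out"));

console.log(Object.keys(byCategory)); // [ 'tools', 'paint' ]
console.log(byStock["in-stock"].map((item) => item.sku)); // [ 'A1', 'C3' ]
```

Because both forms funnel through a single selector, the grouping loop stays identical. The existing `groupBy(items, "category")` caller therefore keeps working without changes.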
diff --git a/packages/benchmark/tasks/group-by-extend/seed/package.json b/packages/benchmark/tasks/group-by-extend/seed/package.json new file mode 100644 index 000000000..c70f26cf2 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-group-by", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/group-by-extend/seed/src/group-by.ts b/packages/benchmark/tasks/group-by-extend/seed/src/group-by.ts new file mode 100644 index 000000000..2e0308b68 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/seed/src/group-by.ts @@ -0,0 +1,14 @@ +// Groups a list of records by the value of a property. Currently only supports +// a property name as the key selector. +export function groupBy(items: any, key: any): any { + const result: any = {}; + for (let i = 0; i < items.length; i++) { + const item = items[i]; + const k = item[key]; + if (result[k] === undefined) { + result[k] = []; + } + result[k].push(item); + } + return result; +} diff --git a/packages/benchmark/tasks/group-by-extend/seed/src/inventory-report.ts b/packages/benchmark/tasks/group-by-extend/seed/src/inventory-report.ts new file mode 100644 index 000000000..bc7a9e272 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/seed/src/inventory-report.ts @@ -0,0 +1,10 @@ +import { groupBy } from "./group-by.ts"; + +export interface InventoryItem { + sku: string; + category: string; + quantity: number; +} + +// Existing consumer (keeps group-by.ts reachable). Do not edit. 
+export const groupByCategory = (items: InventoryItem[]) => groupBy(items, "category"); diff --git a/packages/benchmark/tasks/group-by-extend/seed/tsconfig.json b/packages/benchmark/tasks/group-by-extend/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/group-by-extend/solution/solution.patch b/packages/benchmark/tasks/group-by-extend/solution/solution.patch new file mode 100644 index 000000000..9384b2375 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/solution/solution.patch @@ -0,0 +1,33 @@ +diff --git a/src/group-by.ts b/src/group-by.ts +index 2e0308b..96a04dd 100644 +--- a/src/group-by.ts ++++ b/src/group-by.ts +@@ -1,14 +1,16 @@ +-// Groups a list of records by the value of a property. Currently only supports +-// a property name as the key selector. +-export function groupBy(items: any, key: any): any { +- const result: any = {}; +- for (let i = 0; i < items.length; i++) { +- const item = items[i]; +- const k = item[key]; +- if (result[k] === undefined) { +- result[k] = []; +- } +- result[k].push(item); ++export type GroupKey = string | number; ++ ++// Groups a list by a property name or a selector function. The result maps each ++// distinct key (stringified) to the items that produced it, in first-seen order. ++export const groupBy = <Item>( ++ items: readonly Item[], ++ key: keyof Item | ((item: Item) => GroupKey), ++): Record<string, Item[]> => { ++ const deriveKey = typeof key === "function" ? 
key : (item: Item) => String(item[key]); ++ const result: Record<string, Item[]> = {}; ++ for (const item of items) { ++ const groupKey = String(deriveKey(item)); ++ (result[groupKey] ??= []).push(item); + } + return result; +-} ++}; diff --git a/packages/benchmark/tasks/group-by-extend/solution/solve.sh b/packages/benchmark/tasks/group-by-extend/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/group-by-extend/task.toml b/packages/benchmark/tasks/group-by-extend/task.toml new file mode 100644 index 000000000..1f98d7608 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/group-by-extend" +description = "Extend a sloppy groupBy to accept a selector function while keeping behavior." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "group-by-extend" +display_title = "Extend groupBy with a selector" +display_description = "Extend a sloppy groupBy to accept a selector function while keeping behavior." 
+family = "handle-slop" +target_dimensions = ["ts-strictness", "maintainability"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/group-by-extend/tests/test.patch b/packages/benchmark/tasks/group-by-extend/tests/test.patch new file mode 100644 index 000000000..da7c63ede --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/tests/test.patch @@ -0,0 +1,32 @@ +diff --git a/tests/group-by.test.ts b/tests/group-by.test.ts +new file mode 100644 +index 0000000..a08c187 +--- /dev/null ++++ b/tests/group-by.test.ts +@@ -0,0 +1,26 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { groupBy } from "../src/group-by.ts"; ++ ++test("groups by a property name (existing behavior)", () => { ++ const result = groupBy([{ t: "a" }, { t: "b" }, { t: "a" }], "t"); ++ assert.deepEqual(result, { a: [{ t: "a" }, { t: "a" }], b: [{ t: "b" }] }); ++}); ++ ++test("groups by a selector function (new behavior)", () => { ++ const result = groupBy([1, 2, 3, 4], (n: number) => (n % 2 === 0 ? 
"even" : "odd")); ++ assert.deepEqual(result, { odd: [1, 3], even: [2, 4] }); ++}); ++ ++test("keeps first-seen order of items within a group", () => { ++ const items = [ ++ { id: 1, g: "x" }, ++ { id: 2, g: "x" }, ++ { id: 3, g: "x" }, ++ ]; ++ const result = groupBy(items, "g"); ++ assert.deepEqual( ++ result.x.map((item) => item.id), ++ [1, 2, 3], ++ ); ++}); diff --git a/packages/benchmark/tasks/group-by-extend/tests/test.sh b/packages/benchmark/tasks/group-by-extend/tests/test.sh new file mode 100755 index 000000000..da9c1cfc4 --- /dev/null +++ b/packages/benchmark/tasks/group-by-extend/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/group-by.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/icon-button-a11y/_authoring/hidden/tests/icon-button.test.tsx b/packages/benchmark/tasks/icon-button-a11y/_authoring/hidden/tests/icon-button.test.tsx new file mode 100644 index 000000000..ac5ccf5b2 --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/_authoring/hidden/tests/icon-button.test.tsx @@ -0,0 +1,15 @@ +import { test, expect } from "vitest"; +import { renderToStaticMarkup } from "react-dom/server"; +import { IconButton } from "../src/icon-button.tsx"; + +const render = () => + renderToStaticMarkup(<IconButton label="Close" glyph={"\u00d7"} onPress={() => {}} />); + +test("renders a control with the accessible name", () => { + const html = render(); + expect(html).toContain('aria-label="Close"'); +}); + +test("displays the glyph", () => { + expect(render()).toContain("\u00d7"); +}); diff --git a/packages/benchmark/tasks/icon-button-a11y/_authoring/solved/src/icon-button.tsx b/packages/benchmark/tasks/icon-button-a11y/_authoring/solved/src/icon-button.tsx new file mode 100644 index 000000000..b8d039416 --- /dev/null +++ 
b/packages/benchmark/tasks/icon-button-a11y/_authoring/solved/src/icon-button.tsx @@ -0,0 +1,11 @@ +export interface IconButtonProps { + label: string; + glyph: string; + onPress: () => void; +} + +export const IconButton = ({ label, glyph, onPress }: IconButtonProps) => ( + <button type="button" aria-label={label} onClick={onPress} className="icon-button"> + <span aria-hidden="true">{glyph}</span> + </button> +); diff --git a/packages/benchmark/tasks/icon-button-a11y/environment/Dockerfile b/packages/benchmark/tasks/icon-button-a11y/environment/Dockerfile new file mode 100644 index 000000000..fcbfdb374 --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +RUN pnpm install --frozen-lockfile --ignore-scripts || pnpm install --ignore-scripts +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/icon-button-a11y/instruction.md b/packages/benchmark/tasks/icon-button-a11y/instruction.md new file mode 100644 index 000000000..1fb6eb1dd --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/instruction.md @@ -0,0 +1,22 @@ +Implement the `IconButton` component in `src/icon-button.tsx`. + +## Expected behavior + +`IconButton` renders an icon-only clickable control. It receives: + +- `label` — an accessible name for the control. +- `glyph` — the icon character to display (e.g. `"×"`). +- `onPress` — called when the control is activated. + +The rendered control must: + +- expose the accessible name `label` to assistive technology, +- display the `glyph` as its visible content, +- invoke `onPress` when activated. + +Example: `<IconButton label="Close" glyph="×" onPress={fn} />` renders a control +named "Close" showing `×`. 
+ +## Constraints + +Keep the exported `IconButton` component and the `IconButtonProps` type. diff --git a/packages/benchmark/tasks/icon-button-a11y/seed/package.json b/packages/benchmark/tasks/icon-button-a11y/seed/package.json new file mode 100644 index 000000000..c9e71c6ea --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/seed/package.json @@ -0,0 +1,13 @@ +{ + "name": "slopbench-icon-button-a11y", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + }, + "devDependencies": { + "vitest": "^4.1.8" + } +} diff --git a/packages/benchmark/tasks/icon-button-a11y/seed/src/icon-button.tsx b/packages/benchmark/tasks/icon-button-a11y/seed/src/icon-button.tsx new file mode 100644 index 000000000..890537942 --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/seed/src/icon-button.tsx @@ -0,0 +1,10 @@ +export interface IconButtonProps { + label: string; + glyph: string; + onPress: () => void; +} + +// TODO(agent): implement. See instruction.md. 
+export const IconButton = (_props: IconButtonProps) => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/icon-button-a11y/seed/tsconfig.json b/packages/benchmark/tasks/icon-button-a11y/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/icon-button-a11y/seed/vitest.config.ts b/packages/benchmark/tasks/icon-button-a11y/seed/vitest.config.ts new file mode 100644 index 000000000..8409b1f8e --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/seed/vitest.config.ts @@ -0,0 +1,9 @@ +import { defineConfig } from "vitest/config"; + +export default defineConfig({ + esbuild: { jsx: "automatic" }, + test: { + environment: "node", + include: ["tests/**/*.test.tsx"], + }, +}); diff --git a/packages/benchmark/tasks/icon-button-a11y/solution/solution.patch b/packages/benchmark/tasks/icon-button-a11y/solution/solution.patch new file mode 100644 index 000000000..61ae6fee6 --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/solution/solution.patch @@ -0,0 +1,17 @@ +diff --git a/src/icon-button.tsx b/src/icon-button.tsx +index 8905379..b8d0394 100644 +--- a/src/icon-button.tsx ++++ b/src/icon-button.tsx +@@ -4,7 +4,8 @@ export interface IconButtonProps { + onPress: () => void; + } + +-// TODO(agent): implement. See instruction.md. 
+-export const IconButton = (_props: IconButtonProps) => { +- throw new Error("not implemented"); +-}; ++export const IconButton = ({ label, glyph, onPress }: IconButtonProps) => ( ++ <button type="button" aria-label={label} onClick={onPress} className="icon-button"> ++ <span aria-hidden="true">{glyph}</span> ++ </button> ++); diff --git a/packages/benchmark/tasks/icon-button-a11y/solution/solve.sh b/packages/benchmark/tasks/icon-button-a11y/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/icon-button-a11y/task.toml b/packages/benchmark/tasks/icon-button-a11y/task.toml new file mode 100644 index 000000000..94a15ee2e --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/icon-button-a11y" +description = "Implement an icon-only button with an accessible name (real button, not a div)." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "icon-button-a11y" +display_title = "Accessible icon button" +display_description = "Implement an icon-only button with an accessible name (real button, not a div)." 
+family = "produce-clean" +target_dimensions = ["accessibility", "react-correctness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/icon-button-a11y/tests/test.patch b/packages/benchmark/tasks/icon-button-a11y/tests/test.patch new file mode 100644 index 000000000..87993f3f0 --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/tests/test.patch @@ -0,0 +1,21 @@ +diff --git a/tests/icon-button.test.tsx b/tests/icon-button.test.tsx +new file mode 100644 +index 0000000..ac5ccf5 +--- /dev/null ++++ b/tests/icon-button.test.tsx +@@ -0,0 +1,15 @@ ++import { test, expect } from "vitest"; ++import { renderToStaticMarkup } from "react-dom/server"; ++import { IconButton } from "../src/icon-button.tsx"; ++ ++const render = () => ++ renderToStaticMarkup(<IconButton label="Close" glyph={"\u00d7"} onPress={() => {}} />); ++ ++test("renders a control with the accessible name", () => { ++ const html = render(); ++ expect(html).toContain('aria-label="Close"'); ++}); ++ ++test("displays the glyph", () => { ++ expect(render()).toContain("\u00d7"); ++}); diff --git a/packages/benchmark/tasks/icon-button-a11y/tests/test.sh b/packages/benchmark/tasks/icon-button-a11y/tests/test.sh new file mode 100755 index 000000000..4003f69b2 --- /dev/null +++ b/packages/benchmark/tasks/icon-button-a11y/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="pnpm exec vitest run" +exec slopbench-grade diff --git 
a/packages/benchmark/tasks/notification-list/_authoring/hidden/tests/notification-list.test.tsx b/packages/benchmark/tasks/notification-list/_authoring/hidden/tests/notification-list.test.tsx new file mode 100644 index 000000000..f5f01fbd1 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/_authoring/hidden/tests/notification-list.test.tsx @@ -0,0 +1,24 @@ +import { test, expect } from "vitest"; +import { renderToStaticMarkup } from "react-dom/server"; +import { NotificationList, type Notification } from "../src/notification-list.tsx"; + +const NOTIFICATIONS: Notification[] = [ + { id: "a", message: "Saved" }, + { id: "b", message: "Deleted" }, + { id: "c", message: "Shared" }, +]; + +test("renders one list item per notification, in order", () => { + const html = renderToStaticMarkup(<NotificationList notifications={NOTIFICATIONS} />); + expect(html).toContain('<ul class="notifications">'); + const items = html.match(/<li[^>]*>/g) ?? []; + expect(items).toHaveLength(3); + expect(html.indexOf("Saved")).toBeLessThan(html.indexOf("Deleted")); + expect(html).toContain("Shared"); +}); + +test("renders an empty list without items", () => { + const html = renderToStaticMarkup(<NotificationList notifications={[]} />); + expect(html).toContain('<ul class="notifications">'); + expect(html).not.toContain("<li"); +}); diff --git a/packages/benchmark/tasks/notification-list/_authoring/solved/src/notification-list.tsx b/packages/benchmark/tasks/notification-list/_authoring/solved/src/notification-list.tsx new file mode 100644 index 000000000..9c89d13d7 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/_authoring/solved/src/notification-list.tsx @@ -0,0 +1,16 @@ +export interface Notification { + id: string; + message: string; +} + +export interface NotificationListProps { + notifications: Notification[]; +} + +export const NotificationList = ({ notifications }: NotificationListProps) => ( + <ul className="notifications"> + 
{notifications.map((notification) => ( + <li key={notification.id}>{notification.message}</li> + ))} + </ul> +); diff --git a/packages/benchmark/tasks/notification-list/environment/Dockerfile b/packages/benchmark/tasks/notification-list/environment/Dockerfile new file mode 100644 index 000000000..fcbfdb374 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +RUN pnpm install --frozen-lockfile --ignore-scripts || pnpm install --ignore-scripts +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/notification-list/instruction.md b/packages/benchmark/tasks/notification-list/instruction.md new file mode 100644 index 000000000..2533d78c0 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/instruction.md @@ -0,0 +1,18 @@ +Implement the `NotificationList` component in `src/notification-list.tsx`. + +## Expected behavior + +`NotificationList` takes a `notifications` array (each item is +`{ id: string; message: string }`) and renders: + +- A `<ul className="notifications">` wrapper. +- One `<li>` per notification, in order, whose text content is the + notification's `message`. + +Example: `<NotificationList notifications={[{ id: "a", message: "Saved" }]} />` +renders `<ul class="notifications"><li>Saved</li></ul>`. + +## Constraints + +Keep the exported `NotificationList` component and the `Notification` / +`NotificationListProps` types. 
diff --git a/packages/benchmark/tasks/notification-list/seed/package.json b/packages/benchmark/tasks/notification-list/seed/package.json new file mode 100644 index 000000000..d7abc3def --- /dev/null +++ b/packages/benchmark/tasks/notification-list/seed/package.json @@ -0,0 +1,13 @@ +{ + "name": "slopbench-notification-list", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + }, + "devDependencies": { + "vitest": "^4.1.8" + } +} diff --git a/packages/benchmark/tasks/notification-list/seed/src/notification-list.tsx b/packages/benchmark/tasks/notification-list/seed/src/notification-list.tsx new file mode 100644 index 000000000..13df3b406 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/seed/src/notification-list.tsx @@ -0,0 +1,13 @@ +export interface Notification { + id: string; + message: string; +} + +export interface NotificationListProps { + notifications: Notification[]; +} + +// TODO(agent): implement. See instruction.md. 
+export const NotificationList = (_props: NotificationListProps) => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/notification-list/seed/tsconfig.json b/packages/benchmark/tasks/notification-list/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/notification-list/seed/vitest.config.ts b/packages/benchmark/tasks/notification-list/seed/vitest.config.ts new file mode 100644 index 000000000..8409b1f8e --- /dev/null +++ b/packages/benchmark/tasks/notification-list/seed/vitest.config.ts @@ -0,0 +1,9 @@ +import { defineConfig } from "vitest/config"; + +export default defineConfig({ + esbuild: { jsx: "automatic" }, + test: { + environment: "node", + include: ["tests/**/*.test.tsx"], + }, +}); diff --git a/packages/benchmark/tasks/notification-list/solution/solution.patch b/packages/benchmark/tasks/notification-list/solution/solution.patch new file mode 100644 index 000000000..39779ac64 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/solution/solution.patch @@ -0,0 +1,19 @@ +diff --git a/src/notification-list.tsx b/src/notification-list.tsx +index 13df3b4..9c89d13 100644 +--- a/src/notification-list.tsx ++++ b/src/notification-list.tsx +@@ -7,7 +7,10 @@ export interface NotificationListProps { + notifications: Notification[]; + } + +-// TODO(agent): implement. See instruction.md. 
+-export const NotificationList = (_props: NotificationListProps) => { +- throw new Error("not implemented"); +-}; ++export const NotificationList = ({ notifications }: NotificationListProps) => ( ++ <ul className="notifications"> ++ {notifications.map((notification) => ( ++ <li key={notification.id}>{notification.message}</li> ++ ))} ++ </ul> ++); diff --git a/packages/benchmark/tasks/notification-list/solution/solve.sh b/packages/benchmark/tasks/notification-list/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/notification-list/task.toml b/packages/benchmark/tasks/notification-list/task.toml new file mode 100644 index 000000000..342b48615 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/notification-list" +description = "Render a notification list with stable keys (not array-index keys / inline components)." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "notification-list" +display_title = "Notification list" +display_description = "Render a notification list with stable keys (not array-index keys / inline components)." 
+family = "produce-clean" +target_dimensions = ["react-correctness", "react-performance"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/notification-list/tests/test.patch b/packages/benchmark/tasks/notification-list/tests/test.patch new file mode 100644 index 000000000..6eeb9a51b --- /dev/null +++ b/packages/benchmark/tasks/notification-list/tests/test.patch @@ -0,0 +1,30 @@ +diff --git a/tests/notification-list.test.tsx b/tests/notification-list.test.tsx +new file mode 100644 +index 0000000..f5f01fb +--- /dev/null ++++ b/tests/notification-list.test.tsx +@@ -0,0 +1,24 @@ ++import { test, expect } from "vitest"; ++import { renderToStaticMarkup } from "react-dom/server"; ++import { NotificationList, type Notification } from "../src/notification-list.tsx"; ++ ++const NOTIFICATIONS: Notification[] = [ ++ { id: "a", message: "Saved" }, ++ { id: "b", message: "Deleted" }, ++ { id: "c", message: "Shared" }, ++]; ++ ++test("renders one list item per notification, in order", () => { ++ const html = renderToStaticMarkup(<NotificationList notifications={NOTIFICATIONS} />); ++ expect(html).toContain('<ul class="notifications">'); ++ const items = html.match(/<li[^>]*>/g) ?? 
[]; ++ expect(items).toHaveLength(3); ++ expect(html.indexOf("Saved")).toBeLessThan(html.indexOf("Deleted")); ++ expect(html).toContain("Shared"); ++}); ++ ++test("renders an empty list without items", () => { ++ const html = renderToStaticMarkup(<NotificationList notifications={[]} />); ++ expect(html).toContain('<ul class="notifications">'); ++ expect(html).not.toContain("<li"); ++}); diff --git a/packages/benchmark/tasks/notification-list/tests/test.sh b/packages/benchmark/tasks/notification-list/tests/test.sh new file mode 100755 index 000000000..4003f69b2 --- /dev/null +++ b/packages/benchmark/tasks/notification-list/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="pnpm exec vitest run" +exec slopbench-grade diff --git a/packages/benchmark/tasks/paginate-util/_authoring/hidden/tests/paginate.test.ts b/packages/benchmark/tasks/paginate-util/_authoring/hidden/tests/paginate.test.ts new file mode 100644 index 000000000..80a217b15 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/_authoring/hidden/tests/paginate.test.ts @@ -0,0 +1,29 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { paginate } from "../src/paginate.ts"; + +test("returns the first page slice with metadata", () => { + const result = paginate([1, 2, 3, 4, 5], 1, 2); + assert.deepEqual(result.items, [1, 2]); + assert.equal(result.page, 1); + assert.equal(result.totalPages, 3); + assert.equal(result.totalItems, 5); +}); + +test("returns the final partial page", () => { + assert.deepEqual(paginate([1, 2, 3, 4, 5], 3, 2).items, [5]); +}); + +test("clamps an out-of-range page to the last page", () => { + const result = paginate([1, 2, 3, 4, 5], 99, 2); + assert.deepEqual(result.items, [5]); + assert.equal(result.page, 3); +}); + +test("an empty list still has one empty page", () => { + const result = paginate([], 1, 
2); + assert.deepEqual(result.items, []); + assert.equal(result.page, 1); + assert.equal(result.totalPages, 1); + assert.equal(result.totalItems, 0); +}); diff --git a/packages/benchmark/tasks/paginate-util/_authoring/solved/src/paginate.ts b/packages/benchmark/tasks/paginate-util/_authoring/solved/src/paginate.ts new file mode 100644 index 000000000..8e0eaafa1 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/_authoring/solved/src/paginate.ts @@ -0,0 +1,29 @@ +export interface Page<Item> { + items: Item[]; + page: number; + perPage: number; + totalItems: number; + totalPages: number; +} + +const clampToRange = (value: number, minimum: number, maximum: number): number => + Math.min(Math.max(value, minimum), maximum); + +export const paginate = <Item>( + items: readonly Item[], + page: number, + perPage: number, +): Page<Item> => { + const safePerPage = Math.max(1, Math.floor(perPage)); + const totalItems = items.length; + const totalPages = Math.max(1, Math.ceil(totalItems / safePerPage)); + const safePage = clampToRange(Math.floor(page), 1, totalPages); + const start = (safePage - 1) * safePerPage; + return { + items: items.slice(start, start + safePerPage), + page: safePage, + perPage: safePerPage, + totalItems, + totalPages, + }; +}; diff --git a/packages/benchmark/tasks/paginate-util/environment/Dockerfile b/packages/benchmark/tasks/paginate-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). 
+RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/paginate-util/instruction.md b/packages/benchmark/tasks/paginate-util/instruction.md new file mode 100644 index 000000000..be6b82ad7 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/instruction.md @@ -0,0 +1,29 @@ +Implement `paginate` in `src/paginate.ts`. + +## Expected behavior + +`paginate(items, page, perPage)` returns the slice for a 1-indexed page plus +pagination metadata. + +- `perPage` is coerced to at least `1`. +- `totalItems` is the input length; `totalPages` is + `ceil(totalItems / perPage)`, but at least `1` (an empty list still has one + empty page). +- `page` is clamped to the range `[1, totalPages]`. +- `items` is the slice for the clamped page. + +Returns `{ items, page, perPage, totalItems, totalPages }` where `page` and +`perPage` are the clamped/coerced values actually used. + +Examples (with `perPage = 2`): + +- `paginate([1,2,3,4,5], 1, 2)` → items `[1,2]`, page 1, totalPages 3, totalItems 5 +- `paginate([1,2,3,4,5], 3, 2)` → items `[5]`, page 3 +- `paginate([1,2,3,4,5], 99, 2)` → items `[5]`, page 3 (clamped) +- `paginate([], 1, 2)` → items `[]`, page 1, totalPages 1, totalItems 0 + +## Constraints + +Keep the exported generic signature +`paginate<Item>(items: readonly Item[], page: number, perPage: number): Page<Item>`. +Do not change `src/results-view.tsx`. 
diff --git a/packages/benchmark/tasks/paginate-util/seed/package.json b/packages/benchmark/tasks/paginate-util/seed/package.json new file mode 100644 index 000000000..4e9f772fa --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-paginate-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/paginate-util/seed/src/paginate.ts b/packages/benchmark/tasks/paginate-util/seed/src/paginate.ts new file mode 100644 index 000000000..fb522837b --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/seed/src/paginate.ts @@ -0,0 +1,16 @@ +export interface Page<Item> { + items: Item[]; + page: number; + perPage: number; + totalItems: number; + totalPages: number; +} + +// TODO(agent): implement. See instruction.md. +export const paginate = <Item>( + _items: readonly Item[], + _page: number, + _perPage: number, +): Page<Item> => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/paginate-util/seed/src/results-view.tsx b/packages/benchmark/tasks/paginate-util/seed/src/results-view.tsx new file mode 100644 index 000000000..a12bed926 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/seed/src/results-view.tsx @@ -0,0 +1,16 @@ +import { paginate } from "./paginate.ts"; + +interface ResultsViewProps { + rows: string[]; + page: number; +} + +// Existing consumer (keeps paginate.ts reachable). Do not edit. 
+export const ResultsView = ({ rows, page }: ResultsViewProps) => { + const result = paginate(rows, page, 10); + return ( + <p> + Page {result.page} of {result.totalPages} + </p> + ); +}; diff --git a/packages/benchmark/tasks/paginate-util/seed/tsconfig.json b/packages/benchmark/tasks/paginate-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/paginate-util/solution/solution.patch b/packages/benchmark/tasks/paginate-util/solution/solution.patch new file mode 100644 index 000000000..1dc0ae47f --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/solution/solution.patch @@ -0,0 +1,34 @@ +diff --git a/src/paginate.ts b/src/paginate.ts +index fb52283..8e0eaaf 100644 +--- a/src/paginate.ts ++++ b/src/paginate.ts +@@ -6,11 +6,24 @@ export interface Page<Item> { + totalPages: number; + } + +-// TODO(agent): implement. See instruction.md. 
++const clampToRange = (value: number, minimum: number, maximum: number): number => ++ Math.min(Math.max(value, minimum), maximum); ++ + export const paginate = <Item>( +- _items: readonly Item[], +- _page: number, +- _perPage: number, ++ items: readonly Item[], ++ page: number, ++ perPage: number, + ): Page<Item> => { +- throw new Error("not implemented"); ++ const safePerPage = Math.max(1, Math.floor(perPage)); ++ const totalItems = items.length; ++ const totalPages = Math.max(1, Math.ceil(totalItems / safePerPage)); ++ const safePage = clampToRange(Math.floor(page), 1, totalPages); ++ const start = (safePage - 1) * safePerPage; ++ return { ++ items: items.slice(start, start + safePerPage), ++ page: safePage, ++ perPage: safePerPage, ++ totalItems, ++ totalPages, ++ }; + }; diff --git a/packages/benchmark/tasks/paginate-util/solution/solve.sh b/packages/benchmark/tasks/paginate-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/paginate-util/task.toml b/packages/benchmark/tasks/paginate-util/task.toml new file mode 100644 index 000000000..e90491e5d --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/paginate-util" +description = "Implement paginate(items, page, perPage) with clamping and metadata." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "paginate-util" +display_title = "Paginate utility" +display_description = "Implement paginate(items, page, perPage) with clamping and metadata." 
+family = "produce-clean" +target_dimensions = ["maintainability", "ts-strictness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/paginate-util/tests/test.patch b/packages/benchmark/tasks/paginate-util/tests/test.patch new file mode 100644 index 000000000..80fea5b59 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/tests/test.patch @@ -0,0 +1,35 @@ +diff --git a/tests/paginate.test.ts b/tests/paginate.test.ts +new file mode 100644 +index 0000000..80a217b +--- /dev/null ++++ b/tests/paginate.test.ts +@@ -0,0 +1,29 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { paginate } from "../src/paginate.ts"; ++ ++test("returns the first page slice with metadata", () => { ++ const result = paginate([1, 2, 3, 4, 5], 1, 2); ++ assert.deepEqual(result.items, [1, 2]); ++ assert.equal(result.page, 1); ++ assert.equal(result.totalPages, 3); ++ assert.equal(result.totalItems, 5); ++}); ++ ++test("returns the final partial page", () => { ++ assert.deepEqual(paginate([1, 2, 3, 4, 5], 3, 2).items, [5]); ++}); ++ ++test("clamps an out-of-range page to the last page", () => { ++ const result = paginate([1, 2, 3, 4, 5], 99, 2); ++ assert.deepEqual(result.items, [5]); ++ assert.equal(result.page, 3); ++}); ++ ++test("an empty list still has one empty page", () => { ++ const result = paginate([], 1, 2); ++ assert.deepEqual(result.items, []); ++ assert.equal(result.page, 1); ++ assert.equal(result.totalPages, 1); ++ assert.equal(result.totalItems, 0); ++}); diff --git a/packages/benchmark/tasks/paginate-util/tests/test.sh 
b/packages/benchmark/tasks/paginate-util/tests/test.sh new file mode 100755 index 000000000..986181df3 --- /dev/null +++ b/packages/benchmark/tasks/paginate-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/paginate.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/parse-query-util/_authoring/hidden/tests/parse-query.test.ts b/packages/benchmark/tasks/parse-query-util/_authoring/hidden/tests/parse-query.test.ts new file mode 100644 index 000000000..8af0250aa --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/_authoring/hidden/tests/parse-query.test.ts @@ -0,0 +1,24 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { parseQuery } from "../src/parse-query.ts"; + +test("parses simple pairs and ignores a leading ?", () => { + assert.deepEqual(parseQuery("?a=1&b=two"), { a: "1", b: "two" }); +}); + +test("URI-decodes keys and values", () => { + assert.deepEqual(parseQuery("name=Ada%20Lovelace"), { name: "Ada Lovelace" }); +}); + +test("maps a bare key to an empty string", () => { + assert.deepEqual(parseQuery("flag&x=1"), { flag: "", x: "1" }); +}); + +test("keeps the last value for a repeated key", () => { + assert.deepEqual(parseQuery("k=1&k=2"), { k: "2" }); +}); + +test("returns an empty object for empty input", () => { + assert.deepEqual(parseQuery(""), {}); + assert.deepEqual(parseQuery("?"), {}); +}); diff --git a/packages/benchmark/tasks/parse-query-util/_authoring/solved/src/parse-query.ts b/packages/benchmark/tasks/parse-query-util/_authoring/solved/src/parse-query.ts new file mode 100644 index 000000000..41b4d6995 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/_authoring/solved/src/parse-query.ts @@ -0,0 +1,18 @@ +// Parses a URL query string into a plain object (last value wins per key). 
+export const parseQuery = (search: string): Record<string, string> => { + const trimmed = search.startsWith("?") ? search.slice(1) : search; + const result: Record<string, string> = {}; + if (trimmed === "") return result; + + for (const pair of trimmed.split("&")) { + if (pair === "") continue; + const equalsIndex = pair.indexOf("="); + if (equalsIndex === -1) { + result[decodeURIComponent(pair)] = ""; + continue; + } + const key = decodeURIComponent(pair.slice(0, equalsIndex)); + result[key] = decodeURIComponent(pair.slice(equalsIndex + 1)); + } + return result; +}; diff --git a/packages/benchmark/tasks/parse-query-util/environment/Dockerfile b/packages/benchmark/tasks/parse-query-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/parse-query-util/instruction.md b/packages/benchmark/tasks/parse-query-util/instruction.md new file mode 100644 index 000000000..d69fdbc37 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/instruction.md @@ -0,0 +1,26 @@ +Implement `parseQuery` in `src/parse-query.ts`. + +## Expected behavior + +`parseQuery(search)` parses a URL query string into a plain object. + +- An optional leading `?` is ignored. +- Pairs are separated by `&`; key and value are separated by `=`. +- Keys and values are URI-decoded with `decodeURIComponent` (`%20` → space). + `+` is left as-is; decoding it to a space is **not** required. +- A key with no `=` maps to an empty string. +- When a key repeats, the **last** occurrence wins. +- An empty string (or just `"?"`) returns `{}`.
+ +Examples: + +- `parseQuery("?a=1&b=two")` → `{ a: "1", b: "two" }` +- `parseQuery("name=Ada%20Lovelace")` → `{ name: "Ada Lovelace" }` +- `parseQuery("flag&x=1")` → `{ flag: "", x: "1" }` +- `parseQuery("k=1&k=2")` → `{ k: "2" }` +- `parseQuery("")` → `{}` + +## Constraints + +Keep the exported `parseQuery(search: string): Record<string, string>` +signature. Do not change `src/filter-summary.tsx`. diff --git a/packages/benchmark/tasks/parse-query-util/seed/package.json b/packages/benchmark/tasks/parse-query-util/seed/package.json new file mode 100644 index 000000000..6a88bb101 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-parse-query-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/parse-query-util/seed/src/filter-summary.tsx b/packages/benchmark/tasks/parse-query-util/seed/src/filter-summary.tsx new file mode 100644 index 000000000..02dd6e0e0 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/seed/src/filter-summary.tsx @@ -0,0 +1,11 @@ +import { parseQuery } from "./parse-query.ts"; + +interface FilterSummaryProps { + search: string; +} + +// Existing consumer (keeps parse-query.ts reachable). Do not edit. +export const FilterSummary = ({ search }: FilterSummaryProps) => { + const params = parseQuery(search); + return <span>{Object.keys(params).length} filters</span>; +}; diff --git a/packages/benchmark/tasks/parse-query-util/seed/src/parse-query.ts b/packages/benchmark/tasks/parse-query-util/seed/src/parse-query.ts new file mode 100644 index 000000000..70a0e167f --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/seed/src/parse-query.ts @@ -0,0 +1,4 @@ +// TODO(agent): implement. See instruction.md. 
+export const parseQuery = (_search: string): Record<string, string> => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/parse-query-util/seed/tsconfig.json b/packages/benchmark/tasks/parse-query-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/parse-query-util/solution/solution.patch b/packages/benchmark/tasks/parse-query-util/solution/solution.patch new file mode 100644 index 000000000..5cea3a594 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/solution/solution.patch @@ -0,0 +1,26 @@ +diff --git a/src/parse-query.ts b/src/parse-query.ts +index 70a0e16..41b4d69 100644 +--- a/src/parse-query.ts ++++ b/src/parse-query.ts +@@ -1,4 +1,18 @@ +-// TODO(agent): implement. See instruction.md. +-export const parseQuery = (_search: string): Record<string, string> => { +- throw new Error("not implemented"); ++// Parses a URL query string into a plain object (last value wins per key). ++export const parseQuery = (search: string): Record<string, string> => { ++ const trimmed = search.startsWith("?") ? 
search.slice(1) : search; ++ const result: Record<string, string> = {}; ++ if (trimmed === "") return result; ++ ++ for (const pair of trimmed.split("&")) { ++ if (pair === "") continue; ++ const equalsIndex = pair.indexOf("="); ++ if (equalsIndex === -1) { ++ result[decodeURIComponent(pair)] = ""; ++ continue; ++ } ++ const key = decodeURIComponent(pair.slice(0, equalsIndex)); ++ result[key] = decodeURIComponent(pair.slice(equalsIndex + 1)); ++ } ++ return result; + }; diff --git a/packages/benchmark/tasks/parse-query-util/solution/solve.sh b/packages/benchmark/tasks/parse-query-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/parse-query-util/task.toml b/packages/benchmark/tasks/parse-query-util/task.toml new file mode 100644 index 000000000..a3536a32d --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/parse-query-util" +description = "Implement parseQuery(search) into a typed record (last value wins)." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "parse-query-util" +display_title = "Parse query string" +display_description = "Implement parseQuery(search) into a typed record (last value wins)." 
+family = "produce-clean" +target_dimensions = ["ts-strictness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/parse-query-util/tests/test.patch b/packages/benchmark/tasks/parse-query-util/tests/test.patch new file mode 100644 index 000000000..5c14da6c9 --- /dev/null +++ b/packages/benchmark/tasks/parse-query-util/tests/test.patch @@ -0,0 +1,30 @@ +diff --git a/tests/parse-query.test.ts b/tests/parse-query.test.ts +new file mode 100644 +index 0000000..8af0250 +--- /dev/null ++++ b/tests/parse-query.test.ts +@@ -0,0 +1,24 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { parseQuery } from "../src/parse-query.ts"; ++ ++test("parses simple pairs and ignores a leading ?", () => { ++ assert.deepEqual(parseQuery("?a=1&b=two"), { a: "1", b: "two" }); ++}); ++ ++test("URI-decodes keys and values", () => { ++ assert.deepEqual(parseQuery("name=Ada%20Lovelace"), { name: "Ada Lovelace" }); ++}); ++ ++test("maps a bare key to an empty string", () => { ++ assert.deepEqual(parseQuery("flag&x=1"), { flag: "", x: "1" }); ++}); ++ ++test("keeps the last value for a repeated key", () => { ++ assert.deepEqual(parseQuery("k=1&k=2"), { k: "2" }); ++}); ++ ++test("returns an empty object for empty input", () => { ++ assert.deepEqual(parseQuery(""), {}); ++ assert.deepEqual(parseQuery("?"), {}); ++}); diff --git a/packages/benchmark/tasks/parse-query-util/tests/test.sh b/packages/benchmark/tasks/parse-query-util/tests/test.sh new file mode 100755 index 000000000..6f1ebd04b --- /dev/null +++ 
b/packages/benchmark/tasks/parse-query-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/parse-query.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/retry-async-util/_authoring/hidden/tests/retry-async.test.ts b/packages/benchmark/tasks/retry-async-util/_authoring/hidden/tests/retry-async.test.ts new file mode 100644 index 000000000..aa66eeaf3 --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/_authoring/hidden/tests/retry-async.test.ts @@ -0,0 +1,35 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { retryAsync } from "../src/retry-async.ts"; + +test("retries until the operation resolves", async () => { + let calls = 0; + const value = await retryAsync(async () => { + calls++; + if (calls < 2) throw new Error("transient"); + return "ok"; + }, 3); + assert.equal(value, "ok"); + assert.equal(calls, 2); +}); + +test("rejects with the last error after exhausting attempts", async () => { + let calls = 0; + await assert.rejects( + retryAsync(async () => { + calls++; + throw new Error(`fail ${calls}`); + }, 2), + /fail 2/, + ); + assert.equal(calls, 2); +}); + +test("calls the operation only once when it resolves immediately", async () => { + let calls = 0; + await retryAsync(async () => { + calls++; + return 1; + }, 5); + assert.equal(calls, 1); +}); diff --git a/packages/benchmark/tasks/retry-async-util/_authoring/solved/src/retry-async.ts b/packages/benchmark/tasks/retry-async-util/_authoring/solved/src/retry-async.ts new file mode 100644 index 000000000..444d79bf5 --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/_authoring/solved/src/retry-async.ts @@ -0,0 +1,17 @@ +// Runs an async operation, retrying on rejection up to `attempts` total calls. 
+// Implemented recursively so each retry chains off the previous failure without +// awaiting inside a loop. +export const retryAsync = async <Value>( + operation: () => Promise<Value>, + attempts: number, +): Promise<Value> => { + const maxAttempts = Math.max(1, Math.floor(attempts)); + + const attempt = (remaining: number): Promise<Value> => + operation().catch((error: unknown) => { + if (remaining <= 1) throw error; + return attempt(remaining - 1); + }); + + return attempt(maxAttempts); +}; diff --git a/packages/benchmark/tasks/retry-async-util/environment/Dockerfile b/packages/benchmark/tasks/retry-async-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/retry-async-util/instruction.md b/packages/benchmark/tasks/retry-async-util/instruction.md new file mode 100644 index 000000000..c50bfa7fb --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/instruction.md @@ -0,0 +1,25 @@ +Implement `retryAsync` in `src/retry-async.ts`. + +## Expected behavior + +`retryAsync(operation, attempts)` runs an async `operation`, retrying it when it +rejects: + +- Call `operation()`. If it resolves, return its value immediately. +- If it rejects, try again, up to `attempts` total calls. +- If the final attempt rejects, reject with that last error. +- `attempts` is treated as at least `1` (a value below 1 still runs once). + +Examples: + +- An operation that rejects once then resolves to `"ok"`, with `attempts = 3`, + resolves to `"ok"` after 2 calls. 
+- An operation that always rejects, with `attempts = 2`, rejects after exactly + 2 calls with the last error. +- An operation that resolves on the first call is only called once. + +## Constraints + +Keep the exported generic signature +`retryAsync<Value>(operation: () => Promise<Value>, attempts: number): Promise<Value>`. +Do not change `src/sync-button.tsx`. diff --git a/packages/benchmark/tasks/retry-async-util/seed/package.json b/packages/benchmark/tasks/retry-async-util/seed/package.json new file mode 100644 index 000000000..82db68f8e --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-retry-async-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/retry-async-util/seed/src/retry-async.ts b/packages/benchmark/tasks/retry-async-util/seed/src/retry-async.ts new file mode 100644 index 000000000..649d47c2f --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/seed/src/retry-async.ts @@ -0,0 +1,7 @@ +// TODO(agent): implement. See instruction.md. +export const retryAsync = async <Value>( + _operation: () => Promise<Value>, + _attempts: number, +): Promise<Value> => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/retry-async-util/seed/src/sync-button.tsx b/packages/benchmark/tasks/retry-async-util/seed/src/sync-button.tsx new file mode 100644 index 000000000..9d8311b8c --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/seed/src/sync-button.tsx @@ -0,0 +1,12 @@ +import { retryAsync } from "./retry-async.ts"; + +interface SyncButtonProps { + sync: () => Promise<void>; +} + +// Existing consumer (keeps retry-async.ts reachable). Do not edit. 
+export const SyncButton = ({ sync }: SyncButtonProps) => ( + <button type="button" onClick={() => void retryAsync(sync, 3)}> + Sync + </button> +); diff --git a/packages/benchmark/tasks/retry-async-util/seed/tsconfig.json b/packages/benchmark/tasks/retry-async-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/retry-async-util/solution/solution.patch b/packages/benchmark/tasks/retry-async-util/solution/solution.patch new file mode 100644 index 000000000..a2901f816 --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/solution/solution.patch @@ -0,0 +1,26 @@ +diff --git a/src/retry-async.ts b/src/retry-async.ts +index 649d47c..444d79b 100644 +--- a/src/retry-async.ts ++++ b/src/retry-async.ts +@@ -1,7 +1,17 @@ +-// TODO(agent): implement. See instruction.md. ++// Runs an async operation, retrying on rejection up to `attempts` total calls. ++// Implemented recursively so each retry chains off the previous failure without ++// awaiting inside a loop. 
+ export const retryAsync = async <Value>( +- _operation: () => Promise<Value>, +- _attempts: number, ++ operation: () => Promise<Value>, ++ attempts: number, + ): Promise<Value> => { +- throw new Error("not implemented"); ++ const maxAttempts = Math.max(1, Math.floor(attempts)); ++ ++ const attempt = (remaining: number): Promise<Value> => ++ operation().catch((error: unknown) => { ++ if (remaining <= 1) throw error; ++ return attempt(remaining - 1); ++ }); ++ ++ return attempt(maxAttempts); + }; diff --git a/packages/benchmark/tasks/retry-async-util/solution/solve.sh b/packages/benchmark/tasks/retry-async-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/retry-async-util/task.toml b/packages/benchmark/tasks/retry-async-util/task.toml new file mode 100644 index 000000000..702833b1d --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/retry-async-util" +description = "Implement retryAsync(operation, attempts) retrying on rejection." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "retry-async-util" +display_title = "Retry async utility" +display_description = "Implement retryAsync(operation, attempts) retrying on rejection." 
+family = "produce-clean" +target_dimensions = ["ts-strictness", "maintainability"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/retry-async-util/tests/test.patch b/packages/benchmark/tasks/retry-async-util/tests/test.patch new file mode 100644 index 000000000..88ca32f97 --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/tests/test.patch @@ -0,0 +1,41 @@ +diff --git a/tests/retry-async.test.ts b/tests/retry-async.test.ts +new file mode 100644 +index 0000000..aa66eea +--- /dev/null ++++ b/tests/retry-async.test.ts +@@ -0,0 +1,35 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { retryAsync } from "../src/retry-async.ts"; ++ ++test("retries until the operation resolves", async () => { ++ let calls = 0; ++ const value = await retryAsync(async () => { ++ calls++; ++ if (calls < 2) throw new Error("transient"); ++ return "ok"; ++ }, 3); ++ assert.equal(value, "ok"); ++ assert.equal(calls, 2); ++}); ++ ++test("rejects with the last error after exhausting attempts", async () => { ++ let calls = 0; ++ await assert.rejects( ++ retryAsync(async () => { ++ calls++; ++ throw new Error(`fail ${calls}`); ++ }, 2), ++ /fail 2/, ++ ); ++ assert.equal(calls, 2); ++}); ++ ++test("calls the operation only once when it resolves immediately", async () => { ++ let calls = 0; ++ await retryAsync(async () => { ++ calls++; ++ return 1; ++ }, 5); ++ assert.equal(calls, 1); ++}); diff --git a/packages/benchmark/tasks/retry-async-util/tests/test.sh b/packages/benchmark/tasks/retry-async-util/tests/test.sh new file mode 100755 index 
000000000..1ff92a632 --- /dev/null +++ b/packages/benchmark/tasks/retry-async-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/retry-async.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/route-handler-json/_authoring/hidden/tests/route.test.ts b/packages/benchmark/tasks/route-handler-json/_authoring/hidden/tests/route.test.ts new file mode 100644 index 000000000..4ea74da17 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/_authoring/hidden/tests/route.test.ts @@ -0,0 +1,20 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { GET } from "../app/api/products/route.ts"; +import { PRODUCTS } from "../src/products.ts"; + +test("returns the full catalog with status 200 when unfiltered", async () => { + const response = await GET(new Request("http://localhost/api/products")); + assert.equal(response.status, 200); + assert.deepEqual(await response.json(), PRODUCTS); +}); + +test("filters by maxPriceCents (inclusive)", async () => { + const response = await GET(new Request("http://localhost/api/products?maxPriceCents=300")); + assert.equal(response.status, 200); + const body = await response.json(); + assert.deepEqual( + body.map((product: { id: string }) => product.id), + ["p2", "p3"], + ); +}); diff --git a/packages/benchmark/tasks/route-handler-json/_authoring/solved/app/api/products/route.ts b/packages/benchmark/tasks/route-handler-json/_authoring/solved/app/api/products/route.ts new file mode 100644 index 000000000..5ecd249d0 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/_authoring/solved/app/api/products/route.ts @@ -0,0 +1,9 @@ +import { PRODUCTS } from "../../../src/products.ts"; + +export const GET = async (request: Request): Promise<Response> => { + const maxPriceCentsParam = new 
URL(request.url).searchParams.get("maxPriceCents"); + const maxPriceCents = + maxPriceCentsParam === null ? Number.POSITIVE_INFINITY : Number(maxPriceCentsParam); + const matching = PRODUCTS.filter((product) => product.priceCents <= maxPriceCents); + return Response.json(matching); +}; diff --git a/packages/benchmark/tasks/route-handler-json/environment/Dockerfile b/packages/benchmark/tasks/route-handler-json/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/route-handler-json/instruction.md b/packages/benchmark/tasks/route-handler-json/instruction.md new file mode 100644 index 000000000..d3adaffea --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/instruction.md @@ -0,0 +1,23 @@ +Implement the App Router route handler in `app/api/products/route.ts`. + +## Expected behavior + +Handle `GET /api/products`, optionally filtered by a max price. + +- The catalog is the `PRODUCTS` array exported from `src/products.ts`. +- Read the `maxPriceCents` query parameter from the request URL. + - When absent, return the full catalog. + - When present, return only products whose `priceCents` is **less than or + equal to** that value. +- Respond with the resulting array as JSON and status `200`. + +Examples (status `200` in all cases): + +- `GET /api/products` → all of `PRODUCTS`. +- `GET /api/products?maxPriceCents=300` → the Pen (`p2`, 250) and Eraser + (`p3`, 99) product objects, i.e. those priced at or below 300.
+ +## Constraints + +Export the handler as a named `GET` function taking the `Request` +(App Router route-handler convention). Do not change `src/products.ts`. diff --git a/packages/benchmark/tasks/route-handler-json/seed/app/api/products/route.ts b/packages/benchmark/tasks/route-handler-json/seed/app/api/products/route.ts new file mode 100644 index 000000000..821b17fa2 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/seed/app/api/products/route.ts @@ -0,0 +1,4 @@ +// TODO(agent): implement the GET route handler. See instruction.md. +export const GET = async (_request: Request): Promise<Response> => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/route-handler-json/seed/package.json b/packages/benchmark/tasks/route-handler-json/seed/package.json new file mode 100644 index 000000000..3f9e9c32a --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/seed/package.json @@ -0,0 +1,11 @@ +{ + "name": "slopbench-route-handler-json", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "next": "^15.0.0", + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/route-handler-json/seed/src/products.ts b/packages/benchmark/tasks/route-handler-json/seed/src/products.ts new file mode 100644 index 000000000..fb6a53b19 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/seed/src/products.ts @@ -0,0 +1,12 @@ +export interface Product { + id: string; + name: string; + priceCents: number; +} + +// Static catalog the route handler serves. Do not edit. 
+export const PRODUCTS: Product[] = [ + { id: "p1", name: "Notebook", priceCents: 1200 }, + { id: "p2", name: "Pen", priceCents: 250 }, + { id: "p3", name: "Eraser", priceCents: 99 }, +]; diff --git a/packages/benchmark/tasks/route-handler-json/seed/tsconfig.json b/packages/benchmark/tasks/route-handler-json/seed/tsconfig.json new file mode 100644 index 000000000..906d5b529 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "app", "tests"] +} diff --git a/packages/benchmark/tasks/route-handler-json/solution/solution.patch b/packages/benchmark/tasks/route-handler-json/solution/solution.patch new file mode 100644 index 000000000..d6a546b2b --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/solution/solution.patch @@ -0,0 +1,17 @@ +diff --git a/app/api/products/route.ts b/app/api/products/route.ts +index 821b17f..5ecd249 100644 +--- a/app/api/products/route.ts ++++ b/app/api/products/route.ts +@@ -1,4 +1,9 @@ +-// TODO(agent): implement the GET route handler. See instruction.md. +-export const GET = async (_request: Request): Promise<Response> => { +- throw new Error("not implemented"); ++import { PRODUCTS } from "../../../src/products.ts"; ++ ++export const GET = async (request: Request): Promise<Response> => { ++ const maxPriceCentsParam = new URL(request.url).searchParams.get("maxPriceCents"); ++ const maxPriceCents = ++ maxPriceCentsParam === null ? 
Number.POSITIVE_INFINITY : Number(maxPriceCentsParam); ++ const matching = PRODUCTS.filter((product) => product.priceCents <= maxPriceCents); ++ return Response.json(matching); + }; diff --git a/packages/benchmark/tasks/route-handler-json/solution/solve.sh b/packages/benchmark/tasks/route-handler-json/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/route-handler-json/task.toml b/packages/benchmark/tasks/route-handler-json/task.toml new file mode 100644 index 000000000..384b4f918 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/route-handler-json" +description = "Implement a Next App Router GET route handler returning the catalog as JSON." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "route-handler-json" +display_title = "Next route handler JSON" +display_description = "Implement a Next App Router GET route handler returning the catalog as JSON." 
+family = "produce-clean" +target_dimensions = ["react-correctness", "ts-strictness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/route-handler-json/tests/test.patch b/packages/benchmark/tasks/route-handler-json/tests/test.patch new file mode 100644 index 000000000..9ce531f44 --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/tests/test.patch @@ -0,0 +1,26 @@ +diff --git a/tests/route.test.ts b/tests/route.test.ts +new file mode 100644 +index 0000000..4ea74da +--- /dev/null ++++ b/tests/route.test.ts +@@ -0,0 +1,20 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { GET } from "../app/api/products/route.ts"; ++import { PRODUCTS } from "../src/products.ts"; ++ ++test("returns the full catalog with status 200 when unfiltered", async () => { ++ const response = await GET(new Request("http://localhost/api/products")); ++ assert.equal(response.status, 200); ++ assert.deepEqual(await response.json(), PRODUCTS); ++}); ++ ++test("filters by maxPriceCents (inclusive)", async () => { ++ const response = await GET(new Request("http://localhost/api/products?maxPriceCents=300")); ++ assert.equal(response.status, 200); ++ const body = await response.json(); ++ assert.deepEqual( ++ body.map((product: { id: string }) => product.id), ++ ["p2", "p3"], ++ ); ++}); diff --git a/packages/benchmark/tasks/route-handler-json/tests/test.sh b/packages/benchmark/tasks/route-handler-json/tests/test.sh new file mode 100755 index 000000000..d56c26aac --- /dev/null +++ b/packages/benchmark/tasks/route-handler-json/tests/test.sh @@ 
-0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/route.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/slugify-util/_authoring/hidden/tests/slugify.test.ts b/packages/benchmark/tasks/slugify-util/_authoring/hidden/tests/slugify.test.ts new file mode 100644 index 000000000..1b214462e --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/_authoring/hidden/tests/slugify.test.ts @@ -0,0 +1,23 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { slugify } from "../src/slugify.ts"; + +test("lowercases and hyphenates words", () => { + assert.equal(slugify("Hello, World!"), "hello-world"); +}); + +test("collapses runs of whitespace", () => { + assert.equal(slugify(" Multiple Spaces "), "multiple-spaces"); +}); + +test("strips non-alphanumeric characters", () => { + assert.equal(slugify("Café & Crème"), "caf-crme"); +}); + +test("trims and collapses stray hyphens", () => { + assert.equal(slugify("--already--slugged--"), "already-slugged"); +}); + +test("returns empty string for empty input", () => { + assert.equal(slugify(""), ""); +}); diff --git a/packages/benchmark/tasks/slugify-util/_authoring/solved/src/slugify.ts b/packages/benchmark/tasks/slugify-util/_authoring/solved/src/slugify.ts new file mode 100644 index 000000000..b17d8bc95 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/_authoring/solved/src/slugify.ts @@ -0,0 +1,9 @@ +// Turns arbitrary text into a URL slug via a sequence of focused replacements. 
+export const slugify = (input: string): string => + input + .toLowerCase() + .trim() + .replace(/\s+/g, "-") + .replace(/[^a-z0-9-]/g, "") + .replace(/-+/g, "-") + .replace(/^-+|-+$/g, ""); diff --git a/packages/benchmark/tasks/slugify-util/environment/Dockerfile b/packages/benchmark/tasks/slugify-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/slugify-util/instruction.md b/packages/benchmark/tasks/slugify-util/instruction.md new file mode 100644 index 000000000..d76a50040 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/instruction.md @@ -0,0 +1,25 @@ +Implement `slugify` in `src/slugify.ts`. + +## Expected behavior + +`slugify(input)` turns arbitrary text into a URL slug: + +- Lowercase the whole string. +- Trim leading/trailing whitespace. +- Replace any run of whitespace with a single hyphen. +- Remove every character that is not `a–z`, `0–9`, or `-`. +- Collapse runs of multiple hyphens into one. +- Strip leading and trailing hyphens. + +Examples: + +- `slugify("Hello, World!")` → `"hello-world"` +- `slugify(" Multiple Spaces ")` → `"multiple-spaces"` +- `slugify("Café & Crème")` → `"caf-crme"` +- `slugify("--already--slugged--")` → `"already-slugged"` +- `slugify("")` → `""` + +## Constraints + +Keep the exported `slugify(input: string): string` signature. Do not change +`src/article-link.tsx`. 
diff --git a/packages/benchmark/tasks/slugify-util/seed/package.json b/packages/benchmark/tasks/slugify-util/seed/package.json new file mode 100644 index 000000000..f633810cc --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-slugify-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/slugify-util/seed/src/article-link.tsx b/packages/benchmark/tasks/slugify-util/seed/src/article-link.tsx new file mode 100644 index 000000000..e2ec11d83 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/seed/src/article-link.tsx @@ -0,0 +1,10 @@ +import { slugify } from "./slugify.ts"; + +interface ArticleLinkProps { + title: string; +} + +// Existing consumer (keeps slugify.ts reachable). Do not edit. +export const ArticleLink = ({ title }: ArticleLinkProps) => ( + <a href={`/articles/${slugify(title)}`}>{title}</a> +); diff --git a/packages/benchmark/tasks/slugify-util/seed/src/slugify.ts b/packages/benchmark/tasks/slugify-util/seed/src/slugify.ts new file mode 100644 index 000000000..0c6c7cecf --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/seed/src/slugify.ts @@ -0,0 +1,4 @@ +// TODO(agent): implement. See instruction.md. 
+export const slugify = (_input: string): string => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/slugify-util/seed/tsconfig.json b/packages/benchmark/tasks/slugify-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/slugify-util/solution/solution.patch b/packages/benchmark/tasks/slugify-util/solution/solution.patch new file mode 100644 index 000000000..5613cd4f1 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/solution/solution.patch @@ -0,0 +1,18 @@ +diff --git a/src/slugify.ts b/src/slugify.ts +index 0c6c7ce..b17d8bc 100644 +--- a/src/slugify.ts ++++ b/src/slugify.ts +@@ -1,4 +1,9 @@ +-// TODO(agent): implement. See instruction.md. +-export const slugify = (_input: string): string => { +- throw new Error("not implemented"); +-}; ++// Turns arbitrary text into a URL slug via a sequence of focused replacements. ++export const slugify = (input: string): string => ++ input ++ .toLowerCase() ++ .trim() ++ .replace(/\s+/g, "-") ++ .replace(/[^a-z0-9-]/g, "") ++ .replace(/-+/g, "-") ++ .replace(/^-+|-+$/g, ""); diff --git a/packages/benchmark/tasks/slugify-util/solution/solve.sh b/packages/benchmark/tasks/slugify-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). 
+set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/slugify-util/task.toml b/packages/benchmark/tasks/slugify-util/task.toml new file mode 100644 index 000000000..8f09212ca --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/slugify-util" +description = "Implement slugify(input) producing a clean URL slug." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "slugify-util" +display_title = "URL slugify" +display_description = "Implement slugify(input) producing a clean URL slug." +family = "produce-clean" +target_dimensions = ["ts-strictness", "maintainability"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/slugify-util/tests/test.patch b/packages/benchmark/tasks/slugify-util/tests/test.patch new file mode 100644 index 000000000..a6caaad80 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/tests/test.patch @@ -0,0 +1,29 @@ +diff --git a/tests/slugify.test.ts b/tests/slugify.test.ts +new file mode 100644 +index 0000000..1b21446 +--- /dev/null ++++ b/tests/slugify.test.ts +@@ -0,0 +1,23 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { slugify } from "../src/slugify.ts"; ++ ++test("lowercases and hyphenates words", () => { ++ assert.equal(slugify("Hello, World!"), "hello-world"); ++}); ++ ++test("collapses runs of whitespace", () => { ++ assert.equal(slugify(" Multiple Spaces "), "multiple-spaces"); 
++}); ++ ++test("strips non-alphanumeric characters", () => { ++ assert.equal(slugify("Café & Crème"), "caf-crme"); ++}); ++ ++test("trims and collapses stray hyphens", () => { ++ assert.equal(slugify("--already--slugged--"), "already-slugged"); ++}); ++ ++test("returns empty string for empty input", () => { ++ assert.equal(slugify(""), ""); ++}); diff --git a/packages/benchmark/tasks/slugify-util/tests/test.sh b/packages/benchmark/tasks/slugify-util/tests/test.sh new file mode 100755 index 000000000..f15a9cc96 --- /dev/null +++ b/packages/benchmark/tasks/slugify-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/slugify.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/status-pill-variants/_authoring/hidden/tests/status-pill.test.tsx b/packages/benchmark/tasks/status-pill-variants/_authoring/hidden/tests/status-pill.test.tsx new file mode 100644 index 000000000..0cf672591 --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/_authoring/hidden/tests/status-pill.test.tsx @@ -0,0 +1,18 @@ +import { test, expect } from "vitest"; +import { renderToStaticMarkup } from "react-dom/server"; +import { StatusPill, type PillStatus } from "../src/status-pill.tsx"; + +const CASES: Array<{ status: PillStatus; label: string }> = [ + { status: "success", label: "Success" }, + { status: "error", label: "Error" }, + { status: "warning", label: "Warning" }, + { status: "info", label: "Info" }, +]; + +for (const { status, label } of CASES) { + test(`renders the ${status} pill`, () => { + const html = renderToStaticMarkup(<StatusPill status={status} />); + expect(html).toContain(`pill pill-${status}`); + expect(html).toContain(`>${label}<`); + }); +} diff --git a/packages/benchmark/tasks/status-pill-variants/_authoring/solved/src/status-pill.tsx 
b/packages/benchmark/tasks/status-pill-variants/_authoring/solved/src/status-pill.tsx new file mode 100644 index 000000000..6e1cdf34b --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/_authoring/solved/src/status-pill.tsx @@ -0,0 +1,16 @@ +export type PillStatus = "success" | "error" | "warning" | "info"; + +export interface StatusPillProps { + status: PillStatus; +} + +const STATUS_LABEL: Record<PillStatus, string> = { + success: "Success", + error: "Error", + warning: "Warning", + info: "Info", +}; + +export const StatusPill = ({ status }: StatusPillProps) => ( + <span className={`pill pill-${status}`}>{STATUS_LABEL[status]}</span> +); diff --git a/packages/benchmark/tasks/status-pill-variants/environment/Dockerfile b/packages/benchmark/tasks/status-pill-variants/environment/Dockerfile new file mode 100644 index 000000000..fcbfdb374 --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +RUN pnpm install --frozen-lockfile --ignore-scripts || pnpm install --ignore-scripts +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/status-pill-variants/instruction.md b/packages/benchmark/tasks/status-pill-variants/instruction.md new file mode 100644 index 000000000..9356448ad --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/instruction.md @@ -0,0 +1,20 @@ +Implement the `StatusPill` component in `src/status-pill.tsx`. + +## Expected behavior + +`StatusPill` takes a single `status` prop — one of `"success"`, `"error"`, +`"warning"`, `"info"` — and renders a `<span>`: + +- Its `className` is exactly `pill pill-<status>`, e.g. + `<span class="pill pill-success">`. 
+- Its text content is the capitalized status label: `Success`, `Error`, + `Warning`, `Info` respectively. + +Example: `<StatusPill status="warning" />` renders +`<span class="pill pill-warning">Warning</span>`. + +## Constraints + +Keep the exported `StatusPill` component and the `StatusPillProps` / `PillStatus` +types. The component must accept the four statuses through the single `status` +prop. diff --git a/packages/benchmark/tasks/status-pill-variants/seed/package.json b/packages/benchmark/tasks/status-pill-variants/seed/package.json new file mode 100644 index 000000000..4deae451d --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/seed/package.json @@ -0,0 +1,13 @@ +{ + "name": "slopbench-status-pill", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + }, + "devDependencies": { + "vitest": "^4.1.8" + } +} diff --git a/packages/benchmark/tasks/status-pill-variants/seed/src/status-pill.tsx b/packages/benchmark/tasks/status-pill-variants/seed/src/status-pill.tsx new file mode 100644 index 000000000..5d246c51d --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/seed/src/status-pill.tsx @@ -0,0 +1,10 @@ +export type PillStatus = "success" | "error" | "warning" | "info"; + +export interface StatusPillProps { + status: PillStatus; +} + +// TODO(agent): implement. See instruction.md. 
+export const StatusPill = (_props: StatusPillProps) => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/status-pill-variants/seed/tsconfig.json b/packages/benchmark/tasks/status-pill-variants/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/status-pill-variants/seed/vitest.config.ts b/packages/benchmark/tasks/status-pill-variants/seed/vitest.config.ts new file mode 100644 index 000000000..8409b1f8e --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/seed/vitest.config.ts @@ -0,0 +1,9 @@ +import { defineConfig } from "vitest/config"; + +export default defineConfig({ + esbuild: { jsx: "automatic" }, + test: { + environment: "node", + include: ["tests/**/*.test.tsx"], + }, +}); diff --git a/packages/benchmark/tasks/status-pill-variants/solution/solution.patch b/packages/benchmark/tasks/status-pill-variants/solution/solution.patch new file mode 100644 index 000000000..60360bca4 --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/solution/solution.patch @@ -0,0 +1,21 @@ +diff --git a/src/status-pill.tsx b/src/status-pill.tsx +index 5d246c5..6e1cdf3 100644 +--- a/src/status-pill.tsx ++++ b/src/status-pill.tsx +@@ -4,7 +4,13 @@ export interface StatusPillProps { + status: PillStatus; + } + +-// TODO(agent): implement. See instruction.md. 
+-export const StatusPill = (_props: StatusPillProps) => { +- throw new Error("not implemented"); ++const STATUS_LABEL: Record<PillStatus, string> = { ++ success: "Success", ++ error: "Error", ++ warning: "Warning", ++ info: "Info", + }; ++ ++export const StatusPill = ({ status }: StatusPillProps) => ( ++ <span className={`pill pill-${status}`}>{STATUS_LABEL[status]}</span> ++); diff --git a/packages/benchmark/tasks/status-pill-variants/solution/solve.sh b/packages/benchmark/tasks/status-pill-variants/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/status-pill-variants/task.toml b/packages/benchmark/tasks/status-pill-variants/task.toml new file mode 100644 index 000000000..512b8d34d --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/status-pill-variants" +description = "Implement a StatusPill with a single status union prop (not boolean-prop soup)." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "status-pill-variants" +display_title = "Status pill variants" +display_description = "Implement a StatusPill with a single status union prop (not boolean-prop soup)." 
+family = "produce-clean" +target_dimensions = ["composition", "react-correctness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/status-pill-variants/tests/test.patch b/packages/benchmark/tasks/status-pill-variants/tests/test.patch new file mode 100644 index 000000000..e6a48bc2e --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/tests/test.patch @@ -0,0 +1,24 @@ +diff --git a/tests/status-pill.test.tsx b/tests/status-pill.test.tsx +new file mode 100644 +index 0000000..0cf6725 +--- /dev/null ++++ b/tests/status-pill.test.tsx +@@ -0,0 +1,18 @@ ++import { test, expect } from "vitest"; ++import { renderToStaticMarkup } from "react-dom/server"; ++import { StatusPill, type PillStatus } from "../src/status-pill.tsx"; ++ ++const CASES: Array<{ status: PillStatus; label: string }> = [ ++ { status: "success", label: "Success" }, ++ { status: "error", label: "Error" }, ++ { status: "warning", label: "Warning" }, ++ { status: "info", label: "Info" }, ++]; ++ ++for (const { status, label } of CASES) { ++ test(`renders the ${status} pill`, () => { ++ const html = renderToStaticMarkup(<StatusPill status={status} />); ++ expect(html).toContain(`pill pill-${status}`); ++ expect(html).toContain(`>${label}<`); ++ }); ++} diff --git a/packages/benchmark/tasks/status-pill-variants/tests/test.sh b/packages/benchmark/tasks/status-pill-variants/tests/test.sh new file mode 100755 index 000000000..4003f69b2 --- /dev/null +++ b/packages/benchmark/tasks/status-pill-variants/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git 
-C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="pnpm exec vitest run" +exec slopbench-grade diff --git a/packages/benchmark/tasks/title-case-util/_authoring/hidden/tests/title-case.test.ts b/packages/benchmark/tasks/title-case-util/_authoring/hidden/tests/title-case.test.ts new file mode 100644 index 000000000..8d6a0c74a --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/_authoring/hidden/tests/title-case.test.ts @@ -0,0 +1,20 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { titleCase } from "../src/title-case.ts"; + +test("capitalizes each word", () => { + assert.equal(titleCase("hello world"), "Hello World"); +}); + +test("collapses whitespace and trims", () => { + assert.equal(titleCase(" the QUICK brown "), "The Quick Brown"); +}); + +test("lowercases the rest of each word", () => { + assert.equal(titleCase("ALL CAPS"), "All Caps"); +}); + +test("returns empty string for empty input", () => { + assert.equal(titleCase(""), ""); + assert.equal(titleCase(" "), ""); +}); diff --git a/packages/benchmark/tasks/title-case-util/_authoring/solved/src/title-case.ts b/packages/benchmark/tasks/title-case-util/_authoring/solved/src/title-case.ts new file mode 100644 index 000000000..021bfbc8d --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/_authoring/solved/src/title-case.ts @@ -0,0 +1,11 @@ +// Capitalizes the first letter of each whitespace-separated word and lowercases +// the rest. +export const titleCase = (input: string): string => { + const words = input + .trim() + .split(/\s+/) + .filter((word) => word.length > 0); + return words + .map((word) => `${word[0]?.toUpperCase() ?? 
""}${word.slice(1).toLowerCase()}`) + .join(" "); +}; diff --git a/packages/benchmark/tasks/title-case-util/environment/Dockerfile b/packages/benchmark/tasks/title-case-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/title-case-util/instruction.md b/packages/benchmark/tasks/title-case-util/instruction.md new file mode 100644 index 000000000..0359baa65 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/instruction.md @@ -0,0 +1,24 @@ +Implement `titleCase` in `src/title-case.ts`. + +## Expected behavior + +`titleCase(input)` capitalizes the first letter of each word and lowercases the +rest: + +- Words are separated by single spaces in the output; collapse any run of + whitespace in the input to a single space and trim the ends. +- For each word, uppercase the first character and lowercase the remaining + characters. +- An empty (or whitespace-only) input returns `""`. + +Examples: + +- `titleCase("hello world")` → `"Hello World"` +- `titleCase(" the QUICK brown ")` → `"The Quick Brown"` +- `titleCase("ALL CAPS")` → `"All Caps"` +- `titleCase("")` → `""` + +## Constraints + +Keep the exported `titleCase(input: string): string` signature. Do not change +`src/section-heading.tsx`. 
diff --git a/packages/benchmark/tasks/title-case-util/seed/package.json b/packages/benchmark/tasks/title-case-util/seed/package.json new file mode 100644 index 000000000..fbaa2b24d --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-title-case-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/title-case-util/seed/src/section-heading.tsx b/packages/benchmark/tasks/title-case-util/seed/src/section-heading.tsx new file mode 100644 index 000000000..050715537 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/seed/src/section-heading.tsx @@ -0,0 +1,8 @@ +import { titleCase } from "./title-case.ts"; + +interface SectionHeadingProps { + text: string; +} + +// Existing consumer (keeps title-case.ts reachable). Do not edit. +export const SectionHeading = ({ text }: SectionHeadingProps) => <h2>{titleCase(text)}</h2>; diff --git a/packages/benchmark/tasks/title-case-util/seed/src/title-case.ts b/packages/benchmark/tasks/title-case-util/seed/src/title-case.ts new file mode 100644 index 000000000..b431adbeb --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/seed/src/title-case.ts @@ -0,0 +1,4 @@ +// TODO(agent): implement. See instruction.md. 
+export const titleCase = (_input: string): string => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/title-case-util/seed/tsconfig.json b/packages/benchmark/tasks/title-case-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/title-case-util/solution/solution.patch b/packages/benchmark/tasks/title-case-util/solution/solution.patch new file mode 100644 index 000000000..d3f9b0dc8 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/solution/solution.patch @@ -0,0 +1,19 @@ +diff --git a/src/title-case.ts b/src/title-case.ts +index b431adb..021bfbc 100644 +--- a/src/title-case.ts ++++ b/src/title-case.ts +@@ -1,4 +1,11 @@ +-// TODO(agent): implement. See instruction.md. +-export const titleCase = (_input: string): string => { +- throw new Error("not implemented"); ++// Capitalizes the first letter of each whitespace-separated word and lowercases ++// the rest. ++export const titleCase = (input: string): string => { ++ const words = input ++ .trim() ++ .split(/\s+/) ++ .filter((word) => word.length > 0); ++ return words ++ .map((word) => `${word[0]?.toUpperCase() ?? ""}${word.slice(1).toLowerCase()}`) ++ .join(" "); + }; diff --git a/packages/benchmark/tasks/title-case-util/solution/solve.sh b/packages/benchmark/tasks/title-case-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). 
+set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/title-case-util/task.toml b/packages/benchmark/tasks/title-case-util/task.toml new file mode 100644 index 000000000..908dcd105 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/title-case-util" +description = "Implement titleCase(input) capitalizing each word." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "title-case-util" +display_title = "Title-case utility" +display_description = "Implement titleCase(input) capitalizing each word." +family = "produce-clean" +target_dimensions = ["maintainability", "ts-strictness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/title-case-util/tests/test.patch b/packages/benchmark/tasks/title-case-util/tests/test.patch new file mode 100644 index 000000000..293ca84e0 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/tests/test.patch @@ -0,0 +1,26 @@ +diff --git a/tests/title-case.test.ts b/tests/title-case.test.ts +new file mode 100644 +index 0000000..8d6a0c7 +--- /dev/null ++++ b/tests/title-case.test.ts +@@ -0,0 +1,20 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { titleCase } from "../src/title-case.ts"; ++ ++test("capitalizes each word", () => { ++ assert.equal(titleCase("hello world"), "Hello World"); ++}); ++ ++test("collapses whitespace and trims", () => { ++ assert.equal(titleCase(" the 
QUICK brown "), "The Quick Brown"); ++}); ++ ++test("lowercases the rest of each word", () => { ++ assert.equal(titleCase("ALL CAPS"), "All Caps"); ++}); ++ ++test("returns empty string for empty input", () => { ++ assert.equal(titleCase(""), ""); ++ assert.equal(titleCase(" "), ""); ++}); diff --git a/packages/benchmark/tasks/title-case-util/tests/test.sh b/packages/benchmark/tasks/title-case-util/tests/test.sh new file mode 100755 index 000000000..f97ee5e19 --- /dev/null +++ b/packages/benchmark/tasks/title-case-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/title-case.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/truncate-middle-util/_authoring/hidden/tests/truncate-middle.test.ts b/packages/benchmark/tasks/truncate-middle-util/_authoring/hidden/tests/truncate-middle.test.ts new file mode 100644 index 000000000..4d03c3a14 --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/_authoring/hidden/tests/truncate-middle.test.ts @@ -0,0 +1,20 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { truncateMiddle } from "../src/truncate-middle.ts"; + +test("returns short text unchanged", () => { + assert.equal(truncateMiddle("hello", 10), "hello"); +}); + +test("elides the middle to the exact max length, favoring the front", () => { + assert.equal(truncateMiddle("hello world", 7), "hel\u2026rld"); + assert.equal(truncateMiddle("hello world", 7).length, 7); +}); + +test("splits an even budget evenly", () => { + assert.equal(truncateMiddle("abcdefgh", 5), "ab\u2026gh"); +}); + +test("collapses to a lone ellipsis at length 1 or less", () => { + assert.equal(truncateMiddle("anything", 1), "\u2026"); +}); diff --git a/packages/benchmark/tasks/truncate-middle-util/_authoring/solved/src/truncate-middle.ts 
b/packages/benchmark/tasks/truncate-middle-util/_authoring/solved/src/truncate-middle.ts new file mode 100644 index 000000000..d42c6bc0e --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/_authoring/solved/src/truncate-middle.ts @@ -0,0 +1,13 @@ +// Shortens text by eliding the middle with a single ellipsis so the result is +// exactly `maxLength` characters. Odd leftover budget favors the front. +export const truncateMiddle = (text: string, maxLength: number): string => { + if (text.length <= maxLength) return text; + if (maxLength <= 1) return "…"; + + const budget = maxLength - 1; + const frontLength = Math.ceil(budget / 2); + const backLength = Math.floor(budget / 2); + const front = text.slice(0, frontLength); + const back = backLength === 0 ? "" : text.slice(text.length - backLength); + return `${front}…${back}`; +}; diff --git a/packages/benchmark/tasks/truncate-middle-util/environment/Dockerfile b/packages/benchmark/tasks/truncate-middle-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/truncate-middle-util/instruction.md b/packages/benchmark/tasks/truncate-middle-util/instruction.md new file mode 100644 index 000000000..78ea758b6 --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/instruction.md @@ -0,0 +1,25 @@ +Implement `truncateMiddle` in `src/truncate-middle.ts`. 
+ +## Expected behavior + +`truncateMiddle(text, maxLength)` shortens long text by removing the middle and +inserting a single `…` (U+2026) so the **total result length equals +`maxLength`**. + +- If `text.length <= maxLength`, return `text` unchanged. +- Otherwise keep the start and end of `text` around a single `…`. The ellipsis + counts as one character toward `maxLength`. When the remaining character + budget is odd, give the extra character to the **front**. +- If `maxLength <= 1`, return `"…"`. + +Examples: + +- `truncateMiddle("hello", 10)` → `"hello"` +- `truncateMiddle("hello world", 7)` → `"hel…rld"` +- `truncateMiddle("abcdefgh", 5)` → `"ab…gh"` +- `truncateMiddle("anything", 1)` → `"…"` + +## Constraints + +Keep the exported `truncateMiddle(text: string, maxLength: number): string` +signature. Do not change `src/file-chip.tsx`. diff --git a/packages/benchmark/tasks/truncate-middle-util/seed/package.json b/packages/benchmark/tasks/truncate-middle-util/seed/package.json new file mode 100644 index 000000000..8a078a4b2 --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-truncate-middle-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/truncate-middle-util/seed/src/file-chip.tsx b/packages/benchmark/tasks/truncate-middle-util/seed/src/file-chip.tsx new file mode 100644 index 000000000..88884a09c --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/seed/src/file-chip.tsx @@ -0,0 +1,10 @@ +import { truncateMiddle } from "./truncate-middle.ts"; + +interface FileChipProps { + fileName: string; +} + +// Existing consumer (keeps truncate-middle.ts reachable). Do not edit. 
+export const FileChip = ({ fileName }: FileChipProps) => ( + <span className="file-chip">{truncateMiddle(fileName, 20)}</span> +); diff --git a/packages/benchmark/tasks/truncate-middle-util/seed/src/truncate-middle.ts b/packages/benchmark/tasks/truncate-middle-util/seed/src/truncate-middle.ts new file mode 100644 index 000000000..79e17fc5e --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/seed/src/truncate-middle.ts @@ -0,0 +1,4 @@ +// TODO(agent): implement. See instruction.md. +export const truncateMiddle = (_text: string, _maxLength: number): string => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/truncate-middle-util/seed/tsconfig.json b/packages/benchmark/tasks/truncate-middle-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/truncate-middle-util/solution/solution.patch b/packages/benchmark/tasks/truncate-middle-util/solution/solution.patch new file mode 100644 index 000000000..73c2d649a --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/solution/solution.patch @@ -0,0 +1,21 @@ +diff --git a/src/truncate-middle.ts b/src/truncate-middle.ts +index 79e17fc..d42c6bc 100644 +--- a/src/truncate-middle.ts ++++ b/src/truncate-middle.ts +@@ -1,4 +1,13 @@ +-// TODO(agent): implement. See instruction.md. +-export const truncateMiddle = (_text: string, _maxLength: number): string => { +- throw new Error("not implemented"); ++// Shortens text by eliding the middle with a single ellipsis so the result is ++// exactly `maxLength` characters. Odd leftover budget favors the front. 
++export const truncateMiddle = (text: string, maxLength: number): string => { ++ if (text.length <= maxLength) return text; ++ if (maxLength <= 1) return "…"; ++ ++ const budget = maxLength - 1; ++ const frontLength = Math.ceil(budget / 2); ++ const backLength = Math.floor(budget / 2); ++ const front = text.slice(0, frontLength); ++ const back = backLength === 0 ? "" : text.slice(text.length - backLength); ++ return `${front}…${back}`; + }; diff --git a/packages/benchmark/tasks/truncate-middle-util/solution/solve.sh b/packages/benchmark/tasks/truncate-middle-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/truncate-middle-util/task.toml b/packages/benchmark/tasks/truncate-middle-util/task.toml new file mode 100644 index 000000000..291bf119e --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/truncate-middle-util" +description = "Implement truncateMiddle(text, maxLength) eliding the middle with an ellipsis." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "truncate-middle-util" +display_title = "Truncate middle" +display_description = "Implement truncateMiddle(text, maxLength) eliding the middle with an ellipsis." 
+family = "produce-clean" +target_dimensions = ["maintainability", "ts-strictness"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/truncate-middle-util/tests/test.patch b/packages/benchmark/tasks/truncate-middle-util/tests/test.patch new file mode 100644 index 000000000..dc3c85152 --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/tests/test.patch @@ -0,0 +1,26 @@ +diff --git a/tests/truncate-middle.test.ts b/tests/truncate-middle.test.ts +new file mode 100644 +index 0000000..4d03c3a +--- /dev/null ++++ b/tests/truncate-middle.test.ts +@@ -0,0 +1,20 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { truncateMiddle } from "../src/truncate-middle.ts"; ++ ++test("returns short text unchanged", () => { ++ assert.equal(truncateMiddle("hello", 10), "hello"); ++}); ++ ++test("elides the middle to the exact max length, favoring the front", () => { ++ assert.equal(truncateMiddle("hello world", 7), "hel\u2026rld"); ++ assert.equal(truncateMiddle("hello world", 7).length, 7); ++}); ++ ++test("splits an even budget evenly", () => { ++ assert.equal(truncateMiddle("abcdefgh", 5), "ab\u2026gh"); ++}); ++ ++test("collapses to a lone ellipsis at length 1 or less", () => { ++ assert.equal(truncateMiddle("anything", 1), "\u2026"); ++}); diff --git a/packages/benchmark/tasks/truncate-middle-util/tests/test.sh b/packages/benchmark/tasks/truncate-middle-util/tests/test.sh new file mode 100755 index 000000000..a91df1526 --- /dev/null +++ b/packages/benchmark/tasks/truncate-middle-util/tests/test.sh @@ -0,0 +1,5 @@ 
+#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/truncate-middle.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/typed-storage-util/_authoring/hidden/tests/storage.test.ts b/packages/benchmark/tasks/typed-storage-util/_authoring/hidden/tests/storage.test.ts new file mode 100644 index 000000000..75f5225eb --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/_authoring/hidden/tests/storage.test.ts @@ -0,0 +1,31 @@ +import { test, beforeEach } from "node:test"; +import assert from "node:assert/strict"; +import { readJson, writeJson } from "../src/storage.ts"; + +const makeFakeStorage = () => { + const map = new Map<string, string>(); + return { + getItem: (key: string): string | null => (map.has(key) ? (map.get(key) ?? null) : null), + setItem: (key: string, value: string): void => { + map.set(key, value); + }, + }; +}; + +beforeEach(() => { + (globalThis as { localStorage?: unknown }).localStorage = makeFakeStorage(); +}); + +test("returns the fallback when the key is absent", () => { + assert.deepEqual(readJson("missing", { a: 1 }), { a: 1 }); +}); + +test("round-trips JSON values", () => { + writeJson("k", { a: 1, nested: [1, 2, 3] }); + assert.deepEqual(readJson("k", null), { a: 1, nested: [1, 2, 3] }); +}); + +test("returns the fallback on corrupt JSON without throwing", () => { + globalThis.localStorage.setItem("bad", "{not json"); + assert.equal(readJson("bad", "fallback"), "fallback"); +}); diff --git a/packages/benchmark/tasks/typed-storage-util/_authoring/solved/src/storage.ts b/packages/benchmark/tasks/typed-storage-util/_authoring/solved/src/storage.ts new file mode 100644 index 000000000..554bd3705 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/_authoring/solved/src/storage.ts @@ -0,0 +1,16 @@ +export const readJson = <Value>(key: string, fallback: 
Value): Value => { + const raw = localStorage.getItem(key); + if (raw === null) return fallback; + try { + // Annotate rather than cast: `JSON.parse` returns `any`, which assigns to + // `Value` without an `as` assertion. + const parsed: Value = JSON.parse(raw); + return parsed; + } catch { + return fallback; + } +}; + +export const writeJson = <Value>(key: string, value: Value): void => { + localStorage.setItem(key, JSON.stringify(value)); +}; diff --git a/packages/benchmark/tasks/typed-storage-util/environment/Dockerfile b/packages/benchmark/tasks/typed-storage-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/typed-storage-util/instruction.md b/packages/benchmark/tasks/typed-storage-util/instruction.md new file mode 100644 index 000000000..de38b768d --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/instruction.md @@ -0,0 +1,22 @@ +Implement the typed `localStorage` helpers in `src/storage.ts`. + +## Expected behavior + +Both functions use the global `localStorage` API (`localStorage.getItem`, +`setItem`). + +- `readJson<Value>(key, fallback)` reads `key`, JSON-parses it, and returns the + value typed as `Value`. It returns `fallback` when the key is absent + (`getItem` returns `null`) **or** when the stored string is not valid JSON. + It must never throw. +- `writeJson<Value>(key, value)` serializes `value` with `JSON.stringify` and + stores it under `key`. + +Round-trip: after `writeJson("k", { a: 1 })`, `readJson("k", null)` returns +`{ a: 1 }`.
+ +## Constraints + +Keep the generic signatures `readJson<Value>(key: string, fallback: Value)` and +`writeJson<Value>(key: string, value: Value)`. Do not change +`src/theme-store.ts`. diff --git a/packages/benchmark/tasks/typed-storage-util/seed/package.json b/packages/benchmark/tasks/typed-storage-util/seed/package.json new file mode 100644 index 000000000..8260b7959 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-typed-storage", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/typed-storage-util/seed/src/storage.ts b/packages/benchmark/tasks/typed-storage-util/seed/src/storage.ts new file mode 100644 index 000000000..7e47c2695 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/seed/src/storage.ts @@ -0,0 +1,9 @@ +// TODO(agent): implement readJson and writeJson. See instruction.md. + +export const readJson = <Value>(_key: string, _fallback: Value): Value => { + throw new Error("not implemented"); +}; + +export const writeJson = <Value>(_key: string, _value: Value): void => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/typed-storage-util/seed/src/theme-store.ts b/packages/benchmark/tasks/typed-storage-util/seed/src/theme-store.ts new file mode 100644 index 000000000..887c6c58e --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/seed/src/theme-store.ts @@ -0,0 +1,13 @@ +import { readJson, writeJson } from "./storage.ts"; + +export interface ThemeSettings { + mode: "light" | "dark"; + accent: string; +} + +const THEME_KEY = "theme-settings"; +const DEFAULT_THEME: ThemeSettings = { mode: "light", accent: "blue" }; + +// Existing consumer of the storage util (keeps storage.ts reachable). Do not edit. 
+export const loadTheme = (): ThemeSettings => readJson(THEME_KEY, DEFAULT_THEME); +export const saveTheme = (settings: ThemeSettings): void => writeJson(THEME_KEY, settings); diff --git a/packages/benchmark/tasks/typed-storage-util/seed/tsconfig.json b/packages/benchmark/tasks/typed-storage-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/typed-storage-util/solution/solution.patch b/packages/benchmark/tasks/typed-storage-util/solution/solution.patch new file mode 100644 index 000000000..46fcfa379 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/solution/solution.patch @@ -0,0 +1,27 @@ +diff --git a/src/storage.ts b/src/storage.ts +index 7e47c26..554bd37 100644 +--- a/src/storage.ts ++++ b/src/storage.ts +@@ -1,9 +1,16 @@ +-// TODO(agent): implement readJson and writeJson. See instruction.md. +- +-export const readJson = <Value>(_key: string, _fallback: Value): Value => { +- throw new Error("not implemented"); ++export const readJson = <Value>(key: string, fallback: Value): Value => { ++ const raw = localStorage.getItem(key); ++ if (raw === null) return fallback; ++ try { ++ // Annotate rather than cast: `JSON.parse` returns `any`, which assigns to ++ // `Value` without an `as` assertion.
++ const parsed: Value = JSON.parse(raw); ++ return parsed; ++ } catch { ++ return fallback; ++ } + }; + +-export const writeJson = <Value>(_key: string, _value: Value): void => { +- throw new Error("not implemented"); ++export const writeJson = <Value>(key: string, value: Value): void => { ++ localStorage.setItem(key, JSON.stringify(value)); + }; diff --git a/packages/benchmark/tasks/typed-storage-util/solution/solve.sh b/packages/benchmark/tasks/typed-storage-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/typed-storage-util/task.toml b/packages/benchmark/tasks/typed-storage-util/task.toml new file mode 100644 index 000000000..e89335741 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/typed-storage-util" +description = "Implement typed readJson/writeJson over localStorage with safe fallback." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "typed-storage-util" +display_title = "Typed localStorage helpers" +display_description = "Implement typed readJson/writeJson over localStorage with safe fallback." 
+family = "produce-clean" +target_dimensions = ["ts-strictness", "maintainability"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/typed-storage-util/tests/test.patch b/packages/benchmark/tasks/typed-storage-util/tests/test.patch new file mode 100644 index 000000000..f658ea622 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/tests/test.patch @@ -0,0 +1,37 @@ +diff --git a/tests/storage.test.ts b/tests/storage.test.ts +new file mode 100644 +index 0000000..75f5225 +--- /dev/null ++++ b/tests/storage.test.ts +@@ -0,0 +1,31 @@ ++import { test, beforeEach } from "node:test"; ++import assert from "node:assert/strict"; ++import { readJson, writeJson } from "../src/storage.ts"; ++ ++const makeFakeStorage = () => { ++ const map = new Map<string, string>(); ++ return { ++ getItem: (key: string): string | null => (map.has(key) ? (map.get(key) ?? 
null) : null), ++ setItem: (key: string, value: string): void => { ++ map.set(key, value); ++ }, ++ }; ++}; ++ ++beforeEach(() => { ++ (globalThis as { localStorage?: unknown }).localStorage = makeFakeStorage(); ++}); ++ ++test("returns the fallback when the key is absent", () => { ++ assert.deepEqual(readJson("missing", { a: 1 }), { a: 1 }); ++}); ++ ++test("round-trips JSON values", () => { ++ writeJson("k", { a: 1, nested: [1, 2, 3] }); ++ assert.deepEqual(readJson("k", null), { a: 1, nested: [1, 2, 3] }); ++}); ++ ++test("returns the fallback on corrupt JSON without throwing", () => { ++ globalThis.localStorage.setItem("bad", "{not json"); ++ assert.equal(readJson("bad", "fallback"), "fallback"); ++}); diff --git a/packages/benchmark/tasks/typed-storage-util/tests/test.sh b/packages/benchmark/tasks/typed-storage-util/tests/test.sh new file mode 100755 index 000000000..582826435 --- /dev/null +++ b/packages/benchmark/tasks/typed-storage-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/storage.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tasks/unique-by-util/_authoring/hidden/tests/unique-by.test.ts b/packages/benchmark/tasks/unique-by-util/_authoring/hidden/tests/unique-by.test.ts new file mode 100644 index 000000000..5da61b6cc --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/_authoring/hidden/tests/unique-by.test.ts @@ -0,0 +1,32 @@ +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { uniqueBy } from "../src/unique-by.ts"; + +test("keeps the first item per key, preserving order", () => { + const result = uniqueBy( + [ + { id: 1, t: "a" }, + { id: 2, t: "b" }, + { id: 3, t: "a" }, + ], + (item) => item.t, + ); + assert.deepEqual(result, [ + { id: 1, t: "a" }, + { id: 2, t: "b" }, + ]); +}); + +test("dedupes 
primitives", () => { + assert.deepEqual( + uniqueBy([1, 1, 2, 3, 2], (n) => n), + [1, 2, 3], + ); +}); + +test("returns an empty array for empty input", () => { + assert.deepEqual( + uniqueBy([], (x) => x), + [], + ); +}); diff --git a/packages/benchmark/tasks/unique-by-util/_authoring/solved/src/unique-by.ts b/packages/benchmark/tasks/unique-by-util/_authoring/solved/src/unique-by.ts new file mode 100644 index 000000000..9312eacaa --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/_authoring/solved/src/unique-by.ts @@ -0,0 +1,16 @@ +// Removes duplicates by a derived key, keeping the first item per key and +// preserving order. +export const uniqueBy = <Item, Key>( + items: readonly Item[], + selector: (item: Item) => Key, +): Item[] => { + const seen = new Set<Key>(); + const result: Item[] = []; + for (const item of items) { + const key = selector(item); + if (seen.has(key)) continue; + seen.add(key); + result.push(item); + } + return result; +}; diff --git a/packages/benchmark/tasks/unique-by-util/environment/Dockerfile b/packages/benchmark/tasks/unique-by-util/environment/Dockerfile new file mode 100644 index 000000000..0717d0595 --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/environment/Dockerfile @@ -0,0 +1,12 @@ +FROM slopbench-base:latest + +WORKDIR /app + +COPY seed/ . +# Pure-TS task: no dependency install (functional test uses node --test). +RUN git init -q \ + && git add -A \ + && git -c user.email=bench@react.doctor -c user.name=slopbench commit -qm "base" \ + && git config --global --add safe.directory /app + +CMD ["/bin/bash"] diff --git a/packages/benchmark/tasks/unique-by-util/instruction.md b/packages/benchmark/tasks/unique-by-util/instruction.md new file mode 100644 index 000000000..f3d7319b1 --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/instruction.md @@ -0,0 +1,23 @@ +Implement `uniqueBy` in `src/unique-by.ts`. 
+ +## Expected behavior + +`uniqueBy(items, selector)` removes duplicate items, where two items are +duplicates when `selector` returns an equal key (compared with `Set`/`Map` +equality, i.e. SameValueZero: like `===`, but `NaN` equals `NaN`). + +- Keep the **first** item for each distinct key. +- Preserve the original order of the kept items. +- An empty input returns `[]`. + +Examples: + +- `uniqueBy([{ id: 1, t: "a" }, { id: 2, t: "b" }, { id: 3, t: "a" }], (x) => x.t)` + → `[{ id: 1, t: "a" }, { id: 2, t: "b" }]` +- `uniqueBy([1, 1, 2, 3, 2], (n) => n)` → `[1, 2, 3]` + +## Constraints + +Keep the exported generic signature +`uniqueBy<Item, Key>(items: readonly Item[], selector: (item: Item) => Key): Item[]`. +Do not change `src/tag-list.tsx`. diff --git a/packages/benchmark/tasks/unique-by-util/seed/package.json b/packages/benchmark/tasks/unique-by-util/seed/package.json new file mode 100644 index 000000000..0b43dd83e --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/seed/package.json @@ -0,0 +1,10 @@ +{ + "name": "slopbench-unique-by-util", + "version": "1.0.0", + "private": true, + "type": "module", + "dependencies": { + "react": "^18.3.1", + "react-dom": "^18.3.1" + } +} diff --git a/packages/benchmark/tasks/unique-by-util/seed/src/tag-list.tsx b/packages/benchmark/tasks/unique-by-util/seed/src/tag-list.tsx new file mode 100644 index 000000000..d742a2de5 --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/seed/src/tag-list.tsx @@ -0,0 +1,15 @@ +import { uniqueBy } from "./unique-by.ts"; + +interface Tag { + id: string; + label: string; +} + +interface TagListProps { + tags: Tag[]; +} + +// Existing consumer (keeps unique-by.ts reachable). Do not edit.
+export const TagList = ({ tags }: TagListProps) => ( + <span>{uniqueBy(tags, (tag) => tag.label).length} unique</span> +); diff --git a/packages/benchmark/tasks/unique-by-util/seed/src/unique-by.ts b/packages/benchmark/tasks/unique-by-util/seed/src/unique-by.ts new file mode 100644 index 000000000..1ba6925b9 --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/seed/src/unique-by.ts @@ -0,0 +1,7 @@ +// TODO(agent): implement. See instruction.md. +export const uniqueBy = <Item, Key>( + _items: readonly Item[], + _selector: (item: Item) => Key, +): Item[] => { + throw new Error("not implemented"); +}; diff --git a/packages/benchmark/tasks/unique-by-util/seed/tsconfig.json b/packages/benchmark/tasks/unique-by-util/seed/tsconfig.json new file mode 100644 index 000000000..ffbea3d66 --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/seed/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "jsx": "react-jsx", + "strict": true, + "allowImportingTsExtensions": true, + "noEmit": true, + "skipLibCheck": true + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/tasks/unique-by-util/solution/solution.patch b/packages/benchmark/tasks/unique-by-util/solution/solution.patch new file mode 100644 index 000000000..568e7036e --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/solution/solution.patch @@ -0,0 +1,25 @@ +diff --git a/src/unique-by.ts b/src/unique-by.ts +index 1ba6925..9312eac 100644 +--- a/src/unique-by.ts ++++ b/src/unique-by.ts +@@ -1,7 +1,16 @@ +-// TODO(agent): implement. See instruction.md. ++// Removes duplicates by a derived key, keeping the first item per key and ++// preserving order. 
+ export const uniqueBy = <Item, Key>( +- _items: readonly Item[], +- _selector: (item: Item) => Key, ++ items: readonly Item[], ++ selector: (item: Item) => Key, + ): Item[] => { +- throw new Error("not implemented"); ++ const seen = new Set<Key>(); ++ const result: Item[] = []; ++ for (const item of items) { ++ const key = selector(item); ++ if (seen.has(key)) continue; ++ seen.add(key); ++ result.push(item); ++ } ++ return result; + }; diff --git a/packages/benchmark/tasks/unique-by-util/solution/solve.sh b/packages/benchmark/tasks/unique-by-util/solution/solve.sh new file mode 100755 index 000000000..764e03155 --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/solution/solve.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +# Reference solution applier (reviewer aid only — never used at grade time). +set -euo pipefail +cd /app +git apply --whitespace=nowarn /solution/solution.patch diff --git a/packages/benchmark/tasks/unique-by-util/task.toml b/packages/benchmark/tasks/unique-by-util/task.toml new file mode 100644 index 000000000..917ad28eb --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/task.toml @@ -0,0 +1,42 @@ +schema_version = "1.1" +artifacts = [] + +[task] +name = "slopbench/unique-by-util" +description = "Implement uniqueBy(items, selector) keeping first per key, preserving order." +authors = [] +keywords = ["react", "typescript", "slop", "frontend"] + +[metadata] +task_id = "unique-by-util" +display_title = "Unique-by utility" +display_description = "Implement uniqueBy(items, selector) keeping first per key, preserving order." 
+family = "produce-clean" +target_dimensions = ["ts-strictness", "maintainability"] +language = "typescript" +repository_url = "in-tree" +base_commit_hash = "root" +slop_profile = "" + +[verifier] +timeout_sec = 1200.0 + +[verifier.env] + +[agent] +timeout_sec = 3600.0 + +[environment] +build_timeout_sec = 1200.0 +docker_image = "slopbench-base:latest" +os = "linux" +cpus = 2 +memory_mb = 4096 +storage_mb = 10240 +gpus = 0 +allow_internet = false +mcp_servers = [] + +[environment.env] + +[solution.env] diff --git a/packages/benchmark/tasks/unique-by-util/tests/test.patch b/packages/benchmark/tasks/unique-by-util/tests/test.patch new file mode 100644 index 000000000..b4ddb387b --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/tests/test.patch @@ -0,0 +1,38 @@ +diff --git a/tests/unique-by.test.ts b/tests/unique-by.test.ts +new file mode 100644 +index 0000000..5da61b6 +--- /dev/null ++++ b/tests/unique-by.test.ts +@@ -0,0 +1,32 @@ ++import { test } from "node:test"; ++import assert from "node:assert/strict"; ++import { uniqueBy } from "../src/unique-by.ts"; ++ ++test("keeps the first item per key, preserving order", () => { ++ const result = uniqueBy( ++ [ ++ { id: 1, t: "a" }, ++ { id: 2, t: "b" }, ++ { id: 3, t: "a" }, ++ ], ++ (item) => item.t, ++ ); ++ assert.deepEqual(result, [ ++ { id: 1, t: "a" }, ++ { id: 2, t: "b" }, ++ ]); ++}); ++ ++test("dedupes primitives", () => { ++ assert.deepEqual( ++ uniqueBy([1, 1, 2, 3, 2], (n) => n), ++ [1, 2, 3], ++ ); ++}); ++ ++test("returns an empty array for empty input", () => { ++ assert.deepEqual( ++ uniqueBy([], (x) => x), ++ [], ++ ); ++}); diff --git a/packages/benchmark/tasks/unique-by-util/tests/test.sh b/packages/benchmark/tasks/unique-by-util/tests/test.sh new file mode 100755 index 000000000..74e2be7fd --- /dev/null +++ b/packages/benchmark/tasks/unique-by-util/tests/test.sh @@ -0,0 +1,5 @@ +#!/usr/bin/env bash +set -euo pipefail +export BASE_COMMIT="$(git -C "${APP_DIR:-/app}" rev-list --max-parents=0 
HEAD | tail -1)" +export FUNCTIONAL_TEST_CMD="node --experimental-strip-types --test tests/unique-by.test.ts" +exec slopbench-grade diff --git a/packages/benchmark/tests/aggregate-results.test.ts b/packages/benchmark/tests/aggregate-results.test.ts new file mode 100644 index 000000000..c3988bacb --- /dev/null +++ b/packages/benchmark/tests/aggregate-results.test.ts @@ -0,0 +1,81 @@ +import { execFileSync } from "node:child_process"; +import * as fs from "node:fs"; +import * as os from "node:os"; +import * as path from "node:path"; +import { afterAll, describe, expect, it } from "vite-plus/test"; + +const AGGREGATOR = path.resolve(import.meta.dirname, "..", "scripts", "aggregate-results.mjs"); + +const createdDirectories: string[] = []; + +// Writes a per-task slop-report.json under <logs>/<taskId>/verifier/, matching +// the layout the aggregator walks (task id = grandparent dir of the report). +const writeReport = (logsDir: string, taskId: string, report: Record<string, unknown>): void => { + const dir = path.join(logsDir, taskId, "verifier"); + fs.mkdirSync(dir, { recursive: true }); + fs.writeFileSync(path.join(dir, "slop-report.json"), JSON.stringify(report)); +}; + +const makeLogsDir = (): string => { + const dir = fs.mkdtempSync(path.join(os.tmpdir(), "slopbench-agg-")); + createdDirectories.push(dir); + return dir; +}; + +const runAggregator = (logsDir: string, model: string): Record<string, unknown> => { + const outPath = path.join(logsDir, "result.json"); + execFileSync("node", [AGGREGATOR, "--logs", logsDir, "--model", model, "--out", outPath], { + stdio: "ignore", + }); + return JSON.parse(fs.readFileSync(outPath, "utf8")); +}; + +afterAll(() => { + for (const dir of createdDirectories) fs.rmSync(dir, { recursive: true, force: true }); +}); + +describe("aggregate-results", () => { + it("aggregates pass-rate, mean score, mean reward, and per-dimension means", () => { + const logsDir = makeLogsDir(); + writeReport(logsDir, "task-a", { + slopScore: 100, + 
functionalPass: true, + reward: 1, + violations: [], + dimensions: [ + { dimension: "react-correctness", score: 100, violationCount: 0, weightedPenalty: 0 }, + { dimension: "ts-strictness", score: 100, violationCount: 0, weightedPenalty: 0 }, + ], + }); + writeReport(logsDir, "task-b", { + slopScore: 80, + functionalPass: false, + reward: 0, + violations: [{ ruleId: "ts/no-explicit-any" }], + dimensions: [ + { dimension: "react-correctness", score: 100, violationCount: 0, weightedPenalty: 0 }, + { dimension: "ts-strictness", score: 60, violationCount: 1, weightedPenalty: 40 }, + ], + }); + + const result = runAggregator(logsDir, "demo-model"); + + expect(result.model).toBe("demo-model"); + expect(result.taskCount).toBe(2); + expect(result.functionalPassRate).toBe(0.5); + expect(result.meanSlopScore).toBe(90); + expect(result.meanReward).toBe(0.5); + const perDimensionMean = result.perDimensionMean as Record<string, number>; + expect(perDimensionMean["react-correctness"]).toBe(100); + expect(perDimensionMean["ts-strictness"]).toBe(80); + const tasks = result.tasks as Array<{ task: string }>; + expect(tasks.map((task) => task.task)).toEqual(["task-a", "task-b"]); + }); + + it("reports nulls for an empty logs directory", () => { + const result = runAggregator(makeLogsDir(), "empty-model"); + expect(result.taskCount).toBe(0); + expect(result.functionalPassRate).toBe(null); + expect(result.meanSlopScore).toBe(null); + }); +}); diff --git a/packages/benchmark/tests/ast-checks.test.ts b/packages/benchmark/tests/ast-checks.test.ts new file mode 100644 index 000000000..1793210c6 --- /dev/null +++ b/packages/benchmark/tests/ast-checks.test.ts @@ -0,0 +1,127 @@ +import { describe, expect, it } from "vite-plus/test"; +import { AST_CHECKS } from "../src/checks/index.js"; +import { deslopNestedTernary } from "../src/checks/deslop-nested-ternary.js"; +import { tsBanTsComment } from "../src/checks/ts-ban-ts-comment.js"; +import { tsNoExplicitAny } from 
"../src/checks/ts-no-explicit-any.js"; +import { tsNoNonNullAssertion } from "../src/checks/ts-no-non-null-assertion.js"; +import { tsNoTypeAssertion } from "../src/checks/ts-no-type-assertion.js"; +import { vercelBooleanPropSoup } from "../src/checks/vercel-boolean-prop-soup.js"; +import { vercelRenderProp } from "../src/checks/vercel-render-prop.js"; +import { parseSourceText } from "../src/utils/parse-source-file.js"; +import type { AstCheck, ParsedSourceFile } from "../src/types/index.js"; + +const parse = (sourceText: string, filePath = "src/sample.tsx"): ParsedSourceFile => { + const parsed = parseSourceText(filePath, sourceText); + if (!parsed) throw new Error(`fixture failed to parse: ${filePath}`); + return parsed; +}; + +const ruleIdsOf = (check: AstCheck, sourceText: string, filePath?: string): string[] => + check(parse(sourceText, filePath)).map((finding) => finding.ruleId); + +describe("ts-no-explicit-any", () => { + it("flags explicit any annotations", () => { + const ids = ruleIdsOf( + tsNoExplicitAny, + "const value: any = 1;\nfunction f(x: any) { return x; }\n", + ); + expect(ids.filter((id) => id === "ts/no-explicit-any")).toHaveLength(2); + }); + it("ignores well-typed code", () => { + expect(tsNoExplicitAny(parse("const value: number = 1;\n"))).toHaveLength(0); + }); +}); + +describe("ts-no-non-null-assertion", () => { + it("flags the non-null operator", () => { + expect(ruleIdsOf(tsNoNonNullAssertion, "const a = b!.c;\n")).toContain( + "ts/no-non-null-assertion", + ); + }); +}); + +describe("ts-no-type-assertion", () => { + it("flags `as` casts but not `as const`", () => { + const cast = ruleIdsOf(tsNoTypeAssertion, "const a = x as string;\n"); + expect(cast).toContain("ts/no-type-assertion"); + expect(tsNoTypeAssertion(parse("const a = [1, 2] as const;\n"))).toHaveLength(0); + }); +}); + +describe("ts-ban-ts-comment", () => { + it("flags suppression directives as errors", () => { + const findings = tsBanTsComment(parse("// @ts-ignore\nconst a: 
number = 'x' as never;\n")); + expect(findings).toHaveLength(1); + expect(findings[0]?.severity).toBe("error"); + }); + it("ignores ordinary comments", () => { + expect(tsBanTsComment(parse("// a normal note\nconst a = 1;\n"))).toHaveLength(0); + }); +}); + +describe("vercel-boolean-prop-soup", () => { + it("flags a *Props type with many boolean flags", () => { + const source = [ + "interface ButtonProps {", + " isPrimary: boolean;", + " isDisabled: boolean;", + " isLoading: boolean;", + " isRounded: boolean;", + "}", + "", + ].join("\n"); + expect(ruleIdsOf(vercelBooleanPropSoup, source, "src/button.ts")).toContain( + "vercel/architecture-boolean-prop-soup", + ); + }); + it("ignores a props type with only a couple of booleans", () => { + const source = "interface ButtonProps {\n isPrimary: boolean;\n label: string;\n}\n"; + expect(vercelBooleanPropSoup(parse(source, "src/button.ts"))).toHaveLength(0); + }); +}); + +describe("vercel-render-prop", () => { + it("flags function-valued render props", () => { + const source = "interface ListProps {\n renderItem: (value: string) => unknown;\n}\n"; + expect(ruleIdsOf(vercelRenderProp, source, "src/list.ts")).toContain( + "vercel/patterns-render-prop", + ); + }); + it("ignores non-render function props", () => { + const source = "interface ListProps {\n onSelect: (value: string) => void;\n}\n"; + expect(vercelRenderProp(parse(source, "src/list.ts"))).toHaveLength(0); + }); +}); + +describe("deslop-nested-ternary", () => { + it("flags a nested ternary exactly once per chain", () => { + const findings = deslopNestedTernary( + parse("const x = a ? 1 : b ? 2 : c ? 3 : 4;\n", "src/t.ts"), + ); + expect(findings).toHaveLength(1); + expect(findings[0]?.ruleId).toBe("deslop/nested-ternary"); + }); + it("ignores a single ternary", () => { + expect(deslopNestedTernary(parse("const x = a ? 
1 : 2;\n", "src/t.ts"))).toHaveLength(0); + }); +}); + +describe("AST_CHECKS registry", () => { + it("runs every check and aggregates findings on a sloppy file", () => { + const source = [ + "// @ts-nocheck", + "interface WidgetProps { a: boolean; b: boolean; c: boolean; d: boolean }", + "const value: any = (raw as string)!;", + "const label = x ? 'a' : y ? 'b' : 'c';", + "", + ].join("\n"); + const file = parse(source, "src/widget.tsx"); + const ruleIds = AST_CHECKS.flatMap((check) => check(file)).map((finding) => finding.ruleId); + expect(ruleIds).toContain("ts/ban-ts-comment"); + expect(ruleIds).toContain("ts/no-explicit-any"); + expect(ruleIds).toContain("ts/no-type-assertion"); + expect(ruleIds).toContain("ts/no-non-null-assertion"); + expect(ruleIds).toContain("vercel/architecture-boolean-prop-soup"); + expect(ruleIds).toContain("deslop/nested-ternary"); + }); +}); diff --git a/packages/benchmark/tests/run-react-doctor.test.ts b/packages/benchmark/tests/run-react-doctor.test.ts new file mode 100644 index 000000000..af210a8be --- /dev/null +++ b/packages/benchmark/tests/run-react-doctor.test.ts @@ -0,0 +1,98 @@ +import * as fs from "node:fs"; +import * as os from "node:os"; +import * as path from "node:path"; +import { afterAll, describe, expect, it } from "vite-plus/test"; +import { runReactDoctor } from "../src/scanners/run-react-doctor.js"; +import type { ScannerContext } from "../src/types/index.js"; + +const REACT_DOCTOR_BIN = path.resolve( + import.meta.dirname, + "..", + "node_modules", + ".bin", + "react-doctor", +); + +const createdDirectories: string[] = []; + +const makeFixtureProject = (sourceByPath: Record<string, string>): string => { + const rootDirectory = fs.mkdtempSync(path.join(os.tmpdir(), "slopbench-rd-")); + createdDirectories.push(rootDirectory); + fs.writeFileSync( + path.join(rootDirectory, "package.json"), + JSON.stringify({ + name: "slopbench-rd-fixture", + version: "1.0.0", + dependencies: { react: "^18.3.1" }, + }), + ); + 
fs.writeFileSync( + path.join(rootDirectory, "tsconfig.json"), + JSON.stringify({ + compilerOptions: { jsx: "react-jsx", strict: true, moduleResolution: "Bundler" }, + }), + ); + for (const [relativePath, contents] of Object.entries(sourceByPath)) { + const absolutePath = path.join(rootDirectory, relativePath); + fs.mkdirSync(path.dirname(absolutePath), { recursive: true }); + fs.writeFileSync(absolutePath, contents); + } + return rootDirectory; +}; + +const makeContext = (rootDirectory: string, changedFiles: string[]): ScannerContext => ({ + rootDirectory, + changedFiles, + baseRef: "HEAD", + addedLineCount: 20, + reactDoctorBin: REACT_DOCTOR_BIN, +}); + +afterAll(() => { + for (const directory of createdDirectories) + fs.rmSync(directory, { recursive: true, force: true }); +}); + +describe("runReactDoctor", () => { + it("maps a nested-component diagnostic to a react-correctness finding", () => { + const rootDirectory = makeFixtureProject({ + "src/list.tsx": [ + "import React from 'react';", + "export function List({ items }: { items: string[] }) {", + " const Row = () => <li>{items.length}</li>;", + " return <ul>{items.map((value, index) => <li key={index}>{value}</li>)}<Row /></ul>;", + "}", + "", + ].join("\n"), + }); + + const result = runReactDoctor(makeContext(rootDirectory, ["src/list.tsx"])); + + expect(result.error).toBe(null); + expect(result.doctorVersion).toBeTypeOf("string"); + const ruleIds = result.findings.map((finding) => finding.ruleId); + expect(ruleIds.some((ruleId) => ruleId.includes("nested-component"))).toBe(true); + const nested = result.findings.find((finding) => finding.ruleId.includes("nested-component")); + expect(nested?.scanner).toBe("react-doctor"); + expect(nested?.dimension).toBe("react-correctness"); + }); + + it("excludes diagnostics in files the agent did not change", () => { + const rootDirectory = makeFixtureProject({ + "src/touched.tsx": "export const value: number = 1;\n", + "src/untouched.tsx": [ + "import React from 
'react';", + "export function Widget({ items }: { items: string[] }) {", + " const Inner = () => <span>{items.length}</span>;", + " return <Inner />;", + "}", + "", + ].join("\n"), + }); + + const result = runReactDoctor(makeContext(rootDirectory, ["src/touched.tsx"])); + + expect(result.error).toBe(null); + expect(result.findings.every((finding) => finding.filePath === "src/touched.tsx")).toBe(true); + }); +}); diff --git a/packages/benchmark/tests/run-slop-verifier.test.ts b/packages/benchmark/tests/run-slop-verifier.test.ts new file mode 100644 index 000000000..ca1d19193 --- /dev/null +++ b/packages/benchmark/tests/run-slop-verifier.test.ts @@ -0,0 +1,131 @@ +import { execFileSync } from "node:child_process"; +import * as fs from "node:fs"; +import * as os from "node:os"; +import * as path from "node:path"; +import { afterAll, describe, expect, it } from "vite-plus/test"; +import { runSlopVerifier } from "../src/run-slop-verifier.js"; + +const REACT_DOCTOR_BIN = path.resolve( + import.meta.dirname, + "..", + "node_modules", + ".bin", + "react-doctor", +); + +const createdDirectories: string[] = []; + +const git = (cwd: string, args: string[]): void => { + execFileSync("git", args, { cwd, stdio: "ignore" }); +}; + +// Create a git repo whose base commit holds `baseFiles`, then overlay +// `headFiles` as the agent's (uncommitted) working-tree change. Returns the +// root and the base commit sha. 
+const makeGitFixture = ( + baseFiles: Record<string, string>, + headFiles: Record<string, string>, +): { rootDirectory: string; baseRef: string } => { + const rootDirectory = fs.mkdtempSync(path.join(os.tmpdir(), "slopbench-e2e-")); + createdDirectories.push(rootDirectory); + const write = (files: Record<string, string>): void => { + for (const [relativePath, contents] of Object.entries(files)) { + const absolutePath = path.join(rootDirectory, relativePath); + fs.mkdirSync(path.dirname(absolutePath), { recursive: true }); + fs.writeFileSync(absolutePath, contents); + } + }; + git(rootDirectory, ["init", "-q"]); + git(rootDirectory, ["config", "user.email", "t@t.co"]); + git(rootDirectory, ["config", "user.name", "t"]); + write(baseFiles); + git(rootDirectory, ["add", "-A"]); + git(rootDirectory, ["commit", "-qm", "base"]); + const baseRef = execFileSync("git", ["rev-parse", "HEAD"], { cwd: rootDirectory }) + .toString() + .trim(); + write(headFiles); + return { rootDirectory, baseRef }; +}; + +const PACKAGE_JSON = JSON.stringify({ + name: "slopbench-e2e", + version: "1.0.0", + dependencies: { react: "^18.3.1" }, +}); + +afterAll(() => { + for (const directory of createdDirectories) + fs.rmSync(directory, { recursive: true, force: true }); +}); + +describe("runSlopVerifier", () => { + it("scores a clean feature near 100 and a sloppy one well below it", () => { + const cleanFixture = makeGitFixture( + { "package.json": PACKAGE_JSON, "src/base.ts": "export const a = 1;\n" }, + { + "src/clean.tsx": [ + "import React from 'react';", + "interface RowProps { label: string }", + "export const Row = ({ label }: RowProps) => <li>{label}</li>;", + "export const List = ({ labels }: { labels: string[] }) => (", + " <ul>{labels.map((label) => <Row key={label} label={label} />)}</ul>", + ");", + "", + ].join("\n"), + }, + ); + const clean = runSlopVerifier({ + rootDirectory: cleanFixture.rootDirectory, + baseRef: cleanFixture.baseRef, + reactDoctorBin: REACT_DOCTOR_BIN, + 
functionalPass: true, + }); + + const sloppyFixture = makeGitFixture( + { "package.json": PACKAGE_JSON, "src/base.ts": "export const a = 1;\n" }, + { + "src/sloppy.tsx": [ + "import React from 'react';", + "// @ts-ignore", + "export function Card({ items }: { items: any[] }) {", + " const Row = () => <li>{(items[0] as string)!}</li>;", + " return <ul>{items.map((value, index) => <li key={index}>{value}</li>)}<Row /></ul>;", + "}", + "", + ].join("\n"), + }, + ); + const sloppy = runSlopVerifier({ + rootDirectory: sloppyFixture.rootDirectory, + baseRef: sloppyFixture.baseRef, + reactDoctorBin: REACT_DOCTOR_BIN, + functionalPass: true, + }); + + expect(clean.scannerErrors).toEqual([]); + expect(sloppy.scannerErrors).toEqual([]); + expect(clean.slopScore).toBeGreaterThan(sloppy.slopScore); + expect(sloppy.slopScore).toBeLessThan(95); + // Findings come from more than one scanner on the sloppy diff. + expect(new Set(sloppy.violations.map((violation) => violation.scanner)).size).toBeGreaterThan( + 1, + ); + }); + + it("gates the reward on the functional outcome", () => { + const fixture = makeGitFixture( + { "package.json": PACKAGE_JSON, "src/base.ts": "export const a = 1;\n" }, + { "src/feature.ts": "export const value: any = 1;\n" }, + ); + const failed = runSlopVerifier({ + rootDirectory: fixture.rootDirectory, + baseRef: fixture.baseRef, + reactDoctorBin: REACT_DOCTOR_BIN, + functionalPass: false, + }); + expect(failed.reward).toBe(0); + expect(failed.functionalPass).toBe(false); + expect(failed.slopScore).toBeGreaterThan(0); + }); +}); diff --git a/packages/benchmark/tests/slop-score.test.ts b/packages/benchmark/tests/slop-score.test.ts new file mode 100644 index 000000000..589be87eb --- /dev/null +++ b/packages/benchmark/tests/slop-score.test.ts @@ -0,0 +1,96 @@ +import * as path from "node:path"; +import { describe, expect, it } from "vite-plus/test"; +import { DEFAULT_SCORING_PROFILE } from "../src/constants.js"; +import { computeSlopScore } from 
"../src/scoring/slop-score.js"; +import { loadScoringProfile } from "../src/scoring/load-scoring-profile.js"; +import type { ScanFinding } from "../src/types/index.js"; + +const DEFAULT_PROFILE_PATH = path.resolve( + import.meta.dirname, + "..", + "scoring-profiles", + "default.json", +); + +const finding = (overrides: Partial<ScanFinding>): ScanFinding => ({ + scanner: "react-doctor", + dimension: "react-correctness", + ruleId: "react-doctor/no-nested-component-definition", + severity: "error", + filePath: "src/x.tsx", + line: 1, + message: "slop", + category: "Bugs", + ...overrides, +}); + +describe("computeSlopScore", () => { + it("scores a clean diff at a perfect 100", () => { + const result = computeSlopScore([], 60, DEFAULT_SCORING_PROFILE); + expect(result.slopScore).toBe(100); + expect(result.violations).toHaveLength(0); + expect(result.dimensions.every((dimension) => dimension.score === 100)).toBe(true); + }); + + it("is deterministic across runs", () => { + const findings = [ + finding({}), + finding({ severity: "warning", category: "Performance", dimension: "react-performance" }), + ]; + const first = computeSlopScore(findings, 60, DEFAULT_SCORING_PROFILE); + const second = computeSlopScore(findings, 60, DEFAULT_SCORING_PROFILE); + expect(first.slopScore).toBe(second.slopScore); + }); + + it("penalizes a security error more than a maintainability warning", () => { + const security = computeSlopScore( + [finding({ category: "Security", dimension: "react-correctness", severity: "error" })], + 60, + DEFAULT_SCORING_PROFILE, + ); + const maintainability = computeSlopScore( + [finding({ category: "Maintainability", dimension: "maintainability", severity: "warning" })], + 60, + DEFAULT_SCORING_PROFILE, + ); + expect(security.slopScore).toBeLessThan(maintainability.slopScore); + }); + + it("punishes the same violation harder in a tiny diff than a large one", () => { + const single = [finding({})]; + const tiny = computeSlopScore(single, 10, 
DEFAULT_SCORING_PROFILE); + const large = computeSlopScore(single, 400, DEFAULT_SCORING_PROFILE); + expect(tiny.slopScore).toBeLessThan(large.slopScore); + }); + + it("floors a very sloppy diff at zero rather than going negative", () => { + const manyErrors = Array.from({ length: 40 }, () => + finding({ category: "Security", severity: "error" }), + ); + const result = computeSlopScore(manyErrors, 30, DEFAULT_SCORING_PROFILE); + const correctness = result.dimensions.find((d) => d.dimension === "react-correctness"); + expect(correctness?.score).toBe(0); + expect(result.slopScore).toBeGreaterThanOrEqual(0); + }); + + it("keeps a moderately clean feature in a healthy band", () => { + const findings = [ + finding({ severity: "warning", category: "Performance", dimension: "react-performance" }), + finding({ severity: "warning", category: "Maintainability", dimension: "maintainability" }), + ]; + const result = computeSlopScore(findings, 80, DEFAULT_SCORING_PROFILE); + expect(result.slopScore).toBeGreaterThan(90); + expect(result.slopScore).toBeLessThan(100); + }); +}); + +describe("loadScoringProfile", () => { + it("returns the built-in default when no path is given", () => { + expect(loadScoringProfile()).toBe(DEFAULT_SCORING_PROFILE); + }); + + it("default.json mirrors the built-in profile (no drift)", () => { + const fromDisk = loadScoringProfile(DEFAULT_PROFILE_PATH); + expect(fromDisk).toStrictEqual(DEFAULT_SCORING_PROFILE); + }); +}); diff --git a/packages/benchmark/tsconfig.json b/packages/benchmark/tsconfig.json new file mode 100644 index 000000000..9ef507c84 --- /dev/null +++ b/packages/benchmark/tsconfig.json @@ -0,0 +1,8 @@ +{ + "extends": "../../tsconfig.json", + "compilerOptions": { + "noEmit": true, + "types": ["node"] + }, + "include": ["src", "tests"] +} diff --git a/packages/benchmark/vite.config.ts b/packages/benchmark/vite.config.ts new file mode 100644 index 000000000..c3778715b --- /dev/null +++ b/packages/benchmark/vite.config.ts @@ -0,0 +1,15 @@ 
+import { defineConfig } from "vite-plus"; + +// Scope test discovery to this package's own unit tests. Without this, vitest +// also picks up the per-task hidden-test fixtures under `tasks/**/_authoring/` +// (which import seed-relative paths and only run inside a task sandbox). +export default defineConfig({ + test: { + include: ["tests/**/*.test.{ts,tsx}"], + exclude: ["tasks/**", "dist/**", "node_modules/**"], + // Several tests spawn the real React Doctor CLI (a few seconds each) plus + // git/diff work, which exceeds vitest's 5s default on slower CI runners. + testTimeout: 120_000, + hookTimeout: 120_000, + }, +}); diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 43ec9515b..7cb60d413 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -59,6 +59,22 @@ importers: specifier: ^25.6.0 version: 25.6.0 + packages/benchmark: + dependencies: + '@react-doctor/core': + specifier: workspace:* + version: link:../core + oxc-parser: + specifier: ^0.132.0 + version: 0.132.0 + devDependencies: + '@types/node': + specifier: ^25.6.0 + version: 25.6.0 + react-doctor: + specifier: workspace:* + version: link:../react-doctor + packages/core: dependencies: '@effect/platform-node-shared': @@ -872,89 +888,105 @@ packages: resolution: {integrity: sha512-excjX8DfsIcJ10x1Kzr4RcWe1edC9PquDRRPx3YVCvQv+U5p7Yin2s32ftzikXojb1PIFc/9Mt28/y+iRklkrw==} cpu: [arm64] os: [linux] + libc: [glibc] '@img/sharp-libvips-linux-arm@1.2.4': resolution: {integrity: sha512-bFI7xcKFELdiNCVov8e44Ia4u2byA+l3XtsAj+Q8tfCwO6BQ8iDojYdvoPMqsKDkuoOo+X6HZA0s0q11ANMQ8A==} cpu: [arm] os: [linux] + libc: [glibc] '@img/sharp-libvips-linux-ppc64@1.2.4': resolution: {integrity: sha512-FMuvGijLDYG6lW+b/UvyilUWu5Ayu+3r2d1S8notiGCIyYU/76eig1UfMmkZ7vwgOrzKzlQbFSuQfgm7GYUPpA==} cpu: [ppc64] os: [linux] + libc: [glibc] '@img/sharp-libvips-linux-riscv64@1.2.4': resolution: {integrity: sha512-oVDbcR4zUC0ce82teubSm+x6ETixtKZBh/qbREIOcI3cULzDyb18Sr/Wcyx7NRQeQzOiHTNbZFF1UwPS2scyGA==} cpu: [riscv64] os: [linux] + libc: [glibc] 
'@img/sharp-libvips-linux-s390x@1.2.4': resolution: {integrity: sha512-qmp9VrzgPgMoGZyPvrQHqk02uyjA0/QrTO26Tqk6l4ZV0MPWIW6LTkqOIov+J1yEu7MbFQaDpwdwJKhbJvuRxQ==} cpu: [s390x] os: [linux] + libc: [glibc] '@img/sharp-libvips-linux-x64@1.2.4': resolution: {integrity: sha512-tJxiiLsmHc9Ax1bz3oaOYBURTXGIRDODBqhveVHonrHJ9/+k89qbLl0bcJns+e4t4rvaNBxaEZsFtSfAdquPrw==} cpu: [x64] os: [linux] + libc: [glibc] '@img/sharp-libvips-linuxmusl-arm64@1.2.4': resolution: {integrity: sha512-FVQHuwx1IIuNow9QAbYUzJ+En8KcVm9Lk5+uGUQJHaZmMECZmOlix9HnH7n1TRkXMS0pGxIJokIVB9SuqZGGXw==} cpu: [arm64] os: [linux] + libc: [musl] '@img/sharp-libvips-linuxmusl-x64@1.2.4': resolution: {integrity: sha512-+LpyBk7L44ZIXwz/VYfglaX/okxezESc6UxDSoyo2Ks6Jxc4Y7sGjpgU9s4PMgqgjj1gZCylTieNamqA1MF7Dg==} cpu: [x64] os: [linux] + libc: [musl] '@img/sharp-linux-arm64@0.34.5': resolution: {integrity: sha512-bKQzaJRY/bkPOXyKx5EVup7qkaojECG6NLYswgktOZjaXecSAeCWiZwwiFf3/Y+O1HrauiE3FVsGxFg8c24rZg==} engines: {node: ^18.17.0 || ^20.3.0 || >=21.0.0} cpu: [arm64] os: [linux] + libc: [glibc] '@img/sharp-linux-arm@0.34.5': resolution: {integrity: sha512-9dLqsvwtg1uuXBGZKsxem9595+ujv0sJ6Vi8wcTANSFpwV/GONat5eCkzQo/1O6zRIkh0m/8+5BjrRr7jDUSZw==} engines: {node: ^18.17.0 || ^20.3.0 || >=21.0.0} cpu: [arm] os: [linux] + libc: [glibc] '@img/sharp-linux-ppc64@0.34.5': resolution: {integrity: sha512-7zznwNaqW6YtsfrGGDA6BRkISKAAE1Jo0QdpNYXNMHu2+0dTrPflTLNkpc8l7MUP5M16ZJcUvysVWWrMefZquA==} engines: {node: ^18.17.0 || ^20.3.0 || >=21.0.0} cpu: [ppc64] os: [linux] + libc: [glibc] '@img/sharp-linux-riscv64@0.34.5': resolution: {integrity: sha512-51gJuLPTKa7piYPaVs8GmByo7/U7/7TZOq+cnXJIHZKavIRHAP77e3N2HEl3dgiqdD/w0yUfiJnII77PuDDFdw==} engines: {node: ^18.17.0 || ^20.3.0 || >=21.0.0} cpu: [riscv64] os: [linux] + libc: [glibc] '@img/sharp-linux-s390x@0.34.5': resolution: {integrity: sha512-nQtCk0PdKfho3eC5MrbQoigJ2gd1CgddUMkabUj+rBevs8tZ2cULOx46E7oyX+04WGfABgIwmMC0VqieTiR4jg==} engines: {node: ^18.17.0 || ^20.3.0 || >=21.0.0} cpu: [s390x] 
os: [linux] + libc: [glibc] '@img/sharp-linux-x64@0.34.5': resolution: {integrity: sha512-MEzd8HPKxVxVenwAa+JRPwEC7QFjoPWuS5NZnBt6B3pu7EG2Ge0id1oLHZpPJdn3OQK+BQDiw9zStiHBTJQQQQ==} engines: {node: ^18.17.0 || ^20.3.0 || >=21.0.0} cpu: [x64] os: [linux] + libc: [glibc] '@img/sharp-linuxmusl-arm64@0.34.5': resolution: {integrity: sha512-fprJR6GtRsMt6Kyfq44IsChVZeGN97gTD331weR1ex1c1rypDEABN6Tm2xa1wE6lYb5DdEnk03NZPqA7Id21yg==} engines: {node: ^18.17.0 || ^20.3.0 || >=21.0.0} cpu: [arm64] os: [linux] + libc: [musl] '@img/sharp-linuxmusl-x64@0.34.5': resolution: {integrity: sha512-Jg8wNT1MUzIvhBFxViqrEhWDGzqymo3sV7z7ZsaWbZNDLXRJZoRGrjulp60YYtV4wfY8VIKcWidjojlLcWrd8Q==} engines: {node: ^18.17.0 || ^20.3.0 || >=21.0.0} cpu: [x64] os: [linux] + libc: [musl] '@img/sharp-wasm32@0.34.5': resolution: {integrity: sha512-OdWTEiVkY2PHwqkbBI8frFxQQFekHaSSkUIJkwzclWZe64O1X4UlUjqqqLaPbUpMOQk6FBu/HtlGXNblIs0huw==} @@ -1069,24 +1101,28 @@ packages: engines: {node: '>= 10'} cpu: [arm64] os: [linux] + libc: [glibc] '@next/swc-linux-arm64-musl@16.2.4': resolution: {integrity: sha512-iVMMp14514u7Nup2umQS03nT/bN9HurK8ufylC3FZNykrwjtx7V1A7+4kvhbDSCeonTVqV3Txnv0Lu+m2oDXNg==} engines: {node: '>= 10'} cpu: [arm64] os: [linux] + libc: [musl] '@next/swc-linux-x64-gnu@16.2.4': resolution: {integrity: sha512-EZOvm1aQWgnI/N/xcWOlnS3RQBk0VtVav5Zo7n4p0A7UKyTDx047k8opDbXgBpHl4CulRqRfbw3QrX2w5UOXMQ==} engines: {node: '>= 10'} cpu: [x64] os: [linux] + libc: [glibc] '@next/swc-linux-x64-musl@16.2.4': resolution: {integrity: sha512-h9FxsngCm9cTBf71AR4fGznDEDx1hS7+kSEiIRjq5kO1oXWm07DxVGZjCvk0SGx7TSjlUqhI8oOyz7NfwAdPoA==} engines: {node: '>= 10'} cpu: [x64] os: [linux] + libc: [musl] '@next/swc-win32-arm64-msvc@16.2.4': resolution: {integrity: sha512-3NdJV5OXMSOeJYijX+bjaLge3mJBlh4ybydbT4GFoB/2hAojWHtMhl3CYlYoMrjPuodp0nzFVi4Tj2+WaMg+Ow==} @@ -1195,48 +1231,56 @@ packages: engines: {node: ^20.19.0 || >=22.12.0} cpu: [arm64] os: [linux] + libc: [glibc] '@oxc-parser/binding-linux-arm64-musl@0.132.0': resolution: 
{integrity: sha512-WozHg3Kc//8Sk756HXXgMbEAvqtG+Lzb9JOojwQzIGDtN78Az2dLttkb71akWYUF/8IgYfDSlfKh4Uot8is5Vw==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [arm64] os: [linux] + libc: [musl] '@oxc-parser/binding-linux-ppc64-gnu@0.132.0': resolution: {integrity: sha512-CmX/ulNBOEwWTyVRmcpYKAcAizW6+OjtLJgo7fXoL9OqQvjF4VER8tPomv44vwzfSCy1BHbsB0ZlZYzYJNj4cA==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [ppc64] os: [linux] + libc: [glibc] '@oxc-parser/binding-linux-riscv64-gnu@0.132.0': resolution: {integrity: sha512-j9oQS+hM90SdhviNGWbPgT4+Rlq+ac++q/zjgwPD1mVHgxHzATvoRGtDx0sXGmFOQ9J9YkwAhYGb5MAHL6TAsA==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [riscv64] os: [linux] + libc: [glibc] '@oxc-parser/binding-linux-riscv64-musl@0.132.0': resolution: {integrity: sha512-bLz+Xi+Agnfmd7kWPEsSVwCn2k4EyIalZkNBcQ0OGIv9rqn8VgCPLNd03tM9mKX/5TdlvDXalz0q71BIrOPNqg==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [riscv64] os: [linux] + libc: [musl] '@oxc-parser/binding-linux-s390x-gnu@0.132.0': resolution: {integrity: sha512-U6t2qbJU0ypTfyj9QV3W1Y6mITDTL8ai/OR6NUn85vyHthOvobKWgXzU4tu0EskSzlpuVFz1g0jFGulDIUKHxQ==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [s390x] os: [linux] + libc: [glibc] '@oxc-parser/binding-linux-x64-gnu@0.132.0': resolution: {integrity: sha512-WcEaSNHFk8yz5YFlQQAlhq6jOFmZBB/RKE7uzhyCIf+pF1Lmv9gUH4221mle2Gd9iHyWT3ySNph8yZgb1xYdWg==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [x64] os: [linux] + libc: [glibc] '@oxc-parser/binding-linux-x64-musl@0.132.0': resolution: {integrity: sha512-iQrV4iJzQgRwK3BWRmQl1C3C6g3wYpXN2WLdQdyR+efoUnncdShZAVp9OgcojtlD3MDRbuOMGG3SjxF4fL4nlQ==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [x64] os: [linux] + libc: [musl] '@oxc-parser/binding-openharmony-arm64@0.132.0': resolution: {integrity: sha512-FWzmUGrZ6GUby4U7WIwcCtab6tdmlTO3xTRRKyb5kjIJVEiaUAT8animUG/nK8ZCA8gkRkPOTId4rl6uTqUmJQ==} @@ -1316,41 +1360,49 @@ packages: resolution: {integrity: 
sha512-heV2+jmXyYnUrpUXSPugqWDRpnsQcDm2AX4wzTuvgdlZfoNYO0O3W2AVpJYaDn9AG4JdM6Kxom8+foE7/BcSig==} cpu: [arm64] os: [linux] + libc: [glibc] '@oxc-resolver/binding-linux-arm64-musl@11.19.1': resolution: {integrity: sha512-jvo2Pjs1c9KPxMuMPIeQsgu0mOJF9rEb3y3TdpsrqwxRM+AN6/nDDwv45n5ZrUnQMsdBy5gIabioMKnQfWo9ew==} cpu: [arm64] os: [linux] + libc: [musl] '@oxc-resolver/binding-linux-ppc64-gnu@11.19.1': resolution: {integrity: sha512-vLmdNxWCdN7Uo5suays6A/+ywBby2PWBBPXctWPg5V0+eVuzsJxgAn6MMB4mPlshskYbppjpN2Zg83ArHze9gQ==} cpu: [ppc64] os: [linux] + libc: [glibc] '@oxc-resolver/binding-linux-riscv64-gnu@11.19.1': resolution: {integrity: sha512-/b+WgR+VTSBxzgOhDO7TlMXC1ufPIMR6Vj1zN+/x+MnyXGW7prTLzU9eW85Aj7Th7CCEG9ArCbTeqxCzFWdg2w==} cpu: [riscv64] os: [linux] + libc: [glibc] '@oxc-resolver/binding-linux-riscv64-musl@11.19.1': resolution: {integrity: sha512-YlRdeWb9j42p29ROh+h4eg/OQ3dTJlpHSa+84pUM9+p6i3djtPz1q55yLJhgW9XfDch7FN1pQ/Vd6YP+xfRIuw==} cpu: [riscv64] os: [linux] + libc: [musl] '@oxc-resolver/binding-linux-s390x-gnu@11.19.1': resolution: {integrity: sha512-EDpafVOQWF8/MJynsjOGFThcqhRHy417sRyLfQmeiamJ8qVhSKAn2Dn2VVKUGCjVB9C46VGjhNo7nOPUi1x6uA==} cpu: [s390x] os: [linux] + libc: [glibc] '@oxc-resolver/binding-linux-x64-gnu@11.19.1': resolution: {integrity: sha512-NxjZe+rqWhr+RT8/Ik+5ptA3oz7tUw361Wa5RWQXKnfqwSSHdHyrw6IdcTfYuml9dM856AlKWZIUXDmA9kkiBQ==} cpu: [x64] os: [linux] + libc: [glibc] '@oxc-resolver/binding-linux-x64-musl@11.19.1': resolution: {integrity: sha512-cM/hQwsO3ReJg5kR+SpI69DMfvNCp+A/eVR4b4YClE5bVZwz8rh2Nh05InhwI5HR/9cArbEkzMjcKgTHS6UaNw==} cpu: [x64] os: [linux] + libc: [musl] '@oxc-resolver/binding-openharmony-arm64@11.19.1': resolution: {integrity: sha512-QF080IowFB0+9Rh6RcD19bdgh49BpQHUW5TajG1qvWHvmrQznTZZjYlgE2ltLXyKY+qs4F/v5xuX1XS7Is+3qA==} @@ -1424,48 +1476,56 @@ packages: engines: {node: ^20.19.0 || >=22.12.0} cpu: [arm64] os: [linux] + libc: [glibc] '@oxfmt/binding-linux-arm64-musl@0.46.0': resolution: {integrity: 
sha512-aAUPBWJ1lGwwnxZUEDLJ94+Iy6MuwJwPxUgO4sCA5mEEyDk7b+cDQ+JpX1VR150Zoyd+D49gsrUzpUK5h587Eg==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [arm64] os: [linux] + libc: [musl] '@oxfmt/binding-linux-ppc64-gnu@0.46.0': resolution: {integrity: sha512-ufBCJukyFX/UDrokP/r6BGDoTInnsDs7bxyzKAgMiZlt2Qu8GPJSJ6Zm6whIiJzKk0naxA8ilwmbO1LMw6Htxw==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [ppc64] os: [linux] + libc: [glibc] '@oxfmt/binding-linux-riscv64-gnu@0.46.0': resolution: {integrity: sha512-eqtlC2YmPqjun76R1gVfGLuKWx7NuEnLEAudZ7n6ipSKbCZTqIKSs1b5Y8K/JHZsRpLkeSmAAjig5HOIg8fQzQ==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [riscv64] os: [linux] + libc: [glibc] '@oxfmt/binding-linux-riscv64-musl@0.46.0': resolution: {integrity: sha512-yccVOO2nMXkQLGgy0He3EQEwKD7NF0zEk+/OWmroznkqXyJdN6bfK0LtNnr6/14Bh3FjpYq7bP33l/VloCnxpA==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [riscv64] os: [linux] + libc: [musl] '@oxfmt/binding-linux-s390x-gnu@0.46.0': resolution: {integrity: sha512-aAf7fG23OQCey6VRPj9IeCraoYtpgtx0ZyJ1CXkPyT1wjzBE7c3xtuxHe/AdHaJfVVb/SXpSk8Gl1LzyQupSqw==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [s390x] os: [linux] + libc: [glibc] '@oxfmt/binding-linux-x64-gnu@0.46.0': resolution: {integrity: sha512-q0JPsTMyJNjYrBvYFDz4WbVsafNZaPCZv4RnFypRotLqpKROtBZcEaXQW4eb9YmvLU3NckVemLJnzkSZSdmOxw==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [x64] os: [linux] + libc: [glibc] '@oxfmt/binding-linux-x64-musl@0.46.0': resolution: {integrity: sha512-7LsLY9Cw57GPkhSR+duI3mt9baRczK/DtHYSldQ4BEU92da9igBQNl4z7Vq5U9NNPsh1FmpKvv1q9WDtiUQR1A==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [x64] os: [linux] + libc: [musl] '@oxfmt/binding-openharmony-arm64@0.46.0': resolution: {integrity: sha512-lHiBOz8Duaku7JtRNLlps3j++eOaICPZSd8FCVmTDM4DFOPT71Bjn7g6iar1z7StXlKRweUKxWUs4sA+zWGDXg==} @@ -1568,48 +1628,56 @@ packages: engines: {node: ^20.19.0 || >=22.12.0} cpu: [arm64] os: [linux] + libc: [glibc] '@oxlint/binding-linux-arm64-musl@1.66.0': resolution: {integrity: 
sha512-hmo+ZB/lHkR1HdDmnziNpzSLmulnUSu10VEqX2Yex7OwvoBAbjJQLvy4gIBRV3AAwWnCvAxKp5Nv1GE6LU1QMg==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [arm64] os: [linux] + libc: [musl] '@oxlint/binding-linux-ppc64-gnu@1.66.0': resolution: {integrity: sha512-2Invd4Uyy81mVooQC5FBtfxSNrvcX1OxbMlVQ6M2erRrNI2awFYF26YNW2yFxdVFZ4ffNOWKghtMjhnUPsXsVA==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [ppc64] os: [linux] + libc: [glibc] '@oxlint/binding-linux-riscv64-gnu@1.66.0': resolution: {integrity: sha512-s0iXPDQVdgayE3RGa/N2DZF7tjgg0TwEtD1sGoDxqPDGrIXgo45H0yHknT0f9A0yteASsweYZtDyTuVlM4aSag==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [riscv64] os: [linux] + libc: [glibc] '@oxlint/binding-linux-riscv64-musl@1.66.0': resolution: {integrity: sha512-OekL4XFiu7RPK0JIZi8VeHgtIXPREf42t8Cy/rKEsC+P3gcqDgNAAGiyuUOpdbG4wwbfue1q4CHcCO7spSve6w==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [riscv64] os: [linux] + libc: [musl] '@oxlint/binding-linux-s390x-gnu@1.66.0': resolution: {integrity: sha512-Ga1D0kj1SFslm34ThA/BdkUlyAYEnTsXyRC4pF0C5agZSwtGdHYWMTQWemUfBGp4RCG4QWXgdO+HmmmKqOtlBg==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [s390x] os: [linux] + libc: [glibc] '@oxlint/binding-linux-x64-gnu@1.66.0': resolution: {integrity: sha512-p5jfP1wUZe/IC3qpQO84n9DRnf9g3lKRtLBlQq23ykyrDglHcVx7sWmVTlPuU6SBw8mNnPzyOn022G3XZHnlww==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [x64] os: [linux] + libc: [glibc] '@oxlint/binding-linux-x64-musl@1.66.0': resolution: {integrity: sha512-vUB/sYlYZorDL1ZD+o9mRv7zbsykrrFRtmgS6R8musZqLtrPRQn1gc1eGpuX+sfdccz42STl/AqldY6XRb2upQ==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [x64] os: [linux] + libc: [musl] '@oxlint/binding-openharmony-arm64@1.66.0': resolution: {integrity: sha512-yde+6p/F59xRkGR9H1HfngWRif1QRJjynZK349l+UI0H6w9hL3G8/AVaTHFyTtLVQ56qtNbX2/5Dc77n1ovnOg==} @@ -1677,66 +1745,79 @@ packages: resolution: {integrity: sha512-F8sWbhZ7tyuEfsmOxwc2giKDQzN3+kuBLPwwZGyVkLlKGdV1nvnNwYD0fKQ8+XS6hp9nY7B+ZeK01EBUE7aHaw==} cpu: [arm] os: [linux] + libc: [glibc] 
'@rollup/rollup-linux-arm-musleabihf@4.57.1': resolution: {integrity: sha512-rGfNUfn0GIeXtBP1wL5MnzSj98+PZe/AXaGBCRmT0ts80lU5CATYGxXukeTX39XBKsxzFpEeK+Mrp9faXOlmrw==} cpu: [arm] os: [linux] + libc: [musl] '@rollup/rollup-linux-arm64-gnu@4.57.1': resolution: {integrity: sha512-MMtej3YHWeg/0klK2Qodf3yrNzz6CGjo2UntLvk2RSPlhzgLvYEB3frRvbEF2wRKh1Z2fDIg9KRPe1fawv7C+g==} cpu: [arm64] os: [linux] + libc: [glibc] '@rollup/rollup-linux-arm64-musl@4.57.1': resolution: {integrity: sha512-1a/qhaaOXhqXGpMFMET9VqwZakkljWHLmZOX48R0I/YLbhdxr1m4gtG1Hq7++VhVUmf+L3sTAf9op4JlhQ5u1Q==} cpu: [arm64] os: [linux] + libc: [musl] '@rollup/rollup-linux-loong64-gnu@4.57.1': resolution: {integrity: sha512-QWO6RQTZ/cqYtJMtxhkRkidoNGXc7ERPbZN7dVW5SdURuLeVU7lwKMpo18XdcmpWYd0qsP1bwKPf7DNSUinhvA==} cpu: [loong64] os: [linux] + libc: [glibc] '@rollup/rollup-linux-loong64-musl@4.57.1': resolution: {integrity: sha512-xpObYIf+8gprgWaPP32xiN5RVTi/s5FCR+XMXSKmhfoJjrpRAjCuuqQXyxUa/eJTdAE6eJ+KDKaoEqjZQxh3Gw==} cpu: [loong64] os: [linux] + libc: [musl] '@rollup/rollup-linux-ppc64-gnu@4.57.1': resolution: {integrity: sha512-4BrCgrpZo4hvzMDKRqEaW1zeecScDCR+2nZ86ATLhAoJ5FQ+lbHVD3ttKe74/c7tNT9c6F2viwB3ufwp01Oh2w==} cpu: [ppc64] os: [linux] + libc: [glibc] '@rollup/rollup-linux-ppc64-musl@4.57.1': resolution: {integrity: sha512-NOlUuzesGauESAyEYFSe3QTUguL+lvrN1HtwEEsU2rOwdUDeTMJdO5dUYl/2hKf9jWydJrO9OL/XSSf65R5+Xw==} cpu: [ppc64] os: [linux] + libc: [musl] '@rollup/rollup-linux-riscv64-gnu@4.57.1': resolution: {integrity: sha512-ptA88htVp0AwUUqhVghwDIKlvJMD/fmL/wrQj99PRHFRAG6Z5nbWoWG4o81Nt9FT+IuqUQi+L31ZKAFeJ5Is+A==} cpu: [riscv64] os: [linux] + libc: [glibc] '@rollup/rollup-linux-riscv64-musl@4.57.1': resolution: {integrity: sha512-S51t7aMMTNdmAMPpBg7OOsTdn4tySRQvklmL3RpDRyknk87+Sp3xaumlatU+ppQ+5raY7sSTcC2beGgvhENfuw==} cpu: [riscv64] os: [linux] + libc: [musl] '@rollup/rollup-linux-s390x-gnu@4.57.1': resolution: {integrity: 
sha512-Bl00OFnVFkL82FHbEqy3k5CUCKH6OEJL54KCyx2oqsmZnFTR8IoNqBF+mjQVcRCT5sB6yOvK8A37LNm/kPJiZg==} cpu: [s390x] os: [linux] + libc: [glibc] '@rollup/rollup-linux-x64-gnu@4.57.1': resolution: {integrity: sha512-ABca4ceT4N+Tv/GtotnWAeXZUZuM/9AQyCyKYyKnpk4yoA7QIAuBt6Hkgpw8kActYlew2mvckXkvx0FfoInnLg==} cpu: [x64] os: [linux] + libc: [glibc] '@rollup/rollup-linux-x64-musl@4.57.1': resolution: {integrity: sha512-HFps0JeGtuOR2convgRRkHCekD7j+gdAuXM+/i6kGzQtFhlCtQkpwtNzkNj6QhCDp7DRJ7+qC/1Vg2jt5iSOFw==} cpu: [x64] os: [linux] + libc: [musl] '@rollup/rollup-openbsd-x64@4.57.1': resolution: {integrity: sha512-H+hXEv9gdVQuDTgnqD+SQffoWoc0Of59AStSzTEj/feWTBAnSfSD3+Dql1ZruJQxmykT/JVY0dE8Ka7z0DH1hw==} @@ -1908,24 +1989,28 @@ packages: engines: {node: '>= 10'} cpu: [arm64] os: [linux] + libc: [glibc] '@tailwindcss/oxide-linux-arm64-musl@4.1.18': resolution: {integrity: sha512-1px92582HkPQlaaCkdRcio71p8bc8i/ap5807tPRDK/uw953cauQBT8c5tVGkOwrHMfc2Yh6UuxaH4vtTjGvHg==} engines: {node: '>= 10'} cpu: [arm64] os: [linux] + libc: [musl] '@tailwindcss/oxide-linux-x64-gnu@4.1.18': resolution: {integrity: sha512-v3gyT0ivkfBLoZGF9LyHmts0Isc8jHZyVcbzio6Wpzifg/+5ZJpDiRiUhDLkcr7f/r38SWNe7ucxmGW3j3Kb/g==} engines: {node: '>= 10'} cpu: [x64] os: [linux] + libc: [glibc] '@tailwindcss/oxide-linux-x64-musl@4.1.18': resolution: {integrity: sha512-bhJ2y2OQNlcRwwgOAGMY0xTFStt4/wyU6pvI6LSuZpRgKQwxTec0/3Scu91O8ir7qCR3AuepQKLU/kX99FouqQ==} engines: {node: '>= 10'} cpu: [x64] os: [linux] + libc: [musl] '@tailwindcss/oxide-wasm32-wasi@4.1.18': resolution: {integrity: sha512-LffYTvPjODiP6PT16oNeUQJzNVyJl1cjIebq/rWWBF+3eDst5JGEFSc5cWxyRCJ0Mxl+KyIkqRxk1XPEs9x8TA==} @@ -2174,24 +2259,28 @@ packages: engines: {node: ^20.19.0 || >=22.12.0} cpu: [arm64] os: [linux] + libc: [glibc] '@voidzero-dev/vite-plus-linux-arm64-musl@0.1.20': resolution: {integrity: sha512-Oh/pxMdTLR/wsDl/OONjItjLOeTewFBLuKkH5RQmcI9g3AVqKzLj1/uawujgysBI5E25tonRRK7I2q/zu8Uqvg==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [arm64] os: [linux] + 
libc: [musl] '@voidzero-dev/vite-plus-linux-x64-gnu@0.1.20': resolution: {integrity: sha512-msO1ZoUX5aSK8L6kN1C3XQO4CcH9aFsNPRSNcO1cjk1kTnaLyVYzkVxgvbh3vk7nzZAAMkmyZ4SlMpqJrdahrg==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [x64] os: [linux] + libc: [glibc] '@voidzero-dev/vite-plus-linux-x64-musl@0.1.20': resolution: {integrity: sha512-U93urREvg23ZFDkxKkkfWWIOI4GI9erhbWAZpXG+GeYqygWKrVC6PUTXiuexVg3/CFg2sSMTdm1W6V7TFG5hYA==} engines: {node: ^20.19.0 || >=22.12.0} cpu: [x64] os: [linux] + libc: [musl] '@voidzero-dev/vite-plus-test@0.1.20': resolution: {integrity: sha512-vy2dJYw1bhgQ/+BrQrfwPlSKzQ2mm3YLJ9kGF7Yo0UJ2P3XKpshtgFIWLjSg/IASnC93OAx0c/7j3NM0I1RMuA==} @@ -2844,24 +2933,28 @@ packages: engines: {node: '>= 12.0.0'} cpu: [arm64] os: [linux] + libc: [glibc] lightningcss-linux-arm64-musl@1.30.2: resolution: {integrity: sha512-5Vh9dGeblpTxWHpOx8iauV02popZDsCYMPIgiuw97OJ5uaDsL86cnqSFs5LZkG3ghHoX5isLgWzMs+eD1YzrnA==} engines: {node: '>= 12.0.0'} cpu: [arm64] os: [linux] + libc: [musl] lightningcss-linux-x64-gnu@1.30.2: resolution: {integrity: sha512-Cfd46gdmj1vQ+lR6VRTTadNHu6ALuw2pKR9lYq4FnhvgBc4zWY1EtZcAc6EffShbb1MFrIPfLDXD6Xprbnni4w==} engines: {node: '>= 12.0.0'} cpu: [x64] os: [linux] + libc: [glibc] lightningcss-linux-x64-musl@1.30.2: resolution: {integrity: sha512-XJaLUUFXb6/QG2lGIW6aIk6jKdtjtcffUT0NKvIqhSBY3hh9Ch+1LCeH80dR9q9LBjG3ewbDjnumefsLsP6aiA==} engines: {node: '>= 12.0.0'} cpu: [x64] os: [linux] + libc: [musl] lightningcss-win32-arm64-msvc@1.30.2: resolution: {integrity: sha512-FZn+vaj7zLv//D/192WFFVA0RgHawIcHqLX9xuWiQt7P0PtdFEVaxgF9rjM/IRYHQXNnk61/H/gb2Ei+kUQ4xQ==}