v1 by mattpocock · Pull Request #240 · mattpocock/evalite

mattpocock · 2025-10-19T11:45:37Z

No description provided.

changeset-bot · 2025-10-19T11:45:41Z

🦋 Changeset detected

Latest commit: 647e587

The changes in this PR will be included in the next version bump.

vercel · 2025-10-19T11:45:41Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
evalite	Ignored		Mar 27, 2026 10:54am

pkg-pr-new · 2025-10-19T11:46:34Z

npm i https://pkg.pr.new/evalite@240

commit: 647e587

mattpocock · 2025-10-19T11:55:29Z

@claude Remove the BACKEND_ONLY_CONSTANT import that's causing the error in CI

claude · 2025-10-19T11:55:47Z

Claude finished @mattpocock's task

Task completed

- Check CI status to understand the error
- Read modified files to locate the problematic import
- Remove the BACKEND_ONLY_CONSTANT imports
- Commit and push the fix

Fixed the CI error:

The error was caused by imports from evalite/backend-only-constants that referenced exports that no longer exist:

- test-utils.ts: Removed unused DB_LOCATION import
- files.test.ts: Replaced FILES_LOCATION import with a local constant "./node_modules/.evalite/files"

The changes have been committed and pushed to the v1 branch. CI should now pass!

Job Run | Branch: v1

- Add dotenv as a dependency
- Create env-setup-file module that imports dotenv/config
- Export env-setup-file as 'evalite/env-setup-file'
- Automatically prepend env-setup-file to setupFiles array
- Update documentation to reflect automatic .env loading
- Update example config to remove manual dotenv setup

Fixes #234

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>

… precedence

- Add loadVitestSetupFiles() to load setupFiles from vitest.config.ts
- Merge setupFiles from both configs with evalite.config.ts taking precedence
- Add tests for vitest.config.ts setupFiles support and precedence
- setupFiles execution order: env-setup-file -> vitest -> evalite

Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
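
For readers skimming the commit, the execution order described above can be sketched roughly as follows. This is only an illustration: `mergeSetupFiles` is a hypothetical helper name, and the deduplication step is an assumption, not something the commit message states.

```typescript
// Hypothetical sketch of the documented order: env-setup-file -> vitest -> evalite.
// The real evalite implementation may differ.
const ENV_SETUP_FILE = "evalite/env-setup-file";

function mergeSetupFiles(
  vitestSetupFiles: readonly string[],
  evaliteSetupFiles: readonly string[]
): string[] {
  const ordered = [ENV_SETUP_FILE, ...vitestSetupFiles, ...evaliteSetupFiles];
  // Assumption: keep only the first occurrence of each entry so no file runs twice.
  return ordered.filter((file, index) => ordered.indexOf(file) === index);
}

// Example: files from evalite.config.ts run last, so they can override earlier setup.
const merged = mergeSetupFiles(["./vitest.setup.ts"], ["./evalite.setup.ts"]);
```

Running the evalite entries last is one plausible reading of "taking precedence": whatever they set up is applied after the vitest entries.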

The exported static UI requires a static file server due to absolute asset paths, ES module CORS restrictions, and fetch calls that fail from file://. Updated docs to clarify a static file server is needed.

Files changed:
- apps/evalite-docs/src/content/docs/tips/run-evals-on-ci-cd.mdx
- apps/evalite-docs/src/content/docs/api/cli.mdx

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of throwing on unsupported content types (reasoning, file, etc.), handlePromptContent now returns null and processPromptForTracing filters out nulls. This fixes crashes when using AI SDK with thinking models that include reasoning parts in assistant messages.

Key decisions:
- Return null + filter instead of silently converting to avoid data loss
- Generic fix covers all unsupported types, not just reasoning

Files changed:
- packages/evalite/src/ai-sdk.ts (handlePromptContent + processPromptForTracing)
- packages/evalite-tests/tests/ai-sdk-reasoning.test.ts (new test)
- packages/evalite-tests/tests/fixtures/ai-sdk-reasoning/reasoning.eval.ts (new fixture)
- .changeset/0000-handle-reasoning-content.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
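
A minimal sketch of the "return null + filter" pattern described in this commit. The function names come from the commit message, but the types and signatures below are assumptions made for illustration and are not the actual code in packages/evalite/src/ai-sdk.ts.

```typescript
// Assumed, simplified shapes; the real AI SDK prompt parts are richer.
type PromptPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "file"; data: string };

type TracedPart = { type: "text"; text: string };

// Returns null for unsupported content types instead of throwing.
function handlePromptContent(part: PromptPart): TracedPart | null {
  if (part.type === "text") {
    return { type: "text", text: part.text };
  }
  return null; // reasoning, file, and any other unsupported type
}

// Maps every part, then drops the nulls with a type guard.
function processPromptForTracing(parts: PromptPart[]): TracedPart[] {
  return parts
    .map((part) => handlePromptContent(part))
    .filter((part): part is TracedPart => part !== null);
}

const traced = processPromptForTracing([
  { type: "reasoning", text: "thinking..." },
  { type: "text", text: "Hello" },
]);
```

Because the null check is generic, a new unsupported part type would also be skipped without a code change, which matches the "covers all unsupported types" decision.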

… - Issue #386

Added @typescript/native-preview (tsgo v7.0) to root dependencies and updated all typecheck scripts to use tsgo instead of tsc. Build scripts still use tsc for emit.

Key decisions:
- tsgo used for typecheck only; tsc retained for build/emit in packages/evalite
- Removed deprecated baseUrl option from evalite-ui tsconfig (tsgo 7.0 requires paths-only)
- Used caret range for @typescript/native-preview to track latest 7.x dev releases

Files changed:
- package.json: added @typescript/native-preview dependency
- pnpm-lock.yaml: updated lockfile
- packages/evalite/package.json: typecheck script tsc -> tsgo
- packages/evalite-tests/package.json: typecheck script tsc --noEmit -> tsgo --noEmit
- apps/evalite-ui/package.json: typecheck script tsc --noEmit -> tsgo --noEmit
- apps/evalite-ui/tsconfig.json: removed deprecated baseUrl option

All typecheck and tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PRD: Issue #370 - evalite export does not fail if threshold is not reached

Key decisions:
- Added scoreThreshold parameter to exportCommand
- Threshold check runs after export completes (works for both auto-run and pre-existing data)
- Passes scoreThreshold to runEvalite during auto-run for console output
- Reuses same threshold semantics as run command (0-100 scale, exit code 1 if below)

Files changed:
- packages/evalite/src/export-static.ts: Added scoreThreshold param and post-export threshold check
- packages/evalite/src/command.ts: Added --threshold CLI flag to export command
- packages/evalite-tests/tests/export-static.test.ts: Added 3 tests for threshold behavior
- .changeset/0000-export-threshold.md: Added changeset

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
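
The threshold semantics named above (0-100 scale, exit code 1 if below) could be expressed along these lines. `exitCodeForThreshold` is a hypothetical helper, and the assumption that the score is already on the 0-100 scale is mine; the actual check in export-static.ts is not shown in this PR page.

```typescript
// Hypothetical sketch of a post-export threshold check.
// Assumes scorePercent is already on the same 0-100 scale as the threshold.
function exitCodeForThreshold(
  scorePercent: number,
  scoreThreshold?: number
): 0 | 1 {
  // No threshold configured: the export never fails on score.
  if (scoreThreshold === undefined) {
    return 0;
  }
  // Strictly below the threshold fails; meeting it exactly passes.
  return scorePercent < scoreThreshold ? 1 : 0;
}

const failing = exitCodeForThreshold(79, 80);
const passing = exitCodeForThreshold(80, 80);
```

A caller would typically pass the result to `process.exit` once the export has finished writing, so the exported files exist even when the check fails.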

… Issue #386

Key decisions:
- Replaced tsc with tsgo in evalite build and dev scripts (tsgo supports emit and watch)
- Removed typescript from root dependencies and resolutions override
- Moved typescript to devDependencies (still needed as peer dep for typescript-eslint)
- Removed typescript from evalite-ui devDependencies (not needed by Vite or tsgo)

Files changed:
- package.json: removed typescript from dependencies/resolutions, added to devDependencies
- packages/evalite/package.json: build/dev scripts now use tsgo instead of tsc
- apps/evalite-ui/package.json: removed typescript devDependency
- pnpm-lock.yaml: updated lockfile

All tests (110), typecheck, and lint pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PRD: Issue #223 - Eval duration not being tracked properly

Key decisions:
- Suite duration = wall-clock time from creation to completion (Date.now() - created_at)
- Added optional `duration` field to Suites.UpdateOpts type
- SQLite uses COALESCE to only update duration when provided
- In-memory storage preserves existing duration when not provided (was hardcoded to 0)

Files changed:
- packages/evalite/src/types.ts: Added optional duration to UpdateOpts
- packages/evalite/src/reporter/EvaliteRunner.ts: Compute and pass duration on suite completion
- packages/evalite/src/storage/sqlite.ts: Handle duration in updateSuiteStatus
- packages/evalite/src/storage/in-memory.ts: Use opts.duration instead of hardcoded 0
- packages/evalite-tests/tests/basics.test.ts: Unskipped duration test
- .changeset/0000-eval-duration-tracking.md: Added changeset

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
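
To make the duration decisions concrete, here is a rough sketch of the in-memory side. The `Suite` and `UpdateOpts` shapes, the helper names, and the assumption that `created_at` is an ISO 8601 string are all illustrative; the real types live in packages/evalite/src/types.ts.

```typescript
// Assumed, simplified shapes for illustration only.
interface Suite {
  created_at: string; // assumed to be an ISO 8601 timestamp
  status: string;
  duration: number; // milliseconds
}

interface UpdateOpts {
  status: string;
  duration?: number;
}

// Wall-clock duration from creation to completion, in milliseconds.
function computeSuiteDuration(createdAt: string, now: number = Date.now()): number {
  return now - new Date(createdAt).getTime();
}

// Mirrors the COALESCE idea: only overwrite duration when one is provided.
function updateSuite(suite: Suite, opts: UpdateOpts): Suite {
  return {
    ...suite,
    status: opts.status,
    duration: opts.duration ?? suite.duration,
  };
}

const created = "2026-03-27T10:00:00.000Z";
const finishedAt = Date.parse("2026-03-27T10:00:01.500Z");
const elapsed = computeSuiteDuration(created, finishedAt);
```

The `??` fallback plays the same role as `COALESCE(?, duration)` would in the SQLite update: a status-only update leaves a previously stored duration untouched.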

PRD: Issue #354 - Evalite Creates Cache Directory Regardless of cacheEnabled Setting

Key decisions:
- Moved mkdir call from before config loading to after cacheEnabled is resolved
- Guard mkdir with if (cacheEnabled) so directory is only created when needed
- FILES_LOCATION (node_modules/.evalite/files/) is only for caching, so skip when disabled

Files changed:
- packages/evalite/src/run-evalite.ts: Moved and guarded mkdir call
- packages/evalite-tests/tests/cache-dir.test.ts: New test verifying cache dir behavior
- packages/evalite-tests/tests/fixtures/issue-354/issue-354.eval.ts: Minimal test fixture
- .changeset/0000-cache-dir-guard.md: Patch changeset

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
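
The guard described above amounts to something like the sketch below. `ensureCacheDir` and the injected `mkdir` parameter are hypothetical, introduced only so the example is self-contained; the actual change in run-evalite.ts presumably calls a filesystem mkdir directly.

```typescript
// Hypothetical sketch: create the cache directory only when caching is enabled.
type MkdirFn = (path: string, options: { recursive: boolean }) => void;

function ensureCacheDir(
  cacheEnabled: boolean,
  filesLocation: string,
  mkdir: MkdirFn
): boolean {
  // Must run after config is loaded, so cacheEnabled is already resolved.
  if (!cacheEnabled) {
    return false; // nothing created
  }
  mkdir(filesLocation, { recursive: true });
  return true;
}

// Example with a recording stub in place of a real filesystem call.
const createdDirs: string[] = [];
const recordMkdir: MkdirFn = (path) => {
  createdDirs.push(path);
};

const skipped = ensureCacheDir(false, "node_modules/.evalite/files", recordMkdir);
const made = ensureCacheDir(true, "node_modules/.evalite/files", recordMkdir);
```

Injecting the mkdir function is purely a convenience for this example; it keeps the guard testable without touching the disk.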

…ation tracking

- sandcastle/issue-386-upgrade-to-tsgo: Replaced tsc with tsgo (@typescript/native-preview) for type checking and build
- sandcastle/issue-354-cache-dir-regardless-of-config: Only create cache directory when cacheEnabled is true (fixes #354)
- sandcastle/issue-223-eval-duration-tracking: Fix suite duration tracking so eval durations are stored correctly (fixes #223)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mattpocock mentioned this pull request Oct 19, 2025

feat: Support .env files by default via dotenv/config #243

Closed

mattpocock added this to the v1 milestone Oct 20, 2025

This was referenced Oct 21, 2025

Build a library of scorers #250

Closed

Move from React Markdown to Streamdown #256

Closed

mattpocock force-pushed the v1 branch from 9423bf2 to df9484b Compare October 23, 2025 09:32

mattpocock force-pushed the v1 branch 3 times, most recently from c69a19d to 4ae1080 Compare November 6, 2025 14:48

mattpocock mentioned this pull request Nov 8, 2025

Add 'copy' button to each page of the docs #307

Open

mattpocock force-pushed the v1 branch from a5f098c to 8c4667c Compare November 8, 2025 13:51

This was referenced Nov 9, 2025

Remove implicit reading of vitest.config.ts/vite.config.ts files #303

Closed

Add failing test for issue #95 (vitest workspace conflict) #245

Closed

Ability to specify which config files are read (--config) #296

Closed

mattpocock force-pushed the v1 branch from 9b843a9 to 08e62c2 Compare November 9, 2025 17:19

mattpocock and others added 11 commits November 10, 2025 17:39

Changed default storage to in-memory. SQLite still available via config.

93113dc

Remove problematic backend-only-constants imports

82ef941

- Remove unused DB_LOCATION import from test-utils.ts
- Replace FILES_LOCATION import with local constant in files.test.ts

Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>

Fixed CI properly

e586e61

Huge move from evals -> suites, and results -> evals

fca9086

Added changeset

172f5e1

Removed streaming text support from tasks.

460a77e

Fixes after cherrypick

8d8ec99

Formatting

926e1b8

Docs updates

f8a928c

mattpocock and others added 30 commits March 26, 2026 20:39

Update

159e54a

Update SKILL.md to refine issue fetching command and add feature categorization guidelines; create OUT_OF_SCOPE.md for triage decisions.

fbd6407

Updated the evalite triage skill

52035dc

Add Ubiquitous Language documentation for evaluation lifecycle, functions, data, observability, experimentation, display, and relationships

786f7ea

Update CLAUDE.md to clarify project structure and documentation locations

cf4ff62

Update SKILL.md to improve issue triage documentation and add next steps for prioritization

2cca8b6

Update SKILL.md to clarify issue labeling definitions for improved triage process

8d65092

Add environment variable configuration, update planning and merging prompts, and remove review prompt

3fbd554

Update Dockerfile and prompts to use pnpm for testing

6931051

RALPH: Update implementation guidelines to clarify that code should not be pushed to remote before task completion

6f09225

Update sandbox hooks to include build step after installation

49f7e6e

Merge branch 'sandcastle/issue-386-upgrade-tsgo' into sandcastle/claude-code/20260327-102352

98abad7

Merge branch 'sandcastle/issue-375-fix-standalone-docs' into sandcastle/claude-code/20260327-102352

7958d78

Merge branch 'sandcastle/issue-370-export-threshold' into sandcastle/claude-code/20260327-102352

0bf3002

Merge branch 'sandcastle/issue-361-reasoning-content-type' into sandcastle/claude-code/20260327-102352

83dcb96

Update sandbox hooks command and clarify merge prompt instructions

23f9071

Update Dockerfile to install pnpm globally instead of enabling corepack

da96418

Remove copyToSandbox hook from main orchestration loop

bf30fc0

Merge branch 'sandcastle/issue-354-cache-dir-regardless-of-config' into sandcastle/claude-code/20260327-104811

a0ef387

Merge branch 'sandcastle/issue-223-eval-duration-tracking' into sandcastle/claude-code/20260327-104811

15d951a

Updated the implement prompt

92c46f3

Bump version

647e587

v1 #240
mattpocock wants to merge 190 commits into main from v1

4 participants
