refactor: Unify return types and casing by adnanrhussain · Pull Request #19 · learning-commons-org/evaluators

adnanrhussain · 2026-03-10T00:51:16Z

Please review this PR commit by commit. It primarily refactors the evals to share a consistent result type and ensures the metadata is accurate.

3a0b96c235505ca92ac324269bf05c741b7f82d0

refactor(types): unify TextComplexityLevel and tighten evaluator return types

- Rename ComplexityLevel → TextComplexityLevel; all evaluators now share a single snake_case enum (slightly/moderately/very/exceedingly_complex)
- Remove VocabularyComplexityLevel (lowercase spaced) in favour of unified enum; complexity_score now uses snake_case values
- Add COMPLEXITY_LEVEL_LABELS map for display strings (e.g. 'Slightly complex')
- Rename internal types: VocabularyComplexity → VocabularyInternal, GradeLevelAppropriateness → GradeLevelAppropriatenessInternal
- Move SentenceStructureInternal to its schema file and export it
- Tighten EvaluationResult TScore: string → TextComplexityLevel (vocab, SS) and string → GradeBand (GLA)
- Redesign TextComplexityEvaluator to return TextComplexityResult map instead of a nested EvaluationResult wrapper; each key holds the sub-evaluator result or { error: Error } directly
- Remove buildCombinedReasoning; callers access per-evaluator reasoning directly from the result map
- Make runSubEvaluator generic to preserve TScore/TInternal types through the p-limit boundary
- normalizeLabel normalises LLM output variations to canonical snake_case values; returns TextComplexityLevel | null (no cast needed)
- Update unit and integration tests for new enum values and result shape
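
For readers who want a concrete picture of the shapes described in this commit message, here is a minimal sketch. The identifiers `TextComplexityLevel`, `COMPLEXITY_LEVEL_LABELS`, `normalizeLabel`, `EvaluationResult`, `TextComplexityResult` and `runSubEvaluator` are taken from the message; everything else is an assumption made for illustration, including the function bodies, the `score` and `reasoning` field names, the map keys, and the helpers `TEXT_COMPLEXITY_LEVELS`, `SubResult`, `summarize` and `summarizeAll`. It is not the SDK's actual code, and the p-limit concurrency limiter is left out.

```typescript
// Sketch only: enum values come from the commit message, the rest is assumed.
const TEXT_COMPLEXITY_LEVELS = [
  'slightly_complex',
  'moderately_complex',
  'very_complex',
  'exceedingly_complex',
] as const;

type TextComplexityLevel = (typeof TEXT_COMPLEXITY_LEVELS)[number];

// Display strings keyed by the canonical snake_case value.
const COMPLEXITY_LEVEL_LABELS: Record<TextComplexityLevel, string> = {
  slightly_complex: 'Slightly complex',
  moderately_complex: 'Moderately complex',
  very_complex: 'Very complex',
  exceedingly_complex: 'Exceedingly complex',
};

// Assumed normalisation: trim, lowercase, collapse spaces/hyphens to "_".
// Looking the value up in the tuple yields the narrow type without a cast.
function normalizeLabel(raw: string): TextComplexityLevel | null {
  const candidate = raw.trim().toLowerCase().replace(/[\s-]+/g, '_');
  const match = TEXT_COMPLEXITY_LEVELS.find((level) => level === candidate);
  return match ?? null;
}

// Assumed, simplified result shape; the real type likely has more fields.
interface EvaluationResult<TScore> {
  score: TScore;
  reasoning: string;
}

// Each entry is either the sub-evaluator result or { error: Error }.
type SubResult<TScore> = EvaluationResult<TScore> | { error: Error };

// Hypothetical keys: the PR text does not list the keys of the map.
interface TextComplexityResult {
  vocabulary: SubResult<TextComplexityLevel>;
  sentenceStructure: SubResult<TextComplexityLevel>;
}

// Generic wrapper: TScore flows through, so callers keep the narrow type.
async function runSubEvaluator<TScore>(
  run: () => Promise<EvaluationResult<TScore>>,
): Promise<SubResult<TScore>> {
  try {
    return await run();
  } catch (err) {
    return { error: err instanceof Error ? err : new Error(String(err)) };
  }
}

// Callers read per-evaluator reasoning directly from each entry.
function summarize(result: SubResult<TextComplexityLevel>): string {
  return 'error' in result
    ? `failed: ${result.error.message}`
    : `${COMPLEXITY_LEVEL_LABELS[result.score]}: ${result.reasoning}`;
}

function summarizeAll(result: TextComplexityResult): string[] {
  return [summarize(result.vocabulary), summarize(result.sentenceStructure)];
}
```

With a flat map like this, one failing sub-evaluator does not hide the others: a caller can check `'error' in result.vocabulary` and still read `result.sentenceStructure.reasoning` when that entry succeeded.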

14b91d1c0c08c52e781acd5fab7545156bccf752
Removed unused exports and metadata, like promptVersion and timestamp
Fixed the telemetry endpoint


codecov · 2026-03-10T01:00:00Z

Codecov Report

❌ Patch coverage is 80.95238% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...ript/src/evaluators/grade-level-appropriateness.ts	50.00%	1 Missing ⚠️
...ks/typescript/src/evaluators/sentence-structure.ts	83.33%	1 Missing ⚠️
sdks/typescript/src/evaluators/text-complexity.ts	88.88%	1 Missing ⚠️
sdks/typescript/src/evaluators/vocabulary.ts	66.66%	1 Missing ⚠️


czi-fsisenda

Looks good!
Just one comment that is unrelated to the primary change in this PR so you can address it here or later.

czi-fsisenda · 2026-03-11T07:04:11Z

sdks/typescript/src/evaluators/grade-level-appropriateness.ts

        reasoning: response.data.reasoning,
        metadata: {
-          promptVersion: '1.2.0',
+          promptVersion: '1.0.0',


I think updates to the prompt should result in us creating a new prompt file. The convention could be [eval_id]_[eval_ver]/user.txt or [eval_id]/user_[eval_ver].txt
Updating the prompt version without updating the eval version misrepresents the evaluator. Updating both seems redundant. Having a prompt version seems to just introduce the opportunity to mismatch evaluator and prompt versions.
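
To make the two proposed layouts concrete, here is a small sketch of how a prompt path could be derived from the evaluator id and evaluator version alone, so that no separate prompt version exists to drift. The helper `promptPath`, the `PromptLayout` names, and the example id are hypothetical and do not come from the SDK.

```typescript
// Hypothetical helper: neither the function nor the layout names exist in the SDK.
type PromptLayout = 'versioned_directory' | 'versioned_file';

function promptPath(
  evalId: string,
  evalVersion: string,
  layout: PromptLayout,
): string {
  return layout === 'versioned_directory'
    ? `${evalId}_${evalVersion}/user.txt` // [eval_id]_[eval_ver]/user.txt
    : `${evalId}/user_${evalVersion}.txt`; // [eval_id]/user_[eval_ver].txt
}
```

Under either layout, changing a prompt means adding a new file under a new evaluator version, which is the coupling the comment argues for.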

adnanrhussain added 2 commits March 9, 2026 17:27

fix: set correct promptVersion per evaluator based on prompt changelog

14b91d1

adnanrhussain requested a review from czi-fsisenda March 10, 2026 00:51

czi-fsisenda approved these changes Mar 11, 2026


adnanrhussain added 4 commits March 11, 2026 21:29

refactor: remove unused metadata

d465fd5

refactor: remove unused exports

9cee78e

refactor: remove unused metadata

f72c1c4

fix: telemetry endpoint

128724e

adnanrhussain merged commit f21a704 into ahussain/sdk_typescript Mar 12, 2026
9 checks passed

adnanrhussain deleted the ahussain/hardening branch March 12, 2026 04:58

adnanrhussain mentioned this pull request Mar 12, 2026

feat: Release TypeScript SDK 0.1.0 #20

Open

adnanrhussain merged 6 commits into ahussain/sdk_typescript from ahussain/hardening
