refactor: Unify return types and casing #19
Merged
adnanrhussain merged 6 commits into ahussain/sdk_typescript on Mar 12, 2026
…rn types
- Rename ComplexityLevel → TextComplexityLevel; all evaluators now share a single snake_case enum (slightly/moderately/very/exceedingly_complex)
- Remove VocabularyComplexityLevel (lowercase spaced) in favour of unified enum; complexity_score now uses snake_case values
- Add COMPLEXITY_LEVEL_LABELS map for display strings (e.g. 'Slightly complex')
- Rename internal types: VocabularyComplexity → VocabularyInternal, GradeLevelAppropriateness → GradeLevelAppropriatenessInternal
- Move SentenceStructureInternal to its schema file and export it
- Tighten EvaluationResult TScore: string → TextComplexityLevel (vocab, SS) and string → GradeBand (GLA)
- Redesign TextComplexityEvaluator to return TextComplexityResult map instead of a nested EvaluationResult wrapper; each key holds the sub-evaluator result or { error: Error } directly
- Remove buildCombinedReasoning; callers access per-evaluator reasoning directly from the result map
- Make runSubEvaluator generic to preserve TScore/TInternal types through the p-limit boundary
- normalizeLabel normalises LLM output variations to canonical snake_case values; returns TextComplexityLevel | null (no cast needed)
- Update unit and integration tests for new enum values and result shape
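A minimal sketch of how the unified enum, the label map, and `normalizeLabel` described above could fit together. The names `TextComplexityLevel`, `COMPLEXITY_LEVEL_LABELS`, and `normalizeLabel` come from this description. The helper `isTextComplexityLevel`, the three labels other than 'Slightly complex', and the exact normalisation rules are assumptions for illustration, not the code in this PR.

```typescript
// Canonical snake_case values shared by all evaluators (per the PR description).
const TEXT_COMPLEXITY_LEVELS = [
  'slightly_complex',
  'moderately_complex',
  'very_complex',
  'exceedingly_complex',
] as const;

type TextComplexityLevel = (typeof TEXT_COMPLEXITY_LEVELS)[number];

// Display strings. Only 'Slightly complex' is quoted in the PR; the rest are assumed.
const COMPLEXITY_LEVEL_LABELS: Record<TextComplexityLevel, string> = {
  slightly_complex: 'Slightly complex',
  moderately_complex: 'Moderately complex',
  very_complex: 'Very complex',
  exceedingly_complex: 'Exceedingly complex',
};

// Hypothetical type guard so normalizeLabel needs no cast.
function isTextComplexityLevel(value: string): value is TextComplexityLevel {
  return TEXT_COMPLEXITY_LEVELS.some((level) => level === value);
}

// Assumed normalisation: trim, lowercase, collapse spaces and hyphens to underscores.
function normalizeLabel(raw: string): TextComplexityLevel | null {
  const candidate = raw.trim().toLowerCase().replace(/[\s-]+/g, '_');
  return isTextComplexityLevel(candidate) ? candidate : null;
}
```

Returning `null` for unrecognised output leaves the caller to decide whether to retry or surface an error, instead of silently coercing to a default level.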
czi-fsisenda approved these changes on Mar 11, 2026
Looks good!
Just one comment that is unrelated to the primary change in this PR so you can address it here or later.
    reasoning: response.data.reasoning,
    metadata: {
      promptVersion: '1.2.0',
      promptVersion: '1.0.0',
I think updates to the prompt should result in us creating a new prompt file. The convention could be [eval_id]_[eval_ver]/user.txt or [eval_id]/user_[eval_ver].txt
Updating the prompt version without updating the eval version misrepresents the evaluator. Updating both seems redundant. Having a prompt version seems to just introduce the opportunity to mismatch evaluator and prompt versions.
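To make the two proposed layouts concrete, here is a small sketch that derives the prompt path from the evaluator id and version alone, so there is no separate prompt version to drift. Both helper names are hypothetical and nothing like them is confirmed to exist in the SDK.

```typescript
// Layout 1 from the comment: [eval_id]_[eval_ver]/user.txt
function promptPathByDirectory(evalId: string, evalVersion: string): string {
  return `${evalId}_${evalVersion}/user.txt`;
}

// Layout 2 from the comment: [eval_id]/user_[eval_ver].txt
function promptPathByFile(evalId: string, evalVersion: string): string {
  return `${evalId}/user_${evalVersion}.txt`;
}
```

Either way the evaluator version is the single source of truth for which prompt file is loaded.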
Please review this PR commit by commit. It primarily refactors the evals to share a consistent result type and ensures the metadata is accurate.
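For reviewers, a rough sketch of the result shape this PR moves to: a map whose keys hold either a sub-evaluator result or `{ error: Error }`. The type names `EvaluationResult`, `TextComplexityResult`, `TextComplexityLevel`, `GradeBand`, and `runSubEvaluator` are from the PR. The field names, the map keys, the `SubResult` and `hasError` helpers, and the function body are assumptions, and the p-limit wrapping is left out to keep the sketch self-contained.

```typescript
type TextComplexityLevel =
  | 'slightly_complex'
  | 'moderately_complex'
  | 'very_complex'
  | 'exceedingly_complex';
type GradeBand = string; // placeholder: the real GradeBand values are not shown here

// Assumed fields; only the generic parameters TScore/TInternal are named in the PR.
interface EvaluationResult<TScore, TInternal> {
  score: TScore;
  reasoning: string;
  internal: TInternal;
}

// Hypothetical alias: a sub-evaluator either succeeds or carries its error.
type SubResult<TScore, TInternal> = EvaluationResult<TScore, TInternal> | { error: Error };

// Hypothetical keys, one per sub-evaluator.
interface TextComplexityResult {
  vocabulary: SubResult<TextComplexityLevel, unknown>;
  sentenceStructure: SubResult<TextComplexityLevel, unknown>;
  gradeLevelAppropriateness: SubResult<GradeBand, unknown>;
}

// Generic so TScore/TInternal survive the wrapper instead of widening to string.
async function runSubEvaluator<TScore, TInternal>(
  evaluate: () => Promise<EvaluationResult<TScore, TInternal>>,
): Promise<SubResult<TScore, TInternal>> {
  try {
    return await evaluate();
  } catch (err) {
    return { error: err instanceof Error ? err : new Error(String(err)) };
  }
}

// Narrowing helper so callers can read per-evaluator reasoning directly.
function hasError(result: SubResult<unknown, unknown>): result is { error: Error } {
  return 'error' in result;
}
```

With this shape a caller checks `hasError(result.vocabulary)` and otherwise reads `result.vocabulary.reasoning`, which is why a combined-reasoning helper is no longer needed.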
refactor(types): unify TextComplexityLevel and tighten evaluator return types
14b91d1c0c08c52e781acd5fab7545156bccf752
- Removed unused exports and metadata, like promptVersion and timestamp
- Fixed the telemetry endpoint