refactor: Unify return types and casing #19

Merged
adnanrhussain merged 6 commits into ahussain/sdk_typescript from ahussain/hardening
Mar 12, 2026
@adnanrhussain adnanrhussain commented Mar 10, 2026

Please review this PR commit by commit. It primarily refactors the evals to share a consistent result type and ensures the metadata is accurate.

  1. 3a0b96c235505ca92ac324269bf05c741b7f82d0

refactor(types): unify TextComplexityLevel and tighten evaluator return types

  • Rename ComplexityLevel → TextComplexityLevel; all evaluators now share a single snake_case enum (slightly/moderately/very/exceedingly_complex)
  • Remove VocabularyComplexityLevel (lowercase spaced) in favour of unified enum; complexity_score now uses snake_case values
  • Add COMPLEXITY_LEVEL_LABELS map for display strings (e.g. 'Slightly complex')
  • Rename internal types: VocabularyComplexity → VocabularyInternal, GradeLevelAppropriateness → GradeLevelAppropriatenessInternal
  • Move SentenceStructureInternal to its schema file and export it
  • Tighten EvaluationResult TScore: string → TextComplexityLevel (vocab, SS) and string → GradeBand (GLA)
  • Redesign TextComplexityEvaluator to return TextComplexityResult map instead of a nested EvaluationResult wrapper; each key holds the sub-evaluator result or { error: Error } directly
  • Remove buildCombinedReasoning; callers access per-evaluator reasoning directly from the result map
  • Make runSubEvaluator generic to preserve TScore/TInternal types through the p-limit boundary
  • normalizeLabel normalises LLM output variations to canonical snake_case values; returns TextComplexityLevel | null (no cast needed)
  • Update unit and integration tests for new enum values and result shape
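
As an illustration of the unified enum, label map, and normalizeLabel bullets above, here is a minimal sketch. The four value names and the 'Slightly complex' label come from the commit message; the remaining labels, the isTextComplexityLevel helper, and the exact normalisation rules (trim, lowercase, spaces or hyphens to underscores) are assumptions and may differ from the SDK.

```typescript
// Hypothetical reconstruction; the real declarations in the SDK may differ.
const TEXT_COMPLEXITY_LEVELS = [
  'slightly_complex',
  'moderately_complex',
  'very_complex',
  'exceedingly_complex',
] as const;

type TextComplexityLevel = (typeof TEXT_COMPLEXITY_LEVELS)[number];

// Display strings keyed by canonical value. Only 'Slightly complex' is
// quoted in the PR; the other three labels are assumed by analogy.
const COMPLEXITY_LEVEL_LABELS: Record<TextComplexityLevel, string> = {
  slightly_complex: 'Slightly complex',
  moderately_complex: 'Moderately complex',
  very_complex: 'Very complex',
  exceedingly_complex: 'Exceedingly complex',
};

// Type guard (hypothetical helper) so normalizeLabel needs no cast.
function isTextComplexityLevel(value: string): value is TextComplexityLevel {
  return (TEXT_COMPLEXITY_LEVELS as readonly string[]).indexOf(value) !== -1;
}

// Assumed normalisation: trim, lowercase, and turn runs of whitespace or
// hyphens into underscores. Unknown labels yield null.
function normalizeLabel(raw: string): TextComplexityLevel | null {
  const candidate = raw.trim().toLowerCase().replace(/[\s-]+/g, '_');
  return isTextComplexityLevel(candidate) ? candidate : null;
}
```

Under these assumptions, a caller could map raw LLM output such as 'Slightly Complex' to the canonical value and then look up its display string in COMPLEXITY_LEVEL_LABELS.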
  1. 14b91d1c0c08c52e781acd5fab7545156bccf752

  2. Removed unused exports and metadata, like promptVersion and timestamp

  3. Fixed the telemetry endpoint
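
The result-map redesign and the generic runSubEvaluator from the first commit could look roughly like the sketch below. The field names on EvaluationResult, the map keys, the Limiter type, and the passThrough stand-in (used here in place of a real p-limit limiter so the sketch has no dependencies) are all assumptions for illustration.

```typescript
// Hypothetical shapes; names beyond those in the PR description are assumed.
type TextComplexityLevel =
  | 'slightly_complex'
  | 'moderately_complex'
  | 'very_complex'
  | 'exceedingly_complex';

interface EvaluationResult<TScore, TInternal> {
  score: TScore;
  reasoning: string;
  internal: TInternal;
}

type SubEvaluatorError = { error: Error };
type SubResult<TScore, TInternal> =
  | EvaluationResult<TScore, TInternal>
  | SubEvaluatorError;

// Each key holds the sub-evaluator result or { error } directly.
// The key names here are illustrative only.
type TextComplexityResult = {
  vocabulary: SubResult<TextComplexityLevel, unknown>;
  sentenceStructure: SubResult<TextComplexityLevel, unknown>;
};

// Same call shape as a concurrency limiter: takes a thunk, returns its promise.
type Limiter = <T>(fn: () => Promise<T>) => Promise<T>;

// Stand-in limiter with no concurrency control.
const passThrough: Limiter = (fn) => fn();

// Generic wrapper: TScore and TInternal survive the limiter boundary,
// and any thrown value is converted to { error } instead of rejecting.
async function runSubEvaluator<TScore, TInternal>(
  limit: Limiter,
  evaluate: () => Promise<EvaluationResult<TScore, TInternal>>,
): Promise<SubResult<TScore, TInternal>> {
  try {
    return await limit(evaluate);
  } catch (err) {
    return { error: err instanceof Error ? err : new Error(String(err)) };
  }
}

// Narrowing helper for callers reading the result map.
function isSubEvaluatorError<TScore, TInternal>(
  result: SubResult<TScore, TInternal>,
): result is SubEvaluatorError {
  return 'error' in result;
}
```

With a shape like this, callers would check each key with isSubEvaluatorError and read reasoning straight from the successful entries, which is what makes a combined-reasoning helper unnecessary.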

codecov bot commented Mar 10, 2026

@czi-fsisenda czi-fsisenda left a comment

Looks good!
Just one comment that is unrelated to the primary change in this PR so you can address it here or later.

  reasoning: response.data.reasoning,
  metadata: {
-   promptVersion: '1.2.0',
+   promptVersion: '1.0.0',

I think updates to the prompt should result in us creating a new prompt file. The convention could be [eval_id]_[eval_ver]/user.txt or [eval_id]/user_[eval_ver].txt.
Updating the prompt version without updating the eval version misrepresents the evaluator, and updating both seems redundant. Having a separate prompt version just introduces the opportunity to mismatch evaluator and prompt versions.
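
To make the two proposed conventions concrete, here is a small sketch that derives the prompt file path from the evaluator id and version alone, so there is no separate prompt version to drift. The promptPath function, the PromptLayout type, and the example id are hypothetical and not part of the SDK.

```typescript
// Hypothetical helper illustrating the two layouts suggested above.
type PromptLayout = 'versioned_dir' | 'versioned_file';

function promptPath(
  evalId: string,
  evalVersion: string,
  layout: PromptLayout,
): string {
  return layout === 'versioned_dir'
    ? `${evalId}_${evalVersion}/user.txt` // [eval_id]_[eval_ver]/user.txt
    : `${evalId}/user_${evalVersion}.txt`; // [eval_id]/user_[eval_ver].txt
}

// Example with a made-up evaluator id and version.
console.log(promptPath('vocabulary', '1.2.0', 'versioned_dir'));
console.log(promptPath('vocabulary', '1.2.0', 'versioned_file'));
```

Either layout ties the prompt to exactly one evaluator version; the choice mainly affects whether other per-version assets would sit alongside user.txt in the same directory.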

@adnanrhussain adnanrhussain merged commit f21a704 into ahussain/sdk_typescript Mar 12, 2026
9 checks passed
@adnanrhussain adnanrhussain deleted the ahussain/hardening branch March 12, 2026 04:58