# feat: Implement core evaluator files and Vocab implementation #12
# @learning-commons/evaluators

TypeScript SDK for Learning Commons educational text complexity evaluators.

## Installation

```bash
npm install @learning-commons/evaluators ai
```

The SDK uses the [Vercel AI SDK](https://sdk.vercel.ai) (`ai`) as its LLM interface. You also need to install the provider adapter(s) for the LLM(s) you use:

```bash
npm install @ai-sdk/openai     # for OpenAI
npm install @ai-sdk/google     # for Google Gemini
npm install @ai-sdk/anthropic  # for Anthropic
```

## Quick Start

```typescript
import { VocabularyEvaluator } from '@learning-commons/evaluators';

const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY,
  openaiApiKey: process.env.OPENAI_API_KEY
});

const result = await evaluator.evaluate("Your text here", "5");
console.log(result.score); // "moderately complex"
```

---

## Evaluators

### 1. Vocabulary Evaluator

Evaluates vocabulary complexity using the Qualitative Text Complexity rubric (SAP).

**Supported Grades:** 3-12

**Uses:** Google Gemini 2.5 Pro + OpenAI GPT-4o

**Constructor:**
```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: string;    // Required - Google API key
  openaiApiKey: string;    // Required - OpenAI API key
  maxRetries?: number;     // Optional - Max retry attempts (default: 2)
  telemetry?: boolean | TelemetryOptions; // Optional (default: true)
  logger?: Logger;         // Optional - Custom logger
  logLevel?: LogLevel;     // Optional - SILENT | ERROR | WARN | INFO | DEBUG (default: WARN)
});
```

**API:**
```typescript
await evaluator.evaluate(text: string, grade: string)
```

**Returns:**
```typescript
{
  score: 'slightly complex' | 'moderately complex' | 'very complex' | 'exceedingly complex';
  reasoning: string;
  metadata: {
    promptVersion: string;
    model: string;
    timestamp: Date;
    processingTimeMs: number;
  };
  _internal: VocabularyComplexity; // Detailed analysis
}
```
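The `score` field is an ordinal scale with four bands. If you need to sort or aggregate results numerically, a small mapping helper works; this is a sketch, not something the SDK provides:

```typescript
// Hypothetical helper (not part of the SDK): map the four rubric bands
// returned in `result.score` to numeric levels for sorting/aggregation.
const SCORE_LEVELS = {
  'slightly complex': 1,
  'moderately complex': 2,
  'very complex': 3,
  'exceedingly complex': 4,
} as const;

type Score = keyof typeof SCORE_LEVELS;

function scoreLevel(score: Score): number {
  return SCORE_LEVELS[score];
}
```

Using `as const` plus `keyof typeof` keeps the helper in sync with the union type above: passing a string outside the four bands is a compile-time error.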

## Error Handling

The SDK provides specific error types to help you handle different scenarios:

```typescript
import {
  ConfigurationError,
  ValidationError,
  APIError,
  AuthenticationError,
  RateLimitError,
  NetworkError,
  TimeoutError,
} from '@learning-commons/evaluators';

try {
  const evaluator = new VocabularyEvaluator({ googleApiKey, openaiApiKey });
  const result = await evaluator.evaluate(text, grade);
} catch (error) {
  if (error instanceof ConfigurationError) {
    // Missing or invalid API keys — fix your config
    console.error('Configuration error:', error.message);
  } else if (error instanceof ValidationError) {
    // Invalid input (text too short, invalid grade, etc.)
    console.error('Invalid input:', error.message);
  } else if (error instanceof AuthenticationError) {
    // Invalid API keys
    console.error('Check your API keys:', error.message);
  } else if (error instanceof RateLimitError) {
    // Rate limit exceeded - wait and retry
    console.error('Rate limited. Retry after:', error.retryAfter);
  } else if (error instanceof NetworkError) {
    // Network connectivity issues
    console.error('Network error:', error.message);
  } else if (error instanceof APIError) {
    // Other API errors
    console.error('API error:', error.message, 'Status:', error.statusCode);
  }
}
```

---

## Logging

Control logging verbosity with `logLevel`:

```typescript
import { VocabularyEvaluator, LogLevel } from '@learning-commons/evaluators';

const evaluator = new VocabularyEvaluator({
  googleApiKey: '...',
  openaiApiKey: '...',
  logLevel: LogLevel.INFO, // SILENT | ERROR | WARN | INFO | DEBUG
});
```

Or provide a custom logger:

```typescript
import type { Logger } from '@learning-commons/evaluators';

const customLogger: Logger = {
  debug: (msg, ctx) => myLogger.debug(msg, ctx),
  info: (msg, ctx) => myLogger.info(msg, ctx),
  warn: (msg, ctx) => myLogger.warn(msg, ctx),
  error: (msg, ctx) => myLogger.error(msg, ctx),
};

const evaluator = new VocabularyEvaluator({
  googleApiKey: '...',
  openaiApiKey: '...',
  logger: customLogger,
});
```

---

## Telemetry & Privacy

See [docs/telemetry.md](./docs/telemetry.md) for telemetry configuration and privacy information.

---

## Configuration Options

All evaluators support these common options:

```typescript
interface BaseEvaluatorConfig {
  maxRetries?: number;                    // Max API retry attempts (default: 2)
  telemetry?: boolean | TelemetryOptions; // Telemetry config (default: true)
  logger?: Logger;                        // Custom logger (optional)
  logLevel?: LogLevel;                    // Console log level (default: WARN)
  partnerKey?: string;                    // Learning Commons partner key for authenticated telemetry (optional)
}
```

---

## License

MIT

---

# Telemetry

## Why We Collect Telemetry

We use telemetry data to improve evaluator quality, identify edge cases, and optimize performance. This helps us build better tools for our developer partners.

Telemetry is **anonymous by default**. If you'd like to partner with us to improve your specific use case, you can optionally provide an API key (see the Configuration section below). This allows us to connect with you and collaborate more deeply.
> **Review comment:** Do our terms of service cover this data? If so, should we include a link?
## What We Collect

**By default, telemetry is enabled** and sends:
- Performance metrics (latency, token usage)
- Metadata (evaluator type, grade, SDK version)
> **Review comment:** Should we send input size or can we infer that from token usage? So even if we don't get the actual text, we can get the text length.
>
> **Reply:** Actually, I see text length is included.
**Input text is NOT collected by default.** You can opt in via `recordInputs: true` — see [Enable Input Text Collection](#enable-input-text-collection) below.

We **never** collect your API keys (only an anonymous identifier).

If you prefer not to send any telemetry, you can disable it entirely — see [Disable Telemetry Completely](#disable-telemetry-completely) below.

## Example Telemetry Event
```json
{
  "timestamp": "2026-02-05T19:30:00.000Z",
  "sdk_version": "0.1.0",
  "evaluator_type": "vocabulary",
  "grade": "5",
  "status": "success",
  "latency_ms": 3500,
  "text_length_chars": 456,
  "provider": "google:gemini-2.5-pro+openai:gpt-4o",
  "token_usage": {
    "input_tokens": 650,
    "output_tokens": 350
  },
  "metadata": {
    "stage_details": [
      {
        "stage": "background_knowledge",
        "provider": "openai:gpt-4o-2024-11-20",
        "latency_ms": 1200,
        "token_usage": {
          "input_tokens": 250,
          "output_tokens": 150
        }
      },
      {
        "stage": "complexity_evaluation",
        "provider": "google:gemini-2.5-pro",
        "latency_ms": 2300,
        "token_usage": {
          "input_tokens": 400,
          "output_tokens": 200
        }
      }
    ]
  }
}
```

> **Review comment** (on `stage_details`): nit: Phase may be a better name for this. When I hear stage, I think deployment stage.
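In this sample, the top-level `latency_ms` and `token_usage` equal the sums of the per-stage entries (whether that holds in general, e.g. for parallel stages, is not documented). A quick check of the numbers above:

```typescript
// Per-stage numbers copied from the sample event above.
const stages = [
  { latency_ms: 1200, token_usage: { input_tokens: 250, output_tokens: 150 } },
  { latency_ms: 2300, token_usage: { input_tokens: 400, output_tokens: 200 } },
];

// Sum latency and tokens across stages.
const totals = stages.reduce(
  (acc, s) => ({
    latency_ms: acc.latency_ms + s.latency_ms,
    input_tokens: acc.input_tokens + s.token_usage.input_tokens,
    output_tokens: acc.output_tokens + s.token_usage.output_tokens,
  }),
  { latency_ms: 0, input_tokens: 0, output_tokens: 0 },
);
// totals: { latency_ms: 3500, input_tokens: 650, output_tokens: 350 }
// — matching the top-level fields of the sample event.
```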

## Field Reference

| Field | Description |
|-------|-------------|
| `timestamp` | ISO 8601 timestamp when evaluation started |
| `sdk_version` | Version of the SDK (e.g., "0.1.0") |
| `evaluator_type` | Which evaluator ran (e.g., "vocabulary", "sentence-structure") |
| `grade` | Grade level evaluated (e.g., "5", "K") |
| `status` | Evaluation outcome: "success" or "error" |
| `error_code` | Error type if status is "error" (e.g., "Error", "TypeError") |
| `latency_ms` | Total evaluation time in milliseconds |
| `text_length_chars` | Length of input text in characters |
| `provider` | LLM provider(s) used (e.g., "openai:gpt-4o", "google:gemini-2.5-pro+openai:gpt-4o") |
| `token_usage` | Total tokens consumed (input, output, total) |
| `input_text` | The text being evaluated (only included if `recordInputs: true`) |
| `metadata.stage_details` | Per-stage breakdown for multi-stage evaluators (optional) |

> **Review comment** (on `evaluator_type`): If we have multiple versions of the same evaluator in an SDK version, would the eval version be captured in the eval name or should we have an eval version field too?
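For consumers who log or inspect these events, the fields above can be written out as a TypeScript type. This shape is inferred from the table and the sample event; it is not an exported SDK type:

```typescript
// Inferred from the field reference above; not exported by the SDK.
interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
  total_tokens?: number; // "total" is listed in the table but not shown in the sample
}

interface StageDetail {
  stage: string;
  provider: string;
  latency_ms: number;
  token_usage: TokenUsage;
}

interface TelemetryEvent {
  timestamp: string;        // ISO 8601
  sdk_version: string;
  evaluator_type: string;
  grade: string;
  status: 'success' | 'error';
  error_code?: string;      // present when status is "error"
  latency_ms: number;
  text_length_chars: number;
  provider: string;
  token_usage: TokenUsage;
  input_text?: string;      // only when recordInputs: true
  metadata?: { stage_details?: StageDetail[] };
}
```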

## Configuration

### Default (Anonymous)

```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
  // telemetry: true (default - anonymous)
});
```

### Partner with Us (Authenticated)

To help us support your specific use case, provide an API key:

```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
  partnerKey: process.env.LEARNING_COMMONS_PARTNER_KEY!, // Contact us for a key
});
```

### Disable Telemetry Completely

```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
  telemetry: false, // No data sent
});
```

### Enable Input Text Collection

```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
  telemetry: {
    enabled: true,
    recordInputs: true, // Also send input text with telemetry
  },
});
```
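Pulling the settings shown above together, the `TelemetryOptions` object accepts at least these fields. The shape is inferred from the examples on this page; the SDK's exported type is authoritative:

```typescript
// Inferred from the configuration examples above; the SDK exports the
// authoritative TelemetryOptions type.
interface TelemetryOptions {
  enabled?: boolean;      // default: true
  recordInputs?: boolean; // default: false - opt in to sending input text
}

// `telemetry: false` and `telemetry: { enabled: false }` should both
// disable sending (assumption; not stated explicitly in the docs).
const optOut: TelemetryOptions = { enabled: false };
```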