Merged
2 changes: 0 additions & 2 deletions evals/prompts/vocabulary/grades-3-4-user.txt
```diff
@@ -10,5 +10,3 @@ Below is the text you need to evaluate. Let's think step by step in order to pre
 - Text to evaluate: [BEGIN TEXT]
 {text}
 [END TEXT]
-
-{format_instructions}
```
2 changes: 0 additions & 2 deletions evals/prompts/vocabulary/other-grades-user.txt
```diff
@@ -135,5 +135,3 @@ As you read the text, you can assume the student has the following background kn
 [END TEXT]
 
 In your response, when specifying the level of complexity, be sure to use only a single integer (e.g. 2) and don't include any other text (e.g. don't say "level 2").
-
-{format_instructions}
```
178 changes: 178 additions & 0 deletions sdks/typescript/README.md
@@ -1 +1,179 @@
# @learning-commons/evaluators

TypeScript SDK for Learning Commons educational text complexity evaluators.

## Installation

```bash
npm install @learning-commons/evaluators ai
```

The SDK uses the [Vercel AI SDK](https://sdk.vercel.ai) (`ai`) as its LLM interface. You also need to install the provider adapter(s) for the LLM(s) you use:

```bash
npm install @ai-sdk/openai # for OpenAI
npm install @ai-sdk/google # for Google Gemini
npm install @ai-sdk/anthropic # for Anthropic
```

## Quick Start

```typescript
import { VocabularyEvaluator } from '@learning-commons/evaluators';

const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
});

const result = await evaluator.evaluate("Your text here", "5");
console.log(result.score); // "moderately complex"
```

---

## Evaluators

### 1. Vocabulary Evaluator

Evaluates vocabulary complexity using the Qual Text Complexity rubric (SAP).

**Supported Grades:** 3-12

**Uses:** Google Gemini 2.5 Pro + OpenAI GPT-4o

**Constructor:**
```typescript
// Options accepted by the constructor (interface name is illustrative):
interface VocabularyEvaluatorConfig {
  googleApiKey: string;                   // Required - Google API key
  openaiApiKey: string;                   // Required - OpenAI API key
  maxRetries?: number;                    // Optional - Max retry attempts (default: 2)
  telemetry?: boolean | TelemetryOptions; // Optional (default: true)
  logger?: Logger;                        // Optional - Custom logger
  logLevel?: LogLevel;                    // Optional - SILENT | ERROR | WARN | INFO | DEBUG (default: WARN)
}
```

**API:**
```typescript
await evaluator.evaluate(text: string, grade: string)
```

**Returns:**
```typescript
{
  score: 'slightly complex' | 'moderately complex' | 'very complex' | 'exceedingly complex';
  reasoning: string;
  metadata: {
    promptVersion: string;
    model: string;
    timestamp: Date;
    processingTimeMs: number;
  };
  _internal: VocabularyComplexity; // Detailed analysis
}
```
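For downstream typing, the return shape can be sketched as a TypeScript type. The field names are taken from this README; the type name and the numeric-band helper are illustrative, and `_internal` is omitted for brevity:

```typescript
// Sketch of the documented result shape (names illustrative, not SDK exports).
type ComplexityScore =
  | 'slightly complex'
  | 'moderately complex'
  | 'very complex'
  | 'exceedingly complex';

interface EvaluationResult {
  score: ComplexityScore;
  reasoning: string;
  metadata: {
    promptVersion: string;
    model: string;
    timestamp: Date;
    processingTimeMs: number;
  };
}

// Example: map the qualitative score onto a numeric band for reporting.
const SCORE_BAND: Record<ComplexityScore, number> = {
  'slightly complex': 1,
  'moderately complex': 2,
  'very complex': 3,
  'exceedingly complex': 4,
};

function scoreBand(result: EvaluationResult): number {
  return SCORE_BAND[result.score];
}
```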

## Error Handling

The SDK provides specific error types to help you handle different scenarios:

```typescript
import {
  ConfigurationError,
  ValidationError,
  APIError,
  AuthenticationError,
  RateLimitError,
  NetworkError,
  TimeoutError,
} from '@learning-commons/evaluators';

try {
  const evaluator = new VocabularyEvaluator({ googleApiKey, openaiApiKey });
  const result = await evaluator.evaluate(text, grade);
} catch (error) {
  if (error instanceof ConfigurationError) {
    // Missing or invalid API keys — fix your config
    console.error('Configuration error:', error.message);
  } else if (error instanceof ValidationError) {
    // Invalid input (text too short, invalid grade, etc.)
    console.error('Invalid input:', error.message);
  } else if (error instanceof AuthenticationError) {
    // Invalid API keys
    console.error('Check your API keys:', error.message);
  } else if (error instanceof RateLimitError) {
    // Rate limit exceeded - wait and retry
    console.error('Rate limited. Retry after:', error.retryAfter);
  } else if (error instanceof NetworkError) {
    // Network connectivity issues
    console.error('Network error:', error.message);
  } else if (error instanceof APIError) {
    // Other API errors
    console.error('API error:', error.message, 'Status:', error.statusCode);
  }
}
```
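The `RateLimitError` branch above suggests waiting and retrying; a generic wrapper can be sketched as follows. The `RateLimitError` class here is a local stand-in for illustration (the real SDK exports its own), and the assumption that `retryAfter` is a delay in milliseconds is ours:

```typescript
// Illustrative retry wrapper for rate-limited calls.
// Assumptions: this local RateLimitError mirrors the SDK's exported class,
// and retryAfter holds a delay in milliseconds.
class RateLimitError extends Error {
  constructor(public retryAfter: number) {
    super('rate limited');
    this.name = 'RateLimitError';
  }
}

async function withRateLimitRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Rethrow anything that is not a rate limit, or when attempts run out.
      if (!(error instanceof RateLimitError) || attempt >= maxAttempts) {
        throw error;
      }
      // Wait for the server-suggested delay before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, error.retryAfter));
    }
  }
}
```

Usage would look like `await withRateLimitRetry(() => evaluator.evaluate(text, grade))`.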

---

## Logging

Control logging verbosity with `logLevel`:

```typescript
import { VocabularyEvaluator, LogLevel } from '@learning-commons/evaluators';

const evaluator = new VocabularyEvaluator({
  googleApiKey: '...',
  openaiApiKey: '...',
  logLevel: LogLevel.INFO, // SILENT | ERROR | WARN | INFO | DEBUG
});
```

Or provide a custom logger:

```typescript
import type { Logger } from '@learning-commons/evaluators';

const customLogger: Logger = {
  debug: (msg, ctx) => myLogger.debug(msg, ctx),
  info: (msg, ctx) => myLogger.info(msg, ctx),
  warn: (msg, ctx) => myLogger.warn(msg, ctx),
  error: (msg, ctx) => myLogger.error(msg, ctx),
};

const evaluator = new VocabularyEvaluator({
  googleApiKey: '...',
  openaiApiKey: '...',
  logger: customLogger,
});
```

---

## Telemetry & Privacy

See [docs/telemetry.md](./docs/telemetry.md) for telemetry configuration and privacy information.

---

## Configuration Options

All evaluators support these common options:

```typescript
interface BaseEvaluatorConfig {
  maxRetries?: number;                    // Max API retry attempts (default: 2)
  telemetry?: boolean | TelemetryOptions; // Telemetry config (default: true)
  logger?: Logger;                        // Custom logger (optional)
  logLevel?: LogLevel;                    // Console log level (default: WARN)
  partnerKey?: string;                    // Learning Commons partner key for authenticated telemetry (optional)
}
```

---

## License

MIT
124 changes: 124 additions & 0 deletions sdks/typescript/docs/telemetry.md
@@ -0,0 +1,124 @@
# Telemetry

## Why We Collect Telemetry

We use telemetry data to improve evaluator quality, identify edge cases, and optimize performance. This helps us build better tools for our developer partners.

Telemetry is **anonymous by default**. If you'd like to partner with us to improve your specific use case, you can optionally provide a partner key (see the [Configuration](#configuration) section below). This allows us to connect with you and collaborate more deeply.

> **Review comment:** Do our terms of service cover this data? If so, should we include a link?

## What We Collect

**By default, telemetry is enabled** and sends:
- Performance metrics (latency, token usage)
- Metadata (evaluator type, grade, SDK version)

> **Review comment:** Should we send input size or can we infer that from token usage? So even if we don't get the actual text, we can get the text length.
>
> **Reply:** Actually, I see text length is included.

**Input text is NOT collected by default.** You can opt in via `recordInputs: true` — see [Enable Input Text Collection](#enable-input-text-collection) below.

We **never** collect your API keys (only an anonymous identifier).

If you prefer not to send any telemetry, you can disable it entirely — see [Disable Telemetry Completely](#disable-telemetry-completely) below.

## Example Telemetry Event

```json
{
  "timestamp": "2026-02-05T19:30:00.000Z",
  "sdk_version": "0.1.0",
  "evaluator_type": "vocabulary",
  "grade": "5",
  "status": "success",
  "latency_ms": 3500,
  "text_length_chars": 456,
  "provider": "google:gemini-2.5-pro+openai:gpt-4o",
  "token_usage": {
    "input_tokens": 650,
    "output_tokens": 350
  },
  "metadata": {
    "stage_details": [
      {
        "stage": "background_knowledge",
        "provider": "openai:gpt-4o-2024-11-20",
        "latency_ms": 1200,
        "token_usage": {
          "input_tokens": 250,
          "output_tokens": 150
        }
      },
      {
        "stage": "complexity_evaluation",
        "provider": "google:gemini-2.5-pro",
        "latency_ms": 2300,
        "token_usage": {
          "input_tokens": 400,
          "output_tokens": 200
        }
      }
    ]
  }
}
```

> **Review comment** (on `stage_details`): nit: Phase may be a better name for this. When I hear stage, I think deployment stage.

## Field Reference

| Field | Description |
|-------|-------------|
| `timestamp` | ISO 8601 timestamp when evaluation started |
| `sdk_version` | Version of the SDK (e.g., "0.1.0") |
| `evaluator_type` | Which evaluator ran (e.g., "vocabulary", "sentence-structure") |
| `grade` | Grade level evaluated (e.g., "5", "K") |
| `status` | Evaluation outcome: "success" or "error" |
| `error_code` | Error type if status is "error" (e.g., "Error", "TypeError") |
| `latency_ms` | Total evaluation time in milliseconds |
| `text_length_chars` | Length of input text in characters |
| `provider` | LLM provider(s) used (e.g., "openai:gpt-4o", "google:gemini-2.5-pro+openai:gpt-4o") |
| `token_usage` | Total tokens consumed (input, output, total) |
| `input_text` | The text being evaluated (only included if `recordInputs: true`) |
| `metadata.stage_details` | Per-stage breakdown for multi-stage evaluators (optional) |

> **Review comment** (on `evaluator_type`): If we have multiple versions of the same evaluator in an SDK version would the eval version be captured in the eval name or should we have an eval version field too?
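The fields above can be sketched as a TypeScript type for consumers that parse telemetry events. This mirrors the documented JSON; the type names are illustrative, not SDK exports:

```typescript
// Type sketch of the telemetry event, derived from the field reference above.
interface StageDetail {
  stage: string;
  provider: string;
  latency_ms: number;
  token_usage: { input_tokens: number; output_tokens: number };
}

interface TelemetryEvent {
  timestamp: string;           // ISO 8601
  sdk_version: string;
  evaluator_type: string;
  grade: string;
  status: 'success' | 'error';
  error_code?: string;         // present when status is "error"
  latency_ms: number;
  text_length_chars: number;
  provider: string;
  token_usage: { input_tokens: number; output_tokens: number };
  input_text?: string;         // only when recordInputs: true
  metadata?: { stage_details?: StageDetail[] };
}
```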

## Configuration

### Default (Anonymous)

```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
  // telemetry: true (default - anonymous)
});
```

### Partner with Us (Authenticated)

To help us support your specific use case, provide a partner key:

```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
  partnerKey: process.env.LEARNING_COMMONS_PARTNER_KEY!, // Contact us for a key
});
```

### Disable Telemetry Completely

```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
  telemetry: false, // No data sent
});
```

### Enable Input Text Collection

```typescript
const evaluator = new VocabularyEvaluator({
  googleApiKey: process.env.GOOGLE_API_KEY!,
  openaiApiKey: process.env.OPENAI_API_KEY!,
  telemetry: {
    enabled: true,
    recordInputs: true, // Also send input text with telemetry
  },
});
```
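For reference, the `TelemetryOptions` object used above can be sketched as follows. This shape is inferred from the examples in this document; the actual exported type may differ:

```typescript
// Assumed shape of TelemetryOptions, inferred from the config examples above.
interface TelemetryOptions {
  enabled: boolean;       // master switch for telemetry
  recordInputs?: boolean; // opt in to sending input text (off by default)
}

const opts: TelemetryOptions = { enabled: true, recordInputs: false };
```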
10 changes: 9 additions & 1 deletion sdks/typescript/package.json
```diff
@@ -48,7 +48,15 @@
   },
   "homepage": "https://github.com/learning-commons-org/evaluators#readme",
   "peerDependencies": {
-    "ai": ">=4.0.0"
+    "ai": ">=6.0.0",
+    "@ai-sdk/openai": ">=3.0.0",
+    "@ai-sdk/google": ">=3.0.0",
+    "@ai-sdk/anthropic": ">=3.0.0"
   },
+  "peerDependenciesMeta": {
+    "@ai-sdk/openai": { "optional": true },
+    "@ai-sdk/google": { "optional": true },
+    "@ai-sdk/anthropic": { "optional": true }
+  },
   "dependencies": {
     "compromise": "^14.13.0",
```