Merged
173 changes: 173 additions & 0 deletions .github/workflows/test-sdk-typescript.yml
@@ -0,0 +1,173 @@
name: 🏗️ Test TypeScript SDK

on:
  push:
    branches:
      - main
    paths:
      - 'sdks/typescript/**'
      - 'evals/prompts/**'
      - '.github/workflows/test-sdk-typescript.yml'
  pull_request:
    paths:
      - 'sdks/typescript/**'
      - 'evals/prompts/**'
      - '.github/workflows/test-sdk-typescript.yml'

jobs:
  lint:
    name: 👖 Lint
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: sdks/typescript
    steps:
      - name: ⛔ Cancel previous runs
        uses: styfle/cancel-workflow-action@0.13.0

      - name: ⬇️ Checkout repo
        uses: actions/checkout@v6

      - name: 😻 Setup Node.js 22
        uses: actions/setup-node@v6
        with:
          node-version: 22

      - name: 📥 Install dependencies
        uses: bahmutov/npm-install@v1
        with:
          working-directory: sdks/typescript

      - name: 👖 Run linter
        run: npm run lint

  typecheck:
    name: 🔎 TypeScript
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: sdks/typescript
    steps:
      - name: ⛔ Cancel previous runs
        uses: styfle/cancel-workflow-action@0.13.0

      - name: ⬇️ Checkout repo
        uses: actions/checkout@v6

      - name: 😻 Setup Node.js 22
        uses: actions/setup-node@v6
        with:
          node-version: 22

      - name: 📥 Install dependencies
        uses: bahmutov/npm-install@v1
        with:
          working-directory: sdks/typescript

      - name: 🔎 Type check
        run: npm run typecheck

  test:
    name: ⚡ Unit Tests (Node ${{ matrix.node-version }})
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: sdks/typescript
    strategy:
      matrix:
        node-version: ['20.19.0', '22', '24']
    steps:
      - name: ⛔ Cancel previous runs
        uses: styfle/cancel-workflow-action@0.13.0

      - name: ⬇️ Checkout repo
        uses: actions/checkout@v6

      - name: 😻 Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v6
        with:
          node-version: ${{ matrix.node-version }}

      - name: 📥 Install dependencies
        uses: bahmutov/npm-install@v1
        with:
          working-directory: sdks/typescript

      - name: ⚡ Run unit tests
        run: npm run test:unit

  build:
    name: 🏗️ Build (Node ${{ matrix.node-version }})
    needs: [lint, typecheck, test]
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: sdks/typescript
    strategy:
      matrix:
        node-version: ['20.19.0', '22', '24']
    steps:
      - name: ⛔ Cancel previous runs
        uses: styfle/cancel-workflow-action@0.13.0

      - name: ⬇️ Checkout repo
        uses: actions/checkout@v6

      - name: 😻 Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v6
        with:
          node-version: ${{ matrix.node-version }}

      - name: 📥 Install dependencies
        uses: bahmutov/npm-install@v1
        with:
          working-directory: sdks/typescript

      - name: 🏗️ Build package
        run: npm run build

      # Temporarily disabled - integration tests take too long in CI
      # - name: ⚡ Run integration tests (against dist/)
      #   if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository
      #   env:
      #     RUN_INTEGRATION_TESTS: true
      #     OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      #     GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
      #   run: npm run test:integration:dist

  coverage:
    name: 📊 Coverage
    needs: [build]
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: sdks/typescript
    steps:
      - name: ⛔ Cancel previous runs
        uses: styfle/cancel-workflow-action@0.13.0

      - name: ⬇️ Checkout repo
        uses: actions/checkout@v6

      - name: 😻 Setup Node.js 22
        uses: actions/setup-node@v6
        with:
          node-version: 22

      - name: 📥 Install dependencies
        uses: bahmutov/npm-install@v1
        with:
          working-directory: sdks/typescript

      - name: 📊 Generate coverage report
        run: npm run test:coverage
        continue-on-error: true

      - name: 📁 Upload coverage to Codecov
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./sdks/typescript/coverage/coverage-final.json
          flags: typescript-sdk
          name: typescript-sdk-coverage
        continue-on-error: true
40 changes: 40 additions & 0 deletions evals/prompts/CHANGELOG.md
@@ -0,0 +1,40 @@
# Prompts Changelog

All notable changes to the evaluator prompt files will be documented here.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [1.2.0] - 2026-02-19

### Added
- `vocabulary/other-grades-system.txt` — system prompt for Vocabulary evaluator (grades 5–12)
- `vocabulary/other-grades-user.txt` — user prompt for Vocabulary evaluator (grades 5–12)

### Changed
- `vocabulary/grades-3-4-system.txt` — updated to reference "Qualitative Text Complexity rubric (SAP)"

## [1.1.0] - 2026-02-18

### Added
- `sentence-structure/rubric-grades-5-12.txt` — SS complexity scoring rubric for grades 5–12

### Changed
- `sentence-structure/complexity-system.txt` — updated to reference "Qualitative Text Complexity rubric (SAP)"
- `sentence-structure/analysis-user.txt` — added `Basic Complex` and `Advanced Complex` to the sentence type definitions

## [1.0.0] - 2025-09-23

### Added
- `grade-level-appropriateness/system.txt` — system prompt for the GLA evaluator
- `grade-level-appropriateness/user.txt` — user prompt for the GLA evaluator
- `sentence-structure/analysis-system.txt` — system prompt for SS sentence analysis
- `sentence-structure/analysis-user.txt` — user prompt for SS sentence analysis
- `sentence-structure/complexity-system.txt` — system prompt for SS complexity scoring
- `sentence-structure/complexity-user.txt` — user prompt for SS complexity scoring
- `sentence-structure/rubric-grade-3.txt` — SS complexity scoring rubric for grade 3
- `sentence-structure/rubric-grade-4.txt` — SS complexity scoring rubric for grade 4
- `vocabulary/background-knowledge.txt` — background knowledge context for the Vocabulary evaluator
- `vocabulary/grades-3-4-system.txt` — system prompt for Vocabulary evaluator (grades 3–4)
- `vocabulary/grades-3-4-user.txt` — user prompt for Vocabulary evaluator (grades 3–4)
10 changes: 10 additions & 0 deletions evals/prompts/grade-level-appropriateness/system.txt
@@ -0,0 +1,10 @@

You are an expert in English literature education for K-12.
Your job is to help evaluate the grade-level appropriateness of a given text.

You will be given a text, and you should determine which grade level the text is appropriate for (grade levels: K-1, 2-3, 4-5, 6-8, 9-10, 11-CCR).

IMPORTANT: Pay attention to the vocabulary used, the topics of the text, and the readability of the text.

Please first reason out loud about the vocabulary complexity of the text, and then choose an answer from the grade-level options: K-1, 2-3, 4-5, 6-8, 9-10, 11-CCR.

120 changes: 120 additions & 0 deletions evals/prompts/grade-level-appropriateness/user.txt
@@ -0,0 +1,120 @@

Use these steps to determine the appropriate grade level for a text:
1. Calculate the word count and Flesch-Kincaid Grade Level of the text, and use them to generate a grade band.
Here are the word-count guidelines for each band:

2-3: 200-800 words
4-5: 200-800 words
6-8: 400-1000 words
9-10: 500-1500 words
11-12: 1501 words or more

Here is the formula for the Flesch-Kincaid Grade Level:
Flesch-Kincaid Grade Level = 0.39 * (total words / total sentences) + 11.8 * (total syllables / total words) - 15.59


2. Determine the qualitative complexity using this text complexity rubric:
TEXT STRUCTURE

Exceedingly Complex
• Deep, intricate, often ambiguous connections between many ideas/processes/events
• Organization is intricate or discipline-specific
• Text features are essential for understanding
• Graphics are intricate, extensive, and integral to meaning; may convey unique information

Very Complex
• Expanded ideas/processes/events with implicit or subtle connections
• Organization may have multiple pathways or discipline-specific traits
• Text features directly enhance understanding
• Graphics support or are integral to understanding

Moderately Complex
• Some implicit/subtle connections between ideas/events
• Organization is evident and generally sequential or chronological
• Text features enhance understanding
• Graphics are mostly supplementary

Slightly Complex
• Explicit and clear connections between ideas/events
• Organization is chronological, sequential, or predictable
• Text features help navigation but are not essential
• Graphics are simple, not necessary, but may assist understanding


LANGUAGE FEATURES

Exceedingly Complex
• Dense, abstract, ironic, and/or figurative language
• Complex, unfamiliar, archaic, subject-specific, or ambiguous vocabulary
• Mainly complex sentences with multiple subordinate clauses and transitions

Very Complex
• Fairly complex; some abstract, ironic, and/or figurative language
• Some unfamiliar, archaic, or overly academic vocabulary
• Many complex sentences with subordinate phrases/clauses

Moderately Complex
• Mostly explicit language with some complex meaning
• Mostly familiar and conversational vocabulary
• Primarily simple and compound sentences, with some complex ones

Slightly Complex
• Explicit, literal, straightforward language
• Contemporary, familiar, conversational vocabulary
• Mainly simple sentences


PURPOSE

Exceedingly Complex
• Subtle, intricate, and difficult to determine
• Includes many theoretical or abstract elements

Very Complex
• Implicit or subtle, fairly easy to infer
• More theoretical or abstract than concrete

Moderately Complex
• Implied but easy to identify based on context or source

Slightly Complex
• Explicitly stated, clear, concrete, and narrowly focused


KNOWLEDGE DEMANDS

Exceedingly Complex
• Requires extensive discipline-specific or theoretical knowledge
• Many references/allusions to other texts or ideas

Very Complex
• Requires moderate discipline-specific knowledge
• Some references/allusions to other texts or ideas

Moderately Complex
• Requires common knowledge and some discipline-specific knowledge
• Few references/allusions

Slightly Complex
• Requires everyday, practical knowledge
• No references/allusions

3. Background knowledge:
At which grade level would students have enough background knowledge to understand the text?

4. Use your judgement to combine the results of the three steps above. First, use the quantitative measures to get an initial signal of the appropriate grade-level range; then use the qualitative analysis to refine that decision, and consider whether students at that grade would have enough background knowledge, to arrive at a final grade band. Also consider whether the text could be used with a lower grade given additional scaffolding.

<begin of text to evaluate>
<text>{text}</text>
<end of text to evaluate>

When providing your response, first think out loud about your reasoning, and then provide your answer from one of the grade band options above. Your reasoning and answer must be in JSON format. Strictly follow the format below for your response.

Put your final answer, the target grade band for independent reading of the text, in the "grade" property. If there is an alternative grade at which students could read and comprehend the text with scaffolding (e.g., pictures, graphs, additional context) or as a read-aloud for a lower grade, provide it in the "alternative_grade" property and list the types of scaffolding in the "scaffolding_needed" property.

In your reasoning, provide numbered bullet points for the analyses in each of the first 3 steps. At the end, add a 4th bullet point called "synthesis" that summarizes how the analyses from the 3 steps helped you arrive at the final decision.

{format_instructions}
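Step 1 of the GLA user prompt asks the model to compute the word count and Flesch-Kincaid Grade Level itself. If these signals were ever precomputed in the SDK instead, a minimal TypeScript sketch might look like the following. All function names are hypothetical, and the syllable counter is an assumed vowel-group heuristic that will miscount some words; a dictionary-backed counter would be more accurate.

```typescript
// Minimal sketch of the step-1 quantitative signals from the GLA user prompt.
// All names here are hypothetical; the syllable counter is a rough heuristic.

type GradeBand = "2-3" | "4-5" | "6-8" | "9-10" | "11-12";

// Word-count guidelines copied from the prompt. Ranges overlap (2-3 and 4-5 are
// identical), so one text can match several bands.
const WORD_COUNT_BANDS: ReadonlyArray<{ band: GradeBand; min: number; max: number }> = [
  { band: "2-3", min: 200, max: 800 },
  { band: "4-5", min: 200, max: 800 },
  { band: "6-8", min: 400, max: 1000 },
  { band: "9-10", min: 500, max: 1500 },
  { band: "11-12", min: 1501, max: Infinity },
];

// Words are whitespace-separated tokens.
function splitWords(text: string): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

// Each run of terminal punctuation ("." "!" "?" or "...") ends one sentence;
// text without terminal punctuation counts as a single sentence.
function countSentences(text: string): number {
  const ends = text.match(/[.!?]+/g);
  return Math.max(1, ends ? ends.length : 0);
}

// Heuristic: count vowel groups after dropping a silent trailing "e"
// (but keep "-le" endings such as "table"). Tokens with no letters
// (e.g. "2025") count as zero syllables; any other word counts as at least one.
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const stem = w.length > 2 && w.endsWith("e") && !w.endsWith("le") ? w.slice(0, -1) : w;
  const groups = stem.match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

// FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
function fleschKincaidGrade(text: string): number {
  const words = splitWords(text);
  if (words.length === 0) return 0;
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return 0.39 * (words.length / countSentences(text)) + 11.8 * (syllables / words.length) - 15.59;
}

// All grade bands whose word-count range contains the given count.
function wordCountBands(wordCount: number): GradeBand[] {
  return WORD_COUNT_BANDS.filter((b) => wordCount >= b.min && wordCount <= b.max).map((b) => b.band);
}
```

Because the 2-3 and 4-5 ranges are identical and the others overlap, `wordCountBands` returns every matching band rather than a single one, which mirrors the prompt: the quantitative step only gives a first signal, and the qualitative rubric narrows it down. Note also that FKGL can be negative for very short, simple texts.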
1 change: 1 addition & 0 deletions evals/prompts/sentence-structure/analysis-system.txt
@@ -0,0 +1 @@
You are an expert in grammar and literacy.
38 changes: 38 additions & 0 deletions evals/prompts/sentence-structure/analysis-user.txt
@@ -0,0 +1,38 @@

# Task
I am going to give you a text, and I need you to look through the text sentence-by-sentence to perform a comprehensive grammatical analysis. Use the computational counts as a reference; they can be incorrect in ambiguous cases.

# Definitions
* Sentences: Count each complete grammatical unit that ends in a terminal punctuation mark.
* Words: Count any sequence of characters separated by a space as one word. Treat hyphenated words (e.g., "state-of-the-art") and numbers (e.g., "2025") as single words.
* Independent Clauses: Clauses that can stand alone as a complete sentence.
* Subordinate Clauses: Clauses that are dependent on the main clause and cannot stand alone as a complete sentence.
* Simple Sentences: Sentences with one independent clause and no subordinate clauses.
* Compound Sentences: Sentences with two or more independent clauses and no subordinate clauses.
* Complex Sentences: Sentences with one independent clause and at least one subordinate clause.
* Compound-Complex Sentences: Sentences with two or more independent clauses and at least one subordinate clause.
* Other / Non-Canonical Sentences: Sentences that cannot be reliably classified as simple, compound, complex, or compound-complex (e.g., sentence fragments, run-ons, elliptical responses, headlines, imperatives lacking an explicit subject, or stylized dialogue tags).
* Embedded Clauses: Clauses that are nested within another clause.
* Prepositional Phrases: Phrases that begin with a preposition and end with a noun phrase.
* Participle Phrases: Phrases that begin with a participle and end with a noun phrase.
* Appositive Phrases: Phrases that rename or identify a noun phrase.
* Simple Transitions: Basic coordinating conjunctions and chronological adverbs. Examples: 'and', 'but', 'or', 'so', 'then', 'next', 'first'.
* Sophisticated Transitions: Conjunctive adverbs and phrases signaling logical relationships. Examples: 'however', 'therefore', 'consequently', 'as a result', 'for example', 'although'.
* One-Concept Sentence: A sentence with ZERO subordinate clauses AND ZERO transition words/phrases (neither simple nor sophisticated).
* Multi-Concept Sentence: Any sentence that has ≥1 subordinate clause OR ≥1 transition word/phrase (or both).
* Basic Complex Sentences: Sentences with exactly one independent clause and exactly one dependent (subordinate) clause.
* Advanced Complex Sentences: Sentences with two or more of the following, in any combination (they do not have to be of the same type): subordinate phrases, clauses, transition words, or any other meaningful "interruptions" to the flow of the sentence (such as not-only-but-also constructions, dashes, semicolons, and lengthy appositives). A sentence can be advanced complex even if it has just one subordinate phrase or clause alongside a transition phrase, as in: "For example, the British favored trade with Hong Kong, assuming favorable trade conditions."

# Computational Counts
Use these as a reference; your internal heuristics can be more reliable.
{ground_truth_counts}

# Text to Analyze
[BEGIN TEXT]
{text}
[END TEXT]

IMPORTANT: Your response should be a single JSON object with the following structure. Do not produce anything outside of the JSON object.

{format_instructions}
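The sentence-type definitions in `analysis-user.txt` reduce to a few counting rules. The TypeScript sketch below encodes them, assuming per-sentence counts are already available from the model's analysis or a parser. The `SentenceFeatures` shape and the function names are hypothetical, not the SDK's schema, and the Basic Complex rule reads that definition as exactly one subordinate clause.

```typescript
// Hypothetical per-sentence feature counts (illustrative field names, not the SDK schema).
interface SentenceFeatures {
  independentClauses: number;
  subordinateClauses: number;
  subordinatePhrases: number; // e.g. participle or appositive phrases
  transitions: number; // simple + sophisticated transition words/phrases
  otherInterruptions: number; // dashes, semicolons, not-only-but-also, lengthy appositives
  nonCanonical: boolean; // fragment, run-on, headline, stylized dialogue tag, etc.
}

type SentenceType = "simple" | "compound" | "complex" | "compound-complex" | "other";

// Traditional four-way classification from the independent/subordinate clause counts.
function sentenceType(f: SentenceFeatures): SentenceType {
  if (f.nonCanonical || f.independentClauses < 1) return "other";
  const hasSubordinate = f.subordinateClauses >= 1;
  if (f.independentClauses >= 2) return hasSubordinate ? "compound-complex" : "compound";
  return hasSubordinate ? "complex" : "simple";
}

// One-concept: zero subordinate clauses AND zero transitions; anything else is multi-concept.
function conceptLoad(f: SentenceFeatures): "one-concept" | "multi-concept" {
  return f.subordinateClauses === 0 && f.transitions === 0 ? "one-concept" : "multi-concept";
}

// Advanced Complex: two or more "interruptions" in any mix of kinds.
// Basic Complex: exactly one independent and exactly one subordinate clause (and nothing else).
function complexTier(f: SentenceFeatures): "advanced-complex" | "basic-complex" | null {
  const interruptions =
    f.subordinateClauses + f.subordinatePhrases + f.transitions + f.otherInterruptions;
  if (interruptions >= 2) return "advanced-complex";
  if (f.independentClauses === 1 && f.subordinateClauses === 1) return "basic-complex";
  return null;
}
```

Read literally, the Advanced Complex rule does not require a subordinate clause: the prompt's own Hong Kong example has one independent clause, one participle phrase, and one transition, so under this sketch it is "simple" in the traditional taxonomy but "advanced-complex" in the newer one.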
1 change: 1 addition & 0 deletions evals/prompts/sentence-structure/complexity-system.txt
@@ -0,0 +1 @@
You are an expert in grammar and literacy, and you understand K-12 education and the Qualitative Text Complexity rubric (SAP).