Relia Prompt

Test and benchmark prompts across LLM providers and models

This tool targets agentic use cases in large production applications that require fast, reliable LLM calls: for example, extracting sentiment from social media posts or converting a sentence into structured JSON.

Features

  • Multi-Provider Testing – OpenAI, Bedrock, DeepSeek, Gemini, Groq, OpenRouter
  • Parallel Execution – Run tests concurrently across all configured LLMs
  • Repeatability – Each test runs N times per model to measure consistency
  • Code-first – Define prompts and tests in code

Quick Start

Prompts and tests live in your code. Use the example project pattern:

# From a project that has reliaprompt.definitions.ts (see example)
cd example
bun install
bun run reliaprompt:ui   # or: add a "reliaprompt:ui" script to your app and run it from the project root
# Open http://localhost:3000

Set credentials via the RELIA_PROMPT_LLM_CONFIG_JSON environment variable (see Configuration). At least one provider is required.
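
A minimal reliaprompt.definitions.ts might look like the following. This is an illustrative sketch: the prompt text, test input, and expected output are placeholders, and the builder API is documented under Usage below.

import { definePrompt, defineTestCase, defineSuite } from "relia-prompt";

// Placeholder prompt: extract product names as a JSON array of strings.
const prompt = definePrompt({
    name: "extract-products",
    content: "Extract the product names mentioned in the text and return them as a JSON array of strings.",
});

const testCases = [
    defineTestCase({
        input: "I bought a Pixel 8 and an iPhone 15 last week.",
        expectedOutput: '["Pixel 8", "iPhone 15"]',
        expectedOutputType: "array",
    }),
];

// The UI and test runner pick up suites exported from this file.
export const suites = [defineSuite({ prompt, testCases })];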

Usage

Code-first (the only supported mode)

Use ReliaPrompt inside your service to benchmark and test LLM calls from your unit tests. A complete unit-test sketch follows the numbered steps below.

  1. Install – Add relia-prompt as a dependency.

  2. Initialize – Pass credentials at startup (or load from RELIA_PROMPT_LLM_CONFIG_JSON when using the UI):

    import {
        initializeReliaPrompt,
        runPromptTestsFromSuite,
        definePrompt,
        defineTestCase,
        defineSuite,
    } from "relia-prompt";
    
    initializeReliaPrompt({
        providers: {
            // Canonical keys can be provided directly in library mode.
            // For UI/server mode prefer RELIA_PROMPT_LLM_CONFIG_JSON in .env.
        },
    });
  3. Define prompts and tests in code – Use the builder API and export suites for the UI:

    const prompt = definePrompt({ name: "my-prompt", content: "..." });
    const testCases = [
        defineTestCase({ input: "...", expectedOutput: "[...]", expectedOutputType: "array" }),
    ];
    export const suites = [defineSuite({ prompt, testCases })];
  4. Run tests – Require testModels (and evaluationModel when using LLM evaluation) per run:

    const { score, results } = await runPromptTestsFromSuite(suite, {
        testModels: [{ provider: "provider-id", modelId: "model-id" }],
        evaluationModel: ..., // required when prompt.evaluationMode === "llm"
        runsPerTest: 1,
    });
  5. Optional UI – From your project root (where your definitions live), run:

    bun run reliaprompt:ui

    The UI shows the prompts and tests defined in your code (tests are read-only; prompt edits made in the browser are drafts only). Configure RELIA_PROMPT_LLM_CONFIG_JSON in .env and choose test and evaluation models on each run.
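
Putting the steps together, a minimal unit test might look like this. It is a sketch under stated assumptions: Bun's built-in test runner, a hypothetical openai/gpt-4o-mini model pair, and the assumption that RELIA_PROMPT_LLM_CONFIG_JSON parses directly into the providers object (check .env.example for the real shape).

import { test, expect } from "bun:test";
import { initializeReliaPrompt, runPromptTestsFromSuite } from "relia-prompt";
import { suites } from "./reliaprompt.definitions";

// Assumption: the env JSON maps onto the providers object; see .env.example.
initializeReliaPrompt({
    providers: JSON.parse(process.env.RELIA_PROMPT_LLM_CONFIG_JSON ?? "{}"),
});

test("prompt stays reliable across repeated runs", async () => {
    const { score, results } = await runPromptTestsFromSuite(suites[0], {
        // Hypothetical provider/model IDs; use ones you have configured.
        testModels: [{ provider: "openai", modelId: "gpt-4o-mini" }],
        runsPerTest: 3, // repeat each test to measure consistency
    });

    expect(results.length).toBeGreaterThan(0);
    expect(score).toBeGreaterThanOrEqual(0.9); // threshold is illustrative
});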

Configuration

Configuration is JSON-only via RELIA_PROMPT_LLM_CONFIG_JSON. Use .env.example as the canonical template for the full JSON object.
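
The exact schema lives in .env.example; the snippet below is a hypothetical illustration only, with made-up provider keys and field names:

# .env -- hypothetical shape; copy the real template from .env.example
RELIA_PROMPT_LLM_CONFIG_JSON='{"openai":{"apiKey":"sk-..."},"groq":{"apiKey":"gsk_..."}}'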

See the example project for a full setup and a smoke test.

Development

bun run dev            # Backend + dashboard with hot reload
bun run dev:backend    # Backend only with hot reload
bun run dev:dashboard  # Dashboard dev server
bun run build          # Build dashboard + backend
bun run lint           # Lint backend
bun run test           # Unit tests
bun run test:e2e       # E2E tests (Playwright)
bun run format         # Format code

Project Structure

├── src/                    # Backend (Express + Bun)
│   ├── server.ts           # API routes
│   ├── llm-clients/        # Provider clients
│   └── services/           # Test runner
├── dashboard/              # SvelteKit app
│   └── src/
│       ├── lib/            # Components & stores
│       └── routes/         # Pages
└── example/                # Example project

License

MIT
