Proposal: Add Integration Tests and Cross-OS Release Safety for TinyClaw #162

@michaelerobertsjr

Summary

TinyClaw is moving quickly and adoption is growing, but the repository currently has:

  • No automated tests
  • A release workflow that builds only on Ubuntu
  • No automated verification that the CLI, queue processor, or API continue to work across operating systems

Because TinyClaw runs locally on user machines, the biggest risk is not logic bugs alone but environment regressions such as:

  • install script behavior
  • Node native modules (better-sqlite3)
  • filesystem paths and HOME directory assumptions
  • WSL2 compatibility
  • queue and API wiring

This proposal introduces a small, high-value test suite designed to:

  1. Prevent releases that break the core message → queue → response flow
  2. Ensure TinyClaw installs and runs on macOS, Linux, and Windows (WSL2)
  3. Keep tests deterministic and fast by avoiding real LLM calls

The goal is confidence without slowing development velocity.


Testing Strategy

The proposed strategy uses two complementary layers.

Layer 1: Local Integration Tests

These tests validate TinyClaw’s core runtime behavior using the actual queue, SQLite database, and HTTP API.

They do not depend on external providers.

Key Principles

  • Run TinyClaw in a temporary HOME directory
  • Use the real queue processor
  • Use the real HTTP API
  • Replace the LLM provider with a deterministic fake

This ensures the entire pipeline is tested:

flowchart TD
  A[HTTP Message] --> B[Queue Write]
  B --> C[Queue Processor]
  C --> D[Agent Routing]
  D --> E["Provider Call (mocked)"]
  E --> F[Response Persisted]
  F --> G[API Response Retrieval]

Deterministic Provider

Integration tests should not call Claude/Codex/OpenAI/etc.

Instead, add a simple test provider.

Example concept:

export async function fakeProvider(prompt: string) {
  return `FAKE_RESPONSE:${prompt}`
}

Configuration example:

TINYCLAW_PROVIDER=fake

Benefits:

  • No API keys
  • No network
  • Fully deterministic
  • Fast CI runs
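
One way the `TINYCLAW_PROVIDER=fake` switch could be wired is sketched below. The `Provider` type, the `providers` registry, and `selectProvider` are hypothetical names used for illustration; TinyClaw's actual provider interface may look different.

```typescript
// Hypothetical provider signature; TinyClaw's real interface may differ.
export type Provider = (prompt: string) => Promise<string>;

// Deterministic stand-in from the proposal: echoes the prompt with a fixed prefix.
export const fakeProvider: Provider = async (prompt) => `FAKE_RESPONSE:${prompt}`;

// Hypothetical registry keyed by the value of TINYCLAW_PROVIDER.
const providers = new Map<string, Provider>([["fake", fakeProvider]]);

// Resolve the provider named in the environment. Unknown or missing names
// throw, so a typo in CI cannot silently fall through to a real LLM.
export function selectProvider(env: Record<string, string | undefined>): Provider {
  const name = env.TINYCLAW_PROVIDER;
  const provider = name === undefined ? undefined : providers.get(name);
  if (provider === undefined) {
    throw new Error(`unknown or missing TINYCLAW_PROVIDER: ${String(name)}`);
  }
  return provider;
}
```

Failing loudly on an unrecognized name is a deliberate choice in this sketch: the test suite should never reach the network by accident.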

Integration Test Cases

A small number of high-signal tests will provide strong protection.

1. Core Message Flow

Goal

Ensure the message → response pipeline works end to end.

Steps

  1. Start TinyClaw server
  2. POST /api/message
  3. Wait for queue processor
  4. GET /api/responses

Expected

  • Response exists
  • Response contains fake provider output
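
The steps above could be sketched as follows. Because the queue processor runs asynchronously, the test polls instead of sleeping for a fixed time. `waitFor` and `coreMessageFlow` are hypothetical helper names, and the response body is treated as plain text because the shape of `/api/responses` is not specified here. The global `fetch` assumes Node 18 or later.

```typescript
// Poll `probe` until it returns a value other than undefined, or time out.
export async function waitFor<T>(
  probe: () => Promise<T | undefined>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// POST a message, then poll the responses endpoint until the fake
// provider's marker shows up. Returns the raw response body.
export async function coreMessageFlow(baseUrl: string): Promise<string> {
  const post = await fetch(`${baseUrl}/api/message`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: "hello test" }),
  });
  if (!post.ok) throw new Error(`POST /api/message failed: ${post.status}`);

  return waitFor(async () => {
    const res = await fetch(`${baseUrl}/api/responses`);
    const body = await res.text();
    return body.includes("FAKE_RESPONSE:") ? body : undefined;
  });
}
```

A test would call `coreMessageFlow("http://localhost:3777")` and assert that the returned body contains `FAKE_RESPONSE:hello test`.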

2. Agent Routing

Goal

Verify agent mention routing works.

Steps

POST message:

@coder write hello world

Expected

  • Response attributed to coder agent
  • Routing logic triggered
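
To make the expected attribution explicit, a test could derive the target agent from the message text with a small helper like the one below. This is an illustrative sketch only: `parseMention` is a hypothetical name, and the accepted agent-name characters are an assumption, not TinyClaw's actual routing rules.

```typescript
export interface RoutedMessage {
  agent: string | null; // null when the message has no leading @mention
  text: string;
}

// Split a leading "@agent" mention from the rest of the message.
// Assumes agent names consist of letters, digits, "_" and "-".
export function parseMention(message: string): RoutedMessage {
  const trimmed = message.trim();
  const match = /^@([A-Za-z0-9_-]+)\s+([\s\S]*)$/.exec(trimmed);
  if (match === null) return { agent: null, text: trimmed };
  const [, agent = "", text = ""] = match;
  return { agent, text };
}
```

For the example message, `parseMention("@coder write hello world")` yields the agent `coder` and the text `write hello world`, which the test can compare against the agent recorded on the response.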

3. Queue State Transitions

Verify queue states transition correctly.

Expected flow:

pending → processing → completed

Test asserts:

  • message state transitions
  • response record created

4. Retry / Dead Letter Handling

Force provider failure.

Example fake provider:

throw new Error("simulated failure")

Expected:

  • retry attempts increment
  • message eventually marked "dead"

Then test dead-letter retry endpoint.
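
A failing provider that counts its calls makes both halves of this test easy to drive: fail a fixed number of times to exercise retries, or fail forever to reach the dead-letter state. The sketch below uses hypothetical names (`Provider`, `createFlakyProvider`); the retry limit itself is TinyClaw's and is not assumed here.

```typescript
// Hypothetical provider signature; TinyClaw's real interface may differ.
export type Provider = (prompt: string) => Promise<string>;

// Build a provider that throws "simulated failure" for the first
// `failures` calls and answers deterministically afterwards.
// Pass Infinity to fail on every call.
export function createFlakyProvider(failures: number): {
  provider: Provider;
  calls: () => number;
} {
  let calls = 0;
  const provider: Provider = async (prompt) => {
    calls += 1;
    if (calls <= failures) throw new Error("simulated failure");
    return `FAKE_RESPONSE:${prompt}`;
  };
  return { provider, calls: () => calls };
}
```

With `createFlakyProvider(Infinity)` the message should end up marked "dead", and `calls()` gives the number of attempts the queue actually made.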


5. SSE Event Stream

Connect to:

GET /api/events/stream

Send a message.

Verify events received:

message_received
response_ready

Ensures UI and integrations remain stable.
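
To assert on the stream, the test needs to turn raw server-sent-event text into event names. The sketch below parses the standard SSE wire format (`event:` and `data:` fields, frames separated by a blank line). It assumes TinyClaw puts the event type in the `event:` field; if the type is carried inside the JSON `data` payload instead, the assertion would inspect `data`. `parseSseEvents` is a hypothetical helper name.

```typescript
export interface SseEvent {
  event: string; // "message" when the frame has no event: field
  data: string;
}

// Parse a complete chunk of SSE text into events. Frames without any
// data: line (for example ": keep-alive" comments) are skipped.
export function parseSseEvents(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // A single leading space after the colon is not part of the value.
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

A test can then collect the event names and check that `message_received` appears before `response_ready`.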


6. Persistence Across Restart

  1. Send message
  2. Restart server
  3. Verify queue state persists

Confirms SQLite persistence behavior.


Integration Test Implementation

Example structure:

tests/
  integration/
    core-message.test.ts
    routing.test.ts
    queue-state.test.ts
    dead-letter.test.ts
    sse-events.test.ts
    persistence.test.ts

Test environment setup:

export HOME="$(mktemp -d)"
export TINYCLAW_PROVIDER=fake
export PORT=3777

Startup command example:

node dist/server.js

Then tests interact only through the HTTP API.
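
The same setup can be done from the test harness itself, which avoids shell differences between operating systems. The sketch below is a minimal version under stated assumptions: `makeTestEnv` and `startServer` are hypothetical helpers, and the server is assumed to read `TINYCLAW_PROVIDER` and `PORT` as described above. `USERPROFILE` is set alongside `HOME` because Node's `os.homedir()` consults `USERPROFILE` on Windows.

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Build an isolated environment: a fresh temporary home directory,
// the deterministic provider, and a fixed port.
export function makeTestEnv(port = 3777): NodeJS.ProcessEnv {
  const home = mkdtempSync(join(tmpdir(), "tinyclaw-test-"));
  return {
    ...process.env,
    HOME: home,
    USERPROFILE: home, // home directory variable used on Windows
    TINYCLAW_PROVIDER: "fake",
    PORT: String(port),
  };
}

// Start the built server as a child process using the current Node binary.
// The relative path assumes the tests run from the repository root.
export function startServer(env: NodeJS.ProcessEnv): ChildProcess {
  return spawn(process.execPath, ["dist/server.js"], { env, stdio: "inherit" });
}
```

A restart test can reuse this: start the server, send a message, call `kill()` on the child, then call `startServer` again with the same `env` so the second process sees the same home directory and SQLite file.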


Layer 2: Cross-OS Release Smoke Tests

Integration tests verify functionality, but a release must also verify that installation works on real user systems.

A CI matrix ensures TinyClaw runs on:

ubuntu-latest
macos-latest
windows-latest (WSL2)

These tests should mimic how users actually install TinyClaw.


Smoke Test Workflow

Each OS runner should:

Step 1: Install TinyClaw

Example:

npm install
npm run build

Alternatively, run the installer script if the repository provides one.


Step 2: Start TinyClaw

tinyclaw start

Verify server starts on port 3777.
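
Startup time varies between runners, so a readiness check is more reliable than a fixed sleep. The sketch below polls the port with a TCP connection attempt until something accepts it. `waitForPort` is a hypothetical helper; a dedicated health endpoint, if TinyClaw has one, would be a stronger signal than an open port.

```typescript
import { createConnection } from "node:net";

// Resolve true if a TCP connection to host:port succeeds, false otherwise.
function tryConnect(port: number, host: string): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = createConnection({ port, host });
    socket.once("connect", () => {
      socket.destroy();
      resolve(true);
    });
    socket.once("error", () => {
      socket.destroy();
      resolve(false);
    });
  });
}

// Poll until something is listening on host:port, or throw after timeoutMs.
export async function waitForPort(
  port: number,
  host = "localhost",
  timeoutMs = 10000,
  intervalMs = 200,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!(await tryConnect(port, host))) {
    if (Date.now() >= deadline) {
      throw new Error(`nothing listening on ${host}:${port} after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the smoke test this would be `await waitForPort(3777)` right after the server process is started.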


Step 3: Basic API Flow

Run a minimal message test:

POST /api/message
GET /api/responses

Verify response exists.


Step 4: CLI Sanity

Check commands:

tinyclaw --help
tinyclaw agents
tinyclaw queue status

Ensures CLI packaging works.
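
These checks can be scripted so they run identically on every runner. The sketch below executes each command and records whether it exited with status 0. `runCliChecks` is a hypothetical helper. The `useShell` flag exists because npm installs CLI entry points on Windows as `.cmd` shims, which Node cannot launch without a shell.

```typescript
import { spawnSync } from "node:child_process";

export interface CliResult {
  args: string[];
  ok: boolean; // true when the command exited with status 0
  output: string; // combined stdout and stderr, for failure messages
}

// Run `bin` once per argument list and collect the outcomes.
// A command that cannot be started at all is reported as not ok.
export function runCliChecks(
  bin: string,
  commands: string[][],
  useShell = false,
): CliResult[] {
  return commands.map((args) => {
    const result = spawnSync(bin, args, {
      encoding: "utf8",
      shell: useShell,
      timeout: 30000,
    });
    return {
      args,
      ok: result.status === 0,
      output: `${result.stdout ?? ""}${result.stderr ?? ""}`,
    };
  });
}
```

For the commands above, a call could look like `runCliChecks("tinyclaw", [["--help"], ["agents"], ["queue", "status"]], process.platform === "win32")`, with the test failing if any entry has `ok === false`.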


Example GitHub Actions Workflow

name: smoke-tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
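  smoke:
    strategy:
      fail-fast: false
      matrix:
        # windows-latest runs native Windows, not WSL; covering WSL2 needs a separate setup step
        os: [ubuntu-latest, macos-latest, windows-latest]

    runs-on: ${{ matrix.os }}

    # Use bash on every runner so the curl line continuations also work on Windows
    defaults:
      run:
        shell: bash

    # Keep the smoke test offline by selecting the deterministic provider
    env:
      TINYCLAW_PROVIDER: fake

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - run: npm install
      - run: npm run build

      - run: node dist/server.js &
      - run: sleep 5

      # --fail makes curl exit non-zero on HTTP errors so the step actually fails
      - run: |
          curl --fail -X POST http://localhost:3777/api/message \
            -H "Content-Type: application/json" \
            -d '{"text":"hello test"}'

      - run: |
          curl --fail http://localhost:3777/api/responses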

Benefits

This approach provides:

Release Safety

Prevents regressions in:

  • message routing
  • queue processing
  • persistence
  • API contracts

Cross-Platform Confidence

Verifies TinyClaw works on:

  • macOS
  • Linux
  • Windows (WSL2)

Fast CI

Tests are:

  • deterministic
  • offline
  • quick

Minimal Maintenance

The suite focuses only on core runtime guarantees, not exhaustive coverage.


Suggested Repository Changes

tests/
  integration/

src/
  providers/
    fake-provider.ts

.github/workflows/
  smoke-tests.yml

Add script:

npm run test:integration

Expected Outcome

After implementing this suite:

  • Releases that break the message pipeline are caught in CI before they ship
  • Installation failures across OS are caught before release
  • Contributors can run integration tests locally
  • Maintainers gain confidence to ship quickly

This aligns with TinyClaw’s rapid development style while ensuring the core system remains stable.
