Proposal: Add Integration Tests and Cross-OS Release Safety for TinyClaw #162
Description
Summary
TinyClaw is moving quickly and adoption is growing, but the repository currently has:
- No automated tests
- A release workflow that builds only on Ubuntu
- No automated verification that the CLI, queue processor, or API continue to work across operating systems
Because TinyClaw runs locally on user machines, the biggest risk is not logic bugs alone but environment regressions such as:
- install script behavior
- Node native modules (better-sqlite3)
- filesystem paths and HOME directory assumptions
- WSL2 compatibility
- queue and API wiring
This proposal introduces a small, high-value test suite designed to:
- Prevent releases that break the core message → queue → response flow
- Ensure TinyClaw installs and runs on macOS, Linux, and Windows (WSL2)
- Keep tests deterministic and fast by avoiding real LLM calls
The goal is confidence without slowing development velocity.
Testing Strategy
The proposed strategy uses two complementary layers.
Layer 1: Local Integration Tests
These tests validate TinyClaw’s core runtime behavior using the actual queue, SQLite database, and HTTP API.
They do not depend on external providers.
Key Principles
- Run TinyClaw in a temporary HOME directory
- Use the real queue processor
- Use the real HTTP API
- Replace the LLM provider with a deterministic fake
This ensures the entire pipeline is tested:
flowchart TD
A[HTTP Message] --> B[Queue Write]
B --> C[Queue Processor]
C --> D[Agent Routing]
D --> E["Provider Call (mocked)"]
E --> F[Response Persisted]
F --> G[API Response Retrieval]
Deterministic Provider
Integration tests should not call Claude/Codex/OpenAI/etc.
Instead, add a simple test provider.
Example concept:
export async function fakeProvider(prompt: string) {
  return `FAKE_RESPONSE:${prompt}`;
}
Configuration example:
TINYCLAW_PROVIDER=fake
Benefits:
- No API keys
- No network
- Fully deterministic
- Fast CI runs
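One way to wire the fake provider in is a small selection function keyed on `TINYCLAW_PROVIDER`. The sketch below is an assumption about how this could look, not TinyClaw's actual provider interface: the `Provider` type and `selectProvider` are hypothetical names, and only `fakeProvider` and the `TINYCLAW_PROVIDER=fake` setting come from this proposal.

```typescript
// Hypothetical provider signature; TinyClaw's real interface may differ.
type Provider = (prompt: string) => Promise<string>;

// Deterministic provider from the proposal: echoes the prompt back.
async function fakeProvider(prompt: string): Promise<string> {
  return `FAKE_RESPONSE:${prompt}`;
}

// Hypothetical selector: returns the fake provider when the environment
// asks for it, otherwise falls back to the real provider passed in.
function selectProvider(
  env: Record<string, string | undefined>,
  realProvider: Provider,
): Provider {
  return env.TINYCLAW_PROVIDER === "fake" ? fakeProvider : realProvider;
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the selector itself easy to test.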
Integration Test Cases
A small number of high-signal tests will provide strong protection.
1. Core Message Flow
Goal
Ensure the message → response pipeline works.
Steps
- Start TinyClaw server
- POST /api/message
- Wait for queue processor
- GET /api/responses
Expected
- Response exists
- Response contains fake provider output
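The "wait for queue processor" step is easier to keep deterministic with polling than with a fixed sleep. A minimal sketch, assuming the test retries `GET /api/responses` until a matching response appears; `pollUntil` and its options are hypothetical helper names, not part of TinyClaw.

```typescript
// Hypothetical helper: repeatedly calls `check` until it returns a value
// other than undefined, and rejects once `timeoutMs` has elapsed.
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<T> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const value = await check();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${opts.timeoutMs} ms`);
    }
    // Wait before the next attempt.
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}
```

In a test, `check` would typically fetch `/api/responses` and return the first entry whose text contains `FAKE_RESPONSE:`; the exact response shape depends on the actual API.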
2. Agent Routing
Goal
Verify agent mention routing works.
Steps
POST message:
@coder write hello world
Expected
- Response attributed to coder agent
- Routing logic triggered
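To make the expected behavior concrete, the sketch below shows the kind of mention parsing this test exercises. It is an illustration only: `parseMention`, `RoutedMessage`, and the `default` fallback agent are assumptions, and TinyClaw's real routing rules may differ.

```typescript
interface RoutedMessage {
  agent: string; // agent the message should be routed to
  text: string; // message body with the mention removed
}

// Hypothetical parser: a leading "@name" selects the agent; anything else
// goes to a fallback agent (called "default" here purely for illustration).
function parseMention(message: string, fallbackAgent = "default"): RoutedMessage {
  const trimmed = message.trim();
  const match = /^@([A-Za-z0-9_-]+)\s+([\s\S]+)$/.exec(trimmed);
  if (!match) return { agent: fallbackAgent, text: trimmed };
  return { agent: match[1], text: match[2] };
}
```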
3. Queue State Transitions
Verify queue states transition correctly.
Expected flow:
pending → processing → completed
Test asserts:
- message state transitions
- response record created
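A test can record the states it observes for a message and then check that the sequence is legal. The sketch below assumes the three state names from the flow above; how the states are observed (API or database) is not specified here, and `isValidHistory` is a hypothetical helper.

```typescript
type QueueState = "pending" | "processing" | "completed";

// Allowed next states for the happy path described above.
const ALLOWED_NEXT: Record<QueueState, QueueState[]> = {
  pending: ["processing"],
  processing: ["completed"],
  completed: [],
};

// Returns true when the observed history starts at "pending" and every
// subsequent step is an allowed transition.
function isValidHistory(history: QueueState[]): boolean {
  if (history[0] !== "pending") return false;
  return history.every(
    (state, i) => i === 0 || ALLOWED_NEXT[history[i - 1]].includes(state),
  );
}
```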
4. Retry / Dead Letter Handling
Force provider failure.
Example fake provider:
throw new Error("simulated failure")
Expected:
- retry attempts increment
- message eventually marked "dead"
Then test dead-letter retry endpoint.
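The expected bookkeeping can be sketched as follows. The retry limit, the `QueuedMessage` shape, and `recordFailure` are assumptions made for illustration; only the failing provider and the "dead" state come from the description above.

```typescript
interface QueuedMessage {
  attempts: number;
  state: "pending" | "dead";
}

// Hypothetical bookkeeping for one failed provider call: increment the
// attempt counter and mark the message dead once the limit is reached.
function recordFailure(msg: QueuedMessage, maxAttempts: number): QueuedMessage {
  const attempts = msg.attempts + 1;
  return { attempts, state: attempts >= maxAttempts ? "dead" : "pending" };
}

// Failing fake provider, as in the example above.
async function failingProvider(_prompt: string): Promise<string> {
  throw new Error("simulated failure");
}
```

With a limit of 3, a message stays `pending` after the first two failures and becomes `dead` on the third.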
5. SSE Event Stream
Connect to:
GET /api/events/stream
Send a message.
Verify events received:
message_received
response_ready
Ensures UI and integrations remain stable.
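To assert on the stream, a test needs to turn the raw `text/event-stream` body into event names. The sketch below is a minimal parser that assumes the server puts the event type in the SSE `event:` field; if TinyClaw instead encodes the type inside the JSON `data:` payload, the assertion would read it from there. `parseSse` and `SseEvent` are hypothetical names.

```typescript
interface SseEvent {
  event: string; // event name; "message" when no "event:" field is present
  data: string; // data lines joined with "\n"
}

// Minimal parser for a complete text/event-stream payload. It handles the
// "event" and "data" fields only and ignores comments, "id", and "retry".
function parseSse(payload: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of payload.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    // Blocks without any data line are skipped.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```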
6. Persistence Across Restart
- Send message
- Restart server
- Verify queue state persists
Confirms SQLite persistence behavior.
Integration Test Implementation
Example structure:
tests/
  integration/
    core-message.test.ts
    routing.test.ts
    queue-state.test.ts
    dead-letter.test.ts
    sse-events.test.ts
Test environment setup:
HOME=$(mktemp -d)
TINYCLAW_PROVIDER=fake
PORT=3777
Startup command example:
node dist/server.js
Then tests interact only through the HTTP API.
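The environment setup above could be built in a test helper like the one below. This is a sketch under assumptions: `makeTestEnv` and the `tinyclaw-test-` directory prefix are hypothetical, while `HOME`, `TINYCLAW_PROVIDER`, and `PORT` are the variables listed above.

```typescript
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Builds the environment for a test server process: a fresh temporary HOME,
// the fake provider, and a fixed port. The base environment is copied, not
// mutated, so the developer's real HOME is never touched.
function makeTestEnv(
  baseEnv: Record<string, string | undefined>,
  port = 3777,
): Record<string, string | undefined> {
  const home = mkdtempSync(join(tmpdir(), "tinyclaw-test-"));
  return {
    ...baseEnv,
    HOME: home,
    TINYCLAW_PROVIDER: "fake",
    PORT: String(port),
  };
}
```

The result can be passed as the `env` option when spawning `node dist/server.js` with `child_process.spawn`. On native Windows, Node's `os.homedir()` reads `USERPROFILE` rather than `HOME`, so a Windows run may need that variable set as well.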
Layer 2: Cross-OS Release Smoke Tests
Integration tests verify functionality, but releases must also verify that installation works on real user systems.
A CI matrix checks that TinyClaw runs on:
ubuntu-latest
macos-latest
windows-latest (WSL)
These tests should mimic how users actually install TinyClaw.
Smoke Test Workflow
Each OS runner should:
Step 1: Install TinyClaw
Example:
npm install
npm run build
or test the installer script if present.
Step 2: Start TinyClaw
tinyclaw start
Verify server starts on port 3777.
Step 3: Basic API Flow
Run a minimal message test:
POST /api/message
GET /api/responses
Verify response exists.
Step 4: CLI Sanity
Check commands:
tinyclaw --help
tinyclaw agents
tinyclaw queue status
Ensures CLI packaging works.
Example GitHub Actions Workflow
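name: smoke-tests
on:
  pull_request:
  push:
    branches: [main]
jobs:
  smoke:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    defaults:
      run:
        # bash is needed for `&`, `sleep`, and `\` line continuations;
        # on windows-latest this is Git Bash, not WSL
        shell: bash
    env:
      TINYCLAW_PROVIDER: fake
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install
      - run: npm run build
      # start the server and exercise it in a single step, so the check does
      # not depend on a background process surviving across steps
      - run: |
          node dist/server.js &
          sleep 5
          curl --fail -X POST http://localhost:3777/api/message \
            -H "Content-Type: application/json" \
            -d '{"text":"hello test"}'
          curl --fail http://localhost:3777/api/responses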
Benefits
This approach provides:
Release Safety
Prevents regressions in:
- message routing
- queue processing
- persistence
- API contracts
Cross-Platform Confidence
Verifies TinyClaw works on:
- macOS
- Linux
- Windows (WSL)
Fast CI
Tests are:
- deterministic
- offline
- quick
Minimal Maintenance
The suite focuses only on core runtime guarantees, not exhaustive coverage.
Suggested Repository Changes
tests/
  integration/
src/
  providers/
    fake-provider.ts
.github/workflows/
  smoke-tests.yml
Add script:
npm run test:integration
Expected Outcome
After implementing this suite:
- Regressions in the message pipeline are caught before release
- Installation failures across OS are caught before release
- Contributors can run integration tests locally
- Maintainers gain confidence to ship quickly
This aligns with TinyClaw’s rapid development style while ensuring the core system remains stable.