Proposal: Add Integration Tests and Cross-OS Release Safety for TinyClaw

# **Summary**

TinyClaw is moving quickly and adoption is growing, but the repository currently has:

- No automated tests  
- A release workflow that builds only on Ubuntu  
- No automated verification that the CLI, queue processor, or API continue to work across operating systems

Because TinyClaw runs **locally on user machines**, the biggest risk is not logic bugs alone but **environment regressions** such as:

- install script behavior  
- Node native modules (`better-sqlite3`)  
- filesystem paths and HOME directory assumptions  
- WSL2 compatibility  
- queue and API wiring

This proposal introduces a **small, high-value test suite** designed to:

1. Prevent releases that break the core message → queue → response flow  
2. Ensure TinyClaw installs and runs on macOS, Linux, and Windows (WSL2)  
3. Keep tests deterministic and fast by avoiding real LLM calls

The goal is **confidence without slowing development velocity**.

---

# **Testing Strategy**

The proposed strategy uses two complementary layers.

## **Layer 1: Local Integration Tests**

These tests validate TinyClaw’s **core runtime behavior** using the actual queue, SQLite database, and HTTP API.

They do **not depend on external providers**.

### Key Principles

- Run TinyClaw in a **temporary HOME directory**  
- Use the real queue processor  
- Use the real HTTP API  
- Replace the LLM provider with a deterministic fake

This ensures the entire pipeline is tested:

```mermaid
flowchart TD
  A[HTTP Message] --> B[Queue Write]
  B --> C[Queue Processor]
  C --> D[Agent Routing]
  D --> E["Provider Call (mocked)"]
  E --> F[Response Persisted]
  F --> G[API Response Retrieval]
```

# **Deterministic Provider**

Integration tests should **not call real providers** such as Claude, Codex, or OpenAI.

Instead, add a simple test provider.

Example concept:

```ts
export async function fakeProvider(prompt: string) {
  return `FAKE_RESPONSE:${prompt}`
}
```

Configuration example:

```
TINYCLAW_PROVIDER=fake
```

Benefits:

* No API keys  
* No network  
* Fully deterministic  
* Fast CI runs
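
One way the `TINYCLAW_PROVIDER` switch could be wired is a small registry keyed by provider name. This is only a sketch: the `Provider` type, the `providers` registry, and the `selectProvider` helper are hypothetical names introduced for illustration, not TinyClaw's existing API.

```typescript
// Hypothetical provider signature: prompt in, completion text out.
export type Provider = (prompt: string) => Promise<string>;

// Deterministic stand-in for a real LLM call (same behavior as the example above).
export const fakeProvider: Provider = async (prompt) => `FAKE_RESPONSE:${prompt}`;

// Assumed registry; real providers would be registered alongside the fake one.
const providers: Record<string, Provider> = {
  fake: fakeProvider,
};

// Resolve the provider named by TINYCLAW_PROVIDER. Unknown or missing names
// throw, so a typo in CI cannot silently fall back to a real provider.
export function selectProvider(name: string | undefined): Provider {
  const provider = name ? providers[name] : undefined;
  if (!provider) {
    throw new Error(`Unknown provider: ${name ?? "(unset)"}`);
  }
  return provider;
}
```

A server entry point could then call `selectProvider(process.env.TINYCLAW_PROVIDER)` once at startup. Failing on unknown names is a design choice for test safety; production code may prefer a default.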

---

# **Integration Test Cases**

A small number of **high-signal tests** will provide strong protection.

## **1\. Core Message Flow**

**Goal**

Ensure the message → response pipeline works end to end.

**Steps**

1. Start TinyClaw server  
2. POST `/api/message`  
3. Wait for queue processor  
4. GET `/api/responses`

**Expected**

* Response exists  
* Response contains fake provider output
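
The steps above could be sketched as a polling helper plus one flow function. The `{ text }` request body matches the curl example in the workflow later in this proposal; the shape of the `/api/responses` payload is not specified here, so the sketch only assumes it is JSON that contains the fake provider's output somewhere. `pollUntil` and `coreMessageFlow` are hypothetical names, and the global `fetch` requires Node 18 or later.

```typescript
// Poll `check` until it returns something other than undefined, or time out.
export async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  { timeoutMs = 5000, intervalMs = 100 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await check();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) throw new Error(`Timed out after ${timeoutMs} ms`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Steps 2-4: post a message, then wait until the fake output shows up.
// ASSUMPTION: the response payload is JSON containing "FAKE_RESPONSE:".
export async function coreMessageFlow(baseUrl: string): Promise<string> {
  const post = await fetch(`${baseUrl}/api/message`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: "hello test" }),
  });
  if (!post.ok) throw new Error(`POST /api/message failed: ${post.status}`);

  return pollUntil(async () => {
    const res = await fetch(`${baseUrl}/api/responses`);
    if (!res.ok) return undefined;
    const body = JSON.stringify(await res.json());
    return body.includes("FAKE_RESPONSE:") ? body : undefined;
  });
}
```

Polling with a deadline avoids fixed sleeps, which tend to be either too short on slow runners or needlessly long on fast ones.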

---

## **2\. Agent Routing**

**Goal**

Verify that `@agent` mention routing works.

**Steps**

POST a message that mentions an agent:

```
@coder write hello world
```

**Expected**

* Response attributed to coder agent  
* Routing logic triggered
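
To make the expectation concrete, here is a minimal sketch of what mention parsing might look like. TinyClaw's actual routing rules are not described in this proposal, so `parseMention` and `RoutedMessage` are hypothetical, and the rule "a leading `@name` selects the agent" is an assumption.

```typescript
export interface RoutedMessage {
  agent: string | null; // null means the message has no leading mention
  body: string;
}

// Hypothetical parser: a leading "@name" selects the agent and the rest of
// the text becomes the prompt. Anything else is left unrouted.
export function parseMention(text: string): RoutedMessage {
  const trimmed = text.trim();
  const match = /^@([A-Za-z0-9_-]+)\s+([\s\S]+)$/.exec(trimmed);
  if (!match) return { agent: null, body: trimmed };
  return { agent: match[1], body: match[2] };
}
```

An integration test would not call a parser like this directly; it would post the message over HTTP and assert that the stored response is attributed to `coder`.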

---

## **3\. Queue State Transitions**

Verify queue states transition correctly.

Expected flow:

```
pending → processing → completed
```

The test asserts that:

* the message passes through each state in order  
* a response record is created
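
A test could check the observed state history against a small transition table. In this sketch only `pending → processing → completed` comes from the flow above; the retry and dead-letter edges, along with the names `QueueState`, `allowed`, and `isValidHistory`, are assumptions for illustration.

```typescript
export type QueueState = "pending" | "processing" | "completed" | "dead";

// Allowed next states for each state. The retry edge (processing → pending)
// and the dead-letter edges are ASSUMED, not taken from TinyClaw's code.
const allowed: Record<QueueState, QueueState[]> = {
  pending: ["processing"],
  processing: ["completed", "pending", "dead"],
  completed: [],
  dead: ["pending"], // assumed: a dead-letter retry re-queues the message
};

// True when every consecutive pair in `history` is an allowed transition.
export function isValidHistory(history: QueueState[]): boolean {
  return history.every(
    (state, i) => i === 0 || allowed[history[i - 1]].includes(state),
  );
}
```

Recording every state the message was seen in and validating the whole history catches skipped states as well as a wrong final state.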

---

## **4\. Retry / Dead Letter Handling**

Force the provider to fail.

Example fake provider:

```
throw new Error("simulated failure")
```

Expected:

```
retry attempts increment
message eventually marked "dead"
```

Then test the dead-letter retry endpoint.
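
The expected behavior can be illustrated with a self-contained retry loop. This is a sketch, not TinyClaw's queue implementation: `processWithRetry`, the `Outcome` shape, and the default of three attempts are hypothetical.

```typescript
type Provider = (prompt: string) => Promise<string>;

// Fake provider that always fails, as in the example above.
export const failingProvider: Provider = async () => {
  throw new Error("simulated failure");
};

export interface Outcome {
  status: "completed" | "dead";
  attempts: number;
}

// Hypothetical retry loop: try up to `maxAttempts` times, then dead-letter.
export async function processWithRetry(
  provider: Provider,
  prompt: string,
  maxAttempts = 3,
): Promise<Outcome> {
  for (let attempts = 1; attempts <= maxAttempts; attempts++) {
    try {
      await provider(prompt);
      return { status: "completed", attempts };
    } catch {
      // Swallow the error and retry; a real queue would also record it.
    }
  }
  return { status: "dead", attempts: maxAttempts };
}
```

In the integration test itself, the same two facts (attempt count and final status) would be read back through the HTTP API rather than from a return value.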

---

## **5\. SSE Event Stream**

Connect to:

```
GET /api/events/stream
```

Send a message.

Verify that these events are received:

```
message_received
response_ready
```

This ensures the UI and any integrations that consume the stream remain stable.
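
A test needs to turn the raw stream text into event names before it can assert on them. The sketch below parses the standard Server-Sent Events text format. It assumes TinyClaw puts the event name in the `event:` field; if the name is instead carried inside the JSON `data:` payload, the assertion would need to read it from there. `parseSse` and `SseEvent` are hypothetical names.

```typescript
export interface SseEvent {
  event: string; // "message" is the SSE default when no event field is sent
  data: string;
}

// Parse a complete SSE text buffer. Events are separated by a blank line;
// multiple data lines within one event are joined with newlines.
export function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of buffer.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // The format allows one optional space after the colon.
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

The test would read from the stream until both expected names have appeared or a timeout expires, then compare the order.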

---

## **6\. Persistence Across Restart**

1. Send message  
2. Restart server  
3. Verify queue state persists

Confirms SQLite persistence behavior.

---

# **Integration Test Implementation**

Example structure, with one file per test case above:

```
tests/
  integration/
    core-message.test.ts
    routing.test.ts
    queue-state.test.ts
    dead-letter.test.ts
    sse-events.test.ts
    persistence.test.ts
```

Test environment setup:

```
HOME=$(mktemp -d)
TINYCLAW_PROVIDER=fake
PORT=3777
```

Startup command example:

```
node dist/server.js
```

Then tests interact only through the HTTP API.
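
The shell setup above relies on `mktemp` and on `HOME`, which does not carry over to Windows: Node's `os.homedir()` reads `USERPROFILE` there. A portable variant could build the environment in the test harness instead. `makeTestEnv` and `removeTestEnv` are hypothetical helper names, and whether TinyClaw resolves its data directory through `os.homedir()` is an assumption.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Build an isolated environment for one test run.
export function makeTestEnv(port = 3777): { home: string; env: NodeJS.ProcessEnv } {
  const home = fs.mkdtempSync(path.join(os.tmpdir(), "tinyclaw-test-"));
  return {
    home,
    env: {
      ...process.env,
      HOME: home,
      USERPROFILE: home, // os.homedir() reads this on Windows instead of HOME
      TINYCLAW_PROVIDER: "fake",
      PORT: String(port),
    },
  };
}

// Remove the temporary HOME once the test is done.
export function removeTestEnv(home: string): void {
  fs.rmSync(home, { recursive: true, force: true });
}
```

The returned `env` can be passed as the `env` option of `child_process.spawn` when launching `node dist/server.js`, so the server never touches the developer's real home directory.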

---

# **Layer 2: Cross-OS Release Smoke Tests**

Integration tests verify functionality, but releases must also verify that **installation works on real user systems**.

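A CI matrix runs the smoke tests on:

```
ubuntu-latest
macos-latest
windows-latest
```

Steps on `windows-latest` run natively on Windows by default, so covering WSL2 requires an explicit setup step on top of that runner.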

These tests should mimic **how users actually install TinyClaw**.

---

# **Smoke Test Workflow**

Each OS runner should:

### Step 1: Install TinyClaw

Example:

```
npm install
npm run build
```

Alternatively, run the installer script if the repository provides one.

---

### Step 2: Start TinyClaw

```
tinyclaw start
```

Verify that the server starts and listens on port 3777.
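
Rather than sleeping for a fixed time, a smoke test could wait until the port accepts TCP connections. The helper names `canConnect` and `waitForPort` are hypothetical, and the sketch assumes the server is reachable on the IPv4 loopback address.

```typescript
import * as net from "node:net";

// Resolve true if a TCP connection to host:port succeeds within `timeoutMs`.
export function canConnect(port: number, host = "127.0.0.1", timeoutMs = 500): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

// Retry until the port accepts connections, or throw once the deadline passes.
export async function waitForPort(
  port: number,
  { timeoutMs = 10_000, intervalMs = 200 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await canConnect(port)) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Port ${port} did not open within ${timeoutMs} ms`);
}
```

Calling `await waitForPort(3777)` after `tinyclaw start` fails fast with a clear message when the server never comes up, instead of surfacing later as a confusing connection error.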

---

### Step 3: Basic API Flow

Run a minimal message test:

```
POST /api/message
GET /api/responses
```

Verify that a response exists.

---

### Step 4: CLI Sanity

Check that these commands run and exit cleanly:

```
tinyclaw --help
tinyclaw agents
tinyclaw queue status
```

This ensures the CLI is packaged correctly.
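
The check can be as simple as running each command and asserting a zero exit code. In this sketch `runCli` is a hypothetical helper; it enables `shell` on Windows because npm-installed CLIs are exposed there as `.cmd` shims, which cannot be spawned directly.

```typescript
import { spawnSync } from "node:child_process";

// Run a command to completion and report whether it exited with code 0.
export function runCli(command: string, args: string[]): { ok: boolean; output: string } {
  const result = spawnSync(command, args, {
    encoding: "utf8",
    shell: process.platform === "win32", // needed for .cmd shims on Windows
    timeout: 30_000,
  });
  return {
    ok: result.status === 0,
    output: `${result.stdout ?? ""}${result.stderr ?? ""}`,
  };
}
```

A smoke test would loop over the commands listed above, for example `runCli("tinyclaw", ["--help"])`, and print `output` when `ok` is false so packaging failures are easy to diagnose from CI logs.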

---

# **Example GitHub Actions Workflow**

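```yaml
name: smoke-tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  smoke:
    strategy:
      fail-fast: false  # report every failing OS, not only the first
      matrix:
        # windows-latest runs natively on Windows; WSL2 needs extra setup.
        os: [ubuntu-latest, macos-latest, windows-latest]

    runs-on: ${{ matrix.os }}

    # Use bash on every runner. The Windows default shell (PowerShell) does
    # not understand the backslash line continuations used below.
    defaults:
      run:
        shell: bash

    env:
      TINYCLAW_PROVIDER: fake  # never call a real LLM from CI
      PORT: 3777

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - run: npm install
      - run: npm run build

      # Start the server and exercise it in one step so the background
      # process is alive while curl runs.
      - name: Smoke test the API
        run: |
          node dist/server.js &
          sleep 5
          # --fail makes curl exit non-zero on HTTP errors, failing the job.
          curl --fail -X POST http://localhost:3777/api/message \
            -H "Content-Type: application/json" \
            -d '{"text":"hello test"}'
          curl --fail http://localhost:3777/api/responses
```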

# **Benefits**

This approach provides:

### Release Safety

Prevents regressions in:

* message routing  
* queue processing  
* persistence  
* API contracts

### Cross-Platform Confidence

Verifies TinyClaw works on:

* macOS  
* Linux  
* Windows (WSL2)

### Fast CI

Tests are:

* deterministic  
* offline  
* quick

### Minimal Maintenance

The suite focuses only on **core runtime guarantees**, not exhaustive coverage.

---

# **Suggested Repository Changes**

```
tests/
  integration/

src/
  providers/
    fake-provider.ts

.github/workflows/
  smoke-tests.yml
```

Add script:

```
npm run test:integration
```

---

# **Expected Outcome**

After implementing this suite:

* Releases that break the message pipeline are caught in CI  
* Installation failures on any supported OS are caught before release  
* Contributors can run integration tests locally  
* Maintainers gain confidence to ship quickly

This aligns with TinyClaw’s rapid development style while ensuring the **core system remains stable**.


