16 commits
24b6cdb
Initial plan
Copilot Mar 25, 2026
159f2cc
fix: restore hourly execution for Intent Signal Discovery (workflow c…
Copilot Mar 25, 2026
af96bfc
fix: remove unused imports MAX_CONCURRENT_EXECUTIONS and Workflow in …
Copilot Mar 25, 2026
a5fb1bd
feat: add CI workflow, integration tests, and fix findOrphanedExecuti…
Copilot Mar 31, 2026
7b06002
fix: enforce activeExecutions strictly-running invariant; fix hasStat…
Copilot Mar 31, 2026
8cbb861
fix: restore hourly execution for Intent Signal Discovery (workflow c…
Copilot Mar 31, 2026
d66d48f
feat: add JSON file-backed persistence, audit log, and restart-surviv…
Copilot Mar 31, 2026
bf11b27
test: add 6-hour observation window acceptance criteria tests (issue …
Copilot Apr 3, 2026
11bb527
docs: add SCHEDULER_PERSISTENCE_ADAPTER.md with production field mapp…
Copilot Apr 3, 2026
093074d
fix: replace runIntentSignalScan TODO stub with real HTTP dispatch vi…
Copilot Apr 17, 2026
ee9eb68
fix: replace Response type alias, convert status/type unions to enums
Copilot Apr 17, 2026
ebb5898
fix: rename fetchResponse→scanResponse, add outcome input validation
Copilot Apr 17, 2026
932f85b
fix: remove unused error param and unused SchedulerLoopHandle import …
Copilot Apr 18, 2026
5d28887
fix: capture completedAt timestamp after runTask() completes in sched…
Copilot Apr 18, 2026
a5fce45
fix: gate null last_cycle_completed_at alert by due-window; idempoten…
Copilot Apr 20, 2026
4d7d893
refactor: typed ExecutionNotFoundError; null next_recurrence_date tes…
Copilot Apr 20, 2026
29 changes: 29 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,29 @@
name: CI

on:
  push:
    branches: ['**']
  pull_request:
    branches: ['**']

jobs:
  test:
    name: Unit Tests
    runs-on: ubuntu-latest
    permissions:
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run unit tests
        run: npm run test:unit
3 changes: 3 additions & 0 deletions .gitignore
@@ -48,3 +48,6 @@ next-env.d.ts

# history
.history

# scheduler runtime state and audit log (written by the server process at runtime)
/data/
161 changes: 161 additions & 0 deletions docs/SCHEDULER_PERSISTENCE_ADAPTER.md
@@ -0,0 +1,161 @@
# Scheduler Persistence Adapter

This document describes how the scheduler state store maps to Poly Operations
production state, and provides the wiring plan for when PostgreSQL integration
lands.

---

## Current implementation (`JsonFileSchedulerStore`)

`server/scheduler/json-file-store.ts` persists `SchedulerState` to a JSON file
(`data/scheduler-state.json`) with atomic writes (temp file → rename). The
store is consumed exclusively through two accessors injected into the scheduler
loop and all API endpoints:

```ts
const getSchedulerState = (): SchedulerState => schedulerStore.read();
const setSchedulerState = (s: SchedulerState) => schedulerStore.write(s);
```

This interface intentionally mirrors what a PostgreSQL adapter would expose;
the PostgreSQL accessors shown under "Migration path" differ only in returning
promises. No other code in the API or scheduler loop references the store
directly, so **swapping the backing store requires changing only this wiring in
`server/api.ts`**.
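
The temp file → rename write pattern mentioned above can be sketched as follows. This is a minimal illustration, not the actual `JsonFileSchedulerStore`: the class name `AtomicJsonStore`, the trimmed `SketchState` type, and the temp-file naming are assumptions made for the example.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical, trimmed-down stand-in for the real SchedulerState type.
interface SketchState {
  workflow: { id: string; execution_status: string };
}

// Illustrative only: not the actual JsonFileSchedulerStore implementation.
class AtomicJsonStore {
  private readonly filePath: string;
  private readonly seed: SketchState;

  constructor(filePath: string, seed: SketchState) {
    this.filePath = filePath;
    this.seed = seed;
  }

  // Return the persisted state, or the seed when nothing has been written yet.
  read(): SketchState {
    if (!fs.existsSync(this.filePath)) return this.seed;
    return JSON.parse(fs.readFileSync(this.filePath, "utf8")) as SketchState;
  }

  // Write to a sibling temp file, then rename it over the target so readers
  // never observe a half-written JSON document.
  write(state: SketchState): void {
    fs.mkdirSync(path.dirname(this.filePath), { recursive: true });
    const tmpPath = `${this.filePath}.${process.pid}.tmp`; // assumed naming
    fs.writeFileSync(tmpPath, JSON.stringify(state, null, 2), "utf8");
    fs.renameSync(tmpPath, this.filePath);
  }
}

// Usage against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "scheduler-sketch-"));
const demoStore = new AtomicJsonStore(path.join(demoDir, "scheduler-state.json"), {
  workflow: { id: "demo-workflow", execution_status: "not_started" },
});
const beforeWrite = demoStore.read();
demoStore.write({ workflow: { id: "demo-workflow", execution_status: "running" } });
const afterWrite = demoStore.read();
```

Keeping the temp file in the same directory as the target matters here, because a rename is only a single-step replacement when both paths are on the same filesystem.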

### Relationship to Poly Operations state

| Poly Operations field | `SchedulerState` field | Notes |
|-----------------------------------------|------------------------------------------------------|-------|
| `workflow.id` (`c10f1d63…::1.0`) | `state.workflow.id` | Set by `INTENT_SIGNAL_DISCOVERY_WORKFLOW_FULL_ID` |
| `workflow.execution_status` | `state.workflow.execution_status` | `'not_started'` ↔ `'running'` ↔ `'failed'` |
| `workflow.is_scheduled` | `state.workflow.is_scheduled` | Enforced `true` by `reconcile()` |
| `task.id` (`8c929111…::1.0`) | `state.task.id` | Set by `INTENT_SIGNAL_DISCOVERY_TASK_FULL_ID` |
| `task.is_recurring` | `state.task.is_recurring` | Always `true` |
| `task.next_recurrence_date` | `state.task.next_recurrence_date` | Advanced to `completed_at + 1h` after every successful cycle |
| `task.updated_at` | `state.task.updated_at` | Updated on every `startExecution` / `completeExecution` |
| Active `TaskExecution` rows | `state.activeExecutions[]` | Strictly running-only; orphans cleared by `reconcile()` |
| `TaskExecution.execution_status` | `execution.status` | `'running'` → `'completed'` / `'failed'` / `'cancelled'` |

### What reconcile() heals

`reconcile()` is called on every scheduler tick (every 60 seconds). It
replays the same repair that would be applied to real persisted rows:

1. **Orphaned executions** — any `status === 'running'` execution older than
30 minutes is cancelled and removed from `activeExecutions`. This unblocks
the `UserConcurrencyLimitError` gate.
2. **False-RUNNING workflow** — if `workflow.execution_status === 'running'`
but `activeExecutions` contains zero running entries, the workflow is reset
to `'not_started'`.
3. **Stale `next_recurrence_date`** — if the date is in the past, it is
advanced in one-hour steps until it is in the future, restoring cadence
continuity after a crash or long idle.
4. **`is_scheduled` drift** — `workflow.is_scheduled` is unconditionally set
to `true` (mirrors the Poly Operations incident where the field had drifted
to `false`).
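
The four repairs listed above can be sketched as one pure function. This is an illustration under stated assumptions, not the `reconcile()` in `workflow-scheduler.ts`: the name `reconcileSketch`, the trimmed `ReconcileState` shape, the `started_at` field used for age, and the returned `cancelledIds` list are all hypothetical.

```typescript
type ExecutionStatus = "running" | "completed" | "failed" | "cancelled";
type WorkflowStatus = "not_started" | "running" | "failed";

// Hypothetical, trimmed-down shapes; the real types live in types.ts.
interface SketchExecution { id: string; status: ExecutionStatus; started_at: string }
interface ReconcileState {
  workflow: { execution_status: WorkflowStatus; is_scheduled: boolean };
  task: { next_recurrence_date: string };
  activeExecutions: SketchExecution[];
}

const ORPHAN_AGE_MS = 30 * 60 * 1000; // 30-minute orphan threshold from the list above
const HOUR_MS = 60 * 60 * 1000;

function reconcileSketch(
  state: ReconcileState,
  now: Date,
): { state: ReconcileState; cancelledIds: string[] } {
  const nowMs = now.getTime();

  // 1. Orphaned executions: running and older than 30 minutes.
  const isOrphan = (e: SketchExecution): boolean =>
    e.status === "running" && nowMs - Date.parse(e.started_at) > ORPHAN_AGE_MS;
  const cancelledIds = state.activeExecutions.filter(isOrphan).map((e) => e.id);
  const activeExecutions = state.activeExecutions.filter((e) => !isOrphan(e));

  // 2. False-RUNNING workflow: reset when nothing is actually running.
  const hasRunning = activeExecutions.some((e) => e.status === "running");
  const execution_status: WorkflowStatus =
    state.workflow.execution_status === "running" && !hasRunning
      ? "not_started"
      : state.workflow.execution_status;

  // 3. Stale next_recurrence_date: step forward one hour at a time.
  let nextMs = Date.parse(state.task.next_recurrence_date);
  while (nextMs <= nowMs) nextMs += HOUR_MS;

  return {
    state: {
      // 4. is_scheduled drift: unconditionally true.
      workflow: { execution_status, is_scheduled: true },
      task: { next_recurrence_date: new Date(nextMs).toISOString() },
      activeExecutions,
    },
    cancelledIds,
  };
}
```

Taking `now` as a parameter rather than calling `Date.now()` inside keeps the function deterministic, which is what makes multi-hour simulations practical in tests.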

---

## Migration path: JSON file → PostgreSQL

When PostgreSQL integration lands (`docs/WIP_MCP_STATUS.md`), the adapter
swap needs one new store class plus a single wiring edit in `server/api.ts`:

### Step 1 — Implement `PostgresSchedulerStore`

Create `server/scheduler/postgres-store.ts` with the same two-method interface:

```ts
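import type { Pool } from 'pg'; // assumption: node-postgres; use the project's actual pool type
import type { SchedulerState } from './types';

export class PostgresSchedulerStore {
  // Step 2 constructs the store with `dbPool`, so the constructor accepts it here.
  constructor(private readonly pool: Pool) {}

  async read(): Promise<SchedulerState> {
    // SELECT workflow, task, active executions from DB
    // Map rows → SchedulerState
    throw new Error('PostgresSchedulerStore.read() is not implemented yet');
  }

  async write(state: SchedulerState): Promise<void> {
    // Upsert workflow row
    // Upsert task row
    // Upsert / delete execution rows to match state.activeExecutions
    throw new Error('PostgresSchedulerStore.write() is not implemented yet');
  }
}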
```

The `read()` method maps real `TaskExecution` rows to `Execution` objects using
the same field names defined in `server/scheduler/types.ts`. Orphan detection,
concurrency gating, and recurrence advancement all operate on the in-memory
`SchedulerState` object — the store only handles persistence.

### Step 2 — Swap the wiring in `server/api.ts`

Replace the `JsonFileSchedulerStore` block (~10 lines) with:

```ts
import { PostgresSchedulerStore } from './scheduler/postgres-store';

const schedulerStore = new PostgresSchedulerStore(dbPool);
const getSchedulerState = async (): Promise<SchedulerState> => schedulerStore.read();
const setSchedulerState = async (s: SchedulerState): Promise<void> => schedulerStore.write(s);
```

The scheduler loop and all four API endpoints (`/status`, `/reconcile`,
`/trigger`, `/complete/:id`) receive `getSchedulerState`/`setSchedulerState` as
injected dependencies, so they need no changes beyond awaiting the accessors,
which now return promises.
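
One way to keep callers indifferent to whether the injected accessors are synchronous (JSON file) or asynchronous (PostgreSQL) is to type them as possibly returning a promise and always `await` them. The sketch below assumes this approach; `MaybePromise`, `StoreAccessors`, `TickState`, and `runTick` are hypothetical names, and the real loop may be structured differently.

```typescript
// A value that may or may not be wrapped in a promise.
type MaybePromise<T> = T | Promise<T>;

// Hypothetical, trimmed-down state used only for this illustration.
interface TickState { tickCount: number }

// Shape of the injected dependencies; accepts sync and async stores alike.
interface StoreAccessors {
  getSchedulerState: () => MaybePromise<TickState>;
  setSchedulerState: (s: TickState) => MaybePromise<void>;
}

// Awaiting a plain value resolves to that value, so this one function
// works unchanged with either kind of store.
async function runTick(deps: StoreAccessors): Promise<TickState> {
  const current = await deps.getSchedulerState();
  const next: TickState = { ...current, tickCount: current.tickCount + 1 };
  await deps.setSchedulerState(next);
  return next;
}
```

With this typing, swapping the store changes only the values passed in as `deps`, not the tick logic itself.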

### Step 3 — No other files change

All scheduler logic (`workflow-scheduler.ts`, `monitoring.ts`,
`scheduler-loop.ts`, `intent-signal-discovery.ts`) operates on plain
`SchedulerState` objects. The pure-function design means none of those files
reference the store type.

---

## Audit log

`runIntentSignalScan()` appends a JSONL record to `data/scheduler-audit.jsonl`
on every execution attempt (success or failure):

```jsonl
{"execution_id":"…","workflow_id":"c10f1d63…::1.0","task_id":"8c929111…::1.0","started_at":"…","completed_at":"…","outcome":"completed"}
```

This file is the durable, observable proof that hourly cycles are executing.
In a PostgreSQL environment this record should also be written to a
`scheduler_audit` table so it survives across container restarts.
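
Appending one JSON object per line can be sketched as below. The record fields follow the example above; the helper names `appendAuditRecord` and `readAuditRecords` and the temp-directory demo are assumptions for illustration, not the code in `server/api.ts`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Field names follow the JSONL example in this document.
interface AuditRecord {
  execution_id: string;
  workflow_id: string;
  task_id: string;
  started_at: string;
  completed_at: string;
  outcome: "completed" | "failed";
}

// Append exactly one line per execution attempt.
function appendAuditRecord(filePath: string, record: AuditRecord): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.appendFileSync(filePath, JSON.stringify(record) + "\n", "utf8");
}

// Read the log back, skipping blank lines.
function readAuditRecords(filePath: string): AuditRecord[] {
  return fs
    .readFileSync(filePath, "utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as AuditRecord);
}

// Usage against a throwaway directory with placeholder IDs.
const auditDir = fs.mkdtempSync(path.join(os.tmpdir(), "scheduler-audit-"));
const auditPath = path.join(auditDir, "scheduler-audit.jsonl");
const baseRecord = {
  workflow_id: "demo-workflow",
  task_id: "demo-task",
  started_at: "2026-04-20T12:00:00.000Z",
  completed_at: "2026-04-20T12:00:05.000Z",
};
appendAuditRecord(auditPath, { ...baseRecord, execution_id: "exec-1", outcome: "completed" });
appendAuditRecord(auditPath, { ...baseRecord, execution_id: "exec-2", outcome: "failed" });
const auditRecords = readAuditRecords(auditPath);
```

Because each record is a self-contained line, a failed attempt is logged the same way as a successful one and the file can be tailed or parsed line by line.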

## Scan service integration

`runIntentSignalScan()` requires the `INTENT_SIGNAL_SCAN_URL` environment variable.
When set, it `POST`s the following JSON body to that URL:

```json
{
  "workflow_id": "c10f1d63-0e63-4c03-bfea-aa16c31d2a6a::1.0",
  "task_id": "8c929111-2380-49bb-b07d-e6c2429927c3::1.0",
  "execution_id": "<uuid>"
}
```

A `2xx` response is treated as success. Any non-`2xx` response or network error
has two effects: the execution is recorded as `failed` in the audit log, and
the scheduler lifecycle marks the execution failed (so `reconcile()` can
clean it up on the next tick if it becomes orphaned).

If `INTENT_SIGNAL_SCAN_URL` is not set, `runIntentSignalScan()` throws
immediately so executions fail honestly rather than writing false-positive
"completed" audit records.
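
The rules above (fail fast without the URL, `POST` the three IDs, treat only `2xx` as success) can be sketched as two small pure helpers. `buildScanRequest` and `outcomeForStatus` are hypothetical names, the `Content-Type` header is an assumption, and the real `runIntentSignalScan()` may be organised differently.

```typescript
interface ScanIds {
  workflow_id: string;
  task_id: string;
  execution_id: string;
}

interface ScanRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Throws immediately when INTENT_SIGNAL_SCAN_URL is missing, so the caller
// records a failure instead of a false-positive "completed".
function buildScanRequest(
  env: Record<string, string | undefined>,
  ids: ScanIds,
): ScanRequest {
  const url = env["INTENT_SIGNAL_SCAN_URL"];
  if (!url) {
    throw new Error("INTENT_SIGNAL_SCAN_URL is not set");
  }
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" }, // assumed header
    body: JSON.stringify(ids),
  };
}

// Only 2xx counts as success; every other status maps to "failed".
function outcomeForStatus(status: number): "completed" | "failed" {
  return status >= 200 && status < 300 ? "completed" : "failed";
}
```

A caller would pass `process.env` and the current IDs to `buildScanRequest`, send the result with `fetch`, and map the response status (or any thrown network error) to an audit outcome.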

---

## Related files

| File | Role |
|------|------|
| `server/scheduler/types.ts` | `SchedulerState`, `Workflow`, `Task`, `Execution` types |
| `server/scheduler/workflow-scheduler.ts` | Pure scheduler logic (no I/O) |
| `server/scheduler/json-file-store.ts` | Current persistence adapter |
| `server/scheduler/scheduler-loop.ts` | `setInterval` daemon; injects store accessors |
| `server/scheduler/intent-signal-discovery.ts` | Seed factory; exports canonical `::1.0` IDs |
| `server/api.ts` | Wires store → loop; houses `runIntentSignalScan()` |
| `src/tests/scheduler/scheduler-integration.test.ts` | End-to-end lifecycle tests including 6-hour simulation |