@@ -141,5 +141,5 @@ When tests fail, the orchestrator feeds test failure details back to the agent.
- **All test data lives in browser memory only** — never write to external realms during tests.
- **Use `import.meta.url`** to resolve card definitions — never hardcode realm URLs.
- **Use `data-test-*` attributes** for stable test selectors, not CSS classes.
- **Every ticket must have at least one test file** as `{card-name}.test.gts` co-located with the card definition.
- **Every issue must have at least one test file** as `{card-name}.test.gts` co-located with the card definition.
- **Test files live in the target realm** as realm files alongside the card definitions they test.
@@ -40,9 +40,9 @@ Use `eq` to match exact field values. You must specify `on` to scope the field t
"filter": {
"on": {
"module": "http://localhost:4201/software-factory/darkfactory",
"name": "Ticket"
"name": "Issue"
},
"eq": { "ticketStatus": "in_progress" }
"eq": { "status": "in_progress" }
}
}
```
@@ -226,21 +226,21 @@ You can only filter/sort on fields that exist on the card type. To find which fi
"commandInput": {
"codeRef": {
"module": "http://localhost:4201/software-factory/darkfactory",
"name": "Ticket"
"name": "Issue"
}
}
}
```

2. The result contains `attributes.properties` listing all searchable fields (e.g., `ticketStatus`, `summary`, `priority`).
2. The result contains `attributes.properties` listing all searchable fields (e.g., `status`, `summary`, `priority`).

3. Use those field names in your `eq`, `contains`, `range`, or `sort` with the matching `on` type.
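
The discovery steps above can be sketched in TypeScript as a small guard that checks field names against the schema result before building an `eq` filter. This is a minimal sketch under stated assumptions: the `CodeRef`, `CardSchema`, and `EqQuery` types and the `buildEqFilter` helper are hypothetical names, and treating `attributes.properties` as a record keyed by field name is a guess at its shape, not something the skill text confirms.

```typescript
// Hypothetical types for illustration only. The text says `attributes.properties`
// lists the searchable fields; a record keyed by field name is an assumption.
interface CodeRef {
  module: string;
  name: string;
}

interface CardSchema {
  attributes: { properties: Record<string, unknown> };
}

interface EqQuery {
  filter: { on: CodeRef; eq: Record<string, string> };
}

// Builds an `eq` filter scoped with `on`, rejecting field names the schema does not list.
function buildEqFilter(
  schema: CardSchema,
  on: CodeRef,
  eq: Record<string, string>,
): EqQuery {
  const known = Object.keys(schema.attributes.properties);
  const unknown = Object.keys(eq).filter((field) => known.indexOf(field) === -1);
  if (unknown.length > 0) {
    throw new Error(`Unknown field(s) on ${on.name}: ${unknown.join(", ")}`);
  }
  return { filter: { on, eq } };
}

// Illustrative schema result; a real one would come from the lookup in step 1.
const issueSchema: CardSchema = {
  attributes: { properties: { status: {}, summary: {}, priority: {} } },
};

const issueRef: CodeRef = {
  module: "http://localhost:4201/software-factory/darkfactory",
  name: "Issue",
};

const query = buildEqFilter(issueSchema, issueRef, { status: "in_progress" });
console.log(JSON.stringify(query, null, 2));
```

With a guard like this, a stale field name such as `ticketStatus` would fail fast on the client side instead of producing an empty search result.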

The card tools (`update_project`, `update_ticket`, `create_knowledge`, `create_catalog_spec`) also have dynamic JSON schemas in their parameters that list available fields.
The card tools (`update_project`, `update_issue`, `create_knowledge`, `create_catalog_spec`) also have dynamic JSON schemas in their parameters that list available fields.

### Inheritance

Filtering on a base card type's fields matches all cards that inherit from it. For example, filtering on `CardDef` fields like `cardTitle` or `cardDescription` finds cards of any type. Filtering on a `Ticket` field like `ticketStatus` finds only Ticket cards (and any subtypes of Ticket).
Filtering on a base card type's fields matches all cards that inherit from it. For example, filtering on `CardDef` fields like `cardTitle` or `cardDescription` finds cards of any type. Filtering on an `Issue` field like `status` finds only Issue cards (and any subtypes of Issue).

### Searching Through Relationship Fields

@@ -1,6 +1,6 @@
---
name: software-factory-operations
description: Use when implementing cards in a target realm through the factory execution loop — covers the tool-use workflow for searching, writing, testing, and updating tickets via factory tools.
description: Use when implementing cards in a target realm through the factory execution loop — covers the tool-use workflow for searching, writing, testing, and updating issues via factory tools.
---

# Software Factory Operations
@@ -12,7 +12,7 @@ Use this skill when operating inside the factory execution loop. The factory age
- **Source realm** (`packages/software-factory/realm`)
Publishes shared modules, briefs, templates, and tracker schema. Never write to this realm.
- **Target realm** (user-specified, passed to `factory:go`)
Receives all generated artifacts: Project, Ticket, KnowledgeArticle, card definitions, card instances, Catalog Spec cards, and QUnit test files.
Receives all generated artifacts: Project, Issue, KnowledgeArticle, card definitions, card instances, Catalog Spec cards, and QUnit test files.

## Available Tools

@@ -30,7 +30,7 @@ The agent has these tools during the execution loop. Use them by name — they a
### Updating Project State

- `update_project({ path, attributes, relationships? })` — Update a Project card in the target realm. The tool's parameters include a dynamic JSON schema describing available fields — use it to know valid field names and types. The tool auto-constructs the JSON:API document with the correct `adoptsFrom`.
- `update_ticket({ path, attributes, relationships? })` — Update a Ticket card. Same structured interface with dynamic field schema in the tool parameters.
- `update_issue({ path, attributes, relationships? })` — Update an Issue card. Same structured interface with dynamic field schema in the tool parameters.
- `create_knowledge({ path, attributes, relationships? })` — Create or update a KnowledgeArticle card. Same structured interface with dynamic field schema in the tool parameters.
- `create_catalog_spec({ path, attributes, relationships? })` — Create a Catalog Spec card in the target realm's `Spec/` folder. Makes a card definition discoverable in the Boxel catalog. Same structured interface with dynamic field schema. The tool auto-constructs the document with `adoptsFrom` pointing to `https://cardstack.com/base/spec#Spec`.

@@ -60,20 +60,20 @@ Returns `{ status: "ready", result: "<serialized JsonCard with schema>" }`. Pars

### Control Flow

- `signal_done()` — Signal that the current ticket is complete. Call this only after all implementation and test files have been written.
- `signal_done()` — Signal that the current issue is complete. Call this only after all implementation and test files have been written.
- `request_clarification({ message })` — Signal that you cannot proceed and need human input. Describe what is blocking.

## Required Flow

1. **Inspect before writing.** Use `search_realm` and `read_file` to understand what already exists in the target realm before creating or modifying files.
2. **Move ticket to `in_progress`.** Use `update_ticket` to set the ticket status before starting implementation.
2. **Move issue to `in_progress`.** Use `update_issue` to set the issue status before starting implementation.
3. **Write card definitions** (`.gts`) via `write_file` to the target realm.
4. **Write card instances** (`.json`) via `write_file` to the target realm.
5. **Write a Catalog Spec card** (`Spec/<card-name>.json`) for each top-level card defined in the brief. Link sample instances via `linkedExamples`.
6. **Write `.test.gts` test files** co-located with card definitions via `write_file` to the target realm. Every ticket must have at least one test file.
6. **Write `.test.gts` test files** co-located with card definitions via `write_file` to the target realm. Every issue must have at least one test file.
7. **Call `signal_done()`** when all implementation and test files are written. The orchestrator triggers test execution after this.
8. **If tests fail**, the orchestrator feeds failure details back. Use `read_file` to inspect current state, then `write_file` to fix implementation or test files. Call `signal_done()` again.
9. **Update ticket state** via `update_ticket` — update notes, acceptance criteria, and related knowledge as work progresses.
9. **Update issue state** via `update_issue` — update notes, acceptance criteria, and related knowledge as work progresses.
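
The happy path of the flow above could be sketched as the following call sequence. This is an illustrative sketch, not the factory's implementation: the `Tools` interface is hypothetical, only `update_issue({ path, attributes })` and `signal_done()` follow signatures given in this skill, and the argument shapes of `search_realm`, `read_file`, and `write_file` as well as the file paths are assumptions. The in-memory `recordingTools` fake exists only so the sequence can be run and inspected.

```typescript
// Hypothetical tool surface. Argument shapes for search_realm, read_file and
// write_file are assumed; the skill text does not spell them out here.
interface Tools {
  search_realm(query: object): Promise<unknown>;
  read_file(path: string): Promise<string>;
  write_file(path: string, content: string): Promise<void>;
  update_issue(args: { path: string; attributes: Record<string, unknown> }): Promise<void>;
  signal_done(): Promise<void>;
}

// Walks steps 1 to 7 of the required flow for one issue and one card.
async function implementIssue(tools: Tools, issuePath: string, cardName: string): Promise<void> {
  // 1. Inspect before writing.
  await tools.search_realm({});
  await tools.read_file(issuePath);
  // 2. Move the issue to in_progress before starting implementation.
  await tools.update_issue({ path: issuePath, attributes: { status: "in_progress" } });
  // 3-6. Card definition, sample instance, Catalog Spec, and co-located test file.
  await tools.write_file(`${cardName}.gts`, "/* card definition */");
  await tools.write_file(`${cardName}-example.json`, "{}"); // placeholder instance path
  await tools.write_file(`Spec/${cardName}.json`, "{}");
  await tools.write_file(`${cardName}.test.gts`, "/* QUnit tests */");
  // 7. Hand control back; the orchestrator runs the tests after this.
  await tools.signal_done();
}

// In-memory fake that records each call, used only to demonstrate ordering.
function recordingTools(log: string[]): Tools {
  return {
    async search_realm() { log.push("search_realm"); return []; },
    async read_file(path) { log.push(`read_file:${path}`); return ""; },
    async write_file(path) { log.push(`write_file:${path}`); },
    async update_issue({ path }) { log.push(`update_issue:${path}`); },
    async signal_done() { log.push("signal_done"); },
  };
}

const log: string[] = [];
implementIssue(recordingTools(log), "Issues/issue-slug.json", "card-name").then(() => {
  console.log(log.join("\n"));
});
```

Steps 8 and 9 (fixing failures and updating issue notes) are left out because they depend on feedback from the orchestrator.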

## Target Realm Artifact Structure

@@ -86,11 +86,11 @@ target-realm/
├── Spec/
│ └── card-name.json # Catalog Spec card
├── Test Runs/
│ └── ticket-slug-1.json # TestRun card
│ └── issue-slug-1.json # TestRun card
├── Projects/
│ └── project-name.json # Project card
├── Tickets/
│ └── ticket-slug.json # Ticket card
├── Issues/
│ └── issue-slug.json # Issue card
└── Knowledge Articles/
└── article-name.json # KnowledgeArticle card
```
12 changes: 6 additions & 6 deletions packages/software-factory/README.md
@@ -7,15 +7,15 @@ The software factory is an automated card-development system that takes a brief
The factory flow has four phases:

1. **Intake** — Fetch a brief card from a source realm, normalize it into a structured representation
2. **Bootstrap** — Create a target realm (if needed), populate it with a Project card, Knowledge Articles, and starter Tickets
3. **Implementation** — An LLM agent picks up the active ticket and uses tool calls to write card definitions (`.gts`), sample instances (`.json`), catalog specs (`Spec/`), and QUnit test files (`.test.gts`) into the target realm
2. **Bootstrap** — Create a target realm (if needed), populate it with a Project card, Knowledge Articles, and starter Issues
3. **Implementation** — An LLM agent picks up the active issue and uses tool calls to write card definitions (`.gts`), sample instances (`.json`), catalog specs (`Spec/`), and QUnit test files (`.test.gts`) into the target realm
4. **Verification** — The orchestrator runs QUnit tests via Playwright in a real browser, collects structured results into a TestRun card, and feeds failures back to the agent for iteration

The agent iterates (implement → test → fix) until tests pass or max iterations are reached. The orchestrator (the "ralph loop") controls iteration count, test execution, and ticket selection deterministically — the LLM handles only the implementation work.
The agent iterates (implement → test → fix) until tests pass or max iterations are reached. The orchestrator (the "ralph loop") controls iteration count, test execution, and issue selection deterministically — the LLM handles only the implementation work.
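
The implement, test, fix cycle described above can be sketched as a bounded loop. This is a simplified illustration rather than the actual orchestrator: `runLoop`, `TestOutcome`, and `LoopResult` are hypothetical names, `implement` stands in for one agent turn, and `runTests` stands in for the QUnit run via Playwright.

```typescript
// Hypothetical shapes for illustration only.
interface TestOutcome {
  passed: boolean;
  failures: string[];
}

interface LoopResult {
  status: "passed" | "max_iterations";
  iterations: number;
}

// The orchestrator owns the iteration count and test execution;
// the agent only receives failure feedback and implements.
function runLoop(
  implement: (feedback: string[]) => void,
  runTests: () => TestOutcome,
  maxIterations: number,
): LoopResult {
  let feedback: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    implement(feedback);
    const outcome = runTests();
    if (outcome.passed) {
      return { status: "passed", iterations: i };
    }
    // Failed verification: feed the details back into the next turn.
    feedback = outcome.failures;
  }
  return { status: "max_iterations", iterations: maxIterations };
}

// Fake agent whose work passes on the third attempt.
let attempts = 0;
const result = runLoop(
  () => { attempts += 1; },
  () =>
    attempts >= 3
      ? { passed: true, failures: [] }
      : { passed: false, failures: [`attempt ${attempts} failed`] },
  5,
);
console.log(result.status, result.iterations);
```

Keeping the loop free of any model call is what makes this part deterministic and testable with fakes.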

### Realm Roles

- **Source realm** (`packages/software-factory/realm/`) — publishes shared modules, card type definitions (Project, Ticket, KnowledgeArticle, TestRun), briefs, and templates. Never written to by the factory.
- **Source realm** (`packages/software-factory/realm/`) — publishes shared modules, card type definitions (Project, Issue, KnowledgeArticle, TestRun), briefs, and templates. Never written to by the factory.
- **Target realm** (user-specified) — receives all generated artifacts: card definitions, instances, specs, test files, and TestRun results.
- **Fixture realm** (`test-fixtures/`) — disposable test input for development-time verification of the factory itself.

@@ -24,7 +24,7 @@ The agent iterates (implement → test → fix) until tests pass or max iteratio
| Path | What it is |
| --------------------- | ------------------------------------------------------------------- |
| `Projects/` | Project card with objective, scope, success criteria |
| `Tickets/` | Ticket cards tracking implementation work |
| `Issues/` | Issue cards tracking implementation work |
| `Knowledge Articles/` | Context articles derived from the brief |
| `*.gts` | Card definition files |
| `*.test.gts` | Co-located QUnit test files |
@@ -81,7 +81,7 @@ The `--debug` flag shows LLM prompts, tool calls and their results, and `console
| Folder / File | What it is |
| -------------------------- | ------------------------------------------------------------------------- |
| `Projects/` | A Project card with the brief's objective and success criteria |
| `Tickets/` | Ticket cards — the active ticket should show status `done` |
| `Issues/` | Issue cards — the active issue should show status `done` |
| `Knowledge Articles/` | Context articles derived from the brief |
| `*.gts` | Card definition file(s) for the implemented card |
| `*.test.gts` | Co-located QUnit test file(s) |
34 changes: 17 additions & 17 deletions packages/software-factory/docs/testing-strategy.md
@@ -35,7 +35,7 @@ The testing strategy assumes three separate realm roles:
- `packages/software-factory/realm`
- publishes shared modules, briefs, templates, and other software-factory inputs
- target realm
- the user-selected realm where the factory writes generated tickets, knowledge articles, and implementation artifacts
- the user-selected realm where the factory writes generated issues, knowledge articles, and implementation artifacts
- fixture realm
- disposable test data used to verify source-realm publishing and target-realm behavior during development

@@ -47,7 +47,7 @@ If the source realm includes output-like examples, they should be clearly labele

The factory requires the agent to produce tests alongside implementation code. This is not optional.

Flow per ticket:
Flow per issue:

1. agent implements the card or feature in the target realm
2. agent generates QUnit test files co-located with card definitions (`.test.gts`)
@@ -57,9 +57,9 @@ Flow per ticket:
6. test results are parsed from the QUnit test output, grouped by module into `TestModuleResult` entries, and written back to the TestRun card's `moduleResults` field. Each TestModuleResult has a `moduleRef` (CodeRefField with `module` = test module URL, `name` = "default") and its own `passedCount`/`failedCount` computeds. TestRun's `passedCount`/`failedCount` are rolled up across all TestModuleResults.
7. if tests fail, the full test output (errors, stack traces) is available on the TestRun card and fed back to the agent
8. agent iterates on implementation and/or tests until all tests pass
9. passing TestRun cards serve as durable verification evidence for the ticket, linked to the Project card
9. passing TestRun cards serve as durable verification evidence for the issue, linked to the Project card

This loop is the primary quality gate. A ticket cannot be marked done without at least one passing TestRun in the target realm.
This loop is the primary quality gate. An issue cannot be marked done without at least one passing TestRun in the target realm.
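
The count roll-up in step 6 and the done gate can be sketched as follows. This is a hedged sketch: the `results` array on `TestModuleResult` is a hypothetical field used to derive the counts, the helper names are illustrative, and reading "passing TestRun" as "zero failures and at least one passed test" is an assumption, not the card's definition.

```typescript
// Hypothetical shapes. `moduleRef` and the count names come from the text;
// the `results` array is assumed for the sake of the example.
interface TestModuleResult {
  moduleRef: { module: string; name: string };
  results: { name: string; status: "passed" | "failed" }[];
}

interface TestRun {
  moduleResults: TestModuleResult[];
}

interface Counts {
  passedCount: number;
  failedCount: number;
}

// Per-module counts, mirroring the computeds on TestModuleResult.
function moduleCounts(m: TestModuleResult): Counts {
  const passedCount = m.results.filter((r) => r.status === "passed").length;
  return { passedCount, failedCount: m.results.length - passedCount };
}

// TestRun totals are the sum over all module results.
function runCounts(run: TestRun): Counts {
  return run.moduleResults.map(moduleCounts).reduce(
    (acc, c) => ({
      passedCount: acc.passedCount + c.passedCount,
      failedCount: acc.failedCount + c.failedCount,
    }),
    { passedCount: 0, failedCount: 0 },
  );
}

// Assumed gate: at least one run with no failures and at least one passed test.
function canMarkDone(runs: TestRun[]): boolean {
  return runs.some((run) => {
    const totals = runCounts(run);
    return totals.failedCount === 0 && totals.passedCount > 0;
  });
}

const failingRun: TestRun = {
  moduleResults: [
    {
      moduleRef: { module: "card-name.test.gts", name: "default" },
      results: [
        { name: "renders", status: "passed" },
        { name: "edits", status: "failed" },
      ],
    },
  ],
};

const passingRun: TestRun = {
  moduleResults: [
    {
      moduleRef: { module: "card-name.test.gts", name: "default" },
      results: [
        { name: "renders", status: "passed" },
        { name: "edits", status: "passed" },
      ],
    },
  ],
};

console.log(canMarkDone([failingRun]), canMarkDone([failingRun, passingRun]));
```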

## Core Principle

@@ -153,7 +153,7 @@ We are trying to prove that:

- briefs are normalized correctly
- project artifacts are created correctly
- ticket state transitions are correct
- issue state transitions are correct
- verification gates are enforced
- reruns resume instead of duplicating work
- failure paths are handled predictably
@@ -165,7 +165,7 @@ This is the straightforward part.
Test the `DarkFactory` cards like normal Boxel artifacts:

- `Project`
- `Ticket`
- `Issue`
- `KnowledgeArticle`
- `AgentProfile`

@@ -224,7 +224,7 @@ Examples:
- a vague brief defaults to thin-MVP planning
- a missing target realm gets bootstrapped correctly via `/_create-realm` while reusing the public tracker module URL
- rerunning bootstrap does not create duplicate cards
- existing `in_progress` tickets are resumed instead of replaced
- existing `in_progress` issues are resumed instead of replaced
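
The last two examples (no duplicates on rerun, existing `in_progress` issues resumed) could be checked against a sketch like the one below. It is an illustration under assumptions: `IssueCard`, `bootstrapIssues`, and the in-memory `Map` standing in for the target realm are all hypothetical, and keying cards by path is a simplification.

```typescript
// Hypothetical card shape; the Map stands in for the target realm.
interface IssueCard {
  path: string;
  status: string;
}

// Creates only the starter issues that are missing and returns the created paths.
// Existing cards are left untouched, so in-progress work is resumed, not replaced.
function bootstrapIssues(realm: Map<string, IssueCard>, starters: IssueCard[]): string[] {
  const created: string[] = [];
  for (const starter of starters) {
    if (realm.has(starter.path)) {
      continue;
    }
    realm.set(starter.path, { ...starter });
    created.push(starter.path);
  }
  return created;
}

const targetRealm = new Map<string, IssueCard>();
const starters: IssueCard[] = [{ path: "Issues/issue-slug.json", status: "in_progress" }];

const firstRun = bootstrapIssues(targetRealm, starters);
const secondRun = bootstrapIssues(targetRealm, starters);
console.log(firstRun.length, secondRun.length, targetRealm.size);
```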

## Terminology: "Spec" Disambiguation

@@ -250,11 +250,11 @@ The agent produces actions using these actual action types:
- `update_file` — replace the content of an existing file
- `create_test` — create a QUnit test file co-located with card definitions (`.test.gts`)
- `update_test` — update an existing QUnit test file
- `update_ticket` — update the current ticket with notes or status changes
- `update_issue` — update the current issue with notes or status changes
- `create_knowledge` — create a knowledge article
- `invoke_tool` — run a registered tool (search-realm, realm-read, etc.)
- `request_clarification` — signal that the agent cannot proceed
- `done` — signal that all work for this ticket is complete
- `done` — signal that all work for this issue is complete

Then test the loop as a state machine.

@@ -272,9 +272,9 @@ Then test the loop as a state machine.

Assertions should be about workflow behavior:

- the right ticket is chosen
- the right issue is chosen
- the right state transitions occur
- failed verification keeps the ticket open
- failed verification keeps the issue open
- successful verification advances the loop
- clarification paths stop correctly
- retries and resumes are handled correctly
@@ -292,10 +292,10 @@ Suggested acceptance cases:
1. Sticky Note bootstrap
- brief URL points to `software-factory/Wiki/sticky-note`
- target realm is a scratch or temp realm
- result is one project, starter knowledge cards, and starter tickets
- result is one project, starter knowledge cards, and starter issues

2. Sticky Note first implementation pass
- loop executes the first active ticket
- loop executes the first active issue
- one implementation artifact is created (card definition + card instance)
- one Catalog Spec card is created in the `Spec/` folder
- one QUnit test file is created co-located with the card definition (`.test.gts`)
@@ -312,7 +312,7 @@ These tests are slower and more brittle, so keep them few and high-signal.
Avoid tests that depend on:

- exact phrasing of generated text
- exact ticket wording
- exact issue wording
- exact `agentNotes` wording
- full open-ended model behavior

@@ -398,15 +398,15 @@ These are the highest-value early tests:
- use a dedicated fixture realm, not the published realm itself, for any mutable test setup
2. brief normalization handles the sticky-note wiki card
3. target realm bootstrap creates required surfaces in a temp realm
4. artifact bootstrap creates one project and one `in_progress` ticket
4. artifact bootstrap creates one project and one `in_progress` issue
5. rerunning bootstrap does not duplicate artifacts
6. fake loop test covers success path
7. fake loop test covers failed verification path
8. one end-to-end sticky-note acceptance test

## Ticket Mapping

Testing is part of implementation and should stay attached to the current Linear tickets.
Testing is part of implementation and should stay attached to the current Linear issues.

The current mapping is:

@@ -425,7 +425,7 @@ The current mapping is:
- ~~`CS-10451`~~ _(cancelled — hard-coded verification policy conflicts with phase-2 issue-driven approach where test execution is an issue type, not an orchestrator-enforced gate)_
- ~~verification-policy unit tests~~
- `CS-10450`
- execution loop implementation, broken into child tickets:
- execution loop implementation, broken into child issues:
- ~~action dispatcher~~ _(replaced by agent-driven tool calls in CS-10568)_
- context builder (assemble `AgentContext` from skills, realm state) — CS-10567
- core loop orchestrator (run → test → iterate cycle) — CS-10568
1 change: 0 additions & 1 deletion packages/software-factory/package.json
@@ -5,7 +5,6 @@
"license": "MIT",
"description": "Software Factory workspace package",
"scripts": {
"boxel:pick-ticket": "NODE_NO_WARNINGS=1 ts-node --transpileOnly scripts/pick-ticket.ts",
"boxel:search": "NODE_NO_WARNINGS=1 ts-node --transpileOnly scripts/boxel-search.ts",
"boxel:session": "NODE_NO_WARNINGS=1 ts-node --transpileOnly scripts/boxel-session.ts",
"cache:prepare": "NODE_NO_WARNINGS=1 ts-node --transpileOnly src/cli/cache-realm.ts",
18 changes: 9 additions & 9 deletions packages/software-factory/prompts/ticket-implement.md
@@ -18,19 +18,19 @@ Success criteria:
{{content}}
{{/each}}

# Current Ticket
# Current Issue

ID: {{ticket.id}}
Summary: {{ticket.summary}}
Status: {{ticket.status}}
Priority: {{ticket.priority}}
ID: {{issue.id}}
Summary: {{issue.summary}}
Status: {{issue.status}}
Priority: {{issue.priority}}

Description:
{{ticket.description}}
{{issue.description}}

{{#if ticket.checklist}}
{{#if issue.checklist}}
Checklist:
{{#each ticket.checklist}}
{{#each issue.checklist}}
- [ ] {{.}}
{{/each}}
{{/if}}
@@ -54,7 +54,7 @@ You previously invoked the following tools. Use these results to inform your imp

# Instructions

Implement this ticket:
Implement this issue:

1. Use search_realm and read_file to inspect existing realm state
2. Use write_file to create or update card definitions (.gts) and/or card instances (.json) in the target realm