Open
103 commits
fdf453c
refactor: rename onyxia-planner to onyxia-compiler
Feilkin Feb 14, 2026
7bb1609
refactor: rename kernels to operators
Feilkin Feb 14, 2026
7c6995f
refactor: rename OpOperator to Operator
Feilkin Feb 14, 2026
a8fff9d
ai: plans for refactoring
Feilkin Feb 15, 2026
d52f4cf
feat(core): add onyxia-core crate with IR, traits, and plan types
Feilkin Feb 15, 2026
8ad8f46
ai: allow commit creation
Feilkin Feb 15, 2026
38735cf
feat(compiler): implement compiler pipeline with pluggable passes
Feilkin Feb 15, 2026
abddca6
feat(operators): add onyxia-operators crate with collapsed operator f…
Feilkin Feb 15, 2026
1420b75
feat(runtime/compiler/core): add runtime dimension update support
Feilkin Feb 15, 2026
1ea14ea
feat(operators): migrate initial 12 operators to onyxia-operators
Feilkin Feb 15, 2026
cc1a648
feat(operators): migrate remaining 26 operators to onyxia-operators
Feilkin Feb 15, 2026
22adc9d
fix(core): enable all naga capabilities in shader compilation
Feilkin Feb 15, 2026
737ff6c
feat(core): export TensorMetadata from root module
Feilkin Feb 15, 2026
c99e519
build(runtime): add onyxia-compiler and onyxia-operators as dev-depen…
Feilkin Feb 15, 2026
694a688
fix(runtime): remove unused imports and variables
Feilkin Feb 15, 2026
9e56f30
fix(runtime): fix test compilation errors
Feilkin Feb 15, 2026
38f2c98
feat: remove obsolete shape_inference module
Feilkin Feb 15, 2026
00ca0b0
fix: CLI tests to new architecture
Feilkin Feb 15, 2026
ec62b7b
fix: missing operator shape inference implementations
Feilkin Feb 15, 2026
52589d8
feat(operators): complete 11 operator implementations
Feilkin Feb 15, 2026
e20b806
chore: cargo fmt
Feilkin Feb 15, 2026
857f4e9
chore: clippy --fix
Feilkin Feb 15, 2026
9311cde
feat: keep tensor metadata while folding
Feilkin Feb 15, 2026
a5b1acc
feat(compiler): add InitializeConstantsPass
Feilkin Feb 15, 2026
b3aa765
fix(compiler): integrate InitializeConstantsPass into pipeline
Feilkin Feb 15, 2026
3982fc7
fix(operators): use numeric shader defines in Cast operator
Feilkin Feb 15, 2026
9552e75
feat(operators): add variadic input support to binary elementwise ops
Feilkin Feb 15, 2026
975e7d8
perf(compiler): run constant folding before shape inference
Feilkin Feb 15, 2026
b8a4e7a
chore: ignore plans
Feilkin Feb 15, 2026
112aaf1
feat(core): add IrInput enum for flexible node inputs
Feilkin Feb 15, 2026
0f09b64
feat(core): convert IrNode from struct to enum
Feilkin Feb 15, 2026
252d6cd
feat(core/compiler): add graph mutation helpers for value nodes
Feilkin Feb 15, 2026
505d296
refactor: constant folding
Feilkin Feb 15, 2026
35a12d0
refactor: remove deprecated files
Feilkin Feb 15, 2026
1aee407
refactor(core): replace IrNode enum with struct, move constants to edges
Feilkin Feb 15, 2026
2f41913
chore: cargo fmt
Feilkin Feb 15, 2026
b8448a9
refactor(core): replace IrEdge fields with EdgeData enum
Feilkin Feb 15, 2026
8a412bd
feat: initialize_constants pass
Feilkin Feb 15, 2026
0a0f467
fix(operators): ConstantOfShape wrong attribute type
Feilkin Feb 15, 2026
fabe246
fix: unknown shapes in ONNX models
Feilkin Feb 15, 2026
3a6ad60
feat: operator namespaces, com.microsoft.RotaryEmbedding operator
Feilkin Feb 16, 2026
7bfe78c
doc: vendor ContribOperators.md
Feilkin Feb 16, 2026
252c388
feat(operators): com.micrsoft::RotaryEmbedding
Feilkin Feb 16, 2026
d353411
fix(operators): register MatMulNBits with correct domain
Feilkin Feb 16, 2026
e55ea8f
fix(operators): fix QGA domain in tests
Feilkin Feb 16, 2026
0d33e73
feat: buffer initialization
Feilkin Feb 16, 2026
b5eac31
chore: cargo fmt, fix, clippy
Feilkin Feb 16, 2026
82b3d84
fix(operators): correctly infer output type for operators
Feilkin Feb 16, 2026
4f2d01c
feat(cli): add inspect-node command for detailed node inspection
Feilkin Feb 16, 2026
790a56d
feat(cli): add list-nodes command for filtering and inspecting model …
Feilkin Feb 16, 2026
63b8570
feat(cli): add inspect-tensor command for tensor inspection
Feilkin Feb 16, 2026
9836e26
feat(core/operators): add enhanced shape error messages with full con…
Feilkin Feb 16, 2026
f28cac9
feat(cli): add trace-node command for local dataflow visualization
Feilkin Feb 16, 2026
73f9c5d
feat(cli): add validate command for model validation
Feilkin Feb 16, 2026
ec2cdd6
fix: buffer issues
Feilkin Feb 16, 2026
6998598
test: run GPU e2e tests in series
Feilkin Feb 16, 2026
55b45a3
test: fix gemma compilation test
Feilkin Feb 16, 2026
d1c1458
feat(runtime): return error if not all inputs are present
Feilkin Feb 16, 2026
45a7bb7
feat(cli): initialize KV cache tensors
Feilkin Feb 16, 2026
09f2a1e
fix(operators): fix attention
Feilkin Feb 16, 2026
c90b388
feat(compiler): only fold nodes if their inputs are available
Feilkin Feb 16, 2026
6701070
doc: update documentation
Feilkin Feb 16, 2026
aa20832
doc: logo color palette link
Feilkin Feb 17, 2026
78b1c19
test: remove broken gemma-related tests
Feilkin Feb 17, 2026
612ef56
feat: remove most operators
Feilkin Feb 17, 2026
5ef2cb5
feat(core): add dispatch-based execution types
Feilkin Feb 17, 2026
e5db28e
feat: remove unused compiler passes
Feilkin Feb 17, 2026
f1e60c2
fix(core): broken tests
Feilkin Feb 17, 2026
7d5b2d8
feat: update runtime to new architecture
Feilkin Feb 17, 2026
99dc80c
feat(operators): binary and Shape operators
Feilkin Feb 17, 2026
21eba40
test(operators): end-to-end tests for new operators
Feilkin Feb 17, 2026
0df6cf2
feat: update docs
Feilkin Feb 17, 2026
a1056c4
feat(cli): run model command
Feilkin Feb 17, 2026
8afb666
feat(cli): print operator domain if specified
Feilkin Feb 17, 2026
f84573e
feat(operators): add Div, Sub, and Pow binary elementwise operators
Feilkin Feb 17, 2026
53a1f59
feat(operators): add comparison operators (Equal, Greater, Less, Grea…
Feilkin Feb 17, 2026
ac2cca2
feat(operators): add unary math operators (Neg, Sqrt, Cos, Sin, Tanh)
Feilkin Feb 17, 2026
adfe9aa
feat(operators): add shape manipulation operators (Concat, Expand, Tr…
Feilkin Feb 17, 2026
9e0ae03
fix(operators): shape operator shader bug
Feilkin Feb 17, 2026
72404b4
feat(operators): implement Slice operator
Feilkin Feb 17, 2026
8a76557
feat(operators): implement MatMul operator with GPU support
Feilkin Feb 17, 2026
00a6c04
feat(operators): add ReduceMean and Max operators
Feilkin Feb 17, 2026
c50f841
feat(operators): add Gather and ScatterND indexing operators
Feilkin Feb 17, 2026
e7aa603
feat(operators): add Shape and ConstantOfShape operators
Feilkin Feb 17, 2026
2161208
feat(operators): add Cast operator for type conversion
Feilkin Feb 17, 2026
0d1c31b
feat(operators): add Softmax operator with three-pass algorithm
Feilkin Feb 17, 2026
3f696fc
feat(operators): implement Range, Trilu, and Where operators
Feilkin Feb 17, 2026
822532b
fix(cli): validate command not using operator domain in resolution
Feilkin Feb 17, 2026
c748eee
feat(operators): add Constant operator
Feilkin Feb 17, 2026
2e99d65
feat(operators): add Gelu activation operator
Feilkin Feb 17, 2026
adb68dc
feat(operators): add ReduceSum reduction operator
Feilkin Feb 17, 2026
1be6ef7
feat(operators): add SimplifiedLayerNormalization operator
Feilkin Feb 17, 2026
fd9875f
feat(operators): implement Microsoft contrib operators RotaryEmbeddin…
Feilkin Feb 17, 2026
230f6d7
fix(operators): Gemma 3 270m model using com.microsoft::SimplifiedLay…
Feilkin Feb 17, 2026
618ec15
feat(operators): 3D dispatch
Feilkin Feb 17, 2026
8f0bb76
feat(core): use submission ID when copying buffers
Feilkin Feb 17, 2026
0ade5ad
fix(operators): handle I64 types properly in Concat operator
Feilkin Feb 18, 2026
4119563
fix: missing KV cache tensors
Feilkin Feb 18, 2026
bb23bcb
fix(operators): QGA KV cache concatenation
Feilkin Feb 18, 2026
e64182a
fix: inference errors with Gemma 3 270M
Feilkin Feb 18, 2026
0daed50
feat(operators): add com.microsoft::MatMulNBits
Feilkin Feb 18, 2026
f887854
chore(operators): remove debug print statements from tests (task 065)…
Feilkin Feb 18, 2026
6450f5a
feat: more immediates
Feilkin Feb 18, 2026
2 changes: 1 addition & 1 deletion .config/nextest.toml
@@ -10,5 +10,5 @@ gpu-tests = { max-threads = 1 }
[profile.default]
# Assign GPU-intensive tests to the serial group
[[profile.default.overrides]]
filter = 'test(~generation) + test(~gemma) + test(~llm)'
filter = 'test(~generation) + test(~gemma) + test(~llm) + test(~e2e)'
test-group = 'gpu-tests'
104 changes: 104 additions & 0 deletions .github/agents/implement-task.agent.md
@@ -0,0 +1,104 @@
---
description: "Read a task file from tasks/, implement it, verify, commit, and mark done"
argument-hint: "Task file name"
tools: [execute, read, agent, edit, search, githits/search, todo]
---

You are an implementation agent for the Onyxia project. Your job is to read a single task file, implement all required changes, verify correctness, commit the result, and mark the task as done.

## Input

The user provides a task file name.

## Workflow

Follow these steps **in order**. Do not skip steps. Do not proceed to the next step until the current step is fully complete.

### Step 1: Read and understand the task

1. Read the task file at `tasks/<number>-*.md` in full.
2. If the task references a plan file, read the relevant sections of the plan to get additional context.
3. Read `ARCHITECTURE.md` and `.github/copilot-instructions.md` to understand project conventions.
4. Identify the task's dependencies (listed at the top of the task file). Verify that each dependency task has been completed — its file should be prefixed with `_done_` (e.g., `tasks/_done_018-onyxia-core-crate.md`). If any dependency is not done, **stop and report which dependencies are missing**. Do not attempt the task.
5. Summarize your understanding of the task and create a todo list with specific implementation steps.

### Step 2: Implement the changes

1. Work through each todo item, marking items in-progress and completed as you go.
2. Follow the project's code conventions:
- Rust 2024 edition idioms
- `///` doc comments on all public APIs
- `Result<T, E>` with `?` operator for error handling
- Unit tests in `#[cfg(test)]` modules
3. **Dependency management**: Use `cargo add <crate>` to add dependencies — never edit `Cargo.toml` by hand. Before adding any new external dependency (not a workspace crate), list it and ask for confirmation.
4. Write idiomatic Rust: iterators, pattern matching, proper ownership.
5. Implement everything specified in the task's "Scope" section. Do not implement more or less than what the task specifies.

### Step 3: Verify the changes

Run **all** of these checks. Every single one must pass before proceeding to Step 4.

```bash
cargo fmt --all # Format code
cargo clippy --workspace # Lint — no warnings allowed
cargo build --workspace # Full workspace build
cargo nextest run # Run all non-ignored tests
cargo nextest run --run-ignored=all # Run GPU tests
```

If any check fails:
1. Read the error output carefully.
2. Fix the issue.
3. Re-run **all** checks from the beginning (not just the one that failed).

Do not proceed to Step 4 until all five commands succeed.

### Step 4: Verify the Definition of Done

Re-read the "Definition of Done" section of the task file. Go through each checkbox item and verify that it has been satisfied by your implementation. If any item is not satisfied, go back to Step 2 and implement the missing piece, then re-run Step 3.

### Step 5: Commit the changes

Create a single commit using [Conventional Commits](https://www.conventionalcommits.org/) style:

```bash
git add -A
git commit -m "<type>(<scope>): <description>

<body>"
```

**Commit message rules:**
- `<type>`: Use `feat` for new features/crates, `refactor` for restructuring, `test` for test-only changes, `fix` for bug fixes, `chore` for non-code changes.
- `<scope>`: The primary crate affected (e.g., `core`, `compiler`, `operators`, `runtime`, `cli`). Use multiple scopes separated by `/` if needed (e.g., `core/compiler`).
- `<description>`: Imperative mood, lowercase, no period. Summarize what was done.
- `<body>`: Brief list of what was implemented (bullet points). Reference the task number.

Example:
```
feat(core): add onyxia-core crate with IR, traits, and plan types

- Graph IR with StableGraph, IrNode, TensorDef
- Operator and Pass traits with InferenceCtx, FoldCtx, PlanCtx
- TensorShape enum (Static, Symbolic, Absent — no Unknown)
- SymbolicExpr parser/evaluator moved from onyxia-compiler
- CompiledModel and PlannedOp plan types
- OperatorRegistry
```

### Step 6: Mark the task as done

Rename the task file by prepending `_done_` to the filename:

```bash
mv tasks/<number>-<name>.md tasks/_done_<number>-<name>.md
```

## Important Rules

- **Do not modify files outside the task's scope** unless the task explicitly requires it (e.g., updating workspace `Cargo.toml`).
- **Do not add dependencies without listing them first.** Workspace-internal crate dependencies (e.g., `onyxia-core = { path = "../onyxia-core" }`) are fine to add without asking.
- **Stop immediately if dependency tasks are not done.** Do not attempt partial implementation.
- **Run the full verification suite (Step 3) before committing.** No exceptions.
- **One commit per task.** Do not create multiple commits.
- **Do not generate "implementation report" or other useless markdown files.**
9 changes: 9 additions & 0 deletions .github/agents/task-master.agent.md
@@ -0,0 +1,9 @@
---
name: task-master
description: Orchestrates sub-agents to complete tasks from tasks/-folder.
argument-hint: Which tasks to implement (018-025)
tools: [read/readFile, agent, edit/createFile, edit/editFiles, search/fileSearch, search/listDirectory, todo]
agents: ["implement-task"]
---
Resolve task numbers to task file names (`tasks/xxx-something.md`).
For each task file in the given range, invoke "implement-task" sub-agent with "Implement task: {task file name}, and report back with a summary of what was implemented, or any blockers that are preventing you from completing the task". Do not add anything else to the invocation. If the sub-agent becomes blocked by some issue, create a new task to solve that issue, and invoke another sub-agent to implement that task, then try running the blocked agent again.
64 changes: 48 additions & 16 deletions .github/copilot-instructions.md
@@ -1,28 +1,30 @@
# Copilot Instructions for Onyxia

Onyxia is a **GPU compute shader runtime for ONNX models**, built in Rust 2024 edition. It compiles ONNX operator graphs into WGSL compute shaders executed via `wgpu`.
Onyxia is a **GPU compute shader runtime for ONNX models**, built in Rust 2024 edition. It uses a dispatch-based execution model where operators compile their shaders at compile time and compute shapes at runtime from actual input tensors.

### Architecture

```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ onyxia-onnx │────▶│ onyxia-planner │────▶│ onyxia-runtime │
│ (ONNX parser) │ │ (execution graph)│ │ (GPU executor) │
└─────────────────┘ └──────────────────┘ └─────────────────┘
┌─────────────────┐
│ onyxia-cli │
│ (CLI interface) │
└─────────────────┘
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ onyxia-onnx │────▶│ onyxia-compiler │────▶│ onyxia-runtime │
│ (ONNX parser) │ │ (dispatch model) │ │ (GPU executor) │
└─────────────────┘ └──────────────────┘ └─────────────────┘
┌───────────┴───────────┐ ┌─────────────────┐
│ onyxia-operators │ │ onyxia-cli │
│ (3 core operators) │ │ (CLI interface) │
└───────────────────────┘ └──────────────────┘
```

| Crate | Purpose |
|-------|---------|
| `onyxia-onnx` | Parse ONNX protobuf models into internal IR |
| `onyxia-planner` | Generate execution plans with pre-compiled WGSL shaders |
| `onyxia-runtime` | Execute execution plans on GPU via `wgpu` |
| `onyxia-cli` | CLI for testing models, generating dot graphs, benchmarking |
| `onyxia-core` | IR graph, operator/dispatch traits, compiled model types, operator registry |
| `onyxia-operators` | 3 core operators (Add, Mul, Reshape) with WGSL shaders |
| `onyxia-compiler` | Build dispatch models with pre-compiled WGSL shaders |
| `onyxia-runtime` | Execute dispatch models on GPU via `wgpu` with register-based routing |
| `onyxia-cli` | CLI for model inspection, DOT graphs, validation |

Read [ARCHITECTURE.md](../ARCHITECTURE.md) for more details.

@@ -81,6 +83,36 @@ pub mod onnx {
- For GPU code, write WGSL in separate `.wgsl` files and use `naga_oil` for runtime compilation

### Cross-Crate Communication
- `onyxia-onnx` exports an IR that `onyxia-planner` consumes
- `onyxia-planner` produces execution plans for `onyxia-runtime`
- `onyxia-onnx` exports a `Graph` that `onyxia-compiler` consumes
- `onyxia-core` defines the `Operator` trait (with `create_dispatch()`) and `OpDispatch` trait
- `onyxia-operators` implements operators and exports `core_operator_registry()`
- `onyxia-compiler` produces `CompiledModel` (dispatch entries + register routing) for `onyxia-runtime`
- `onyxia-runtime` executes dispatch entries using a register-based execution model
- Keep crate boundaries clean; avoid circular dependencies

### Key Concepts

**Dispatch-based execution:**
- Each operator implements `Operator::create_dispatch()` to produce an `OpDispatch` object
- At runtime, `OpDispatch::dispatch(inputs, ctx) -> outputs` computes shapes from actual inputs and executes GPU work
- No compile-time shape inference or constant folding — shapes determined at runtime

**Register machine:**
- Runtime maintains a vector of `Option<RuntimeTensor>` (the register file)
- Each tensor in the IR graph maps to a register index
- Operations read inputs from registers and write outputs to registers

**Operator trait:**
```rust
pub trait Operator: Send + Sync {
fn name(&self) -> &str;
fn create_dispatch(&self, ctx: &mut CompileCtx) -> Result<Box<dyn OpDispatch>>;
}
```

**OpDispatch trait:**
```rust
pub trait OpDispatch: Send + Sync {
fn dispatch(&self, inputs: Vec<RuntimeTensor>, ctx: &mut DispatchCtx) -> Result<Vec<RuntimeTensor>>;
}
```
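
To show how the two traits fit together, here is a minimal sketch of an Add operator implemented against them. Everything except the trait method names is an assumption: `CompileCtx`, `DispatchCtx`, and `RuntimeTensor` are empty or CPU-only stand-ins, the error type is a plain `String` rather than whatever `Result` alias the crate uses, and the addition runs on the CPU where the real operator would launch a WGSL shader.

```rust
// Simplified stand-ins so the sketch compiles on its own; the real Onyxia
// types carry GPU state that is not shown in this diff.
#[derive(Debug, Clone, PartialEq)]
pub struct RuntimeTensor {
    pub shape: Vec<usize>,
    pub data: Vec<f32>,
}
pub struct CompileCtx;
pub struct DispatchCtx;

pub trait OpDispatch: Send + Sync {
    fn dispatch(
        &self,
        inputs: Vec<RuntimeTensor>,
        ctx: &mut DispatchCtx,
    ) -> Result<Vec<RuntimeTensor>, String>;
}

pub trait Operator: Send + Sync {
    fn name(&self) -> &str;
    fn create_dispatch(&self, ctx: &mut CompileCtx) -> Result<Box<dyn OpDispatch>, String>;
}

/// Hypothetical operator used only for illustration.
pub struct AddOperator;
struct AddDispatch;

impl Operator for AddOperator {
    fn name(&self) -> &str {
        "Add"
    }

    fn create_dispatch(&self, _ctx: &mut CompileCtx) -> Result<Box<dyn OpDispatch>, String> {
        // A real implementation would compile its shader here, once, at compile time.
        Ok(Box::new(AddDispatch))
    }
}

impl OpDispatch for AddDispatch {
    fn dispatch(
        &self,
        inputs: Vec<RuntimeTensor>,
        _ctx: &mut DispatchCtx,
    ) -> Result<Vec<RuntimeTensor>, String> {
        let mut it = inputs.into_iter();
        match (it.next(), it.next(), it.next()) {
            (Some(a), Some(b), None) => {
                // The output shape is taken from the actual inputs at runtime.
                if a.shape != b.shape {
                    return Err(format!("shape mismatch: {:?} vs {:?}", a.shape, b.shape));
                }
                let data: Vec<f32> = a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect();
                Ok(vec![RuntimeTensor { shape: a.shape, data }])
            }
            _ => Err("Add expects exactly two inputs".to_string()),
        }
    }
}
```

The split mirrors the description above: `create_dispatch` is the place for one-time work, while `dispatch` sees concrete tensors and can therefore validate and compute shapes per call.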