Iterative feedback-based NKI kernel optimization pipeline for AWS Trainium/Inferentia.
Adapts the KernelGYM architecture to generate and optimize AWS NKI (Neuron Kernel Interface) kernels.
PyTorch Reference -> LLM generates NKI kernel -> Compile (@nki.jit)
-> Correctness check -> Performance measurement -> Feedback -> Repeat
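The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: `EvalResult`, `optimize`, and the `generate` / `evaluate` callables are hypothetical names introduced here, standing in for the LLM call and the evaluation server (compile, correctness check, timing).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class EvalResult:
    """Outcome of evaluating one candidate kernel (hypothetical structure)."""
    compiled: bool
    correct: bool
    latency_us: Optional[float]  # None when the kernel never ran
    feedback: str                # text handed back to the generator next turn


def optimize(
    reference_src: str,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], EvalResult],
    max_turns: int = 4,
) -> Tuple[Optional[str], Optional[float]]:
    """Run generate -> evaluate -> feedback for up to max_turns rounds.

    Returns the fastest correct kernel source and its latency,
    or (None, None) if no candidate compiled and passed the check.
    """
    best_src, best_latency = None, None
    feedback = ""  # the first turn has no feedback yet
    for _ in range(max_turns):
        kernel_src = generate(reference_src, feedback)
        result = evaluate(reference_src, kernel_src)
        usable = result.compiled and result.correct and result.latency_us is not None
        if usable and (best_latency is None or result.latency_us < best_latency):
            best_src, best_latency = kernel_src, result.latency_us
        feedback = result.feedback  # carried into the next generation turn
    return best_src, best_latency
```

Keeping the generator and evaluator as plain callables means the loop can be exercised with stubs on a machine without Neuron hardware, and the real server client swapped in later.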
Phase 1 (skeleton): the project structure is in place, with detailed TODO comments marking what remains to be implemented. See docs/plans/progress.md for current status.
# Install
pip install -e .
# Verify import
python -c "import nkigym; print(nkigym.__version__)"
# Start evaluation server (requires Neuron instance)
python -m nkigym.server.api
# Run evaluation
bash nkigym/scripts/eval/claude-sonnet-nki.sh
nkigym/
├── data/ # Dataset loading & NKI prompt templates
├── server/ # Kernel evaluation (compile, correctness, timing)
├── rewards/ # Reward computation & server client
├── workers/ # Async reward manager & multi-turn agent
├── metrics/ # Multi-turn evaluation metrics
├── config/ # YAML configuration
└── scripts/ # Evaluation shell scripts
Grep for specific TODO tags to find areas needing implementation:
| Tag | What's needed |
|---|---|
| [NKI-COMPILE] | NKI compilation API, @nki.jit usage |
| [NKI-EXEC] | Neuron device execution, tensor creation |
| [NKI-PERF] | neuron-profile metrics, timing |
| [NKI-PROMPT] | Optimization guidelines, code patterns |
| [NKI-CODE] | Import patterns, decorators |
| [NKI-FEEDBACK] | Error message format |
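To get a quick overview of how much work remains per tag, a small script can count tag occurrences across the source tree. This is a sketch under two assumptions: that the tags appear literally in square brackets (as in the table above) and that they live in `.py` files. The helper name `count_tags` is hypothetical and not part of the package.

```python
import re
from collections import Counter
from pathlib import Path

# Tag names taken from the table above.
TAGS = ("NKI-COMPILE", "NKI-EXEC", "NKI-PERF", "NKI-PROMPT", "NKI-CODE", "NKI-FEEDBACK")
TAG_RE = re.compile(r"\[(" + "|".join(TAGS) + r")\]")


def count_tags(root: str) -> Counter:
    """Count occurrences of each bracketed TODO tag in *.py files under root."""
    counts = Counter()
    for path in Path(root).rglob("*.py"):
        # Ignore undecodable bytes so one odd file does not abort the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        counts.update(TAG_RE.findall(text))
    return counts
```

Calling `count_tags("nkigym")` from the repository root returns a `Counter` keyed by tag name without the brackets; tags that never occur simply report a count of zero when looked up.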