A self-hosting, self-organizing compiler for a minimal Rust-shaped language, bootstrapped from a 1960s instruction set floor, targeting real vintage hardware.
A compiler that arrives on a machine, interrogates it, and decides what it can become. It is not a stripped-down Rust but a seed language whose semantics can be fully expressed using the 16 core operations present on every stored-program machine since the IBM System/360 (1964). From that seed the language grows, gated by what the underlying hardware actually supports.
| Machine | CPU | RAM | Storage | Tier ceiling |
|---|---|---|---|---|
| NEC UltraLite (1988) | NEC V30 (8086-class) | 640KB | RAM drive + floppy | Tier 0–1 |
| NEC Versa (1st gen, 1993) | Intel 486SL @ 20MHz | 4–20MB | HDD + floppy | Tier 2 |
| Mac OS 8 machine | 68k or PowerPC | 32–256MB | HDD + CD | Tier 3 |
Repository layout:

```
rusty-dusty-bones/
├── README.md
├── docs/
│   ├── design/
│   │   ├── 00-concept.md             philosophy, spiritual ancestors
│   │   ├── 01-isa-floor.md           target machine survey, ISA comparison
│   │   ├── 02-capability-record.md   the 16-byte hardware manifest format
│   │   ├── 03-growth-tiers.md        capability-gated language tiers
│   │   ├── 04-grammar.md             Tier 0 EBNF grammar (complete)
│   │   ├── 05-self-organization.md   hardware interrogation architecture
│   │   ├── 06-ir.md                  IR instruction set and encoding
│   │   ├── 07-codegen-v30.md         V30 register model, calling convention
│   │   └── 08-pipeline-status.md     current status, known limitations
│   └── research/
│       └── isa_reference.html        interactive ISA reference (ENIAC to WebAssembly)
├── src/
│   └── tier0/
│       ├── symbol_table.t0           types, names, scopes: the compiler's memory
│       ├── lexer.t0                  tokenizer (73 token kinds, keyword table)
│       ├── parser.t0                 recursive descent, single pass, wired to IR
│       ├── ir.t0                     flat IrBlock, 45 IR kinds, bridge functions
│       └── probe_x86.t0              hardware probe, writes capability record
└── targets/
    ├── v30/
    │   ├── codegen_v30.t0            V30/8086 real mode code generator
    │   ├── bios_stubs.asm            extern function implementations (231 bytes)
    │   ├── seed.asm                  trust anchor, runs first on UltraLite (539 bytes)
    │   ├── codegen_v30_example.asm   traced example: detect_ram() full output
    │   └── README.md
    ├── i486/
    │   ├── codegen_i486.t0           486 32-bit protected mode code generator
    │   └── README.md
    └── m68k/
        ├── codegen_m68k.t0           Motorola 68k / Mac OS 8 code generator
        └── README.md
```
Every compiler pass, every hardware probe, and every capability-record write can be expressed using only these operations, each of which is present on every machine since 1964:

```
LOAD  STORE  ADD  SUB  AND  OR  XOR  SHL  SHR  MOV  LI  JMP  BEQ/BNE  CMP  CALL  RET
```
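To make the floor concrete, the following sketch models a toy 16-bit register machine that executes only these operations. It then uses that machine to multiply, which the floor does not provide, by shift-and-add. Rust stands in here because `.t0` is not available on a host. The `Op` encoding, the eight-register file, the single equality flag set by CMP and the call stack behind CALL/RET are all assumptions made for illustration. They are not the encoding described in 06-ir.md, and `Machine` and `mul_via_floor` are hypothetical names.

```rust
/// Hypothetical encoding of the floor operations for a toy 16-bit machine.
/// Register operands index an 8-entry register file.
#[allow(dead_code)] // not every floor op is used by the example program
#[derive(Clone, Copy, Debug)]
enum Op {
    Load(usize, usize),  // rd <- mem[ra]
    Store(usize, usize), // mem[ra] <- rs
    Add(usize, usize),   // rd <- rd + rs (wrapping)
    Sub(usize, usize),   // rd <- rd - rs (wrapping)
    And(usize, usize),   // rd <- rd & rs
    Or(usize, usize),    // rd <- rd | rs
    Xor(usize, usize),   // rd <- rd ^ rs
    Shl(usize, u32),     // rd <- rd << k
    Shr(usize, u32),     // rd <- rd >> k (logical)
    Mov(usize, usize),   // rd <- rs
    Li(usize, u16),      // rd <- immediate
    Jmp(usize),          // pc <- target
    Cmp(usize, usize),   // eq flag <- (ra == rb)
    Beq(usize),          // branch if eq flag set
    Bne(usize),          // branch if eq flag clear
    Call(usize),         // push return address, jump
    Ret,                 // pop return address; halt if the call stack is empty
}
struct Machine { regs: [u16; 8], mem: Vec<u16>, eq: bool, stack: Vec<usize> }
impl Machine {
    fn new(words: usize) -> Self {
        Machine { regs: [0; 8], mem: vec![0; words], eq: false, stack: Vec::new() }
    }
    fn run(&mut self, prog: &[Op]) {
        let mut pc = 0;
        while pc < prog.len() {
            let mut next = pc + 1;
            match prog[pc] {
                Op::Load(d, a) => self.regs[d] = self.mem[self.regs[a] as usize],
                Op::Store(s, a) => {
                    let addr = self.regs[a] as usize;
                    self.mem[addr] = self.regs[s];
                }
                Op::Add(d, s) => self.regs[d] = self.regs[d].wrapping_add(self.regs[s]),
                Op::Sub(d, s) => self.regs[d] = self.regs[d].wrapping_sub(self.regs[s]),
                Op::And(d, s) => self.regs[d] = self.regs[d] & self.regs[s],
                Op::Or(d, s) => self.regs[d] = self.regs[d] | self.regs[s],
                Op::Xor(d, s) => self.regs[d] = self.regs[d] ^ self.regs[s],
                Op::Shl(d, k) => self.regs[d] = self.regs[d].wrapping_shl(k),
                Op::Shr(d, k) => self.regs[d] = self.regs[d].wrapping_shr(k),
                Op::Mov(d, s) => self.regs[d] = self.regs[s],
                Op::Li(d, imm) => self.regs[d] = imm,
                Op::Jmp(t) => next = t,
                Op::Cmp(a, b) => self.eq = self.regs[a] == self.regs[b],
                Op::Beq(t) => if self.eq { next = t },
                Op::Bne(t) => if !self.eq { next = t },
                Op::Call(t) => {
                    self.stack.push(pc + 1);
                    next = t;
                }
                Op::Ret => match self.stack.pop() {
                    Some(ret) => next = ret,
                    None => return,
                },
            }
            pc = next;
        }
    }
}
/// r7 <- a * b (mod 2^16) using only floor operations: there is no MUL.
/// The subroutine starting at index 7 is classic shift-and-add multiplication.
fn mul_via_floor(a: u16, b: u16) -> u16 {
    let prog = [
        Op::Li(0, a),    // 0: multiplicand
        Op::Li(1, b),    // 1: multiplier
        Op::Call(7),     // 2: r2 <- r0 * r1
        Op::Li(6, 0),    // 3: r6 <- address 0
        Op::Store(2, 6), // 4: mem[0] <- product
        Op::Load(7, 6),  // 5: r7 <- mem[0]
        Op::Ret,         // 6: empty call stack, so the machine halts
        Op::Xor(2, 2),   // 7: acc <- 0
        Op::Xor(3, 3),   // 8: r3 <- 0
        Op::Li(4, 1),    // 9: r4 <- 1 (low-bit mask)
        Op::Cmp(1, 3),   // 10: multiplier == 0?
        Op::Beq(20),     // 11: yes, return
        Op::Mov(5, 1),   // 12: r5 <- multiplier
        Op::And(5, 4),   // 13: r5 <- multiplier & 1
        Op::Cmp(5, 3),   // 14: low bit clear?
        Op::Beq(17),     // 15: yes, skip the add
        Op::Add(2, 0),   // 16: acc += multiplicand
        Op::Shl(0, 1),   // 17: multiplicand <<= 1
        Op::Shr(1, 1),   // 18: multiplier >>= 1
        Op::Jmp(10),     // 19: loop
        Op::Ret,         // 20: return to caller
    ];
    let mut m = Machine::new(16);
    m.run(&prog);
    m.regs[7]
}
```

For example, `mul_via_floor(6, 7)` yields 42. A loop of SHL, SHR, AND and ADD plus a conditional branch is one way to lower any arithmetic that the floor leaves out.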
Bootstrap sequence:

```
seed.asm (hand-written, ~300-400 lines per target)
  ├─ probes hardware via BIOS / Toolbox
  ├─ writes 16-byte capability record to fixed address 0x0800
  ├─ loads STAGE1.BIN from storage
  └─ jumps to Stage 1
Stage 1 compiles itself (bit-identical check)
  └─ unlocks next tier based on capability record
```
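The real field layout of the capability record is specified in 02-capability-record.md and is not reproduced here. The Rust sketch below shows how a 16-byte record could drive tier gating. It encodes and decodes a hypothetical layout (magic bytes, CPU class, storage flags, RAM in KB) and takes the tier ceiling as the lower of a CPU limit and a RAM limit. Every field, constant and threshold is an assumption. The thresholds were chosen only so that the three machines in the hardware table land on their listed ceilings.

```rust
/// Hypothetical CPU class byte; the real encoding is defined in 02-capability-record.md.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CpuClass { X86Real16 = 0, X86Flat32 = 1, M68k = 2 }

// Hypothetical storage flag bits.
const STORAGE_FLOPPY: u8 = 1 << 0;
const STORAGE_HDD: u8 = 1 << 1;
const STORAGE_RAMDRIVE: u8 = 1 << 2;
const STORAGE_CD: u8 = 1 << 3;

/// Hypothetical marker so that stale or uninitialized memory is not mistaken for a record.
const MAGIC: [u8; 2] = *b"CR";

#[derive(Clone, Copy, Debug, PartialEq)]
struct Capabilities { cpu: CpuClass, storage: u8, ram_kb: u32 }

/// Hypothetical layout: [0..2] magic, [2] CPU class, [3] storage flags,
/// [4..8] RAM in KB (little-endian), [8..16] reserved (zero).
fn encode(c: &Capabilities) -> [u8; 16] {
    let mut rec = [0u8; 16];
    rec[0..2].copy_from_slice(&MAGIC);
    rec[2] = c.cpu as u8;
    rec[3] = c.storage;
    rec[4..8].copy_from_slice(&c.ram_kb.to_le_bytes());
    rec
}

fn decode(rec: &[u8; 16]) -> Option<Capabilities> {
    if rec[0..2] != MAGIC[..] {
        return None;
    }
    let cpu = match rec[2] {
        0 => CpuClass::X86Real16,
        1 => CpuClass::X86Flat32,
        2 => CpuClass::M68k,
        _ => return None, // unknown CPU: refuse rather than guess
    };
    let ram_kb = u32::from_le_bytes([rec[4], rec[5], rec[6], rec[7]]);
    Some(Capabilities { cpu, storage: rec[3], ram_kb })
}

/// Hypothetical gating rule: the lower of what the CPU and the installed RAM allow.
fn tier_ceiling(c: &Capabilities) -> u8 {
    let by_cpu: u8 = match c.cpu {
        CpuClass::X86Real16 => 1,
        CpuClass::X86Flat32 => 2,
        CpuClass::M68k => 3,
    };
    let by_ram: u8 = match c.ram_kb {
        kb if kb >= 32 * 1024 => 3,
        kb if kb >= 4 * 1024 => 2,
        kb if kb >= 512 => 1,
        _ => 0,
    };
    by_cpu.min(by_ram)
}
```

The sketch fixes byte order as little-endian. The x86 targets are little-endian and the 68k is big-endian, so a record format shared by every probe should state its byte order explicitly rather than inherit it from whichever CPU wrote the record.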
Compilation pipeline (each stage and what it produces):

```
source bytes (.t0)
  → lexer.t0        token stream
  → parser.t0       symbol table + type checking + IR emission
  → ir.t0           flat IrBlock (max 255 instructions per function)
  → codegen_*.t0    target NASM / Motorola assembly
  → assembler       flat binary
  → real hardware
```
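ir.t0 stores each function as a flat IrBlock capped at 255 instructions. The Rust sketch below shows one way such a block can work with no heap allocation and single-pass emission. Instructions live in a fixed array, and each push returns the new instruction's index. That index also names the value the instruction produces, so operands are plain `u8` indices. The four instruction kinds, `IrError` and the 0xFF "no value" sentinel are hypothetical. They offer one plausible reading of the 255 cap: every index fits in a byte, with one value left over. They do not reproduce the encoding in 06-ir.md.

```rust
/// A hypothetical subset of IR kinds; ir.t0 defines 45.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    Nop,
    Const(u16),  // produce an immediate value
    Add(u8, u8), // operands are indices of earlier instructions
    Ret(u8),     // NO_VALUE for a function that returns nothing
}

const MAX_INSTS: usize = 255;
/// Hypothetical sentinel: 0xFF is never a valid index because indices stop at 254.
const NO_VALUE: u8 = 0xFF;

#[derive(Debug, PartialEq)]
enum IrError {
    BlockFull,
    BadOperand(u8),
}

/// Flat, fixed-size instruction storage: no allocation after creation.
struct IrBlock {
    insts: [Inst; MAX_INSTS],
    len: u8,
}

impl IrBlock {
    fn new() -> Self {
        IrBlock { insts: [Inst::Nop; MAX_INSTS], len: 0 }
    }

    /// Appends `inst` and returns its index, which later instructions use as an operand.
    fn push(&mut self, inst: Inst) -> Result<u8, IrError> {
        if self.len as usize == MAX_INSTS {
            return Err(IrError::BlockFull);
        }
        let operands = match inst {
            Inst::Add(a, b) => [a, b],
            Inst::Ret(a) => [a, NO_VALUE],
            Inst::Nop | Inst::Const(_) => [NO_VALUE, NO_VALUE],
        };
        // Single pass: an operand must name an instruction that already exists.
        for op in operands {
            if op != NO_VALUE && op >= self.len {
                return Err(IrError::BadOperand(op));
            }
        }
        let idx = self.len;
        self.insts[idx as usize] = inst;
        self.len += 1;
        Ok(idx)
    }

    fn get(&self, idx: u8) -> Option<Inst> {
        if idx < self.len { Some(self.insts[idx as usize]) } else { None }
    }
}
```

Rejecting forward references at push time matches a single-pass parser that emits IR as it goes. Any operand must already have been emitted by the time it is used.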
| Feature | V30 (16-bit real) | i486 (32-bit flat) | 68k (Mac OS 8) |
|---|---|---|---|
| Working registers | AX BX CX DX SI DI | EAX EBX ECX EDX ESI EDI | D0-D5, A0-A1 |
| Pointer width | 2 bytes | 4 bytes | 4 bytes |
| 32-bit values | DX:AX split | Single register | Single Dn register |
| Comparison → bool | Short-jump hack (8 bytes) | SETCC (3 bytes) | Branch workaround |
| Zero-extension | XOR + MOV | MOVZX | MOVE (unsigned) |
| Sign-extension | CBW trick | MOVSX | EXT.W / EXT.L |
| Frame setup | PUSH BP / MOV BP,SP / SUB SP,n | PUSH EBP / MOV EBP,ESP / SUB ESP,n | LINK A6,#-n |
| Frame teardown | MOV SP,BP / POP BP | MOV ESP,EBP / POP EBP | UNLK A6 |
| Operand order | dest, src (Intel) | dest, src (Intel) | src, dest (Motorola) |
| XOR mnemonic | XOR | XOR | EOR |
| Call / Return | CALL / RET | CALL / RET | JSR / RTS |
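Much of the table comes down to small per-target lookups. The Rust sketch below emits, as assembly text, the frame setup and teardown rows, a register copy in each operand order, and the XOR/EOR zeroing idiom. The function names (`prologue`, `epilogue`, `copy_reg`, `zero_reg`) and the exact spacing are hypothetical, and this is not how the `codegen_*.t0` files are organized. The 68k copy assumes 32-bit (`.L`) operands.

```rust
/// The three code generation targets from the table.
#[derive(Clone, Copy, Debug)]
enum Target {
    V30,
    I486,
    M68k,
}

/// Frame setup reserving `n` bytes of locals (the "Frame setup" row).
fn prologue(t: Target, n: u16) -> Vec<String> {
    match t {
        Target::V30 => vec!["PUSH BP".to_string(), "MOV BP, SP".to_string(), format!("SUB SP, {n}")],
        Target::I486 => vec!["PUSH EBP".to_string(), "MOV EBP, ESP".to_string(), format!("SUB ESP, {n}")],
        // LINK pushes A6, copies SP into A6, then adds the negative displacement to SP.
        Target::M68k => vec![format!("LINK A6,#-{n}")],
    }
}

/// Frame teardown plus return (the "Frame teardown" and "Call / Return" rows).
fn epilogue(t: Target) -> Vec<String> {
    let lines = match t {
        Target::V30 => vec!["MOV SP, BP", "POP BP", "RET"],
        Target::I486 => vec!["MOV ESP, EBP", "POP EBP", "RET"],
        Target::M68k => vec!["UNLK A6", "RTS"],
    };
    lines.into_iter().map(String::from).collect()
}

/// Register-to-register copy: Intel syntax is `dest, src`, Motorola is `src,dest`.
fn copy_reg(t: Target, dst: &str, src: &str) -> String {
    match t {
        Target::V30 | Target::I486 => format!("MOV {dst}, {src}"),
        Target::M68k => format!("MOVE.L {src},{dst}"),
    }
}

/// Zeroing a register by XOR with itself: same idiom, different mnemonic on the 68k.
fn zero_reg(t: Target, r: &str) -> String {
    match t {
        Target::V30 | Target::I486 => format!("XOR {r}, {r}"),
        Target::M68k => format!("EOR.L {r},{r}"),
    }
}
```

Expressing these differences as small lookups is one way to keep the IR and register allocator shared, so that only the emitted text varies per target.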
Known limitations, documented in docs/design/08-pipeline-status.md:

- Spill reload is stubbed in all three code generators, so a function with more than 6 simultaneously live IR values produces incorrect output. The probe functions keep at most 4 values live, so they compile correctly.
- Array indexing, also in all three code generators, emits a plain dereference instead of computing base + i * sizeof(elem).
- The i486 and 68k seed binaries are not yet written; their architecture is documented.
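The spill limitation can be checked mechanically before a function reaches a code generator. The Rust sketch below counts how many IR values are live at each instruction index and compares the maximum with a budget of 6. That figure matches the six working registers on the V30 and i486 and the D0-D5 data registers on the 68k. The `LiveRange` type, with inclusive def and last-use indices, is an assumption, and ir.t0 may track liveness differently. Counting both ends as live makes the check slightly conservative.

```rust
/// Hypothetical live range of one IR value inside a single IrBlock:
/// defined at instruction `def`, last read at `last_use` (both inclusive).
#[derive(Clone, Copy, Debug)]
struct LiveRange {
    def: u8,
    last_use: u8,
}

/// Working registers available on every target before spill code is needed.
const REGISTER_BUDGET: usize = 6;

/// Largest number of values live at any one instruction index.
fn max_live(ranges: &[LiveRange]) -> usize {
    let end = ranges.iter().map(|r| r.last_use as usize + 1).max().unwrap_or(0);
    (0..end)
        .map(|i| {
            ranges
                .iter()
                .filter(|r| (r.def as usize) <= i && i <= (r.last_use as usize))
                .count()
        })
        .max()
        .unwrap_or(0)
}

/// True if the function stays within the register budget, which by the limitation
/// above should keep it off the stubbed spill-reload path.
fn fits_without_spill(ranges: &[LiveRange]) -> bool {
    max_live(ranges) <= REGISTER_BUDGET
}
```

With at most 255 instructions per function, the quadratic scan is cheap. It also needs no sorting or extra storage, which suits the smallest targets.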
Completed:

- Architecture, design notes, ISA research (docs/design/00-08)
- Capability record format (16 bytes, fixed address 0x0800)
- Growth tier design (Tier 0-3, capability-gated)
- Tier 0 grammar (full EBNF, 22 keywords, no type inference)
- symbol_table.t0: types, scopes, frame tracking (334 lines)
- lexer.t0: tokenizer, 73 token kinds (539 lines)
- parser.t0: single-pass recursive descent, fully wired to IR (1286 lines)
- ir.t0: flat IrBlock, 45 instruction kinds, all bridge functions (596 lines)
- probe_x86.t0: hardware probe, writes capability record (136 lines)
- codegen_v30.t0: V30/8086 real mode, LRU register allocator (1025 lines)
- codegen_i486.t0: 486 32-bit flat, SETCC, MOVZX/MOVSX (971 lines)
- codegen_m68k.t0: 68k Mac OS 8, LINK/UNLK, dual register files (1093 lines)
- bios_stubs.asm: all V30 extern functions, verified at 231 bytes
- seed.asm: V30/UltraLite trust anchor, verified as a 539-byte DOS .COM file
- Interactive ISA reference HTML (ENIAC to WebAssembly)
Remaining:

- Spill reload in all three code generators
- seed.asm for the i486 target (Versa 486: A20 enable, protected-mode entry)
- seed.asm for the m68k target (Mac OS 8 Toolbox startup sequence)
- toolbox_stubs.asm for m68k (Gestalt, FSFindFolder, BlockMove)
- Bootstrap host (a C or Rust wrapper that runs the compiler on a modern machine)
- Self-hosting check (compile self, verify bit-identical output)
- Tier 1-3 source files (impl blocks, generics, closures)
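The self-hosting check is a fixed-point test. The Stage 1 binary compiles the compiler's own source, and the result must match Stage 1 byte for byte. Below is a minimal host-side sketch in Rust. It assumes both images are available as files on the modern bootstrap host, and the names `first_mismatch` and `check_fixed_point` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Offset of the first byte where the two images differ, or None if they are identical.
/// A length difference counts as a mismatch at the end of the shorter image.
fn first_mismatch(a: &[u8], b: &[u8]) -> Option<usize> {
    if let Some(i) = a.iter().zip(b).position(|(x, y)| x != y) {
        return Some(i);
    }
    if a.len() != b.len() {
        Some(a.len().min(b.len()))
    } else {
        None
    }
}

/// Compares the running compiler's binary with the binary it produced from its own source.
/// Ok(Ok(())) means bit-identical; Ok(Err(offset)) reports where they first diverge.
fn check_fixed_point(stage1: &Path, rebuilt: &Path) -> io::Result<Result<(), usize>> {
    let a = fs::read(stage1)?;
    let b = fs::read(rebuilt)?;
    Ok(match first_mismatch(&a, &b) {
        None => Ok(()),
        Some(offset) => Err(offset),
    })
}
```

A bare yes/no answer would only say that the check failed. Reporting an offset lets the divergence be looked up in an assembler listing to find which function's output changed.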