
sormondocom/rusty-dusty-bones


rusty-dusty-bones

A self-hosting, self-organizing compiler for a minimal Rust-shaped language, bootstrapped from a 1960s instruction set floor, targeting real vintage hardware.

The idea

A compiler that arrives on a machine, interrogates it, and decides what it can become. Not a stripped-down Rust — a seed language whose semantics can be fully expressed using the 14 operations present on every stored-program machine since the IBM System/360 (1964). From that seed the language grows, gated by what the hardware underneath actually supports.

Target machines

| Machine | CPU | RAM | Storage | Tier ceiling |
|---|---|---|---|---|
| NEC UltraLite (1988) | NEC V30 (8086-class) | 640KB | RAM drive + floppy | Tier 0–1 |
| NEC Versa (1st gen, 1993) | Intel 486SL @ 20MHz | 4–20MB | HDD + floppy | Tier 2 |
| Mac OS 8 machine | 68k or PowerPC | 32–256MB | HDD + CD | Tier 3 |

Repository layout

rusty-dusty-bones/
├── README.md
├── docs/
│   ├── design/
│   │   ├── 00-concept.md            philosophy, spiritual ancestors
│   │   ├── 01-isa-floor.md          target machine survey, ISA comparison
│   │   ├── 02-capability-record.md  the 16-byte hardware manifest format
│   │   ├── 03-growth-tiers.md       capability-gated language tiers
│   │   ├── 04-grammar.md            Tier 0 EBNF grammar (complete)
│   │   ├── 05-self-organization.md  hardware interrogation architecture
│   │   ├── 06-ir.md                 IR instruction set and encoding
│   │   ├── 07-codegen-v30.md        V30 register model, calling convention
│   │   └── 08-pipeline-status.md    current status, known limitations
│   └── research/
│       └── isa_reference.html       interactive ISA reference (ENIAC to WebAssembly)
├── src/
│   └── tier0/
│       ├── symbol_table.t0          types, names, scopes — the compiler's memory
│       ├── lexer.t0                 tokenizer (73 token kinds, keyword table)
│       ├── parser.t0                recursive descent, single pass, wired to IR
│       ├── ir.t0                    flat IrBlock, 45 IR kinds, bridge functions
│       └── probe_x86.t0             hardware probe — writes capability record
└── targets/
    ├── v30/
    │   ├── codegen_v30.t0           V30/8086 real mode code generator
    │   ├── bios_stubs.asm           extern function implementations (231 bytes)
    │   ├── seed.asm                 trust anchor — runs first on UltraLite (539 bytes)
    │   ├── codegen_v30_example.asm  traced example: detect_ram() full output
    │   └── README.md
    ├── i486/
    │   ├── codegen_i486.t0          486 32-bit protected mode code generator
    │   └── README.md
    └── m68k/
        ├── codegen_m68k.t0          Motorola 68k / Mac OS 8 code generator
        └── README.md

The 14-operation floor

Every compiler pass, every hardware probe, every capability-record write is expressible using only these operations — present on every machine since 1964:

LOAD  STORE  ADD  SUB  AND  OR  XOR  SHL  SHR  MOV  LI  JMP  BEQ/BNE  CMP  CALL  RET
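
As an illustration of what "expressible using only these operations" means in practice, here is a minimal sketch in Rust of a 16-bit multiply built from shift-and-add, since the floor has no MUL. The function name `mul_floor` and the pseudo-mnemonics in the comments are assumptions for illustration, not code from this repository.

```rust
/// Multiply two 16-bit values using only floor-style steps:
/// AND, ADD, SHL, SHR, compare, and conditional branch.
/// The result wraps at 16 bits, as a 16-bit register would.
pub fn mul_floor(mut a: u16, mut b: u16) -> u16 {
    let mut acc: u16 = 0; // LI  acc, 0
    while b != 0 {
        // CMP b, 0 ; BEQ done
        if b & 1 != 0 {
            // AND tmp, b, 1 ; BEQ skip
            acc = acc.wrapping_add(a); // ADD acc, a
        }
        a <<= 1; // SHL a, 1 (the top bit falls off)
        b >>= 1; // SHR b, 1
    }
    acc
}
```

Each loop iteration maps onto a handful of floor operations, so the same routine could in principle be written by hand for any of the target machines.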

Bootstrapping chain

seed.asm  (hand-written, ~300-400 lines per target)
  probes hardware via BIOS / Toolbox
  writes 16-byte capability record to fixed address
  loads STAGE1.BIN from storage
  jumps to Stage 1
    Stage 1 compiles itself (bit-identical check)
    unlocks next tier based on capability record
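
The record-then-unlock step can be sketched as follows. This is a hypothetical layout: the README fixes only the record size (16 bytes), so the field offsets, the `cpu_class` and `ram_kb` fields, and the RAM thresholds (loosely derived from the target machine table) are all assumptions; the actual format is specified in 02-capability-record.md.

```rust
/// Size of the capability record, as stated in the README.
pub const RECORD_LEN: usize = 16;

/// Hypothetical decoded view of the record. Field choice is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CapabilityRecord {
    pub cpu_class: u8, // assumed: byte 0
    pub ram_kb: u32,   // assumed: bytes 4..8, little-endian
}

impl CapabilityRecord {
    /// Serialize into the 16-byte form; unused bytes stay zero.
    pub fn to_bytes(&self) -> [u8; RECORD_LEN] {
        let mut out = [0u8; RECORD_LEN];
        out[0] = self.cpu_class;
        out[4..8].copy_from_slice(&self.ram_kb.to_le_bytes());
        out
    }

    /// Decode a 16-byte record written by the seed.
    pub fn from_bytes(raw: &[u8; RECORD_LEN]) -> Self {
        let ram_kb = u32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]);
        CapabilityRecord { cpu_class: raw[0], ram_kb }
    }

    /// Assumed gating rule: under 4MB caps at Tier 1,
    /// under 32MB caps at Tier 2, anything larger reaches Tier 3.
    pub fn tier_ceiling(&self) -> u8 {
        if self.ram_kb < 4 * 1024 {
            1
        } else if self.ram_kb < 32 * 1024 {
            2
        } else {
            3
        }
    }
}
```

A real gating rule would presumably consult more than RAM size (CPU mode, storage, and so on); the point here is only that Stage 1 reads a fixed-size record and derives a ceiling from it.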

Pipeline

source bytes (.t0)
  lexer.t0           token stream
    parser.t0        symbol table + type checking + IR emission
      ir.t0          flat IrBlock (max 255 instructions per function)
        codegen_*.t0   target NASM / Motorola assembly
          assembler    flat binary
            real hardware
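
To make the "flat IrBlock (max 255 instructions per function)" stage concrete, here is a rough sketch of a fixed-capacity block whose length fits in one byte. The `IrInst` fields and the `emit` signature are hypothetical; the real encoding is described in 06-ir.md.

```rust
/// Capacity stated in the README: at most 255 instructions per function.
pub const MAX_IR: usize = 255;

/// Hypothetical fixed-width instruction: a kind plus three operand slots.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct IrInst {
    pub kind: u8,
    pub dst: u8,
    pub a: u8,
    pub b: u8,
}

/// Flat block: no heap, no nesting, length held in a single byte.
pub struct IrBlock {
    insts: [IrInst; MAX_IR],
    len: u8,
}

impl IrBlock {
    pub fn new() -> Self {
        let nop = IrInst { kind: 0, dst: 0, a: 0, b: 0 };
        IrBlock { insts: [nop; MAX_IR], len: 0 }
    }

    /// Append an instruction and return its index, or fail when full.
    pub fn emit(&mut self, inst: IrInst) -> Result<u8, &'static str> {
        if self.len as usize == MAX_IR {
            return Err("IrBlock full");
        }
        let idx = self.len;
        self.insts[idx as usize] = inst;
        self.len += 1; // reaches at most 255, which still fits in u8
        Ok(idx)
    }

    pub fn len(&self) -> usize {
        self.len as usize
    }

    pub fn get(&self, idx: u8) -> Option<&IrInst> {
        if idx < self.len { Some(&self.insts[idx as usize]) } else { None }
    }
}
```

Keeping the block flat and byte-indexed means every instruction reference is a single byte, which suits a 16-bit target with 640KB of RAM.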

Code generator comparison

| Feature | V30 (16-bit real) | i486 (32-bit flat) | 68k (Mac OS 8) |
|---|---|---|---|
| Working registers | AX BX CX DX SI DI | EAX EBX ECX EDX ESI EDI | D0-D5, A0-A1 |
| Pointer width | 2 bytes | 4 bytes | 4 bytes |
| 32-bit values | DX:AX split | Single register | Single Dn register |
| Comparison → bool | Short-jump hack (8 bytes) | SETCC (3 bytes) | Branch workaround |
| Zero-extension | XOR + MOV | MOVZX | MOVE (unsigned) |
| Sign-extension | CBW trick | MOVSX | EXT.W / EXT.L |
| Frame setup | PUSH BP / MOV BP,SP / SUB SP,n | PUSH EBP / MOV EBP,ESP / SUB ESP,n | LINK A6,#-n |
| Frame teardown | MOV SP,BP / POP BP | MOV ESP,EBP / POP EBP | UNLK A6 |
| Operand order | dest, src (Intel) | dest, src (Intel) | src, dest (Motorola) |
| XOR mnemonic | XOR | XOR | EOR |
| Call / Return | CALL / RET | CALL / RET | JSR / RTS |
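
The "DX:AX split" row means a 32-bit value on the V30 lives in two 16-bit registers. A minimal sketch of what a 32-bit add looks like under that model, written in Rust with 16-bit halves: the helper names are hypothetical, and the ADD/ADC pairing in the comments is the conventional 8086 idiom rather than something confirmed from codegen_v30.t0.

```rust
/// Split a 32-bit value into (high, low) 16-bit halves, as in DX:AX.
pub fn split(v: u32) -> (u16, u16) {
    ((v >> 16) as u16, v as u16)
}

/// Rejoin (high, low) halves into a 32-bit value.
pub fn join(hi: u16, lo: u16) -> u32 {
    ((hi as u32) << 16) | lo as u32
}

/// Add two split 32-bit values using only 16-bit arithmetic.
pub fn add32_split(a: (u16, u16), b: (u16, u16)) -> (u16, u16) {
    // Low halves first; the overflow flag plays the role of the carry flag.
    let (lo, carry) = a.1.overflowing_add(b.1); // ADD AX, lo
    // High halves plus the carry out of the low add.
    let hi = a.0.wrapping_add(b.0).wrapping_add(carry as u16); // ADC DX, hi
    (hi, lo)
}
```

On the i486 and 68k targets the same addition is a single instruction on a single register, which is the contrast the table row draws.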

Known limitations

All three code generators share these limitations, documented in 08-pipeline-status.md:

  • Spill reload is stubbed — functions with >6 simultaneously live IR values will produce incorrect output. The probe functions (<=4 live values) are fine.
  • Array indexing emits plain dereference rather than base + i * sizeof(elem).
  • The i486 and 68k seed binaries are not yet written (architecture documented).
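
For the array-indexing limitation, the intended address computation is base + i * sizeof(elem). A small sketch in Rust of the difference, assuming 2-byte pointers as on the V30 and a little-endian byte array standing in for memory; both function names are hypothetical.

```rust
/// Address of element `index` in an array starting at `base`,
/// wrapping at 16 bits like a real-mode offset would.
pub fn element_address(base: u16, index: u16, elem_size: u16) -> u16 {
    base.wrapping_add(index.wrapping_mul(elem_size))
}

/// Load a 16-bit little-endian element from `mem` at base + index * 2.
/// Panics if the computed address falls outside `mem`.
pub fn load_elem_u16(mem: &[u8], base: u16, index: u16) -> u16 {
    let addr = element_address(base, index, 2) as usize;
    u16::from_le_bytes([mem[addr], mem[addr + 1]])
}
```

A plain dereference corresponds to calling `load_elem_u16` with `index` fixed at 0, which always yields the first element regardless of the index written in the source.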

Status

Complete

  • Architecture, design notes, ISA research (docs/design/00-08)
  • Capability record format (16 bytes, fixed address 0x0800)
  • Growth tier design (Tier 0-3, capability-gated)
  • Tier 0 grammar (full EBNF, 22 keywords, no inference)
  • symbol_table.t0 — types, scopes, frame tracking (334 lines)
  • lexer.t0 — tokenizer, 73 token kinds (539 lines)
  • parser.t0 — single-pass recursive descent, fully wired to IR (1286 lines)
  • ir.t0 — flat IrBlock, 45 instruction kinds, all bridge functions (596 lines)
  • probe_x86.t0 — hardware probe, writes capability record (136 lines)
  • codegen_v30.t0 — V30/8086 real mode, LRU allocator (1025 lines)
  • codegen_i486.t0 — 486 32-bit flat, SETCC, MOVZX/MOVSX (971 lines)
  • codegen_m68k.t0 — 68k Mac OS 8, LINK/UNLK, dual register files (1093 lines)
  • bios_stubs.asm — all V30 extern functions, verified 231 bytes
  • seed.asm — V30/UltraLite trust anchor, verified 539 bytes DOS .COM
  • Interactive ISA reference HTML (ENIAC to WebAssembly)

Remaining

  • Spill reload in all three code generators
  • seed.asm for i486 target (Versa 486 + A20 + protected mode entry)
  • seed.asm for m68k target (Mac OS 8 Toolbox startup sequence)
  • toolbox_stubs.asm for m68k (Gestalt + FSFindFolder + BlockMove)
  • Bootstrap host (C or Rust wrapper to run compiler on a modern machine)
  • Self-hosting check (compile self, verify bit-identical output)
  • Tier 1-3 source files (impl blocks, generics, closures)

About

An exploration of how to build a Rust-shaped language up from a base-level assembly floor, on anything from the 1960s onward.
