Run the smallest llama2.c model (stories260K) inside Scratch/TurboWarp by compiling C inference code to Scratch blocks with llvm2scratch.
If everything is working, the sprite will start generating the familiar opening:
Once upon a time, ... (streamed into the speech bubble token-by-token).
- Scratch project: https://scratch.mit.edu/projects/1277883263
This repo vendors two upstream projects in-tree for reproducibility:
- `llama2.c` by Andrej Karpathy (MIT). Source: `llama2.c/` and `llama2.c/LICENSE`.
- `llvm2scratch` by Classfied3D (MIT). Source: `llvm2scratch/` and `llvm2scratch/LICENSE`.
The model/tokenizer artifacts in `artifacts/` come from the llama2.c ecosystem.
High-level pipeline:
- `scratch_llama2/build_stories260k_sprite3.py` reads:
  - `artifacts/stories260K.bin` (the smallest llama2.c checkpoint)
  - `artifacts/tok512.bin` (tokenizer vocabulary)
- It quantizes the weight matrices to Q8_0 (group size 4) and packs 4 signed int8 values into one u32.
- It lays out everything into a single Scratch list `!stack`:
  - packed weights + per-group scales
  - RMSNorm weights
  - RoPE cos/sin tables (for a reduced `SEQ_LEN`)
  - runtime buffers (x/xb/hb/q/att + KV cache)
- It writes `scratch_llama2/generated_layout.h` with 1-indexed addresses into `!stack`.
- It compiles `scratch_llama2/llama2_scratch.c` to LLVM IR (`scratch_llama2/llama2_scratch.ll`) using `clang --target=i386-none-elf` (keeps pointers as 32-bit ints).
- It runs `llvm2scratch` to turn the LLVM IR into Scratch blocks, then exports `.sprite3` and `.sb3` outputs.
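The Q8_0 step above can be sketched in a few lines. This is an illustrative re-implementation (not the build script's actual code); the group size of 4 and the 4-int8-per-u32 packing match the layout described above:

```python
import struct

GROUP_SIZE = 4  # Q8_0 group size used by the build script

def quantize_q8_0(weights):
    """Quantize floats to int8 with one scale per group of 4,
    then pack each group into a single little-endian u32."""
    packed, scales = [], []
    for i in range(0, len(weights), GROUP_SIZE):
        group = weights[i:i + GROUP_SIZE]
        amax = max(abs(w) for w in group)
        scale = amax / 127.0 if amax > 0 else 1.0
        q = [max(-127, min(127, round(w / scale))) for w in group]
        # Reinterpret 4 signed int8 bytes as one unsigned 32-bit value.
        (u32,) = struct.unpack("<I", struct.pack("<4b", *q))
        packed.append(u32)
        scales.append(scale)
    return packed, scales

# One group of 4 floats -> one packed u32 + one scale.
packed, scales = quantize_q8_0([0.5, -0.25, 0.125, 1.0])
```

Dequantization is then just `int8 * scale` per group, which keeps the per-weight storage at one byte plus a small per-group overhead.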
Runtime UI:
- `!!output` (list) stores generated token IDs.
- `!!vocab` (list) stores token pieces (strings).
- `!!text` (variable) accumulates decoded text; the sprite says it continuously.
- `!!resets` (variable) increments when the compiler triggers a broadcast-based "stack reset" (a progress indicator that also avoids JS call stack blowups).
- `!!status` (variable) shows a high-level state machine (`Edit params...` -> `Running...` -> `Done.`).
- `ui_*` variables let you adjust sampling/generation settings from the TurboWarp/Scratch UI.
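The token-streaming behavior behind these lists is conceptually simple. A minimal Python sketch, using a toy vocabulary (not the real tok512 contents):

```python
# Toy vocabulary standing in for the !!vocab list (illustrative only).
vocab = ["<bos>", "Once", " upon", " a", " time", ","]

def stream_tokens(token_ids):
    """Mimic the sprite: log each token id (-> !!output), decode it
    through the vocab (-> !!vocab), and grow the text (-> !!text)."""
    output, text = [], ""
    for tok in token_ids:
        output.append(tok)   # debugging trail of raw token IDs
        text += vocab[tok]   # accumulated text shown in the speech bubble
    return output, text

out, text = stream_tokens([1, 2, 3, 4, 5])
# text == "Once upon a time,"
```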
Requires:
- `clang`
- `uv` (and Python >= 3.12; `llvm2scratch` requires it)
Command:
```shell
# If you don't have a usable Python yet:
#   uv python install 3.12
#
# Optional env vars:
#   MAX_BRANCH_RECURSION (default 200): stack reset frequency for TurboWarp
#     stability/perf. Lower = more stable (less likely to hit "Maximum call
#     stack size exceeded"), but slower. Higher = faster, but can crash in
#     TurboWarp.
#   GEN_STEPS (default 20): upper bound on tokens to generate
#     (must be <= SEQ_LEN, currently 32).
#
# llvm2scratch requires Python >= 3.12; pin via --python to avoid uv
# picking an older system Python.
MAX_BRANCH_RECURSION=200 \
GEN_STEPS=20 \
uv run --python 3.12 --no-project --with-editable ./llvm2scratch \
  python scratch_llama2/build_stories260k_sprite3.py
```

Outputs:
- `scratch_llama2/stories260k_inference.sprite3`: sprite, blocks hidden (fast editor/import)
- `scratch_llama2/stories260k_inference_visible.sprite3`: sprite, blocks visible (debug)
- `scratch_llama2/stories260k_inference_visible.sb3`: standalone project wrapper around the visible sprite
- `scratch_llama2/stories260k_inference_visible_scratch.sprite3`: Scratch-compatible sprite (no TurboWarp-only blocks)
- `scratch_llama2/stories260k_inference_visible_scratch.sb3`: Scratch-compatible standalone project
Sprite workflow:
- Import `scratch_llama2/stories260k_inference_visible.sprite3` into TurboWarp (File -> Upload sprite, or drag/drop).
- Select the sprite.
- Click the green flag.
- Edit `ui_*` variables (Variables panel).
- Press space (or click the sprite) to start.
Project workflow:
- Open `scratch_llama2/stories260k_inference_visible.sb3` in TurboWarp (File -> Load from your computer).
- Click the green flag.
- Use the sliders/monitors on the stage to edit params.
- Press space (or click the sprite) to start.
What you should see:
- `!!status` updates: `Edit params...` -> `Running...` -> `Done.`
- `!!resets` increments periodically (a "still alive" indicator during long runs).
- As tokens are generated, the sprite streams decoded text into its speech bubble (`!!text`).
- For debugging, generated token IDs are appended to the `!!output` list.
Sampling UI:
- `ui_steps`: max tokens to generate (<= 32).
- `ui_temperature`: `0` => greedy; `>0` => sampling.
- `ui_top_k`: `1` => greedy; `>1` => top-k sampling.
- `ui_top_p`: nucleus cutoff in `(0, 1]` (use `1` to disable).
- `ui_seed`: nonzero => deterministic; `0` => pick a random seed at start.
- `ui_prompt_preset`: `0` => start from BOS; `1` => force the token prefix `Once upon a time,` (demo).
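How these knobs interact can be sketched roughly as follows. This is an illustrative Python re-implementation under the semantics listed above, not the actual C code in `llama2_scratch.c`:

```python
import math
import random

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random.random):
    """Pick a token id from raw logits (toy sampler, illustrative only)."""
    if temperature == 0 or top_k == 1:
        # Greedy: highest logit wins.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax (subtract max for numerical stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda p: p[1], reverse=True)
    if top_k > 1:
        probs = probs[:top_k]  # keep only the k most likely tokens
    if top_p < 1.0:
        kept, cum = [], 0.0
        for i, p in probs:     # nucleus: smallest prefix with mass >= top_p
            kept.append((i, p))
            cum += p
            if cum >= top_p:
                break
        probs = kept
    # Renormalize over the survivors and draw.
    mass = sum(p for _, p in probs)
    r = rng() * mass
    for i, p in probs:
        r -= p
        if r <= 0:
            return i
    return probs[-1][0]
```

Passing a seeded `rng` makes the draw deterministic, which mirrors the nonzero-`ui_seed` behavior.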
Use the `*_scratch.*` outputs:

- `scratch_llama2/stories260k_inference_visible_scratch.sb3` (recommended)
Scratch is significantly slower than TurboWarp, and does not support TurboWarp-only “hacked counter” blocks.
- `scratch_llama2/llama2_scratch.c` is inference-only and uses a reduced `SEQ_LEN` for Scratch feasibility.
- `llvm2scratch` is vendored here and patched to support pre-seeding `!stack` and a few extra IR patterns.
- Official Scratch does not support TurboWarp's hacked counter opcodes. Use the `*_scratch.*` outputs for scratch.mit.edu.
These are the key changes that made `llama2_scratch.c` viable:

- Preseeded memory: skip generating huge "initializer" scripts by directly injecting `!stack` at export time.
- i8 pointer arithmetic fix: clang emits `getelementptr i8` using byte offsets (4/8/12/...), but our "memory" is list-indexed; we scale i8 GEP indices back into 32-bit cells (`i8_gep_div=4`).
- Stack reset progress: an optional `!!resets` counter confirms the VM is still working during long runs (the speech bubble is kept for generated text).
- Token streaming: `SB3_emit_token_dbl` logs token IDs to `!!output`, decodes them through `!!vocab`, appends them to `!!text`, and continuously updates the sprite's speech bubble.
- Added intrinsic support: clang can emit `llvm.umin`/`llvm.umax`/`llvm.smin`/`llvm.smax`; llvm2scratch now translates these so `-O2` IR compiles.
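The i8 GEP fix amounts to dividing clang's byte offsets by the cell width. A tiny sketch of the idea (the function name is illustrative, not llvm2scratch's actual API):

```python
CELL_BYTES = 4  # each !stack entry holds one 32-bit cell

def scale_i8_gep(byte_offset):
    """Convert a clang-emitted i8 byte offset into a list (cell) index.
    Offsets must be cell-aligned, matching i8_gep_div=4."""
    assert byte_offset % CELL_BYTES == 0, "misaligned i8 GEP"
    return byte_offset // CELL_BYTES

# clang's byte offsets 0, 4, 8, 12 map to !stack cells 0, 1, 2, 3
```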
```bibtex
@misc{andrews2026llm_from_scratch,
  author = {Andrews, David},
  title = {llm\_from\_scratch},
  year = {2026},
  howpublished = {\url{https://github.com/broyojo/llm_from_scratch}}
}
```