brysontang/ContextWars

MLX Token Convergence Experiments

Exploring how language models converge on token sequences — and what adversarial pressure reveals about their training.

Quick Start

uv run --with mlx-lm mlx_example.py -l 20 -m sequential   # Solo: 2 iterations
uv run --with mlx-lm mlx_example.py -l 20 -m adversarial  # Battle: never converges

The Core Finding

| Mode | Solo | Adversarial |
|------|------|-------------|
| Sequential (greedy L→R) | 2 iterations | Never converges (oscillates) |
| Single-token + blocking | ~2.5n iterations | ~3n iterations |

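A single model has coherent internal preferences and quickly settles into a stable state. Two different models have incompatible preferences: in sequential mode each pass overwrites the other model's choices, so the sequence keeps oscillating, and only position blocking forces the run to end.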
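The sequential row of the table can be reproduced in a toy setting. The sketch below assumes each model is a deterministic function from prefix to next token. `model_a`, `model_b`, `sequential_pass`, `run_solo`, and `run_adversarial` are hypothetical stand-ins for illustration, not code from mlx_example.py. One left-to-right pass leaves the sequence self-consistent for that model, so a second pass finds nothing to change. When two models with different preferences alternate, each pass undoes the other.

```python
def model_a(prefix):
    # Toy stand-in for a language model: previous token + 1 (mod 5).
    return (prefix[-1] + 1) % 5 if prefix else 0


def model_b(prefix):
    # A second toy model with different preferences: previous token + 2 (mod 5).
    return (prefix[-1] + 2) % 5 if prefix else 0


def sequential_pass(model, tokens):
    """Rewrite tokens left to right, recomputing the prediction after each swap."""
    out = list(tokens)
    changes = 0
    for i in range(len(out)):
        pred = model(out[:i])
        if out[i] != pred:
            out[i] = pred
            changes += 1
    return out, changes


def run_solo(model, tokens, max_iters=100):
    """Return (iterations until a pass changes nothing, final tokens)."""
    for it in range(1, max_iters + 1):
        tokens, changes = sequential_pass(model, tokens)
        if changes == 0:
            return it, tokens
    return None, tokens


def run_adversarial(a, b, tokens, max_iters=100):
    """Alternate full passes of two models; None means no convergence."""
    for it in range(1, max_iters + 1):
        tokens, changes_a = sequential_pass(a, tokens)
        tokens, changes_b = sequential_pass(b, tokens)
        if changes_a == 0 and changes_b == 0:
            return it, tokens
    return None, tokens


start = [3, 1, 4, 1, 2]
print(run_solo(model_a, start))                   # (2, [0, 1, 2, 3, 4])
print(run_adversarial(model_a, model_b, start))   # (None, [0, 2, 4, 1, 3])
```

Real models are not this tidy, but the mechanism is the same: a stable sequence has to be a fixed point of every model that gets a turn.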

Model Fingerprinting

The most interesting finding: adversarial pressure reveals what models are made of.

Solo vs Adversarial Token Insertions

| Model | Solo Mode | Adversarial Mode |
|-------|-----------|------------------|
| Llama 3B | Varied: '\n', ',', '{}' braces | Collapsed: '!' (100%) |
| Qwen 1.5B | Structured: '\n\n', '#' headers | Collapsed: spaces, Chinese chars |

Key Insights

1. Adversarial pressure collapses diversity

In solo mode, models use context-appropriate tokens. Under adversarial pressure, they fall back to whatever they're most confident about regardless of context.

2. The ! finding

Llama never inserts ! in solo mode, but under adversarial pressure it inserts nothing else. This isn't Llama's "favorite" token; it's Llama's most defensible token, the one it can justify in the widest range of contexts.

3. Chinese tokens only under pressure

Qwen's Chinese tokens (输入, 错误, 背景) don't appear in solo mode. They emerge only when Qwen is actively contested, as a fallback to vocabulary that Llama can't contest effectively.

4. Training strata revealed

Solo evaluation shows surface behavior. Adversarial pressure reveals composition — the training data that shows through when everything else is stripped away.

Modes

| Mode | Description |
|------|-------------|
| sequential | Greedy left-to-right, recompute after each token |
| single | One token per iteration + oscillation blocking |
| batch | Replace all mismatches at once |
| adversarial | Two models, full sequential pass each |
| adversarial-single | Two models, one token each + blocking + fingerprinting |
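To make the batch row concrete, here is a minimal sketch under the same kind of toy assumption: `toy_model` is a hypothetical deterministic predictor (previous token + 1 mod 5), and `batch_pass` and `run_batch` are illustrative names, not functions from mlx_example.py. Batch computes every prediction from the old sequence and swaps all mismatches together, so in this toy a correction only travels one position per pass.

```python
def toy_model(prefix):
    # Hypothetical stand-in for a language model: previous token + 1 (mod 5).
    return (prefix[-1] + 1) % 5 if prefix else 0


def batch_pass(model, tokens):
    """Predict every position from the OLD sequence, then replace all at once."""
    preds = [model(tokens[:i]) for i in range(len(tokens))]
    changes = sum(1 for old, new in zip(tokens, preds) if old != new)
    return preds, changes


def run_batch(model, tokens, max_iters=100):
    """Return (iterations until a pass changes nothing, final tokens)."""
    for it in range(1, max_iters + 1):
        tokens, changes = batch_pass(model, tokens)
        if changes == 0:
            return it, tokens
    return None, tokens


print(run_batch(toy_model, [3, 1, 4, 1, 2]))  # (6, [0, 1, 2, 3, 4])
```

With this toy predictor the five-token sequence needs five correcting passes plus one confirming pass. A sequential pass recomputes after each replacement, so on the same input it finishes its corrections in a single pass.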

Usage

# Solo modes
python mlx_example.py -l LENGTH -m single|batch|sequential

# Adversarial modes (with fingerprinting)
python mlx_example.py -l LENGTH -m adversarial|adversarial-single \
  --model-a mlx-community/Qwen2.5-1.5B-Instruct-4bit \
  --model-b mlx-community/Llama-3.2-3B-Instruct-4bit

# Options
-l, --length      Number of random tokens (default: 10)
-n, --iterations  Max iterations (default: 100)
-m, --mode        Convergence mode
--model-a         First model
--model-b         Second model (adversarial modes)

How It Works

  1. Start: Generate random token sequence
  2. Each iteration: Find highest-entropy position where model's prediction differs from actual
  3. Replace: Swap actual token with model's argmax prediction
  4. Track: Record what tokens each model tries to insert (fingerprinting)
  5. Block: Positions that flip 3+ times get blocked to prevent infinite oscillation
  6. Repeat: Until convergence or oscillation detected
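The steps above can be sketched in plain Python. This is a sketch under stated assumptions, not the repository's implementation. `make_toy_model` is a hypothetical stand-in that returns a next-token distribution over a four-token vocabulary. A "flip" is taken to mean any replacement at a position, and ties in entropy go to the rightmost position. The real script uses MLX models and may differ in all of these details.

```python
import math
from collections import Counter

VOCAB = 4  # assumed toy vocabulary size


def make_toy_model(shift, confidence):
    """Hypothetical model: prefers (previous token + shift) mod VOCAB."""
    def model(prefix):
        target = (prefix[-1] + shift) % VOCAB if prefix else 0
        rest = (1.0 - confidence) / (VOCAB - 1)
        return [confidence if t == target else rest for t in range(VOCAB)]
    return model


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def step(model, tokens, blocked, flips, fingerprint, max_flips=3):
    """One single-token iteration; returns True if a token was replaced."""
    candidates = []
    for i in range(len(tokens)):
        if i in blocked:
            continue
        probs = model(tokens[:i])
        pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if pred != tokens[i]:
            candidates.append((entropy(probs), i, pred))
    if not candidates:
        return False
    _, pos, pred = max(candidates)   # highest-entropy mismatch
    tokens[pos] = pred               # replace with the argmax prediction
    fingerprint[pred] += 1           # track what this model inserts
    flips[pos] += 1
    if flips[pos] >= max_flips:      # block positions that keep flipping
        blocked.add(pos)
    return True


def run(models, tokens, max_iters=100):
    """Give each model one step per iteration until nobody changes anything."""
    tokens = list(tokens)
    blocked, flips = set(), Counter()
    fingerprints = [Counter() for _ in models]
    for it in range(1, max_iters + 1):
        changed = [step(m, tokens, blocked, flips, fp)
                   for m, fp in zip(models, fingerprints)]
        if not any(changed):
            return it, tokens, fingerprints
    return None, tokens, fingerprints


models = [make_toy_model(1, 0.7), make_toy_model(2, 0.7)]
iterations, final, fingerprints = run(models, [3, 1, 0, 2, 1])
print(iterations, final, fingerprints)
```

In this sketch each position can be replaced at most three times before it is blocked, so a run of n tokens makes at most 3n replacements and always terminates, even when the two models never agree.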

Example Output

MODEL FINGERPRINTS (tokens each model compulsively inserts)
============================================================

Model A (Qwen) top insertions (44 total):
   27x  ' '
    5x  '-'
    4x  '!'
    1x  '的'
    1x  '们'

Model B (Llama) top insertions (43 total):
   43x  '!'
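One way to put a number on the collapse is to summarise each fingerprint by its top-token share and its Shannon entropy. The sketch below uses only the counts printed above. The helper names `top_share` and `entropy_bits` are hypothetical, and Model A's entropy covers just the five listed tokens (38 of its 44 insertions), so treat that figure as partial.

```python
import math
from collections import Counter

# Counts copied from the example output; Model A's list is truncated to its top 5.
qwen = Counter({" ": 27, "-": 5, "!": 4, "的": 1, "们": 1})
llama = Counter({"!": 43})


def top_share(counts, total=None):
    """Most frequent token and its share of all insertions."""
    total = total or sum(counts.values())
    token, n = counts.most_common(1)[0]
    return token, n / total


def entropy_bits(counts):
    """Shannon entropy (bits) of the insertion distribution; 0 means fully collapsed."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


for name, counts, total in [("Qwen", qwen, 44), ("Llama", llama, 43)]:
    token, share = top_share(counts, total)
    print(f"{name}: top token {token!r} at {share:.0%}, "
          f"entropy over listed tokens {entropy_bits(counts):.2f} bits")
```

A fingerprint made of a single token, like Llama's here, has zero entropy. The more tokens share the insertions, the higher the value.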

The "Model Mass Spectrometer"

This technique works like a mass spectrometer for models:

  • Apply adversarial pressure to strip away surface behavior
  • What remains reveals the training composition
  • Different models have different "elemental signatures"

Not what models prefer — what they're made of.

About

Adversarial token convergence experiments on MLX. Pits language models against each other to reveal training strata — solo mode converges in 2 iterations, adversarial mode never does. Model fingerprinting through pressure.
