Reed-Solomon codes are the "magic" that allows damaged barcodes to still be readable. Named after Irving Reed and Gustave Solomon who invented them in 1960, they're used everywhere: CDs, DVDs, QR codes, satellite communications, and of course, Aztec codes.
Imagine you have 10 numbers to send, but you're worried some might get corrupted. Reed-Solomon lets you add extra "check" numbers that can reconstruct the originals if some get damaged.
Original: [5, 12, 8, 3, 7, 22, 15, 9, 4, 11]
↓ Add 4 parity numbers
Protected: [5, 12, 8, 3, 7, 22, 15, 9, 4, 11, P1, P2, P3, P4]
If up to 2 numbers get corrupted (including their positions being unknown), all 10 originals can be recovered!
Remember from algebra: a straight line (degree-1 polynomial) is defined by 2 points.
y = mx + b
Two points uniquely determine m and b
A parabola (degree-2 polynomial) needs 3 points:
y = ax² + bx + c
Three points uniquely determine a, b, and c
In general: a polynomial of degree n-1 is uniquely determined by n points.
Here's the key insight:
- Treat your data as coefficients of a polynomial
- Evaluate that polynomial at extra points
- Send both the original data AND those evaluations
- If some values get corrupted, you have redundant information to recover
Example with 3 data values:
Data: [5, 3, 7]
Think of this as the polynomial: D(x) = 5 + 3x + 7x²
Evaluate at extra points:
D(1) = 5 + 3 + 7 = 15
D(2) = 5 + 6 + 28 = 39
Send: [5, 3, 7, 15, 39]
Now if one value gets corrupted, you have 4 good values to reconstruct a degree-2 polynomial—more than enough!
Aztec uses "systematic" Reed-Solomon, meaning:
- The original data appears unchanged at the start
- Parity symbols are appended at the end
[Data1, Data2, ..., DataN, Parity1, Parity2, ..., ParityM]
This is convenient because if no errors occur, you can just read the data directly without decoding.
To calculate parity, we use a special "generator polynomial" G(x).
The generator polynomial has roots at consecutive powers of α:
G(x) = (x - α^1)(x - α^2)(x - α^3)...(x - α^m)
where m is the number of parity symbols.
For 4 parity symbols:
G(x) = (x - α)(x - α²)(x - α³)(x - α⁴)
When expanded, this gives polynomial coefficients we can use for encoding.
The roots are chosen so that the complete codeword (data + parity) is divisible by G(x). This means evaluating the codeword polynomial at α^1, α^2, etc., gives zero.
Any corruption changes these evaluations to non-zero values, which is how errors are detected!
Instead of polynomial division, we use an efficient method called a Linear Feedback Shift Register (LFSR).
Think of it like a pipeline:
Data flows in → [Reg0][Reg1][Reg2][Reg3] → Feedback
↑ ↑ ↑ ↑
tap0 tap1 tap2 tap3 (from G(x) coefficients)
Algorithm:
Initialize registers to 0
For each data codeword d:
feedback = d XOR last_register
Shift all registers right
For each register position:
register[i] XOR= feedback × G[i+1]
After all data: registers contain parity symbols
Let's encode 3 data symbols with 2 parity symbols.
Setup:
- Field: GF(8) with primitive polynomial x³ + x + 1
- Data: [5, 3, 7]
- Parity count: 2
Step 1: Build generator polynomial
G(x) = (x - α)(x - α²)
= x² - (α + α²)x + α³
= x² + 6x + 3 (in GF(8), subtraction = addition)
Coefficients: [3, 6, 1] (constant term first)
Step 2: LFSR encoding
Registers: [R0, R1] = [0, 0]
G taps: G[1]=6, G[2]=1
Process data[0] = 5:
feedback = 5 XOR R1 = 5 XOR 0 = 5
R1 = R0 XOR (5 × G[2]) = 0 XOR 5 = 5
R0 = 5 × G[1] = 5 × 6 = ... (Galois mult)
... (continue for each data symbol)
Final registers = parity symbols
With m parity symbols, Reed-Solomon can:
- Detect up to m errors
- Correct up to ⌊m/2⌋ errors (when positions are unknown)
- Correct up to m erasures (when positions are known)
Why the difference?
Finding an error requires determining both:
- Which position is wrong
- What the correct value should be
Each unknown uses up some of our redundancy. With unknown positions, each error "costs" 2 parity symbols.
In AztecLib, Reed-Solomon encoding follows this pattern:
// Create the Galois field
let gf = GaloisField(wordSizeInBits: 6, primitivePolynomial: 0x43)
// Create encoder with generator starting at α^1
let encoder = ReedSolomonEncoder(field: gf, startExponent: 1)
// Compute parity for data
let parity = encoder.makeParityCodewords(
for: dataCodewords,
parityCodewordCount: 4
)
// Or get data + parity combined
let allCodewords = encoder.appendingParity(
to: dataCodewords,
parityCodewordCount: 4
)When decoding, a scanner computes "syndromes" by evaluating the received codeword at α^1, α^2, etc.
No errors: All syndromes are 0 (because correct codeword is divisible by G(x)) Errors present: Some syndromes are non-zero
The pattern of non-zero syndromes reveals:
- How many errors occurred
- Where they are
- What the correct values should be
Error correction solves a system of equations:
S₁ = e₁·α^j₁ + e₂·α^j₂ + ...
S₂ = e₁·α^(2j₁) + e₂·α^(2j₂) + ...
...
Where:
- Sᵢ are the syndromes (known)
- eᵢ are error values (unknown)
- jᵢ are error positions (unknown)
Galois field math lets us solve this system efficiently using algorithms like Berlekamp-Massey or Euclidean algorithm.
The mode message (symbol metadata) uses its own small Reed-Solomon code over GF(16):
Compact symbols:
- 2 data nibbles (4 bits each)
- 5 parity nibbles
- Total: 28 bits
Full symbols:
- 4 data nibbles
- 6 parity nibbles
- Total: 40 bits
This heavy redundancy ensures the scanner can read the symbol structure even if the center is damaged.
| EC Level | Parity Ratio | Can Correct | Use Case |
|---|---|---|---|
| 5% | 1:20 | ~2.5% errors | Clean environment |
| 23% | ~1:4 | ~11% errors | Default, general use |
| 50% | 1:2 | ~25% errors | Harsh conditions |
| 95% | ~19:1 | ~47% errors | Maximum protection |
Higher error correction = larger symbols for same data.
- Reed-Solomon adds redundant parity symbols to detect and correct errors
- Generator polynomial G(x) defines how parity is calculated
- LFSR efficiently computes parity using shift register technique
- m parity symbols can correct up to m/2 errors or m erasures
- Galois field math makes all the calculations work within fixed bit widths
See Symbol Layout to understand how the protected codewords are arranged in the final barcode!