Area
- Runtime / Core crates (stdlib/core/derive)
- Other
Problem statement
RFC 101's OrdinalMap intentionally keeps its default in-memory lookup layout hot and uncompressed. That is the right default for query speed, but future InQL/InQL-db persisted index artifacts may store very large OrdinalMap-style indices, where disk, network, cache, or resident-memory pressure matters more than immediate random-access speed.
During the RFC 101 compression spike, the current field_00000000 benchmark corpus compressed extremely well for persisted bytes:
- 1M key records: raw 14.0 B/key, zstd-3 0.392 B/key, lzma-3 0.110 B/key.
- 1M key section including offsets: raw 18.0 B/key, zstd-3 3.336 B/key, lzma-3 0.220 B/key.
- Modeled compact storage: raw 30.389 B/key, zstd-3 9.831 B/key, lzma-3 3.888 B/key.
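It can help to convert the per-key figures into corpus totals and ratios against raw. The helpers below are purely illustrative arithmetic (`total_bytes` and `compression_ratio` are names made up for this sketch). The input rates are the spike measurements listed above.

```rust
/// Total persisted bytes for a corpus of `keys` records at a measured
/// bytes-per-key rate.
fn total_bytes(keys: u64, bytes_per_key: f64) -> f64 {
    keys as f64 * bytes_per_key
}

/// How many times smaller the compressed form is than the raw form
/// (raw B/key divided by compressed B/key).
fn compression_ratio(raw_b_per_key: f64, compressed_b_per_key: f64) -> f64 {
    raw_b_per_key / compressed_b_per_key
}
```

Applied to the figures above, the 1M key records are about 14 MB raw, versus about 392 KB with zstd-3 (roughly 36x smaller) and about 110 KB with lzma-3 (roughly 127x). The modeled compact storage shrinks roughly 3.1x with zstd-3 and 7.8x with lzma-3.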
The same spike showed why this should not blindly replace the hot lookup layout: block-compressed key records introduced no-cache random exact-verification costs of roughly 6.1 us/probe at 1024 records/block and 22.7 us/probe at 4096 records/block, far slower than the current lookup path.
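The probe cost comes from block granularity. With no block cache, verifying one key means decoding the block that contains it (at minimum up to the target record), so per-probe work grows with records per block. The sketch below illustrates that access pattern under stated assumptions: fixed-width key records, a hypothetical `BlockCompressedRecords` type, and a toy run-length codec standing in for zstd/lzma. It does not model the spike's actual layout or reproduce its timings.

```rust
/// Toy run-length codec standing in for a real block codec such as zstd or
/// lzma. Output is a sequence of (run_length, byte) pairs, runs capped at 255.
fn rle_encode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1;
        while i + run < input.len() && input[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Inverse of `rle_encode`.
fn rle_decode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Hypothetical compact layout: fixed-width key records grouped into blocks
/// of `records_per_block`, each block compressed independently.
struct BlockCompressedRecords {
    record_len: usize,
    records_per_block: usize,
    blocks: Vec<Vec<u8>>,
}

impl BlockCompressedRecords {
    fn build(records: &[Vec<u8>], record_len: usize, records_per_block: usize) -> Self {
        assert!(records_per_block > 0, "block size must be positive");
        let blocks = records
            .chunks(records_per_block)
            .map(|chunk| {
                let mut raw = Vec::with_capacity(chunk.len() * record_len);
                for record in chunk {
                    assert_eq!(record.len(), record_len, "records must be fixed-width");
                    raw.extend_from_slice(record);
                }
                rle_encode(&raw)
            })
            .collect();
        BlockCompressedRecords { record_len, records_per_block, blocks }
    }

    /// Exact verification of `key` against the record at `ordinal`.
    /// With no block cache, every probe decodes the entire containing block,
    /// so work per probe grows with `records_per_block`. Returns
    /// (matched, bytes_decoded) to make that cost visible.
    fn verify(&self, ordinal: usize, key: &[u8]) -> (bool, usize) {
        let block = match self.blocks.get(ordinal / self.records_per_block) {
            Some(block) => block,
            None => return (false, 0),
        };
        let raw = rle_decode(block);
        let start = (ordinal % self.records_per_block) * self.record_len;
        let matched = raw.get(start..start + self.record_len) == Some(key);
        (matched, raw.len())
    }
}
```

For full blocks in this sketch, doubling `records_per_block` doubles the bytes decoded per probe. A compact mode could add a small decoded-block cache, which is the case the no-cache figures deliberately leave out.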
Proposed solution
Add a future storage policy for persisted OrdinalMap / InQL index artifacts that can compress serialized sections while keeping the default query path decompressed in memory.
A likely shape:
- hot load policy: read compressed persisted artifact, decompress into the normal OrdinalMap layout, and preserve current lookup performance.
- compact load policy: keep selected verification/key sections compressed or block-compressed for memory-constrained workloads, with explicit slower exact lookup semantics.
- Persisted artifact metadata should record codec, compression level/profile, uncompressed section lengths, and the intended load policy.
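The metadata bullet could translate into a small fixed-layout header. This is a hypothetical sketch. The `Codec`, `LoadPolicy`, `SectionMeta`, and `ArtifactMeta` names, the byte layout, and the extra `compressed_len` field are all assumptions, not an existing InQL or OrdinalMap format. It shows how a fixed little-endian encoding with strict decoding can give a deterministic round trip and reject unknown codecs, unknown policies, or truncated input.

```rust
use std::convert::TryInto;

/// Hypothetical codec identifiers; the real set would come from std.compression.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Codec { Raw = 0, Zstd = 1, Lzma = 2 }

/// Intended load policy recorded by the writer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LoadPolicy { Hot = 0, Compact = 1 }

#[derive(Clone, Debug, PartialEq, Eq)]
struct SectionMeta {
    codec: Codec,
    level: i32,            // compression level/profile (zstd levels can be negative)
    uncompressed_len: u64, // lets a hot loader preallocate and verify output
    compressed_len: u64,   // assumed extra field: on-disk size of the section
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct ArtifactMeta {
    load_policy: LoadPolicy,
    sections: Vec<SectionMeta>,
}

/// Bytes per encoded section: codec (1) + level (4) + two u64 lengths (16).
const SECTION_LEN: usize = 1 + 4 + 8 + 8;

impl ArtifactMeta {
    /// Fixed little-endian layout: policy (u8), section count (u32), sections.
    fn encode(&self) -> Vec<u8> {
        let mut out = vec![self.load_policy as u8];
        out.extend_from_slice(&(self.sections.len() as u32).to_le_bytes());
        for s in &self.sections {
            out.push(s.codec as u8);
            out.extend_from_slice(&s.level.to_le_bytes());
            out.extend_from_slice(&s.uncompressed_len.to_le_bytes());
            out.extend_from_slice(&s.compressed_len.to_le_bytes());
        }
        out
    }

    /// Strict decode: unknown codec/policy values, truncation, or trailing
    /// bytes all yield `None`.
    fn decode(bytes: &[u8]) -> Option<Self> {
        let load_policy = match *bytes.first()? {
            0 => LoadPolicy::Hot,
            1 => LoadPolicy::Compact,
            _ => return None,
        };
        let count = u32::from_le_bytes(bytes.get(1..5)?.try_into().ok()?) as usize;
        let mut sections = Vec::new();
        let mut pos = 5;
        for _ in 0..count {
            let rec = bytes.get(pos..pos + SECTION_LEN)?;
            let codec = match rec[0] {
                0 => Codec::Raw,
                1 => Codec::Zstd,
                2 => Codec::Lzma,
                _ => return None,
            };
            sections.push(SectionMeta {
                codec,
                level: i32::from_le_bytes(rec[1..5].try_into().ok()?),
                uncompressed_len: u64::from_le_bytes(rec[5..13].try_into().ok()?),
                compressed_len: u64::from_le_bytes(rec[13..21].try_into().ok()?),
            });
            pos += SECTION_LEN;
        }
        if pos == bytes.len() { Some(ArtifactMeta { load_policy, sections }) } else { None }
    }
}
```

Recording the uncompressed length per section lets a hot loader preallocate each buffer and check every decompressed section before building the in-memory layout.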
This should build on the existing std.compression surface where possible rather than adding bespoke compression plumbing directly to OrdinalMap.
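Building on std.compression rather than wiring codecs into OrdinalMap could look like a thin adapter trait that the persistence code depends on. In this sketch, `SectionCodec`, `Stored`, `persist`, and `hot_load` are hypothetical names. `Stored` is a pass-through placeholder where a real adapter would call into std.compression, whose actual API is not shown here, and error handling is simplified. The hot load decompresses every section into plain buffers and checks each one against its recorded uncompressed length.

```rust
/// Hypothetical adapter trait. A real implementation would delegate to the
/// std.compression surface; OrdinalMap persistence only sees this trait.
trait SectionCodec {
    fn compress(&self, raw: &[u8]) -> Vec<u8>;
    fn decompress(&self, packed: &[u8]) -> Vec<u8>;
}

/// Pass-through placeholder standing in for a real zstd/lzma adapter.
struct Stored;

impl SectionCodec for Stored {
    fn compress(&self, raw: &[u8]) -> Vec<u8> {
        raw.to_vec()
    }
    fn decompress(&self, packed: &[u8]) -> Vec<u8> {
        packed.to_vec()
    }
}

/// One persisted section: packed bytes plus the recorded uncompressed length.
struct PersistedSection {
    uncompressed_len: usize,
    packed: Vec<u8>,
}

/// Compress each serialized section through the codec adapter.
fn persist(codec: &dyn SectionCodec, sections: &[Vec<u8>]) -> Vec<PersistedSection> {
    sections
        .iter()
        .map(|raw| PersistedSection { uncompressed_len: raw.len(), packed: codec.compress(raw) })
        .collect()
}

/// Hot load: decompress every section back into plain in-memory buffers,
/// failing if any decoded length disagrees with the recorded metadata.
fn hot_load(codec: &dyn SectionCodec, persisted: &[PersistedSection]) -> Result<Vec<Vec<u8>>, String> {
    persisted
        .iter()
        .enumerate()
        .map(|(i, s)| {
            let raw = codec.decompress(&s.packed);
            if raw.len() == s.uncompressed_len {
                Ok(raw)
            } else {
                Err(format!("section {}: expected {} bytes, got {}", i, s.uncompressed_len, raw.len()))
            }
        })
        .collect()
}
```

In this shape, the adapter is the only place that knows about codecs. Swapping zstd for lzma, or adding a compact loader, would not require touching OrdinalMap itself.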
Alternatives considered
- Do nothing: acceptable for RFC 101, but loses a likely storage win for future InQL-db index artifacts.
- Compress the default in-memory layout: rejected for now because random exact lookup becomes too slow unless the data is decompressed back into the hot layout.
- Only compress the full serialized blob externally: simple and probably enough for a first persisted-artifact path, but it does not cover future memory-constrained compact operation.
Scope / acceptance criteria
In scope:
- Define persisted-index compression metadata and codec policy for OrdinalMap-style artifacts.
- Benchmark compression ratio, load/decompress time, and lookup cost across compression-friendly and less compression-friendly key corpora.
- Support a default hot load path that decompresses into the existing fast in-memory layout.
- Consider an explicit compact mode only if its slower lookup semantics are documented and benchmarked.
Out of scope:
- Changing RFC 101's default OrdinalMap lookup layout.
- Making compressed lookup the default runtime behavior.
- Regressing current exact/unchecked lookup benchmark targets.
Done when:
- InQL/InQL-db can persist an OrdinalMap-style index with optional compression and load it back deterministically.
- Benchmarks show the storage win and quantify the load/lookup tradeoff.
- Documentation explains when to use hot vs compact loading.
Related context: RFC 101 / PR #597.