feature - compressed persisted OrdinalMap artifacts for InQL indices #598

@dannymeijer

Description

Area

  • Runtime / Core crates (stdlib/core/derive)
  • Other

Problem statement

RFC 101 OrdinalMap intentionally keeps the default in-memory lookup layout hot and uncompressed. That is the right default for query speed, but future InQL/InQL-db persisted index artifacts may store very large OrdinalMap-style indices where disk, network, cache, or resident-memory pressure matters more than immediate random-access speed.

During the RFC 101 compression spike, the current field_00000000 benchmark corpus compressed extremely well for persisted bytes:

  • 1M key records: raw 14.0 B/key, zstd-3 0.392 B/key, lzma-3 0.110 B/key.
  • 1M key section including offsets: raw 18.0 B/key, zstd-3 3.336 B/key, lzma-3 0.220 B/key.
  • Modeled compact storage: raw 30.389 B/key, zstd-3 9.831 B/key, lzma-3 3.888 B/key.
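A minimal sketch of how a bytes-per-key figure of this kind can be measured. It assumes the corpus consists of sequential 14-byte keys shaped like field_00000000 (which matches the 14.0 B/key raw figure, but is otherwise a guess), and it uses Python's standard zlib and lzma modules as stand-ins; it does not use zstd or the std.compression surface, so its numbers will not match the spike's.

```python
import lzma
import zlib


def synthetic_keys(n):
    """Assumed corpus shape: sequential 14-byte keys such as field_00000000."""
    return [f"field_{i:08d}".encode("ascii") for i in range(n)]


def bytes_per_key(keys, compress):
    """Persisted size of the concatenated key records divided by the key count."""
    blob = b"".join(keys)
    return len(compress(blob)) / len(keys)


keys = synthetic_keys(100_000)
raw = bytes_per_key(keys, lambda b: b)                        # no compression
deflate = bytes_per_key(keys, lambda b: zlib.compress(b, 6))  # stand-in codec
xz = bytes_per_key(keys, lambda b: lzma.compress(b, preset=3))

print(f"raw {raw:.3f} B/key, zlib-6 {deflate:.3f} B/key, lzma-3 {xz:.3f} B/key")
```

Sequential keys are a very compression-friendly case, so a measurement like this is best read alongside less regular corpora.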

The same spike showed why this should not blindly replace the hot lookup layout: with block-compressed key records, uncached random exact verification cost roughly 6.1 µs/probe at 1024 records/block and 22.7 µs/probe at 4096 records/block, which is far slower than the current lookup path.
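The probe cost comes from having to decompress an entire block to check a single record. A rough sketch of that access pattern, assuming fixed-width 14-byte key records and using zlib as a stand-in codec (the function names and layout here are hypothetical, not the spike's implementation):

```python
import zlib

RECORD_LEN = 14  # assumed fixed-width key records, e.g. field_00000000


def build_blocks(keys, records_per_block):
    """Compress the key records in independent blocks of records_per_block."""
    blocks = []
    for start in range(0, len(keys), records_per_block):
        chunk = b"".join(keys[start:start + records_per_block])
        blocks.append(zlib.compress(chunk))
    return blocks


def verify_exact(blocks, records_per_block, ordinal, candidate):
    """Exact verification without a cache: decompress the whole block that
    holds `ordinal`, then compare a single record against `candidate`."""
    block_index, slot = divmod(ordinal, records_per_block)
    chunk = zlib.decompress(blocks[block_index])  # work grows with block size
    record = chunk[slot * RECORD_LEN:(slot + 1) * RECORD_LEN]
    return record == candidate


keys = [f"field_{i:08d}".encode("ascii") for i in range(10_000)]
blocks = build_blocks(keys, 1024)
print(verify_exact(blocks, 1024, 5000, b"field_00005000"))
```

Larger blocks usually compress better but make each uncached probe decompress more data, which is consistent with the 4096-record figure being slower than the 1024-record one.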

Proposed solution

Add a future storage policy for persisted OrdinalMap / InQL index artifacts that can compress serialized sections while keeping the default query path decompressed in memory.

A likely shape:

  • Hot load policy: read the compressed persisted artifact, decompress it into the normal OrdinalMap layout, and preserve current lookup performance.
  • Compact load policy: keep selected verification/key sections compressed or block-compressed for memory-constrained workloads, with explicitly documented slower exact-lookup semantics.
  • Persisted artifact metadata: record the codec, compression level/profile, uncompressed section lengths, and the intended load policy.
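One possible reading of this shape, sketched in Python for illustration only. The header format, field names, section names ("keys", "offsets"), and the `persist` / `load_hot` functions are all hypothetical, and zlib/lzma stand in for whatever std.compression would provide.

```python
import json
import lzma
import struct
import zlib
from dataclasses import asdict, dataclass

# Stand-in codecs: name -> (compress(data, level), decompress(data)).
CODECS = {
    "none": (lambda b, level: b, lambda b: b),
    "zlib": (lambda b, level: zlib.compress(b, level), zlib.decompress),
    "lzma": (lambda b, level: lzma.compress(b, preset=level), lzma.decompress),
}


@dataclass
class ArtifactMeta:  # hypothetical metadata record
    codec: str
    level: int
    load_policy: str  # intended policy: "hot" or "compact"
    section_names: list
    uncompressed_lengths: list
    compressed_lengths: list


def persist(sections, codec="lzma", level=3, load_policy="hot"):
    """Serialize as: 4-byte header length, JSON header, compressed sections."""
    compress, _ = CODECS[codec]
    bodies = [compress(data, level) for data in sections.values()]
    meta = ArtifactMeta(codec, level, load_policy, list(sections),
                        [len(d) for d in sections.values()],
                        [len(b) for b in bodies])
    header = json.dumps(asdict(meta), sort_keys=True).encode("utf-8")
    return struct.pack("<I", len(header)) + header + b"".join(bodies)


def load_hot(artifact):
    """Hot load: decompress every section back into plain in-memory bytes."""
    (header_len,) = struct.unpack_from("<I", artifact, 0)
    meta = ArtifactMeta(**json.loads(artifact[4:4 + header_len]))
    _, decompress = CODECS[meta.codec]
    sections, offset = {}, 4 + header_len
    for name, raw_len, packed_len in zip(meta.section_names,
                                         meta.uncompressed_lengths,
                                         meta.compressed_lengths):
        data = decompress(artifact[offset:offset + packed_len])
        if len(data) != raw_len:  # recorded length doubles as a sanity check
            raise ValueError(f"section {name!r}: unexpected length")
        sections[name] = data
        offset += packed_len
    return meta, sections


keys = b"".join(f"field_{i:08d}".encode("ascii") for i in range(1000))
offsets = b"".join(struct.pack("<I", i * 14) for i in range(1000))
artifact = persist({"keys": keys, "offsets": offsets})
meta, loaded = load_hot(artifact)
```

Recording the uncompressed lengths lets a loader size buffers up front and detect truncated or mismatched sections. A compact loader could read the same header and leave selected sections compressed instead of decompressing them.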

This should build on the existing std.compression surface where possible rather than adding bespoke compression plumbing directly to OrdinalMap.

Alternatives considered

  • Do nothing: acceptable for RFC 101, but loses a likely storage win for future InQL-db index artifacts.
  • Compress the default in-memory layout: rejected for now because random exact lookup becomes too slow unless the data is decompressed back into the hot layout.
  • Only compress the full serialized blob externally: simple and probably enough for a first persisted-artifact path, but it does not cover future memory-constrained compact operation.

Scope / acceptance criteria

  • In scope:

    • Define persisted-index compression metadata and codec policy for OrdinalMap-style artifacts.
    • Benchmark compression ratio, load/decompress time, and lookup cost across friendly and less-friendly key corpora.
    • Support a default hot load path that decompresses into the existing fast in-memory layout.
    • Consider an explicit compact mode only if its slower lookup semantics are documented and benchmarked.
  • Out of scope:

    • Changing RFC 101's default OrdinalMap lookup layout.
    • Making compressed lookup the default runtime behavior.
    • Regressing current exact/unchecked lookup benchmark targets.
  • Done when:

    • InQL/InQL-db can persist an OrdinalMap-style index with optional compression and load it back deterministically.
    • Benchmarks show the storage win and quantify the load/lookup tradeoff.
    • Documentation explains when to use hot vs compact loading.

Related context: RFC 101 / PR #597.

Metadata

Labels

  • feature: New feature or request
  • runtime / core crates: Suggestions, features, or bugs related to the `incan-core`, `incan-stdlib`, `incan-derive` crates
