Skip to content

feat(core)!: per-column codecs and structured skip index args#111

Merged
KeKs0r merged 4 commits intomainfrom
marc/num-7015-obsessiondb-add-compression-algorithm-by-column-to-chkit
Apr 22, 2026
Merged

feat(core)!: per-column codecs and structured skip index args#111
KeKs0r merged 4 commits intomainfrom
marc/num-7015-obsessiondb-add-compression-algorithm-by-column-to-chkit

Conversation

@KeKs0r
Copy link
Copy Markdown
Contributor

@KeKs0r KeKs0r commented Apr 22, 2026

Summary

Two related breaking DSL enhancements that replace free-form string fields with structured discriminated unions, moving validation from runtime to the type system.

  • Per-column compression codecs — declare codecs directly on a column (codec: { kind: 'ZSTD', level: 3 } or a chain). CREATE TABLE and ALTER emit the CODEC(...) clause in the correct position, chkit generate emits MODIFY COLUMN ... REMOVE CODEC when a codec is dropped, and chkit pull round-trips structured codecs (unknown tokens fall back to codec.raw(...)).
  • Structured skip index argumentsSkipIndexDefinition becomes a discriminated union keyed on type, with typed fields per variant (maxRows for set, falsePositiveRate for bloom_filter, sizeBytes/hashFunctions/randomSeed for tokenbf_v1, plus ngramSize for ngrambf_v1). The old typeArgs string is gone; the index_type_missing_args runtime check is removed because it is now a compile-time concern.

Both changes cascade major bumps to @chkit/core, @chkit/clickhouse, @chkit/plugin-pull, and chkit; migration tables live in the respective changesets.

Test plan

  • bun run typecheck clean
  • bun run test passes (unit)
  • bun run test:env passes (e2e via Doppler — SQL validation and pull round-trip)

🤖 Generated with Claude Code

KeKs0r and others added 2 commits April 22, 2026 09:38
Add a structured `codec` field on `ColumnDefinition` covering general-purpose
codecs (NONE/LZ4/LZ4HC/ZSTD/T64/GCD/ALP), preprocessors (Delta/DoubleDelta/
Gorilla/FPC), and a `codec.raw(...)` escape hatch. Rendering emits the CODEC
clause in CREATE/ALTER; planner detects codec-only removal and emits
`MODIFY COLUMN ... REMOVE CODEC`. ClickHouse introspection reads
`system.columns.compression_codec` and plugin-pull round-trips it back into
structured form. Canonicalization fills ClickHouse defaults so `{kind:'ZSTD'}`
and `{kind:'ZSTD', level:1}` compare equal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace free-form typeArgs string with a discriminated union per index
type (minmax, set, bloom_filter, tokenbf_v1, ngrambf_v1). Argument
validation moves from runtime to the type system, and plugin-pull
emits typed fields back into schema files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a42cb603e1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

export type ColumnCodec = GeneralColumnCodec | PreprocessingColumnCodec | RawColumnCodec

/** Single codec or a chain (preprocessors then exactly one general codec). */
export type ColumnCodecSpec = ColumnCodec | ColumnCodec[]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Forbid empty codec chains in schema types

ColumnCodecSpec currently allows ColumnCodec[], which includes []; that lets callers pass codec: [], and renderCodec then emits CODEC() (invalid SQL) without any validation error. This weakens the “typed codec chain” guarantee and can break generate/migrate at runtime when codec arrays are built programmatically.

Useful? React with 👍 / 👎.

Comment on lines +82 to +84
const level = Number(args[0])
if (!Number.isFinite(level)) return undefined
return { kind: 'ZSTD', level }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fallback to raw when known codecs have extra args

The parser reads only the first argument for ZSTD and ignores any additional ones, so parseCodec('CODEC(ZSTD(3, x))') becomes { kind: 'ZSTD', level: 3 } and re-renders as CODEC(ZSTD(3)), silently dropping information. This breaks pull round-trips for future/extended codec signatures that reuse known names, and should be treated as an unknown shape (raw) instead of truncating.

Useful? React with 👍 / 👎.

KeKs0r and others added 2 commits April 22, 2026 10:35
ClickHouse requires the CODEC clause to follow DEFAULT and COMMENT in
column definitions. The previous order produced invalid SQL whenever a
column declared both a codec and a default, surfaced by the e2e EXPLAIN
AST check "codec + DEFAULT combined".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses two Codex review findings: empty codec chains (`codec: []`)
emitted invalid `CODEC()` SQL without validation error, and known codec
tokens with extra args (e.g. `ZSTD(3, 1)`) were silently truncated on
`chkit pull` instead of round-tripping through the raw fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@KeKs0r KeKs0r merged commit b0f200d into main Apr 22, 2026
3 of 4 checks passed
@KeKs0r KeKs0r deleted the marc/num-7015-obsessiondb-add-compression-algorithm-by-column-to-chkit branch April 22, 2026 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant