[codex] replace backfill chunking with smart planner by KeKs0r · Pull Request #107 · obsessiondb/chkit

KeKs0r · 2026-04-01T22:13:23Z

Summary

replace the old backfill partition/range chunking path with the new smart recursive chunk planner
expose planner/runtime internals from @chkit/plugin-backfill/sdk instead of the package root
add fixture-driven integration coverage for each chunking strategy and mixed multi-dimension scenarios
fix review findings around degenerate equal-width slices, sort-key expression parsing, temporal slice bounds, and package export validation

Why

The previous backfill planner only handled coarse partition splitting with a single sort-key range fallback. The new smart planner supports string-prefix, temporal, quantile, and equal-width strategies, including secondary-dimension handoff for hot keys, which matches the new prototype behavior and gives much better confidence in expected chunk layouts.

Impact

Backfill planning now produces smarter chunk boundaries for skewed data and multi-column sort keys.

Public API note: planner/runtime internals are no longer exported from @chkit/plugin-backfill. They now live under @chkit/plugin-backfill/sdk.

Root Cause

The older planner could only split by partition and a single coarse sort-key range, which broke down on hot keys and skewed distributions. During review, a few edge cases also surfaced in the first port: collapsed equal-width boundaries, brittle sorting_key parsing for expression-based sort keys, and temporal refinement extending past parent slice bounds.

Validation

bun test /Users/marc/Workspace/chkit/packages/plugin-backfill/src
bun run --cwd /Users/marc/Workspace/chkit/packages/plugin-backfill typecheck
bun verify

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fbd3a19868

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-14T10:36:44Z

+  if (sortKey.category === 'string') {
+    return `unhex('${Buffer.from(value, 'latin1').toString('hex')}')`
+  }


Encode string SQL bounds with UTF-8 bytes

The string branch in formatBound builds predicates from Buffer.from(value, 'latin1'), which changes the byte sequence for UTF-8 text values. For example, a bound like é is emitted as unhex('e9') even though ClickHouse String stores UTF-8 bytes (c3a9), so range filters can miss rows or create overlapping chunks whenever sort-key values contain non-ASCII characters. This directly affects both planning probes and execution SQL that rely on these bounds.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-14T10:36:44Z

+  if (sortKey.boundaryEncoding === 'hex-latin1') {
+    return Buffer.from(value, 'latin1').toString('hex')
+  }


Preserve UTF-8 when persisting string chunk bounds

encodeBoundary serializes string boundaries with Buffer.from(value, 'latin1'), which is lossy for non-Latin-1 data (e.g., emoji and many Unicode code points). Because plans are written with this encoding and later decoded for run/resume, persisted chunk ranges can mutate after round-trip and cause resumed backfills to read the wrong key intervals.

Useful? React with 👍 / 👎.

KeKs0r · 2026-04-14T11:23:39Z

Superseded by #108, which is a strict superset containing the critical correctness fixes (uncompressed-bytes sizing, parallel-replicas count inflation, string boundary truncation) plus E2E coverage.

KeKs0r added 5 commits April 2, 2026 00:13

replace backfill chunking with smart planner

d2eb78f

fix smart chunking review issues

6d224b7

Update Algo

ab1239e

Fix CI

3f99820

Export backfill SDK helpers and obsessiondb service types

fbd3a19

KeKs0r marked this pull request as ready for review April 14, 2026 10:29

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

KeKs0r closed this Apr 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] replace backfill chunking with smart planner#107

[codex] replace backfill chunking with smart planner#107
KeKs0r wants to merge 5 commits intomainfrom
new-smart-chunking

KeKs0r commented Apr 1, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 14, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 14, 2026

Uh oh!

KeKs0r commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

KeKs0r commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Impact

Root Cause

Validation

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

KeKs0r commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

KeKs0r commented Apr 1, 2026 •

edited

Loading