[ENG-38901] Support Hudi table version 9 timeline layout #185
tiennguyen-onehouse wants to merge 6 commits into main
Conversation
Force-pushed from e5e47cc to 81b2a73
/push-image ENG-38901-hudi-v9-test
Force-pushed from 92acc5d to cfc021a
Add support for Hudi timeline layout version 2 (used by table version 9)
while maintaining backward compatibility with layout version 1.
Changes:
- Read hoodie.timeline.layout.version and hoodie.table.version from
hoodie.properties and propagate through Table model
- Update active/archived timeline path construction for V2 layout
(.hoodie/timeline/ and .hoodie/timeline/history/)
- Update regex patterns to match V9 completed instant filenames
({timestamp}_{completionTimestamp}.action)
- Fix timestamp extraction and instant grouping for V9 format
- Add V9 archived file parsing (parquet, manifest, _version_)
- Add "clustering" and "logcompaction" action types
- Update path-stripping helpers for V2 layout awareness
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
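As a rough illustration of the layout-aware path construction the commit describes, a minimal sketch follows. The class and method names here are hypothetical, not the actual LakeView code; only the path values come from the commit message above.

```java
public class TimelinePaths {
    // Timeline layout version 2 is used by Hudi table version 9.
    public static final int TIMELINE_LAYOUT_VERSION_V2 = 2;

    // V1 keeps active instants directly under .hoodie/; V2 moves them
    // under .hoodie/timeline/.
    public static String activeTimelinePath(int layoutVersion) {
        return layoutVersion == TIMELINE_LAYOUT_VERSION_V2
            ? ".hoodie/timeline/"
            : ".hoodie/";
    }

    // V1 archives under .hoodie/archived/; V2 under .hoodie/timeline/history/.
    public static String archivedTimelinePath(int layoutVersion) {
        return layoutVersion == TIMELINE_LAYOUT_VERSION_V2
            ? ".hoodie/timeline/history/"
            : ".hoodie/archived/";
    }
}
```

The point of routing every path lookup through one helper is that the rest of the sync code stays version-agnostic.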
Force-pushed from cfc021a to 6665a15
/push-image ENG-38901
tiennguyen-onehouse left a comment:
/push-image ENG-38901
…pertiesReaderTest Address SonarCloud issue: use Java text block instead of string concatenation for test properties content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Text blocks require Java 15+. The CI build compiles with Java 11, so reverting to string concatenation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix V2 archived parquet comparison to use full filename when numeric keys match, preventing files with same timestamp but different sequence numbers from being skipped on restart
- Add error handling for readHoodieProperties in existing-table path to avoid silently falling back to V1 defaults on failure
- Cache ParsedHudiProperties to avoid re-reading on every sync cycle
- Extract TIMELINE_LAYOUT_VERSION_V2 constant replacing magic number 2
- Rename V2_PARQUET_NUMERIC_PATTERN to V2_ARCHIVED_PARQUET_TIMESTAMP_PATTERN
- Use consistent char separator style in path construction
- Remove redundant if/else branch in stubUploadInstantsCallsV2 test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tiennguyen-onehouse left a comment:
/push-image ENG-38884-v9-it
public static final Pattern ARCHIVED_COMMIT_INSTANT_PATTERN =
    Pattern.compile("\\.commits_\\.archive\\.\\d+_\\d+-\\d+-\\d+");
public static final Pattern ARCHIVED_COMMIT_INSTANT_PATTERN_V2 =
    Pattern.compile("\\d+_\\d+_\\d+\\.parquet|manifest_\\d+|" + VERSION_MARKER_FILE);
This is too broad. Can we add ^ and $ at the beginning and end of each pattern?
These patterns are used exclusively with .matches(), which implicitly anchors the match to the entire string (equivalent to ^pattern$). Adding ^ and $ would be redundant. The V2-specific extraction patterns (V2_ARCHIVED_PARQUET_TIMESTAMP_PATTERN and V2_MANIFEST_NUMERIC_PATTERN) already have anchors since they're used with .find().
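The distinction the reply relies on can be shown with a small self-contained demo. The pattern below mirrors the shape of ARCHIVED_COMMIT_INSTANT_PATTERN_V2 from the diff, with the VERSION_MARKER_FILE constant inlined as "_version_" (an assumption; the actual constant's value is not shown in this thread).

```java
import java.util.regex.Pattern;

public class AnchoringDemo {
    // Same alternation shape as the V2 pattern in the diff; "_version_"
    // stands in for the VERSION_MARKER_FILE constant.
    static final Pattern ARCHIVED_V2 =
        Pattern.compile("\\d+_\\d+_\\d+\\.parquet|manifest_\\d+|_version_");

    public static boolean isArchivedV2File(String name) {
        // matches() anchors implicitly: the ENTIRE input must match,
        // as if the pattern were wrapped in ^(...)$.
        return ARCHIVED_V2.matcher(name).matches();
    }

    public static boolean findsArchivedV2Fragment(String name) {
        // find() is unanchored: a match anywhere in the input succeeds,
        // which is why patterns used with find() need explicit ^ and $.
        return ARCHIVED_V2.matcher(name).find();
    }
}
```

So for patterns used only via matches(), explicit anchors would be redundant; the anchors matter only for the find()-based extraction patterns.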
private final LakeViewExtractorMetrics hudiMetadataExtractorMetrics;
private final ExecutorService executorService;
private final ObjectMapper mapper;
private final Map<String, ParsedHudiProperties> propertiesCache = new ConcurrentHashMap<>();
Should this be cached forever, or should we add a TTL so hudi properties are read periodically? E.g., if a table is upgraded from v6 to v9 while LakeView is running, the stale cached version will be used indefinitely.
The cache stores a few lightweight entries (one per table). Table versions do not change at runtime; upgrades require a full Hudi migration that would also restart LakeView. Added a comment explaining this.
Since this map holds all hudi properties but is currently used only to read the table version and timeline layout version, it is okay. Suggest updating the comment to indicate that other table properties may not be migration-dependent and so could go stale.
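A minimal sketch of the caching pattern discussed here, including the computeIfAbsent-based dedupe mentioned later in the thread. The record and method names are illustrative stand-ins for ParsedHudiProperties and the actual LakeView code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PropertiesCache {
    // Stand-in for ParsedHudiProperties, reduced to the two
    // migration-dependent fields LakeView actually consumes.
    public record ParsedProps(int tableVersion, int timelineLayoutVersion) {}

    private final Map<String, ParsedProps> cache = new ConcurrentHashMap<>();

    // No TTL: table/timeline versions only change during an offline Hudi
    // migration, which also restarts LakeView. Other hoodie.properties
    // values could go stale, so only version fields should be served
    // from this cache.
    public ParsedProps get(String tablePath, Function<String, ParsedProps> reader) {
        // computeIfAbsent also dedupes concurrent reads of the same
        // table's hoodie.properties: only one thread invokes the reader
        // per key; the rest block and reuse its result.
        return cache.computeIfAbsent(tablePath, reader);
    }
}
```

One entry per table keeps the memory footprint trivial, and the fast path is a single lock-free map lookup per sync cycle.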
return new BigDecimal(activeTimeLineInstant.split("\\.")[0]);
String timestampPart = activeTimeLineInstant.split("\\.")[0];
if (timestampPart.contains("_")) {
    timestampPart = timestampPart.split("_")[0];
Suggest extracting the timestamp into its own utility method. It's repeated 3 times in this file.
The timestamp extraction in getCommitIdFromActiveTimelineInstant is already its own utility method. The split(".", 3) in isSavepointCommit and isRollbackCommit extracts the action type (not the timestamp), but those two methods do duplicate each other. Extracted a shared hasActionType(File, String) helper to consolidate them.
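The two helpers discussed here can be sketched as follows. This mirrors the diff above and the hasActionType(File, String) consolidation the reply describes, but operates on plain strings for self-containment; treat it as illustrative rather than the actual LakeView code.

```java
import java.math.BigDecimal;

public class InstantNameUtils {
    // V1 completed instant: {timestamp}.action
    //   e.g. 20260130205837315.commit
    // V9 completed instant: {timestamp}_{completionTimestamp}.action
    //   e.g. 20260130205837315_20260201000250371.commit
    // Either way, the requested timestamp is everything before the first
    // '_' (if any) within the part before the first '.'.
    public static BigDecimal commitIdFromInstant(String fileName) {
        String timestampPart = fileName.split("\\.")[0];
        if (timestampPart.contains("_")) {
            timestampPart = timestampPart.split("_")[0];
        }
        return new BigDecimal(timestampPart);
    }

    // Consolidated helper replacing separate isSavepointCommit /
    // isRollbackCommit checks: the action type is the second dot-separated
    // token of {ts}.{action}[.{state}].
    public static boolean hasActionType(String fileName, String action) {
        String[] parts = fileName.split("\\.", 3);
        return parts.length >= 2 && parts[1].equals(action);
    }
}
```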
Arrays.asList(
    generateFileObj("should_be_ignored", false),
    generateFileObj("20260130205837315_20260201000250371.commit", false),
    generateFileObj("20260130205837315_20260201000250371.inflight", false),
Why do inflight and requested have completion times?
Good catch. In Hudi V9 (timeline layout V2), pending instants (inflight/requested) only have the requested timestamp; the completion time doesn't exist yet. Fixed the test filenames to use single-timestamp format for pending instants (e.g., 20260130205837315.commit.inflight, 20260130205837315.commit.requested).
- Add comment explaining why propertiesCache has no TTL
- Extract hasActionType() helper to consolidate isSavepointCommit/isRollbackCommit
- Fix V9 test filenames: pending instants use single-timestamp format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sorry, I am not fully done yet; I somehow clicked submit. Please hold on until I've given all my feedback.
nsivabalan left a comment:
The entire design of the archived timeline V2 needs to be revisited.
For v1, files are named monotonically.
.commits_.archive.1
.commits_.archive.2
.commits_.archive.3
...
and so, the checkpointing based on file name would suffice.
But in the case of archived timeline V2, it's an LSM timeline. We could have parquet files from different levels in the history directory.
├── _version_ # Single file: contains latest manifest version number (e.g., "12")
├── manifest_1 # Manifest snapshot v1 (JSON listing valid parquet files)
├── manifest_2 # Manifest snapshot v2
├── ...
├── manifest_12 # Latest manifest snapshot
├── 20240101120000000_20240101130000000_0.parquet # L0 data files
├── 20240101130000001_20240101140000000_0.parquet # L0
├── 20240101140000001_20240101150000000_0.parquet # L0
├── ...
├── 20240101120000000_20240101200000000_1.parquet # L1 (compacted from L0 files)
└── ...
w/o reading the manifest file, we can never know which files to read. And even the checkpointing has to be thought through.
Better checkpointing is based on commit time ranges.
Let's say the layout below:
each L0 file contains 10 commit instants and there are 6 parquet files.
L1 contains 100 instants in each and contains 10 files.
L2 contains 1000 instants in each and contains 5.
So, in total we have
5000 + 1000 + 60 = 6060 commit instants.
Say we wanted to do 2500 per batch:
in round 1: we consume from L2_F1_startTime -> mid(L2_F1_start, L2_F1_end)
in round 2: we consume from mid(L2_F1_start, L2_F1_end) -> L2_F1_end
in round 3: we consume from L1_F1 -> L1_F10 and all files in L0 as well.
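The commit-time-range idea sketched above can be illustrated with a minimal selection helper. Everything here is hypothetical design exploration, not existing code: files from any LSM level are selected when their instant range extends past the time checkpoint, regardless of file name ordering.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeBatcher {
    // Illustrative model of one LSM data file: its name plus the min/max
    // instant times it covers (encoded in the filename in real Hudi).
    public record LsmFile(String name, long minInstant, long maxInstant) {}

    // Files whose range extends past the checkpoint must be (re)read;
    // instants <= checkpointTime inside them are filtered out downstream.
    // This works across levels: a freshly compacted L1 file with an old
    // minInstant is still picked up if its maxInstant is new.
    public static List<String> filesAfter(List<LsmFile> files, long checkpointTime) {
        List<String> out = new ArrayList<>();
        for (LsmFile f : files) {
            if (f.maxInstant() > checkpointTime) {
                out.add(f.name());
            }
        }
        return out;
    }
}
```

The key property is that the checkpoint is a commit time, not a filename, so compaction rewriting the file set cannot invalidate it.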
And by the way, the archived dir location is different for V2.
Let me share a quick write-up on the LSM timeline, and you can come up with a design for it.

LSM Timeline in Hudi 1.x

Where it lives
All LSM timeline files live under .hoodie/archived/

The LSM Tree Layout
The archived timeline uses an LSM-tree structure with parquet data files, manifest files, and a version file.

File Naming Convention
Data files are parsed via regex ^(\d+)_(\d+)_(\d+)\.parquet in LSMTimeline.java:117
Manifest files: manifest_N, where N is a monotonically increasing version number (starts at 1).
Version file: _version_, a single file containing the integer of the latest manifest version.

How New Files Are Created (Archival)
The flow is driven by TimelineArchiverV2.archiveIfRequired():
Compaction
Uses a universal compaction strategy (LSMTimelineWriter.compactAndClean()).
Trigger: when >= N files (default 10, configured by hoodie.timeline.compaction.batch.size) accumulate at a given layer.
Process:
Example
After compaction: with 10 instants per L0 file, each L1 file covers ~100 instants, each L2 file ~1000 instants, etc. The benchmark shows reading 1000 instants costs ~10ms.

Cleaning
Triggered after compaction. Retains the latest N manifest snapshot versions (default 3 + number of compacted layers). Older manifests, and any data files no longer referenced by a retained manifest, are removed.

Reader Workflow (Snapshot Isolation)
This gives snapshot isolation: the writer creates new manifests atomically, while readers always see a consistent snapshot via the manifest they read.
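The reader workflow above can be sketched in a few lines. This is a simplified model: the file-reader function stands in for storage access, and manifests here list one parquet filename per line for brevity, whereas real Hudi manifests are JSON.

```java
import java.util.List;
import java.util.function.Function;

public class LsmTimelineReader {
    // fileReader maps a file name ("_version_", "manifest_3", ...) to its
    // text content; in production this would read from cloud storage.
    public static List<String> snapshotFiles(Function<String, String> fileReader) {
        // 1. _version_ holds the latest committed manifest version.
        int version = Integer.parseInt(fileReader.apply("_version_").trim());
        // 2. manifest_N lists exactly the parquet files valid at version N;
        //    files not listed (e.g. compacted-away L0 files) are ignored,
        //    which is what makes the read a consistent snapshot.
        return List.of(fileReader.apply("manifest_" + version).trim().split("\n"));
    }
}
```

Because the reader pins itself to one manifest version for the whole read, a concurrent writer publishing manifest_N+1 never affects it.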
tiennguyen-onehouse left a comment:
/push-image 1.1-Apr14
Rewrites the V2 archived path so LakeView reads _version_ and manifest_N and mirrors files by diffing successive manifest versions, making the path compaction-safe. Checkpoints advance by manifest version instead of filename, so subsequent syncs correctly pick up new archivals after Hudi's LSM compaction rewrites the file set.
Also addresses round 3 review feedback from nsivabalan:
- Extract hoodie table/timeline version defaults to constants
- Dedupe concurrent hoodie.properties reads via computeIfAbsent
- Skip table instead of silently falling back to v6/v1 on read failure
- Add optional completionTime field to ActiveTimelineInstant for v9
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tiennguyen-onehouse left a comment:
Issues addressed:
- Filename ordering is meaningless in LSM. Compaction rewrites [A, B, C] at L0 into a single L1 file X whose minInstant may be older than existing files. Sorting by leading timestamp and picking "what comes after the checkpoint" gives the wrong set.
- The _version_ tiebreaker was a dead end. With _version_ as lastUploadedFile, every subsequent file appeared "already uploaded." Combined with archivedCommitsProcessed=true, V2 archived became a one-shot bootstrap and never resynced: any instant Hudi archived later was invisible to the backend forever.
- Race condition during compaction in the archived timeline. We need to checkpoint using the latest manifest version and upload only the diff since the last checkpoint.
Solution:
Manifest-driven, checkpointed by manifest version. A new LSMTimelineManifestReader reads _version_ and the tiny JSON manifest_N from source.
Each sync cycle:
- Read _version_ → currentVersion. If equal to checkpoint.lastArchivedManifestVersion, fast path: nothing to do.
- Read manifest_currentVersion and manifest_previousVersion. Diff the parquet lists.
- Upload in strict order: parquets → manifest_currentVersion → _version_. The manifest arrives after every parquet it references, and _version_ is the very last write.
- One checkpoint upsert at the end, advancing lastArchivedManifestVersion.
Added lastArchivedManifestVersion (int, default 0) to Checkpoint. Jackson deserializes old checkpoints with the field absent using the default, so no migration is needed. The V1 path is untouched.
Notes:
The key observation: gateway-controller PR #8797 already switched the backend to TimelineFactory.createArchivedTimeline(...), which reads via ArchivedTimelineV2 (consults _version_ → manifest_N → parquets). So LakeView doesn't need to understand LSM semantics; it just needs to mirror files such that the backend can read a consistent snapshot.
A manifest_N file is an immutable snapshot that says "to read the archived timeline at version N, read exactly these parquet files." So the correctness question reduces to: after a successful sync, is every file in manifest_N on the backend, and does _version_ point to N? That's all the checkpoint needs to guarantee.
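The diff step of the sync cycle described above can be sketched as follows. The Store interface is a hypothetical stand-in for the source-storage access layer; the real LSMTimelineManifestReader API is not shown in this thread.

```java
import java.util.Set;
import java.util.TreeSet;

public class ManifestDiffSync {
    // Hypothetical view of the source archived dir.
    public interface Store {
        int readVersion();                      // contents of _version_
        Set<String> readManifest(int version);  // parquet list in manifest_N
    }

    // Returns the parquet files to upload: those listed in the current
    // manifest that were not already covered by the checkpointed manifest.
    // Callers then upload parquets -> manifest_current -> _version_, and
    // finally advance the checkpoint to the current version.
    public static Set<String> diff(Store source, int checkpointVersion) {
        int current = source.readVersion();
        if (current == checkpointVersion) {
            return Set.of();  // fast path: nothing archived since last sync
        }
        Set<String> toUpload = new TreeSet<>(source.readManifest(current));
        if (checkpointVersion > 0) {  // 0 = never synced; upload everything
            toUpload.removeAll(source.readManifest(checkpointVersion));
        }
        return toUpload;
    }
}
```

Note that files dropped by compaction simply stop appearing in newer manifests; the diff never tries to re-upload or delete them, because the backend reads only what the mirrored manifest references.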
Changes in this commit
Review comments addressed
V2 archived timeline redesign
Tests
Replaced the old
Also updated
All 249 lakeview tests pass except



Summary
- Read hoodie.timeline.layout.version and hoodie.table.version from hoodie.properties and propagate version info through the pipeline
- Support new action types (clustering, logcompaction) used by Hudi 1.0+
Changes
- Read tableVersion and timelineLayoutVersion from hoodie.properties into ParsedHudiProperties and Table models
- getPathSuffixForTimeline(): V2 uses .hoodie/timeline/ (active) and .hoodie/timeline/history/ (archived)
- Updated ACTIVE_COMMIT_INSTANT_PATTERN to match V9 completed instants ({ts}_{completionTs}.action)
- Added ARCHIVED_COMMIT_INSTANT_PATTERN_V2 for parquet files, manifest files, and _version_
- Updated getCommitIdFromActiveTimelineInstant() to strip the completion timestamp before parsing
- Updated ActiveTimelineInstantBatcher so V9 completed files group with their requested/inflight files
- Updated getNumericPartFromArchivedCommit() to handle V2 parquet, manifest, and _version_ files
- Added clustering and logcompaction to WHITELISTED_ACTION_TYPES
- Made constructStorageUri() and getFileNameWithPrefix() layout-version-aware
ClickUp
https://app.clickup.com/t/86e0k91b2
Test plan
- HoodiePropertiesReaderTest.testReadHoodiePropertiesV9 passes (reads V9 properties)
- ActiveTimelineInstantBatcherTest.testCreateBatchWithV9CompletedInstants passes (V9 grouping)
- TableMetadataUploaderServiceTest passes with new properties reader mocks for existing tables
🤖 Generated with Claude Code