
Added ducl diff, scan --s3 and several bug fixes #1

Open
jpc wants to merge 6 commits into main from jpc/wip


jpc (Member) commented Mar 9, 2026

No description provided.

jpc and others added 6 commits March 4, 2026 18:29
- Fix hatch_build.py: use force_include (underscore) rather than force-include
  (hyphen) so hatchling actually includes the compiled binary in the wheel
- Set pure_python=False and a platform tag (manylinux_2_17) so pip installs
  the correct architecture-specific wheel
- Rewrite release.yml with matrix builds (x86_64 + aarch64) producing
  one wheel per architecture, a separate sdist, and a combined publish step
- Attach arch-suffixed pwalk2 binaries to GitHub releases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
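
A minimal sketch of the wheel settings the first two bullets describe, assuming the binary is built to a hypothetical build/pwalk2 and placed at a hypothetical ducl/bin/pwalk2 inside the wheel. It only mutates a plain build_data dict so it runs without hatchling installed; in hatch_build.py the same assignments would sit inside a BuildHookInterface.initialize(version, build_data) method. The architecture alias table is also an assumption.

```python
import platform
from typing import Optional

# Hypothetical normalization of platform.machine() values to manylinux arch names.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def wheel_tag(machine: Optional[str] = None) -> str:
    """Return a py3-none-manylinux_2_17_<arch> wheel tag for `machine` (default: this host)."""
    raw = (machine or platform.machine()).lower()
    try:
        arch = _ARCH_ALIASES[raw]
    except KeyError:
        raise ValueError(f"unsupported architecture: {raw!r}") from None
    return f"py3-none-manylinux_2_17_{arch}"


def configure_build_data(build_data: dict, binary_src: str, binary_dest: str,
                         machine: Optional[str] = None) -> dict:
    """Apply the wheel settings to a hatchling-style build_data dict."""
    # build_data uses the underscore key; "force-include" (hyphen) is the
    # pyproject.toml spelling and would not be read here, so the binary
    # would silently be left out of the wheel.
    build_data.setdefault("force_include", {})[binary_src] = binary_dest
    # The wheel contains a compiled binary, so it must not be tagged py3-none-any.
    build_data["pure_python"] = False
    build_data["tag"] = wheel_tag(machine)
    return build_data
```

On an aarch64 runner, configure_build_data({}, "build/pwalk2", "ducl/bin/pwalk2") sets the tag to py3-none-manylinux_2_17_aarch64, which is why each matrix leg produces its own wheel.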
Switch from ThreadPoolExecutor to raw mp.Process workers to eliminate
GIL contention: botocore's XML parsing in the worker threads was starving
the main thread. Workers now use a work-stealing pattern with blocking
queues and None sentinels for clean shutdown.

Also adds a --max-objects CLI option for benchmarking, a progress display
during prefix discovery, and per-worker timing instrumentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
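
A minimal sketch of the process-based pattern described above: workers block on a shared task queue, exit when they receive a None sentinel, and report results on a second queue. run_pool and the per-item callable fn are hypothetical stand-ins (in the scanner, fn would be something like listing one S3 prefix). Because each worker is a separate interpreter, CPU-heavy parsing in a worker cannot hold the parent's GIL. The sketch assumes Linux and the fork start method, which lets fn be a lambda or closure; with spawn, fn must be a picklable module-level function and the entry point needs an if __name__ == "__main__": guard.

```python
import multiprocessing as mp


def _worker(task_q, result_q, fn) -> None:
    # Block until work arrives; a None sentinel means "no more work".
    while True:
        item = task_q.get()
        if item is None:
            break
        result_q.put((True, fn(item)))
    # Tell the parent this worker is finished (tagged so None results stay distinct).
    result_q.put((False, None))


def run_pool(fn, items, n_workers=4):
    """Apply fn to every item in n_workers processes; result order is not preserved."""
    ctx = mp.get_context("fork")  # assumption: Linux/fork, as noted above
    task_q = ctx.Queue()
    result_q = ctx.Queue()
    procs = [ctx.Process(target=_worker, args=(task_q, result_q, fn))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for item in items:
        task_q.put(item)
    for _ in procs:
        task_q.put(None)  # one sentinel per worker for a clean shutdown

    results, finished = [], 0
    # Drain results before join(): joining a process that still has
    # unconsumed queued output can deadlock.
    while finished < n_workers:
        is_result, value = result_q.get()
        if is_result:
            results.append(value)
        else:
            finished += 1
    for p in procs:
        p.join()
    return results
```

Each worker's results reach the parent before its own "finished" marker, so once every worker has reported, no results are left in flight.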
Consolidate parquet files and meta.json under a dashboard_data/
subdirectory alongside dashboard.html, using _export_dashboard_data
for both build and update commands. Update test helpers accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
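
A minimal sketch of the consolidated layout, assuming hypothetical table names and a single path helper (dashboard_layout, not the project's _export_dashboard_data) that both build and update would call so their output paths cannot drift apart. Only meta.json is written here; the Parquet writes are omitted so the sketch runs with the standard library alone.

```python
import json
from pathlib import Path

DATA_DIRNAME = "dashboard_data"  # subdirectory name from the commit message


def dashboard_layout(out_dir, table_names):
    """Return every output path for a dashboard rooted at out_dir."""
    out = Path(out_dir)
    data_dir = out / DATA_DIRNAME
    return {
        "html": out / "dashboard.html",
        "data_dir": data_dir,
        "meta": data_dir / "meta.json",
        # One Parquet file per (hypothetical) table, all under dashboard_data/.
        "tables": {name: data_dir / f"{name}.parquet" for name in table_names},
    }


def write_meta(out_dir, meta, table_names):
    """Create dashboard_data/ if needed and write meta.json into it."""
    layout = dashboard_layout(out_dir, table_names)
    layout["data_dir"].mkdir(parents=True, exist_ok=True)
    layout["meta"].write_text(json.dumps(meta, indent=2))
    return layout
```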
The old approach used the shallowest file's parent directory as the root,
which breaks when files exist at different depths (e.g. S3 data).
Compute the longest common prefix (LCP) of all paths as the LCP of the
lexicographic min and max: one O(n) pass, then a single pairwise comparison.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
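
A minimal sketch of the min/max trick, assuming paths are compared by directory component rather than by character (so /data/ab and /data/abc do not share "ab"). It works because any sequence that sorts between the smallest and largest must share their common prefix. The name common_root is hypothetical.

```python
from pathlib import PurePosixPath


def common_root(paths):
    """Deepest directory containing every file in the list `paths` (component-wise LCP)."""
    if not paths:
        raise ValueError("common_root() needs at least one path")
    # Compare parent directories as tuples of components, so ordering is by component.
    dirs = [PurePosixPath(p).parent.parts for p in paths]
    # Anything lexicographically between lo and hi shares their common prefix,
    # so the LCP of the whole set is just the LCP of these two.
    lo, hi = min(dirs), max(dirs)
    n = 0
    for a, b in zip(lo, hi):
        if a != b:
            break
        n += 1
    return PurePosixPath(*lo[:n])
```

For ["s3/bucket/a/x.txt", "s3/bucket/b/c/y.txt"] this returns s3/bucket, whereas the shallowest-parent rule would pick s3/bucket/a, which does not contain the second file.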
Folder chip sizes were computed from cell.folder (the immediate parent
directory only), giving values much lower than compute_top_folders, which
explodes all path components. Chip sizes are now accumulated for every
matching folder tag in the node's path, consistent with the Python-side
computation.

Folder filtering now checks both node path parts and cell.folder,
catching files in pruned subdirectories that the node-only approach
missed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
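
A minimal sketch contrasting the two chip-size rules and the widened filter check, written in Python for consistency with the other examples even though the chip code runs in the dashboard. files is a hypothetical list of (path, size) pairs and tags is the set of folder names shown as chips; counting a tag at most once per file is an assumption.

```python
from collections import Counter
from pathlib import PurePosixPath


def chip_sizes_parent_only(files, tags):
    """Old rule: credit a file's size only to its immediate parent directory."""
    totals = Counter()
    for path, size in files:
        parent = PurePosixPath(path).parent.name
        if parent in tags:
            totals[parent] += size
    return totals


def chip_sizes_all_ancestors(files, tags):
    """New rule: credit a file's size to every tagged folder on its path."""
    totals = Counter()
    for path, size in files:
        # Assumption: a tag appearing twice on one path is counted once.
        for name in set(PurePosixPath(path).parent.parts) & tags:
            totals[name] += size
    return totals


def matches_folder(node_parts, cell_folder, tag):
    """Filter check: the tag is on the node's path or is the cell's own folder."""
    return tag in node_parts or cell_folder == tag
```

With a file under proj/logs/old/, the old rule credits nothing to a "logs" chip, while the new rule credits its size to "logs".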
Compares two .feather scan files and builds a diff dashboard showing
where disk usage grew or shrank. Merges the pruned trees via an outer join,
builds merged cubes with old/new/delta columns, and produces a standalone
HTML dashboard with diverging red/green treemap coloring based on absolute
delta, deep-linking support, and a top-changes panel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
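
A minimal sketch of the merge-and-rank step, using plain dicts of hypothetical path-to-bytes totals in place of the real pruned trees and cubes. The outer join keeps paths present in only one scan, treating the missing side as 0. Which end of the red/green scale means growth is an assumption here (red for growth), with intensity scaled by absolute delta; the function names are hypothetical.

```python
def diff_sizes(old, new):
    """Outer-join two {path: bytes} maps into rows with old/new/delta columns."""
    rows = []
    for path in sorted(old.keys() | new.keys()):
        o, n = old.get(path, 0), new.get(path, 0)  # missing on one side -> 0
        rows.append({"path": path, "old": o, "new": n, "delta": n - o})
    return rows


def top_changes(rows, k=10):
    """Rows with the largest absolute change, biggest first."""
    return sorted(rows, key=lambda r: abs(r["delta"]), reverse=True)[:k]


def delta_style(delta, max_abs):
    """Diverging color and 0..1 intensity from the absolute delta.

    Assumption: red = grew, green = shrank.
    """
    if delta == 0 or max_abs == 0:
        return ("neutral", 0.0)
    color = "red" if delta > 0 else "green"
    return (color, min(abs(delta) / max_abs, 1.0))
```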

Labels

None yet

Projects

None yet


1 participant