Add public datasets testing harness #33

Merged: unexge merged 1 commit into main from push-znnpumxqttqm on Feb 1, 2026

unexge (Owner) commented Feb 1, 2026

Summary

  • Add -Dci-tests build option to enable large dataset tests
  • Add download script (scripts/download-public-datasets.sh) with --small/--all flags
  • Add NYC Taxi dataset tests (green, fhv, yellow, fhvhv tripdata)
  • Small datasets (~1-25 MB) run locally; large datasets (~50-400 MB) run in CI only
  • Each test parses all row groups and columns using readColumnDynamic
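
A minimal sketch of how `--small`/`--all` flag handling could look in a download script like the one above. This is not the contents of scripts/download-public-datasets.sh: the function name `select_group`, the default of `small` when no flag is given, and the "last flag wins" behaviour are all assumptions made for illustration. No dataset URLs are included, so the sketch only decides which group to fetch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of --small/--all handling; not the real script.

# Print the dataset group ("small" or "all") selected by the arguments.
# Assumptions: "small" is the default, and the last flag given wins.
select_group() {
  local group="small"
  local arg
  for arg in "$@"; do
    case "$arg" in
      --small) group="small" ;;
      --all)   group="all" ;;
      *)
        # Reject anything else so typos do not silently fall back
        # to the default group.
        echo "unknown option: $arg" >&2
        return 2
        ;;
    esac
  done
  printf '%s\n' "$group"
}
```

A caller would run something like `group="$(select_group "$@")"` and then branch on `$group` to decide which files to download. Defaulting to the small group keeps an accidental local run from fetching the large, CI-only files.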

Test plan

  • zig build test passes locally (2 small tests run, 2 CI-only tests skipped)
  • CI runs with -Dci-tests=true and all 4 tests pass

🤖 Generated with Claude Code

unexge enabled auto-merge (squash) on February 1, 2026 12:46
unexge force-pushed the push-znnpumxqttqm branch from 47d9129 to b0ceaad on February 1, 2026 13:29
Add testing infrastructure for public Parquet datasets, with support for
CI-only tests for large files:

- Add -Dci-tests build option to enable large dataset tests
- Add download script (scripts/download-public-datasets.sh) with --small/--all flags
- Add NYC Taxi dataset tests (green, fhv, yellow, fhvhv tripdata)
- Small datasets (~1-25 MB) run locally; large datasets (~50-400 MB) run in CI only
- Each test parses all row groups and columns using readColumnDynamic
unexge force-pushed the push-znnpumxqttqm branch from b0ceaad to 64640f1 on February 1, 2026 13:34
unexge merged commit 2c82078 into main on Feb 1, 2026
1 check passed
unexge deleted the push-znnpumxqttqm branch on February 1, 2026 13:36