Description
Feather files are smaller than CSVs and cheaper to process, and they store dtypes, which avoids some problems when the data is loaded for further processing.
We initially moved to .csv.gz, which was an improvement on uncompressed CSVs. However, the gzip compression uses a significant amount of CPU. We believe that moving to Arrow/Feather would use much less CPU and be an overall improvement.
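To illustrate the dtype problem mentioned above, here is a minimal sketch using only the Python standard library. The rows and column names (`patient_id`, `age`, `has_asthma`) are hypothetical and invented for illustration; they are not taken from any real study. It shows that every value comes back from a CSV as a string, so the reader has to re-infer types. A format that stores its schema alongside the data, as Feather/Arrow does, avoids this step.

```python
import csv
import io

# Hypothetical rows, invented for illustration only.
rows = [
    {"patient_id": 1, "age": 34, "has_asthma": True},
    {"patient_id": 2, "age": 71, "has_asthma": False},
]


def csv_round_trip(records):
    """Write records to CSV text and read them back, as a CSV-based step would."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


loaded = csv_round_trip(rows)

# Every column is now a str: the int and bool types were not stored in the file.
for column, value in loaded[0].items():
    print(column, repr(value), type(value).__name__)

# A naive conversion silently goes wrong: the non-empty string "False" is truthy.
print(bool(loaded[1]["has_asthma"]))
```

The last line is the kind of loading problem that stored dtypes help to avoid: the second row's `has_asthma` was written as `False` but a naive `bool()` on the loaded string treats it as true.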
To do:
- Update project.yaml and code sample
- Update gitignore to ignore .feather/.arrow files
- Update docs, including Getting Started Guide and ehrql tutorials (Remove csv.gz from ehrql tutorials and how-tos documentation#1610)
- Ensure that arrow files can be viewed using Codespaces
- Provide researchers using VS Code, RStudio and the Stata IDE with instructions on how to view arrow files during local development (Add fallback support for viewing arrow files locally. opensafely-core/opensafely-cli#267) (Added by Lucy)