Description
Problem:
When writing a CSV file (or a file of any other type), I get the following exception if the file already exists:
Error: Execution("Could not create directory data/processed/diabetes.csv: Os { code: 17, kind: AlreadyExists, message: \"File exists\" }")
Other frameworks such as Polars and PySpark support overwriting, either as an option or by default, and can also append to an existing file if requested.
Possible Solution:
I suggest adding a CsvWriteOptions struct, analogous to the CsvReadOptions struct, that must be passed when writing the file. For example: df.write_csv("my/precious/path", CsvWriteOptions::default()).await? The same family of write options could also cover the configuration for json, parquet, csv, table etc., for example whether to write the CSV into multiple files or one single file.
I am also not sure why a CSV file is written into multiple partitions (part01.csv ... part08.csv) by default. It should let the user write into one CSV file in case of a smaller dataset.
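To make the proposal above more concrete, here is a minimal, standard-library-only sketch of what such options could look like. It is not DataFusion's API: the names WriteMode, open_for_write, output_paths and write_csv, and the fields on this CsvWriteOptions, are all hypothetical and only illustrate how "error if exists", "overwrite" and "append" could map onto std::fs::OpenOptions, and how a single-file flag could pick the output paths. CSV quoting and escaping are deliberately left out.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical: what to do when the target file already exists.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum WriteMode {
    /// Fail with `AlreadyExists`, similar to the error reported above.
    #[default]
    ErrorIfExists,
    /// Replace the contents of an existing file.
    Overwrite,
    /// Add rows to the end of an existing file.
    Append,
}

/// Hypothetical write-side counterpart to `CsvReadOptions`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CsvWriteOptions {
    pub mode: WriteMode,
    /// Write one file instead of one file per partition.
    pub single_file: bool,
    pub has_header: bool,
    pub delimiter: u8,
}

impl Default for CsvWriteOptions {
    fn default() -> Self {
        Self {
            mode: WriteMode::default(),
            single_file: false,
            has_header: true,
            delimiter: b',',
        }
    }
}

/// Map the hypothetical write mode onto `std::fs::OpenOptions`.
pub fn open_for_write(path: &Path, mode: WriteMode) -> io::Result<File> {
    let mut o = OpenOptions::new();
    match mode {
        WriteMode::ErrorIfExists => o.write(true).create_new(true),
        WriteMode::Overwrite => o.write(true).create(true).truncate(true),
        WriteMode::Append => o.append(true).create(true),
    };
    o.open(path)
}

/// Decide which files to write: `base` itself, or one `partNN.csv` per
/// partition inside a `base` directory.
pub fn output_paths(base: &Path, partitions: usize, opts: &CsvWriteOptions) -> Vec<PathBuf> {
    if opts.single_file {
        vec![base.to_path_buf()]
    } else {
        (0..partitions)
            .map(|i| base.join(format!("part{:02}.csv", i + 1)))
            .collect()
    }
}

/// Write rows as delimiter-separated text (no quoting or escaping).
/// The header is only written when the opened file is still empty,
/// so appending does not repeat it.
pub fn write_csv(
    path: &Path,
    header: &[&str],
    rows: &[Vec<String>],
    opts: &CsvWriteOptions,
) -> io::Result<()> {
    let mut file = open_for_write(path, opts.mode)?;
    let sep = (opts.delimiter as char).to_string();
    if opts.has_header && file.metadata()?.len() == 0 {
        writeln!(file, "{}", header.join(sep.as_str()))?;
    }
    for row in rows {
        writeln!(file, "{}", row.join(sep.as_str()))?;
    }
    Ok(())
}
```

With something along these lines, the default would keep today's "fail if the file exists" behaviour, while passing a mode of Overwrite or Append, or setting single_file to true, would be an explicit opt-in by the caller.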