
DataFrame write API should accept an Overwrite option when the file already exists #4986

@MrDataPsycho


Problem:
When I try to write a CSV file (or a file of any other type) and the output path already exists, I get the following error:

Error: Execution("Could not create directory data/processed/diabetes.csv: Os { code: 17, kind: AlreadyExists, message: \"File exists\" }")

Other frameworks such as Polars and PySpark allow overwriting an existing output, either as an option or by default, and can also append to existing files if that mode is selected.
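The error shows that the writer treats data/processed/diabetes.csv as a directory of part files and fails when it tries to create that directory again (OS error 17 is EEXIST on Linux). Below is a minimal, std-only sketch of how a writer could resolve an existing output path according to a save mode. The `WriteMode` enum and `prepare_output_dir` function are hypothetical names, loosely modelled on PySpark's error/overwrite/append save modes; this is not DataFusion's API.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical save modes (not an existing DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteMode {
    /// Fail if the output path already exists (the behaviour reported above).
    ErrorIfExists,
    /// Delete whatever is at the output path, then write fresh output.
    Overwrite,
    /// Keep existing part files and add new ones next to them.
    Append,
}

/// Prepare `path` as an output directory for part files according to `mode`.
pub fn prepare_output_dir(path: &Path, mode: WriteMode) -> io::Result<()> {
    if path.exists() {
        match mode {
            WriteMode::ErrorIfExists => {
                return Err(io::Error::new(
                    io::ErrorKind::AlreadyExists,
                    format!("output path {} already exists", path.display()),
                ));
            }
            WriteMode::Overwrite => {
                // Remove a previous directory of part files, or a single plain file.
                if path.is_dir() {
                    fs::remove_dir_all(path)?;
                } else {
                    fs::remove_file(path)?;
                }
            }
            WriteMode::Append => {
                // An existing directory is reused as-is; a plain file cannot hold part files.
                if path.is_dir() {
                    return Ok(());
                }
                return Err(io::Error::new(
                    io::ErrorKind::AlreadyExists,
                    format!("{} exists and is not a directory", path.display()),
                ));
            }
        }
    }
    // create_dir_all also creates missing parents such as data/processed.
    fs::create_dir_all(path)
}
```

Keeping ErrorIfExists as the default would preserve the current behaviour, so overwriting only happens when a caller asks for it explicitly.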

Possible Solution:
I suggest adding a CsvWriteOptions struct, analogous to the CsvReadOptions struct, that must be passed when writing the file, for example: df.write_csv("my/precious/path", CsvWriteOptions::default()).await?. This family of write options could also hold all the configuration possibilities for JSON, Parquet, CSV, tables, etc., for example whether to write the CSV into multiple files or into one single file.
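A minimal sketch of what the proposed options struct could look like, using consuming builder methods in the style of CsvReadOptions. Everything here is hypothetical: `CsvWriteOptions`, its fields (`mode`, `single_file_output`, `has_header`, `delimiter`) and the `WriteMode` enum are assumptions for illustration, not an existing DataFusion API.

```rust
/// Hypothetical save modes for the proposal (not an existing DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteMode {
    ErrorIfExists,
    Overwrite,
    Append,
}

/// Hypothetical CSV write options, mirroring the builder style of CsvReadOptions.
#[derive(Debug, Clone, PartialEq)]
pub struct CsvWriteOptions {
    pub mode: WriteMode,
    pub single_file_output: bool,
    pub has_header: bool,
    pub delimiter: u8,
}

impl Default for CsvWriteOptions {
    fn default() -> Self {
        Self {
            // Defaults chosen to keep today's behaviour: fail on existing
            // output and write one file per partition.
            mode: WriteMode::ErrorIfExists,
            single_file_output: false,
            has_header: true,
            delimiter: b',',
        }
    }
}

impl CsvWriteOptions {
    pub fn new() -> Self {
        Self::default()
    }

    /// Choose what happens when the output path already exists.
    pub fn mode(mut self, mode: WriteMode) -> Self {
        self.mode = mode;
        self
    }

    /// Write all partitions into one file instead of one file per partition.
    pub fn single_file_output(mut self, single: bool) -> Self {
        self.single_file_output = single;
        self
    }

    pub fn has_header(mut self, has_header: bool) -> Self {
        self.has_header = has_header;
        self
    }

    pub fn delimiter(mut self, delimiter: u8) -> Self {
        self.delimiter = delimiter;
        self
    }
}
```

With such a struct, the call from the proposal might read df.write_csv("my/precious/path", CsvWriteOptions::new().mode(WriteMode::Overwrite).single_file_output(true)).await?, where that write_csv signature is itself part of the proposal rather than an existing one.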

Also, I'm not sure why the CSV output is split into multiple partition files (part01.csv ... part08.csv) by default. Users should be able to write into one CSV file, for example in the case of a smaller dataset.
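The multiple files presumably come from writing one file per output partition of the plan; with the partition count following DataFusion's target_partitions setting (by default the number of CPU cores), an eight-core machine would produce eight parts. The std-only sketch below contrasts the two layouts, assuming each partition has already been serialised into CSV rows. The function name `write_partitions` and the part-file naming scheme are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Write pre-serialised CSV rows, one Vec per partition.
/// single_file = true: `path` is one file, with the header written once.
/// single_file = false: `path` is a directory with one part file per partition.
pub fn write_partitions(
    path: &Path,
    header: &str,
    partitions: &[Vec<String>],
    single_file: bool,
) -> io::Result<()> {
    if single_file {
        // Create missing parent directories, skipping an empty parent ("out.csv").
        if let Some(parent) = path.parent().filter(|p| !p.as_os_str().is_empty()) {
            fs::create_dir_all(parent)?;
        }
        // File::create truncates an existing file, i.e. it overwrites.
        let mut w = BufWriter::new(File::create(path)?);
        writeln!(w, "{header}")?;
        // All partitions are funnelled through this one writer, in order.
        for row in partitions.iter().flatten() {
            writeln!(w, "{row}")?;
        }
        w.flush()?;
    } else {
        fs::create_dir_all(path)?;
        for (i, rows) in partitions.iter().enumerate() {
            let file = File::create(path.join(format!("part-{i}.csv")))?;
            let mut w = BufWriter::new(file);
            writeln!(w, "{header}")?;
            for row in rows {
                writeln!(w, "{row}")?;
            }
            w.flush()?;
        }
    }
    Ok(())
}
```

The single-file path gives up write parallelism because every partition goes through one writer, which is why an opt-in flag for small datasets seems a better fit than changing the default.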

Labels: enhancement (New feature or request)
