Would y'all accept a PR that would have datakit write a zero-size placeholder with metadata matching the date of the most recent write of the relevant file to S3?
e.g. writing `data/clean/my_big_file.csv.datakit_placeholder` when `data/clean/my_big_file.csv` is uploaded to S3.
The motivation is that I do not remember when I most recently synced. When it comes time to delete big old files, I want greater confidence that the most recent version is synced to S3. And, if I come back to a project, re-run a notebook, and find that a big data file no longer exists on disk, I'd love to have a reminder that it's on datakit (versus my external HDD or gone forever or on a colleague's laptop or whatever).
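A minimal sketch of the placeholder idea, to make the request concrete. This is not datakit's actual API: the function name `write_placeholder`, the `uploaded_at` parameter, and the choice to store the upload date as the placeholder's modification time are all assumptions for illustration.

```python
import os
from datetime import datetime, timezone
from pathlib import Path

# Suffix taken from the example in this issue; not an existing datakit convention.
PLACEHOLDER_SUFFIX = ".datakit_placeholder"


def write_placeholder(data_path, uploaded_at):
    """Create a zero-size placeholder next to data_path.

    The placeholder's access and modification times are set to uploaded_at,
    so a plain `ls -l` shows when the file was last pushed to S3.
    (Hypothetical helper, not part of datakit.)
    """
    placeholder = Path(str(data_path) + PLACEHOLDER_SUFFIX)
    placeholder.parent.mkdir(parents=True, exist_ok=True)
    placeholder.write_bytes(b"")  # create or truncate to zero bytes
    ts = uploaded_at.timestamp()
    os.utime(placeholder, (ts, ts))  # (atime, mtime)
    return placeholder


if __name__ == "__main__":
    # Would be called right after a successful upload of the data file.
    path = write_placeholder(
        "data/clean/my_big_file.csv", datetime.now(timezone.utc)
    )
    print(f"wrote {path}")
```

Because the placeholder is separate from the data file, it survives deleting `my_big_file.csv` locally and still shows that a copy was pushed, and when.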