Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
Currently preventing usage
Please provide a clear description of the problem this feature solves
Users ingest content from a single subfolder that contains mixed file types: PDFs, text files, PPTX, etc.
Users want to apply the .split() task to such jobs with source-type filtering, so that splitting is restricted to specific file types (text, html, docx, pptx, mp3, and optionally PDF).
Describe the feature, and optionally a solution or implementation and any alternatives
.split(source_types=[".txt"])
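To make the requested semantics concrete, here is a minimal standalone sketch of how such a filter might decide which files get split. It does not use the library's actual code: `should_split` is a hypothetical helper, and the behavior for a missing `source_types` (split everything) and the case-insensitive extension matching are assumptions, not confirmed behavior.

```python
from pathlib import PurePath


def should_split(path, source_types=None):
    """Hypothetical helper: would `path` be split under the proposed filter?"""
    if source_types is None:
        # Assumption: with no filter given, every file is eligible for splitting.
        return True
    # Normalize entries so ".txt", "txt", and ".TXT" are treated the same.
    allowed = {
        ext.lower() if ext.startswith(".") else "." + ext.lower()
        for ext in source_types
    }
    return PurePath(path).suffix.lower() in allowed


# A mixed-type folder, as described in the problem statement.
files = ["docs/report.pdf", "docs/notes.txt", "docs/slides.PPTX"]

# Only text and pptx files are selected for splitting; the PDF is left alone.
to_split = [f for f in files if should_split(f, source_types=[".txt", ".pptx"])]
```

With this shape, files whose type is not listed would still be ingested but would skip the split step.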
Additional context
No response