[FEA]: Allow user to configure which content types the .split() task operates on #1770

@randerzander

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Currently preventing usage

Please provide a clear description of problem this feature solves

Users ingest content from a single subfolder containing mixed file types: PDFs, text files, pptx, etc.

Users want to apply the .split() task to such jobs with source-type filtering, so that splitting is restricted to specific file types (text, html, docx, pptx, mp3, and optionally PDF).

Describe the feature, and optionally a solution or implementation and any alternatives

.split(source_types=[".txt"])
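
To make the requested semantics concrete, here is a minimal sketch of the filtering decision. It is not the library's implementation: `should_split` is a hypothetical helper, and treating a missing `source_types` as "split everything" is an assumption about what the default should be.

```python
from pathlib import PurePath
from typing import Iterable, Optional


def should_split(source_name: str, source_types: Optional[Iterable[str]] = None) -> bool:
    """Hypothetical helper: decide whether the split step applies to a source file."""
    if source_types is None:
        # Assumption: no filter means the current behavior (split every source).
        return True
    # Normalize entries so ".txt", "txt" and ".TXT" are all treated the same.
    allowed = {
        t.lower() if t.startswith(".") else "." + t.lower()
        for t in source_types
    }
    # Compare against the file's extension, case-insensitively.
    return PurePath(source_name).suffix.lower() in allowed


# Usage: only the text and pptx files from a mixed folder would be split.
files = ["report.pdf", "notes.txt", "deck.PPTX"]
to_split = [f for f in files if should_split(f, [".txt", "pptx"])]
```

Matching on the file extension mirrors the `[".txt"]` form proposed above; matching on a detected document type instead would be an equally reasonable design.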

Additional context

No response
