
Support Audio Datatypes #178

@HAKSOAT

Description


This task is not fully formed yet, but the idea is to take one step towards supporting audio data.

In the case of text data, tokenization is used to split it into tokens that are passed into the model.
In the case of image data, we resize, convert to an ndarray, etc., then pass it into the model.

In the case of audio data???

That is the question this issue looks to answer.

The user should be able to pass in the bytes of an audio file, and we read and do basic processing (it is fine if this is not model-specific, just generic steps) that produces an input type supported by the model.
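As a sketch of what that generic step could look like (this is an assumption, not existing code: it assumes 16-bit PCM WAV input with a canonical 44-byte header, and the function name is hypothetical):

```rust
/// Minimal sketch: decode 16-bit PCM WAV bytes into f32 samples in [-1.0, 1.0].
/// Assumes a canonical 44-byte RIFF header; a real implementation would walk
/// the chunk list and validate the format fields properly.
fn wav_bytes_to_samples(bytes: &[u8]) -> Result<Vec<f32>, String> {
    if bytes.len() < 44 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".to_string());
    }
    // Interpret everything after the header as little-endian i16 PCM frames,
    // normalized to f32 so downstream processing is model-agnostic.
    let data = &bytes[44..];
    let samples = data
        .chunks_exact(2)
        .map(|c| i16::from_le_bytes([c[0], c[1]]) as f32 / i16::MAX as f32)
        .collect();
    Ok(samples)
}
```

Decoding to normalized f32 samples first keeps the step generic: any model-specific shaping (resampling, batching) can happen afterwards.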

Such that we can add an extra input type to ModelInput, such as:

#[derive(Debug)]
pub enum ModelInput {
    Texts(Vec<Encoding>),
    Images(Array<f32, Ix4>),
    Audios(...)
}

Some other places where this new data type would reflect could be:

pub enum ModelType {
    Text {
        max_input_tokens: NonZeroUsize,
    },
    Image {
        // width, height
        expected_image_dimensions: (NonZeroUsize, NonZeroUsize),
    },
}
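Extending that enum, a hypothetical Audio variant might carry what the model expects of its input; the field names below (expected_sample_rate, expected_num_samples) are illustrative assumptions, not existing code:

```rust
use std::num::NonZeroUsize;

// Sketch of ModelType with a hypothetical Audio variant added.
#[derive(Debug)]
pub enum ModelType {
    Text {
        max_input_tokens: NonZeroUsize,
    },
    Image {
        // width, height
        expected_image_dimensions: (NonZeroUsize, NonZeroUsize),
    },
    Audio {
        // e.g. 16_000 Hz mono input is common for speech models
        expected_sample_rate: NonZeroUsize,
        expected_num_samples: NonZeroUsize,
    },
}
```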

Other useful pointers include:

How we currently read bytes for images:

pub fn try_new(bytes: Vec<u8>) -> Result<Self, AIProxyError> {

The information provided doesn't cover all the bits of code that need modification to introduce this, so feel free to do what is needed.

Out of scope: Model Support, Specific Preprocessing

Focus should be on being able to get audio into a format that can be sent into models using the most minimal processing possible. For example, in the case of image processing, this would mean simply resizing the image and converting it to an ndarray.
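In that spirit, the audio analogue of "resize, then convert to ndarray" could be as small as truncating or zero-padding the decoded waveform to a fixed length (a sketch; the function name and the idea of a fixed target length are assumptions):

```rust
/// Minimal sketch: force a decoded waveform to a fixed length by truncating
/// or zero-padding, mirroring how images are resized to fixed dimensions.
fn pad_or_truncate(mut samples: Vec<f32>, target_len: usize) -> Vec<f32> {
    // Vec::resize truncates when shorter, pads with 0.0 (silence) when longer.
    samples.resize(target_len, 0.0);
    samples
}
```

A fixed-length batch of such waveforms could then be assembled into the array type the Audios variant ends up holding.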

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
