
Support Audio Datatypes #178

@HAKSOAT

Description


This task is not fully formed yet, but the idea is to take one step towards supporting audio data.

In the case of text data, tokenization is used to split it into tokens that are passed into the model.
In the case of image data, we resize, convert to an ndarray, etc., then pass it into the model.

In the case of audio data???

That is the question this issue looks to answer.

The user should be able to pass in the bytes of an audio file, and we read and do basic processing (it is fine if this is not model-specific, just generic steps) that produces an input type supported by the model.
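As a sketch of what that generic step could look like (this is an assumption, not existing code: it assumes 16-bit PCM WAV input with a canonical 44-byte header, and the function name is hypothetical):

```rust
/// Minimal sketch: decode 16-bit PCM WAV bytes into f32 samples in [-1.0, 1.0].
/// Assumes a canonical 44-byte RIFF header; a real implementation would walk
/// the chunk list and validate the format fields properly.
fn wav_bytes_to_samples(bytes: &[u8]) -> Result<Vec<f32>, String> {
    if bytes.len() < 44 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".to_string());
    }
    // Interpret everything after the header as little-endian i16 PCM frames,
    // normalized to f32 so downstream processing is model-agnostic.
    let data = &bytes[44..];
    let samples = data
        .chunks_exact(2)
        .map(|c| i16::from_le_bytes([c[0], c[1]]) as f32 / i16::MAX as f32)
        .collect();
    Ok(samples)
}
```

Decoding to normalized f32 samples first keeps the step generic: any model-specific shaping (resampling, batching) can happen afterwards.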

Such that we can add an extra input type to ModelInput, such as:

#[derive(Debug)]
pub enum ModelInput {
    Texts(Vec<Encoding>),
    Images(Array<f32, Ix4>),
    Audios(...)
}

Some other places where this new data type would reflect could be:

pub enum ModelType {
    Text {
        max_input_tokens: NonZeroUsize,
    },
    Image {
        // width, height
        expected_image_dimensions: (NonZeroUsize, NonZeroUsize),
    },
}
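Extending that enum, a hypothetical Audio variant might carry what the model expects of its input; the field names below (expected_sample_rate, expected_num_samples) are illustrative assumptions, not existing code:

```rust
use std::num::NonZeroUsize;

// Sketch of ModelType with a hypothetical Audio variant added.
#[derive(Debug)]
pub enum ModelType {
    Text {
        max_input_tokens: NonZeroUsize,
    },
    Image {
        // width, height
        expected_image_dimensions: (NonZeroUsize, NonZeroUsize),
    },
    Audio {
        // e.g. 16_000 Hz mono input is common for speech models
        expected_sample_rate: NonZeroUsize,
        expected_num_samples: NonZeroUsize,
    },
}
```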

Other useful pointers include:

How we currently read bytes for images:

pub fn try_new(bytes: Vec<u8>) -> Result<Self, AIProxyError> {

The information provided doesn't cover all the bits of code that need modification to introduce this, so feel free to do what is needed.

Out of scope: Model Support, Specific Preprocessing

Focus should be on being able to get audio into a format that can be sent into models using the most minimal processing possible. For example, in the case of image processing, this would mean simply resizing the image and converting it to an ndarray.
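In that spirit, the audio analogue of "resize, then convert to ndarray" could be as small as truncating or zero-padding the decoded waveform to a fixed length (a sketch; the function name and the idea of a fixed target length are assumptions):

```rust
/// Minimal sketch: force a decoded waveform to a fixed length by truncating
/// or zero-padding, mirroring how images are resized to fixed dimensions.
fn pad_or_truncate(mut samples: Vec<f32>, target_len: usize) -> Vec<f32> {
    // Vec::resize truncates when shorter, pads with 0.0 (silence) when longer.
    samples.resize(target_len, 0.0);
    samples
}
```

A fixed-length batch of such waveforms could then be assembled into the array type the Audios variant ends up holding.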

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
