Voice Print AI

Introduction

Voice Print AI is a project that uses Whisperfile to generate transcripts from audio files completely locally on your system.

Read more about Whisperfile at Mozilla/whisperfile.

Whisperfile is a high-performance implementation of OpenAI's Whisper created by Mozilla Ocho as part of the llamafile project, based on the whisper.cpp software written by Georgi Gerganov, et al.

Why

This project is useful for generating transcripts where privacy is a concern. Audio files are processed locally on your system and no data is sent to any server, so it is suitable for recordings that contain sensitive information such as medical records or legal data.

Docker Version

You can run the Docker version of this project by running the following commands. This way you don't have to worry about dealing with the code or the dependencies.

docker pull dbanswan/voice-print-ai
mkdir -p audio db
docker run -p 3000:3000 -p 8080:8080 -v "$(pwd)/audio:/app/public/audio/uploads" -v "$(pwd)/db:/app/db" dbanswan/voice-print-ai

Open your browser and go to http://localhost:3000 to use the application.

The Whisperfile server will be available on http://localhost:8080.

You can also run just the Whisperfile server using the following command.

docker pull dbanswan/whisper-tiny-server
docker run -p 8080:8080 dbanswan/whisper-tiny-server

Both images use the tiny model to keep the image size small.

Run the project locally

Clone the repository


git clone https://github.com/dbanswan/voiceprint-ai.git
cd voiceprint-ai

Install the dependencies


npm install

Go to Hugging Face and download a model file.

I suggest "whisper-tiny.llamafile": it is around 315 MB and works well for most recordings. Feel free to try the other models too. Larger models give better results but need more resources to run.

Go to the folder where you downloaded the model file.


# make it executable
chmod +x whisper-tiny.llamafile

And then run it with the following command


./whisper-tiny.llamafile

The model starts a server on port 8080, which is what the app calls. You can download any other model and run it the same way.
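As a rough sketch of what a call to that server might look like from Node.js 18 or later (this is not taken from the app's source): the snippet assumes the server exposes the `/inference` endpoint of the whisper.cpp example server that Whisperfile builds on, and that it accepts a multipart form with `file` and `response_format` fields. The names `buildInferenceForm` and `transcribe` are hypothetical.

```javascript
// Hypothetical helper names. The endpoint path and form field names are
// assumptions based on the whisper.cpp example server.
const WHISPER_URL = "http://localhost:8080/inference";

// Build the multipart form body for one audio file.
function buildInferenceForm(audioBytes, filename, responseFormat = "json") {
  const form = new FormData();
  form.append("file", new Blob([audioBytes]), filename);
  form.append("response_format", responseFormat);
  return form;
}

// Send the audio to the local server and return the parsed JSON response.
async function transcribe(audioBytes, filename) {
  const response = await fetch(WHISPER_URL, {
    method: "POST",
    body: buildInferenceForm(audioBytes, filename),
  });
  if (!response.ok) {
    throw new Error(`Whisperfile server returned HTTP ${response.status}`);
  }
  return response.json();
}
```

With the server running, something like `const result = await transcribe(bytes, "clip.wav")` should return the parsed response. The exact shape of that response depends on the server version, so inspect it before relying on specific fields.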

Run the app


npm run dev

No keys or API tokens are required to run this project.

How it works

Select the audio
Press the "Transcribe" button
Go to the "History" tab to see all the transcripts you have generated.
The app also comes with an audio player, so you can listen to the audio while reading the transcript.

The history is stored in a simple JSON file inside the db folder, which keeps things simple and avoids extra dependencies. Feel free to switch to SQLite or any other database.
The audio files are stored in the public/audio/uploads folder.
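To illustrate the JSON-file approach, here is a minimal sketch of appending one transcript record to a history file. The file name `db/history.json`, the record fields, and the function names are all hypothetical; the app's actual schema may differ.

```javascript
// Hypothetical path and record shape, for illustration only.
const HISTORY_PATH = "db/history.json";

// Pure step: take the current file contents ("" if there is no file yet)
// and return the new contents with one more record appended.
function appendRecord(currentJson, record) {
  const records = currentJson.trim() === "" ? [] : JSON.parse(currentJson);
  records.push({
    ...record,
    createdAt: record.createdAt ?? new Date().toISOString(),
  });
  return JSON.stringify(records, null, 2);
}

// I/O step: read the file, append the record, write the file back.
async function saveTranscript(record, path = HISTORY_PATH) {
  const fs = await import("node:fs/promises");
  const current = await fs.readFile(path, "utf8").catch((err) => {
    if (err.code === "ENOENT") return ""; // first run: no history yet
    throw err;
  });
  await fs.writeFile(path, appendRecord(current, record));
}
```

Rewriting the whole file on every save is fine for a personal history of modest size. With many records or concurrent writers, a database such as SQLite would be the safer choice.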
Advanced users can also export the transcript in JSON, verbose JSON, VTT, or SRT format, which is useful for integrating with other systems.
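To show how the subtitle formats relate to each other, here is a small sketch that turns timed segments into SRT and WebVTT text. It assumes segments shaped like `{ start, end, text }` with times in seconds, which is a hypothetical shape chosen for illustration. The app may instead get these formats directly from the server, so treat this as an explanation of the formats, not of the app's code.

```javascript
// Format seconds as HH:MM:SS<sep>mmm ("," for SRT, "." for WebVTT).
function formatTimestamp(seconds, separator) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}${separator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma before the milliseconds.
function toSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, unnumbered cues, dot before the milliseconds.
function toVtt(segments) {
  const cues = segments.map((seg) =>
    `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`);
  return `WEBVTT\n\n${cues.join("\n")}`;
}
```

The two formats differ mainly in the header line, the cue numbering, and the millisecond separator, so one timestamp helper can serve both.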
