codecadede/bghira-bark

🐢 Bark

🚀 Updates

2025.05.17

  • MultiGPU inference example added to run a large script and combine the outputs at the end
  • Uses torch.compile with max-autotune across all models by default
  • Uses padded attention masks to support batched inference
  • SageAttention will be used automatically if installed
    • Compatible with batch inference and torch compile
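
As a rough illustration of the padded attention masks mentioned above, here is a minimal sketch in plain Python. It is not the code this repository uses: `pad_batch` and `PAD_ID` are hypothetical names, and right-padding is an assumption made only for the example.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding token id, chosen only for this sketch


def pad_batch(
    sequences: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token sequences to a common length and build matching masks.

    In each mask, 1 marks a real token and 0 marks padding to be ignored.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


batch, mask = pad_batch([[5, 6, 7], [8], [9, 10]])
print(batch)  # [[5, 6, 7], [8, 0, 0], [9, 10, 0]]
print(mask)   # [[1, 1, 1], [1, 0, 0], [1, 1, 0]]
```

The mask lets prompts of different lengths share one batch while the padded positions are excluded from attention.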

On a 3x 4090 system, these changes can bring long inference job runtimes down from 2-5 minutes to 30-60 seconds.
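
The idea of splitting a large script across GPUs and combining the outputs at the end can be sketched as follows. This is an assumption-laden toy, not the logic of `examples/parallel.py`: `shard` and `combine` are hypothetical helpers, round-robin assignment is just one possible strategy, and a string stands in for generated audio.

```python
from typing import Dict, List, Tuple


def shard(lines: List[str], world_size: int, rank: int) -> List[Tuple[int, str]]:
    """Return the (index, line) pairs that worker `rank` handles, round-robin."""
    return [(i, line) for i, line in enumerate(lines) if i % world_size == rank]


def combine(results: Dict[int, str]) -> List[str]:
    """Reassemble per-line outputs in the original script order."""
    return [results[i] for i in sorted(results)]


script = ["line a", "line b", "line c", "line d", "line e"]
world_size = 3  # e.g. three GPUs

results: Dict[int, str] = {}
for rank in range(world_size):
    for index, line in shard(script, world_size, rank):
        # Stand-in for audio generation on this worker.
        results[index] = f"audio({line})"

print(combine(results))
```

Keeping the original line index with each piece is what allows the final output to be stitched together in script order, regardless of which worker finishes first.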

Bark is licensed under the MIT License, meaning it's now available for commercial use!

Installation

git clone https://github.com/bghira/bghira-bark
cd bghira-bark
python3.12 -m venv .venv
. .venv/bin/activate
pip install -e .

Usage

This runs the example across all available GPUs with torch.compile disabled:

env SUNO_DISABLE_COMPILE=true accelerate launch examples/parallel.py --out out.mp3 --normalize -14 --compress
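
For readers wondering how a switch like `SUNO_DISABLE_COMPILE` might be consumed, here is a small sketch of reading a boolean environment flag. How this project actually parses the variable is not shown in this README, so `env_flag` and the set of accepted truthy values are assumptions.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not taken from the project


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean switch."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in TRUTHY


os.environ["SUNO_DISABLE_COMPILE"] = "true"  # set here only for demonstration
use_compile = not env_flag("SUNO_DISABLE_COMPILE")
print(use_compile)  # False
```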

⚙️ Details

Bark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to AudioLM and Vall-E and uses a quantized audio representation from EnCodec. It is not a conventional TTS model, but a fully generative text-to-audio model that can deviate in unexpected ways from any given script. Unlike previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech, such as music lyrics, sound effects, or other non-speech sounds.

Below is a list of some known non-speech sounds, but we are finding more every day. If you find patterns that work particularly well, please let us know on Discord!

  • [laughter]
  • [laughs]
  • [sighs]
  • [music]
  • [gasps]
  • [clears throat]
  • — or ... for hesitations
  • ♪ for song lyrics
  • CAPITALIZATION for emphasis of a word
  • [MAN] and [WOMAN] to bias Bark toward male and female speakers, respectively
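
The annotations listed above are plain text inside the prompt, so composing them is ordinary string handling. The sketch below uses a hypothetical `build_prompt` helper; wrapping lyrics in ♪ on both sides and placing the speaker tag first are assumptions for illustration, and the model is free to interpret them loosely.

```python
def build_prompt(text: str, speaker: str = "", sung: bool = False) -> str:
    """Compose a prompt string using the annotations listed above."""
    if sung:
        text = f"♪ {text} ♪"
    if speaker:
        text = f"[{speaker.upper()}] {text}"
    return text


print(build_prompt("Well... [laughs] that was UNEXPECTED.", speaker="man"))
# [MAN] Well... [laughs] that was UNEXPECTED.
print(build_prompt("la la la, a simple tune", sung=True))
# ♪ la la la, a simple tune ♪
```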

Supported Languages

Language Status
English (en) ✅
German (de) ✅
Spanish (es) ✅
French (fr) ✅
Hindi (hi) ✅
Italian (it) ✅
Japanese (ja) ✅
Korean (ko) ✅
Polish (pl) ✅
Portuguese (pt) ✅
Russian (ru) ✅
Turkish (tr) ✅
Chinese, simplified (zh) ✅

Requests for future language support here or in the #forums channel on Discord.

About

A 🍴 of a 🔊 Text-Prompted Generative Audio Model
