mirror of https://github.com/suno-ai/bark.git
Update Doc: Added extra voice examples, updated documentation and related code example (#290)
parent 81d3a507fb
commit 2d9eded792
49 README.md
|
@ -4,7 +4,7 @@
|
|||
[](https://twitter.com/OnusFM)
|
||||
<a href="http://www.repostatus.org/#active"><img src="http://www.repostatus.org/badges/latest/active.svg" /></a>
|
||||
|
||||
[Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) • [Suno Studio Waitlist](https://3os84zs17th.typeform.com/suno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)
|
||||
> 🔗 [Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) • [Suno Studio Waitlist](https://3os84zs17th.typeform.com/suno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)
|
||||
|
||||
[//]: <br> (vertical spaces around image)
|
||||
<br>
|
||||
|
@ -18,6 +18,13 @@ Bark is a transformer-based text-to-audio model created by [Suno](https://suno.a
|
|||
## ⚠ Disclaimer
|
||||
Bark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. Use at your own risk, and please act responsibly.
|
||||
|
||||
## 📖 Quick Index
|
||||
* [🚀 Updates](#-updates)
|
||||
* [💻 Installation](#-installation)
|
||||
* [🐍 Usage](#-usage-in-python)
|
||||
* [🌀 Live Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2)
|
||||
* [❓ FAQ](#-faq)
|
||||
|
||||
## 🎧 Demos
|
||||
|
||||
[](https://huggingface.co/spaces/suno/bark)
|
||||
|
@ -66,8 +73,8 @@ write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)
|
|||
# play text in notebook
|
||||
Audio(audio_array, rate=SAMPLE_RATE)
|
||||
```
|
||||
|
||||
[pizza.webm](https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm)
|
||||
|
||||
[pizza1.webm](https://user-images.githubusercontent.com/34592747/cfa98e54-721c-4b9c-b962-688e09db684f.webm)
|
||||
|
||||
</details>
|
||||
|
||||
|
@ -87,7 +94,18 @@ audio_array = generate_audio(text_prompt)
|
|||
```
|
||||
[suno_korean.webm](https://user-images.githubusercontent.com/32879321/235313033-dc4477b9-2da0-4b94-9c8b-a8c2d8f5bb5e.webm)
|
||||
|
||||
*Note: since Bark recognizes languages automatically from input text, it is possible to use for example a german history prompt with english text. This usually leads to english audio with a german accent.*
|
||||
*Note: since Bark recognizes languages automatically from input text, it is possible to use, for example, a German history prompt with English text. This usually leads to English audio with a German accent.*
|
||||
```python
|
||||
text_prompt = """
|
||||
Der Dreißigjährige Krieg (1618-1648) war ein verheerender Konflikt, der Europa stark geprägt hat.
|
||||
This is a beginning of the history. If you want to hear more, please continue.
|
||||
"""
|
||||
audio_array = generate_audio(text_prompt)
|
||||
```
|
||||
[suno_german_accent.webm](https://user-images.githubusercontent.com/34592747/3f96ab3e-02ec-49cb-97a6-cf5af0b3524a.webm)
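The note refers to the `history_prompt` argument, which the prompt above does not use. A minimal sketch of that variant follows. It assumes presets follow the `v2/<lang>_speaker_<n>` naming seen in `v2/en_speaker_1`; the helpers `preset_name` and `speak_with_accent` are hypothetical and not part of Bark, and the choice of `v2/de_speaker_3` is an assumption that should be checked against the preset library.

```python
def preset_name(lang: str, speaker: int) -> str:
    """Build a preset id following the assumed v2/<lang>_speaker_<n> pattern."""
    if speaker < 0:
        raise ValueError("speaker index must be non-negative")
    return f"v2/{lang}_speaker_{speaker}"


def speak_with_accent(text: str, lang: str = "de", speaker: int = 3):
    """Generate audio for `text` using a speaker preset from another language."""
    # Imported lazily so preset_name can be used without Bark installed.
    from bark import generate_audio

    return generate_audio(text, history_prompt=preset_name(lang, speaker))


# Example (requires Bark and downloads the models on first use):
# audio_array = speak_with_accent("This is the beginning of the story.")
```

Keeping the preset name in a small helper makes it easy to try several speakers for the same English text and compare the resulting accents.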
|
||||
|
||||
|
||||
|
||||
|
||||
</details>
|
||||
|
||||
|
@ -109,11 +127,9 @@ audio_array = generate_audio(text_prompt)
|
|||
<details open>
|
||||
<summary><h3>🎤 Voice Presets</h3></summary>
|
||||
|
||||
Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of speaker presets [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark/assets/prompts). The community also often shares presets in [Discord](https://discord.gg/J2B2vsjKuE).
|
||||
Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of supported voice presets [HERE](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark/assets/prompts). The community also often shares presets in [Discord](https://discord.gg/J2B2vsjKuE).
|
||||
|
||||
Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.
|
||||
<br>
|
||||
<br>
|
||||
> Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.
|
||||
|
||||
```python
|
||||
text_prompt = """
|
||||
|
@ -126,9 +142,9 @@ audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")
|
|||
[sloth.webm](https://user-images.githubusercontent.com/5068315/230684883-a344c619-a560-4ff5-8b99-b4463a34487b.webm)
|
||||
</details>
|
||||
|
||||
### Generating Longer Audio
|
||||
### 📃 Generating Longer Audio
|
||||
|
||||
By default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see this [example notebook](notebooks/long_form_generation.ipynb).
|
||||
By default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see 👉 **[Notebook](notebooks/long_form_generation.ipynb)** 👈
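One rough way to stay near that limit is to split the text into groups of sentences and generate each group separately. The sketch below is not taken from the notebook: the helper names `split_into_chunks` and `generate_long_audio` are hypothetical, and the 30-word budget is only an assumed stand-in for about 13 seconds of speech.

```python
import re


def split_into_chunks(text: str, max_words: int = 30) -> list[str]:
    """Group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than the budget is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk before it would exceed the word budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def generate_long_audio(text: str, history_prompt: str = "v2/en_speaker_1"):
    """Generate each chunk with the same preset and join the results."""
    # Lazy imports: only needed when audio is actually generated.
    import numpy as np
    from bark import generate_audio

    pieces = [
        generate_audio(chunk, history_prompt=history_prompt)
        for chunk in split_into_chunks(text)
    ]
    return np.concatenate(pieces)
```

Reusing one `history_prompt` for every chunk is a design choice intended to keep the voice consistent across chunks; the notebook remains the reference for the recommended approach.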
|
||||
|
||||
<details>
|
||||
<summary>Click to toggle example long-form generations (from the example notebook)</summary>
|
||||
|
@ -145,18 +161,17 @@ By default, `generate_audio` works well with around 13 seconds of spoken text. F
|
|||
|
||||
|
||||
## 💻 Installation
|
||||
|
||||
```
|
||||
*‼️ CAUTION ‼️ Do NOT use `pip install bark`. It installs a different package, which is not managed by Suno.*
|
||||
```bash
|
||||
pip install git+https://github.com/suno-ai/bark.git
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```
|
||||
```bash
|
||||
git clone https://github.com/suno-ai/bark
|
||||
cd bark && pip install .
|
||||
```
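Because an unrelated package shares the `bark` name, a quick sanity check after installing can help. The sketch below assumes that Suno's package exposes `generate_audio` and `SAMPLE_RATE` at the top level, as the usage examples in this README do; the helpers `looks_like_suno_bark` and `check_installed_bark` are hypothetical.

```python
import importlib
from types import ModuleType


def looks_like_suno_bark(module: ModuleType) -> bool:
    """Return True if the module exposes the names this README's examples use."""
    return all(hasattr(module, name) for name in ("generate_audio", "SAMPLE_RATE"))


def check_installed_bark() -> bool:
    """Import whatever `bark` is installed and check it for the expected names."""
    try:
        module = importlib.import_module("bark")
    except ImportError:
        return False
    return looks_like_suno_bark(module)
```

If `check_installed_bark()` returns `False`, the environment likely has no `bark` module or a different package under that name.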
|
||||
*Note: Do NOT use 'pip install bark'. It installs a different package, which is not managed by Suno.*
|
||||
|
||||
|
||||
## 🛠️ Hardware and Inference Speed
|
||||
|
@ -172,7 +187,7 @@ If you don't have hardware available or if you want to play with bigger versions
|
|||
|
||||
## ⚙️ Details
|
||||
|
||||
Bark is fully generative tex-to-audio model devolved for research and demo purposes. It follows a GPT style architecture similar to [AudioLM](https://arxiv.org/abs/2209.03143) and [Vall-E](https://arxiv.org/abs/2301.02111) and a quantized Audio representation from [EnCodec](https://github.com/facebookresearch/encodec). It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Different to previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech such as music lyrics, sound effects or other non-speech sounds.
|
||||
Bark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to [AudioLM](https://arxiv.org/abs/2209.03143) and [Vall-E](https://arxiv.org/abs/2301.02111) and uses a quantized audio representation from [EnCodec](https://github.com/facebookresearch/encodec). It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Unlike previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech, such as music lyrics, sound effects or other non-speech sounds.
|
||||
|
||||
Below is a list of some known non-speech sounds, but we are finding more every day. Please let us know if you find patterns that work particularly well on [Discord](https://discord.gg/J2B2vsjKuE)!
|
||||
|
||||
|
@ -190,7 +205,7 @@ Below is a list of some known non-speech sounds, but we are finding more every d
|
|||
### Supported Languages
|
||||
|
||||
| Language | Status |
|
||||
| --- | --- |
|
||||
| --- | :---: |
|
||||
| English (en) | ✅ |
|
||||
| German (de) | ✅ |
|
||||
| Spanish (es) | ✅ |
|
||||
|
@ -218,7 +233,7 @@ Requests for future language support [here](https://github.com/suno-ai/bark/disc
|
|||
|
||||
Bark is licensed under the MIT License.
|
||||
|
||||
Please contact us at `bark@suno.ai` to request access to a larger version of the model.
|
||||
Please contact us at 📧 [bark@suno.ai](mailto:bark@suno.ai) to request access to a larger version of the model.
|
||||
|
||||
## 📱 Community
|
||||
|
||||