
Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Dongchao Yang1, Jianwei Yu2, Helin Wang1, Wen Wang1, Chao Weng2, Yuexian Zou1, Dong Yu2
1 Peking University
2 Tencent AI Lab

Introduction

This is a demo page for our paper Diffsound: Discrete Diffusion Model for Text-to-sound Generation. The code and pre-trained models are available on GitHub. Below, we present samples generated by our proposed method; for more samples, please refer to our GitHub repository.
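As background, Diffsound generates sound in a discrete token space: a VQ-VAE encodes mel-spectrograms into codebook tokens, a text-conditioned discrete diffusion model generates those tokens, and a vocoder renders the waveform. The following is a minimal sketch of the reverse (denoising) sampling loop of a discrete diffusion model in heavily simplified form; `denoise_fn`, `VOCAB_SIZE`, `MASK_ID`, and `NUM_STEPS` are illustrative placeholders, not the API of our released code.

```python
# Minimal sketch of discrete-diffusion reverse sampling over VQ tokens.
# All names here (denoise_fn, VOCAB_SIZE, MASK_ID, NUM_STEPS, SEQ_LEN)
# are illustrative placeholders, not the actual Diffsound API.
import torch

VOCAB_SIZE = 256        # size of the VQ codebook (placeholder)
MASK_ID = VOCAB_SIZE    # extra "masked"/absorbing token index
SEQ_LEN = 265           # number of mel-spectrogram tokens (placeholder)
NUM_STEPS = 100         # number of diffusion steps (placeholder)

def denoise_fn(x_t, t, text_emb):
    """Stand-in for the text-conditioned transformer: returns logits
    over the codebook for every token position."""
    return torch.randn(x_t.shape[0], SEQ_LEN, VOCAB_SIZE)  # dummy logits

@torch.no_grad()
def sample(text_emb, batch_size=1):
    # Start from the fully corrupted state: every position masked.
    x_t = torch.full((batch_size, SEQ_LEN), MASK_ID, dtype=torch.long)
    for t in reversed(range(NUM_STEPS)):
        logits = denoise_fn(x_t, torch.tensor([t]), text_emb)
        probs = logits.softmax(dim=-1)
        # Sample candidate tokens for every position; a real sampler
        # would also re-mask a time-dependent fraction of positions
        # according to the posterior q(x_{t-1} | x_t, x_0).
        x_t = torch.distributions.Categorical(probs=probs).sample()
    return x_t  # discrete tokens; a VQ decoder + vocoder renders audio
```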

Examples

Comparison between samples generated by the AR and Diffsound models and real sounds

| Text description | AR model | Diffsound model | Real sample |
| --- | --- | --- | --- |
| Birds and insects make noise during the daytime | [audio and mel-spectrogram] | [audio and mel-spectrogram] | [audio and mel-spectrogram] |
| A dog barks and whimpers | [audio and mel-spectrogram] | [audio and mel-spectrogram] | [audio and mel-spectrogram] |
| A person is snoring while sleeping | [audio and mel-spectrogram] | [audio and mel-spectrogram] | [audio and mel-spectrogram] |

Other samples generated by the Diffsound model

[Paper] [Bibtex] [Demo GitHub] [TencentAILab] [PKU] [code]