
Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Dongchao Yang1, Jianwei Yu2, Helin Wang1, Wen Wang1, Chao Weng2, Yuexian Zou1, Dong Yu2
1 Peking University
2 Tencent AI Lab


This is a demo for our paper Diffsound: Discrete Diffusion Model for Text-to-sound Generation. The code and pre-trained models can be found on GitHub. Below, we show samples generated by our proposed method. For more samples, please refer to our GitHub page.


Comparison between samples generated by the AR and Diffsound models and real sounds

Text description | AR model | Diffsound model | Real sample
Birds and insects make noise during the daytime | (audio) | (audio) | (audio)
A dog barks and whimpers | (audio) | (audio) | (audio)
A person is snoring while sleeping | (audio) | (audio) | (audio)

Other samples generated by the Diffsound model

[Paper] [Bibtex] [Demo GitHub] [TencentAILab] [PKU] [code]