
Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Dongchao Yang1, Jianwei Yu2, Helin Wang1, Wen Wang1, Chao Weng2, Yuexian Zou1, Dong Yu2
1 Peking University
2 Tencent AI Lab

Introduction

This is a demo page for our paper Diffsound: Discrete Diffusion Model for Text-to-sound Generation. The code and pre-trained models are available on GitHub. Below, we present samples generated by our proposed method; for more samples, please refer to our GitHub repository.
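As background, Diffsound generates sound in a discrete token space: a VQ-VAE encodes mel-spectrograms into codebook tokens, a text-conditioned discrete diffusion model generates those tokens, and a vocoder renders the waveform. The following is a minimal sketch of the reverse (denoising) sampling loop of a discrete diffusion model in heavily simplified form; `denoise_fn`, `VOCAB_SIZE`, `MASK_ID`, and `NUM_STEPS` are illustrative placeholders, not the API of our released code.

```python
# Minimal sketch of discrete-diffusion reverse sampling over VQ tokens.
# All names here (denoise_fn, VOCAB_SIZE, MASK_ID, NUM_STEPS, SEQ_LEN)
# are illustrative placeholders, not the actual Diffsound API.
import torch

VOCAB_SIZE = 256        # size of the VQ codebook (placeholder)
MASK_ID = VOCAB_SIZE    # extra "masked"/absorbing token index
SEQ_LEN = 265           # number of mel-spectrogram tokens (placeholder)
NUM_STEPS = 100         # number of diffusion steps (placeholder)

def denoise_fn(x_t, t, text_emb):
    """Stand-in for the text-conditioned transformer: returns logits
    over the codebook for every token position."""
    return torch.randn(x_t.shape[0], SEQ_LEN, VOCAB_SIZE)  # dummy logits

@torch.no_grad()
def sample(text_emb, batch_size=1):
    # Start from the fully corrupted state: every position masked.
    x_t = torch.full((batch_size, SEQ_LEN), MASK_ID, dtype=torch.long)
    for t in reversed(range(NUM_STEPS)):
        logits = denoise_fn(x_t, torch.tensor([t]), text_emb)
        probs = logits.softmax(dim=-1)
        # Sample candidate tokens for every position; a real sampler
        # would also re-mask a time-dependent fraction of positions
        # according to the posterior q(x_{t-1} | x_t, x_0).
        x_t = torch.distributions.Categorical(probs=probs).sample()
    return x_t  # discrete tokens; a VQ decoder + vocoder renders audio
```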

Examples

Comparison between samples generated by the AR and Diffsound models and real sounds

| Text description | AR model | Diffsound model | Real sample |
| --- | --- | --- | --- |
| Birds and insects make noise during the daytime | [audio and mel-spectrogram] | [audio and mel-spectrogram] | [audio and mel-spectrogram] |
| A dog barks and whimpers | [audio and mel-spectrogram] | [audio and mel-spectrogram] | [audio and mel-spectrogram] |
| A person is snoring while sleeping | [audio and mel-spectrogram] | [audio and mel-spectrogram] | [audio and mel-spectrogram] |

Other samples generated by the Diffsound model

[Paper] [Bibtex] [Demo GitHub] [TencentAILab] [PKU] [code]