SEEM
The Chinese University of Hong Kong, Hong Kong
Email: dcyang@se.cuhk.edu.hk
I am a PhD student at The Chinese University of Hong Kong, majoring in Speech and Audio Processing, supervised by Prof. Helen Meng. Before that, I received my Master's degree from Peking University in 2023.
My research focuses on developing human agents that can communicate with humans,
e.g., understanding human speech and environmental sounds, and then producing feedback to humans.
Note: I am actively looking for collaboration opportunities (e.g., audio foundation models, generative models, TTS, text-to-audio). Please feel free to contact me.
Audio Foundation Models, Generative Models, Large Language Models, Audio/Speech Processing
July 2023 - Sep. 2023
MSRA, Speech Group, Intern.
Supervisor: Xu Tan
May 2021 - May 2023
Tencent AI Lab, Speech Group, Intern.
Supervisor: Songxiang Liu, Chao Weng, and Bo Wu
Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu
Diffsound: Discrete Diffusion Model for Text-to-sound generation
Accepted by IEEE Transactions on Audio, Speech and Language Processing, 2023.
[Code]
Dongchao Yang*, Songxiang Liu*, Rongjie Huang, Chao Weng, Helen Meng
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Accepted by IEEE Transactions on Audio, Speech and Language Processing, 2024.
[https://github.com/yangdongchao/SoundStorm]
Dongchao Yang*, Jinchuan Tian*, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Helen Meng
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
ICML, 2024. Highlighted by the Artificial Intelligence Index Report 2024; only three audio-related papers were highlighted (MusicLM (Google), MusicGen (Meta), and ours). [link]
[Code]
Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
NeurIPS, 2024.
[Code]
Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Proc. Interspeech, 2024. ISCA Best Student Paper Award
https://github.com/yangdongchao/SimpleSpeech
Rongjie Huang*, Jiawei Huang*, Dongchao Yang*, Yi Ren, et al.
Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models
Accepted by ICML, 2023.
[https://github.com/Text-to-Audio/Make-An-Audio]
Dongchao Yang, Helin Wang, Yuexian Zou, WenWu Wang
A Mixed Supervised Learning Framework for Target Sound Detection
DCASE Workshop, 2022.
[Code]
Rongjie Huang*, Mingzhe Li*, Dongchao Yang*, Jiatong Shi, et al.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Accepted by AAAI, 2024.
[Code]
Dongchao Yang, Helin Wang, Yuexian Zou
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification
Proc. Interspeech, 2021.
[Code]
Dongchao Yang, Helin Wang, Yuexian Zou, Zhongjie Ye, WenWu Wang
A Mutual Learning Framework for Few-Shot Sound Event Detection
ICASSP, 2022.
[Code]
Dongchao Yang, Songxiang Liu, Rongjie Huang, Jinchuan Tian, Chao Weng, Yuexian Zou
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Preprint, 2023.
https://github.com/yangdongchao/AcademiCodec
Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
Preprint, 2024. Submitted to TASLP.