SEEM
The Chinese University of Hong Kong, Hong Kong
Email: dcyang@se.cuhk.edu.hk
I am a PhD student at The Chinese University of Hong Kong, majoring in Speech and Audio Processing, supervised by Prof. Helen Meng. Before that, I received my Master's degree from Peking University in 2023.
My research focuses on developing human agents that can communicate with humans,
e.g., understanding human speech and environmental sounds, and then producing feedback to humans.
Note: I am actively looking for collaboration opportunities (e.g., audio foundation models, generative models, TTS, text-to-audio). Please feel free to contact me.
Audio Foundation Models, Generative Models, Large Language Models, Audio/Speech Processing
July 2023 - Sep. 2023
MSRA, Speech Group, Intern.
Supervisor: Xu Tan
May 2021 - May 2023
Tencent AI Lab, Speech Group, Intern.
Supervisor: Songxiang Liu, Chao Weng, and Bo Wu
Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu
Diffsound: Discrete Diffusion Model for Text-to-sound generation
Accepted by IEEE Transactions on Audio, Speech and Language Processing, 2023.
[Code]
Dongchao Yang*, Songxiang Liu*, Rongjie Huang, Chao Weng, Helen Meng
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Accepted by IEEE Transactions on Audio, Speech and Language Processing, 2024.
[https://github.com/yangdongchao/SoundStorm]
Dongchao Yang*, Jinchuan Tian*, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Helen Meng
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
ICML, 2024. Highlighted by the Artificial Intelligence Index Report 2024; only three audio-related papers were highlighted (MusicLM (Google), MusicGen (Meta), and ours). [link]
[Code]
Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
NeurIPS, 2024.
[Code]
Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Proc. Interspeech, 2024. ISCA Best Student Paper Award
https://github.com/yangdongchao/SimpleSpeech
Rongjie Huang*, Jiawei Huang*, Dongchao Yang*, Yi Ren, et al.
Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models
Accepted by ICML, 2023.
[https://github.com/Text-to-Audio/Make-An-Audio]
Dongchao Yang, Helin Wang, Yuexian Zou, WenWu Wang
A Mixed Supervised Learning Framework for Target Sound Detection
DCASE Workshop, 2022.
[Code]
Rongjie Huang*, Mingzhe Li*, Dongchao Yang*, Jiatong Shi, et al.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Accepted by AAAI, 2024.
[Code]
Dongchao Yang, Helin Wang, Yuexian Zou
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification
Proc. Interspeech, 2021.
[Code]
Dongchao Yang, Helin Wang, Yuexian Zou, Zhongjie Ye, WenWu Wang
A Mutual Learning Framework for Few-Shot Sound Event Detection
ICASSP, 2022.
[Code]
Dongchao Yang, Songxiang Liu, Rongjie Huang, Jinchuan Tian, Chao Weng, Yuexian Zou
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Preprint, 2023.
https://github.com/yangdongchao/AcademiCodec
Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
Preprint, 2024. Submitted to TASLP.