Hello 👋!
I am Yusheng Dai, a PhD candidate at Monash University in Australia, working with Prof. Jianfei Cai (IEEE Fellow) and Prof. Qiuhong Ke. Before that, I completed my Master’s program at the University of Science and Technology of China (USTC), working with Prof. Jun Du and Prof. Chin-hui Lee (IEEE Fellow).
My research focuses on Audio-Visual Foundation World Models, working toward an interactive metaverse driven by real-time video and sound generation. Representative works include:
- Omni2Sound: A unified video-text-to-audio foundation model that achieves state-of-the-art results on V2A, T2A, and VT2A with a simple DiT architecture.
- HolisticAudio: The first scalable end-to-end cinematic dubbing model that operates on holistic video without preprocessing pipelines, jointly modeling speech, sound effects, and music across multi-speaker, off-screen, and combined generation scenarios.
- ControlAudio: A controllable multi-event audio foundation model that produces millisecond-level temporally aligned audio from natural language descriptions.
- SaFa: Seamless long-form audio and panorama generation via latent swap joint diffusion, up to 20x faster than training-based methods.
Note: I am looking for collaborators to do great work in audio and speech, whether generation or understanding, frontend or backend. I bring strong research insights and sharp storytelling. If your work has the potential for high impact, email me. I work fast: I have helped my collaborators go from zero context to a finished paper in one week, multiple times.
News
- [Apr. 2026]: Our work on unified video-text-to-audio generation Omni2Sound has been accepted by CVPR 2026 as a Highlight (top 15%).
- [Apr. 2026]: Our work on controllable audio generation ControlAudio has been accepted by ACL 2026 as an Oral presentation (top 15%).
- [July 2025]: Our work on the Timing Audio Generation benchmark AudioAtlas has been accepted by ACM MM 2025.
- [June 2025]: Our work on seamless long-form audio and panorama generation SaFa has been accepted by ICCV 2025.
- [Nov. 2024]: Received the National Scholarship at the University of Science and Technology of China.
- [Oct. 2024]: Received the Monash International Tuition Scholarship (MITS) and the Monash Graduate Scholarship (MGS).
-
CVPR
Yusheng Dai, Zehua Chen, Yuxuan Jiang, Baolong Gao, Qiuhong Ke, Jianfei Cai, Jun Zhu
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2026.
-
Under Review
Yusheng Dai, et al.
Under Review, 2026.
-
ACL
Yuxuan Jiang, Zehua Chen, Zeqian Ju, Yusheng Dai, Weibei Dou, Jun Zhu
Annual Meeting of the Association for Computational Linguistics (ACL), 2026.
-
ICCV
Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, et al.
International Conference on Computer Vision (ICCV), 2025.
-
CVPR
Yusheng Dai, Hang Chen, Jun Du, Chin-hui Lee, et al.
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024.
© Yusheng Dai, 2023