Hello 👋!
I am Yusheng Dai, a final-year M.S. student at the University of Science and Technology of China (USTC), advised by Prof. Jun Du and Prof. Chin-hui Lee. Starting June 2025, I will be pursuing my Ph.D. at Monash University in Australia under the supervision of Prof. Jianfei Cai and Prof. Qiuhong Ke. I obtained my Bachelor’s degree in Cyber Engineering from Sichuan University in June 2022.
My prior research primarily focuses on video, integrating both visual and audio streams while emphasizing their complementarity, alignment, and transitions across semantic and temporal dimensions. This work falls into two main categories. The early work, beginning in 2022, focuses on audio-visual discriminative models for talking-face videos in noisy, multi-speaker scenarios, such as Audio-Visual Speech Recognition (AVSR). More recently, since 2023, my research has shifted toward more flexible and high-quality audio and music generation, emphasizing atomic controllability and consistency in combination, guided by text or silent video.
News
- [Nov. 2024]: Received the National Scholarship at the University of Science and Technology of China.
- [Oct. 2024]: Received the Monash International Tuition Scholarship (MITS) and the Monash Graduate Scholarship (MGS).
- [Feb. 2024]: One paper on the robustness of audio-visual speech recognition (AVSR) has been accepted as the first CVPR paper in the history of our speech group! We unexpectedly discovered an interesting multimodal bias phenomenon and successfully solved the problem of modality dropout, achieving a unification of AVSR and ASR. The paper has been uploaded to arXiv, and the code is coming soon!
- [Jan. 2024]: The extended paper of our self-driven work on financial time series prediction has been accepted by SDM 2024. Thanks to my co-authors for this unforgettable collaboration! Hopefully, this algorithm can help us make enough money for a future Mars trip :).
- [Dec. 2022]: You are welcome to join our MISP2023 competition on the speech enhancement task! Here is the baseline system.
- [Mar. 2023]: One paper on low-level audio-visual signal processing has been accepted by ICME 2023. It is the first paper of my graduate career, and I have the opportunity to go to Australia! And of course, here is the code.
- [Dec. 2022]: You are welcome to join our MISP2022 competition on the speaker diarization and long-time AVSR tasks! Here is the baseline system.
- [Oct. 2022]: We had some ideas for extending MAMMAL to financial data analysis for the first time, and the paper was accepted by the NeurIPS DistShift 2022 workshop.
- [Dec. 2021]: We released the largest Mandarin audio-visual dataset, called MISP-AVSR. The dataset was recorded in the TV rooms of home environments with multiple groups chatting simultaneously. You are welcome to join our MISP2021 competition, a grand challenge of ICASSP! Here is the baseline system.
-
CVPR
Yusheng Dai, Hang Chen, Jun Du, Chin-hui Lee, et al.
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024.
-
ICME
Yusheng Dai, Hang Chen, Jun Du, Chin-hui Lee, et al.
IEEE International Conference on Multimedia and Expo (ICME), 2023.
-
NeurIPS DistShift
Donglin Zhan*, Yusheng Dai*, Yiwei Dong*, Jinghai He, Zhenyi Wang, James Anderson (* means equal contribution)
Conference on Neural Information Processing Systems Workshop on Distribution Shifts (NeurIPS DistShift), 2022.
-
SDM
Donglin Zhan*, Yusheng Dai*, Yiwei Dong*, Jinghai He, Zhenyi Wang, James Anderson (* means equal contribution)
SIAM International Conference on Data Mining (SDM), 2024.
-
Electronics Letters
Yusheng Dai, Yang Jin, Yiwei Dong, et al.
Electronics Letters
-
Interspeech
Hang Chen, Jun Du, Yusheng Dai, Chin-hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, et al.
In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), 2022.
-
ICASSP
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, et al.
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
© Yusheng Dai, 2023