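Talk title: Efficient Fine-Tuning of Vision-Language Models
Speaker: Rui Zhu, Senior Lecturer, City St George's, University of London
Time: Thursday, 18 June 2026, 16:00
Venue: Meeting Room 513, Advanced Technology Research Center
Audience: Interested faculty members and graduate students from across the university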
About the speaker:
Rui Zhu received her Ph.D. in statistics from University College London in 2017. She is a Senior Lecturer in Statistics in the Faculty of Actuarial Science and Insurance, City St George's, University of London. Her research interests include machine learning, computer vision, and interdisciplinary applications in actuarial science and finance. She serves as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Circuits and Systems for Video Technology, and Neurocomputing.
Abstract:
Pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong generalisation across a wide range of visual recognition tasks. However, adapting these large models to downstream domains remains challenging: full fine-tuning is computationally expensive and can lead to overfitting. This talk presents two recent approaches to efficient VLM fine-tuning. The first introduces a closed-form dual alignment mechanism that strengthens both same-modal consistency and cross-modal interaction between visual and textual representations, using ridge-regression-based alignment to reduce the number of learnable parameters. The second develops a batch-wise similarity-enhanced transductive adapter, which exploits relationships among unlabelled query samples within the same batch to improve generalisation at test time.
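The two ideas in the abstract rest on standard building blocks that can be illustrated on their own. What follows is a generic sketch under stated assumptions, not the speaker's method. `ridge_align` (a hypothetical name) fits a closed-form ridge-regression map W = (XᵀX + λI)⁻¹XᵀY from image features to targets, so W needs no gradient-based training. `batch_similarity_refine` (also hypothetical) blends each query's logits with those of similar queries in the same unlabelled batch, a simple form of transductive inference. Synthetic Gaussian clusters stand in for CLIP image features, and the values of `lam`, `alpha` and `temperature` are illustrative choices only.

```python
import numpy as np

def l2_normalize(Z, eps=1e-12):
    # Row-wise L2 normalisation, as is usual for CLIP-style embeddings.
    return Z / (np.linalg.norm(Z, axis=1, keepdims=True) + eps)

def ridge_align(X, Y, lam=1.0):
    # Hypothetical helper: closed-form ridge regression mapping X (n, d) to Y (n, k).
    # Solves (X^T X + lam I) W = X^T Y instead of forming an explicit inverse.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def batch_similarity_refine(logits, feats, alpha=0.5, temperature=0.1):
    # Hypothetical helper: transductive smoothing over one batch of unlabelled queries.
    # Each query's logits are blended with a softmax-weighted average of the other
    # queries' logits, weighted by cosine similarity. Requires a batch of at least 2.
    F = l2_normalize(feats)
    S = F @ F.T / temperature
    np.fill_diagonal(S, -np.inf)          # a query does not vote for itself
    S -= S.max(axis=1, keepdims=True)     # numerical stability before exp
    A = np.exp(S)
    A /= A.sum(axis=1, keepdims=True)     # each row of A sums to 1
    return (1 - alpha) * logits + alpha * (A @ logits)

# Usage on synthetic data (an assumption standing in for real CLIP features).
rng = np.random.default_rng(0)
n_per_class, dim, n_classes = 8, 16, 3
centers = 3.0 * rng.normal(size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), n_per_class)

X_train = l2_normalize(centers[labels] + rng.normal(size=(labels.size, dim)))
Y_train = np.eye(n_classes)[labels]            # one-hot targets
W = ridge_align(X_train, Y_train, lam=0.1)     # shape (dim, n_classes)

X_query = l2_normalize(centers[labels] + rng.normal(size=(labels.size, dim)))
logits = X_query @ W
refined = batch_similarity_refine(logits, X_query)
print("accuracy before refinement:", (logits.argmax(1) == labels).mean())
print("accuracy after refinement: ", (refined.argmax(1) == labels).mean())
```

Because W comes from a single linear solve, λ is the only quantity to tune for the alignment step. The batch refinement adds no learnable parameters, but it only helps when a batch contains several queries from the same class.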