Yuanze Lin
I am a PhD student in the Department of Computer Science at the University of Oxford, where I am advised by Prof. Ronald Clark and Prof. Philip Torr. My research focuses on diffusion models and vision-language models.
Before coming to Oxford, I spent a great time at Microsoft Redmond, MSR Asia, CCVL @ Johns Hopkins University, Alibaba, and elsewhere. I was fortunate to work with Dr. Xun Guo and
Dr. Yan Lu at MSRA, Prof. Gao Huang at Tsinghua University, Dr. Yujia Xie, Dr. Dongdong Chen, and Dr. Yichong Xu at Microsoft Redmond, Prof. Cihang Xie and Prof. Alan Yuille at Johns Hopkins University,
and Dr. Yi-Hsuan Tsai and Prof. Ming-Hsuan Yang at UC Merced.
My research interests lie in machine learning and its applications, especially:
• Image/3D generation and editing based on diffusion models
• Applications of large language models (LLMs)
• Large-scale training of vision-language models
• Self-supervised representation learning
yuanze.lin [at] cs.ox.ac.uk /
Google Scholar /
Github /
LinkedIn
I'm actively looking for research intern positions; feel free to contact me : )
Publications
Papers are sorted by recency; * denotes equal contribution.
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
Yuanze Lin,
Yunsheng Li,
Dongdong Chen,
Weijian Xu,
Ronald Clark,
Philip Torr,
Lu Yuan
Preprint, 2024
ArXiv
/
BibTeX
Propose a new visual prompting paradigm that inserts external knowledge, e.g., localized information, into MLLMs, addressing the challenge of fine-grained multimodal content correspondence.
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Yuanze Lin,
Ronald Clark,
Philip Torr
Preprint, 2024
ArXiv
/
Project Page
/
Code
/
BibTeX
Present a two-stage, Gaussian-Splatting-based approach that enforces geometric consistency across views and generates consistent, realistic 3D objects.
Text-Driven Image Editing via Learnable Regions
Yuanze Lin,
Yi-Wen Chen,
Yi-Hsuan Tsai,
Lu Jiang,
Ming-Hsuan Yang
CVPR, 2024
ArXiv
/
Project Page
/
Video
/
Code
/
BibTeX
Introduce a region-based editing network trained to generate editing regions using a text-driven editing loss with CLIP guidance; the method edits given images according to freely provided language descriptions.
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin,
Chen Wei,
Huiyu Wang,
Alan Yuille,
Cihang Xie
ICCV, 2023
ArXiv
/
Poster
/
Slides
/
BibTeX
Propose an efficient video-language pre-training framework that achieves competitive performance on text-to-video retrieval and video question answering while cutting pre-training costs by 1.9X or more.
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu,
Lu Yuan
NeurIPS, 2022
ArXiv /
Poster /
Supplementary Material /
OpenReview /
Code /
BibTeX
Propose REVIVE, a knowledge-based VQA method that exploits explicit object-region information both in the knowledge retrieval stage and in the answering model, achieving new state-of-the-art performance on the OK-VQA dataset.
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang*,
Yuanze Lin*,
Dongchen Han,
Shiji Song,
Gao Huang
CVPR, 2022
ArXiv /
Poster /
Code /
BibTeX
Present Pseudo-Q, which automatically generates pseudo language queries for supervised training and achieves performance superior or comparable to existing weakly supervised visual grounding methods on five datasets.
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
Yulin Wang*,
Yang Yue*,
Yuanze Lin,
Haojun Jiang,
Zihang Lai,
Victor Kulikov,
Nikita Orlov,
Humphrey Shi,
Gao Huang
CVPR, 2022
ArXiv /
Code /
BibTeX
Reformulate AdaFocus as a simple one-stage algorithm by introducing a differentiable, interpolation-based patch selection operation, and present an improved training scheme. Extensive experiments on six benchmark datasets demonstrate its effectiveness.
Self-Supervised Video Representation Learning with Meta-Contrastive Network
Yuanze Lin,
Xun Guo,
Yan Lu
ICCV, 2021
ArXiv /
Poster /
BibTeX
Propose a Meta-Contrastive Network (MCN) that combines contrastive learning and meta-learning for pre-training. On video action recognition and video retrieval, MCN outperforms state-of-the-art approaches on the UCF101 and HMDB51 datasets.
EVA-GCN: Head Pose Estimation Based on Graph Convolutional Networks
Miao Xin,
Shentong Mo,
Yuanze Lin
CVPR AMFG Workshop, 2021   (Best Paper Award)
Paper /
Code /
BibTeX
Construct a landmark-connection graph and propose leveraging Graph Convolutional Networks (GCN) to model the complex nonlinear mappings between graph topologies and head pose angles.
Experience
GenAI @ Microsoft Redmond
Research Intern, Feb 2024 - Dec 2024
Hosted by Dr. Dongdong Chen, working on large multimodal models.
Vision and Learning Lab @ UC Merced
Visiting Student, May 2022 - Nov 2023
Hosted by Prof. Ming-Hsuan Yang, working on text-driven image editing.
Alibaba Group
Senior Algorithm Engineer, Feb 2023 - Aug 2023
Working on vision-language pre-training, fine-tuning, and applications of large language models (LLMs).
CCVL @ Johns Hopkins University
Research Assistant, May 2022 - Feb 2023
With Prof. Cihang Xie and Prof. Alan Yuille, working on MAE-based vision-language pre-training.
Microsoft Redmond
Research Intern, Feb 2022 - June 2022
With Dr. Yujia Xie, Dr. Dongdong Chen, and Dr. Yichong Xu, working on knowledge-based VQA.
Microsoft Research Asia
Research Intern, Dec 2020 - Sep 2021
With Dr. Xun Guo and Dr. Yan Lu, working on self-supervised learning and transformers for video tasks.
Tencent AI Lab
Research Intern, Sep 2020 - Dec 2020
With Dr. Haozhi Huang, working on text-based video editing via meta-learning.
Professional Services
Program Committee: AAAI 2025
Conference Reviewer: ICRA 2024, CVPR 2024, ECCV 2024, NeurIPS 2024, ICLR 2025, AISTATS 2025
Conference Reviewer: ICLR 2023, CVPR 2023, ICCV 2023, NeurIPS 2023
Conference Reviewer: CVPR 2022
No web trackers. Last update: 08/2024. Template