Yuanze Lin

I am a PhD student in the Computer Science department at the University of Oxford, where I'm co-advised by Prof. Ronald Clark and Prof. Philip Torr, focusing on 3D AIGC and vision-language models.

Before joining Oxford, I spent a wonderful time at Microsoft Redmond, MSR Asia, CCVL @ Johns Hopkins University, Alibaba, and elsewhere. I was fortunate to work with Dr. Xun Guo and Dr. Yan Lu at MSRA, Prof. Gao Huang at Tsinghua University, Dr. Yujia Xie, Dr. Dongdong Chen and Dr. Yichong Xu at Microsoft Redmond, Prof. Cihang Xie and Prof. Alan Yuille at Johns Hopkins University, and Dr. Yi-Hsuan Tsai, Dr. Lu Jiang and Prof. Ming-Hsuan Yang at UC Merced.

My research interests lie in machine learning and its applications, especially:

 •   Image/3D generation and editing based on diffusion models

 •   The applicability of large language models (LLMs)

 •   Large-scale training of vision-language models

 •   Self-supervised representation learning

yuanze.lin [at] cs.ox.ac.uk  /  Google Scholar  /  GitHub  /  LinkedIn


  News

[03/2024]   Check out DreamPolisher for high-quality text-to-3D generation!

[02/2024]   Text-Driven Image Editing via Learnable Regions is accepted to CVPR 2024.

[10/2023]   Started my PhD journey in CS at the University of Oxford.

[07/2023]   SMAUG is accepted to ICCV 2023.

[09/2022]   REVIVE is accepted to NeurIPS 2022.

[03/2022]   Pseudo-Q and AdaFocus V2 are accepted to CVPR 2022.

[07/2021]   MCN is accepted to ICCV 2021.

[06/2021]   EVA-GCN is accepted to the CVPR 2021 AMFG Workshop and wins the Best Paper Award!

  Publications

Papers are sorted by recency; * denotes equal contribution.

DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Yuanze Lin, Ronald Clark, Philip Torr
Preprint, 2024
ArXiv / Project Page / Code / BibTeX

Present a two-stage, Gaussian Splatting-based approach that enforces geometric consistency across views to generate consistent and realistic 3D objects.

Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang
CVPR, 2024
ArXiv / Project Page / Video / Code / BibTeX

Introduce a region-based editing network that is trained to generate editing regions using a text-driven editing loss with CLIP guidance, enabling our method to edit given images based on freely provided language descriptions.

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie
ICCV, 2023
ArXiv / Poster / Slides / BibTeX

Propose an efficient video-language pre-training framework that achieves competitive performance on text-to-video retrieval and video question answering while reducing pre-training costs by 1.9x or more.

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan
NeurIPS, 2022
ArXiv / Poster / Supplementary Material / OpenReview / Code / BibTeX

Propose REVIVE, a new knowledge-based VQA method that exploits explicit object-region information not only in the knowledge retrieval stage but also in the answering model. It achieves new state-of-the-art performance on the OK-VQA dataset.

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Yuanze Lin*, Haojun Jiang*, Dongchen Han, Shiji Song, Gao Huang
CVPR, 2022
ArXiv / Poster / Code / BibTeX

Present Pseudo-Q, which automatically generates pseudo language queries for supervised training and achieves performance superior or comparable to existing weakly supervised visual grounding methods on five datasets.

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
Yulin Wang*, Yang Yue*, Yuanze Lin, Haojun Jiang, Zihang Lai, Victor Kulikov, Nikita Orlov, Humphrey Shi, Gao Huang
CVPR, 2022
ArXiv / Code / BibTeX

Reformulate AdaFocus as a simple one-stage algorithm by introducing a differentiable interpolation-based patch selection operation and further present an improved training scheme. Extensive experiments on six benchmark datasets demonstrate its effectiveness.

Self-Supervised Video Representation Learning with Meta-Contrastive Network
Yuanze Lin, Xun Guo, Yan Lu
ICCV, 2021
ArXiv / Poster / BibTeX

Propose a Meta-Contrastive Network (MCN), which combines contrastive learning and meta-learning for pre-training. On video action recognition and video retrieval, MCN outperforms state-of-the-art approaches on the UCF101 and HMDB51 datasets.

EVA-GCN: Head Pose Estimation Based on Graph Convolutional Networks
Miao Xin, Shentong Mo, Yuanze Lin
CVPR AMFG Workshop, 2021   (Best Paper Award)
Paper / Code / BibTeX

Construct a landmark-connection graph and propose leveraging Graph Convolutional Networks (GCNs) to model the complex nonlinear mappings between the graph topologies and the head pose angles.

  Experience

GenAI @ Microsoft Redmond
Researcher Intern, Feb 2024 - Present
with Dr. Yunsheng Li and Dr. Dongdong Chen, working on large multimodal models.
Vision and Learning Lab @ UC Merced
Visiting Student, May 2022 - Nov 2023
with Dr. Yi-Hsuan Tsai, Dr. Lu Jiang and Prof. Ming-Hsuan Yang, working on text-driven image editing.
Alibaba Group
Senior Algorithm Engineer, Feb 2023 - Aug 2023
Working on vision-language pre-training, fine-tuning, and the applicability of large language models (LLMs).
CCVL @ Johns Hopkins University
Research Assistant, May 2022 - Feb 2023
with Prof. Cihang Xie and Prof. Alan Yuille, working on vision-language pre-training based on MAE.
Microsoft Redmond
Researcher Intern, Feb 2022 - June 2022
with Dr. Yujia Xie, Dr. Dongdong Chen and Dr. Yichong Xu, working on knowledge-based VQA.
Microsoft Research Asia
Researcher Intern, Dec 2020 - Sep 2021
with Dr. Xun Guo and Dr. Yan Lu, working on self-supervised learning and transformers for video tasks.
Tencent AI Lab
Researcher Intern, Sep 2020 - Dec 2020
with Dr. Haozhi Huang, working on text-based video editing with meta-learning.

  Professional Services

Conference Reviewer: ICRA 2024, CVPR 2024, ECCV 2024

Conference Reviewer: ICLR 2023, CVPR 2023, ICCV 2023, NeurIPS 2023

Conference Reviewer: CVPR 2022

No web trackers; feel free to browse this website.       Last Update: 03/2024        Template