Yuxuan Xue

Hi There! I am a Ph.D. student in the Real Virtual Human group at University of Tuebingen , supervised by Prof. Dr. Gerard Pons-Moll. I affiliate with International Max-Planck Research School for Intelligent Systems (IMPRS-IS). My research interests include 3D/4D photorealistic reconstruction, generative models, and understanding of human appearance and behavior.

Prior to that, I spent wonderful time in Max-Planck-Institute for Intelligent Systems. I graduated with the double Master degree in Mechanical Engineering as well as Robotics both with distinction from the Technical University of Munich (TUM) in 2022. I received the Bachelor degree in Mechanical Engineering from the same University in 2020.

Email  /  Google Scholar  /  LinkedIn  /  CV  /  Twitter  /  Github

profile photo
News & Award

[2025-08] Our Papers InfiniHuman and PhySIC are accepted to ACM SIGGRAPH Asia 2025.
[2025-06] Started my internship at Meta as a Research Scientist Intern with Javier Romero.
[2024-11] Our Paper Gen-3Diffusion is accepted to T-PAMI 2025.
[2024-09] Our Paper Human 3Diffusion is accepted to NeurIPS 2024.
[2024-07] Awarded $5000 from OpenAI Research Access Program .
[2024-02] Our Paper E-LnR is accepted to IJCV 2024 (Vol.132).
[2024-01] Our Paper BOFT is accepted to ICLR 2024.
[2023-09] Our Paper OFT is accepted to NeurIPS 2023.
[2023-07] Our Paper NSF is accepted to ICCV 2023.
[2022-11] I am honored to receive the Best Student Paper Award from the BMVC 2022.
[2022-06] I received my M.Sc degree from TUM with distinction.
Publication
InfiniHuman: Infinite 3D Human Creation with Precise Control
Yuxuan Xue, Xianghui Xie, Margaret Kostyrko, Gerard Pons-Moll
ACM SIGGRAPH Asia 2025, HongKong
BibTex / Arxiv / Website / Code

We propose a new generative model which generates photorealistic 3D avatars from text description with precise control in body shape and wearing clothing. Our key idea is that the foundation models already understand human appearance and we distill 3D information from them, resulting a datasets with 111K diverse subjects.

PhySIC: Physically Plausible 3D Human-Scene Interaction and Contact from a Single Image
Yuxuan Xue*, Pradyumna Yalandur-Muralidhar*, Xianghui Xie, Margaret Kostyrko, Gerard Pons-Moll
ACM SIGGRAPH Asia 2025, HongKong
BibTex / Arxiv / Website / Code

We propose a general reconstruction framework for recovering holistic human and environment from monocular images. Leveraging to foundational depth and pose estimation models, we optimize towards physical constraints and result in physically plausible 3D human-scene interaction and contact.

Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy
Yuxuan Xue, Xianghui Xie, Riccardo Marin, Gerard Pons-Moll
Transactions on Pattern Analysis and Machine Intelligence 2025
BibTex / Arxiv / Website / Code

We extend the idea of Human-3Diffusion to general objects. Our Gen-3Diffusion reconstructs high-fidelity 3D representation from single RGB Image within 22 seconds and 11 GB GPU memory, which allows an efficient large-scale 3D generation.

Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
Yuxuan Xue, Xianghui Xie, Riccardo Marin, Gerard Pons-Moll
NeurIPS 2024, Vancouver
BibTex / Arxiv / Website / Code

We propose a new approach to reconstruct high-fidelity avatar in 3D Gaussian Splats from single RGB Image. Our approach improves 2D multi-view diffusion process by using reconstructed 3D representation to guarantee 3D consistency at reverse sampling steps.

E-LnR: Event-Based Non-rigid Reconstruction of Low-Rank Parametrized Deformations from Contours
Yuxuan Xue, Haolong Li, Stefan Leutenegger, Jörg Stückler.
IJCV Volume 132, pages 2943–2961
BibTex / Website

We propose E-LnR, an event-based approach which reconstruct non-rigid deformation in the low rank parametrized space. This is a journal extension of our BMVC 2022 paper.

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Weiyang Liu*, Zeju Qiu*, Yao Feng**, Yuliang Xiu**, Yuxuan Xue**, Longhui Yu**, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black,, Adrian Weller, Bernhard Schölkopf.
ICLR 2024, Vienne
BibTex / Arxiv / Website / Code

We propose BOFT (Orthogonal Butterfly), a general orthogonal finetuning technique with butterfly factorization that effectively adapts foundation models to different tasks such as Vision, NLP, Math QA, and Controllable Generation.

NSF: Neural Surface Fields for Human Modelling from Monocular Depth
Yuxuan Xue*, Bharat Lal Bhatnagar*, Riccardo Marin, Nikolaos Sarafianos, Yuanlu Xu, Gerard Pons-Moll, Tony Tung.
ICCV 2023, Paris
BibTex / Arxiv / Website / Poster / Video (5min) / Code

We propose a new approach to define a neural field on the surface for reconstructing animatable clothed human from monocular depth observation. Our approach directly outputs coherent meshes across different poses at arbitrary resolution.

Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Zeju Qiu*, Weiyang Liu*, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf.
NeurIPS 2023, New Orleans
BibTex / Arxiv / Website / Code

We propose Orthogonal Finetuning (OFT), a fine-tuning approach for adapting text-to-image diffusion models to downstream tasks. OFT can preserve hyperspherical energy to maintain the semantic generation ability of the foundation models.

Event-based Non-Rigid Reconstruction from Contours
Yuxuan Xue, Haolong Li, Stefan Leutenegger, Jörg Stückler.
BMVC 2022, London, Oral, Best Student Paper Award
BibTex / Arxiv / Website / Oral Presentation (11min) / Poster

We propose a new approach for reconstructing fast non-rigid object deformations using measurements from event-based cameras. Our approach estimates object deformation from events at the object contour within a probabilistic optimization (EM) framework.

Robust event detection based on spatio-temporal latent action unit using skeletal information
Hao Xing, Yuxuan Xue, Mingchuan Zhou, Darius Burschka.
IROS 2021, Prague
BibTex / Arxiv

We present a new method for detecting event actions from skeletal information in RGBD videos. The proposed method uses a Gradual Online Dictionary Learning algorithm to cluster and filter skeleton frames. Additionally, the method includes a latent unit temporal structure to better distinguish event actions from similar actions.

Thesis
Event-based Non-Rigid 3D Tracking

Supervisor: Prof. Dr. Stefan Leutenegger
Advisor: Dr. Jörg Stückler, Haolong Li, M.Sc.
Defended in Jun. 2022
BibTex / VollText

Master Thesis in Robotics, Cognition, Intelligence (RCI) at TUM. Deformable objects tracking using monocular event stream.

A High Precision Lane Following Control Method for an Autonomous Robot

Supervisor: Prof. Dr.-Ing. Markus Lienkamp
Advisor: Jean-Michael Georg
Defended in Jan. 2020
BibTex / VollText

Bachelor thesis in Mechanical Engineering at TUM. High precision Lane Following control algorithm developed for autonomous driving car.

Academic Services
Conference Reviewer: ICCV 2023-2025, CVPR 2024-2025, ECCV 2024, NeurIPS 2024-2025, ICLR 2025, ICML 2025
Journal Reviewer: T-PAMI, TVCG, SIGGRAPH, SIGGRAPH Asia

   Updated June. 2025

Source