TL;DR: GeoRelight is a unified Multi-Modal Diffusion Transformer that jointly generates photorealistic relit images and high-fidelity 3D geometry from a single photo, enabled by a novel VAE-friendly geometry representation (iNOD) and strategic mixed-data training.
Relighting a person from a single photo is an attractive but ill-posed task, as a 2D image ambiguously entangles 3D geometry, intrinsic appearance, and illumination.
Current methods either rely on sequential pipelines that suffer from error accumulation, or forgo explicit 3D geometry during relighting, which limits physical consistency.
Since relighting and 3D geometry estimation are mutually beneficial tasks, we propose GeoRelight, a unified Multi-Modal Diffusion Transformer (DiT) that jointly solves both.
We make this possible through two key technical contributions: isotropic NDC-Orthographic Depth (iNOD), a distortion-free 3D representation compatible with latent diffusion models; and a strategic mixed-data training method that combines synthetic and auto-labeled real data.
By solving geometry and relighting jointly, GeoRelight outperforms both sequential pipelines and prior systems that ignore geometry.
Given an input image captured under extreme lighting conditions, GeoRelight disentangles high-quality intrinsics and geometry, and jointly generates plausible, photorealistic relit images under target environment maps.
This work was conducted during Yuxuan Xue's internship at Meta Reality Labs, hosted by Javier Romero. We thank Amaury Aubel for providing synthetic data, and Marco Dal Farra, Ahmed Osman, Julien Valentin, and other team members for their support. This work is made possible by funding from the Carl Zeiss Foundation. It is also funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) — 409792180 (Emmy Noether Programme, project: Real Virtual Humans) and the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A. The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Y. Xue. G. Pons-Moll is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 – Project number 390727645.
@inproceedings{xue2026georelight,
author = {Xue, Yuxuan and Liang, Ruofan and Zakharov, Egor and Bagautdinov, Timur and Cao, Chen and Nam, Giljoo and Saito, Shunsuke and Pons-Moll, Gerard and Romero, Javier},
title = {Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026},
}