TL;DR: GeoRelight is a unified Multi-Modal Diffusion Transformer that jointly generates photorealistic relit images and high-fidelity 3D geometry from a single photo, enabled by a novel VAE-friendly geometry representation (iNOD) and strategic mixed-data training.
Relighting a person from a single photo is an attractive but ill-posed task, as a 2D image ambiguously entangles 3D geometry, intrinsic appearance, and illumination.
Current methods either rely on sequential pipelines that suffer from error accumulation, or forgo explicit 3D geometry during relighting, which limits physical consistency.
Since relighting and 3D geometry estimation are mutually beneficial tasks, we propose GeoRelight, a unified Multi-Modal Diffusion Transformer (DiT) that jointly solves both.
We make this possible through two key technical contributions: isotropic NDC-Orthographic Depth (iNOD), a distortion-free 3D representation compatible with latent diffusion models; and a strategic mixed-data training method that combines synthetic and auto-labeled real data.
By solving geometry and relighting jointly, GeoRelight outperforms both sequential pipelines and prior systems that ignore geometry.
Given an input image captured under extreme lighting conditions, GeoRelight disentangles high-quality intrinsics and geometry, and jointly generates plausible, photorealistic relit images under target environment maps.
This work was conducted during Yuxuan Xue's internship at Meta Reality Labs, hosted by Javier Romero. We thank Amaury Aubel for providing synthetic data, and Marco Dal Farra, Ahmed Osman, Julien Valentin, and other team members for their support. This work is made possible by funding from the Carl Zeiss Foundation. It is also funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) — 409792180 (Emmy Noether Programme, project: Real Virtual Humans) and the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A. The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Y. Xue. G. Pons-Moll is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 – Project number 390727645.
@inproceedings{xue2026georelight,
author = {Xue, Yuxuan and Liang, Ruofan and Zakharov, Egor and Bagautdinov, Timur and Cao, Chen and Nam, Giljoo and Saito, Shunsuke and Pons-Moll, Gerard and Romero, Javier},
title = {Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026},
}