We thank Garvita Tiwari, Zehao Yu, Chuqiao Li, Yuliang Xiu, Zhen Liu, Zeju Qiu, Siyao Li, Weiyang Liu, and other colleagues for their feedback, which helped improve this work.
This work was made possible by funding from the Carl Zeiss Foundation. It is also funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans) and the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A.
G. Pons-Moll is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 – Project number 390727645.
The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Y. Xue.
For this project, R. Marin has been supported by the innovation program under the Marie Skłodowska-Curie grant agreement No. 101109330.