Alexander Gao

CV  |  Google Scholar  |  GitHub

I am a second-year PhD student in Computer Science at the University of Maryland, College Park, advised by Ming C. Lin.

My research draws on several computer science subfields, including computer graphics, machine learning and optimization, scientific computing, and computational geometry, with applications in fields such as AR/VR and robotics. Specifically, I aim to develop novel algorithms for simulation and rendering by bridging classical geometric techniques and physically-based modeling with learning-based paradigms and neural representations.

I previously interned on the Google GeoAR team, and worked as an Applied Scientist at AWS Robotics. I received an MS in Computer Science from NYU and a BA in Film Production from USC School of Cinematic Arts.

Don't hesitate to reach out at gaoalexander [at] gmail dot com.


Jan 2023:
   Learning Simultaneous Navigation and Construction is accepted to ICLR 2023. Congrats to lead author Wenyu Han and all.

Dec 2022:
   We presented our paper NeuPhysics at NeurIPS 2022 in New Orleans!

Learning Simultaneous Navigation and Construction in Grid Worlds
International Conference on Learning Representations (ICLR), 2023

Wenyu Han, Haoran Wu, Eisuke Hirota, Alexander Gao, Lerrel Pinto, Ludovic Righetti, and Chen Feng.

Paper / Code / Website

We propose to study a new learning task, mobile construction, to enable an agent to build designed structures in 1/2/3D grid worlds while navigating in the same evolving environments. Unlike existing robot learning tasks such as visual navigation and object manipulation, this task is challenging because of the interdependence between accurate localization and strategic construction planning. In pursuit of generic and adaptive solutions to this partially observable Markov decision process (POMDP) based on deep reinforcement learning (RL), we design a Deep Recurrent Q-Network (DRQN) with explicit recurrent position estimation in this dynamic grid world. Our extensive experiments show that pre-training this position estimation module before Q-learning can significantly improve the construction performance measured by the intersection-over-union score, achieving the best results in our benchmark of various baselines including model-free and model-based RL, a handcrafted SLAM-based policy, and human players.

NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos
Conference on Neural Information Processing Systems (NeurIPS), 2022

Alexander Gao*, Yi-Ling Qiao*, and Ming C. Lin.

Paper / Code / Website

We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input. To decouple the learning of underlying scene geometry from dynamic motion, we represent the scene as a time-invariant signed distance function (SDF), which serves as a reference frame, along with a time-conditioned deformation field. We further bridge this neural geometry representation with a differentiable physics simulator by designing a two-way conversion between the neural field and its corresponding hexahedral mesh, enabling us to estimate physics parameters from the source video by minimizing a cycle consistency loss. Our method also allows a user to interactively edit 3D objects from the source video by modifying the recovered hexahedral mesh and propagating the operation back to the neural field representation. Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to other competitive neural field approaches, and we provide extensive examples that demonstrate its ability to extract useful 3D representations from videos captured with consumer-grade cameras.

*Denotes equal contribution.

This site is based on Jon Barron's template.