Global Motion Corresponder for 3D Point-Based Scene Interpolation under Large Motion

1University of Toronto 2Stanford University 3Simon Fraser University 4Nvidia 5Google
*Denotes equal contribution

🎉 International Conference on Computer Vision (ICCV) 2025 🎉

[Side-by-side video comparison: PAPR in Motion · Dynamic Gaussians · Ours (GMC)]


TLDR: Existing 3D scene interpolation methods fail when the scene undergoes large global motion. To address this, we propose a novel approach that robustly handles such motions by learning smooth global correspondences in a canonical space.

Abstract

Existing dynamic scene interpolation methods typically assume that the motion between consecutive time steps is small enough so that displacements can be locally approximated by linear models. In practice, even slight deviations from this small-motion assumption can cause conventional techniques to fail. In this paper, we introduce Global Motion Corresponder (GMC), a novel approach that robustly handles large motion and achieves smooth transitions. GMC learns unary potential fields that predict SE(3) mappings into a shared canonical space, balancing correspondence, spatial and semantic smoothness, and local rigidity. We demonstrate that our method significantly outperforms existing baselines on 3D scene interpolation when the two states undergo large global motions. Furthermore, our method enables extrapolation capabilities where other baseline methods cannot.

Why is large motion challenging?

The core challenge in scene interpolation with point-based representations lies in establishing reliable correspondences: each point must predict its motion to a corresponding location in another frame.

Small inter-frame motion

Most existing methods rely on the critical assumption that the motion between adjacent timesteps is small enough that point positions do not change significantly. Since each point's position remains nearly constant, correspondences between points from adjacent timesteps can be established simply by identifying Euclidean nearest neighbors. Under this assumption, determining correspondence reduces to matching points within small local neighborhoods, which prior work has explored successfully.

For small inter-frame motion, local neighborhood searches yield correct correspondences and motion predictions.
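To make the small-motion case concrete, here is a minimal sketch (not code from the paper) of Euclidean nearest-neighbor matching with NumPy/SciPy; the point clouds, noise scale, and sizes are illustrative assumptions:

```python
# Sketch: under small inter-frame motion, nearest-neighbor search
# already recovers correspondences. `pts_t0`/`pts_t1` are hypothetical
# point clouds, not data from the paper.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
pts_t0 = rng.uniform(-1.0, 1.0, size=(500, 3))      # points at time t
pts_t1 = pts_t0 + 0.01 * rng.normal(size=(500, 3))  # small displacement at t+1

tree = cKDTree(pts_t1)
dist, match = tree.query(pts_t0)  # nearest point in t+1 for each point in t
# With small motion, the identity permutation is recovered almost everywhere.
print("correct matches:", np.mean(match == np.arange(500)))
```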

Large inter-frame motion

When motion becomes sufficiently large, dynamic scene interpolation faces a fundamental breakdown. This failure stems from the ill-posed nature of point correspondence under large motion: a point's local neighborhood becomes unreliable as objects move far from their original positions. Additionally, a point may have multiple plausible matches, making point-to-point correspondence ambiguous. For instance, if correspondences rely solely on spatial proximity, severe mismatches can occur (shown below), causing existing methods to produce implausible trajectories that violate spatial rigidity and physical constraints.

With large global motion, naïvely matching nearest neighbors results in criss-cross matches, which are unusable for scene interpolation.
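The same matcher breaks down once the motion is large. In this hedged sketch, a hypothetical object undergoes a 90° rotation plus a translation, and nearest-neighbor matching collapses to near-chance, many-to-one assignments:

```python
# Sketch: the same nearest-neighbor matcher fails under large global
# motion. The rotation, translation, and sizes are arbitrary choices
# made for illustration.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
pts_t0 = rng.uniform(-1.0, 1.0, size=(500, 3))
R = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])  # 90° about z
pts_t1 = pts_t0 @ R.T + np.array([2.0, 0.0, 0.0])          # large global motion

match = cKDTree(pts_t1).query(pts_t0)[1]
print("correct matches:", np.mean(match == np.arange(500)))  # near zero
print("many-to-one matches:", 500 - np.unique(match).size)   # criss-cross collisions
```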

Ideal case

An ideal method would ensure that the semantics and color of corresponding points agree, and that similar points within a neighborhood move coherently.

An ideal method would predict correct correspondences and recover the global motion.

How to address the challenge of large motion?

We use Gaussian splats as our representation. First, two 3D Gaussian Splatting (3DGS) models are trained, one for the start state and one for the end state. Then we transform both sets of Gaussians into a learnable shared canonical space where corresponding Gaussians occupy identical spatial locations:

$$ \underset{\hat{\boldsymbol{\mu}}^{(0)}_i}{\underbrace{\boldsymbol{R}^{(0)}_i \boldsymbol{\mu}_i^{(0)}+ \boldsymbol{t}^{(0)}_i}} = \underset{\hat{\boldsymbol{\mu}}^{(1)}_j}{\underbrace{\boldsymbol{R}^{(1)}_j \boldsymbol{\mu}_j^{(1)} + \boldsymbol{t}^{(1)}_{j}}}, \tag{1} $$

where $\boldsymbol{R}$ and $\boldsymbol{t}$ represent learnable point-wise transformations for the two states. The transformations $(\boldsymbol{R}^{(0)}_i, \boldsymbol{t}^{(0)}_i)$ and $(\boldsymbol{R}^{(1)}_j, \boldsymbol{t}^{(1)}_j)$ are obtained from our Unary Potential Fields $\mathcal{F}_{0}$ and $\mathcal{F}_{1}$, which are parameterized as MLPs. The parameters of these MLPs are optimized using our proposed Energy-based loss. More details can be found in the paper.
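For intuition, here is a minimal, hypothetical PyTorch sketch of one unary potential field and the correspondence residual of Eq. (1). The MLP width, the quaternion parameterization, and the assumption of pre-matched pairs are ours for illustration; in the actual method, correspondences are resolved by the energy-based optimization described in the paper:

```python
# Sketch of a unary potential field F_k: an MLP mapping a Gaussian center
# mu to a point-wise SE(3) transform (unit quaternion + translation).
# Architecture and parameterization are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

def quat_to_rotmat(q: torch.Tensor) -> torch.Tensor:
    """Convert quaternions (N, 4) to rotation matrices (N, 3, 3)."""
    q = q / q.norm(dim=-1, keepdim=True)
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y),
    ], dim=-1).reshape(-1, 3, 3)

class UnaryPotentialField(nn.Module):
    """F_k: Gaussian center mu -> (R, t) mapping it into the canonical space."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 7),  # 4 quaternion components + 3 translation
        )

    def forward(self, mu: torch.Tensor):
        out = self.mlp(mu)
        q, t = out[:, :4], out[:, 4:]
        q = q + torch.tensor([1., 0., 0., 0.])  # bias toward the identity rotation
        return quat_to_rotmat(q), t

# Correspondence residual of Eq. (1): matched Gaussians should coincide in
# the canonical space. Here `mu0`, `mu1` are hypothetical matched centers.
F0, F1 = UnaryPotentialField(), UnaryPotentialField()
mu0, mu1 = torch.randn(100, 3), torch.randn(100, 3)
R0, t0 = F0(mu0)
R1, t1 = F1(mu1)
mu0_hat = torch.einsum('nij,nj->ni', R0, mu0) + t0
mu1_hat = torch.einsum('nij,nj->ni', R1, mu1) + t1
loss_corr = (mu0_hat - mu1_hat).pow(2).sum(-1).mean()
```

Biasing the predicted quaternion toward the identity keeps the per-point transforms near the identity early in training, a common trick for stabilizing this kind of optimization.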



Method Overview

Scene Interpolation Results

Here are the scene interpolation results of our method on both synthetic scenes (rows 1 and 2) and real-world scenes (rows 3 and 4).

BibTeX


@misc{GlobalMotionCorresponder,
  title={Global Motion Corresponder for 3D Point-Based Scene Interpolation under Large Motion},
  author={Junru Lin and Chirag Vashist and Mikaela Angelina Uy and Colton Stearns and Xuan Luo and Leonidas Guibas and Ke Li},
  year={2025},
  eprint={2508.20136},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2508.20136},
}