Abstract
World models enable intelligent agents to predict the consequences of their actions on the environment. In this paper, we propose the Multi Rigid Object Gaussian World Model (MRO-GWM), a novel model that learns action-conditional dynamics of rigid objects in 3D. By representing the scene with object-centric Gaussians, we can model arbitrary object shapes and multi-object scenes. We develop a novel spatio-temporal transformer architecture that predicts future rigid body motion from a history of object Gaussians and future actions. Objects are represented by their Gaussians in a canonical frame, which allows object motion to be described as a rigid body transformation. Our model is trained on reconstructions from multiple viewpoints, which requires the model to handle partial observations of objects due to occlusions. We analyze the prediction performance of our approach on synthetic datasets composed of typical household objects with multi-object dynamics and interactions with a robot end effector. We also evaluate our model in model-predictive control for non-prehensile manipulation in simulation.
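To illustrate what the canonical-frame representation buys, the following is a minimal sketch (not the authors' implementation) of how a predicted rigid body transformation could be applied to one object Gaussian: the mean is rotated and translated, and the covariance is rotated. The helper names `rot_z`, `matmul`, `transpose`, and `transform_gaussian`, as well as the example values, are assumptions made for illustration only.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z axis (row-major nested lists)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose of a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def transform_gaussian(mean, cov, R, t):
    """Map one canonical-frame Gaussian into the world frame.

    mean' = R @ mean + t
    cov'  = R @ cov @ R^T
    """
    new_mean = [sum(R[i][k] * mean[k] for k in range(3)) + t[i] for i in range(3)]
    new_cov = matmul(matmul(R, cov), transpose(R))
    return new_mean, new_cov

# Hypothetical Gaussian, elongated along x in the object's canonical frame.
mean_c = [1.0, 0.0, 0.0]
cov_c = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical predicted pose: 90 degree yaw plus a lift of 0.5 along z.
R = rot_z(math.pi / 2)
t = [0.0, 0.0, 0.5]

mean_w, cov_w = transform_gaussian(mean_c, cov_c, R, t)
```

Because every Gaussian of an object shares the same rotation and translation, a predicted motion step for an object reduces to a single pose regardless of how many Gaussians describe its shape.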
Predictions
Predicted sequences over a 2.4 s horizon (right half of each image) and the ground truth (left half of each image), replayed by applying rigid transformations to the splatted Gaussian representation. Examples are selected according to their combined prediction error rank from the top 10% quantile of pose changes.
Smallest error rank
Median error rank
Second smallest error rank
Worst error rank
Planning
Episodes from planning with our model in a model-predictive control setting. For each of two tasks, we visualize successful episodes on scenes with 5 objects, choosing those with the largest initial objective value.
Task 1 "push object to position": The center of the screwdriver is successfully aligned with the target position indicated by the green sphere.
Task 2 "clear middle": All object centers are successfully pushed out of the red region.
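As a rough sketch of how the two task descriptions above could be turned into planning objectives, the snippet below scores a set of object centers. It is not the objective used in the paper: the function names, the Euclidean distance cost, and the assumption that the red region is a disc in the x-y plane with a given center and radius are all hypothetical choices made for illustration.

```python
import math

def push_to_position_cost(center, target):
    """Task 1 sketch: Euclidean distance from the object center to the target."""
    return math.dist(center, target)

def clear_middle_cost(centers, region_center, region_radius):
    """Task 2 sketch: total depth by which object centers lie inside the region.

    Assumes (hypothetically) a disc-shaped region in the x-y plane.
    Each object contributes 0 once its center is outside the disc.
    """
    return sum(
        max(0.0, region_radius - math.dist(c[:2], region_center))
        for c in centers
    )

# Hypothetical example values.
cost_push = push_to_position_cost((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
centers = [(0.0, 0.0, 0.1), (0.3, 0.0, 0.1)]
cost_clear = clear_middle_cost(centers, (0.0, 0.0), 0.2)
```

In a model-predictive control loop, costs of this kind would be evaluated on the object poses predicted by the world model for each candidate action sequence, and the sequence with the lowest cost would be executed.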
BibTeX
@article{MRO-GWM,
title={Learning Action-Conditional and Object-Centric Gaussian Splatting World Models for Rigid Objects},
author={Jens U. Kreber and Lukas Mack and Joerg Stueckler},
journal={arXiv preprint arXiv:2606.01950},
year={2026},
url={https://arxiv.org/abs/2606.01950}
}