Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors

Johns Hopkins University
arXiv 2025
TLDR: Gaussian Scenes is a generative approach for pose-free reconstruction of 360° scenes from a limited number of uncalibrated 2D images. We train an RGBD diffusion model capable of inpainting missing content and removing artifacts from novel-view renders and depth maps of a 3DGS representation fitted to sparse inputs.

Our key contributions include a pixel-aligned confidence measure for better detection of empty regions and artifacts in novel views. We also propose context and geometry conditioning through FiLM modulation layers as a lightweight alternative to cross-attention layers.
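As a rough illustration of the FiLM idea mentioned above, the sketch below scales and shifts each channel of a feature map using parameters predicted from a conditioning vector. It is a minimal NumPy sketch of generic FiLM, not the model's actual layer: the function name `film`, the single linear projection per parameter, and all tensor shapes are assumptions made for illustration, and the page does not say where such layers sit in the UNet or how the CLIP features and camera encodings are combined into the conditioning vector.

```python
import numpy as np


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply feature-wise linear modulation (FiLM) to a feature map.

    features: array of shape (C, H, W), e.g. an intermediate UNet activation.
    cond:     array of shape (D,), a conditioning vector (hypothetical here).
    w_gamma, w_beta: arrays of shape (C, D), linear projections of cond.
    b_gamma, b_beta: arrays of shape (C,), biases of those projections.
    """
    # Predict one scale and one shift per channel from the conditioning vector.
    gamma = w_gamma @ cond + b_gamma  # shape (C,)
    beta = w_beta @ cond + b_beta     # shape (C,)
    # Broadcast over the spatial dimensions: out = gamma * features + beta.
    return gamma[:, None, None] * features + beta[:, None, None]
```

Because the conditioning only produces two vectors of length C per layer, the cost does not grow with the spatial resolution of the feature map, which is one reason FiLM can be lighter than cross-attention.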

Gaussian Scenes Overview

Our model comprises a variational autoencoder (VAE) that maps images to a compressed latent space and a UNet denoiser that predicts the noise in diffused latents. The UNet receives multimodal conditioning through four inputs: an RGBD image with artifacts, a confidence map identifying unreliable regions, CLIP features of the source images providing semantic context, and camera encodings capturing geometric relationships between input views.
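To make "predicting noise in diffused latents" concrete, the sketch below shows the standard DDPM-style forward noising step and the mean-squared-error objective a noise-predicting denoiser is usually trained with. It is an illustrative sketch only: the linear beta schedule, the number of steps, and the function names are assumptions, since the page does not state the schedule or loss used, and the four conditioning inputs are omitted for brevity.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal fraction for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def diffuse(z0, t, alpha_bar, rng):
    """Noise a clean latent z0 to timestep t; return the noisy latent and the noise."""
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return zt, eps


def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))
```

In training, a denoiser would take the noisy latent `zt`, the timestep `t`, and the conditioning inputs, and its output would be passed as `eps_pred` to `noise_prediction_loss`.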

Sample Scene Reconstruction


MASt3R + 3DGS

Ours

Qualitative Comparison with ReconFusion and CAT3D

We compare our approach with ReconFusion and CAT3D, two current state-of-the-art posed reconstruction techniques. Neither method has open-source code available, so for a qualitative comparison we pick the relevant test views for 4 scenes showcased in their papers: Treehill, Flowers, and Bicycle from MipNeRF360, and the plant scene from CO3Dv2. We use the same training views as open-sourced in their data splits.

[Figure: comparison grid. Columns: MASt3R + 3DGS, ReconFusion, CAT3D, Ours, Ground Truth. Rows: Treehill (3), Flowers (3), Plant (3), Bicycle (9).]

Despite being a pose-free pipeline, our method achieves novel-view synthesis (NVS) quality competitive with state-of-the-art sparse-view reconstruction techniques. No CAT3D image is available for the last row, so that cell is left blank.

BibTeX

@article{paul2024gaussian,
  title={Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors},
  author={Paul, Soumava and Kaushik, Prakhar and Yuille, Alan},
  journal={arXiv preprint arXiv:2411.15966},
  year={2024}
}