Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors

Soumava Paul, Prakhar Kaushik, Alan Yuille

Johns Hopkins University

arXiv 2025

TLDR: Gaussian Scenes is a generative approach for pose-free reconstruction of 360° scenes from a limited number of uncalibrated 2D images. We train an RGBD diffusion model capable of inpainting missing content and removing artifacts from novel-view renders and depth maps of a 3DGS representation fitted to sparse inputs.

Our key contributions include a pixel-aligned confidence measure for better detection of empty regions and artifacts in novel views. We also propose context and geometry conditioning through FiLM modulation layers as a lightweight alternative to cross-attention layers.
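To make the FiLM idea concrete, here is a minimal NumPy sketch of feature-wise linear modulation in general, not the layer used in our released model. The function name `film`, the projection matrices `w_gamma` and `w_beta`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np

def film(features, cond, w_gamma, w_beta):
    """Feature-wise linear modulation: scale and shift each channel.

    features: (C, H, W) feature map from a denoiser block
    cond:     (D,) conditioning vector (e.g. a context or camera embedding)
    w_gamma, w_beta: (C, D) projection matrices (hypothetical names)
    """
    gamma = w_gamma @ cond  # (C,) per-channel scale offset
    beta = w_beta @ cond    # (C,) per-channel shift
    # Using (1 + gamma) keeps the layer at identity when the projections are zero.
    return (1.0 + gamma)[:, None, None] * features + beta[:, None, None]

rng = np.random.default_rng(0)
C, H, W, D = 4, 8, 8, 16  # illustrative sizes only
features = rng.standard_normal((C, H, W))
cond = rng.standard_normal(D)

# Zero-initialized projections leave the features untouched.
identity_out = film(features, cond, np.zeros((C, D)), np.zeros((C, D)))

# Non-zero projections modulate every channel independently.
out = film(features, cond,
           0.1 * rng.standard_normal((C, D)),
           0.1 * rng.standard_normal((C, D)))
```

The modulation itself is one multiply and one add per feature value, plus two small matrix-vector products per layer. Cross-attention, by contrast, computes a score between every spatial token and every context token, which is why FiLM is the lighter option.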

Gaussian Scenes Overview

Our model comprises a variational autoencoder operating in a compressed latent space and a UNet denoiser for predicting noise in diffused latents. The UNet receives multimodal conditioning through four inputs: an RGBD image with artifacts, a confidence map identifying unreliable regions, CLIP features of source images providing semantic context, and camera encodings capturing geometric relationships between input views.
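The sketch below shows one plausible way the four conditioning inputs could be arranged before they reach the UNet. It is an assumption for illustration, not the wiring described in the paper: the channel counts, the embedding sizes, the camera encoding format, and the choice to concatenate pixel-aligned inputs and mean-pool per-view features are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# All shapes below are illustrative assumptions, not values from the paper.
h, w = 32, 32
noisy_latent = rng.standard_normal((8, h, w))     # diffused RGBD latent being denoised
artifact_latent = rng.standard_normal((8, h, w))  # encoded RGBD render with artifacts
confidence = rng.uniform(size=(1, h, w))          # confidence map at latent resolution

# Pixel-aligned conditions: stacked with the noisy latent along the channel axis.
unet_input = np.concatenate([noisy_latent, artifact_latent, confidence], axis=0)

# Global conditions: one CLIP feature and one camera encoding per source view.
n_views = 3
clip_feats = rng.standard_normal((n_views, 768))  # assumed embedding size
cam_enc = rng.standard_normal((n_views, 12))      # e.g. a flattened 3x4 pose (assumed)

# Join per-view features, then pool over views into a single context vector.
context = np.concatenate([clip_feats, cam_enc], axis=1).mean(axis=0)
```

In this arrangement `unet_input` carries the spatial conditions (the RGBD render and the confidence map), while `context` is the kind of global vector a modulation layer would consume.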

Method	Pose-free	Open-source	Generative Priors	Scene Reconstruction
FreeNeRF, DietNeRF, RegNeRF, DN-Gaussian, SparseGS, SparseNeRF	❌	✅	❌	✅
DiffusioNeRF, ZeroNVS	❌	✅	✅	✅
ReconFusion, CAT3D	❌	❌	✅	✅
Gaussian Object, iFusion, UpFusion	✅	✅	✅	❌
InstantSplat, COGS	✅	✅	❌	✅
Gaussian Scenes (Ours)	✅	✅	✅	✅

Table: Comparison of sparse-view reconstruction methods. Methods are grouped by whether they work without accurate camera poses, whether they are open-source, whether they use generative priors, and whether they apply to large-scale scene reconstruction.

Sample Scene Reconstruction

More visualizations coming soon!

[Comparison: MASt3R + 3DGS vs. Ours]

Qualitative Comparison with ReconFusion and CAT3D

We compare our approach with ReconFusion and CAT3D, the current state-of-the-art posed reconstruction techniques. Neither method has open-source code available, so for a qualitative comparison we pick the relevant test views for four scenes showcased in their papers: Treehill, Flowers, and Bicycle from MipNeRF360, and the plant scene from CO3Dv2. We use the same training views as in their released data splits.

	MASt3R + 3DGS	ReconFusion	CAT3D	Ours	Ground Truth
Treehill (3)
Flowers (3)
Plant (3)
Bicycle (9)

Despite being pose-free, our pipeline achieves novel-view synthesis (NVS) quality competitive with state-of-the-art sparse-view reconstruction techniques that rely on known camera poses. No CAT3D image is available for the last row, so that cell is left blank.

BibTeX

@article{paul2024gaussian,
  title={Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors},
  author={Paul, Soumava and Kaushik, Prakhar and Yuille, Alan},
  journal={arXiv preprint arXiv:2411.15966},
  year={2024}
}