Feed-forward 3D scene modeling replaces per-scene optimization with direct inference from images. This project page summarizes the survey through a problem-driven taxonomy, covering feature enhancement, geometry-aware improvement, efficiency, augmentation, temporal modeling, datasets, and applications.
Overview of the survey scope and organization.
If you find this survey useful, please consider citing it.
@article{wang2026feedforward,
  title={Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective},
  author={Wang, Weijie and Cao, Qihang and Gao, Sensen and Chen, Donny Y. and Xu, Haofei and Bian, Wenjing and Peng, Songyou and Cham, Tat-Jen and Zheng, Chuanxia and Geiger, Andreas and Cai, Jianfei and Bian, Jia-Wang and Zhuang, Bohan},
  journal={arXiv preprint arXiv:2604.14025},
  year={2026},
}
The survey organizes feed-forward 3D methods by the core problems they solve rather than only by representation.
Problem-driven taxonomy of feed-forward 3D scene modeling.
The survey takes a challenge-driven perspective organized around five directions: stronger implicit features, geometry awareness, model efficiency, augmentation strategies, and temporal consistency.
Direction Taxonomy Overview.
Feature enhancement. Representative methods: PixelNeRF, IBRNet, SRT, DUSt3R, MASt3R, iLRM, Dens3r, MoRE, Uni3R.
Geometry awareness. Representative methods: MVSNeRF, GeoNeRF, MuRF, pixelSplat, MVSplat, HiSplat, AnySplat, YoNoSplat.
Model efficiency. Representative methods: ENeRF, ProNeRF, ZPressor, Long-LRM, FastVGGT, LiteVGGT, Speed3R, SR3R.
Augmentation strategies. Representative methods: Puzzles, MegaSynth, Aug3D, MVBoost, MVSplat360, latentSplat, ProSplat.
Temporal modeling. Representative methods: StreamSplat, CUT3R, Stream3R, LongStream, MonST3R, 4DGT, MoVieS, DAS3R.
The survey regroups datasets into geometry-oriented and visual-oriented resources, plus a broad mixed category used in practice.
Dataset landscape used in the survey.
Geometry-oriented. Examples: DTU, 7-Scenes, ETH3D, ScanNet, ScanNet++, ARKitScenes, Waymo, nuScenes.
Visual-oriented. Examples: NeRF-Synthetic, ACID, LLFF, RealEstate10K, DL3DV-10K, Mip-NeRF 360, Neural3DV.
Mixed. Examples: GSO, OmniObject3D, CO3D, WildRGBD, ShapeNet, MVImgNet, Objaverse-XL, Hypersim.
Feed-forward 3D scene modeling is increasingly used in downstream systems that need fast, robust, and geometry-aware inference without per-scene optimization.
Application landscape summarized in the survey.
Representative methods: SCube, InfiniCube, STORM, Driv3R, DrivingRecon, DrivingForward, EVolSplat, WorldSplat.
Representative methods: GraspNeRF, ManiGaussian, GAF, GaussianGrasper, EmbodiedSplat, UnitedVLN, VR-Robo, IGL-Nav.
Representative methods: VGGSfM, Light3R-SfM, MASt3R-SLAM, SLAM3R, VGGT-SLAM, ARTDECO, ViSTA-SLAM.
Representative methods: SLGaussian, SemanticSplat, AlignGS, 3DRS, Spatial-MLLM, VG-LLM, VLM-3R.
Representative methods: MVSplat360, JOG3R, GenFusion, ReVision, Geometry Forcing, 4DNeX, Lyra, WorldForge.
Representative methods: Splatter-360, PanSplat, PanoVGGT, Reloc3r, SAIL-Recon, Human3R, LoRA3D, Reflect3r.