Scalable Spatial Memory for Scene Rendering and Navigation

Wen Cheng Chen*, Chu Song Chen, Wei Chen Chiu, Min Chun Hu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review


Neural scene representation and rendering methods have shown promise in learning the implicit form of scene structure without supervision. However, the implicit representation learned in most existing methods is non-expandable and cannot be inferred online for novel scenes, which makes the learned representation difficult to be applied across different reinforcement learning (RL) tasks. In this work, we introduce Scene Memory Network (SMN) to achieve online spatial memory construction and expansion for view rendering in novel scenes. SMN models the camera projection and back-projection as spatially aware memory control processes, where the memory values store the information of the partial 3D area, and the memory keys indicate the position of that area. The memory controller can learn the geometry property from observations without the camera’s intrinsic parameters and depth supervision. We further apply the memory constructed by SMN to exploration and navigation tasks. The experimental results reveal the generalization ability of our proposed SMN in large-scale scene synthesis and its potential to improve the performance of spatial RL tasks.

Original languageEnglish
Title of host publicationAAAI-23 Technical Tracks 1
EditorsBrian Williams, Yiling Chen, Jennifer Neville
PublisherAAAI press
Number of pages9
ISBN (Electronic)9781577358800
StatePublished - 27 Jun 2023
Event37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: 7 Feb 202314 Feb 2023

Publication series

NameProceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023


Conference37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/TerritoryUnited States


Dive into the research topics of 'Scalable Spatial Memory for Scene Rendering and Navigation'. Together they form a unique fingerprint.

Cite this