Recent advances in vision-based navigation and exploration have shown impressive capabilities in photorealistic indoor environments. However, these methods still struggle with long-horizon tasks and require large amounts of data to generalize to unseen environments. In this work, we present a novel reinforcement learning approach for multi-object search that combines short-term and long-term reasoning in a single model while avoiding the complexities that arise from hierarchical structures. In contrast to existing multi-object search methods that act in coarse discrete action spaces, our approach achieves exceptional performance in continuous action spaces. We perform extensive experiments and show that our approach generalizes to unseen apartment environments with limited data. Furthermore, we demonstrate zero-shot transfer of the learned policies to an office environment in real-world experiments.

How Does It Work?

Figure: During training, the agent receives a state vector with either the ground-truth direction to the closest object or its own prediction. At test time, it always receives its prediction. It furthermore receives 16 of its previous predictions, the variances of its x- and y-position, the circular variance of its predictions, a collision flag, the sum over the last 16 collisions, its previous action, and a binary vector indicating the objects the agent has to find.
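The state vector described in the caption can be assembled from these components. The sketch below is illustrative: the function name, the exact encodings (e.g. the direction as a sin/cos pair, the action as linear and angular velocity), and all dimensions are our assumptions, not the authors' released interface.

```python
import numpy as np

def circular_variance(angles):
    # 1 minus the mean resultant length of the unit vectors; ranges from
    # 0 (all angles identical) to 1 (angles spread uniformly on the circle)
    return 1.0 - np.hypot(np.mean(np.sin(angles)), np.mean(np.cos(angles)))

def build_state(direction, prev_predictions, xy_history, collision_flag,
                collision_history, prev_action, target_mask):
    """Assemble the flat observation vector from the figure caption.
    All names, encodings, and dimensions here are illustrative assumptions."""
    return np.concatenate([
        direction,                              # (sin, cos) of direction to closest object
        prev_predictions,                       # last 16 direction predictions (radians)
        np.var(xy_history, axis=0),             # variance of x- and y-position
        [circular_variance(prev_predictions)],  # circular variance of the predictions
        [float(collision_flag)],                # collision on the current step
        [np.sum(collision_history[-16:])],      # sum over the last 16 collisions
        prev_action,                            # previous (linear, angular) action
        target_mask,                            # binary mask of objects still to find
    ]).astype(np.float32)
```

With, say, a 6-object vocabulary, this yields a state of 2 + 16 + 2 + 1 + 1 + 1 + 2 + 6 = 31 entries that the policy consumes as a single flat vector.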

Starting in an unexplored map and given a set of target objects, the robot faces the complex decision of how to find these objects most efficiently. Our approach continuously builds a semantic map of the environment and learns to combine long-term reasoning with short-term decision making in a single policy by predicting the direction of the path towards the closest target object. The mapping module aggregates depth and semantic information into a global map. The predictive module learns long-horizon relationships, which are then interpreted by a reinforcement learning policy.
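The mapping step can be pictured as projecting each depth and semantic observation into a top-down global grid. The sketch below is a minimal illustration under simplifying assumptions (a 1-D beam model across the camera field of view, a known pose, and semantic class ids written directly into hit cells); the function and parameter names are ours, not from the released code.

```python
import numpy as np

def update_semantic_map(global_map, depth, semantics, pose,
                        fov=np.pi / 2, cell_size=0.1):
    """Project one depth + semantic observation into the global top-down map.

    Hypothetical sketch: `depth` and `semantics` are 1-D arrays of beam
    readings spread across the camera field of view, and `pose` is the
    robot's (x, y, heading) in world coordinates.
    """
    x, y, theta = pose
    h, w = global_map.shape
    # One world-frame angle per beam, centered on the robot's heading.
    angles = theta + np.linspace(-fov / 2, fov / 2, len(depth))
    # Endpoint of each beam in world coordinates.
    px = x + depth * np.cos(angles)
    py = y + depth * np.sin(angles)
    # Convert to grid indices and write the semantic class into hit cells.
    ix = np.clip((px / cell_size).astype(int), 0, w - 1)
    iy = np.clip((py / cell_size).astype(int), 0, h - 1)
    global_map[iy, ix] = semantics
    return global_map
```

Repeating this update every step yields the aggregated global semantic map that the predictive module and the reinforcement learning policy then reason over.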


Code and Models

A software implementation of this project based on PyTorch, including trained model checkpoints, can be found in our GitHub repository for academic use and is released under the GPLv3 license. For any commercial purpose, please contact the authors.


Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold and Abhinav Valada,

(PDF) (BibTeX)


This work was funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 871449-OpenDR.