

The study of non-line-of-sight (NLOS) imaging is growing due to its many potential applications, including rescue operations and pedestrian detection by self-driving cars. However, implementing NLOS imaging on a moving camera remains an open area of research. Existing NLOS imaging methods rely on time-resolved detectors and laser configurations that require precise optical alignment, making it difficult to deploy them in dynamic environments.
This work proposes a data-driven approach to NLOS imaging, PathFinder, that can be used with a standard RGB camera mounted on a small, power-constrained mobile robot, such as an aerial drone. Our experimental pipeline is designed to accurately estimate the 2D trajectory of a person who moves in a Manhattan-world environment while remaining hidden from the camera's field of view.
We introduce a novel approach for processing successive frames of a dynamic line-of-sight (LOS) video using an attention-based neural network that performs inference in real time. The method also includes a preprocessing selection metric that analyzes images from a moving camera containing multiple vertical planar surfaces, such as walls and building facades, and extracts the planes that return the most NLOS information. We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.
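As a rough illustration of such a selection metric, the following Python sketch (our own simplification, not necessarily the paper's exact formulation) ranks detected planes by the mean frame-to-frame intensity change within each plane mask, on the assumption that planes exhibiting more temporal variation return more NLOS information:

import numpy as np

def rank_planes(diff_image, plane_masks):
    # diff_image: (H, W) absolute frame difference.
    # plane_masks: {plane_id: boolean (H, W) mask for that plane}.
    scores = {pid: diff_image[mask].mean() if mask.any() else 0.0
              for pid, mask in plane_masks.items()}
    # Return plane IDs ordered from most to least NLOS signal (assumed scoring).
    return sorted(scores, key=scores.get, reverse=True)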
Our proposed pipeline consists of three stages. First, color camera images, stereo images, and IMU data are used for visual-inertial odometry (VIO) and camera pose estimation. Second, PlaneRecNet generates plane masks from consecutive frames, and a homography estimated from feature matches is applied to the masked planes to produce difference images and plane IDs. Third, the raw image at frame i+1 and the difference image between frames i and i+1 are input to the MPP-T and DPP-T networks, yielding the position and velocity estimates X and V. These outputs, together with the camera pose, plane normal estimates, and plane IDs k, are passed to an optimization layer that produces the final trajectory. The figure on the right shows the detailed patch-processing transformer architecture.
Full training and inference pipeline of our system.
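The homography-based differencing step can be sketched as follows (a minimal illustration using OpenCV; the function name and thresholds are our assumptions, not the authors' released code). Frame i is warped onto frame i+1 using feature matches restricted to a plane mask, and the absolute difference isolates intensity changes on the relay wall caused by the hidden person:

import cv2
import numpy as np

def plane_difference(frame_i, frame_ip1, mask_i, mask_ip1):
    # Detect and match ORB features inside the plane masks (uint8, 0 or 255).
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(frame_i, mask_i)
    kp2, des2 = orb.detectAndCompute(frame_ip1, mask_ip1)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    # Estimate the plane-induced homography with RANSAC and warp frame i.
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = frame_ip1.shape[:2]
    warped = cv2.warpPerspective(frame_i, H, (w, h))
    # The residual difference captures NLOS-induced shading on the relay wall.
    return cv2.absdiff(frame_ip1, warped)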
The effectiveness of the NLOS tracking system is validated using two types of datasets: synthetic and real-world. These datasets are used to train the system and to test the accuracy of the tracking algorithms under varied conditions.
Synthetic Dataset: The synthetic dataset is generated in Blender to mimic a variety of non-line-of-sight scenarios. These simulations provide controlled environments for testing the robustness and accuracy of the NLOS tracking system, ensuring that it can handle complex visual data and track movements accurately.
(a) Overhead view of a sample synthetic NLOS scene simulated using Blender, showing the camera (lower left), human character (NLOS object), and sources of ambient lighting in the room. (b) Samples of the eight characters from the Mixamo library that were used for synthetic data generation. (c) Samples of three sets of relay walls with different materials that were used for synthetic data generation.
Real-World Dataset: Complementing the synthetic data, the real-world dataset consists of video recordings collected from indoor environments using the mobile robot. This dataset challenges the system with dynamic, unpredictable conditions that are typical in real-world scenarios, testing the system’s adaptability and performance.
(a) Real-world data collection setup: A drone captures images of a relay wall while a person (NLOS object) is hidden from view. The person’s ground-truth position is obtained using motion capture cameras. (b) Side view of the setup. (c) Helmet mounted with IR markers for ground-truth data collection. (d) Samples of FOV regions in the dataset, with different surface textures and types of objects present.
Customized drone equipped with Intel RealSense cameras (highlighted in red) for visual inertial odometry (VIO) during indoor flight. Raw data, including color camera images, stereo images, and IMU data, are extracted for VIO and camera pose estimation. The onboard Jetson Nano (highlighted in yellow) facilitates real-time processing.
This section outlines the quantitative results obtained from testing the NLOS tracking system on both synthetic and real-world datasets. The metrics used include Root Mean Square Error (RMSE) for position and velocity, providing a clear measure of the system's accuracy and efficiency in tracking NLOS objects.
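For reference, these metrics can be computed as in the following sketch (our own helper names, not from the paper; gt and est are (T, 2) arrays of 2D positions sampled at matching timestamps):

import numpy as np

def position_rmse(gt, est):
    # Root mean square error between ground-truth and estimated positions.
    return np.sqrt(np.mean(np.sum((gt - est) ** 2, axis=1)))

def ate(gt, est):
    # Absolute trajectory error: Euclidean position error at each timestep.
    return np.linalg.norm(gt - est, axis=1)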
The NLOS system demonstrated superior tracking performance, with an RMSE for position significantly lower than that of existing methods. The system's ability to accurately estimate velocities in dynamic scenarios further underscores its robustness and potential for practical applications.
Ground-truth trajectories (dashed lines) and corresponding trajectories estimated by our method (multicolored lines), with the color indicating the RMSE (m) between the ground-truth position X(t) and estimated position X'(t) at a time t.
Compared to traditional NLOS tracking methods, our system showed a marked improvement in tracking accuracy and computational efficiency, making it suitable for real-time applications in complex environments.
Ground-truth trajectory and estimates from our method and baseline methods.
The corresponding Absolute Trajectory Error (ATE) vs. time.
ATE box plot for our method and baseline methods.
@article{kannapiran2024pathfinder,
  title={PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot},
  author={Kannapiran, Shenbagaraj and Chandran, Sreenithy and Jayasuriya, Suren and Berman, Spring},
  journal={arXiv preprint arXiv:2404.05024},
  year={2024}
}