Description

Large-scale deployment of embedded monitoring systems means that everything has to be autonomous: calibration should be as simple as possible and models as lightweight as possible. We therefore propose a novel approach for 3D object localization from single-viewpoint images, with both intensity and depth information provided by a Time-of-Flight (ToF) sensor in the form of a point cloud.

Previous approaches either used only intensity and depth information, or required an already calibrated camera to segment and localize objects in the scene. We propose a calibration step that improves the results of our localization step.

Our method [1]

Figure 2: Our object localization pipeline

As shown in Figure 2, we use two distinct segmentation CNNs in order to:

  1. Calibrate the z-axis of the camera using the floor segmentation, in order to create a height map and a correctly oriented estimated-normals map.
  2. Locate an object in the scene using segmentation and a point-registration algorithm.

1. ToF calibration via floor segmentation

Floor segmentation allows us to calibrate the extrinsic parameters of the camera. A first convolutional neural network (CNN) is used to segment the pixels that belong to the floor. We then apply SVD (with RANSAC to reduce the influence of outliers) to find the normal vector of the floor plane. This allows us to re-encode the spatial information given by the ToF sensor: we rotate the point-cloud coordinates such that the z-axis represents the direction of gravity, with the floor at zero. This calibration stage only needs to be re-run when the ToF sensor is moved and the calibration is thus reset.
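
Below is a minimal NumPy sketch of this calibration step. It assumes the point cloud is an (N, 3) array expressed in the sensor frame and that `floor_mask` is the boolean per-point output of the floor-segmentation CNN; the function names, RANSAC parameters, and thresholds are illustrative, not taken from the paper.

```python
import numpy as np


def fit_floor_plane(points, n_iters=200, inlier_thresh=0.02, rng=None):
    """RANSAC: sample 3 points, keep the plane with the most inliers,
    then refine its normal with an SVD over all inliers."""
    rng = rng if rng is not None else np.random.default_rng(0)
    best_inliers = None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal /= norm
        dist = np.abs((points - sample[0]) @ normal)
        inliers = dist < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    floor = points[best_inliers]
    centroid = floor.mean(axis=0)
    # SVD refinement: the plane normal is the right singular vector associated
    # with the smallest singular value of the centred inlier cloud.
    _, _, vt = np.linalg.svd(floor - centroid, full_matrices=False)
    normal = vt[-1]
    if normal @ centroid > 0:                # orient the normal toward the sensor,
        normal = -normal                     # i.e. "up" (sensor assumed above the floor)
    return normal, centroid


def calibrate(points, floor_mask):
    """Rotate the cloud so the floor normal becomes +z and the floor sits at z = 0."""
    normal, centroid = fit_floor_plane(points[floor_mask])
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(normal, z), normal @ z
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    # Rodrigues' formula for the rotation taking `normal` onto `z`
    # (assumes the floor normal is not anti-parallel to z).
    R = np.eye(3) + vx + vx @ vx / (1.0 + c)
    calibrated = (points - centroid) @ R.T
    return calibrated, R                     # calibrated[:, 2] is the height above the floor
```

The z-coordinate of the calibrated cloud, reprojected onto the image grid, is what provides the height map used by the second CNN.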

2. Object localization in 3D space

This new spatial re-encoding allows us to train a more efficient CNN, which we use to segment the object’s pixels (here we use a bed as an example). For this second CNN, we use height and normals, normalized with respect to the floor height, to represent the scene geometry as 2D inputs. We then locate the object by aligning its point cloud with a reference model.
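
The method describes this step as point-cloud alignment with a reference model; the sketch below illustrates one common choice of registration algorithm, a few point-to-point ICP iterations with an SVD (Kabsch) update, and is not necessarily the exact variant used. The array shapes, function name, and SciPy dependency are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree


def register(reference, observed, n_iters=30):
    """Point-to-point ICP: rigid transform (R, t) mapping the reference model onto
    the observed (segmented) object cloud, i.e. the object's pose in the scene."""
    R = np.eye(3)
    t = observed.mean(axis=0) - reference.mean(axis=0)   # coarse init from centroids
    src = reference @ R.T + t
    tree = cKDTree(observed)
    for _ in range(n_iters):
        # 1. Correspondences: nearest observed point for every model point.
        _, idx = tree.query(src)
        matched = observed[idx]
        # 2. Best rigid update for these matches (Kabsch / SVD).
        src_c, dst_c = src.mean(axis=0), matched.mean(axis=0)
        H = (src - src_c).T @ (matched - dst_c)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R_step = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t_step = dst_c - R_step @ src_c
        # 3. Apply the update and accumulate the global transform.
        src = src @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```

Because both clouds are expressed in the calibrated (floor-level) frame from the previous step, the recovered translation directly gives the object's 3D position in the scene.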


  1. Vanderschueren, Antoine; Joos de ter Beerst, Victor; De Vleeschouwer, Christophe. Mutual use of semantics and geometry for CNN-based object localization in ToF images. ICPR, CARE2020 Workshop (10/01/2021). http://hdl.handle.net/2078.1/240744