Qi et al., CVPR 2017
Reviewer: Hyunjin Kim
arXiv: https://arxiv.org/abs/1612.00593
Point clouds are an important geometric data structure, but because of their irregular format, many studies first convert them into voxel grids or collections of images. These conversions introduce inefficiencies and other issues. The authors therefore propose PointNet, a unified architecture that consumes point clouds directly for tasks such as classification and segmentation. PointNet is structurally simple yet efficient and effective, with performance on par with or better than the state of the art at the time.
Why PointNet?
Typical CNNs require a regular input format such as image grids or 3D voxels. However, 3D geometric data usually comes as point clouds or meshes, which are not regular in this sense. Consequently, many researchers have converted such data into 3D voxel grids or collections of images, which causes computational inefficiency and quantization artifacts. To address these issues, the authors propose PointNet, which accepts point clouds directly as input and supports tasks such as classification and segmentation on 3D data.
What is PointNet?
PointNet is a network that takes a point cloud directly as input and outputs either a class label for the entire input (classification) or a segment/part label for each point (segmentation). Unlike meshes, point clouds are free of combinatorial irregularities and complexities, which makes them easier to learn from. Applying a rigid body transformation to a point cloud is also straightforward: the transformation is simply applied to each point individually. For these reasons, PointNet works on point clouds. One important consideration is that a point cloud is just a set of points, so the network must be invariant to permutations of the points and to rigid motions. The following sections describe how PointNet achieves these properties.
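As a small illustration of the two properties named above, the sketch below treats a point cloud as an n x 3 NumPy array. It applies a rigid transformation point by point and then checks that a symmetric function (here an element-wise max over the points) gives the same result when the points are reordered. The array sizes, the rotation angle, the translation, and the helper name `pairwise_distances` are assumptions chosen for this example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))  # toy point cloud: n = 5 points with (x, y, z)

# Rigid body transformation: rotate 90 degrees about the z axis, then translate.
# It acts on each point (each row) independently.
theta = np.pi / 2
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
t = np.array([1.0, 2.0, 3.0])
moved = points @ R.T + t


def pairwise_distances(p):
    """Return the (n, n) matrix of Euclidean distances between points."""
    diff = p[:, None, :] - p[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# A rigid motion changes coordinates but keeps the shape (all distances) intact.
same_shape = np.allclose(pairwise_distances(points), pairwise_distances(moved))

# Permutation: the same set of points, listed in a different order.
perm = rng.permutation(len(points))
shuffled = points[perm]

# A symmetric function such as max over the point axis ignores the ordering.
feat = points.max(axis=0)
feat_shuffled = shuffled.max(axis=0)
same_feature = np.array_equal(feat, feat_shuffled)
```

Note that the max over raw coordinates is only used here to show order independence; it is not by itself invariant to rigid motions.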
Main Contribution
PointNet takes an unordered set of points as input. Each point has coordinates (x, y, z) and may carry additional channels such as a normal or color, but for simplicity and clarity only (x, y, z) is used. The output depends on the task, as follows.
Classification
k scores for all the k candidate classes
Segmentation
n x m scores, one for each of the n points and each of the m semantic subcategories
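The input and output shapes listed above can be sketched as follows. This is only a toy stand-in with random, untrained weights: the function name `toy_pointnet_outputs`, the hidden size, the single per-point layer, and the way the global feature is joined to the per-point features are all assumptions made for this example, not the authors' architecture. What it is meant to show is that n points of (x, y, z) map to k class scores, or to an n x m score matrix.

```python
import numpy as np


def toy_pointnet_outputs(points, k, m, hidden=16, seed=0):
    """Map an (n, 3) point cloud to (k,) class scores and (n, m) point scores.

    Weights are random placeholders; only the shapes are meaningful.
    """
    rng = np.random.default_rng(seed)
    n, d = points.shape

    # The same weights are applied to every point, giving per-point features.
    w_point = rng.normal(size=(d, hidden))
    per_point = np.maximum(points @ w_point, 0.0)      # (n, hidden)

    # Order-independent aggregation over the point axis.
    global_feat = per_point.max(axis=0)                # (hidden,)

    # Classification: one score per candidate class.
    w_cls = rng.normal(size=(hidden, k))
    class_scores = global_feat @ w_cls                 # (k,)

    # Segmentation: one row of m scores per point, computed from the
    # per-point feature joined with the shared global feature.
    tiled = np.broadcast_to(global_feat, (n, hidden))
    combined = np.concatenate([per_point, tiled], axis=1)  # (n, 2 * hidden)
    w_seg = rng.normal(size=(2 * hidden, m))
    point_scores = combined @ w_seg                    # (n, m)

    return class_scores, point_scores


cloud = np.random.default_rng(1).normal(size=(7, 3))   # n = 7 points
class_scores, point_scores = toy_pointnet_outputs(cloud, k=4, m=5)
```

With n = 7, k = 4, and m = 5, `class_scores` has shape (4,) and `point_scores` has shape (7, 5).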