Tutorial 1: PointNet

Qi et al., CVPR 2017

Reviewer: Hyunjin Kim

1. Abstract

Point clouds are an important geometric data structure, but because of their irregular format, most prior work converts them into regular 3D voxel grids or collections of images. These conversions make the data unnecessarily voluminous and introduce other issues. The authors therefore propose PointNet, a unified architecture that consumes point clouds directly for tasks such as classification and segmentation. PointNet is structurally simple yet efficient and effective, with performance on par with or better than the state of the art at the time of publication.

2. Introduction

Why PointNet?

Typical CNNs require a regular input format such as image grids or voxels. However, 3D geometric data mostly comes as point clouds or meshes, which do not have a regular format. Consequently, many researchers first convert such data into 3D voxel grids or collections of images, which causes computational inefficiency and quantization artifacts. To address these issues, the authors propose PointNet, which accepts point clouds directly as input and performs tasks such as classification and segmentation on 3D data.

What is PointNet?

PointNet is a network that takes a raw point cloud as input and outputs either a class label for the entire input (classification) or per-point segment/part labels (segmentation). Unlike meshes, point clouds are free of combinatorial irregularities and complexities, which makes them easier to learn from. Applying a rigid body transformation to a point cloud is also straightforward, since the transformation is applied to each point independently. For these reasons, PointNet works on point clouds. One important consideration is that a point cloud is simply a set of points, so the network's output must be invariant to permutations of the points and to rigid motions. The key approaches PointNet uses to achieve this are as follows.
- Permutation invariance: a single symmetric function, max pooling, aggregates information from all points
- Rigid motion invariance: a data-dependent spatial transformer network canonicalizes the input
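
The permutation-invariance idea can be illustrated with a minimal sketch in plain Python. It assumes a shared per-point function followed by an element-wise max over all points. The names `point_feature`, `global_feature`, and `rotate_z`, the single linear layer with ReLU, and the random weights are hypothetical stand-ins chosen for illustration, not the authors' actual network. The sketch also shows a rigid rotation applied to each point independently.

```python
import math
import random


def point_feature(point, weights):
    """Shared per-point function: maps (x, y, z) to a feature vector.

    Hypothetical stand-in for a shared MLP: one linear layer plus ReLU.
    """
    return [max(0.0, sum(w * c for w, c in zip(row, point))) for row in weights]


def global_feature(points, weights):
    """Symmetric aggregation: element-wise max over all per-point features."""
    feats = [point_feature(p, weights) for p in points]
    return [max(channel) for channel in zip(*feats)]


def rotate_z(points, angle):
    """Rigid rotation about the z-axis, applied to each point independently."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


rng = random.Random(0)
# Illustrative data: 8 random points and a 4-channel weight matrix.
points = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(8)]
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

shuffled = points[:]
rng.shuffle(shuffled)

original = global_feature(points, weights)
permuted = global_feature(shuffled, weights)
print(original == permuted)  # True: max pooling ignores the order of the points

rotated = rotate_z(points, math.pi / 2)  # same shape, different coordinates
```

Because the max is taken over the same set of per-point features regardless of order, the aggregated vector is identical for any ordering of the input. The rotated cloud, in contrast, generally yields a different feature vector unless the input is aligned first, which is the role of the transformer network mentioned above.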

Main Contributions
- Designing a novel deep net architecture suitable for consuming unordered point sets in 3D.
- Showing how such a net can be trained to perform 3D shape classification, shape part segmentation, and scene semantic parsing tasks.
- Providing thorough empirical and theoretical analysis of the stability and efficiency of the method.
- Illustrating the 3D features computed by selected neurons in the net and developing intuitive explanations for its performance.

3. Problem Statement

PointNet takes an unordered set of points as input. Each point has coordinates (x, y, z) and may carry additional values such as normals or color, but for simplicity and clarity, PointNet uses only (x, y, z). The output depends on the task, as follows.

Classification

k scores, one for each of the k candidate classes.

Segmentation

n × m scores, one for each of the n points and each of the m semantic subcategories.
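
To make the two output shapes concrete, here is a small sketch in plain Python. The score values are made up for illustration, and picking the highest-scoring entry (argmax) as the predicted label is an assumption about how the scores are typically consumed, not something stated in the text above.

```python
def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=scores.__getitem__)


# Classification: one vector of k scores for the whole point cloud (k = 3 here).
class_scores = [0.1, 2.3, -0.5]          # made-up values
predicted_class = argmax(class_scores)   # a single label for the entire input

# Segmentation: an n x m score matrix (n = 4 points, m = 2 subcategories).
seg_scores = [
    [1.2, 0.3],
    [-0.4, 0.9],
    [0.0, 0.5],
    [2.0, 1.9],
]                                        # made-up values
point_labels = [argmax(row) for row in seg_scores]  # one label per point

print(predicted_class)  # 1
print(point_labels)     # [0, 1, 1, 0]
```

Classification reduces the whole cloud to a single label, while segmentation keeps one label per input point, so the length of `point_labels` always equals n.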