TTA

Test-Time Augmentation for 3D Point Cloud Classification and Segmentation

Tuan-Anh Vu1 #           Srinjay Sarkar2 #           Zhiyuan Zhang3 📧           Binh-Son Hua4           Sai-Kit Yeung1

1The Hong Kong University of Science and Technology, Hong Kong SAR 2VinAI Research, Vietnam
3Singapore Management University, Singapore 4Trinity College Dublin, Ireland
#co-first author 📧corresponding author

International Conference on 3D Vision (3DV 2024)

Overview of our test-time augmentation (TTA) method for point cloud classification and segmentation.

Abstract

Data augmentation is a powerful technique to enhance the performance of deep learning tasks but has received less attention in 3D deep learning. It is well known that when 3D shapes are sparsely represented with low point density, the performance of downstream tasks drops significantly. This work explores test-time augmentation (TTA) for 3D point clouds. We are inspired by recent advances in learning implicit representations and in point cloud upsampling, which can produce high-quality 3D surface reconstructions and points that lie close to the underlying surface, respectively. Our idea is to leverage implicit field reconstruction or point cloud upsampling techniques as a systematic way to augment point cloud data. Specifically, we test both strategies by sampling points from the reconstructed results and using the sampled point clouds as test-time augmented data. We show that both strategies are effective in improving accuracy. We observe that point cloud upsampling for test-time augmentation can lead to more significant performance improvements on downstream tasks such as object classification and segmentation on the ModelNet40, ShapeNet, ScanObjectNN, and SemanticKITTI datasets, especially for sparse point clouds.

Materials

Overview

Consider a point set \(\{\mathbf{p}_i \}_{i=1}^n\) with \(\mathbf{p}_i \in \mathbb{R}^3\), represented by a matrix \(\mathbf{x}_0 \in \mathbb{R}^{n \times 3}\). To maintain the generality of our approach, we assume that throughout the inference process the input \(\mathbf{x}_0\) is passed to a pre-trained network \(f\) for feature extraction, and the features are passed to a network \(g\) for the final label prediction \(g(f(\mathbf{x}_0))\). Our goal is to improve performance on the downstream task via test-time augmentation, where the final prediction is defined as: \[g(\phi(f(\mathbf{x}_0), f(\mathbf{x}_1), f(\mathbf{x}_2), ...))\] where \(\phi\) is an aggregation function that combines the features extracted from the original point set \(\mathbf{x}_0\) and the augmented point sets \(\mathbf{x}_1\), \(\mathbf{x}_2\), etc. Note that the networks \(f\) and \(g\) are pre-trained and left untouched in test-time augmentation; only the input is augmented.

Traditionally, a simple method for test-time augmentation is jittering, which adds Gaussian noise to perturb the point cloud \(\mathbf{x}_0\) and generate an augmented point cloud \(\mathbf{x}_k\): \[\mathbf{x}_k = \mathbf{x}_0 + \lambda \mathbf{z}_k\] where \(\mathbf{z}_k \sim \mathcal{N}(0, I)\) is a random noise vector from a normal distribution, and \(\lambda\) is a scalar that controls the noise magnitude. This simple augmentation has been widely adopted since the seminal PointNet. An issue with such augmentation is that it does not consider the underlying surface or point distribution because the noise \(\mathbf{z}_k\) is independent of \(\mathbf{x}_0\), resulting in marginal performance improvement in many cases. In this work, we view the point set \(\mathbf{x}_0\) as a noisy estimate of a latent surface representation \(\mathcal{S}\), and therefore define point cloud augmentation as the process of sampling additional point clouds \(\mathbf{x}_k\) that explain the same surface.
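The TTA pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy `f` and `g` below are hypothetical stand-ins for the pre-trained feature extractor and prediction head, jittering is used as the augmentation, and \(\phi\) is taken to be the mean over features.

```python
import numpy as np

def jitter(x0, lam=0.01, rng=None):
    """Classic jittering augmentation: x_k = x_0 + lam * z_k, z_k ~ N(0, I)."""
    rng = rng or np.random.default_rng()
    return x0 + lam * rng.standard_normal(x0.shape)

def tta_predict(x0, f, g, augment, k=4, phi=lambda feats: feats.mean(axis=0)):
    """Test-time augmentation: extract features from the original point cloud
    and k augmented copies, aggregate them with phi (here: the mean), and pass
    the result to g. f and g stay frozen; only the input is augmented."""
    feats = [f(x0)] + [f(augment(x0)) for _ in range(k)]
    return g(phi(np.stack(feats)))

# Toy stand-ins for the pre-trained networks f and g (illustration only):
rng = np.random.default_rng(0)
x0 = rng.standard_normal((1024, 3))       # n x 3 point cloud
f = lambda x: x.mean(axis=0)              # "feature extractor"
g = lambda feat: int(np.argmax(feat))     # "classification head"
pred = tta_predict(x0, f, g, lambda x: jitter(x, 0.01, rng))
```

With `k=0` the call reduces to the plain prediction \(g(f(\mathbf{x}_0))\), so the baseline is recovered as a special case.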
We propose to sample augmented point clouds \(\mathbf{x}_k\) (\(k \geq 1\)) in two ways: surface sampling and point cloud up-sampling. The sampled point clouds can then be leveraged for downstream tasks such as classification and segmentation.
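To make the contrast with input-independent jittering concrete, the sketch below shows a simple geometry-aware augmentation: inserting midpoints between each point and its nearest neighbor, so that new points stay close to the latent surface, then subsampling back to \(n\) points. This toy `midpoint_upsample` is only an assumed stand-in for illustration; the paper's strategies use implicit surface reconstruction or learned point cloud upsampling networks.

```python
import numpy as np

def midpoint_upsample(x, rng=None):
    """Toy geometry-aware augmentation: densify a point cloud with midpoints
    of nearest-neighbor pairs (which lie near the latent surface), then
    randomly subsample back to the original number of points."""
    rng = rng or np.random.default_rng()
    n = x.shape[0]
    # Pairwise distances (O(n^2); fine for a sketch, use a KD-tree at scale).
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                      # nearest neighbor of each point
    mids = 0.5 * (x + x[nn])                   # midpoints near the surface
    dense = np.concatenate([x, mids], axis=0)  # 2n candidate points
    idx = rng.choice(dense.shape[0], size=n, replace=False)
    return dense[idx]
```

Unlike jittering, every augmented point here is a function of \(\mathbf{x}_0\) itself, so the augmented clouds remain consistent with the same underlying surface.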


Qualitative Results

Visualization of part segmentation results on ShapeNet dataset.


Visualization of semantic segmentation results on S3DIS dataset.


Visualization of semantic segmentation results on SemanticKITTI dataset.

Citation
@inproceedings{tavu2024tta,
  title={Test-Time Augmentation for 3D Point Cloud Classification and Segmentation},
  author={Vu, Tuan-Anh and Sarkar, Srinjay and Zhang, Zhiyuan and Hua, Binh-Son and Yeung, Sai-Kit},
  booktitle={Proceedings of International Conference on 3D Vision (3DV)},
  year={2024}
}

Acknowledgements

This work is supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant (No.: 22-SIS-SMU-093), the Ningbo 2025 Science and Technology Innovation Major Project (No. 2022Z072), and an internal grant from HKUST (R9429).
This work is partially done when Srinjay Sarkar was a research resident at VinAI Research, Vietnam.
The website is modified from this template.