SensatUrban is a large-scale point cloud dataset, and the first issue to consider for 3D semantic segmentation on it is down-sampling. Because of hardware limitations, most down-sampling methods cut the entire point cloud into small blocks.
However, many objects then lose their original geometric structure, so on a large point cloud it becomes difficult for the neural network to learn the overall geometry of an object.
We therefore chose RandLA-Net.
With random sampling, selecting K points from the N input points takes only O(K) time, so the computational cost depends only on the number of points we need to sample, which makes it highly efficient.
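As a minimal sketch of this idea, the subsampling step can be written as drawing K random indices, whose cost does not grow with the geometric complexity of the cloud (the function name and array shapes here are illustrative, not from the RandLA-Net codebase):

```python
import numpy as np

def random_sample(points, k, seed=None):
    """Randomly select k points from an (N, 3) point array.
    Conceptually the selection cost scales with k, not with any
    expensive per-point geometric computation (unlike FPS)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=k, replace=False)
    return points[idx]

pts = np.random.rand(100_000, 3)            # a large synthetic point cloud
sub = random_sample(pts, 30_000, seed=0)    # keep 30,000 points
```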
However, the randomness of this sampling can cause local information to be lost. To address this problem, the authors propose a local feature aggregation (LFA) module as a solution.
It consists of 3 parts:
1. LocSE:
To encode the point cloud coordinates, the KNN algorithm finds the K nearest neighbors of each point; the center point coordinates, the K neighbor coordinates, the relative coordinates, and the distances are then concatenated.
This positional encoding is further concatenated with the corresponding neighbor features to form a new feature.
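A small sketch of this encoding step, assuming a brute-force KNN (the real implementation uses an efficient neighbor search) and per-point xyz coordinates only:

```python
import numpy as np

def locse_encoding(points, k=16):
    """LocSE-style relative position encoding (sketch).
    For each point: find its k nearest neighbors, then concatenate
    [center xyz, neighbor xyz, relative xyz, distance] -> (N, k, 10)."""
    # brute-force pairwise squared distances (fine for an illustration)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, :k]                  # (N, k) neighbor indices
    neighbors = points[knn]                              # (N, k, 3)
    centers = np.repeat(points[:, None, :], k, axis=1)   # (N, k, 3)
    rel = centers - neighbors                            # (N, k, 3) relative coords
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)   # (N, k, 1) distances
    return np.concatenate([centers, neighbors, rel, dist], axis=-1)

feats = locse_encoding(np.random.rand(256, 3), k=16)     # -> (256, 16, 10)
```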
2. Attentive Pooling:
A shared function learns an attention score for each point; the attention score acts as a soft mask that selects important features, and the final learned feature is the weighted sum of the neighborhood features.
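The soft-mask idea can be sketched as follows; the shared scoring function is simplified here to a single weight matrix `W` (the paper uses a shared MLP, so this is an assumption for brevity):

```python
import numpy as np

def attentive_pooling(neigh_feats, W):
    """Attentive pooling sketch.
    neigh_feats: (N, k, d) neighborhood features; W: (d, d) shared weights.
    A shared linear function scores each neighbor feature, softmax turns
    the scores into a soft mask over the k neighbors, and the output is
    the attention-weighted sum of the neighborhood features."""
    scores = neigh_feats @ W                             # (N, k, d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)        # soft mask over k neighbors
    return (attn * neigh_feats).sum(axis=1)              # (N, d) weighted sum

f = np.random.rand(256, 16, 10)
pooled = attentive_pooling(f, np.random.rand(10, 10))    # -> (256, 10)
```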
3. Dilated Residual Block:
Because of the aggressive downsampling, the purpose of the dilated residual block is to enlarge the receptive field of each point.
Multiple local spatial encoding (LocSE) units, attentive pooling layers, and skip connections are combined into one dilated residual block.
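A structural sketch of the block: two stacked aggregation units plus a skip connection, so each output point indirectly gathers information from roughly k² neighbors. The LocSE + attentive pooling unit is stubbed here by a simple neighbor mean, which is an assumption made only to keep the example short:

```python
import numpy as np

def dilated_residual_block(points, feats, k=16):
    """Dilated residual block sketch: two stacked aggregation units
    (stand-ins for LocSE + attentive pooling) plus a skip connection.
    Stacking two units dilates the receptive field from k to ~k^2."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, :k]      # (N, k) neighbor indices

    def aggregate(x):
        # stand-in for one LocSE + attentive pooling unit: neighbor mean
        return x[knn].mean(axis=1)           # (N, d)

    out = aggregate(aggregate(feats))        # two units -> dilated receptive field
    return out + feats                       # skip connection

pts = np.random.rand(128, 3)
f = np.random.rand(128, 32)
y = dilated_residual_block(pts, f)           # -> (128, 32)
```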
Since training on a GTX 1080 is severely limited by VRAM size, we did not modify the ratio between the batch size and the data size on this GPU.
Only after the laboratory provided us with a second GPU (a GTX Titan X) could we adjust this ratio; that part is shown in the results.
With this modification, we can see that after increasing the available VRAM, the performance of the overall model improved greatly, which shows that RandLA-Net is very sensitive to the size of the input data.
We also found a ratio sweet spot for RandLA-Net.
The default sampling method of RandLA-Net is to randomly sample a point, find the 30,000 points nearest to it, and use them as the network input. We follow this sampling idea but change the sampling method to the following two types:
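The default sampling described above can be sketched as picking a random center and cropping its nearest 30,000 points; the function name and the brute-force distance computation are illustrative assumptions, not the RandLA-Net implementation:

```python
import numpy as np

def sample_crop(points, n=30_000, seed=None):
    """RandLA-Net-style default cropping (sketch): pick one random
    center point, then take its n nearest points as one network input."""
    rng = np.random.default_rng(seed)
    center = points[rng.integers(points.shape[0])]
    d2 = ((points - center) ** 2).sum(axis=1)   # squared distances to center
    idx = np.argpartition(d2, n)[:n]            # n nearest points (unordered)
    return points[idx]

cloud = np.random.rand(200_000, 3)
crop = sample_crop(cloud, n=30_000, seed=0)     # -> (30000, 3)
```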