The evaluation standard we used is Mean Intersection over Union (mIoU), whose formula is

$$\mathrm{mIoU} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}$$

where $i$ denotes the ground-truth (real) class, $j$ the predicted class, and $p_{ij}$ the number of points of class $i$ that are predicted as class $j$.
mIoU can also be represented with the following diagram:
mIoU measures the overlap ratio of the two circles (the prediction and the ground truth), i.e. the intersection over the union:

$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}$$

where $A$ is the set of predicted points of a class and $B$ is the set of ground-truth points of that class.
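As a reference, here is a minimal sketch of how mIoU can be computed from a confusion matrix built with the definitions above; the function and variable names are our own illustration, not part of the RandLA-Net or BEV-Seg code:

```python
import numpy as np

def mean_iou(conf_matrix: np.ndarray) -> float:
    """Compute mIoU from a (num_classes x num_classes) confusion matrix,
    where conf_matrix[i, j] counts points of true class i predicted as class j."""
    true_pos = np.diag(conf_matrix)            # p_ii
    gt_per_class = conf_matrix.sum(axis=1)     # row sums: all points of true class i
    pred_per_class = conf_matrix.sum(axis=0)   # column sums: all points predicted as class i
    union = gt_per_class + pred_per_class - true_pos
    iou = true_pos / np.maximum(union, 1)      # avoid division by zero for empty classes
    return float(iou.mean())

# Toy example with 3 classes
cm = np.array([[50, 2, 3],
               [4, 40, 1],
               [2, 0, 30]])
print(mean_iou(cm))
```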
In the RandLA-Net experiments, the hardware environment we used is:
CPU : Intel Core i7-6700
RAM : 64GB
GPU : GeForce GTX 1080 VRAM 8GB
The initial setting of RandLA-Net is:
Batch Size = 6, Data Size = 30000
random sampling
The experimental results obtained from changing the three variables mentioned above are as follows:
1. Data augmentation
Random and fixed rotation, random translation, scaling, Gaussian noise, and TensorFlow's built-in data augmentation functions were each applied to the input data, and all were tested in the same environment. The results are shown in the table; the mIoU of the baseline without any data augmentation is 48.223. Although there are clear differences in performance among the models trained with data augmentation, all of them perform worse than the baseline, with an average mIoU of about 42, far from the expected results. Since the experimental results showed that data augmentation does not help, we later abandoned this modification.
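For reference, here is a minimal sketch of the kinds of point-cloud augmentations listed above (random rotation, translation, scaling, and Gaussian noise); the parameter ranges are illustrative and are not the exact values used in our experiments:

```python
import numpy as np

def augment_point_cloud(points: np.ndarray,
                        noise_std: float = 0.01,
                        scale_range: tuple = (0.95, 1.05),
                        shift_range: float = 0.2) -> np.ndarray:
    """Apply random z-axis rotation, scaling, translation, and Gaussian
    jitter to an (N, 3) array of xyz coordinates."""
    # Random rotation about the vertical (z) axis
    theta = np.random.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0],
                    [np.sin(theta),  np.cos(theta), 0],
                    [0,              0,             1]])
    points = points @ rot.T
    # Random uniform scaling
    points = points * np.random.uniform(*scale_range)
    # Random translation
    points = points + np.random.uniform(-shift_range, shift_range, size=(1, 3))
    # Gaussian noise (jitter)
    points = points + np.random.normal(0, noise_std, size=points.shape)
    return points
```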
2. Change the ratio between batch size and data size
The effect of changing the ratio between batch size and data size on the GTX Titan X GPU is shown in the following pages.
3. Sampling method
1. The 3,000 pieces of data all come from the same small file.
Easier to sample.
Low point-cloud diversity.
2. The 3,000 pieces of data come evenly from the 33 files (3,000 / 33 ≈ 90 per file).
The data is distributed more evenly than in the previous method.
3. The 3,000 pieces of data come from the 33 files in proportion to the original file sizes, as sketched below.
The sampled data follows the original data distribution.
Requires more time to compute the FPS (farthest point sampling).
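Here is a minimal sketch of the even and size-proportional file-level strategies above; the (file name, point count) pairs and the helper name are hypothetical, and within each file we simply draw random indices where FPS could be used instead:

```python
import numpy as np

def sample_points(files, total_points, proportional=False):
    """Draw `total_points` points from a list of (name, num_points) files,
    either evenly per file or in proportion to each file's size."""
    sizes = np.array([n for _, n in files], dtype=float)
    if proportional:
        quota = np.round(total_points * sizes / sizes.sum()).astype(int)
    else:
        quota = np.full(len(files), total_points // len(files))
    picks = {}
    for (name, n), k in zip(files, quota):
        # Random indices within this file; FPS could replace this step
        picks[name] = np.random.choice(n, size=min(k, n), replace=False)
    return picks

# e.g. spread 3,000 points over 33 files of varying size
files = [(f"area_{i}.ply", np.random.randint(5000, 50000)) for i in range(33)]
even = sample_points(files, 3000, proportional=False)
prop = sample_points(files, 3000, proportional=True)
```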
During the training of RandLA-Net, we found that the results were not satisfactory when using the GTX 1080 with 8GB of VRAM, so we borrowed another GPU from the laboratory for training. The configuration of this computer is:
CPU : Intel Core i7-6700
RAM : 64GB
GPU : GeForce GTX Titan X VRAM 12GB
With the larger VRAM, the total input per batch can grow from 180,000 points to 240,000 points: we increase the data size to 6 × 40,000 (a larger data size than the 8GB-VRAM setting), and we also tried the initial setting of 4 × 65,536 proposed by the authors of RandLA-Net, all within the same Anaconda virtual environment. The main variable changed is the ratio between the batch size and the data size; the results are as follows:
It can be seen that after increasing the VRAM, the performance of the overall model improves greatly, which shows that RandLA-Net is very sensitive to the size of the input data. After testing different ratios, we found that 24 × 10,000 is probably a sweet spot for the input size of this model, and continuously increasing the batch size or data size does not yield better performance.
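For clarity, here is a small sketch of the arithmetic behind the input configurations mentioned above (total points per batch = batch size × data size); the labels and variable names are our own illustration:

```python
# Total points fed to the network per batch for each configuration we tried
configs = {
    "GTX 1080, 8GB baseline": (6, 30_000),       # (batch size, data size)
    "Titan X, larger data size": (6, 40_000),
    "RandLA-Net authors' setting": (4, 65_536),
}
for name, (batch_size, data_size) in configs.items():
    total = batch_size * data_size
    print(f"{name}: {batch_size} x {data_size:,} = {total:,} points per batch")
```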
For the RandLA-Net part, we also collected statistics on the training speed. Although running on the CPU can make use of 64GB of memory, it takes a lot of time, while running on the GTX 1080 takes about 20-25 minutes per epoch. It can be seen that the current machine-learning part still needs GPU computation to run quickly. The data we measured are as follows:
In the BEV-Seg experiments, the hardware environment we used is:
CPU : Intel Core i7-6700
RAM : 64GB
GPU : GeForce GTX TITAN X (VRAM 12GB) and GeForce GTX 1080 (VRAM 8GB)
The initial setting of BEV-Seg-Net is:
The model's input bird's-eye-view size is 500x500
The original input image has only three channels: R, G, and B
For comparison and performance evaluation, we vary two aspects of the system implementation:
1. Change the input image size:
Due to the hardware limitations of the GPU, operating at the original image size allows a batch size of only 1, and the experimental results are too poor. We therefore rescale the images to 100x100 and use RGB-only input as our baseline; its mIoU is 16.32.
With a 100x100 input size, the batch size can be increased to 32 on the GeForce GTX 1080 (VRAM 8GB).
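Here is a minimal sketch of the rescaling step, assuming the bird's-eye-view images are plain (H, W, 3) uint8 arrays; the use of Pillow here is our own choice for illustration, not necessarily what BEV-Seg uses:

```python
import numpy as np
from PIL import Image

def rescale_bev(image: np.ndarray, size: int = 100) -> np.ndarray:
    """Downscale an (H, W, 3) uint8 bird's-eye-view image to size x size."""
    return np.asarray(Image.fromarray(image).resize((size, size)))

original = np.zeros((500, 500, 3), dtype=np.uint8)  # placeholder 500x500 BEV image
small = rescale_bev(original, 100)
print(small.shape)  # (100, 100, 3)
```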
2. Add a new channel: altitude
Since the result of using only RGB is not good, we reasoned that in a bird's-eye view the most important feature is the height of the point corresponding to each pixel, so we add the height as a fourth channel and obtain the following results:
As the figure above shows, the 100x100 images are trained on the GeForce GTX 1080 (VRAM 8GB) with a batch size of 32; the 200x200 images are trained on the GeForce GTX TITAN X (VRAM 12GB) with a batch size of 16; the 300x300 images are also trained on the GeForce GTX TITAN X (VRAM 12GB), where the batch size can only be set to 8.
The first thing we can see from the above data is that, with the same image size and number of epochs, there is a huge difference between training with and without the altitude channel: the mIoU without altitude is only 16.32, while the mIoU with altitude reaches 57.61. This shows that when using a bird's-eye-view projection for semantic segmentation, adding an altitude channel effectively improves the performance of the model.
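Here is a minimal sketch of how such an altitude channel can be built during the bird's-eye-view projection; the grid ranges, the highest-point-per-cell rule, and the function name are our own assumptions, not the exact BEV-Seg implementation:

```python
import numpy as np

def make_bev_with_altitude(points, colors, img_size=100,
                           x_range=(-50.0, 50.0), y_range=(-50.0, 50.0)):
    """Project an (N, 3) xyz point cloud with (N, 3) RGB colors onto an
    img_size x img_size bird's-eye-view image with 4 channels: R, G, B, height."""
    bev = np.zeros((img_size, img_size, 4), dtype=np.float32)
    height = np.full((img_size, img_size), -np.inf, dtype=np.float32)
    # Map x/y coordinates to pixel indices
    cols = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * (img_size - 1)).astype(int)
    rows = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * (img_size - 1)).astype(int)
    valid = (cols >= 0) & (cols < img_size) & (rows >= 0) & (rows < img_size)
    for r, c, rgb, z in zip(rows[valid], cols[valid], colors[valid], points[valid, 2]):
        # Keep the highest point that falls into each cell
        if z > height[r, c]:
            height[r, c] = z
            bev[r, c, :3] = rgb
    bev[..., 3] = np.where(np.isinf(height), 0.0, height)  # empty cells get height 0
    return bev
```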
Looking at the experiments that also use four channels, we can see that the batch size has a great influence. However, the larger image (300x300) performs better on the small category (Rail). We speculate that this is due to the image rescaling: small object classes such as Rail and Bike may become too blurry to be recognized at lower resolutions.
We also recorded the time of each experiment; the time spent per epoch is mainly determined by the size of the input image.
Our experimental results:
Baseline of each model: