Vehicles are a common means of transportation and are widely used in daily life, for example for commuting, family trips, picking up passengers, and transporting goods.
According to statistics from the highway traffic flow survey conducted by the Directorate General of Highways, M.O.T.C., traffic flow on major roads can exceed 6,000 PCU (Passenger Car Units) during peak hours. With roads used this heavily, driving safety becomes extremely important.
According to the Ministry of Transportation Traffic Safety Committee in Taiwan, drowsy driving has always been one of the main causes of traffic accidents. If a driver has had less than four hours of rest, the chance of a traffic accident increases tenfold.
Therefore, we aim to implement a safety mechanism for drowsy driving that uses deep learning to predict whether the driver is drowsy and raises an alarm when drowsiness is detected, in order to prevent such accidents.
The Drowsy Driver Detection model works in three steps. First, MobileNet extracts significant features from cropped images of the eyes. Next, the feature sequence for a fixed time interval is fed to an LSTM as input. Finally, a softmax layer predicts the driver’s behavior.
A new dataset for retraining MobileNet V1 was generated from the videos of the DDD dataset (Training) by cropping images of the eyes and removing undesirable images. The remaining images are divided into two categories: Alert (about 5,000 images) and Drowsy (about 4,000 images).
For predicting whether the driver is drowsy, a second dataset was generated. Each test case is labeled Alert or Drowsy and consists of 10 images extracted from the videos of the DDD dataset (Training and Evaluation), with one image taken every three frames. The dataset consists of 776 training cases (from the DDD Training dataset) and 15 test cases (from the DDD Evaluation dataset).
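The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code: the function name `sample_sequences` and its parameters are hypothetical, and it assumes that consecutive test cases do not overlap, which the description does not state.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_sequences(frames: Sequence[T], stride: int = 3, seq_len: int = 10) -> List[List[T]]:
    """Keep one frame out of every `stride`, then group the kept frames
    into consecutive, non-overlapping cases of `seq_len` frames each.

    Assumption: cases do not overlap; leftover frames that cannot fill
    a complete case are dropped.
    """
    sampled = list(frames[::stride])
    return [
        sampled[start:start + seq_len]
        for start in range(0, len(sampled) - seq_len + 1, seq_len)
    ]


# Example: 60 frame indices yield 20 sampled frames, i.e. two cases of 10.
cases = sample_sequences(list(range(60)))
```

With a stride of 3 and 10 images per case, each case spans 28 consecutive frames of the source video.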
To speed up eye detection, the region searched for eyes needs to be reduced, so face detection is performed first to limit that region. OpenCV is used to detect facial features and to crop and resize the driver's eyes.
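A rough sketch of the face-first, eyes-second idea is shown below. It assumes OpenCV's bundled Haar cascades (`haarcascade_frontalface_default.xml` and `haarcascade_eye.xml`); the description only says that OpenCV was used, so the choice of detector, the 64×64 output size, and the helper names `to_frame_coords` and `crop_eyes` are all illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def to_frame_coords(face: Box, eye_in_face: Box) -> Box:
    """Translate an eye box found inside the face crop to full-frame coordinates."""
    fx, fy, _, _ = face
    ex, ey, ew, eh = eye_in_face
    return (fx + ex, fy + ey, ew, eh)


def crop_eyes(frame_bgr, size: Tuple[int, int] = (64, 64)) -> List:
    """Detect faces, search for eyes only inside each face box, and
    return the eye crops resized to `size` (an assumed output size)."""
    import cv2  # imported lazily so the helper above has no dependencies

    # Assumption: Haar cascades shipped with the opencv-python package.
    # In a real loop these would be loaded once, not on every call.
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")

    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    crops = []
    for face in face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        fx, fy, fw, fh = (int(v) for v in face)
        face_roi = gray[fy:fy + fh, fx:fx + fw]  # restrict the eye search to the face
        for eye in eye_cascade.detectMultiScale(face_roi, scaleFactor=1.1, minNeighbors=5):
            eye_box = tuple(int(v) for v in eye)
            x, y, w, h = to_frame_coords((fx, fy, fw, fh), eye_box)
            crops.append(cv2.resize(frame_bgr[y:y + h, x:x + w], size))
    return crops
```

Running the eye detector only on the face region reduces the number of pixels it has to scan, which is the speed-up the paragraph above refers to.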
MobileNet V1 (a convolutional neural network) is used to obtain significant visual features from the cropped images. Running each image through MobileNet V1 and saving the output of the final pooling layer (AvgPool:0) yields a 512-dimensional feature vector; these vectors are then assembled into feature sequences.
The sequence model is a single 2048-wide LSTM layer followed by a 1024-unit Dense layer, with some dropout in between, and it produces the final prediction.
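One way to express this architecture is with Keras, as sketched below. The layer widths, the 10-step sequence length, the 512-dimensional features, and the softmax output come from the description; the dropout rate, ReLU activation, optimizer, and loss are assumptions chosen only to make the sketch complete, and the function name `build_model` is hypothetical.

```python
SEQ_LEN = 10       # images per test case (from the dataset description)
FEATURE_DIM = 512  # size of each MobileNet V1 feature vector
NUM_CLASSES = 2    # Alert and Drowsy


def build_model(dropout_rate: float = 0.5):
    """Build an LSTM classifier over sequences of feature vectors.

    Assumptions: dropout rate 0.5, ReLU on the Dense layer, Adam
    optimizer, and categorical cross-entropy loss.
    """
    from tensorflow import keras  # imported lazily; requires TensorFlow 2.x

    model = keras.Sequential([
        keras.Input(shape=(SEQ_LEN, FEATURE_DIM)),
        keras.layers.LSTM(2048),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(1024, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The model takes input of shape `(batch, 10, 512)` and returns one probability per class for each sequence.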
If the predicted probability that the driver is drowsy exceeds 50%, the system raises an alarm.
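The alarm rule amounts to a threshold on the softmax output. The sketch below assumes the class order `[alert, drowsy]` and uses hypothetical helper names; it is only meant to make the decision rule concrete.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def should_alarm(p_drowsy: float, threshold: float = 0.5) -> bool:
    """Raise the alarm only when the drowsy probability is strictly above the threshold."""
    return p_drowsy > threshold


# Assumed class order: index 0 is Alert, index 1 is Drowsy.
probabilities = softmax([0.2, 1.5])
alarm = should_alarm(probabilities[1])
```

With two classes, a drowsy probability above 50% is equivalent to Drowsy being the more likely class.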
Implemented on an NVIDIA® Jetson™ TX2, the system gives a prediction within 1 to 2 seconds and achieves an accuracy of approximately 93.33%.
System Integration Implementation, 2019 Fall
National Tsing Hua University, Department of Computer Science, Class of '20