"Traffic Digital Twin" (also known as traffic digital twinning) is an advanced technology for simulating and managing urban traffic. Based on real-world traffic data, such as single-angle footage from an intersection surveillance camera, we construct a precise virtual traffic model that allows users to observe current traffic conditions from different perspectives. This digital twin can be used to monitor, analyze, and optimize urban traffic, improving traffic efficiency, reducing congestion, and enhancing traffic safety. We employ technologies such as YOLOv3, Vehicle ReID, object tracking, and K-means clustering to help urban planners and traffic management departments better understand and address urban traffic issues.
First, we convert the input surveillance camera footage into a sequence of images at a frequency of 10 Hz. Since the input video is nearly 50 seconds long, this yields 490 input images. Next, we run a pre-trained YOLOv3 model to detect objects in each frame.
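The downsampling step above amounts to keeping one source frame per 10 Hz tick. A minimal sketch of the index arithmetic (the helper name and signature are ours, not the project's; in practice the frames would be decoded with a library such as OpenCV):

```python
def sampled_indices(src_fps, target_hz, total_frames):
    """Return the source-frame indices that approximate a target_hz
    sequence. Illustrative helper: e.g. a 30 fps, 49 s clip
    (1470 frames) downsampled to 10 Hz yields 490 frames."""
    step = src_fps / target_hz          # source frames per output frame
    return [int(i * step) for i in range(int(total_frames / step))]
```

Each returned index is then decoded and saved as one image of the 10 Hz sequence.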
ReID: Before conducting object tracking, it is crucial to perform ReID (re-identification) for each detected vehicle. Here we use a ResNet backbone for ReID, primarily for feature extraction: each vehicle crop is embedded into a feature vector so that the same vehicle can be recognized across frames.
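Once each crop has been embedded by the ResNet backbone, re-identification reduces to comparing embeddings by appearance similarity. A minimal cosine-similarity matcher (a sketch with invented names; the actual matching logic and threshold are assumptions):

```python
import numpy as np

def cosine_match(query, gallery, threshold=0.5):
    """Match one query embedding against a gallery of known-vehicle
    embeddings by cosine similarity. Returns (index, score) of the best
    match, or (None, score) if nothing clears the threshold."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                        # cosine similarity to each row
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return best, float(sims[best])
    return None, float(sims[best])
```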
Tracking: In each frame, tracks and detections are categorized into matched tracks, unmatched tracks, and unmatched detections. Every frame updates the age of each track; if the age of an unmatched track exceeds the defined max_age, the vehicle is assumed to have left the scene and the track is discarded. Conversely, if the score of an unmatched detection exceeds a certain threshold, a new track is created for that detection.
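The track lifecycle described above can be sketched as follows (class and function names are ours; the matching step itself, e.g. IoU or appearance-based assignment, is assumed to have already produced `matched` and `unmatched_dets`):

```python
class Track:
    _next_id = 0
    def __init__(self, box):
        self.id = Track._next_id
        Track._next_id += 1
        self.box, self.age = box, 0

def update_tracks(tracks, matched, unmatched_dets, max_age=5, det_threshold=0.6):
    """One lifecycle step. `matched` maps track -> new box;
    `unmatched_dets` is a list of (box, score) pairs."""
    alive = []
    for t in tracks:
        if t in matched:
            t.box, t.age = matched[t], 0   # matched track: refresh, reset age
            alive.append(t)
        else:
            t.age += 1                     # unmatched track: grow older
            if t.age <= max_age:           # discard once max_age is exceeded
                alive.append(t)
    for box, score in unmatched_dets:
        if score > det_threshold:          # confident unmatched detection:
            alive.append(Track(box))       # spawn a new track
    return alive
```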
We crop each bounding box so that the dominant color within each vehicle's region can be identified individually. This ensures that color detection for one vehicle does not interfere with the detection of colors for other vehicles. We set the number of clusters to 3 for the K-means clustering.
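A minimal sketch of the per-crop dominant-color step, written as plain NumPy K-means (the function name, iteration count, and seeding are our assumptions, not the project's exact code):

```python
import numpy as np

def dominant_color(pixels, k=3, iters=10, seed=0):
    """Cluster an (N, 3) array of RGB pixels from one vehicle crop with
    K-means (k=3, as above) and return the centroid of the largest
    cluster, i.e. the crop's dominant color."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each pixel to its nearest center
        d = np.linalg.norm(pixels[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return centers[counts.argmax()]
```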
As our goal is to derive the digital twin solely from a single surveillance-camera angle, without requiring the camera's intrinsic or extrinsic parameters, existing methods for determining the transformation matrix cannot be employed. Instead, we first obtain satellite imagery of the surveillance camera's location, then mark features that appear in both the video footage and the satellite image, such as zebra crossings, motorcycle waiting areas, and road markings. These correspondences define the mapping from the camera view to the satellite view.
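From four or more such hand-marked correspondences, a planar homography from the camera view to the satellite image can be estimated. A sketch using the standard direct linear transform (DLT) in plain NumPy (in practice one might call a library routine such as OpenCV's `findHomography`; this implementation and its names are ours):

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate a 3x3 homography from >= 4 point pairs, e.g. hand-marked
    zebra-crossing corners in the camera view (src) and the satellite
    image (dst), via the direct linear transform."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)            # nullspace vector = flattened H
    return H / H[2, 2]

def warp_point(H, p):
    """Map a camera-view point into satellite coordinates."""
    x, y, w = H @ [p[0], p[1], 1.0]
    return x / w, y / w
```

With the homography fixed, every tracked vehicle position can be projected onto the satellite map to drive the twin.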
Incomplete detections can cause vehicles to disappear abruptly between frames, producing an undesirable, jittery video.
We therefore use linear interpolation to insert intermediate positions between two frames, creating a smoother transition throughout the entire video.
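The interpolation step can be sketched as follows (a minimal helper with invented names, operating on tracked (x, y) centers; the project may interpolate full bounding boxes instead):

```python
def interpolate_track(points, factor=2):
    """Insert (factor - 1) linearly interpolated positions between each
    pair of consecutive tracked points, smoothing motion over frames
    where a detection was missed."""
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for i in range(factor):
            t = i / factor              # fraction of the way to the next point
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    out.append(points[-1])              # keep the final observed position
    return out
```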