Multi-Object Tracking (MOT) is a popular topic in computer vision that has received lots of attention over past years in both research and industry. MOT has a variety of applications in security and surveillance, video communication, and self-driving or autonomous vehicles.
Multi-object tracking can be divided into two categories: online MOT and offline MOT. The difference between these two kinds of tracking is that online tracking can only use the information of previous image frames for inference, while offline tracking can use the information of a whole video sequence. Although offline tracking can perform much better than online tracking, in some scenarios such as self-driving cars, only online tracking can be used; because, the latter image frames cannot be used to perform inference analysis for the current image frame.
Recently, some online MOT systems have achieved state-of-the-art performance by using deep learning methods, such as Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM). However, all these methods cannot achieve real-time speed while maintaining high performance. Moreover, other purported real-time online MOT systems, such as those using only Kalman filters or a Markov Decision Process (MDP), also cannot achieve enough performance to be used in practice. Therefore, an improved real-time online MOT system with better performance is needed.