This disclosure relates generally to digital content processing, and more specifically to real-time, single-view action recognition based on analysis of key poses in sports videos.
Smart computing devices, such as smart phones and tablet computers, have become increasingly popular. The increased availability and bandwidth of network access (for both wired and wireless networks) has enabled a variety of mobile applications that process digital content more efficiently and thereby enhance the user experience. For example, a user may use a mobile application on his/her smart phone to record videos of himself/herself playing golf and to save the sports actions performed by the user, e.g., golf swings, such that the recorded sports actions can later be compared with sports actions performed by professional athletes. To compare sports actions by the user with those performed by professional athletes, the mobile application needs to be able to recognize the sports actions recorded by the user's smart phone. Recognizing sports actions in a sports video means determining whether a sports action (e.g., a baseball swing or a golf swing) has happened in the video frames of the sports video.
Various solutions for player action recognition in sports videos have been proposed, based on, e.g., machine learning techniques and exemplar-based multi-view analysis. For example, machine learning techniques are used to train feature models on a large corpus of sports videos, and the trained feature models are then applied to input sports videos for real-time action recognition. However, conventional solutions based on machine learning techniques for training feature models often rely on manual classification to select video frames showing specific sports actions, which is computationally expensive and makes it challenging to recognize sports actions efficiently in a large corpus of video frames. Additionally, existing solutions are generally not suited for real-time action recognition, especially for videos captured by mobile computing devices with limited computational power, which degrades the user experience with sports videos.