This invention relates generally to digital content processing and particularly to detecting highlights in sports videos using voice recognition of audio data associated with the sports videos.
Smart handheld devices, such as smart phones and tablet computers, have become increasingly popular. The increased availability and bandwidth of network access (for wired and wireless networks) have enabled more communication platforms for digital content consumption and sharing, such as recording sports videos with smart phones and sharing video highlights of sports videos on social networking platforms. A video highlight of a sports video is a portion of the sports video that represents a semantically important event captured in the sports video, e.g., a short video clip capturing goals or goal attempts in a soccer game video. Given the complex spatiotemporal nature of sports videos, it is time consuming and technically challenging to efficiently locate and select video highlights from a long video clip. For example, a 90-minute long video clip of a soccer game may contain three highlights capturing three goal events, each of which may last only 10 to 20 seconds.
Some conventional solutions for video highlight detection rely on domain knowledge and are therefore suitable only for specific types of sports, e.g., classifying sports videos into football or basketball prior to highlight detection. Alternatively, some existing solutions use image analysis techniques to detect highlights captured in a sports video, e.g., using color-based visual features of the sports video to track the players and the tennis ball in tennis videos. However, given the complex spatiotemporal nature of sports videos and the rich semantic information carried by audio data associated with sports videos, highlight detection that relies on visual cues without effectively making use of the audio data is difficult to perform efficiently and effectively.