With the development of the Internet and of communication networks, video technology has advanced rapidly, network video has become widely popular, and more and more users watch videos over the network.
Currently, segments of audio content frequently occur in videos, and supplemental content (e.g., lyrics, captions, etc.) needs to be added for these segments so that users can see the supplemental content and the user experience is improved. To identify and match supplemental content for an audio content in a video, the song to which the segment corresponds first needs to be determined and/or positioned (i.e., the location of the segment within the song needs to be identified). Existing manners for determining or positioning the song to which the video segment belongs are mainly as follows: extracting a fragment of the segment from the video, roughly matching the fragment against the songs in a music library, and taking the matched song as the song to which the video segment belongs.
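For illustration only, the existing manner described above (extract a fragment, match it against a music library, take the best match) might be sketched as follows. This is a minimal sketch under stated assumptions and not the method of any particular related technology: each song and the fragment are assumed to be already reduced to short lists of numeric feature values, and the names `best_offset`, `match_fragment`, `library`, `song_a`, and `song_b` are hypothetical. The sliding comparison by mean absolute difference is a stand-in for whatever rough matching a real system would use.

```python
from typing import Dict, List, Optional, Tuple


def best_offset(fragment: List[float], song: List[float]) -> Optional[Tuple[int, float]]:
    """Slide the fragment across one song's features.

    Returns (offset, mean absolute difference) for the best alignment,
    or None if the fragment is empty or longer than the song.
    """
    n = len(fragment)
    if n == 0 or len(song) < n:
        return None
    best: Optional[Tuple[int, float]] = None
    for offset in range(len(song) - n + 1):
        window = song[offset:offset + n]
        dist = sum(abs(a - b) for a, b in zip(fragment, window)) / n
        if best is None or dist < best[1]:
            best = (offset, dist)
    return best


def match_fragment(
    fragment: List[float], library: Dict[str, List[float]]
) -> Optional[Tuple[str, int, float]]:
    """Return (title, offset, distance) of the closest song in the library."""
    result: Optional[Tuple[str, int, float]] = None
    for title, features in library.items():
        candidate = best_offset(fragment, features)
        if candidate is None:
            continue
        offset, dist = candidate
        if result is None or dist < result[2]:
            result = (title, offset, dist)
    return result


# Hypothetical toy data: made-up feature values, not real audio.
library = {
    "song_a": [1.0, 4.0, 2.0, 8.0, 5.0, 7.0],
    "song_b": [3.0, 3.0, 9.0, 1.0, 6.0, 2.0],
}
fragment = [9.1, 0.9, 6.2]  # a noisy copy of part of song_b

match = match_fragment(fragment, library)
print(match)
```

In this toy data the fragment aligns with `song_b` at offset 2, so the sketch both determines the song and positions the segment within it. The result is only as good as the extracted fragment and the matching rule.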
In the scheme provided by the related technology for determining or positioning the song to which the segment belongs, the accuracy of extracting the video segment fragment is low and a relatively simple matching manner is adopted for song matching, so the accuracy of determining the song corresponding to the video segment is relatively low. In addition, users need to switch among different applications and manually identify the song and/or locate the segment in the song, which is time-consuming, inaccurate, and results in a poor user experience.