With the continuous development of Internet technologies, the Internet has become an indispensable tool in daily life. A new trend in applications is to recognize unknown audio by using Internet devices and to perform audio recognition-based interaction.
There are many types of applications for audio recognition-based interaction. For example, a user hears a song but does not know its title. A segment of the song can be recorded, and the title, singer, and other information of the song can then be recognized using an audio recognition technology.
According to the prior art, the recognition is typically performed by extracting and using feature points of a to-be-recognized audio. As shown in FIG. 1, the x-axis represents time and the y-axis represents frequency. The extracted feature points are marked with “X” in the figure. Two feature points constitute a feature point pair, and there are eight feature point pairs in the target region. Recognition is performed against a database based on the feature point pairs; the database stores feature points of songs and information of the songs, such as song titles, singers, and the like. If the same feature point pairs can be matched in the same target region in the database, the matching is successful, and the corresponding song information can then be obtained. Noise is inevitable during audio recording, however, and under its influence the extracted feature points do not necessarily occur at their normal positions. Because a feature point pair matches only when both of its feature points are at the expected positions, the probability of successfully matching feature point pairs is relatively low.
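The sensitivity of exact feature point pair matching to noise can be illustrated with a minimal sketch. It is not the prior-art implementation: the function names `feature_point_pairs` and `match_ratio`, the pairing window `max_dt`, the representation of a pair as (first frequency, second frequency, time difference), and the sample points are all assumptions chosen for illustration.

```python
from itertools import combinations


def feature_point_pairs(points, max_dt=3):
    """Build feature point pairs from (time, frequency) feature points.

    Assumption: two points form a pair when the later one is at most
    `max_dt` time units after the earlier one. Each pair is stored as
    (frequency_1, frequency_2, time_difference).
    """
    ordered = sorted(points)
    pairs = set()
    for (t1, f1), (t2, f2) in combinations(ordered, 2):
        if 0 < t2 - t1 <= max_dt:
            pairs.add((f1, f2, t2 - t1))
    return pairs


def match_ratio(query_points, reference_points, max_dt=3):
    """Fraction of query pairs that have an exact match among the reference pairs."""
    query_pairs = feature_point_pairs(query_points, max_dt)
    reference_pairs = feature_point_pairs(reference_points, max_dt)
    if not query_pairs:
        return 0.0
    return len(query_pairs & reference_pairs) / len(query_pairs)


# Hypothetical feature points of a stored song: (time, frequency).
reference = [(0, 10), (1, 14), (2, 9), (3, 12)]

# A clean recording reproduces the same feature points.
clean = list(reference)

# A noisy recording: one feature point is shifted by a single frequency unit.
noisy = [(0, 10), (1, 15), (2, 9), (3, 12)]

print(match_ratio(clean, reference))
print(match_ratio(noisy, reference))
```

In this sketch, shifting one of the four feature points invalidates every pair that contains it, so three of the six pairs no longer match exactly. A single displaced feature point therefore costs several pair matches at once, which is the effect described above.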
In summary, the existing technologies suffer from a low matching success rate when feature points are used for audio recognition.