The present disclosure relates to an audio processing apparatus and method and a program and, more particularly, to an audio processing apparatus and method and a program that are capable of extracting a hook with high accuracy from an audio signal formed of musical pieces.
Recently, as typified by the mobile telephone, an age of ubiquitous networking has arrived in which the Internet can be accessed anywhere at any time, and ways of personal enjoyment and lifestyles have diversified. Taking music as an example, until recently the common style was to copy a purchased music album compact disc (CD) to a tape or a mini disc (MD) and listen to the music on an audio player outdoors, such as on the subway or in the street. More recently, however, with the introduction of audio players that include a mass storage medium such as a flash memory, it has become common to import several thousand (or several tens of thousands of) musical pieces into the mass storage medium and listen to them. A mobile apparatus that has a network function and includes an audio player can access the Internet even outdoors, so that music can be listened to or purchased.
In this way, a large number of musical pieces can be casually held and carried around outdoors. However, this makes it necessary to be able to search for a desired musical piece easily and without stress from among an enormous number of musical pieces.
That is, when selecting a musical piece, a user refers to the song title or artist and listens to the beginning of the musical piece to determine whether or not to listen to it. However, since most musical pieces begin with accompaniment, it is difficult to determine from the beginning alone whether the piece is the desired one. If a large number of musical pieces is present, the user may encounter musical pieces they do not recognize, and the opportunity to listen to a desired musical piece at a desired time may be lost.
As a method for solving such a problem, searchability can be enhanced by reproducing the “hook,” which is the climax part of a musical piece. Because the hook is the climax part of the musical piece, it makes a strong impression on the user. Thus, by detecting the hook with high accuracy and reproducing it when a musical piece is selected, it is possible to enhance the searchability of musical pieces. Sequentially reproducing hooks, as in a music ranking TV program, is also a way of enjoying music in its own right.
As a method of detecting a hook, a method of extracting a hook by calculating similarity using autocorrelation has been proposed (see Japanese Patent No. 4243682).
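To illustrate the general idea of measuring similarity by autocorrelation, the following is a minimal sketch and not the method of the cited patent. It assumes a hypothetical per-frame feature sequence (for example, a level per frame) and simply looks for the lag at which the sequence most resembles a shifted copy of itself. The function names `autocorrelation` and `strongest_repeat_lag` are illustrative only.

```python
def autocorrelation(values, lag):
    """Normalized autocorrelation of a mean-removed sequence at a given lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    energy = sum(c * c for c in centered)
    if energy == 0 or lag >= n:
        return 0.0
    overlap = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return overlap / energy


def strongest_repeat_lag(values, min_lag=1):
    """Return the lag (in frames) with the highest autocorrelation."""
    lags = range(min_lag, len(values) // 2 + 1)
    return max(lags, key=lambda lag: autocorrelation(values, lag))


# Toy feature sequence (an assumption for illustration) in which a
# 4-frame pattern repeats four times.
features = [1, 5, 2, 8] * 4
print(strongest_repeat_lag(features))  # prints 4
```

In this toy input the strongest similarity appears at a lag of 4 frames, which matches the length of the repeated pattern. A practical detector would work on richer features and would still have to decide which repeated section is the hook.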
As a method that focuses on the audio signal level, a method has been proposed in which an audio change point is detected from the maximum value of an evaluation function that uses a root mean square or the like as a feature value, and a hook is extracted on that basis (see Japanese Patent No. 3886372).
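The notion of locating an audio change point at the maximum of a level-based evaluation function can be sketched as follows. This is only an illustration under stated assumptions: the evaluation function used here (the absolute difference in root mean square between neighboring frames) is hypothetical and is not taken from the cited patent, and the input is assumed to contain at least two full frames.

```python
import math


def frame_rms(samples, frame_size):
    """Root mean square of each non-overlapping frame of frame_size samples."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in starts
    ]


def change_point(samples, frame_size):
    """Index of the frame whose RMS differs most from the preceding frame."""
    rms = frame_rms(samples, frame_size)
    # Hypothetical evaluation function: absolute RMS difference between
    # neighboring frames. The change point is where it is largest.
    scores = [abs(rms[i] - rms[i - 1]) for i in range(1, len(rms))]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


quiet = [0.1, -0.1] * 4  # two quiet frames of 4 samples each
loud = [0.8, -0.8] * 4   # two loud frames of 4 samples each
print(change_point(quiet + loud, frame_size=4))  # prints 2
```

Here the level jumps between the second and third frames, so frame index 2 is reported as the change point.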
In addition, a method has been proposed in which an audio signal level is used as a feature value, audio change points are detected by comparing the level or its amount of change against a threshold value, and a hook is extracted from sections having a similar time distribution or from a combination of intervals between audio change points (see Japanese Unexamined Patent Application Publication No. 2008-262043).
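Threshold-based detection of audio change points, followed by inspection of the intervals between them, might look like the sketch below. It is an assumption-laden illustration rather than the procedure of the cited publication: the per-frame levels, the threshold `min_change`, and both function names are hypothetical.

```python
def threshold_change_points(levels, min_change):
    """Frame indices where the level changes by at least min_change."""
    return [
        i for i in range(1, len(levels))
        if abs(levels[i] - levels[i - 1]) >= min_change
    ]


def intervals(points):
    """Distances, in frames, between consecutive change points."""
    return [b - a for a, b in zip(points, points[1:])]


# Hypothetical per-frame levels with alternating loud and quiet sections.
levels = [2, 2, 9, 9, 9, 3, 3, 9, 9, 9, 3]
points = threshold_change_points(levels, min_change=5)
print(points)             # prints [2, 5, 7, 10]
print(intervals(points))  # prints [3, 2, 3]
```

The two intervals of length 3 correspond to the two loud sections, which hints at how a recurring combination of intervals could point to repeated sections of a musical piece.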