1. Field of the Invention
This invention relates to a method and system for detecting a refrain in an audio file, a method and system for processing the audio file, and a method and system for a speech-driven selection of the audio file.
2. Related Art
Vehicles typically include audio systems in which audio data or audio files stored on storage media, such as compact disks (CD's) or other memory media, are played. Some times, vehicles also include entertainment systems, which are capable of playing video files, such as DVD's. While driving, the driver should carefully watch the traffic situation around him, and thus a visual interface from the car audio system to the user of the system, who at the same time is the driver, is disadvantageous. Thus, speech-controlled operation of devices incorporated in vehicles is becoming of more desirable.
Besides the safety aspect in cars, speech-driven access to audio archives is becoming desirable for portable or home audio players, too, as archives are rapidly growing and haptic interfaces turn out to be hard to use for the selection of files from long lists.
Recently, the use of media files such as audio or video files, which are available over a centralized commercial database such as ITUNES® from Apple has become very well-known. Additionally, the use of these audio or video files as digitally stored data has become a widely spread phenomenon due to the fact that systems have been developed, which allow the storing of these data files in a compact way using different compression techniques. Furthermore, the copying of music data formerly provided in a compact disc or other storage media has become possible in recent years. Sometimes these digitally stored audio files include metadata, which may be stored in a tag.
The voice-controlled selection of an audio file is a challenging task. First of all, the title of the audio file or the expression a user uses to select a file is often not in the user's native language. Additionally, the audio files stored on different media do not necessarily include a tag in which phonetic or orthographic information about the audio file itself is stored. Even if such tags are present, a speech-driven selection of an audio file often fails due to the fact that the character encodings are unknown, the language of the orthographic labels is unknown, or due to unresolved abbreviations, spelling mistakes, careless use of capital letters and non-Latin characters, etc.
Furthermore, in some cases, the song titles do not represent the most prominent part of a song's refrain. In many such cases a user will, however, not be aware of this circumstance, but will instead utter words of the refrain for selecting the audio file in a speech-driven audio player. Accordingly, a need exists to improve the speech-controlled selection of audio files and help to identify an audio file more easily.