1. Field of the Invention
The present invention relates to an information selection method for selecting information associated with an audio information source, and to an information selection apparatus employing such an information selection method.
2. Description of the Background Art
A known conventional technique for selecting one piece of information from a plurality of pieces of information is a retrieval engine that uses character information presented on a display. However, continuously viewing the screen places a burden on the user. Accordingly, attention is now focused on the use of voice and sound. Where a great number of audio media such as radio programs and music CDs are the subject of selection, it is easier for the user to select by actually listening to the contents than by relying on character information alone.
A method of selecting information associated with a sound source from among a plurality of sound sources is disclosed in, for example, Japanese Patent Laying-Open No. 10-124292. According to this method, a plurality of sound sources are placed around the user, and the audio outputs are basically issued at the same volume. The user audibly distinguishes the simultaneously generated audio outputs and specifies a desired direction to select the information associated with the sound from that direction. More specifically, audio messages such as “play”, “record”, “rewind” and “stop” are assigned to the front, right, back, and left, respectively, as operations of video equipment. When the user wants to record, the right direction is selected using a pointing device such as a cross pad. Another method disclosed in this publication makes each generated sound easier to distinguish by issuing the audio outputs with a slight time difference or with different sound quality (male voice, female voice).
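The direction-based selection described above can be illustrated by the following minimal sketch. The assignment of commands to directions follows the example in the text; the cross-pad key names and the function name select_command are hypothetical and are introduced only for illustration, not taken from the cited publication.

```python
# Illustrative sketch only. The command layout follows the example in the
# text; the key names and function name are assumptions for illustration.
DIRECTION_TO_COMMAND = {
    "front": "play",
    "right": "record",
    "back": "rewind",
    "left": "stop",
}

# Hypothetical cross-pad keys mapped to directions around the user.
CROSS_PAD_TO_DIRECTION = {
    "up": "front",
    "right": "right",
    "down": "back",
    "left": "left",
}


def select_command(cross_pad_key: str) -> str:
    """Return the command whose audio message is heard from the chosen direction."""
    direction = CROSS_PAD_TO_DIRECTION[cross_pad_key]
    return DIRECTION_TO_COMMAND[direction]
```

In this sketch, pressing the right key of the cross pad selects the recording operation, as in the example given above.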
A method in which a single sound source rotates about the user, and information that the user failed to hear is played again when the user indicates a certain direction, is disclosed in “Dynamic Soundscape: mapping time to space for audio browsing” (CHI97) by Minoru Kobayashi and Chris Schmandt (MIT). According to this method, one sound source moves around the user while audibly issuing various topics at a constant volume and sound quality. When the user misses a certain topic, the user points with a pointing device to the area where that topic was audibly output, whereby a new sound source is generated at that site. Playback is resumed from the topic that was being issued when the original sound source passed that site. During this playback, the volume of the former sound source is lowered and the newly generated sound source plays back at a higher volume. Both sound sources move along the orbit at the same time. Up to eight sound sources are allowed simultaneously in this system.
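The mapping of time to space in such a system can be illustrated by the following minimal sketch. It assumes, purely for illustration, that the sound source starts at 0 degrees at time zero and orbits at a constant speed of one revolution per period_s seconds; the function name and parameters are hypothetical and are not taken from the cited paper.

```python
def time_when_passed(pointed_angle_deg, current_time_s, period_s):
    """Return the most recent playback time at which the orbiting source
    was at the pointed angle, or None if it has not yet reached that angle.

    Assumption (not from the cited paper): the source starts at 0 degrees
    at t = 0 and completes one revolution every period_s seconds.
    """
    current_angle = (current_time_s / period_s * 360.0) % 360.0
    # Degrees travelled by the source since it last passed the pointed angle.
    degrees_since_pass = (current_angle - pointed_angle_deg) % 360.0
    passed_at = current_time_s - degrees_since_pass / 360.0 * period_s
    return passed_at if passed_at >= 0.0 else None
```

Under these assumptions, a new sound source generated at the pointed site would resume playback from the time returned by this function.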
However, even if the audio outputs are provided with a time difference or with different sound quality for each direction to facilitate identification of the position of the sound, as in the above-described publication, a human being can audibly distinguish eight directions at most. A case with a great number of selection branches therefore cannot be accommodated. Audio output of only single words such as playback or recording, as in the case of video reservation, generally poses no problem. However, when continuous contents such as a plurality of news programs are output as audio, it is difficult to distinguish the contents audibly even if the sound sources are located in fewer than eight directions.
In the present specification, “the position of sound” means the site from which sound is audibly output, or the direction from which a sound can be heard.
In the above method using a single rotating sound source, only one sound is audibly output unless the user provides an input. A plurality of pieces of information cannot be obtained at the same time.
The push-button telephone service is known as an information selection interface dedicated to audio (method 1). In selecting information, a voice guidance of “Please depress 1 for . . . ” is output. The user presses an appropriate button according to the voice guidance.
Another known method operates a system by voice using a speech recognition function (method 2). According to this method, a predetermined operation command is input by voice, or a natural language processing function is added to the speech recognition function so that the system is operated in the manner of ordinary conversation.
A method in which the item subject to selection changes over time is disclosed in Japanese Patent Laying-Open No. 6-149517. According to this method, the item subject to selection is changed in response to a request from the user or from a program (issued at the elapse of a predetermined time). A label of that item is displayed on the screen and a tone scale corresponding to that item is issued from a speaker. The user can select a certain item by carrying out a predetermined input operation when the label of the desired item appears.
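Selection from items that change over time can be illustrated by the following minimal sketch. It assumes, purely for illustration, that the items are presented in a fixed order at a constant interval and that presentation wraps around to the first item after the last one; these assumptions, the function name, and the parameters are hypothetical and are not taken from the cited publication.

```python
def item_at(items, elapsed_s, interval_s):
    """Return the item being presented elapsed_s seconds after the start.

    Assumptions (not from the cited publication): each item is presented
    for interval_s seconds, in list order, wrapping around at the end.
    """
    index = int(elapsed_s // interval_s) % len(items)
    return items[index]
```

In this sketch, the item returned for the time at which the user carries out the input operation is the selected item.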
Method 1 is disadvantageous in that the number of pieces of information that can be selected and the button to be pressed differ each time depending upon the contents. The user has to depend upon the voice guidance at every select operation, which is time-consuming. As the number of buttons that can be pressed increases, the user will not be able to remember the location of each appropriate button, and will have to press the appropriate button while confirming the location of each button. This labor is tedious. Particularly in the case where it is dangerous for the user to look away from the task at hand, this button position confirmation will induce danger.
Method 2 is disadvantageous in that, when voice commands are predetermined, the user must learn and operate a plurality of types of such commands. To date, the natural language processing function lacks the ability to recognize the meaning of words input through speech with a high degree of freedom. A technique at the level of practical usage is not yet established.
According to the technique disclosed in Japanese Patent Laying-Open No. 6-149517, a display device is indispensable. The required information cannot be provided to the user through audio output alone. Also, the audio information associated with each item is limited to a predetermined tone on a musical scale. It will be difficult for the user to master the differences in tone and musical interval.