1. Field of the Invention
The present invention relates generally to a speech recognition system for recognizing input speech data, and to a speech file recording system and method for recording speech data as speech files. In particular, the present invention is applicable to applications that involve operations for recording speech files, such as speech memo recording applications and speech electronic mail applications, in portable terminals and telephone systems used as personal digital assistants (PDAs).
2. Related Background Art
A portable terminal used as a PDA is reduced in size so as to be portable, and it is equipped with a pen or a small-size keyboard as an input device.
In the case of pen input, information is input by bringing the pen tip into contact with an electronic pad and either writing the text characters to be input or specific marks determined for pen input, or executing pen gestures, such as a cross, with the pen tip.
In the case where text characters to be input are written on an electronic pad, the character inputting operation and the character recognition operation take much time, which is inconvenient for a user who simply wants to record brief memo information while out on the road or in a meeting.
In the case where input is carried out by writing specific marks determined for pen input on an electronic pad, or by executing pen gestures, such as a cross, with the pen tip, the user needs to memorize such specific marks and gestures. This is sometimes burdensome for the user.
Therefore, speech input has attracted attention for use in portable terminals such as PDAs.
If speech input is available, the user need only speak the contents of the note to be taken into a built-in microphone. Thus, if the situation allows the user to speak aloud, memo information can be recorded readily by speech.
FIG. 12 is a view illustrating a conventional speech memo information recording system for recording speech memo information that has been input with speech in a form of a speech file.
In FIG. 12, 510 denotes a microphone, 520 denotes a speech file recording unit, 530 denotes a speech file name input unit, and 540 denotes a speech file reproduction unit.
The user inputs speech via the microphone 510. The speech is converted into speech data by the microphone 510. The speech data is recorded in the speech file recording unit 520 as a speech file. Here, it is necessary to assign a file name to the speech file. The user inputs a file name for the speech file via the speech file name input unit 530. It is assumed that a pen input interface, such as the pen provided with a PDA, is provided as the speech file name input unit 530.
It should be noted that instead of the user inputting a file name him/herself, a serial number may be assigned as a file name automatically by the speech file name input unit 530. In this case, for instance, speech files are named “speech. 1”, “speech. 2”, and the like in the order in which they are recorded.
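The automatic serial naming described above can be sketched as follows. This is only an illustrative sketch, not the conventional system's actual implementation: the function name `serial_file_names`, the `prefix` parameter, and the exact name format (written here without a space, as "speech.1") are assumptions made for the example.

```python
from itertools import count


def serial_file_names(prefix="speech"):
    """Yield file names with a running serial number, in recording order.

    The prefix and the "<prefix>.<n>" format are assumptions for illustration.
    """
    for serial in count(1):
        yield f"{prefix}.{serial}"


# Name the first three recorded speech files.
names = serial_file_names()
first_three = [next(names) for _ in range(3)]
print(first_three)
```

Names generated in this way carry only the recording order and say nothing about what each speech file contains.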
Among conventional PDAs, a speech memo information recording system is also known that receives speech input and, instead of recording the input speech data as a speech file, subsequently executes a speech recognition operation and records the result as a text file.
FIG. 13 is a view illustrating a conventional speech memo information recording system involving a speech recognition operation.
In FIG. 13, 610 denotes a microphone, 620 denotes an acoustic analysis unit, 630 denotes an acoustic model, 640 denotes a speech recognition dictionary, and 650 denotes a matching recognition unit.
The user inputs speech via the microphone 610. The speech is converted into speech data by the microphone 610. The acoustic analysis unit 620 executes acoustic analysis on the speech data. The speech data is divided into phoneme units, and a feature value is extracted from each phoneme unit. The acoustic model 630 stores a set of feature values of phoneme units as a model in a data format suitable for matching. For instance, a probability model employing the Hidden Markov Model (HMM) is used.
The matching recognition unit 650 compares feature values of phoneme units of acoustic data supplied from the acoustic analysis unit 620 with a set of the feature value data of phoneme units stored in the acoustic model 630, for instance, a probability model of feature values of phoneme units, and recognizes the phoneme units of the input acoustic data. Here, the matching recognition unit 650 refers to the speech recognition dictionary 640, checks whether the information composed of the recognized phoneme units is recognizable as words, such as registered words, and outputs the recognized words as a speech recognition result.
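The flow through the acoustic model 630, the speech recognition dictionary 640, and the matching recognition unit 650 can be illustrated with a deliberately simplified sketch. It is not the HMM-based matching mentioned above: as a stand-in, each phoneme unit is modeled by a single reference feature vector and matched by nearest Euclidean distance. All names, feature values, and dictionary entries below are hypothetical.

```python
import math

# Hypothetical acoustic model: one reference feature vector per phoneme unit.
# A real system would use a probability model such as an HMM instead.
ACOUSTIC_MODEL = {
    "m": (1.0, 0.0),
    "e": (0.0, 1.0),
    "o": (1.0, 1.0),
}

# Hypothetical speech recognition dictionary: phoneme sequence -> registered word.
DICTIONARY = {
    ("m", "e", "m", "o"): "memo",
}


def recognize_phoneme(feature):
    """Return the phoneme whose reference vector is closest to the feature value."""
    return min(ACOUSTIC_MODEL, key=lambda p: math.dist(ACOUSTIC_MODEL[p], feature))


def recognize_word(features):
    """Recognize each phoneme unit, then look the sequence up in the dictionary."""
    phonemes = tuple(recognize_phoneme(f) for f in features)
    return DICTIONARY.get(phonemes)  # None when the word is not registered


# Feature values as they might arrive from the acoustic analysis step.
features = [(0.9, 0.1), (0.1, 0.8), (1.1, -0.1), (0.9, 1.2)]
result = recognize_word(features)
print(result)
```

In this sketch a phoneme sequence that has no dictionary entry yields `None`, which corresponds to an utterance that cannot be recognized as a registered word.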
Here, the performance of the speech recognition depends significantly on the vocabulary of the speech recognition dictionary 640. Speech recognition can be carried out only in the case where the word input by the user is included in the vocabulary of the dictionary. Expanding the vocabulary of the dictionary increases the number of words that can be recognized in the speech recognition operation, but a small-size portable terminal such as a PDA has only a limited dictionary capacity, and an increase in the number of terms in the vocabulary causes the matching operation to take more time. Therefore, the vocabulary of the dictionary is limited.
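The trade-off between vocabulary size and matching time can be illustrated with a small sketch. It assumes, purely for illustration, that the dictionary is searched by a linear scan; the function name `match_word` and the dictionary entries are hypothetical, and an actual matching operation may be organized differently.

```python
def match_word(phonemes, dictionary):
    """Scan the dictionary in order.

    Returns (word or None, number of entries compared). The linear scan is an
    assumption made for illustration only.
    """
    comparisons = 0
    for entry_phonemes, word in dictionary:
        comparisons += 1
        if entry_phonemes == phonemes:
            return word, comparisons
    return None, comparisons


# A small vocabulary and an expanded one containing one more registered word.
small_dictionary = [(("m", "e", "m", "o"), "memo")]
large_dictionary = small_dictionary + [(("n", "o", "t"), "note")]

spoken = ("n", "o", "t")
small_result = match_word(spoken, small_dictionary)
large_result = match_word(spoken, large_dictionary)
print(small_result, large_result)
```

With the small dictionary the spoken word is not found at all, while the expanded dictionary recognizes it at the cost of an additional comparison.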
For this reason, a user dictionary whose vocabulary can be customized for each user is widely used, so that the terms the user inputs are covered efficiently within the limited vocabulary.
The above-described conventional speech memo information recording system, however, has the following problems.
In the case where the user him/herself inputs a speech file name as text data, a problem arises in that the action of giving the speech file name is inconvenient for the user.
The simplicity of inputting memo information with speech is impaired by the bothersome task of inputting a file name for the recorded information by pen input or keyboard entry.
Since a file name has to be given in the form of text data, a component that performs a speech recognition operation is indispensable if the file name is to be given by speech input.
Furthermore, as described above, in the case where an automatically assigned serial number is given as a file name to a speech file, the user does not have to input a file name. However, in the case where many pieces of speech memo information are recorded as speech files, the contents of the recorded speech files cannot be grasped from the serial numbers. Hence, a problem arises in that, when referring to the speech memo information, it is difficult to find which speech file holds the speech memo information to be referred to.
Next, in the case of a speech memo information recording system that executes speech recognition of speech memo information input with speech and records the speech memo information in a text file form, the user has difficulty customizing the user dictionary.
As described above, to improve the recognition accuracy of the speech recognition, it is necessary to prepare a user dictionary whose vocabulary efficiently covers the terms that the user is expected to input, within a limited vocabulary capacity. This user dictionary must be built up by the user him/herself, which is bothersome for the user. If the user dictionary is built up by pen input or keyboard entry, the burden on the user increases further.