Discrete large-vocabulary speech recognition systems have been available for use on desktop personal computers for approximately ten years as of the writing of this patent application. Continuous large-vocabulary speech recognition systems have been available for use on such computers for approximately five years. Such speech recognition systems have proven to be of considerable worth. In fact, much of the text of the present patent application is being prepared using a large-vocabulary continuous speech recognition system.
As used in this specification and the claims that follow, when we refer to a large-vocabulary speech recognition system, we mean one that has the ability to recognize a given utterance as being any one of at least two thousand different vocabulary words, depending upon which of those words has corresponding phonetic models that most closely match the given spoken word.
As indicated by FIG. 1, large-vocabulary speech recognition typically functions by having a user 100 speak into a microphone 102, which in the example of FIG. 1 is a microphone of a cellular telephone 104. The microphone transduces the variation in air pressure over time caused by the utterance of words into a corresponding waveform represented by an electronic signal 106. In many speech recognition systems this waveform signal is converted into a time domain representation 110 by digital signal processing performed either by a computer processor or by a special digital signal processor 108. Often the time domain representation comprises a plurality of parameter frames 112, each of which represents properties of the sound represented by the waveform 106 at one of a plurality of successive time periods, such as every one-hundredth of a second.
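The conversion of a waveform into successive parameter frames can be illustrated with a minimal sketch. The specification does not say which parameters each frame holds, so the two used here (log energy and zero-crossing count) are assumptions chosen only for illustration, as are the function names and the use of non-overlapping frames.

```python
import math

def frame_signal(samples, sample_rate, frame_period_s=0.01):
    """Split samples into consecutive, non-overlapping frames.

    The default frame period of one-hundredth of a second follows the
    example in the text; non-overlapping frames are an assumption.
    """
    frame_len = int(round(sample_rate * frame_period_s))
    if frame_len <= 0:
        raise ValueError("sample_rate is too low for the requested frame period")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def frame_parameters(frame):
    """Compute illustrative parameters for one frame (assumed, not from the source)."""
    energy = sum(s * s for s in frame) / len(frame)
    # A small floor keeps the logarithm defined for silent frames.
    log_energy = math.log(energy + 1e-10)
    # Count sign changes between neighbouring samples.
    zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return {"log_energy": log_energy, "zero_crossings": zero_crossings}

def to_parameter_frames(samples, sample_rate):
    """Turn a sampled waveform into a list of parameter frames."""
    return [frame_parameters(f) for f in frame_signal(samples, sample_rate)]
```

With a waveform sampled 8000 times per second, each frame in this sketch covers 80 samples, so one second of audio yields 100 parameter frames. Practical systems commonly use spectral parameters and overlapping analysis windows instead.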
As indicated in FIG. 2, the time domain, or frame, representation of an utterance to be recognized is then matched against a plurality of possible sequences of phonetic models 200 corresponding to different words in a large vocabulary. In most large-vocabulary speech recognition systems, individual words 202 are each represented by a corresponding phonetic spelling 204, similar to the phonetic spellings found in most dictionaries. Each phoneme in a phonetic spelling has one or more phonetic models 200 associated with it. In many systems the models 200 are phoneme-in-context models, which model the sound of their associated phoneme when it occurs in the context of the preceding and following phonemes in a given word's phonetic spelling. The phonetic models are commonly composed of a sequence of one or more probability models, each of which represents the probability of different parameter values for each of the parameters used in the frames of the time domain representation 110 of an utterance to be recognized.
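The matching of frames against sequences of phonetic models can be sketched as follows. This is a simplified illustration, not the method of any particular system. It assumes that each phoneme-in-context model consists of a single probability model, that this model is a diagonal Gaussian over the frame parameters, and that frames are aligned to models strictly left to right. All names here (`phonemes_in_context`, `alignment_score`, `recognize`, `lexicon`, `model_table`) are hypothetical.

```python
import math

NEG_INF = float("-inf")

def phonemes_in_context(spelling):
    """Expand a phonetic spelling into (previous, phoneme, next) triples.

    None marks a word boundary.
    """
    padded = [None] + list(spelling) + [None]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

def log_gaussian(frame, means, variances):
    """Log-probability of one frame under a diagonal Gaussian (an assumed model form)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, means, variances))

def alignment_score(frames, models):
    """Best log-probability of a left-to-right alignment of frames to models.

    Every model must account for at least one frame, and each frame is
    assigned to exactly one model.
    """
    if not models or len(frames) < len(models):
        return NEG_INF
    # best[j]: best score so far with the latest frame assigned to model j.
    best = [NEG_INF] * len(models)
    best[0] = log_gaussian(frames[0], *models[0])
    for frame in frames[1:]:
        new = [NEG_INF] * len(models)
        for j, (means, variances) in enumerate(models):
            stay = best[j]                                # remain in the same model
            advance = best[j - 1] if j > 0 else NEG_INF   # move on from the previous model
            prev = max(stay, advance)
            if prev > NEG_INF:
                new[j] = prev + log_gaussian(frame, means, variances)
        best = new
    return best[-1]

def recognize(frames, lexicon, model_table):
    """Return the best-scoring word and the score of every word in the lexicon."""
    scores = {}
    for word, spelling in lexicon.items():
        models = [model_table[ctx] for ctx in phonemes_in_context(spelling)]
        scores[word] = alignment_score(frames, models)
    return max(scores, key=scores.get), scores
```

In this sketch `lexicon` maps each word to its phonetic spelling and `model_table` maps each phoneme-in-context triple to a `(means, variances)` pair. The word whose model sequence gives the highest alignment score is taken as the one that most closely matches the utterance.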
One of the major trends in personal computing in recent years has been the increased use of smaller and often more portable computing devices.
Originally most personal computing was performed upon desktop computers of the general type represented by FIG. 3. Then there was an increase in usage of smaller personal computers in the form of laptop computers, which are not shown in the drawings because laptop computers have roughly the same type of computational capabilities and user interface as desktop computers. Most current large-vocabulary speech recognition systems have been designed for use on such computers.
Recently there has been an increase in the use of new types of computers, such as the tablet computer shown in FIG. 4, the personal digital assistant computer shown in FIG. 5, cell phones with increased computing power, as shown in FIG. 6, wrist phone computers, as represented in FIG. 7, and a wearable computer that provides a user interface with a screen and eye tracking and/or audio output provided from a head-wearable device, as indicated in FIG. 8.
Because of recent increases in computing power, such new types of devices can have computational power equal to that of the first desktops on which discrete large-vocabulary recognition systems were provided and, in some cases, as much computational power as was provided on the desktop computers that first ran large-vocabulary continuous speech recognition. The computational capacities of such smaller and/or more portable personal computers will only grow as time goes by.
One of the more important challenges involved in providing effective large-vocabulary speech recognition on ever more portable computers is that of providing a user interface that makes it easier and faster to create, edit, and use speech recognition on such devices.