The invention relates to a system for storing and automatically downloading vocabularies and speech templates from a host computer to a speech recognition station or process in response to a request by a particular operator for a particular sub-application, automatically generating new or updated vocabularies for new or updated sub-application requirements, and automatically generating corresponding new speech templates for an unenrolled operator.
Various speech recognition systems are known. The assignee's commercially marketed "INTELLiVOICE" speech recognition system is a speech entry system that operates as an accessory to a variety of specialized keyboard applications that permit either speech or keyboard entry of specific commands or keystrokes. Conventional speech recognition techniques are used to translate spoken words into keystrokes, as an alternative to entering the identical information by depressing keys.
The closest prior speech recognition systems require the basic elements of "speech templates" in combination with "syntax" programs. Speech templates, also referred to as "voiceprints", are required because each speaker has unique ways of saying particular words. Such unique ways are sufficiently different from the way another person speaks the same word that the only practical way, at the present state of the art, of recognizing a spoken word is to compare an electronic representation of that spoken word with a previously digitized and stored voiceprint for that person. The electronic representation is obtained by operating on the spoken word by known Fourier analysis techniques or the like to produce a digital representation of the word. Storing such voiceprints for a vocabulary of words requires a great deal of memory.
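By way of illustration only, the comparison of a spoken word against stored voiceprints might be sketched as follows. The sketch assumes fixed-length frames, a naive discrete Fourier transform, and a plain Euclidean distance between frames; the names `magnitude_spectrum`, `make_template`, `template_distance`, and `recognize` are hypothetical and are not taken from any system described here. Practical recognizers of this kind would also have to align utterances in time, which this sketch does not attempt.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the DFT bins 0..n/2 of one frame of samples (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def make_template(samples, frame_len=8):
    """Split samples into non-overlapping frames and keep each frame's spectrum."""
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [magnitude_spectrum(f) for f in frames]


def template_distance(a, b):
    """Mean per-frame Euclidean distance; infinite if the frame counts differ."""
    if not a or len(a) != len(b):
        return math.inf
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b)) / len(a)


def recognize(utterance, stored_templates, frame_len=8):
    """Return the word whose stored template is closest to the utterance."""
    candidate = make_template(utterance, frame_len)
    return min(
        stored_templates,
        key=lambda word: template_distance(candidate, stored_templates[word]),
    )


def tone(freq_bin, n=32, frame_len=8):
    """Synthetic stand-in for a spoken word: a sine at one DFT bin (assumption)."""
    return [math.sin(2 * math.pi * freq_bin * t / frame_len) for t in range(n)]


# One operator's stored voiceprints for a two-word vocabulary.
stored = {"yes": make_template(tone(1)), "no": make_template(tone(3))}

# A slightly quieter repetition of the first word by the same operator.
spoken = [0.9 * x for x in tone(1)]
best_match = recognize(spoken, stored)
```

Because every word in the vocabulary needs its own stored set of per-frame spectra for every operator, even this simplified representation suggests why voiceprint storage grows quickly with vocabulary size and number of operators.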
A conventional technique for interpreting spoken commands is to utilize a "syntax" structure or program. A syntax structure or "grammar" is a program that prepares a "software construct" or software structure that defines which words are to be recognized and the possible logical relationships between them. Each particular program or "application" or "sub-application" to be executed in response to a particular request has its own syntax.
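A syntax of this kind might, for illustration, be represented as a table of states and permitted word transitions. The following minimal sketch assumes such a finite-state representation; the table `SYNTAX`, its state names, its words, and the helper functions are hypothetical examples and do not reproduce the syntax of any actual application.

```python
# Hypothetical syntax for one sub-application: each state maps the words that
# may be spoken next to the state reached after that word is recognized.
SYNTAX = {
    "start": {"show": "object", "print": "object"},
    "object": {"results": "end", "report": "end"},
}


def vocabulary(syntax):
    """All words the syntax allows, i.e. the words that need speech templates."""
    return sorted({word for transitions in syntax.values() for word in transitions})


def accepts(syntax, words, start="start", end="end"):
    """True if the word sequence follows the permitted transitions to the end state."""
    state = start
    for word in words:
        next_state = syntax.get(state, {}).get(word)
        if next_state is None:
            return False  # word not permitted at this point in the command
        state = next_state
    return state == end


valid_command = accepts(SYNTAX, ["show", "results"])
out_of_order = accepts(SYNTAX, ["results", "show"])
```

Under these assumptions, the same table serves two purposes: it lists the vocabulary to be recognized, and it restricts which word may follow which, so that a different sub-application simply supplies a different table.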
Every application to be speech-accessed by various persons must utilize previously stored constructs that are specific to the presently desired application. These constructs consist of a syntax and all voiceprints of persons who are allowed to use speech to input information into the desired application program.
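The pairing of a syntax with the voiceprints of permitted operators could be modeled, purely as an illustrative sketch, by a record such as the one below. The class name `ApplicationConstruct`, its fields, and the state-table shape of the syntax are assumptions made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationConstruct:
    """Stored construct for one application: its syntax plus operator voiceprints."""

    application: str
    # Assumed shape: state -> {word: next state}.
    syntax: dict
    # Assumed shape: operator name -> {word: stored speech template}.
    voiceprints: dict = field(default_factory=dict)

    def missing_words(self, operator):
        """Words in the syntax for which this operator has no stored template."""
        vocab = {word for transitions in self.syntax.values() for word in transitions}
        return sorted(vocab - set(self.voiceprints.get(operator, {})))

    def can_use_speech(self, operator):
        """An operator may use speech only if enrolled for every word in the syntax."""
        return operator in self.voiceprints and not self.missing_words(operator)


construct = ApplicationConstruct(
    application="blood-test-results",  # hypothetical application name
    syntax={"start": {"show": "object"}, "object": {"results": "end"}},
    voiceprints={"alice": {"show": [0.1, 0.2], "results": [0.3, 0.4]}},
)
```

In this sketch an operator with no stored voiceprints, or with templates for only part of the vocabulary, is reported as unable to use speech input for that application.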
The most common so-called speech recognition systems presently available merely recognize spoken sounds, but make no attempt to interpret them. Spectral frequency content and timing information derived from the spoken sounds are used to create a table of unique codes that correspond to each sequence of sounds enunciated by the speaker. That table of codes is referred to as a speech template. Speech templates are required for every person who may input speech information to the system, since the spectral content, amplitude of each frequency component, and timing aspects for each component of enunciated sound are unique to each particular speaker.
In the closest prior art, voiceprint data and syntax information have been entered and stored at each speech recognition station. If, for example, a laboratory technician wishes to access a particular software operation or "process", such as obtaining the results of a blood test for a patient in a hospital, the technician must load his voiceprint and the syntax information for the desired sub-application (both of which may be stored on a floppy disk that the technician carries with him) into the speech recognition station before the desired speech commands can be spoken. This technique has its shortcomings. Any time the "process" must be updated, for example to add new operators who can input speech requests, to add new vocabulary words that are acceptable to the syntax program, or to add new functions to an application or sub-application, it is necessary to update all of the data stored locally at each speech recognition station (or on each floppy disk). This manual process is inconvenient, time-consuming, and costly. It could also result in some stations or floppy disks being inadvertently missed. If floppy disks are employed, there is also the risk that they will be lost.
To date, the use of speech recognition by a computer typically has involved an awkward combination of two distinct systems: a host computer and a speech recognition system. Little has been done in the prior art to automate the coordination of these two very different systems.