The present invention generally relates to a system and method for recognizing speech and transcribing speech.
A speech recognition system analyzes speech to determine what was said. In a frame-based system, a processor divides a signal descriptive of the speech to be recognized into a series of digital frames, each of which corresponds to a small time increment of the speech. The processor then compares the digital frames to a set of stored models, each of which represents a word from a vocabulary and may represent how that word is spoken by a variety of speakers. A speech model may also represent a phoneme that corresponds to a part of a word. Collectively, the phonemes of a word represent its phonetic spelling.
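By way of illustration only, the division of a sampled signal into digital frames might be sketched as follows. The function name `split_into_frames` and the parameters `frame_len` and `hop` are assumptions introduced for this example; the specification does not prescribe a frame length or an overlap.

```python
def split_into_frames(samples, frame_len, hop):
    """Divide a sequence of samples into fixed-length frames.

    frame_len and hop are hypothetical parameters (in samples); each frame
    covers one small time increment of the speech signal. Trailing samples
    that do not fill a whole frame are dropped in this sketch.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


if __name__ == "__main__":
    signal = list(range(10))  # stand-in for digitized audio samples
    for frame in split_into_frames(signal, frame_len=4, hop=2):
        print(frame)
```

A hop smaller than the frame length yields overlapping frames; setting the hop equal to the frame length yields frames that do not overlap.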
The processor determines what is said by finding the model that best matches the digital frames that represent the speech. The words or phrases corresponding to the best-matching model are referred to as recognition candidates. The processor may be part of a general purpose computer with an input/output unit, a sound card, memory, and various programs including an operating system, an application program such as a word processing program, a stenographic translation processor, and a speech recognition program. The input/output unit could include devices such as a microphone, mouse, keyboard, monitor, and stenograph, and could also handle video data.
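As a minimal sketch of this matching step, each digital frame may be treated as a small feature vector and each stored model as a template sequence of such vectors. The averaged Euclidean distance used here is an assumption chosen for brevity and is not taken from the specification; the names `score_model` and `recognition_candidates` are likewise hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def score_model(frames, model_frames):
    """Average frame-by-frame distance; lower means a better match.

    Simplifying assumption: frames are compared position by position with
    no time alignment, and any extra frames on either side are ignored.
    """
    n = min(len(frames), len(model_frames))
    if n == 0:
        return math.inf
    return sum(frame_distance(frames[i], model_frames[i]) for i in range(n)) / n


def recognition_candidates(frames, models, top_n=3):
    """Return up to top_n vocabulary words, best match first."""
    scored = sorted((score_model(frames, m), word) for word, m in models.items())
    return [word for _, word in scored[:top_n]]


if __name__ == "__main__":
    # Hypothetical two-word vocabulary with two-frame templates.
    models = {
        "yes": [[1.0, 0.0], [1.0, 0.0]],
        "no": [[0.0, 1.0], [0.0, 1.0]],
    }
    spoken = [[0.9, 0.1], [1.0, 0.0]]
    print(recognition_candidates(spoken, models))
```

A practical recognizer would align frames in time and would typically use statistical models rather than fixed templates; the sketch only shows how a ranked list of recognition candidates can arise from a best-match comparison.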
The system detects the speech through a speech recognition program. The speech may be conveyed as an analog signal to a sound card and then passed through an analog-to-digital converter to be transformed to a digital format. Under the control of an operating system, the speech recognition program compares the digital samples to speech models. The recognition results may be stored or used as input to the application program. Speech programs and application programs can run concurrently so that, for example, a speaker can use a microphone as a text input device, alone or in conjunction with a mouse and keyboard. The speaker interacts with the system through a graphical user interface (GUI).
A speech recognition system may be a "discrete" system, in which the speaker must pause between words or phrases, or it may be a "continuous" system, which recognizes words and phrases without the speaker having to pause between them. Such systems relate to down-line transcription used by attorneys reviewing real-time transcription during a proceeding such as a trial or deposition, or to the manipulation of audio and video transcripts by attorneys, judges, court reporters, witnesses and clients. A stenographic recorder is a machine used in this process, which may be backed up by a tape recording. The stenographic recorder may link to a computer aided transcription (CAT) system to transcribe stored electronic keystrokes. This system requires the reporter to work interactively with the CAT system to correct errors, often with the aid of a taped recording.
Because the use of stenotype machines in this process results in a high incidence of errors through undefined strokes, improved processors have been incorporated into the translation systems. These systems include a means for providing a sequence of lexical stroke symbols and a processor for receiving them. This processor could have a scan chart memory storing a list of stroke symbol combinations and their text part translations. The system would also have a means of combining language parts according to a set of defined rules to complete words in language text format. In addition, a speech recognition system that converts audio data to frame data sets, and that has a stored vocabulary held as clusters of word models which a processor system can compare and recognize, would be linked to an output system defining the text.
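The scan chart lookup and the rule-based combining of language parts could, purely as an illustration, be sketched as below. The chart entries, the stroke spellings, the longest-match strategy, and the single suffix rule are all hypothetical; they are not drawn from the specification or from any actual stenographic theory.

```python
def translate_strokes(strokes, scan_chart, max_combo=3):
    """Translate stroke symbols into text parts using a scan chart.

    scan_chart maps tuples of stroke symbols to text parts. The longest
    matching combination is tried first (an assumption of this sketch).
    Undefined strokes are flagged rather than silently dropped.
    """
    parts = []
    i = 0
    while i < len(strokes):
        for size in range(min(max_combo, len(strokes) - i), 0, -1):
            key = tuple(strokes[i:i + size])
            if key in scan_chart:
                parts.append(scan_chart[key])
                i += size
                break
        else:
            parts.append("<" + strokes[i] + "?>")  # undefined stroke
            i += 1
    return parts


def combine_parts(parts):
    """Join text parts into words.

    Hypothetical rule: a part beginning with '-' is a suffix and is
    attached to the preceding word.
    """
    words = []
    for part in parts:
        if part.startswith("-") and words:
            words[-1] += part[1:]
        else:
            words.append(part)
    return " ".join(words)


if __name__ == "__main__":
    chart = {("TH", "E"): "the", ("KAT",): "cat", ("-S",): "-s"}
    parts = translate_strokes(["TH", "E", "KAT", "-S", "XQ"], chart)
    print(combine_parts(parts))
```

Flagging an undefined stroke in the output, rather than discarding it, leaves a visible marker for the reporter to correct during later review.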
While systems have been introduced with continuous speech recognition capability aimed at enabling direct voice-to-text conversion, these systems generally are restricted. First, in the case of audio input, they may be limited to recognizing only one user. Second, the systems have difficulty in processing bursts of rapid speech. Third, the problem of dialects and accents has not been overcome, and the systems require hands-on personal training. Background noise often interferes with training sessions, so that the automatic speech recognition (ASR) system processes confused developmental data and stores an inaccurate database. Overall, the lack of predictable consistency in error-free operation and the difficulties in scaling (the need for unreasonably large training sets) have yet to be addressed.
Therefore, what is needed is a system and method that is efficient in terms of time and is trainable by more than one voice, including voices with various accents and dialects. Further, what is needed is a system capable of handling continuous speech at various rates in a consistent and reliable manner. In addition, what is needed is a system that will eliminate background noise and solve the scaling problem incurred by large data sets.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention is embodied in a system and method for recognizing speech and transcribing speech.
The system includes a computer, which could be part of a LAN or WAN and linked to other computer systems through the Internet. The computer has a controller, or similar device, to filter background noise and convert incoming signals to digital format. The digital signals are transcribed to a word list, which is processed by an automatic speech recognition system. This system synchronizes and compares the lists and forwards them to a speech recognition learning system that stores the data on-site.
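One way the synchronizing and comparing of word lists might be sketched is with a standard sequence alignment, shown here using Python's `difflib`. It is assumed for this example that two lists are being compared, for instance a reference transcript and the output of the automatic speech recognition system; the summary does not specify the lists or the alignment method, and the function name `compare_word_lists` is hypothetical.

```python
from difflib import SequenceMatcher


def compare_word_lists(reference, hypothesis):
    """Align two word lists and report where they differ.

    Returns a list of (tag, reference_words, hypothesis_words) tuples,
    where tag is 'replace', 'delete', or 'insert'. Matching stretches
    are skipped.
    """
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, reference[i1:i2], hypothesis[j1:j2]))
    return differences


if __name__ == "__main__":
    reference = "the witness was sworn in".split()
    hypothesis = "the witness was worn in".split()
    for difference in compare_word_lists(reference, hypothesis):
        print(difference)
```

The differences found in this way are the kind of data that a learning system could store as corrected examples, although how the learning system uses them is not detailed in this summary.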
The stored data is forwarded to an off-site storage system and to an off-site large-scale learning system that processes the data from all sites on the wide area network. Users of the system can access the off-site storage system directly. The system solves the scaling problem by providing an efficient method of generating large training sets for multi-varied word patterns.
Other aspects and advantages of the present invention as well as a more complete understanding thereof will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention. Moreover, it is intended that the scope of the invention be limited by the claims and not by the preceding summary or the following detailed description.