Speech recognition has traditionally been performed using systems in which the transmission of speech data within the system is free of errors. However, the emergence of the Internet and of digital wireless technology has given rise to situations where this is no longer the case. In applications where speech is sampled and partially processed on one device and then packetized and transmitted over a digital network for further analysis on another, packets of speech data may be delayed, lost or corrupted during transmission.
This is a serious problem for current speech recognition technologies, which require the data to be present even if it is corrupted by additive noise. Existing Internet protocols for error-free data transmission, such as TCP, are not suitable for interactive ASR (“Automatic Speech Recognition”) systems, because their retry mechanisms introduce variable and unpredictably long delays into the system under poor network conditions. In another approach, real-time delivery of data packets is attempted, and missing data is ignored in order to avoid introducing delays in transmission. As stated above, this is catastrophic for current recognition algorithms.
It would be desirable to have a class of recognition algorithms and transmission protocols, intermediate between the conventional approaches, that are able to operate robustly, with minimal delay and despite incomplete speech data, under poor network conditions. Ideally, the protocol would have a mechanism by which loss and delay may be traded off, either in a fixed manner or dynamically, in order to optimize speech recognition over lossy digital networks, for example in a client-server environment.
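The trade-off between loss and delay can be illustrated with a minimal sketch. It is not taken from any particular protocol: the function name `receive_with_deadline`, the parameter `max_wait_ms`, and the delay values are all hypothetical. The receiver waits a fixed time for each packet and treats anything later as missing, so a longer deadline lowers the missing fraction at the cost of added latency.

```python
from typing import List, Optional, Tuple


def receive_with_deadline(
    arrival_delays_ms: List[Optional[float]],
    max_wait_ms: float,
) -> Tuple[List[bool], float]:
    """Mark which packets are usable under a fixed waiting deadline.

    arrival_delays_ms[i] is the network delay of packet i in milliseconds,
    or None if the packet never arrived. (Hypothetical input format.)
    Returns the per-packet usable flags and the fraction treated as missing.
    """
    # A packet is usable only if it arrived, and arrived within the deadline.
    usable = [d is not None and d <= max_wait_ms for d in arrival_delays_ms]
    if not usable:
        return usable, 0.0
    missing_fraction = usable.count(False) / len(usable)
    return usable, missing_fraction


# Illustrative delays only: two packets are lost outright, two arrive late.
delays = [20.0, 35.0, None, 180.0, 25.0, 90.0, None, 40.0]

for deadline in (50.0, 100.0, 200.0):
    _, missing = receive_with_deadline(delays, deadline)
    print(f"deadline={deadline:.0f} ms -> missing fraction={missing:.3f}")
```

With these made-up delays, raising the deadline from 50 ms to 200 ms lowers the missing fraction from 0.5 to 0.25. A fixed trade-off corresponds to a constant `max_wait_ms`; a dynamic one would adjust it as network conditions change.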