The present invention relates to speech recognition systems, and more particularly, but not exclusively, relates to speech recognition techniques in telephony applications.
Various methods are used in telephony applications for automating dialing of a telephone. Dialing can be accomplished by using speed dial or pre-recorded, person-specific voice commands. However, these methods typically require recording or entering information into the respective phone for each different user.
In streaming audio systems, such as speech recognition systems, the Real-time Transport Protocol (RTP) and the User Datagram Protocol (UDP) are typically used because they are usually best suited to real-time transmission. However, these protocols lack a reliable delivery mechanism. RTP packets are also difficult to work with because they can be received out of order or duplicated, and the receiver has little more than each packet's transmission sequence number with which to reorder them.
Streaming audio systems face further difficulty in managing memory. Multiple buffers are typically created to handle the various phases an audio packet passes through. In some cases, buffers are allocated for the largest possible packet size. Since the larger packets are rarely received, this approach leaves a large portion of the allocated memory unused. On the other hand, when buffers are allocated to handle only the typical packet size, larger packets cannot be handled. With such buffer arrangements, available memory is overrun, transmission quality suffers, or both.
Still another drawback specific to current speech recognition systems is the difficulty of integrating with multiple speech engine vendors or changing between incompatible vendors.
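By way of illustration only, the reordering and duplication problem noted above for RTP packets can be sketched as follows. This is a minimal sketch and not a technique described in this document: the names `ReorderBuffer`, `seq_distance`, and `push` are hypothetical, the only information assumed to be available is the 16-bit RTP sequence number, and lost packets are not handled (a missing packet would stall delivery indefinitely).

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def seq_distance(a, b):
    """Signed distance from b to a in 16-bit sequence space (hypothetical helper)."""
    return ((a - b + SEQ_MOD // 2) % SEQ_MOD) - SEQ_MOD // 2


class ReorderBuffer:
    """Holds out-of-order packets and releases them in sequence order (illustrative)."""

    def __init__(self, first_seq):
        self.next_seq = first_seq  # next sequence number to release
        self.pending = {}          # seq -> payload, held until deliverable

    def push(self, seq, payload):
        """Accept one packet; return the payloads that are now deliverable in order."""
        # Drop packets that were already released (late) or are already held (duplicate).
        if seq_distance(seq, self.next_seq) < 0 or seq in self.pending:
            return []
        self.pending[seq] = payload
        released = []
        # Release the longest contiguous run starting at next_seq.
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq = (self.next_seq + 1) % SEQ_MOD
        return released
```

In this sketch, a packet that arrives ahead of its predecessor is held until the gap is filled, and a packet whose sequence number has already been released or is already held is discarded as a duplicate.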