1. Field of the Invention
The present invention relates generally to speech and audio signal processing. More particularly, the present invention relates to encrypted speech and audio signal processing.
2. Related Art
In a conventional voice-over-packet (“VoP”) system or voice over IP (“VoIP”) system, telephone conversations or analog voice may be transported over the local loop or the public switched telephone network (“PSTN”) to the central office (“CO”), where speech is digitized according to an existing protocol, such as G.711. From the CO, the digitized speech is transported to a gateway device at the edge of the packet-based network. The gateway device receives the digital speech and packetizes it. The gateway device can combine G.711 samples into a packet, or use any other compression scheme. Next, the packetized data is transmitted over the packet network, such as the Internet, for reception by a remote gateway device and conversion back to analog voice in the reverse of the manner described above.
For purposes of this application, the terms “speech coder” or “speech processor” will generally be used to describe the operation of a device that is capable of encoding speech for transmission over a packet-based network and/or decoding encoded speech received over the packet-based network. As noted above, the speech coder or speech processor may be implemented in a gateway device for conversion of speech samples into a packetized form that can be transmitted over a packet network and/or conversion of the packetized speech into speech samples.
More recently, interest has emerged in providing secure voice calls over packet networks through encrypted voice technology. With voice encryption, speech data is converted into encrypted voice data in a form which cannot be understood by unauthorized users as it is transmitted over packet networks. Upon receipt, the encrypted voice data is decrypted into a form which can be understood by authorized users. A number of problems, however, are presented by current encryption techniques employing block-based algorithms. For example, currently deployed voice codecs generally encode speech samples into encoded packet sizes which differ from one voice codec to another. Moreover, these various encoded packet sizes are to a large extent not proportional to the block sizes used for block-based encryption algorithms. This disparity between the sizes of encoded voice packets and the block sizes used for block-based encryption algorithms results in voice transmission delays and/or increased packet loss, and, as a consequence, degrades VoP quality and performance.
By way of illustration, one conventional technique for handling the disparity between the sizes of encoded voice packets and the block sizes used for block-based encryption algorithms is for the transmitter to wait for two successive encoded voice packets and to use data in the second encoded voice packet to make up for data lacking in the first encoded voice packet; in this way, a portion of the first encoded voice packet overlaps with a portion of the second encoded voice packet to generate an “overlap packet.” However, this approach results in significant and unacceptable delay, particularly for codecs employing large encoded voice packet sizes. Furthermore, the receiver may also be required to wait for two encrypted packets associated with two successive encoded voice packets before being able to decrypt packets associated with one of the encoded voice packets, where, for example, the speech sample block of the first encoded voice packet to be decrypted overlaps with a portion of the second encoded voice packet. Due to the large sizes of encoded voice packets, the additional wait time at the receiver severely degrades the receiving process. Moreover, in the event that the overlap packet (a single packet) is lost, the speech data associated with both the first encoded voice packet and the second encoded voice packet (two packets) is lost, so that the conventional technique described above significantly increases the effective packet loss.
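By way of a non-limiting illustration, the overlap condition described above can be modeled with a short sketch. The function name `blocks_to_frames`, the 24-byte encoded voice packet size, and the 16-byte encryption block size are assumptions chosen only for illustration and are not taken from any particular codec or encryption algorithm discussed herein. The sketch maps each encryption block to the encoded voice packets whose data it contains, so that a block spanning two packets corresponds to an “overlap packet.”

```python
def blocks_to_frames(frame_size, block_size, num_frames):
    """Map each encryption block to the indices of the encoded voice
    packets (frames) whose bytes it contains.

    Hypothetical model: frames are laid end to end and cut into
    fixed-size blocks, as in the conventional overlap technique.
    """
    total = frame_size * num_frames
    mapping = []
    for start in range(0, total, block_size):
        end = min(start + block_size, total)   # final block may be short
        first = start // frame_size            # frame holding the first byte
        last = (end - 1) // frame_size         # frame holding the last byte
        mapping.append(list(range(first, last + 1)))
    return mapping


# Illustrative values only (assumed, not from the source).
mapping = blocks_to_frames(frame_size=24, block_size=16, num_frames=2)
overlap_blocks = [i for i, frames in enumerate(mapping) if len(frames) > 1]
print(mapping)         # [[0], [0, 1], [1]]
print(overlap_blocks)  # [1]
```

In this model, block 1 cannot be formed until the second encoded voice packet is available, and losing block 1 makes neither packet 0 nor packet 1 fully recoverable, which corresponds to the delay and the doubled loss noted above.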
Another problem with the conventional solution arises due to the conventional technique of padding voice packets with additional data chosen using arbitrary methods in order to size the voice packet proportionally to the encryption unit block size. As a consequence, this padding technique increases the final encrypted voice packet size, resulting in degraded VoP performance.
Accordingly, there is a strong need in the art for an encryption processing apparatus and method which provides improved encryption data handling for voice over packet networks and can overcome the shortcomings in the art.
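The size increase caused by padding can be illustrated with simple arithmetic. The following sketch assumes a hypothetical helper `padded_size` and uses example payload sizes of 10, 24, and 160 bytes with a 16-byte encryption block; these values are assumptions for illustration only. Each payload is rounded up to the next whole multiple of the block size, and the difference is the padding overhead.

```python
def padded_size(payload_bytes, block_bytes):
    """Round a payload up to the next whole multiple of the block size."""
    blocks = -(-payload_bytes // block_bytes)  # ceiling division
    return blocks * block_bytes


BLOCK = 16  # assumed encryption block size in bytes (illustrative)

for payload in (10, 24, 160):  # assumed encoded packet sizes in bytes
    total = padded_size(payload, BLOCK)
    overhead = total - payload
    print(f"{payload:3d} B payload -> {total:3d} B encrypted, "
          f"{overhead} B padding ({100 * overhead / payload:.1f}% overhead)")
```

Under these assumptions, a 10-byte payload grows to 16 bytes and a 24-byte payload grows to 32 bytes, whereas a 160-byte payload already fills a whole number of blocks and needs no padding. The relative overhead is therefore greatest for small encoded packets.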