In telecommunications networks, it is sometimes necessary to transmit supplemental information to a receiver, i.e., supplemental messages above and beyond the speech payload generated by the transmitter. Such supplemental messages are used for network signaling and for a variety of applications, e.g., authentication, display information, ring tones, enhanced features, etc. Traditionally, a number of techniques have been used to send a supplemental message from the transmitter to the receiver alongside the speech payload without unacceptably degrading the voice quality perceived by the listener.
One technique for carrying supplemental messages within traditional (G.711) networks has been to over-write some of the bits of the speech payload. This over-writing technique is known in the art as “bit-robbing,” because it re-purposes part of the transmitted speech payload to send the desired supplemental message. This technique has pronounced limitations in some networks.
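As an illustration only, the over-writing can be sketched as follows. The sketch assumes 8-bit speech samples and one robbed bit per sample; the sample and message values are hypothetical, and real systems rob bits on a schedule defined by the applicable signaling standard.

```python
def embed_message(samples, message_bits):
    """Bit-robbing sketch: overwrite the least-significant bit of each
    8-bit speech sample with one bit of the supplemental message."""
    if len(message_bits) > len(samples):
        raise ValueError("message longer than speech payload")
    robbed = list(samples)
    for i, bit in enumerate(message_bits):
        robbed[i] = (robbed[i] & 0xFE) | (bit & 1)
    return robbed


def extract_message(samples, length):
    """Recover the robbed bits at a receiver that knows the scheme."""
    return [s & 1 for s in samples[:length]]
```

Because only the least-significant bit of each sample is disturbed, the listener typically does not perceive the degradation, which is why the technique was acceptable within a single G.711 network.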
Bit-robbing has proven ineffective when the speech payload travels outside the boundaries of a traditional G.711 network and encounters other types of networks. In such a scenario, a subsequent network is “unaware” that a supplemental message is embedded in the speech payload and the subsequent network operates on the speech payload as though the speech payload were entirely composed of speech.
Each network compresses or otherwise encodes the speech payload depending on the transmission technology employed by that network. For example, a G.729 network transmits only certain key characteristics (also known as model parameters) of a speech payload, instead of the payload itself, in order to conserve network utilization, i.e., bandwidth. A G.729 network compresses an ordinary speech payload from 64 Kbps to 8 Kbps, an eight-fold compression that is quite economical but that can produce poor results when the speech payload comprises supplemental messages.
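As an arithmetic aside, the eight-fold figure follows directly from the nominal bit rates of the two recommendations; the constants below simply restate the rates given above:

```python
# Nominal rates per the ITU-T recommendations discussed above.
G711_RATE_BPS = 8_000 * 8   # 8,000 samples/s at 8 bits/sample = 64 Kbps
G729_RATE_BPS = 8_000       # 8 Kbps

compression_ratio = G711_RATE_BPS // G729_RATE_BPS
print(compression_ratio)    # 8, i.e., an eight-fold reduction
```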
The disadvantage of applying G.729 compression to speech payloads that comprise supplemental messages becomes evident when speech is reconstructed at the receiving end: the changes that the key characteristics of the speech payload underwent during the encoding phase cause distortion. When some key characteristics change beyond certain ranges during encoding, the quality of the reconstructed speech, which is based on the received key characteristics, suffers. In other words, reconstructing and playing a speech payload that comprises a supplemental message is risky and typically results in poor voice quality as perceived by a listener.
Not only is there a risk of unacceptably degrading the voice quality of the played-back speech, but the supplemental message itself is also lost in this scenario. Therefore, authentication, display information, etc. are not applied at the receiver.
When the scenario is repeated such that a speech payload comprising supplemental messages must travel across yet another type of network, the degradation is exacerbated. Therefore, there is a perceived need for a way to transmit a speech payload comprising supplemental messages from a transmitter to a receiver across a variety of network types such that (i) the supplemental messages survive the crossings of network boundaries and (ii) the voice quality perceived at the receiving end is acceptably good.
FIG. 1 depicts a schematic diagram of the salient portions of telecommunications system 100 in the prior art. Telecommunications system 100 comprises: microphone 101, legacy transmitter 102, telecommunications network 103-1, legacy receiver 104-1, speaker 105-1, gateway/transcoder 106, telecommunications network 103-2, legacy receiver 104-2, and speaker 105-2.
Microphone 101 is an apparatus that captures an audio signal (e.g., a person's speech, a group of people's collective speech, a music source, an audio broadcast, an audio stream, etc.) and provides it to legacy transmitter 102. For the purposes of this specification, the term “audio signal” is used interchangeably with the following terms: speech input, speech output, speech, analog signal, analog speech signal, voice, acoustic signal, sound.
Legacy transmitter 102 is an apparatus that transmits the audio signal to telecommunications network 103-1 and is described in more detail in FIG. 2.
Telecommunications network 103-1 and telecommunications network 103-2 are telecommunications networks that are capable of carrying speech from a transmitter to a receiver. Telecommunications network 103-1 carries speech originating at microphone 101 to legacy receiver 104-1 for speaker 105-1.
Telecommunications network 103-1 and telecommunications network 103-2 are technologically different from each other, meaning that they require some form of encoding, transcoding, or other manner of protocol conversion in order to carry speech originating at microphone 101 to a final destination at speaker 105-2. The protocol conversion is performed by gateway/transcoder 106. For example, telecommunications network 103-1 is a network that uses G.711 mu-law to encode speech and telecommunications network 103-2 is a network that uses G.729 to encode speech, wherein G.711 and G.729 are ITU-T standard voice encoding protocols.
Legacy receiver 104-1 is an apparatus that receives transmissions from telecommunications network 103-1 and produces output for speaker 105-1. Legacy receiver 104-1 is described in more detail in FIG. 3.
Speakers 105-1 and 105-2 are apparatuses that output audio signals (e.g., speech, music, an audio stream, etc.).
Gateway/transcoder 106 is an apparatus that is equipped for interfacing a network with another network that uses different protocols. Gateway/transcoder 106 performs the encoding, decoding, transcoding, or any protocol conversions necessary to allow transmissions from telecommunications network 103-1 to reach destinations on telecommunications network 103-2 or to traverse telecommunications network 103-2 for other destinations beyond.
Legacy receiver 104-2 is an apparatus that receives transmissions from telecommunications network 103-2 and produces output for speaker 105-2. Legacy receiver 104-2 is described in more detail in FIG. 3. Although legacy receiver 104-2 is generally analogous to legacy receiver 104-1 described above, it should be noted that in the illustrative embodiment wherein telecommunications network 103-1 and telecommunications network 103-2 are technologically diverse as to voice encoding protocols, legacy receiver 104-1 and legacy receiver 104-2 are correspondingly technologically diverse, each legacy receiver being capable of decoding the respective encoded signal received from the respective network that uses the network's respective voice encoding protocol.
Speaker 105-2 is identical to speaker 105-1. Speaker 105-2 receives a payload from legacy receiver 104-2. In the present specification, all references to speaker 105-2 equally apply to speaker 105-1.
FIG. 2 depicts a schematic diagram of the salient portions of legacy transmitter 102 in the prior art. Legacy transmitter 102 comprises: analog-to-digital converter 201 and channel encoder 202.
Analog-to-digital converter 201 is an apparatus that converts an analog audio signal received from microphone 101 to a digital signal. Analog-to-digital converter 201 converts an analog signal from microphone 101 to a 64-kilobit-per-second (8,000 samples per second, 8 bits per sample) pulse-code modulation (“PCM”) signal. A PCM signal is a digital representation of an analog signal whose magnitude is sampled at uniform intervals and quantized into a digital format, resulting in a digital speech signal.
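For illustration, the sampling-and-quantization arithmetic can be sketched as follows. Uniform quantization is shown for simplicity; G.711 itself uses logarithmic (mu-law or A-law) quantization, so this is a simplification rather than a bit-exact model of converter 201.

```python
SAMPLE_RATE = 8_000    # samples per second
BITS_PER_SAMPLE = 8

def quantize(x):
    """Map an analog amplitude in [-1.0, 1.0] to an 8-bit code (0..255).
    Uniform quantization shown for simplicity; G.711 is logarithmic."""
    code = int((x + 1.0) / 2.0 * (1 << BITS_PER_SAMPLE))
    return max(0, min(code, (1 << BITS_PER_SAMPLE) - 1))

bit_rate = SAMPLE_RATE * BITS_PER_SAMPLE   # 64,000 bit/s, as stated above
```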
Channel encoder 202 encodes the digital speech signal from analog-to-digital converter 201 into a format suitable for transmission across telecommunications network 103-1, i.e., an encoded speech signal. Illustratively, channel encoder 202 is a G.711 mu-law codec for a North American telecommunications network. It will be clear to those having ordinary skill in the art how to make and use other embodiments of channel encoder 202 that encode to another voice encoding protocol, such as A-law G.711, G.722, G.729, etc., as appropriate for telecommunications network 103-1.
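The mu-law companding curve underlying G.711 can be sketched as follows. The standard itself uses a segmented (piecewise-linear) approximation of this continuous formula, so the sketch illustrates the companding principle rather than the bit-exact codec.

```python
import math

MU = 255.0  # mu parameter used in North American G.711


def mu_law_compress(x):
    """Compand a linear sample in [-1, 1] onto the mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Invert the companding at the decoding end."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

The compression step allocates finer resolution to quiet sounds than to loud ones, which is why 8 bits per sample suffice for telephone-quality speech.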
It will be clear to those having ordinary skill in the art that some channel encoders compress the received signal to a lower bit rate than that of the received signal, in conformance with the respective voice encoding protocol. It will be clear to those having ordinary skill in the art that in this context a “channel encoder” is also known in the art as a coder-decoder, or a compressor-decompressor, or a “codec,” or an audio compressor, or an encoder-decoder, or an “endec” device, etc.
FIG. 3 depicts a schematic diagram of the salient portions of a legacy receiver in the prior art. It will be clear to those having ordinary skill in the art that “Speaker 105-j” means any one of the speakers depicted in FIG. 1, i.e., speaker 105-1 or speaker 105-2. It will be clear to those having ordinary skill in the art that “Telecommunications Network 103-i” means any one of the networks depicted in FIG. 1, i.e., telecommunications network 103-1 or 103-2.
Legacy receiver 104-j comprises: channel decoder 301-j and digital-to-analog converter 302-j, wherein j∈{1, 2}.
Channel decoder 301-j decodes the encoded speech signal received from telecommunications network 103-i into an estimate of the digital speech signal. Thus, channel decoder 301-1 is a G.711 mu-law codec for a North American voice network when connected to telecommunications network 103-1. Channel decoder 301-2 is a G.729 codec when connected to telecommunications network 103-2. It will be clear to those having ordinary skill in the art how to make and use channel decoder 301-j that decodes from at least one voice encoding protocol, such as A-law G.711, G.722, G.729, etc., as appropriate for the network connected to channel decoder 301-j. 
It will be clear to those having ordinary skill in the art that some channel decoders decompress the received encoded speech signal from a lower received bit rate to a higher bit rate, in conformance with the received voice encoding protocol, typically into a PCM signal. It will be clear to those having ordinary skill in the art that in this context a “channel decoder” is also known in the art as a coder-decoder, or a compressor-decompressor, or a “codec,” or an audio decompressor, or an encoder-decoder, or an “endec” device, etc.
Digital-to-analog converter 302-j is a digital-to-analog converter that converts an estimate of the digital speech signal received from channel decoder 301-j (typically a PCM signal) into an analog audio signal destined for speaker 105-j. 
FIG. 4 depicts a flowchart of the salient tasks associated with telecommunications system 100 in the prior art.
At task 401, a speech input is collected. It will be clear to those having ordinary skill in the art how to collect speech input.
At task 402, the speech input is converted from analog to a digital speech signal, typically PCM.
At task 403, the digital speech signal is channel encoded for transmission to the network.
At task 404, the encoded speech signal is transported across the network.
At task 405, the encoded speech signal reaches a decision point, i.e., whether it has reached a gateway/transcoder or a final destination. In the case where a gateway/transcoder has been reached, task 406 follows. In the case where a final destination has been reached, task 407 follows.
At task 406, the encoded speech signal undergoes transcoding to another voice encoding protocol. It will be clear to those having ordinary skill in the art that “transcoding” in this context is well known in the art as any one of several methods of converting a first encoded speech signal based on one type of voice encoding to another encoded speech signal based on another type of voice encoding. Task 406 is followed by transport across another network at task 404, and the process repeats until the transmitted encoded speech signal reaches a final destination at task 405.
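The transcoding of task 406 is commonly realized as a “tandem” operation: decode the incoming payload to a linear representation, then re-encode for the outbound network. In the sketch below, the codec objects are hypothetical stand-ins exposing encode()/decode() methods; a real gateway/transcoder would use actual G.711 and G.729 codec implementations.

```python
class OffsetCodec:
    """Hypothetical stand-in for a voice codec: 'encoding' adds a fixed
    offset to each sample and 'decoding' removes it."""
    def __init__(self, offset):
        self.offset = offset

    def encode(self, linear):
        return [s + self.offset for s in linear]

    def decode(self, payload):
        return [s - self.offset for s in payload]


def transcode(payload, src_codec, dst_codec):
    """Tandem transcoding: decode with the inbound network's codec,
    then re-encode with the outbound network's codec."""
    linear = src_codec.decode(payload)
    return dst_codec.encode(linear)
```

The stand-in codecs are lossless, so the round trip is exact here; with lossy codecs such as G.729, each tandem stage introduces additional distortion, which is the degradation described above.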
At task 407, the encoded speech signal that reached a final destination at task 405 is decoded. The resultant signal is an estimate of the digital speech signal created at task 403.
At task 408, the estimate of the digital speech signal is converted to analog speech.
At task 409, the analog speech is output.
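The flow of FIG. 4 can be summarized in a single sketch. All names here are hypothetical stand-ins: the converters are identity functions and the codecs are lossless, so the sketch shows only the ordering of the tasks, not the signal degradation discussed above.

```python
def analog_to_digital(speech):          # task 402 stand-in
    return list(speech)


def digital_to_analog(digital):         # task 408 stand-in
    return list(digital)


class IdentityCodec:                    # codec stand-in
    def encode(self, digital):
        return list(digital)

    def decode(self, payload):
        return list(payload)


def carry_speech(speech, codecs):
    """Tasks 401-409: encode at the origin, transcode at each gateway
    (decode with the previous network's codec, re-encode with the
    next network's), and decode at the final destination."""
    payload = codecs[0].encode(analog_to_digital(speech))  # tasks 402-403
    for prev, nxt in zip(codecs, codecs[1:]):              # tasks 404-406
        payload = nxt.encode(prev.decode(payload))
    return digital_to_analog(codecs[-1].decode(payload))   # tasks 407-409
```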