1. Field of the Invention
The present invention relates generally to improved methods and apparatus for recording and playing back speech, and more particularly, to the recording and playing back of two way telephonic conversations using a digital wireless phone.
2. Discussion of the Prior Art
Many telephone answering devices (TADs, also known as "answering machines") provide a "two way conversation record" feature, which allows the user to record both the near-end and far-end sides of a telephone conversation and later play back the recorded two way conversation. When the user of the TAD, also referred to as the near-end party, activates the conversation record feature of the TAD, a telephone conversation between the user and a far-end party is recorded as follows. The far-end and the near-end audio speech signals are mixed (i.e., added together with appropriate gains), and the resulting mixed audio signal is recorded in the same way that a message is recorded on the TAD. In particular, if the TAD is digital, the mixed audio signal is fed into a mixed speech encoder for conversion to speech packets which are written or saved into a memory.
Play back of the recorded mixed conversation in the TAD is performed in exactly the same way as a playback of a recorded message. More particularly, if the TAD is digital, the stored speech packets are fed into a speech decoder which synthesizes and reconstructs the mixed audio signal for input to audio output circuits including a speaker.
Mobile wireless telephones are often used in situations where it is difficult for the user to write notes of important details of telephonic conversations. Therefore, the feature of two way conversation recording is even more useful for wireless phones than for stationary, wired phones or TADs.
Many wireless phones include a nonvolatile memory ("recording memory") and a processing unit similar to the ones found in a digital TAD. Thus, adding a conversation recording capability to wireless phones is easily achieved with minimal cost, while providing a great benefit to the user.
The method of conversation recording used in TADs is suitable for implementation in analog wireless phones. However, using this recording method in digital wireless phones suffers from various drawbacks, such as exceeding the capability of typical digital signal processors (DSPs) included in wireless digital phones. Thus, a more complex and costly DSP is required. Further, the quality of the played-back recorded speech is degraded due to tandeming, as will be described below.
FIG. 1 shows a conventional wireless digital telephone 10 having a DSP 12 which performs numerous functions in a very short time to maintain an acceptable quality of telephonic conversations. During a conversation or call, the DSP functions include filtering, coding, decoding, error correction, tone generation, echo cancellation, muting and voice activity detection. These and other tasks increase the workload of the DSP, referred to as MIPS usage of the DSP, where MIPS is the acronym for million instructions per second.
The DSP 12 communicates with a nonvolatile memory 14 for storing information, a codec 16 for converting signals between digital and analog formats, a microcontroller 18 for managing operation of the phone such as detection of pressed keys, and a transceiver 20. The transceiver 20 is connected to an antenna 22 for the transmission and reception of signals. The codec 16 is also connected to a microphone 24 and a speaker 26.
The microcontroller 18 and transceiver 20 are connected to a system bus 28. Other elements of the phone 10 are connected to the system bus 28, such as a hardware control unit 30, a hardware monitor 32, a display 34, a keypad 36 and memory units which include a read only memory (ROM) 38 and a random access memory (RAM) 40.
FIG. 2 shows in greater detail modules used for recording and playback of telephonic conversations in a conventional digital wireless telephone 50. These modules include a cellular phone operation module 52, a conversation record module 54, and a conversation playback module 56.
The operation module 52 includes the codec 16 which is connected to the microphone 24 and to the speaker 26. Speech or audio signals from the near-end, i.e., the phone user, are provided from the microphone 24 to the codec 16, which digitizes the speech and provides digital transmission speech samples to a speech encoder 58. The speech encoder 58 encodes the digital transmission speech samples into a compressed form and provides digital transmission speech packets to a transmission channel encoder 60, which performs error correction encoding and outputs a transmission bit stream to the transceiver 20 for modulation and transmission to the far-end.
Modulated radio frequency (RF) signals are received by the transceiver 20 from the far-end through the antenna 22 shown in FIG. 1. The received bit stream undergoes the reverse operations of the transmission bit stream. In particular, the received bit stream is decoded by a reception channel decoder 62 to provide reception digital speech packets to a reception speech decoder 64. The reception speech decoder 64 converts the reception digital speech packets to reception digital speech samples which are provided to the codec 16 for conversion to analog form and output to audio circuits and the speaker 26 for playback, as is typically performed in wireless communications.
Recording two way conversations between a far-end user and a near-end user, namely, the user of the conventional record and playback wireless telephone 50, is performed as follows. The transmission speech samples from the codec 16 and the reception speech samples from the reception speech decoder 64 are provided to the conversation record module 54 of the conventional record and playback phone 50.
In particular, transmission speech samples (i.e., near-end samples) and the reception speech samples (i.e., far-end samples) are provided to a mixer 70 through respective amplifiers 72, 74. The mixer 70 combines the near-end samples with the far-end samples and outputs mixed speech samples to a mixed speech encoder 76. The mixed speech encoder 76 encodes the mixed speech samples to form mixed speech packets, which are provided to the nonvolatile memory 14 for storage. FIG. 3 illustrates typical contents 90 of the nonvolatile memory 14, namely, the recorded mixed conversation, where each stored frame 92 is a mixed speech packet formed from the encoding of the mixture or combination of the far-end and near-end speech samples.
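By way of illustration only, the gain and mixing operation performed by the amplifiers 72, 74 and the mixer 70 may be sketched in Python as follows. The function name `mix_frames`, the default gains of 0.5, the 16-bit sample format and the saturation step are assumptions made for this sketch and are not taken from the conventional phone 50.

```python
from typing import List, Sequence

# Assumed sample format for this sketch: signed 16-bit PCM.
INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(near: Sequence[int], far: Sequence[int],
               near_gain: float = 0.5, far_gain: float = 0.5) -> List[int]:
    """Scale near-end and far-end samples by their gains and add them.

    The two gains stand in for amplifiers 72 and 74; the addition stands
    in for mixer 70. Results are saturated to the 16-bit range.
    """
    if len(near) != len(far):
        raise ValueError("near-end and far-end frames must have equal length")
    mixed = []
    for n, f in zip(near, far):
        s = int(round(near_gain * n + far_gain * f))
        mixed.append(max(INT16_MIN, min(INT16_MAX, s)))
    return mixed
```

In such a sketch, the returned list corresponds to the mixed speech samples that would be passed to the mixed speech encoder 76.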
Returning to FIG. 2, the recorded conversations are stored as a mixture or combination of the far-end and near-end speech packets, in the nonvolatile memory 14. Playback of these conversations is provided by reading out the stored packets which are then provided to a mixed speech decoder 78 of the conversation playback module 56. The mixed speech decoder 78 decodes the mixed speech packets and outputs mixed speech samples. The mixed speech samples are provided to the speaker 26 through a switch 80 and the codec 16, which converts the mixed speech digital samples to analog audio signals.
The switch 80 selectively connects the mixed speech digital samples or the far-end reception speech samples from the speech decoder 64 to the codec 16, under the control of the DSP, for example, in response to an input from the user of the phone 50 to play back the recorded conversation. Similar to playback initiation, recording may be initiated in response to a user input, such as pressing a key on the keypad 36 (FIG. 1). Recording may be enabled in response to the user input using several approaches known in the art. For example, switches may be included at the inputs of the amplifiers 72 and 74 for controllably connecting and disconnecting these inputs to and from the operation module 52, or in other words, to and from the transmission speech encoder 58 and reception speech decoder 64, respectively.
Typically, the DSP 12 (FIG. 1) includes all the elements shown in FIG. 2, except for the transceiver 20, the nonvolatile memory 14, the microphone 24, the speaker 26, and the codec 16. The conventional wireless digital phone 50 is equipped, by default, with the operation module 52, which performs speech encoding and decoding. Speech encoding and decoding are also the engine of a digital TAD. Thus, providing record and playback features to a digital phone takes advantage of the pre-existing operation module 52 and only requires the addition of the record module 54 and the playback module 56, which share the nonvolatile memory 14.
The conventional wireless digital phone 50 requires an additional encoder, namely, the mixed speech encoder 76, so that an additional instance of encoding is performed to encode the mixed speech samples. However, running a speech encoder consumes a large portion of the capacity of the DSP, typically accounting for about 40% of the real time load of the DSP, which operates at near maximum capacity during a normal telephone conversation. In particular, during a normal telephone conversation, the DSP supports the running of the two transmission encoders, namely, the speech encoder 58 and the channel encoder 60, as well as the two reception decoders, namely, the channel decoder 62 and the speech decoder 64, and various modem functions.
Additionally supporting the running of the mixed speech encoder 76, which will account for about 40% of the real time load on the DSP, is likely to exceed the real time capacity of the DSP used in conventional wireless telephones. Thus, to meet the computational requirements during conversation recording as outlined above, a more powerful DSP is required, which increases both the cost and power consumption of the system. Such increases are highly undesirable.
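The load argument above may be made concrete with a short arithmetic sketch in Python. Only the figure of about 40% for one speech encoder comes from the discussion above. The baseline of 90% used in the usage example is a hypothetical stand-in for "near maximum capacity", and the function names are likewise chosen for illustration only.

```python
# Approximate real time load of one speech encoder, in percent (from the text).
SPEECH_ENCODER_LOAD_PCT = 40


def load_with_recording(baseline_pct: int,
                        encoder_pct: int = SPEECH_ENCODER_LOAD_PCT) -> int:
    """Return the DSP load, in percent, when one more speech encoder runs."""
    return baseline_pct + encoder_pct


def exceeds_capacity(total_pct: int) -> bool:
    """True when the required load is above 100% of real time capacity."""
    return total_pct > 100


# Hypothetical baseline: the DSP is assumed to be 90% loaded during a call.
total = load_with_recording(90)
print(total, exceeds_capacity(total))
```

Under the assumed 90% baseline, the sketch yields a required load of 130%, which is above the real time capacity of the DSP. Any baseline above 60% leads to the same conclusion.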
Another concern relates to the quality of the recorded conversation. In particular, the quality of the recording of the far-end reception speech, which is usually more important to the user than the recording of the near-end transmission speech, is degraded due to being encoded and decoded twice. In order to generate the mixed signal from the mixer 70 as outlined above, the far-end and near-end speech signals are added. However, the far-end signal which is available at the phone 50 has already been subjected to speech encoding at the far-end (via a speech encoder similar to the transmission speech encoder 58 in the near-end phone 50 shown in FIG. 2) and to decoding at the phone 50 by the reception speech decoder 64. Thus, when the mixed signal is encoded by the mixed speech encoder 76 for storage in the memory 14 on the phone 50, its far-end component is encoded for a second time, having been encoded for the first time by the transmission encoder of the far-end phone for reception by the transceiver 20 of the near-end phone 50. This double encoding is called "tandeming" and is known to degrade the output speech quality upon playback.
Accordingly, there is a need for a wireless telephone that provides high quality record and playback capability of two way conversations without the need for a more powerful DSP, thus minimizing cost and power consumption. Such an approach also advantageously serves to increase the time between battery charges.