For quite some time, telephone systems, including private telephone systems, wireless telephone service provider systems, and PSTN service provider systems, have included voice mail systems.
The voice mail functions of a telephone system are typically provided by a voice mail server. Upon the occurrence of certain events, such as a telephone line being busy or unanswered, the telephone system will signal the call (in a process typically referred to as “roll-over”) to the voice mail server. The voice mail server will receive the telephone call, thereby opening a recording session.
During the recording session, the voice mail server will prompt the caller to leave a message, capture the audio stream from the caller, and store the captured audio stream for subsequent play back to the voice mail box owner. The recorded message is typically stored in digital form on magnetic media.
At some later point in time, the voice mail box owner may call the voice mail server to establish a play back session. During the play back session, the server will prompt the voice mail box owner to authenticate himself or herself, retrieve the stored message, and generate an audio stream to the telephone system from which the voice mail box owner called into the server.
If the voice mail server is part of a circuit switched telephone system, both the recording session and the play back session will take place over circuit switched channels. During recording, the voice mail server will capture the analog audio or digital audio from the circuit switched channel. Digital audio can be stored in its received form and analog audio is readily digitized using known A/D converter systems.
If the voice mail server is part of a VoIP telephone system, both the recording session and the play back session will take place over IP channels. More specifically, when setting up the recording session, the server and the remote VoIP device (used by the caller) will negotiate a specific compression algorithm. Then, during the recording session, the voice mail server will receive a sequence of RTP packets over a UDP/IP channel. Each RTP packet includes one or more audio frames compressed using the negotiated compression algorithm. Each compressed audio frame represents a fixed time interval (on the order of 10 milliseconds) of digital audio. The server sequences and decompresses each compressed audio frame to regenerate digital audio for storage.
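By way of illustration only, the receiving and sequencing of RTP packets described above may be sketched as follows. The sketch assumes the fixed 12-byte RTP header layout of RFC 3550. The names RtpPacket, parse_rtp, and sequence_packets are hypothetical, and header extensions, padding, and wraparound of the 16-bit sequence number are ignored. Decompression is omitted because it depends on the negotiated compression algorithm.

```python
import struct
from typing import NamedTuple

# Fixed part of the RTP header: flags, marker/payload type,
# sequence number, timestamp, SSRC (12 bytes, network byte order).
RTP_HEADER = struct.Struct("!BBHII")


class RtpPacket(NamedTuple):  # hypothetical container for this sketch
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(datagram: bytes) -> RtpPacket:
    """Split one UDP datagram into RTP header fields and payload."""
    if len(datagram) < RTP_HEADER.size:
        raise ValueError("datagram shorter than the RTP header")
    flags, marker_pt, seq, ts, ssrc = RTP_HEADER.unpack_from(datagram)
    if flags >> 6 != 2:
        raise ValueError("not RTP version 2")
    # Skip any CSRC identifiers (4 bytes each) that follow the fixed header.
    offset = RTP_HEADER.size + 4 * (flags & 0x0F)
    return RtpPacket(marker_pt & 0x7F, seq, ts, ssrc, datagram[offset:])


def sequence_packets(packets):
    """Order packets by sequence number and drop duplicates.

    Assumption: the sequence number does not wrap within the batch.
    """
    unique = {p.sequence: p for p in packets}
    return [unique[s] for s in sorted(unique)]
```

In such a sketch, the payloads of the ordered packets would then be handed, one compressed audio frame at a time, to whatever decoder corresponds to the negotiated compression algorithm.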
At a later time, when setting up the play back session, the server and the remote VoIP device (used by the voice mail box user) will negotiate a specific compression algorithm, which may be different from the compression algorithm used during the recording session. Then, during the play back session, the voice mail server will generate a sequence of RTP packets for sending to the remote VoIP device over a UDP/IP channel. Each RTP packet includes one or more audio frames compressed using the negotiated compression algorithm. The remote device sequences and decompresses each compressed audio frame to generate acoustic audio for the voice mail box user.
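The generation of RTP packets during the play back session may likewise be sketched, under stated assumptions, as follows. The function name packetize and its parameters are hypothetical. The sketch places exactly one already-compressed audio frame in each packet, and the default timestamp step of 80 assumes an 8 kHz sampling clock with 10 millisecond frames. Neither assumption is required by the description above.

```python
import struct


def packetize(frames, payload_type, ssrc, first_seq=0, first_ts=0, ts_step=80):
    """Wrap each compressed audio frame in a minimal RTP packet.

    frames: iterable of bytes objects, one compressed frame each.
    ts_step: timestamp increment per frame (assumed 80 = 10 ms at 8 kHz).
    """
    packets = []
    for i, frame in enumerate(frames):
        header = struct.pack(
            "!BBHII",
            0x80,                                   # version 2, no padding/extension/CSRC
            payload_type & 0x7F,                    # marker bit left clear
            (first_seq + i) & 0xFFFF,               # 16-bit sequence number wraps
            (first_ts + i * ts_step) & 0xFFFFFFFF,  # 32-bit timestamp wraps
            ssrc,
        )
        packets.append(header + frame)
    return packets
```

Each resulting bytes object would be sent as the payload of one UDP datagram. The receiving device relies on the sequence number to restore order and on the timestamp to schedule play out.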
If the system is a hybrid type of system, it may support both: i) VoIP recording sessions and play back sessions; and ii) circuit switched recording and play back sessions. In such an embodiment, the session type of the recording session (e.g. VoIP or circuit switched) does not need to match the session type of the play back session.
In a separate field of development, technology for transmitting motion video over IP networks has been developed. During an IP video session, a transmitting video IP device will receive a sequence of image frames from a video camera, compress each image frame of the sequence, and transmit each compressed image frame to a receiving video IP device.
The receiving video IP device will decompress each compressed image frame of the sequence, and sequentially display each image frame of the sequence, to generate a motion video display.
The International Telecommunication Union (ITU) has recommended the H.263 standard, entitled “Video Coding for Low Bit Rate Communications”, as a standard for compressing motion video for telephony. The existence of a standard facilitates the development of video IP devices.
There exist significant differences between transmitting motion video over an IP channel and transmitting audio over an IP channel. First, each compressed audio frame represents digital audio over a short duration of time, whereas each compressed video frame represents an image of the video stream at a fixed instant in time. Second, each audio frame is encapsulated in a single RTP packet and can be decompressed without reference to any other frame. The significance of this is that if an audio frame is lost in transmission, the audio for the duration of time represented by the lost frame is lost, but neither the preceding audio frame nor the following audio frame is affected.
On the other hand, sequential video frames are mostly interdependent. Each video frame is compressed into either an independent frame or a dependent frame. The independent frames, referred to as “intra frames” or “i-frames”, can be decompressed to generate a complete image frame without reference to a preceding frame or a following frame. The dependent frames can be of two types. The first type may be referred to as “predictive frames” or “p-frames”. A p-frame represents the difference between such frame and the preceding p-frame or i-frame. As such, a p-frame cannot be decompressed without reference to the preceding p-frame or i-frame. The second type may be referred to as “bi-directional frames” or “b-frames”. A b-frame can only be decompressed with reference to one or more preceding i/p frames and optionally one or more following i/p frames.
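The dependency rules above may be illustrated by the following sketch, which marks which frames of a sequence can still be decompressed after some frames are lost. The function name decodable is hypothetical. The sketch assumes frames are listed in display order and simplifies the b-frame rule to the nearest preceding i/p frame and, where one exists, the nearest following i/p frame.

```python
def decodable(frame_types, lost):
    """Return a list of booleans: can frame i be decompressed?

    frame_types: string such as "IPBPIP" using 'I', 'P', and 'B'.
    lost: set of indices of frames lost in transmission.
    """
    n = len(frame_types)
    ok = [False] * n

    # Pass 1: reference frames. An i-frame needs only itself; a p-frame
    # also needs the preceding i-frame or p-frame to be decodable.
    prev_ref_ok = False
    for i, t in enumerate(frame_types):
        if t == "I":
            ok[i] = i not in lost
            prev_ref_ok = ok[i]
        elif t == "P":
            ok[i] = i not in lost and prev_ref_ok
            prev_ref_ok = ok[i]

    # Pass 2: b-frames (simplified rule, see the assumptions stated above).
    for i, t in enumerate(frame_types):
        if t != "B" or i in lost:
            continue
        before = next((j for j in range(i - 1, -1, -1) if frame_types[j] in "IP"), None)
        after = next((j for j in range(i + 1, n) if frame_types[j] in "IP"), None)
        ok[i] = before is not None and ok[before] and (after is None or ok[after])
    return ok
```

For example, with the sequence "IPPIP" and the second frame lost, the second and third frames cannot be decompressed, while the fourth frame, being an i-frame, restores the sequence.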
The ratio among i-frames, p-frames, and b-frames is not fixed, but depends on the encoding algorithm and on the video image content, or scene. As such, both the quantity of dependent frames between independent frames and the time duration between independent frames vary based on the video content.
A problem with interdependent frames is that if a frame is lost during transmission, all subsequent frames that depend thereon are also lost. The image on the receiving IP device will freeze until the next independent frame is received. Frame loss is further exacerbated by the fact that each image frame can be transmitted in multiple RTP packets. Loss of any RTP packet will cause the loss of the entire frame.
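The effect of carrying one image frame in multiple RTP packets can be estimated with the following sketch. The function name frame_loss_probability is hypothetical, and the sketch assumes that packets are lost independently of one another with equal probability, which real networks with bursty loss do not guarantee.

```python
def frame_loss_probability(packet_loss: float, packets_per_frame: int) -> float:
    """Probability that at least one packet of a frame is lost.

    Assumes independent packet losses, each with probability packet_loss.
    A frame survives only if every one of its packets arrives.
    """
    if not 0.0 <= packet_loss <= 1.0:
        raise ValueError("packet_loss must be between 0 and 1")
    if packets_per_frame < 1:
        raise ValueError("packets_per_frame must be at least 1")
    return 1.0 - (1.0 - packet_loss) ** packets_per_frame
```

Under these assumptions, a 1% packet loss rate yields a frame loss rate of roughly 9.6% when each frame spans ten packets.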
It has been proposed to combine video transmissions with voice mail server technology to provide video mail services. A problem exists in that implementing a video mail server causes the video message to be transmitted twice. The first transmission is from the caller to the video mail server. The second transmission is from the video mail server to a user “calling in” to retrieve video mail messages.
As such, frame loss exposure is doubled. Frames can be lost in the transmission from the caller to the video mail server, resulting in freeze frame periods stored on the video mail server. When the message is transmitted to the user, the freeze frame periods that already exist at the server will be sent to the user and, in addition, further freeze frame periods will be created due to lost frames in the transmission to the end user.
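The compounding of freeze frame periods across the two transmissions may be sketched as follows. The function names frozen_frames and end_to_end_frozen are hypothetical. The sketch assumes a stream containing only i-frames and p-frames, and assumes that the server stores and forwards the compressed frames it received without re-encoding them, so that a frame lost on either transmission is missing at the end user.

```python
def frozen_frames(frame_types, lost):
    """Count frames that cannot be displayed in an i-frame/p-frame stream.

    frame_types: string of 'I' and 'P' in transmission order.
    lost: set of indices of frames lost in transmission.
    """
    frozen = 0
    chain_ok = False  # True while an unbroken chain from the last i-frame exists
    for i, t in enumerate(frame_types):
        received = i not in lost
        if t == "I":
            chain_ok = received
        else:
            chain_ok = chain_ok and received
        if not chain_ok:
            frozen += 1
    return frozen


def end_to_end_frozen(frame_types, lost_hop1, lost_hop2):
    """Frozen frames seen by the end user after both transmissions."""
    return frozen_frames(frame_types, set(lost_hop1) | set(lost_hop2))
```

For the sequence "IPPPIPPP", losing the third frame on the first transmission freezes two frames, losing the sixth frame on the second transmission freezes three frames, and the end user experiences all five.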
As such, a need exists for a video mail solution that operates in conjunction with legacy video IP devices and does not suffer the disadvantages and impracticalities discussed above.