The present invention relates to a message storage device and a moving image message processor.
Described embodiments relate to a method and system for providing a video-enabled enhancement of audio voicemail features to users of videotelephony networks in an enterprise environment, particularly in large enterprises with 5000 or more people.
Electronic voice communications today are carried out by a wide range of devices, such as traditional land-line or fixed-line telephones, mobile phones and smartphones. Today, land-line telephones mainly have dual-tone multi-frequency signalling (DTMF) or Touch-tone keypads including numeral keys 0 to 9 as well as a “*” key and a “#” key together with, in enterprise environments, some extra function keys to, for example, transfer calls, put a caller on hold or access voicemail. Mobile telephones typically also have DTMF keypads, together with other function keys which vary in number and function from manufacturer to manufacturer and model to model. Some of these function keys may be user definable. Smartphones, on the other hand, typically have a touch-screen which defines keys whose number and function may be varied and defined by a user, by an app or application running on the smartphone, or by an external input. A DTMF keypad may be defined on the touch-screen, as may other arrangements.
Recently, real-time video communications (also including voice) have become more widespread and may be carried out using a great variety of devices including smartphones, typically using proprietary video networks such as Apple Facetime (registered trade mark) or Skype (registered trade mark) (both also usable from a standard desktop or laptop personal computer), as well as fixed-line systems often seen in enterprise environments such as (1) “all-in-one” personal videoconferencing telephones or video-enabled Internet protocol (IP) telephones, such as the Tandberg T150, that look much like a conventional fixed-line phone with the addition of a camera; (2) videoconferencing room systems or Integrated Services Digital Network (ISDN) videoconferencing devices, such as the Tandberg Telepresence T3; and (3) IP videoconferencing endpoints or software plug-ins for general-purpose computers connected to voice-only telephones, such as the Cisco VT Advantage. All of these devices have different functionality and capabilities, including different user interfaces and different screen sizes, resolutions and configurations.
A “video-voicemail” system is an arrangement in which a caller calling another person's device (the callee) is provided with a recorded message if the callee is not available, typically along the lines of, “This is Bob's phone. Please leave a message after the tone”. The caller is then able to record a message for the callee, such as “Please call me when you are free”. In a video-voicemail environment, these messages may include video, such as of the caller or callee speaking, as well as audio.
As a result of the great variation in accessing devices described above, compatibility of enterprise “video-voicemail” systems for video and audio calls has been problematic and a reliable service has not been available. This is one reason why such systems have not become widely accepted.
Known video/voicemail systems are typically tied to one specific video network such as Apple Facetime (registered trade mark), and are little more than a video equivalent of an answerphone where the caller can leave a message and the callee can retrieve it.
Advanced features (for example, folders, group mailboxes, message forwarding, and message multicast and broadcast) found in voice-only enterprise-class voicemail systems are not found in known video/voicemail systems.
Embodiments of the invention described herein address these problems of lack of cross-video-network compatibility and features to provide a video/voicemail system that may be used with great variation in the functionality of accessing devices or endpoints.
Video for video/voicemail systems requires large amounts of electronic data and is therefore compressed both for storage (to reduce the storage space required to manageable levels) and for transmission (to reduce the bit rates required to manageable levels).
There are differences in functionality between video endpoints. For example, there may be differences in video compression protocols that they can handle, and differences in permitted image sizes. Embodiments of the invention described herein provide a video/voicemail system to handle these differences effectively and efficiently.
Another reason why video-voicemail systems have not been widely accepted is usability.
In audio-only voicemail systems, audio data is typically stored uncompressed because little electronic data needs to be stored to represent the voicemail and it can be transmitted at a low bit rate. During playback of audio-only voicemail messages, users have become used to being able to fast forward or rewind the message precisely, for example to listen again to a telephone number that has been said. As voicemail data is typically uncompressed, this is simple to implement.
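By way of illustration only, the following minimal sketch shows why seeking in uncompressed audio is simple: the byte position of any instant follows directly from arithmetic on the sample rate. The function name and the default parameters (8 kHz, 16-bit, single-channel audio) are assumptions made for this example and are not taken from any particular voicemail system.

```python
def pcm_byte_offset(seconds, sample_rate_hz=8000, bytes_per_sample=2, channels=1):
    """Return the byte offset of the audio sample at `seconds` in a raw,
    uncompressed PCM stream (hypothetical defaults: 8 kHz, 16-bit, mono)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Index of the sample (one value per channel) that starts at this time.
    sample_index = int(seconds * sample_rate_hz)
    # Every sample occupies a fixed number of bytes, so no decoding is needed.
    return sample_index * bytes_per_sample * channels


# Jumping to 12.5 seconds into a message is a single multiplication.
offset = pcm_byte_offset(12.5)
```

Because every sample has the same size and does not depend on any other sample, playback may start at any computed offset without reading the data before it.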
In contrast, for video systems, as mentioned above, video data requires relatively large amounts of electronic data and thus it is compressed both for storage and transmission. The compression of the data is generally optimised to minimise bit rate and latency. For example, in the H.264 format, greatly simplified, compression is done by sending a starting frame (a key frame or intra frame) containing a full image, followed by difference frames which describe how parts of the image have changed since previous frames. In other words, key frames are decodable independently of other image frames, while subsequent frames can be decoded only with reference to other frames. Key frames require relatively large amounts of storage space and bandwidth or bit rate compared to difference frames. Thus, to reduce these requirements, the number of key frames is minimised. In a broadcast video system such as digital TV, key frames are sent at regular short intervals so that, for example, when someone changes to a new channel the TV receives a key frame after only a short delay and is therefore able to start displaying video. However, in a videoconferencing network the receiver is able to send messages to the transmitter. This enables the receiver to request that the transmitter send a key frame. Thus, the video network is able to avoid sending key frames at all unless they are actually needed, allowing the bit rate to be kept low.
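The relationship between key frames and difference frames described above can be sketched with a deliberately simplified toy model. This is not H.264, which uses far more elaborate prediction and entropy coding; here an image is assumed to be a flat list of pixel values, and the names `Frame`, `encode` and `decode` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Frame:
    is_key: bool
    pixels: Optional[List[int]] = None        # full image (key frames only)
    changes: Optional[Dict[int, int]] = None  # pixel index -> new value (difference frames only)


def encode(images: List[List[int]]) -> List[Frame]:
    """Encode the first image as a key frame and every later image as a
    difference frame that records only the pixels that changed."""
    frames: List[Frame] = []
    previous: Optional[List[int]] = None
    for image in images:
        if previous is None:
            frames.append(Frame(is_key=True, pixels=list(image)))
        else:
            changes = {i: new for i, (old, new) in enumerate(zip(previous, image)) if old != new}
            frames.append(Frame(is_key=False, changes=changes))
        previous = image
    return frames


def decode(frames: List[Frame]) -> List[List[int]]:
    """Rebuild every image in order; a difference frame can only be applied
    on top of the image reconstructed from the frames before it."""
    images: List[List[int]] = []
    current: Optional[List[int]] = None
    for frame in frames:
        if frame.is_key:
            current = list(frame.pixels)
        else:
            if current is None:
                raise ValueError("difference frame received before any key frame")
            current = list(current)
            for index, value in frame.changes.items():
                current[index] = value
        images.append(current)
    return images


images = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 9, 9, 0]]
frames = encode(images)
```

In this toy model the key frame carries all four pixel values, whereas each difference frame carries a single changed pixel, which mirrors why key frames cost more storage and bit rate than difference frames.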
A video stream stored in a voicemail system using a file formatted in this way may be readily played from start to finish. However, users need to fast forward and rewind video messages or, in other words, seek to a point in the video stream, and the methodology described above of storing and transmitting video data means that this is not possible in a way that provides a good user experience. This is because a required image at a particular point in time is likely to be represented only by a difference frame. Thus, to display the image represented at this point in time, the other frames required to decode the difference frame must be sought and themselves decoded. This is a time-consuming process and leads to jerky and/or slow “fast” forwarding or rewinding, giving a very poor user experience. Embodiments of the invention described herein address this problem to provide a good user experience and smooth fast forwarding and rewinding of video in voice/video mails.
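The cost of seeking described above can be illustrated with a short sketch. It assumes, as a simplification, that each difference frame depends only on the chain of frames back to the most recent key frame; real codecs such as H.264 permit more complex dependencies. The function name, the 30 frames-per-second figure and the example stream layouts are hypothetical and serve only to illustrate the problem, not any embodiment.

```python
from typing import List


def frames_needed_for_seek(is_key: List[bool], target: int) -> List[int]:
    """Return the indices of every frame that must be decoded in order to
    display the frame at index `target` (simplified dependency model)."""
    if not 0 <= target < len(is_key):
        raise IndexError("target frame is outside the stream")
    start = target
    # Walk backwards until a frame that can be decoded on its own is found.
    while start > 0 and not is_key[start]:
        start -= 1
    if not is_key[start]:
        raise ValueError("no key frame at or before the target frame")
    return list(range(start, target + 1))


# A ten-second message at an assumed 30 frames per second whose only key
# frame is the first frame, as may happen when key frames are sent on request.
sparse_stream = [True] + [False] * 299
cost_sparse = len(frames_needed_for_seek(sparse_stream, 299))

# The same message with a second key frame at index 250, for comparison.
denser_stream = list(sparse_stream)
denser_stream[250] = True
cost_denser = len(frames_needed_for_seek(denser_stream, 299))
```

Under these assumptions, displaying the last frame of the first stream requires decoding the whole message, whereas the second stream requires only the frames from index 250 onwards, which shows how the spacing of key frames governs the responsiveness of fast forwarding and rewinding.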