1. Field of the Invention
The present invention is generally related to video streaming and, more particularly, to methods and systems for implementing digital video cassette recording (VCR) functionality, such as random access, fast-forward, and fast-backward, on a video streaming system.
2. Description of the Related Art
Today's multimedia technologies allow network service providers to offer versatile services such as home shopping, games, video surveillance, and movies on demand. In these applications, video streaming technology plays an important role in media delivery. A video streaming system should be capable of delivering concurrent video streams to a large number of users. Realizing such a system presents several challenges, such as the need for high storage capacity and throughput in the video server and for high network bandwidth to deliver a large number of video streams. In addition, with the proliferation of online multimedia content, it is highly desirable that multimedia streaming systems support effective and fast browsing.
A key technique that enables fast and user-friendly browsing of multimedia content is to provide full VCR functionality. The set of VCR functions includes forward, backward, stop (and return to the beginning), pause, step-forward, step-backward, fast-forward, fast-backward, and random access. This set of functions gives users complete control over the session presentation and is also useful for other applications such as video editing.
With the establishment of video coding standards, many video sequences for streaming applications are expected to be encoded in MPEG (Moving Picture Experts Group) or H.26x (ITU-T Recommendation H.261 or H.263) formats. However, implementing full VCR functionality with MPEG/H.26x coded video is not a trivial task. MPEG/H.26x video compression is based on motion-compensated predictive coding with an I-P or I-B-P frame structure, where I, P, and B frames are intra-coded, predictive, and bidirectionally predicted (interpolated) frames, respectively.
FIG. 1 shows a group-of-pictures (GOP) structure of MPEG. The I-P and I-B-P frame structures allow a straightforward realization of the forward-play function, but impose several constraints on other trick modes such as random access, backward play, fast-forward play, and fast-backward play, for the reasons discussed below.
With the I-B-P structure, decoding a P-frame first requires decoding the preceding I/P-frames on which it depends, and decoding a B-frame first requires decoding the I/P-frames both before and after it. A straightforward implementation of backward play is for the decoder to decode the whole GOP, store all the decoded frames in a large buffer, and play them backward. However, this requires a huge buffer in the client machine (e.g., an N-frame buffer if the GOP size is N), which is not desirable. Alternatively, the decoder can decode the GOP up to the current frame to be displayed, then return to the start of the GOP and decode again up to the next frame to be displayed. This avoids the huge buffer but requires the client machine to operate at an extremely high speed (up to N times the normal decoding speed), which is also not desirable. Both approaches quickly become impractical as the GOP size grows.
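The cost of the two backward-play strategies can be counted for a simple case. The sketch below assumes an I-P GOP of N frames in which frame k (0-based, display order) depends on frames 0..k, so decoding it from the I-frame takes k + 1 decode operations. The function name and the returned keys are illustrative only.

```python
def backward_play_cost(gop_size: int) -> dict:
    """Illustrative cost model for playing one I-P GOP backward.

    Assumes frame k (0-based, display order) depends on frames 0..k,
    so decoding it starting from the I-frame takes k + 1 decodes.
    """
    # Strategy 1: decode the whole GOP once and keep every decoded frame.
    buffer_decodes = gop_size
    buffer_frames = gop_size

    # Strategy 2: for each displayed frame (last to first), re-decode the GOP
    # from its I-frame up to that frame; needs only a small, constant number
    # of frame buffers, but many more decode operations.
    redecode_decodes = sum(k + 1 for k in range(gop_size))  # N(N+1)/2
    # Decodes needed within a single display interval for the last frame.
    redecode_peak = gop_size

    return {
        "buffer_decodes": buffer_decodes,
        "buffer_frames": buffer_frames,
        "redecode_decodes": redecode_decodes,
        "redecode_peak_per_frame": redecode_peak,
    }
```

For N = 15, the buffering strategy stores 15 decoded frames, while the re-decoding strategy performs 120 decodes to show 15 frames and must decode up to 15 frames within one display interval, which matches the "up to N times" figure above.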
Besides backward play, fast-forward, fast-backward, and random access also present difficulties. When a P- or B-frame is requested, all of its related reference frames (I/P-frames) must be sent over the network and decoded by the decoder. The network must therefore send these related frames in addition to the requested frame, at a rate that can be many times that required by normal forward play. When many clients request trick modes, network traffic may become much higher than in the normal forward-play situation. The client decoder also needs high computational capacity to decode all these extra frames.
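The extra frames needed for a single random-access request can be computed from the GOP pattern. This is a minimal sketch under stated assumptions: the pattern is one closed GOP in display order (for example "IBBPBBPBBP") that starts with an I-frame and ends with an anchor frame, P-frames reference the previous I/P anchor, and B-frames reference the previous and next anchors. The function name is hypothetical.

```python
def required_frames(pattern: str, k: int) -> list[int]:
    """Return display-order indices that must be decoded to show frame k.

    Assumptions (illustrative): `pattern` is one closed GOP in display order,
    it starts with "I" and ends with "I" or "P"; P-frames reference the
    previous anchor; B-frames reference the previous and next anchors.
    """
    if pattern[0] != "I" or pattern[-1] not in "IP":
        raise ValueError("sketch assumes the GOP starts with I and ends with I/P")

    def deps(i: int) -> set[int]:
        kind = pattern[i]
        if kind == "I":
            return {i}  # intra-coded: no inter-frame dependency
        prev_anchor = max(j for j in range(i) if pattern[j] in "IP")
        if kind == "P":
            return deps(prev_anchor) | {i}
        if kind == "B":
            next_anchor = min(j for j in range(i + 1, len(pattern))
                              if pattern[j] in "IP")
            return deps(prev_anchor) | deps(next_anchor) | {i}
        raise ValueError(f"unknown frame type {kind!r}")

    return sorted(deps(k))


print(required_frames("IBBPBBPBBP", 8))  # -> [0, 3, 6, 8, 9]
```

In this example, showing the single B-frame at index 8 requires sending and decoding five frames.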
One possible solution for fast play is to send only the I-frames during trick modes while still encoding the video with the I-B-P or I-P structure. However, if an application uses a very large GOP size or requires precise frame-level access, sending only I-frames may not be acceptable. This method also does not solve the problem of backward play.
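The precision limit of I-frame-only trick modes follows directly from the GOP size. The sketch below assumes a fixed GOP size with an I-frame at every multiple of `gop_size`; both function names are hypothetical.

```python
def nearest_i_frame(target: int, gop_size: int) -> tuple[int, int]:
    """Return (I-frame index, access error in frames) for an I-frame-only seek.

    Assumes an I-frame at every multiple of gop_size.
    """
    i_frame = (target // gop_size) * gop_size
    return i_frame, target - i_frame


def i_frame_fast_forward(start: int, total: int, gop_size: int) -> list[int]:
    """Frames sent in an I-frame-only fast-forward beginning at `start`.

    The step between displayed frames is always gop_size.
    """
    first = -(-start // gop_size) * gop_size  # round up to the next I-frame
    return list(range(first, total, gop_size))
```

With a GOP size of 15, a request for frame 37 can only be served from frame 30, an error of 7 frames; in general the error can reach `gop_size - 1` frames, and fast-forward can only advance in steps of `gop_size`.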
Another way to implement VCR functionality is to encode every frame of the video as an I-frame. This results in the lowest complexity requirement for client machines. However, it requires very large server storage and network bandwidth, because I-frames produce high bit rates. Since network bandwidth is usually the greatest concern, it is preferable to encode the video with the I-B-P or I-P structure, which achieves high compression ratios for transport over a network with minimal bandwidth resources.
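The bandwidth trade-off can be expressed with a simple average-rate model. This model is an illustration, not part of the original text: S_I and S_P are assumed average sizes (in bits) of I- and P-frames, N is the GOP size of an I-P structure, and f is the frame rate.

```latex
R_{\text{IP}} = f\,\frac{S_I + (N-1)\,S_P}{N}, \qquad
R_{\text{I}} = f\,S_I, \qquad
\frac{R_{\text{I}}}{R_{\text{IP}}} = \frac{N\,S_I}{S_I + (N-1)\,S_P}
```

Whenever S_P < S_I and N > 1, the ratio exceeds 1, and as N grows it approaches S_I / S_P, so an all-I encoding costs roughly that factor more bandwidth than a long-GOP I-P encoding.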
Therefore, methods and systems consistent with the present invention are directed to implementing VCR functions for MPEG/H.26x compressed video in video-on-demand (VOD) or streaming video applications, while minimizing extra network traffic and video decoder complexity, and retaining a desirable quality of decoded pictures.
Methods and systems consistent with the present invention provide an encoded video stream from a server to a client over a network.
The server has a memory for storing a forward-encoded bit-stream and a reverse-encoded bit-stream for video data. The forward-encoded bit-stream includes first frames encoded without inter-frame dependencies and second frames encoded depending on forward-direction preceding frames, and the reverse-encoded bit-stream includes third frames encoded without inter-frame dependencies and fourth frames encoded depending on reverse-direction preceding frames. The server reads out selected frames among the first, second, third, and fourth frames in accordance with a request with a video cassette recording (VCR) function from the client, and transmits the selected frames to the client.
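The server-side selection can be sketched as choosing, for each request, which of the two stored bit-streams to read from. This is a minimal illustration under assumptions not stated in the text: both streams use a fixed GOP size, forward GOPs start with an I-frame at their first frame, reverse GOPs start (in reverse order) at their last frame, and only forward play and backward play are modeled. `Frame`, `frame_kind`, and `select_frames` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    stream: str  # "F" = forward-encoded bit-stream, "R" = reverse-encoded
    index: int   # display-order frame number
    kind: str    # "I" (no inter-frame dependency) or "P" (predicted)


def frame_kind(stream: str, index: int, gop_size: int) -> str:
    # Hypothetical layout: forward GOPs begin with an I-frame at their first
    # frame; reverse GOPs begin (in reverse order) at their last frame.
    anchor = 0 if stream == "F" else gop_size - 1
    return "I" if index % gop_size == anchor else "P"


def select_frames(mode: str, current: int, count: int,
                  total: int, gop_size: int) -> list[Frame]:
    """Choose frames to transmit for a VCR request (illustrative policy only)."""
    if mode == "play":        # forward play: next frames of the forward stream
        stream = "F"
        indices = range(current + 1, min(current + 1 + count, total))
    elif mode == "backward":  # backward play: previous frames of the reverse stream
        stream = "R"
        indices = range(current - 1, max(current - 1 - count, -1), -1)
    else:
        raise ValueError(f"unsupported mode {mode!r}")
    return [Frame(stream, i, frame_kind(stream, i, gop_size)) for i in indices]
```

For example, with a GOP size of 5, a backward-play request at frame 10 for three frames returns reverse-stream frames 9, 8, and 7, so the client never has to decode a forward GOP just to display it in reverse.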
The client, having transmitted the request to the server, receives the frames selected from the forward-encoded and reverse-encoded bit-streams and decodes them in the order received. If the received frames switch between the forward-encoded and reverse-encoded bit-streams, the client predicts the next frame in one direction from the currently decoded frame in the other direction.
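The client behavior at a stream switch can be illustrated by tracking only frame dependencies. In this sketch, which is an assumption-based illustration rather than the claimed decoder, each received frame is a `(stream, index)` pair, I-frame positions follow the same hypothetical layout as a fixed GOP size, a forward P-frame references the previous display frame, and a reverse P-frame references the next display frame. The most recently decoded frame serves as the reference regardless of which stream it came from. Pixel reconstruction, and any prediction mismatch that may arise when the reference comes from the other stream, is not modeled. `decode_sequence` is a hypothetical name.

```python
def decode_sequence(frames: list[tuple[str, int]], gop_size: int) -> list[int]:
    """Simulate decoding received frames in arrival order.

    Each frame is (stream, index): stream "F" (forward-encoded) or "R"
    (reverse-encoded), index in display order. Returns the indices decoded.
    Raises ValueError if a predicted frame's reference is not available.
    """
    last = None  # index of the most recently decoded frame, from either stream
    shown = []
    for stream, index in frames:
        anchor = 0 if stream == "F" else gop_size - 1  # hypothetical I positions
        if index % gop_size != anchor:                 # predicted (P) frame
            ref = index - 1 if stream == "F" else index + 1
            if last != ref:
                raise ValueError(
                    f"frame {stream}{index} needs reference {ref}, have {last}")
            # The reference may have been decoded from the other stream; this
            # is the cross-direction prediction at a stream switch.
        last = index
        shown.append(index)
    return shown
```

For example, forward play of frames 5 to 8 followed by a switch to backward play decodes `[("F", 5), ("F", 6), ("F", 7), ("F", 8), ("R", 7), ("R", 6)]` without requesting any extra frames: reverse frame 7 is predicted from forward-decoded frame 8.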