1. Field of the Invention
The present invention relates generally to video streaming and more particularly to dynamic encoding of multiple video streams onto a single video stream based on input from a user.
2. Description of the Prior Art
Video and audio streaming are well known in the art. Earlier, radio stations and other audio content were streamed to computers in real time. More recently, with the increase in processor speed, the improvement of compression algorithms, and the advent of more efficient ways to stream through networks, video streaming has become widespread. A typical encoder used in such systems is x264. Streaming in general means putting real-time audio or video content onto a network such as the Internet and receiving and playing that content at a remote station. The state of the art now allows streaming of even high definition (HD) video over the Internet.
Generally, streaming involves continuously putting packets containing content into the network and, with look-ahead buffering (called VBV buffers) at the receiving end, continuously feeding the content to a decoder. The buffering is necessary because packets typically do not arrive at the receiver continuously, but rather in bursts separated by periods with no packets. The buffering allows the receiver to recreate the real-time stream. The only requirement is that the average packet arrival rate be at least equal to the transmission rate.
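By way of illustration only, the effect of receiver-side buffering on bursty packet arrival can be sketched as follows. This is a minimal model under stated assumptions and not a description of any particular VBV implementation: the function name count_underruns, the arrival pattern, and the idea of a fixed startup delay measured in ticks are all hypothetical, and the decoder is assumed to consume exactly one packet per tick.

```python
def count_underruns(arrivals, startup_delay):
    """Count ticks on which the decoder finds the buffer empty.

    arrivals[t] is the number of packets arriving during tick t.
    The decoder is assumed (hypothetically) to consume one packet per
    tick, starting at tick `startup_delay`; earlier ticks only fill
    the buffer.
    """
    buffered = 0
    underruns = 0
    for tick, arrived in enumerate(arrivals):
        buffered += arrived
        if tick < startup_delay:
            continue  # still pre-filling the buffer, nothing is played yet
        if buffered > 0:
            buffered -= 1  # one packet handed to the decoder
        else:
            underruns += 1  # decoder starved on this tick
    return underruns


# Hypothetical bursty pattern: 12 packets over 12 ticks, so the average
# arrival rate equals the consumption rate, but packets come in bursts.
BURSTY_ARRIVALS = [3, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 4]

if __name__ == "__main__":
    for delay in (0, 2, 3):
        print(delay, count_underruns(BURSTY_ARRIVALS, delay))
```

In this sketch, a longer startup delay removes the underruns at the cost of added latency, which is the trade-off the remainder of this section is concerned with.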
Recently, video streaming for a dynamic application such as an action video game has become possible with the appearance of what is called low latency encoding. This allows the user to provide input such as keystrokes and mouse movement back to a video game running on a remote server, while simultaneously receiving video depicting the game screen from that server over the network with very little lag. Current systems can reduce the lag or latency to a point where it appears that the game is being played on the user's computer.
Latency, or lag, is the amount of delay the system introduces into the packet stream. Typical applications that send streams of video may have latencies approaching several seconds. An example of this is the voice and video telephone system known as SKYPE™. Another example is video conferencing, which may have latencies of 500 to 750 ms. While these latencies are acceptable for mostly one-way communication with few turn-arounds, they are totally unacceptable for video games and the like, where a user who clicks the mouse expects an almost instantaneous response from the game.
Low latency video streaming can work either by scan line encoding or frame (or frame slice) encoding. Frame encoding uses a single pass per frame and works by having at least three frame buffers in the encoder and also in the decoder. One buffer is filling with content, while simultaneously a second buffer is being coded, while simultaneously the third buffer is being transmitted (with the reverse process occurring at the receiver). This can lead to an overall latency of around 160 ms to 200 ms (plus the network delay). Scan line encoding can reduce this latency to several milliseconds. In a scan-line system, coding is done only on a few scan lines at a time (typically three). It has been reported that it is possible to approach a latency of about 1 ms for a 1080 pixel frame at 30 frames per second. Of course, the encoding of only three scan lines at a time does not result in as good an image as full frame or multiple frame encoding.
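The order of magnitude of these figures can be checked with a rough calculation. The sketch below is a back-of-envelope model only: it assumes that each of the three encoder buffers and three decoder buffers holds its unit of data for one capture interval, so that six stages contribute to the delay. The function name pipeline_latency_ms and the unit_fraction parameter are hypothetical and are introduced solely for this illustration.

```python
def pipeline_latency_ms(fps, stages, unit_fraction=1.0):
    """Rough pipeline delay in milliseconds.

    fps           -- frames per second of the source video
    stages        -- number of buffer stages the data passes through
                     (assumed: 3 in the encoder + 3 in the decoder = 6)
    unit_fraction -- fraction of a frame processed per stage
                     (1.0 for whole frames, 3/1080 for three scan
                     lines of a 1080-line frame)
    """
    frame_time_ms = 1000.0 / fps
    return stages * frame_time_ms * unit_fraction


if __name__ == "__main__":
    # Whole-frame buffering at 30 frames per second: 6 x 33.3 ms
    print(round(pipeline_latency_ms(30, 6), 1))
    # Three scan lines of a 1080-line frame per stage, same six stages
    print(round(pipeline_latency_ms(30, 6, unit_fraction=3 / 1080), 3))
```

Under these assumptions the whole-frame case comes to about 200 ms, at the upper end of the range given above, and the three-scan-line case comes to well under 1 ms before any processing overhead is added.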
A very recent type of very low latency encoding uses frame slice (or multi-slice) encoding. In this technique, every frame is capped to a fixed maximum size. Keyframes, which typically occur between groups of frames, have been eliminated by the use of a column of so-called intra blocks that moves across the video from side to side, refreshing the video. The classical keyframe has been totally replaced by spreading it over many frames in the form of intra blocks. Motion vectors are restricted so that blocks on one side of the refresh column do not reference blocks on the other side. Also, instead of encoding an entire frame, slices of frames are encoded (in a technique called slice-based threading). Every frame is split into slices, each slice is typically encoded on one core, and then the results are stitched back together to form the coded frame. Using these techniques, it is possible to stream an 800×600 pixel video stream running at 30 frames per second with an end-to-end latency of under 10 ms (not including transport delay). (See http://x264dev.multimedia.cx/archives/249)
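The moving refresh column and the division of a frame into slices described above can be sketched as follows. This is an illustrative simplification and not the x264 implementation: the function names refresh_column, block_types, and split_into_slices are hypothetical, the column is assumed to advance by one block column per frame and wrap around, and slices are assumed to be contiguous bands of rows of nearly equal height.

```python
def refresh_column(frame_index, blocks_wide):
    """Index of the block column coded as intra blocks in this frame.

    Assumption: the column advances one block per frame and wraps, so
    every column is refreshed once every `blocks_wide` frames.
    """
    return frame_index % blocks_wide


def block_types(frame_index, blocks_wide):
    """Per-column block type for one row of blocks: 'I' on the refresh
    column, 'P' (predicted) everywhere else."""
    col = refresh_column(frame_index, blocks_wide)
    return ["I" if c == col else "P" for c in range(blocks_wide)]


def split_into_slices(rows, num_slices):
    """Split `rows` rows into `num_slices` contiguous (start, end) bands
    whose heights differ by at most one row. Each band could then be
    encoded on its own core and the results concatenated in order."""
    base, extra = divmod(rows, num_slices)
    slices = []
    start = 0
    for i in range(num_slices):
        size = base + (1 if i < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    # An 800-pixel-wide frame with 16-pixel blocks has 50 block columns.
    print(block_types(frame_index=2, blocks_wide=8))
    # A 600-row frame divided among four cores.
    print(split_into_slices(600, 4))
```

Because no single frame carries a full keyframe in this scheme, the intra-coded data is spread evenly over time, which is consistent with capping every frame at a fixed maximum size.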
Thus, in such a system, the total latency in effect depends only on the end-to-end network delay. With various types of tunnels and routing algorithms, this can also be drastically reduced.
The low latency systems of the prior art have been directed almost exclusively to gaming with one forward path data flow and one reverse path data flow. It would be extremely advantageous to be able to take advantage of low latency encoding to multiplex content from multiple videos into a single stream.
It would also be very advantageous to be able to take multiple video streams, encode them on a server into a single video stream, and then stream them to a user based on user input.