Remote gaming applications, in which a server-side game is controlled by a client-side player, have attempted to encode the video output from a three-dimensional (3D) graphics engine in real time using existing or customized encoders. However, the interactive nature of video games, particularly the feedback loop between video output and player input, makes game video streaming much more sensitive to latency than traditional video streaming. Existing video coding methods can trade computational power, and little else, for reductions in encoding time. New methods that integrate the encoding process into the video rendering process can significantly reduce encoding time while also reducing the computational power required, improving the quality of the encoded video, and retaining the original bitstream data format to preserve interoperability with existing hardware devices.
On the first pass of a multi-pass encoding process, the encoding cost, or size, of each encoded video frame is calculated; on successive passes, that information is used to pack the data efficiently to fit a bitrate constraint. The benefits of multi-pass encoding are substantial, providing the highest possible quality for a given bitrate constraint, but traditional multi-pass encoding requires access to the complete video file, making it unsuitable for live streaming applications.
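The two-pass idea described above can be illustrated with a minimal Python sketch. It assumes the first pass has already produced a non-negative cost value for each frame and that the second pass splits the total bit budget in proportion to those costs; the function name `allocate_frame_bits` and the proportional rule are illustrative assumptions, not the rate-control model of any particular encoder.

```python
from typing import List


def allocate_frame_bits(
    first_pass_costs: List[float], bitrate_bps: float, fps: float
) -> List[float]:
    """Hypothetical second pass: split a bit budget by first-pass frame cost.

    The cost of every frame must be known before any bits are assigned,
    which is why classic multi-pass encoding needs the complete video.
    """
    if fps <= 0 or bitrate_bps < 0:
        raise ValueError("fps must be positive and bitrate non-negative")
    if any(cost < 0 for cost in first_pass_costs):
        raise ValueError("first-pass costs must be non-negative")
    if not first_pass_costs:
        return []

    # Total bits available for the clip at the target average bitrate.
    total_budget = bitrate_bps * len(first_pass_costs) / fps
    total_cost = sum(first_pass_costs)
    if total_cost == 0:
        # No cost information: fall back to an even split.
        return [total_budget / len(first_pass_costs)] * len(first_pass_costs)

    # Costlier (higher-entropy) frames receive proportionally more bits.
    return [total_budget * cost / total_cost for cost in first_pass_costs]


# Three frames at 3 fps and 3000 bit/s: a 3000-bit budget split 1:1:2.
print(allocate_frame_bits([1.0, 1.0, 2.0], bitrate_bps=3000.0, fps=3.0))
# prints [750.0, 750.0, 1500.0]
```

Because each allocation depends on the sum of costs over all frames, it cannot be computed until the last frame has been analyzed.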
Live streaming applications typically use single-pass encoding since the video is not available in advance. The time constraints on live stream encoding impede the encoder's ability to pack the video information efficiently for a constrained bitrate. Because encoding costs are not calculated in advance in a single-pass encode, network traffic spikes when high-entropy frames are encoded.
Real-time rendered video is increasingly used in live streaming applications, such as video game streaming, where high quality and constrained bandwidth are both highly valued. Unlike recorded video, rendered video is accompanied by additional information about each frame that can be reused to estimate the cost of encoding the frame. In this manner, the results of the first pass of a multi-pass encoding scheme can be approximated to achieve the highest-quality encoded video within a bitrate constraint. Many rendering engines have partial information about the images that will be rendered and may pre-generate encoder quality settings for use at runtime. Thus, the benefits of a multi-pass encoding mode can be achieved in a live-streaming environment. However, as explained below, present computer technology cannot estimate encoding quality accurately enough to render high-quality real-time video while compensating for traffic spikes caused by increased entropy. Moreover, no present encoding technology pre-encodes spatially, rather than temporally, so as to replicate multi-pass encoding while remaining in a real-time environment.
U.S. Pat. No. 7,844,002 B2 (“the '002 patent”) discloses systems and methods for effectuating real-time MPEG video coding with information look-ahead in order to achieve a constant bit rate. The system comprises two video encoders, one of which delays the input by an amount of time relative to the other encoder's look-ahead window. In the system of the '002 patent, one of the video encoders operates as a buffer (look-ahead) device, delaying the input video frames so that the second video encoder, acting as the information collector/processor, has the time needed to extract relevant information and determine an encoding strategy for the video frames. Once that strategy is determined, the coding parameters are passed to the encoder device for execution. The technology of the '002 patent is deficient in comparison to the present invention at least because it does not disclose techniques for calculating the cost of encoding frames of rendered video in a live streaming application, for providing sufficiently low latency for live streaming of gaming applications, or for using video data to maximize the quality of encoded video within bitrate constraints. The present invention is also superior because it collects and stores encoder settings for video data, which can be reused indefinitely.
U.S. Patent Publication No. US2016/0198166 A1 (“the '166 Publication”) discloses systems and methods for pseudo multi-pass encoding techniques that provide a solution for real-time encoding. In the disclosed system, the input video frames are down-sampled and encoded in a first pass to form a sub-group of pictures. Those sub-groups are then used to generate encoding statistics, which in turn are used to generate a set of second-pass coded frames. The techniques described by the '166 Publication are inferior to the present invention at least because the present invention teaches techniques for calculating a specific cost for encoding frames of rendered video in a live streaming application and for using such data to maximize the quality of encoded video within bitrate constraints without any down-sampling.
U.S. Pat. No. 9,697,280 (“the '280 patent”) discloses systems and methods for producing a mobile media data record from the normalized information, analyzing the mobile media data record to determine a settlement arrangement, and providing at least some of the participants represented in the mobile media record with relevant information from the settlement agreement. The systems and methods are capable of performing multi-pass encoding in which the outputs of a previous encoder are daisy-chained to the inputs of a next encoder, resulting in a delay before the encoded file is available for consumption. To reduce the latency associated with sequential encoding while achieving equivalently high quality, successive encoding stages may be configured in a pipeline such that the output of a first encoder is fed to the input of a second, so that encoding in each encoder is offset by a small amount of time, allowing most of the encoding to run in parallel. The total latency may then approximate the sum of the latencies of each encoder from the first block read in to the first block written out. Such a reduced total latency may readily facilitate real-time multi-pass encoding. Similar to the other technologies described in this section, however, the '280 patent does not disclose techniques for calculating the cost of encoding frames of rendered video in a live streaming application and for using such data to maximize the quality of encoded video within bitrate constraints, as are disclosed in the present invention.
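The latency argument for pipelining can be made concrete with a small Python sketch. It assumes each encoding stage processes fixed-size blocks at a constant per-block time (in milliseconds) and compares when the first encoded block leaves the last stage; the block model and the function names are illustrative assumptions, not details taken from the '280 patent.

```python
from typing import List


def first_output_sequential_ms(block_times_ms: List[int], n_blocks: int) -> int:
    """Daisy-chained passes: each stage must finish the whole input first."""
    if not block_times_ms:
        raise ValueError("at least one encoding stage is required")
    # Every stage but the last processes all blocks before handing off;
    # the last stage then needs one block time to emit its first block.
    return sum(t * n_blocks for t in block_times_ms[:-1]) + block_times_ms[-1]


def first_output_pipelined_ms(block_times_ms: List[int]) -> int:
    """Pipelined passes: each stage starts once the previous emits a block."""
    # Total latency approximates the sum of each stage's first-block latency.
    return sum(block_times_ms)


stages = [10, 20]  # hypothetical per-block times of two encoder passes
print(first_output_sequential_ms(stages, n_blocks=100))  # 1020
print(first_output_pipelined_ms(stages))  # 30
```

In this model the pipelined figure does not depend on the number of blocks, which is why pipelining can bring multi-pass encoding within reach of real-time use.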
U.S. Patent Pub. No. US 20170155910 A1 (“the '910 Publication”) discloses systems and methods for splitting the audio of media content into separate content files without introducing boundary artifacts. The '910 Publication discloses a system in which the encoder segments the original content file into source streamlets and performs two-pass encoding of multiple copies (e.g., streams) of each corresponding raw streamlet without waiting, for example, for a TV show to end. As such, a web server is capable of streaming the streamlets over the Internet shortly after the streamlet generation system begins capturing the original content file. The delay between a live broadcast transmitted from the publisher and the availability of the content depends on the computing power of the hosts. However, the '910 Publication does not disclose techniques for calculating the cost of encoding frames of rendered video in a live streaming application, for providing sufficiently low latency for live streaming of gaming applications, and for using video data to maximize the quality of encoded video within bitrate constraints, as are disclosed in the present invention.
U.S. Pat. No. 9,774,848 (“the '848 patent”) discloses systems and methods for enhancing the video encoder component of the MPEG standard to improve both the efficiency and quality of the video presentation at the display device. The disclosed technology performs video compression using adaptive bit allocation by means of look-ahead processing. In MPEG video compression, a given number of video frames (15, 30, 60, and so on) are grouped together to form a Group of Pictures (GoP). Pictures within a GoP are coded as I, P, or B pictures (frames). The number of bits allocated to each GoP is made proportional to the number of frames it contains. The system performs real-time look-ahead to collect statistics that enable adaptive bit allocation. The '848 patent also discloses methods for motion estimation in which modified 3D pipeline shader payloads can handle multiple patches in the case of domain shaders, multiple primitives when the primitive object instance count is greater than one in the case of geometry shaders, and multiple triangles in the case of pixel shaders. A motion estimation engine is used by graphics processor components to assist with video decoding and processing functions that are sensitive or adaptive to the direction or magnitude of the motion within the video data. The '848 patent, however, does not disclose techniques for calculating the cost of encoding frames of rendered video in a live streaming application, for providing sufficiently low latency for live streaming of gaming applications, and for using video data to maximize the quality of encoded video within bitrate constraints, as are disclosed in the present invention. Further, the technology of the '848 patent acts, at best, as an assist, and does not perform precoding in the spatial manner disclosed in the present invention. As such, it is not able to replicate advantageous multi-pass encoding in the same real-time manner as the present invention.
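The rule that each GoP receives bits in proportion to its frame count can be sketched as follows, assuming a constant target bitrate so that every frame is worth `bitrate_bps / fps` bits. The function name and the constant-bitrate assumption are illustrative and are not taken from the '848 patent; the adaptive, look-ahead-driven part of that scheme is not modeled here.

```python
from typing import List


def gop_bit_budgets(
    gop_lengths: List[int], bitrate_bps: float, fps: float
) -> List[float]:
    """Give each GoP a bit budget proportional to the number of frames in it."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # At a constant target bitrate, each frame is worth the same share of bits.
    bits_per_frame = bitrate_bps / fps
    return [length * bits_per_frame for length in gop_lengths]


# GoPs of 15, 30, and 60 frames at 6 Mbit/s and 60 fps.
print(gop_bit_budgets([15, 30, 60], bitrate_bps=6_000_000, fps=60))
# prints [1500000.0, 3000000.0, 6000000.0]
```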
U.S. Pat. No. 9,749,642 (“the '642 patent”) discloses systems and methods in which a video encoder determines a motion vector (MV) precision for a unit of video from among multiple MV precisions, which include one or more fractional-sample MV precisions and an integer-sample MV precision. The video encoder can identify a set of MV values having a fractional-sample MV precision, then select the MV precision for the unit based at least in part on the prevalence of MV values (within the set) whose fractional part is zero. Alternatively, the video encoder can perform rate-distortion analysis that is biased toward the integer-sample MV precision. Again, however, the '642 patent does not disclose techniques for calculating the cost of encoding frames of rendered video in a live streaming application, for providing sufficiently low latency for live streaming of gaming applications, and for using video data to maximize the quality of encoded video within bitrate constraints, as are disclosed in the present invention.
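The prevalence-based selection rule can be sketched in Python as below. It assumes motion vectors are stored in quarter-sample units (so a component is a whole-sample value exactly when it is divisible by 4) and uses a hypothetical threshold of 0.75 for switching to integer-sample precision. Both the units and the threshold are illustrative assumptions, not values from the '642 patent, and the rate-distortion alternative is not modeled.

```python
from typing import Iterable, Tuple

# Assumption: MV components are stored in quarter-sample units.
UNITS_PER_SAMPLE = 4


def select_mv_precision(
    mvs: Iterable[Tuple[int, int]], threshold: float = 0.75
) -> str:
    """Choose 'integer' or 'quarter' MV precision for a unit of video.

    Integer precision is chosen when the share of MVs whose fractional
    part is zero (in both components) reaches the hypothetical threshold.
    """
    mv_list = list(mvs)
    if not mv_list:
        # No evidence either way: keep the finer fractional precision.
        return "quarter"
    # With a positive divisor, Python's % is non-negative, so negative MVs work.
    whole = sum(
        1
        for dx, dy in mv_list
        if dx % UNITS_PER_SAMPLE == 0 and dy % UNITS_PER_SAMPLE == 0
    )
    return "integer" if whole / len(mv_list) >= threshold else "quarter"


vectors = [(4, -8), (12, 0), (5, 2), (0, 4)]  # 3 of 4 are whole-sample
print(select_mv_precision(vectors))  # integer
print(select_mv_precision(vectors, threshold=0.8))  # quarter
```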
European Patent No. EP1820281B1 (“the '281 patent”) discloses systems and methods for dual-pass encoding. The disclosed methods include the steps of: (a) receiving the picture; (b) calculating a first degree of fullness of a coded picture buffer at a first time; (c) operating on the first degree of fullness to return a second degree of fullness of the coded picture buffer at a second time; (d) storing the picture for an amount of time; (e) during that amount of time, measuring a first degree of complexity of the picture; (f) operating on the first degree of complexity of the picture and the second degree of fullness to return a preferred target size for the picture; and (g) subsequent to step (d), providing the picture and the preferred target size to a multi-processor video encoder, where the first time corresponds to the most recent time at which an accurate degree of fullness of the coded picture buffer can be calculated and the second time occurs after the first time. Again, however, the '281 patent does not disclose techniques for calculating the cost of encoding frames of rendered video in a live streaming application, for providing sufficiently low latency for live streaming of gaming applications, and for using video data to maximize the quality of encoded video within bitrate constraints, as are disclosed in the present invention.
Japanese Patent No. JP06121518B2 (“the '518 patent”) discloses systems and methods for encoding a selected spatial portion of an original video stream as a stand-alone video stream, where the method comprises obtaining picture element information pertaining to the selected spatial portion; obtaining encoding hints derived from a complementary spatial portion of the original video stream that is peripheral to the selected spatial portion; and encoding the selected spatial portion using the encoding hints. Once again, however, the '518 patent does not disclose techniques for calculating the cost of encoding frames of rendered video in a live streaming application, for providing sufficiently low latency for live streaming of gaming applications, and for using video data to maximize the quality of encoded video within bitrate constraints, as are disclosed in the present invention.
U.S. Patent Publication No. 2006/0230428 (“the '428 Publication”) discloses systems and methods directed to a networked videogame system that allows multiple players to participate simultaneously. The '428 Publication discloses a server that can store pre-encoded, compressible blocks corresponding to subsections of a video frame for a game. The system can also generate game content from pre-encoded blocks in response to user actions in the game, and that content can then be transmitted to the user. Again, this technology does not perform precoding in the spatial manner disclosed in the present invention, and it is not able to replicate advantageous multi-pass encoding in real time. Furthermore, unlike the technology of the '428 Publication, the present invention allows the system to change parameters (such as resolution) over all portions of the frames in a temporal sequence during runtime and provides sufficiently low latency for live streaming of gaming applications.
U.S. Pat. No. 8,154,553 (“the '553 patent”) discloses systems and methods directed to a streaming game server with an interception mechanism for rendering commands and a feed-forward control mechanism based on the processing of the commands of a rendering engine, on a pre-filtering module, and on a visual encoder. The '553 patent technology uses a graphics API to extract a set of object-level data describing the visual complexity and the motion of the objects in the scene. That information is used to control the rendering detail at the GPU level, the filtering level at the video pre-processor, and the quantization level at the video encoder. The system also computes a motion compensation estimate for each macroblock in the target encoded frame in a video encoder. Similar to the other technologies discussed herein, the system disclosed in the '553 patent does not perform precoding in the temporal or spatial manner disclosed in the present invention, and it is not able to replicate advantageous multi-pass encoding in real time because it, in fact, drops frames in response to bitrate peaks. Furthermore, unlike the technology of the '553 patent, the present invention provides sufficiently low latency for live game streaming applications.
As is apparent from the above discussion of the state of the art in this technology, there is a need in the art for an improvement to the present computer technology related to the encoding of real-time game environments.