Digital signal compression is widely used in many multimedia applications and devices. Digital signal compression using a coder/decoder (codec) allows streaming media, such as audio or video signals, to be transmitted over the Internet or stored on compact discs. A number of different standards of digital video compression have emerged, including H.261 and H.263; DV; MPEG-1, MPEG-2, and MPEG-4; VC1; and AVC (H.264). These standards, as well as other video compression technologies, seek to efficiently represent a video picture by eliminating or reducing spatial and temporal redundancies within a given picture and/or among successive pictures. Through the use of such compression standards, video content can be carried in highly compressed video bit streams, and thus efficiently stored on disks or transmitted over networks.
Many codecs make use of different types of frame coding. Examples of different frame coding formats include intra-coded frames (I-frames), predictive coded frames (P-frames), and bi-predictive coded frames (B-frames). In general terms, an I-frame is coded without reference to any other frame and can therefore be decoded independently of the decoding of any other frame. I-frames may be generated by an encoder to create a random access point that allows a decoder to start decoding properly at the location of the I-frame. I-frames generally require more bits to encode than P-frames or B-frames.
P-frames are coded with reference to one or more other frames, such as an I-frame or another P-frame. A P-frame contains changes in the image from one or more previous frames. Decoding a P-frame requires the previous decoding of one or more other frames. P-frames require fewer bits to encode than I-frames. B-frames are similar to P-frames but contain image differences with respect to both previous and subsequent frames. B-frames can be coded in some prediction modes that form a prediction of a motion region within the frame by averaging the predictions obtained using two different previously decoded reference regions. B-frames require fewer bits to encode than I-frames or P-frames.
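By way of illustration, the distinction between I-frame and P-frame coding can be sketched as simple delta encoding. This is a minimal sketch only; the function names and the small integer arrays are illustrative assumptions, not part of any coding standard, and real codecs use motion-compensated prediction rather than a plain pixel difference.

```python
import numpy as np

def encode_i_frame(frame):
    """An I-frame carries the picture itself, with no reference to other frames."""
    return frame.copy()

def encode_p_frame(frame, reference):
    """A P-frame carries only the change (residual) from a previously decoded frame."""
    return frame - reference

def decode_p_frame(residual, reference):
    """Decoding a P-frame requires the previously decoded reference frame."""
    return reference + residual

# Two nearly identical 4x4 "pictures": only one pixel changes between them.
frame0 = np.arange(16, dtype=np.int16).reshape(4, 4)
frame1 = frame0.copy()
frame1[2, 2] += 7

i_data = encode_i_frame(frame1)
p_data = encode_p_frame(frame1, reference=frame0)

# The residual is almost entirely zeros, which is why a P-frame
# typically compresses into far fewer bits than an I-frame.
print("nonzero samples, I-frame:", np.count_nonzero(i_data))
print("nonzero samples, P-frame residual:", np.count_nonzero(p_data))
```

The decoder's dependence on prior frames is visible in `decode_p_frame`: without `frame0`, the residual alone cannot reconstruct `frame1`.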
The coding of video streams into bitstreams that contain I-frames for transmission over the Internet is subject to certain problems. One problem is compression delay. Even though an I-frame typically requires more bits than a P-frame or B-frame, it takes more time to compress and encode a video image as a P-frame or B-frame than as an I-frame. Another problem is referred to as bit-rate jitter. Because I-frames consume many more bits than P-frames or B-frames, the bit rate at which encoded pictures are produced is uneven. Additionally, for each section, several different parameters must be encoded within the video stream to enable proper decoding. These parameters are additional bits that must be added to the encoded video stream and thus increase the size of the encoded bit stream. It would therefore be desirable to have a smaller bit stream and thus a smoother bit rate.
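The bit-rate jitter described above can be made concrete with hypothetical per-frame bit counts. The specific figures and the group-of-pictures (GOP) pattern below are assumptions chosen for illustration only; actual bit counts depend on the content and the encoder.

```python
# Hypothetical bit counts per frame type (illustrative values only).
ILLUSTRATIVE_BITS = {"I": 120_000, "P": 40_000, "B": 15_000}

# An assumed GOP pattern: one I-frame followed by P- and B-frames.
gop = "IBBPBBPBB"
bits = [ILLUSTRATIVE_BITS[t] for t in gop]

peak = max(bits)
mean = sum(bits) / len(bits)

# A peak-to-mean ratio well above 1 reflects the uneven bit rate:
# the I-frame dominates, so the stream's instantaneous rate spikes.
print("per-frame bits:", bits)
print(f"peak/mean bit ratio: {peak / mean:.2f}")
```

Under these assumed numbers the single I-frame accounts for over a third of the GOP's total bits, which is the jitter an encoder without I-frames would avoid.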
A field of recent development that has had an impact on a wide range of other fields is the neural network (NN). Neural networks have been applied successfully in a myriad of fields, including image recognition, voice recognition, and handwriting recognition, as well as stock market prediction. At its simplest level, a neural network is a series of nodes with transition weights and internal biases. An input, referred to as a feature, is provided to the neural network. When the neural network is being trained, the input is paired with a desired result, called a label. To train the neural network to produce the correct label for a given feature, the weights are adjusted using a cost function over numerous attempts until the correct label is produced for that feature. A common type of neural network used in applications such as image recognition and stock market prediction is the recurrent neural network (RNN). The RNN adds a second output to the typical node network design; this second output may simply be a repetition of the node itself. The second output represents an added memory component that allows the network to maintain unbounded history information about the features and related labels. This repetition may be thought of as an additional hidden node layer that has the same transition weights and biases as the previous layer.
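The memory component described above can be sketched as a minimal recurrent cell. This is an illustrative sketch under stated assumptions: the layer sizes, weight scales, and function names are invented for the example, and a trained RNN would additionally adjust the weights with a cost function, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny recurrent cell (sizes chosen for illustration).
W_in = rng.normal(size=(3, 4)) * 0.5   # feature -> hidden transition weights
W_rec = rng.normal(size=(4, 4)) * 0.5  # hidden -> hidden: the "repetition" loop
b = np.zeros(4)                        # internal biases

def rnn_step(x, h_prev):
    """One step: the new hidden state mixes the current feature with the
    previous hidden state, so earlier features influence later outputs."""
    return np.tanh(x @ W_in + h_prev @ W_rec + b)

def run_sequence(xs):
    """Carry the hidden state (the memory component) across a feature sequence."""
    h = np.zeros(4)
    for x in xs:
        h = rnn_step(x, h)
    return h

xs = rng.normal(size=(4, 3))  # a sequence of four 3-dimensional features
h_final = run_sequence(xs)
print(h_final.shape)  # prints (4,)
```

Because the hidden state is fed back at every step, perturbing the first feature in the sequence changes the final state, which is the history-keeping behavior the second output provides.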
It is within this context that aspects of the present disclosure arise.