It is known to provide dedicated hardware configured to perform video encoding, that is, to receive a video sequence and to encode that video sequence into a compressed version which may be output as an encoded bitstream. Contemporary video encoders can be highly efficient, both by achieving a very high level of compression of the input video sequence, using contemporary video compression formats such as H.264 or VP8, and by parallelizing the video encoding process so that it is performed by multiple processor cores.
The parallelization of the video encoding process across multiple processor cores may for example be implemented as shown in FIG. 1A, which represents a frame of a video sequence subdivided into macroblocks. In the single-core example shown on the left, the processor core may simply proceed in raster scan order. As represented in FIG. 1A, the hashed macroblocks have already been encoded by the processor core, and the processor core is currently encoding the macroblock marked with a dot. In a multi-core setup, the parallelization may be achieved by dividing the frame of the video sequence into slices. As illustrated on the right in FIG. 1A, the two slices into which this (partial) frame has been subdivided are encoded independently of one another, the first slice being allocated to a first processor core and the second slice being allocated to a second processor core.
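By way of illustration only, the slice-based division described above may be sketched as follows. The function name `split_into_slices` and its parameters are hypothetical and are not taken from any particular encoder. The sketch assumes that slices are contiguous bands of macroblock rows and that one slice is allocated to each processor core.

```python
def split_into_slices(mb_rows, num_cores):
    """Partition macroblock rows into contiguous slices, one per core.

    Hypothetical helper: assumes each slice is a contiguous band of
    macroblock rows and that slices are encoded independently.
    """
    base, extra = divmod(mb_rows, num_cores)
    slices = []
    start = 0
    for core in range(num_cores):
        # The first `extra` cores take one additional row each.
        height = base + (1 if core < extra else 0)
        slices.append(range(start, start + height))
        start += height
    return slices


# Example: a frame nine macroblock rows high, divided between two cores.
slices = split_into_slices(9, 2)
```

Because no slice refers to data in another slice in this sketch, each range of rows could be handed to a separate processor core without further synchronization.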
The video encoding process itself is known to be provided as schematically illustrated in FIG. 1B. This shows how the macroblocks of the input video sequence are first subjected to a motion estimation process, after which a transform is applied (such as the well-known discrete cosine transform), and quantization of the transformed coefficients is then performed to achieve some of the data compression required.
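The transform and quantization steps may, purely as an illustrative sketch, be expressed as follows. A floating-point orthonormal DCT-II and a single uniform quantization step size are assumed here for clarity. Practical encoders typically use integer approximations of the transform and more elaborate quantization, and the names `dct_2d` and `quantize` are hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (illustrative, not optimized)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization: a larger step discards more detail (assumed scheme)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat 4x4 block has energy only in its DC coefficient after the transform.
flat_block = [[10] * 4 for _ in range(4)]
levels = quantize(dct_2d(flat_block), step=8)
```

In this sketch the quantization step size is the parameter that trades picture detail against the number of bits subsequently produced by entropy coding.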
The final stage of the encoding process is represented by the entropy coding block in FIG. 1B, after which the encoded macroblocks of the video can be output as an encoded bitstream. FIG. 1B also shows how size information relating to the output bitstream is fed back to the transform & quantization stage. In particular, this bitrate information is applied at the quantization step to determine the level of quantization to apply to the transformed coefficients, so that the bitrate of the output bitstream can be maintained at a desired level.
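The feedback of bitrate information to the quantization step may be sketched, under simplifying assumptions, as a controller which nudges a quantization parameter up or down. The function `update_qp`, its unit step size, and the parameter range of 0 to 51 (the range used by H.264) are assumptions made for this illustration and do not represent the rate control of any particular encoder.

```python
def update_qp(qp, bits_used, bits_target, qp_min=0, qp_max=51, step=1):
    """Adjust the quantization parameter from measured output size.

    Hypothetical controller: coarser quantization (higher qp) when the
    output exceeds the target, finer quantization (lower qp) when it
    falls short. The result is clamped to [qp_min, qp_max].
    """
    if bits_used > bits_target:
        qp += step
    elif bits_used < bits_target:
        qp -= step
    return max(qp_min, min(qp_max, qp))


# Example: the last unit of output overshot its budget, so qp is raised.
next_qp = update_qp(qp=30, bits_used=1200, bits_target=1000)
```

The quality of such control depends on how promptly `bits_used` becomes available, since the adjustment can only act on macroblocks quantized after the measurement is received.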
It is also known, when seeking to implement a video encoder in a multi-core system, that advantage may be derived from splitting the video encoding process into two distinct stages. This is schematically illustrated in FIG. 2. In this configuration, in a first stage, the motion estimation and transform & quantization processes mentioned above with reference to FIG. 1B are divided between the multiple processor cores available by allocating macroblocks to processor cores on a stripe basis, where a stripe represents a horizontal band of macroblocks (e.g. two macroblocks high) across a frame of the video sequence. The staggered nature of the processing of the respective stripes corresponds to the fact that the macroblocks of each stripe may have dependencies on macroblocks of a previous stripe, and hence a certain time lag in the processing of each stripe is necessary to enable these dependencies to be resolved. The transformed & quantized macroblocks produced by each processor core at stage 1 are stored in an intermediate buffer, from where they may be retrieved for the second stage of the video encoding process to be carried out. The second stage of the video encoding process is the entropy coding mentioned above with respect to FIG. 1B and, as shown in FIG. 2, is carried out on a slice basis by the respective processor cores in order to generate the output encoded video bitstream.
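The staggered processing of stripes may be illustrated by the following simplified model, which is a sketch only. It assumes that one processor core is available per stripe, that each column of a stripe takes one time unit to process, and that a column can be processed only after the previous stripe has finished the column `dep_reach` positions further to the right. The names `stripe_schedule` and `dep_reach` are hypothetical.

```python
def stripe_schedule(num_stripes, width, dep_reach):
    """Return finish[s][c]: time at which stripe s completes column c.

    Assumed model: unit processing time per column, one core per stripe,
    and a dependency on column min(c + dep_reach, width - 1) of the
    previous stripe.
    """
    finish = [[0] * width for _ in range(num_stripes)]
    for s in range(num_stripes):
        for c in range(width):
            # A core processes the columns of its stripe in order.
            ready = finish[s][c - 1] if c > 0 else 0
            if s > 0:
                dep_col = min(c + dep_reach, width - 1)
                # Wait until the previous stripe has resolved the dependency.
                ready = max(ready, finish[s - 1][dep_col])
            finish[s][c] = ready + 1
    return finish


# Example: two stripes, four columns wide, dependency one column to the right.
schedule = stripe_schedule(num_stripes=2, width=4, dep_reach=1)
```

In this model the second stripe cannot start until the first stripe has completed `dep_reach + 1` columns, which gives rise to the time lag between successive stripes.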
FIG. 2 also illustrates the fact that information from the output encoded video bitstream (in particular bitrate information) is fed back to the stage 1 video encoding process, so that a quantization parameter may be selected when quantizing the transform coefficients and a target bitrate for the output encoded video bitstream may be maintained. However, some disadvantages may arise in a video encoder configured in the manner represented in FIG. 2. Firstly, the number of macroblocks encoded depends on the timing of the individual cores, with the result that the bitstream output becomes dependent on the particular hardware timing. As a result, the encoding performance of such a configuration lacks consistency and repeatability, which is undesirable. Secondly, the final number of bits used to encode a particular macroblock is not known until after the second stage of processing (bitstream entropy encoding), which can take place some time after the “stripe processing” stage (stage 1), giving a potentially significant latency to the feedback of the bitrate information.