Many consumer electronic devices (i.e., client devices), such as television set-top boxes (STBs), smart TVs, smartphones, and tablets, include an HTML browser capable of rendering at least a subset of HTML. These browsers are often slow and not completely standards-compliant, with almost every browser implementation having its own shortcomings. This lack of compliance undermines the original proposition of HTML5, currently the most recent version of HTML, which promised a standardized set of APIs and protocols for running an application (at least partially) client-side using a combination of markup and JavaScript. Hence, a standardized application execution environment based on the complex family of HTML5 languages usually does not work well on these devices. Although STBs and smart TVs have become considerably more powerful, the resources required to implement a complex standards-compliant browser have grown as well. Moreover, the browser is usually included as a ‘good enough’ addition to the software running on each device, reflecting a preference for application SDKs, such as Google's Android SDK or Apple's iOS SDK, over a high-quality browser-based approach.
The inadequacy of embedded browser technology presents a problem when these browser environments are used as hosts for reliably executing certain user-interface applications. One way of addressing the shortcomings of simple, low-capability browser environments is to run the applications on a server and output HTML pages (or fragments thereof), or to update the HTML document object model (DOM) using JavaScript. Although this relieves the client of running complex logic, it still leaves significant layout and rendering work to be done by the client. Not only is this slow on some clients, but the lack of standards compliance and software bugs also force the application developer to restrict the application to the set of HTML primitives that all clients support.
It is known to encode and transmit multimedia content for distribution within a network. For example, video content may be encoded as MPEG or H.264/H.265 video, wherein pixel-domain data is converted into a frequency-domain representation, quantized, entropy encoded, and placed into an appropriate transport format (e.g., an MPEG transport stream). The video stream can then be transmitted to a client device, decoded, and returned to the spatial/pixel domain for display on a display device.
The encoding of the video may be spatial, temporal, or a combination of both. Spatial encoding generally refers to the process of intra-frame encoding, wherein spatial redundancy (information) is exploited to reduce the number of bits that represent a spatial location. Spatial data is converted into a frequency domain over a small region. In general, the data within a small region is not expected to change drastically, so much of the information in the region will be stored in low-frequency components, with the higher-frequency components being at or near zero. Thus, the lack of high-frequency information in a small area is used to reduce the representative data size. Data may also be compressed using temporal redundancy. One method for exploiting temporal redundancy is through the calculation of motion vectors. Motion vectors establish how objects or pixels move between frames of video. For example, a ball may move between a first frame and a second frame by several pixels in a specific direction. Once a motion vector is calculated, the information about the spatial relocation of the ball from the first frame to the second frame can be used to reduce the amount of information that is needed to represent the motion in an encoded video sequence. In practical applications the motion vector is rarely a perfect match, and an additional residual pixel representation is used to compensate for the imperfect temporal reference.
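By way of illustration only, the concentration of information in low-frequency components can be sketched as follows. The sketch assumes an 8×8 block, an orthonormal DCT-II as the frequency transform, and a uniform quantizer with a step size of 16; the function names, the test block, and the step size are hypothetical choices made for this example and are not taken from any particular codec.

```python
import math

N = 8  # illustrative block size (assumption)

def dct_1d(vec):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(vec)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            vec[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * total)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    size = len(rows)
    cols = [dct_1d([rows[r][c] for r in range(size)]) for c in range(size)]
    # Result is indexed as [vertical frequency][horizontal frequency].
    return [[cols[c][r] for c in range(size)] for r in range(size)]

def quantize(coeffs, step):
    """Uniform quantization; small coefficients collapse to zero."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

# A smooth region: brightness changes only gradually across the block.
block = [[100 + 2 * x + y for x in range(N)] for y in range(N)]

levels = quantize(dct_2d(block), step=16)
nonzero = sum(1 for row in levels for v in row if v != 0)

for row in levels:
    print(row)
print("non-zero coefficients:", nonzero, "of", N * N)
```

For this smooth block, only a few coefficients near the top-left (low-frequency) corner remain non-zero after quantization, which is the property that intra-frame encoding exploits to reduce the data size.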
Motion-vector calculation is a time-consuming and processor-intensive step in compressing video content. Typically, a motion-search algorithm is employed to attempt to match elements within the video frames and to define motion vectors that point to the new location to which objects or portions of objects have moved. This motion-search algorithm tries to find, for each macroblock, the best-matching representation of that macroblock in past and/or future reference frames, and determines the vector that represents that temporal relation. The motion vector is subsequently used to minimize the residual pixel information that must be compressed in the compression process. It would be beneficial if a mechanism existed that assisted in the determination of these motion vectors.
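The cost of such a search can be appreciated from a minimal sketch of an exhaustive (full-search) block-matching algorithm. The sketch assumes a sum-of-absolute-differences (SAD) matching criterion, a 4×4 block, and a search range of ±3 pixels; these parameters, the synthetic frames, and the function names are hypothetical and serve only to illustrate the principle. Practical encoders typically use larger blocks and faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in the current frame
    and the block displaced by (dx, dy) in the reference frame."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def motion_search(cur, ref, bx, by, size, search_range):
    """Try every displacement in the search window and keep the cheapest."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue  # candidate block would fall outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return (dx, dy), cost

def residual_block(cur, ref, bx, by, mv, size):
    """Pixel differences left over after motion compensation."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(size)] for y in range(size)]

# Synthetic 16x16 reference frame in which every pixel value is unique.
SIZE = 16
ref = [[16 * y + x for x in range(SIZE)] for y in range(SIZE)]

# Current frame: content moved 2 pixels right and 1 pixel down.
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0
        for x in range(SIZE)] for y in range(SIZE)]
cur[5][5] += 3  # small change so that the match is not perfect

mv, cost = motion_search(cur, ref, bx=4, by=4, size=4, search_range=3)
residual = residual_block(cur, ref, 4, 4, mv, 4)
print("motion vector:", mv, "SAD:", cost)
print("residual:", residual)
```

Even in this small example, 49 candidate displacements of 16 pixels each are evaluated for a single block, which suggests why the search dominates encoding time when repeated for every macroblock of every frame. The residual contains only the one pixel that the motion vector could not account for.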
Another time-consuming and processor-intensive component of the encoding process for more advanced codecs is the process of finding the optimal macroblock type, the partitioning of the macroblock, and the weighting properties of the slice. H.264, for example, has four 16×16, nine 8×8, and nine 4×4 luma intra-prediction modes and four 8×8 chroma intra-prediction modes, and inter-macroblocks can be partitioned from as coarse as 16×16 to as fine-grained as 4×4. In addition, it is possible to assign a weight and an offset to the temporal references. A mechanism that defines or assists in finding these parameters directly would improve scalability.
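The role of a weight and an offset on a temporal reference can be illustrated with a simplified sketch. It assumes a floating-point model in which the prediction is `weight * reference + offset` and estimates both parameters by a least-squares fit; this is a simplification for illustration, since H.264 itself expresses weighted prediction with integer weights and a power-of-two denominator. The sample values and function names are hypothetical.

```python
def estimate_weight_offset(ref, cur):
    """Least-squares fit of cur ~= weight * ref + offset."""
    n = len(ref)
    mean_ref = sum(ref) / n
    mean_cur = sum(cur) / n
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    if var_ref == 0:
        # Flat reference: only an offset can be estimated.
        return 1.0, mean_cur - mean_ref
    cov = sum((r - mean_ref) * (c - mean_cur) for r, c in zip(ref, cur))
    weight = cov / var_ref
    return weight, mean_cur - weight * mean_ref

def sad(pred, cur):
    """Sum of absolute differences between a prediction and the target."""
    return sum(abs(p - c) for p, c in zip(pred, cur))

# Reference samples and the same samples during a fade (assumed data):
# brightness halves and a constant of 10 is added.
ref = [40, 80, 120, 160, 200, 160, 120, 80]
cur = [r // 2 + 10 for r in ref]

weight, offset = estimate_weight_offset(ref, cur)
plain_cost = sad(ref, cur)
weighted_cost = sad([weight * r + offset for r in ref], cur)

print("weight:", weight, "offset:", offset)
print("SAD without weighting:", plain_cost)
print("SAD with weighting:", weighted_cost)
```

In this sketch the unweighted reference leaves a large residual during the fade, whereas the weighted reference predicts the faded samples almost exactly. An encoder that must discover such parameters by analysis of the pixel data incurs additional cost on top of the mode and partition search.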
Many of these complex video encoding/decoding concerns are, for the purposes of ordinary video program encoding and playback, addressed in hardware (e.g., by silicon chips). However, to utilize advanced capabilities of video encoding/decoding to aid a remote application in effectively serving a client device, these functions need to be executed outside of a hardware solution. Hence, it is substantially difficult to exploit powerful image-processing subsystems in an application software environment that lacks hardware support. Given the minimal computing power of many client-side consumer electronics devices, it would not be possible to execute an application that depends on such capabilities on the client.