Region-of-interest editing of a video stream is desirable for many reasons. In a video stream, the data bits are segmented into a consecutive series of frames (or pictures), each frame defining one or more objects therein. An object can be, for example, an image of a person's face. Within each frame, the position of the object is defined by a set of positional coordinates that specify the relative horizontal and vertical locations of the object within the frame.
In one example, and with reference to a video stream, region-of-interest editing involves modifying unwanted portions of the video stream while not modifying a desired portion (a “region-of-interest” portion) of the video stream. In modifying the unwanted portion of the video stream by region-of-interest editing, data outside the region-of-interest, for example extensive background in the video, is not included in the edited stream. Consequently, the edited stream contains less data than the original stream. Thus, for example, if the region-of-interest comprises the image of a face of a person and a portion of the background in a video frame, then region-of-interest editing can be used to remove everything else from the video frame except the face and that portion of the background. An example of a background is a wall in a room. Thus, after the region-of-interest editing is completed, only the face and the portion of the background within a frame are seen when viewing the video. In the existing art, numerous off-the-shelf hardware devices and software packages are available to perform region-of-interest editing.
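As a rough illustration of the pixel-domain operation described above, the following Python sketch crops a single frame to a rectangular region-of-interest. The frame representation (a list of pixel rows), the function name `crop_to_roi`, and its parameters are assumptions made for illustration only; they are not taken from any particular editing package.

```python
from typing import List

# Hypothetical frame representation: a list of rows, each row a list of
# grayscale pixel values. Real video frames carry more structure than this.
Frame = List[List[int]]


def crop_to_roi(frame: Frame, left: int, top: int, width: int, height: int) -> Frame:
    """Return only the pixels that fall inside the region-of-interest.

    (left, top) is the upper-left corner of the region within the frame;
    width and height are its dimensions in pixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("region-of-interest must have positive dimensions")
    if not frame or left < 0 or top < 0:
        raise ValueError("region-of-interest lies outside the frame")
    if top + height > len(frame) or left + width > len(frame[0]):
        raise ValueError("region-of-interest lies outside the frame")
    # Keep the rows inside the region, then the columns inside each row.
    return [row[left:left + width] for row in frame[top:top + height]]


# A 4-row by 6-column test frame whose pixel value encodes its position.
frame = [[10 * r + c for c in range(6)] for r in range(4)]
roi = crop_to_roi(frame, left=2, top=1, width=3, height=2)
```

Everything outside the rectangle is simply absent from the result, which is why the edited frame carries less data than the original.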
Because region-of-interest editing modifies unwanted portions of the video stream, the data remaining in the video stream is reduced. Since the amount of data is reduced, the required bandwidth for transmitting the video in a computer network is reduced. Further, since region-of-interest editing reduces the size of the video stream, region-of-interest editing correspondingly reduces the need for data storage capacity and data processing capacity (i.e., CPU capacity).
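The scale of the reduction can be estimated with simple arithmetic on uncompressed data. The sketch below is illustrative only: the frame sizes, frame rate, and bits-per-pixel figure are assumed values, and the function name `raw_bits_per_second` is hypothetical. The data rate of a compressed stream does not necessarily scale in the same proportion.

```python
def raw_bits_per_second(width: int, height: int, frames_per_second: int,
                        bits_per_pixel: int) -> int:
    """Uncompressed data rate of a video with the given frame geometry."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed example values: a 720x480 source at 30 frames per second with
# 12 bits per pixel, edited down to a 352x240 region-of-interest.
full_rate = raw_bits_per_second(720, 480, 30, 12)
roi_rate = raw_bits_per_second(352, 240, 30, 12)

# Fraction of the original raw data that remains after editing.
remaining_fraction = roi_rate / full_rate

print(f"full frame: {full_rate} bits/s")
print(f"region-of-interest: {roi_rate} bits/s")
print(f"remaining fraction: {remaining_fraction:.3f}")
```

Under these assumed values, the region-of-interest retains roughly a quarter of the raw pixel data, with a corresponding reduction in the bandwidth and storage needed for the uncompressed video.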
Further reductions in the need for data storage and data processing capacity to process a video stream can be achieved through the use of data compression techniques. Data compression techniques typically reduce the amount of video data bits necessary to provide, for example, an acceptable quality video stream. One well known and widely used standard for compressing video data is the MPEG-2 (Moving Picture Experts Group) standard. Compressing the video data in accordance with the MPEG-2 standard generates an MPEG-2-compliant compressed video data stream. In the existing art, numerous off-the-shelf hardware compression cards and software packages are available to perform video data compression in accordance with the MPEG-2 and various other industry standards.
In the prior art, region-of-interest editing of a video stream in the compressed domain is not known. In any event, even if it is practical to perform region-of-interest editing of a video stream in real time in the pixel domain, such a task is not without disadvantages. Specifically, and with regard to a video stream, because the height and width of the region-of-interest portion in the video frames, as defined by its positional coordinates, are required to be encoded in the sequence header of the edited video stream, it would be necessary to start a fresh sequence whenever the region-of-interest coordinates change.
Thus, for example, in the prior art, if region-of-interest editing of a video stream were attempted in real time on a portion comprising an image of a person's face, each time that the person moves closer to, or farther away from, the camera, the region-of-interest editing in the pixel domain would need to stop and start a new stream because the positional coordinates of the region-of-interest (i.e., the person's face) in the frames are changing. Since the need to stop and start a new stream can occur very frequently, such a prior art approach will suffer from significant overhead in processing of the video stream and lead to inefficient coding. Hence, prior art approaches will ultimately result in increased complexity in the system.
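The frequency of such restarts can be illustrated with a short Python sketch. It assumes, consistent with the description above, that a new sequence must begin whenever the width or height of the region-of-interest differs from that of the preceding frame, while a change of position alone does not force a restart. The tuple layout, the function name `sequence_start_frames`, and the sample values are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical per-frame region-of-interest: (left, top, width, height).
Roi = Tuple[int, int, int, int]


def sequence_start_frames(rois: Iterable[Roi]) -> List[int]:
    """Return the indices of frames at which a new sequence would begin.

    A new sequence starts at the first frame and at every frame whose
    region-of-interest dimensions differ from the previous frame's.
    """
    starts: List[int] = []
    previous_size = None
    for index, (_left, _top, width, height) in enumerate(rois):
        size = (width, height)
        if size != previous_size:
            starts.append(index)
            previous_size = size
    return starts


rois = [
    (100, 50, 64, 64),  # initial region around the face
    (104, 50, 64, 64),  # face moves sideways: position changes, size does not
    (104, 52, 80, 80),  # person moves closer: the region grows
    (110, 52, 80, 80),  # position changes only
    (110, 52, 96, 96),  # person moves closer again
]
starts = sequence_start_frames(rois)
```

In this five-frame sample, three separate sequences are needed. A subject who moves continuously toward or away from the camera could trigger a restart on nearly every frame.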
Further, in the prior art, even if it is possible to edit the video stream in the pixel domain in real time to accommodate changing region-of-interest positional coordinates in the video stream, the resulting stream will consist of many concatenated sequences. Generation of such a video stream and decoding thereof is not easy to implement, as the behavior of the decoder when encountering such a video stream is not well defined. For example, during an MPEG-2 compression, typically a frame is received. The positional coordinates of an object within the frame are utilized in the compression technique. For example, as a person's face changes its position from one frame to the next in the video stream, positional coordinates are utilized by the compression technique to predict the movement of that object into the next frame. When positional coordinates are changed, a new stream need not be generated. However, if horizontal dimensions are changed, a new stream is required.
In view of the desire for region-of-interest editing of video streams in real time, and also in view of the deficiencies of the prior art, there is a need for a more efficient way to edit video streams in real time. Embodiments of this invention address this need.