1. Field of the Invention
Video surveillance and many other applications of video technology require video images in high resolution. Conventional video surveillance systems, however, provide images at TV resolutions such as 640×480 or 704×576 pixels, or at even lower resolutions. Such resolutions can provide a sufficient overview of the scene, but many image details are lost. Enlarging the interesting image regions (so-called Regions of Interest, ROIs) still cannot provide the detailed information, because the high-frequency image information is missing.
Using cameras that provide a higher resolution, such as 1920×1080 or 1280×960 pixels, preserves enough image details but also causes an enormous increase in the amount of image data. Applying compression methods (e.g. H.264 or JPEG2000) can reduce the amount of image data, but for archiving the video sequences over longer periods of time the amount of image data might still be too large.
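The increase in raw image data can be estimated from the pixel counts alone. The following minimal sketch compares the resolutions named above; the function name is chosen for illustration only, and compression, frame rate and bit depth are deliberately left out.

```python
def pixel_count(width, height):
    """Number of pixels in one frame of the given resolution."""
    return width * height


# Resolutions taken from the text above.
tv_small = pixel_count(640, 480)
tv_large = pixel_count(704, 576)
hd_small = pixel_count(1280, 960)
hd_large = pixel_count(1920, 1080)

# Ratio of raw pixel data per frame (before any compression).
ratio_small = hd_small / tv_small
ratio_large = hd_large / tv_large

print(f"1280x960 vs 640x480: {ratio_small:.2f} times the pixels")
print(f"1920x1080 vs 704x576: {ratio_large:.2f} times the pixels")
```

Under these assumptions a higher-resolution frame carries roughly four to five times as many pixels as a TV-resolution frame.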
But since higher spatial resolution is required only in certain image regions, it is unnecessary and even disadvantageous to provide the whole image at higher resolution. Often it is sufficient to have only certain regions of interest of an image in high resolution, while saving vast amounts of data by encoding the rest of the image in lower resolution. Since the user has to be able to select a region of interest and to request high-frequency information for the ROI, the image has to be provided at a moderate resolution at first.
2. Description of the Related Art
Only simulcast-based techniques can be used to initially provide a hierarchically coded image in lower image resolution for the user, so that the user can request a certain image region in higher resolution. But simulcast methods generate two separate codestreams, one for the image in moderate resolution and one for the ROI. Since these two codestreams are handled independently of each other, coding, transmitting and decoding of the codestreams are also performed separately. Thus, redundancies between the image in moderate resolution and the ROI in high resolution cannot be considered.
There exist methods for coding ROIs using JPEG2000. These methods do consider redundancies between the image in moderate resolution and the ROI in high resolution. However, either the region of interest has to be defined before the coding is done (see, for example, C. Christopoulos, A. Skodras, and T. Ebrahimi, “The JPEG2000 still image coding system: an overview,” in IEEE Transactions on Consumer Electronics, 46(4):1103-1127, November 2000; M. Rabbani and R. Joshi, “An overview of the JPEG 2000 still image compression standard,” in Signal Processing: Image Communication, 17(1):3-48, January 2002; or R. Rosenbaum and H. Schumann, “Flexible, dynamic and compliant region of interest coding in JPEG2000,” in IEEE International Conference on Image Processing (ICIP), pages 101-104, Rochester, N.Y., September 2002), or a decoding is necessary which re-arranges the coded data into new packets and re-encodes the packet headers and thereby increases the transcoding effort as well as the computational complexity (see R. Rosenbaum and H. Schumann, cited above, or J. Hou, X. Fang, J. Li, H. Yin, and S. Yu, “Multi-rate, dynamic and compliant region of interest coding for JPEG2000,” in IEEE International Conference on Multimedia and Expo (ICME), pages 733-736, 2006).
In the Joint Photographic Experts Group (JPEG) standard recommended by the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU) as the international standard coding method for still pictures, Differential Pulse Code Modulation (DPCM) is used for reversible compression and Discrete Cosine Transform (DCT) is used for non-reversible compression. It is also sometimes the case that X-ray images are encoded before storage so as to reduce the volume of data involved, but it is preferable that such encoded images be displayed promptly, without delay. The ability to display such encoded images promptly also makes it possible to display with priority an area useful to the diagnosis from the encoded image data. To solve this problem, U.S. Pat. No. 7,031,506 B2 Tsujii et al provides an image processing apparatus comprising:
first acquisition means for acquiring a first portion of a data stream obtained from image data that has been sequenced, converted and encoded;
decoding means for decoding the data stream acquired by the first acquisition means and obtaining a two-dimensional image;
analysis means for analyzing the two-dimensional image obtained by the decoding means and determining an area of interest within the two-dimensional image; and
second acquisition means for acquiring a second portion selected from the data stream based on the area of interest determined by the analysis means.
Additionally U.S. Pat. No. 7,031,506 B2 Tsujii et al provides an image processing method, comprising:
a first acquisition step for acquiring a first portion of a data stream obtained from image data that has been sequenced, converted and encoded;
a decoding step for decoding the data stream acquired in the first acquisition step and obtaining a two-dimensional image;
an analysis step for analyzing the two-dimensional image obtained in the decoding step and determining an area of interest within the two-dimensional image; and
a second acquisition step for acquiring a second portion selected from the data stream based on the area of interest determined in the analysis step.
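The four steps listed above can be illustrated with a minimal sketch. The stream layout (a coarse layer 0 plus refinement layers per tile), the analysis rule (the tile with the largest value is taken as the area of interest) and all function names are assumptions made for illustration; they are not taken from U.S. Pat. No. 7,031,506 B2.

```python
# Hypothetical toy stream: each chunk belongs to a quality layer and a tile.
# Layer 0 is the coarse version of every tile; higher layers refine one tile.

def make_stream(coarse_values, layers=2):
    """Build the toy stream from one coarse value per tile."""
    stream = []
    for layer in range(layers + 1):
        for tile, value in enumerate(coarse_values):
            stream.append({"layer": layer, "tile": tile, "value": value})
    return stream


def first_acquisition(stream):
    """Acquire the first portion of the stream: the coarse layer only."""
    return [chunk for chunk in stream if chunk["layer"] == 0]


def decode(chunks, tiles_x, tiles_y):
    """Decode chunks into a two-dimensional image (one value per tile here)."""
    image = [[0] * tiles_x for _ in range(tiles_y)]
    for chunk in chunks:
        row, col = divmod(chunk["tile"], tiles_x)
        image[row][col] = chunk["value"]
    return image


def analyze(image):
    """Stand-in analysis: the tile with the largest value is the area of interest."""
    cells = [(value, row, col)
             for row, line in enumerate(image)
             for col, value in enumerate(line)]
    _, row, col = max(cells)
    return row, col


def second_acquisition(stream, area, tiles_x):
    """Acquire the second portion: the refinement chunks of the selected tile."""
    row, col = area
    tile = row * tiles_x + col
    return [chunk for chunk in stream
            if chunk["layer"] > 0 and chunk["tile"] == tile]
```

In use, the coarse portion would be acquired and decoded first, `analyze` would select the area of interest, and only then would the refinement data for that area be requested, so that data outside the area of interest need not be read at full quality.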
U.S. Pat. No. 7,031,506 B2 Tsujii et al preferentially displays regions of interest (ROIs) and uses the scalability of JPEG2000 on a codeblock basis for a diagnosis from encoded image data. It thus makes it possible to read with priority an area of interest (AOI) important to the diagnosis or observation, as determined by a diagnostic support means, and to improve the image quality of that area, so as to effectively improve the accuracy of diagnostic support. Nevertheless, U.S. Pat. No. 7,031,506 B2 Tsujii et al does not show a ROI-transcoding method or means for ROI-transcoding.
Furthermore, US 2005/0237380 A1 Kakii et al shows coding and decoding methods for motion-image data transmitted and received between the terminal equipment of a video conference in two-way interactive systems. In JPEG2000 Part-I ROI coding, the compression levels differ between the region of interest and the region of no interest, but the total code length is invariant. The ROI coding is implemented by adjusting wavelet coefficients, but the wavelet coefficients are calculated using a plurality of spatial pixels. This caused the problem that the boundary between the ROI and the region of no interest was blurred in a decoded still image, and it did not allow an image processing operation such as embedding only the ROI in another image. To solve this problem, US 2005/0237380 A1 Kakii et al shows a coding method for motion-image data comprising a step, prior to image compression, of dividing an image frame to be coded, among the image frames constituting the motion-image data, into a plurality of sub-regions, and a step of grouping each of the sub-regions into either a region of interest set in the image frame or a region of no interest different from the region of interest. The coding method for the motion-image data then compresses each of the sub-regions so that the code length of a sub-region grouped into the region of interest (hereinafter referred to as ROI) is larger than the code length of a sub-region grouped into the region of no interest (hereinafter referred to as non-ROI), thereby generating coded data of each image frame. The shape of the sub-regions of each image frame is not limited to rectangular shapes such as squares and rectangles, but may be one of various polygonal shapes such as triangles, rhomboids, trapezoids, and parallelograms.
Furthermore, these sub-regions may be of mutually different shapes, such as a combination of plural types of polygons, or shapes including curves forming a part of a circular, elliptical, or other shape. The ROI may be set in advance by the user, or the setting of the ROI may be altered during communication. Furthermore, it can also be contemplated that a sub-region in which a motion of an image is detected, out of the plurality of sub-regions, is automatically grouped into the ROI. In the coding process for the rectangular regions in the non-ROI out of the plurality of rectangular regions, the code length of the rectangular regions may be 0 during a certain period of time (which means that the non-ROI is not coded), in consideration of the degree of influence of each region in the image frame on the dialogue. When the tiling technology of JP2 is applied to each of the plural types of images allocated to the plurality of sub-regions forming the virtual image frame as described above, these plural types of images corresponding to tiles can be individually coded at mutually different compression levels. Nevertheless, US 2005/0237380 A1 Kakii et al does not show a ROI-transcoding method or means for ROI-transcoding; it only shows that non-ROI regions are coded with shorter codewords than ROI regions.
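The grouping of sub-regions described above can be sketched as follows. The sketch assumes rectangular sub-regions, grayscale frames stored as nested lists, a mean absolute frame difference as the motion criterion, and a fixed bit budget per group; these choices and all names are illustrative assumptions and are not taken from US 2005/0237380 A1.

```python
def split_into_tiles(width, height, tile):
    """Divide a frame into rectangular sub-regions (x, y, w, h)."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]


def has_motion(prev, curr, rect, threshold):
    """Assumed motion test: mean absolute difference between two frames."""
    x0, y0, w, h = rect
    diff = sum(abs(curr[y][x] - prev[y][x])
               for y in range(y0, y0 + h)
               for x in range(x0, x0 + w))
    return diff / (w * h) > threshold


def assign_code_lengths(prev, curr, tile, roi_bits, non_roi_bits, threshold=1.0):
    """Group each sub-region into ROI or non-ROI and assign a code length.

    non_roi_bits may be 0, which means the non-ROI is not coded at all.
    """
    height, width = len(curr), len(curr[0])
    plan = []
    for rect in split_into_tiles(width, height, tile):
        in_roi = has_motion(prev, curr, rect, threshold)
        group = "ROI" if in_roi else "non-ROI"
        plan.append((rect, group, roi_bits if in_roi else non_roi_bits))
    return plan
```

Choosing `roi_bits` larger than `non_roi_bits` reproduces the property that a sub-region in the ROI receives a longer code than a sub-region in the non-ROI.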
Furthermore, US 2007/0217698 A1 Son relates to an image compressing apparatus, an image compressing method and a program therefor, for compressing and coding image data according to JPEG2000 or the like. In order to avoid a complicated calculation and a complicated circuit configuration, the image compressing apparatus is configured to carry out code amount control upon JPEG2000 coding, in which the number of coding passes and a code amount for each code block generated by an MQ (arithmetic) coder are input, and a number of coding passes and a code amount are determined based on a first code amount control standard and a second code amount control standard given for each code block. In this configuration, for a specific code block, the second code amount control standard is used to determine the number of coding passes and the code amount, while, for the other code blocks, the first code amount control standard is used to determine the number of coding passes and the code amount. The code blocks that use the second code amount control standard are determined according to the frame to be processed. The image compressing method of US 2007/0217698 A1 Son carries out code amount control upon JPEG2000 coding, in which a number of coding passes and a code amount for each code block generated by an MQ (arithmetic) coder are input, and a number of coding passes and a code amount are determined based on a first code amount control standard and a second code amount control standard given for each code block, and comprises the steps of:
a) using, for a specific code block, the second code amount control standard to determine the number of coding passes and the code amount; and
b) using, for the other code blocks, the first code amount control standard to determine the number of coding passes and the code amount.
US 2007/0217698 A1 Son thus shows JPEG2000 coding with ROI based on code blocks, but does not show a ROI-transcoding method or means for ROI-transcoding.
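Steps a) and b) can be illustrated with a minimal sketch. It assumes that each code amount control standard is a simple byte limit per code block and that the coder reports the cumulative code amount after each coding pass; the cited application does not state the form of the standards here, so this model and all names are hypothetical.

```python
def select_passes(cumulative_bytes, byte_limit):
    """Return (number_of_passes, code_amount) for the longest run of
    coding passes whose cumulative code amount stays within the limit."""
    passes, amount = 0, 0
    for index, total in enumerate(cumulative_bytes, start=1):
        if total > byte_limit:
            break
        passes, amount = index, total
    return passes, amount


def control_code_amount(blocks, specific_ids, first_limit, second_limit):
    """Apply the second standard to the specific code blocks and the
    first standard to all other code blocks.

    blocks maps a code block id to its cumulative code amounts per pass.
    """
    result = {}
    for block_id, cumulative_bytes in blocks.items():
        limit = second_limit if block_id in specific_ids else first_limit
        result[block_id] = select_passes(cumulative_bytes, limit)
    return result
```

With a more generous second limit, the specific code blocks (for example those covering a ROI) keep more coding passes than the remaining code blocks.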
Finally, US 2007/0230658 A1 Okada et al provides an image coding method and an image coding apparatus capable of realizing various kinds of processing that utilize a region of interest when it is specified in a part of an image, and provides an image decoding method and an image decoding apparatus therefor. The image coding method according to US 2007/0230658 A1 Okada et al is such that information for specifying a region of interest defined on an image is explicitly described in a codestream containing coded data of the image. The “information for specifying a region of interest” may be information which is coded by referring to difference information between frames. This “difference information” may be represented by a variation between frames in at least one of position, size and shape of the region of interest. It may be a difference between an average value of at least one of values representing the position, size and shape of the region of interest in each frame and values corresponding to those of a frame to be coded. Alternatively, it may be a difference between an average value of a variation, between frames, of at least one of the position, size and shape of the region of interest and a variation of a corresponding value between frames in a frame to be coded. Further, it may be a difference between an average value of variations between frames and at least one of values representing the position, size and shape of the region of interest in each frame.
This is effective, for example, if the region of interest is greatly enlarged or reduced. The “information for specifying a region of interest defined on an image” may be coded as a function of time. This is effective in a case where the region of interest varies according to a certain rule. The aforementioned information is explicitly described in a codestream. Thus, if a region of interest is set within an image, useful information can be provided to a decoding side and various types of processing for the region of interest can be realized. The apparatus according to US 2007/0230658 A1 Okada et al comprises: a region-of-interest setting unit which defines a region of interest (ROI) on an image; an image coding unit which encodes the image; a ROI information coding unit which encodes information for specifying the region of interest; and a codestream generator which generates a codestream by including therein the coded image and the coded information in an explicit manner. The “region-of-interest setting unit” may define the region of interest on the image by a specification from a user or by an automatic recognition of an object. Also, the aforementioned information is explicitly described in a codestream so as to generate the codestream. When a plurality of regions of interest are defined on the image, a degree of priority is included in the information. Further, US 2007/0230658 A1 Okada et al relates to an image decoding method. This method is characterized in that a region including a region of interest is decoded from a codestream by referring to information for specifying the region of interest defined on an image, wherein the information is explicitly described in the codestream that contains coded data of the image. The “region including a region of interest” may be a region of interest, a region including the region of interest and its peripheral region, or the entire image.
Moreover, US 2007/0230658 A1 Okada et al relates to an image decoding apparatus. This apparatus comprises: a region-of-interest information decoding unit which decodes information for specifying a region of interest defined on an image, wherein the information is explicitly described in a codestream that contains coded data of the image; and an image decoding unit which decodes a region including the region of interest from the codestream by referring to the decoded information. Nevertheless, US 2007/0230658 A1 Okada et al does not show a ROI-transcoding method or means for ROI-transcoding, since in the case of “virtual” ROI-transcoding the codestream does not contain information about ROIs.
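One of the variants described above, in which the ROI information is coded as a variation between frames in position and size, can be sketched as follows. The rectangular ROI representation (x, y, width, height), the tagged tuple format and the function names are assumptions for illustration; the actual codestream syntax of US 2007/0230658 A1 is not reproduced here.

```python
def encode_roi_track(rois):
    """Code the ROI of the first frame absolutely and the ROI of every
    later frame as the difference to the previous frame.

    rois is a list of (x, y, width, height) tuples, one per frame.
    """
    coded = []
    previous = None
    for roi in rois:
        if previous is None:
            coded.append(("abs", tuple(roi)))
        else:
            coded.append(("diff", tuple(c - p for c, p in zip(roi, previous))))
        previous = roi
    return coded


def decode_roi_track(coded):
    """Recover the per-frame ROI rectangles from the coded track."""
    rois = []
    for kind, values in coded:
        if kind == "abs":
            rois.append(tuple(values))
        else:
            rois.append(tuple(p + d for p, d in zip(rois[-1], values)))
    return rois
```

When the ROI moves or changes size only slightly from frame to frame, the differences are small numbers, which is what makes this kind of description compact.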
Normally, moving objects constitute the interesting image regions. For example, U.S. Pat. No. 6,590,999 B1 Comaniciu et al discloses a method and apparatus for real-time mean shift tracking of non-rigid objects. The computational complexity of the tracker is critical for most applications, since only a small percentage of a system's resources are typically allocated for tracking, while the rest of the resources are assigned to preprocessing stages or to higher-level tasks such as recognition, trajectory interpretation, and reasoning. In U.S. Pat. No. 6,590,999 B1 Comaniciu et al the tracking is based on visual features, such as color and/or texture, where statistical distributions of those features characterize the target. A degree of similarity is computed between a given target in a first frame and a candidate target in a successive frame, the degree being expressed by a metric derived from the Bhattacharyya coefficient. A gradient vector corresponding to a maximization of the Bhattacharyya coefficient is used to derive the most probable location of the candidate target in the successive frame.
More specifically, the method and apparatus in accordance with U.S. Pat. No. 6,590,999 B1 Comaniciu et al for real-time tracking of a target which appears in a plurality of successive image frames comprises:
a) developing a statistical distribution as a characterization of a visual feature of each of a given target in a first frame and a candidate target in a successive frame;
b) computing a degree of similarity between the given target and the candidate target, the degree being expressed by a metric derived from the Bhattacharyya coefficient; and
c) applying an iterative comparative procedure to the degrees of similarity computed in step b), the iterations being based on a gradient vector corresponding to a maximization of the Bhattacharyya coefficient in order to shift the location of the candidate target in the successive frame, to derive as the location of the candidate target in the successive frame that location which has characteristics most similar to the characteristics of the given target in the first frame.
The characterization of each target is expressed as a histogram. A new location y1 for the candidate target in the successive frame is derived by computing a gradient vector which corresponds to a maximization of the Bhattacharyya coefficient in the area of y0, a mean shift iteration being used to compute the gradient vector. A characterization processor develops a probability distribution of a feature of the target as a characterization of each target, the feature being selected from the group of color or texture of the target. The characterization processor develops a histogram as a characterization of each target, and a controller uses a mean shift iteration to compute a gradient vector along which the location of the candidate target is shifted.
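The histogram characterization, the Bhattacharyya coefficient and the mean shift iteration from y0 to y1 can be illustrated with a simplified sketch. It assumes grayscale frames stored as nested lists, a square window with uniform weighting instead of a smooth kernel, integer pixel locations, and no check that the coefficient actually increases in each step; these simplifications and all names are assumptions and do not reproduce the method of U.S. Pat. No. 6,590,999 B1 in detail.

```python
import math


def window_pixels(image, center, radius):
    """Yield (row, col, value) for the in-bounds pixels of a square window."""
    cy, cx = center
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                yield y, x, image[y][x]


def histogram(image, center, radius, bins, levels=256):
    """Normalized gray-level histogram of the window (uniform weighting)."""
    hist = [0.0] * bins
    count = 0
    for _, _, value in window_pixels(image, center, radius):
        hist[value * bins // levels] += 1.0
        count += 1
    return [h / count for h in hist]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def round_half_up(value):
    """Round to the nearest pixel; the small offset guards against float noise."""
    return math.floor(value + 0.5 + 1e-9)


def mean_shift_step(image, target_hist, center, radius, bins, levels=256):
    """One mean shift step: weighted mean of the pixel positions, with
    weights sqrt(q_u / p_u) for the histogram bin u of each pixel."""
    cand = histogram(image, center, radius, bins, levels)
    sum_w = sum_y = sum_x = 0.0
    for y, x, value in window_pixels(image, center, radius):
        u = value * bins // levels
        # cand[u] > 0 here, because this pixel itself falls into bin u.
        w = math.sqrt(target_hist[u] / cand[u])
        sum_w += w
        sum_y += w * y
        sum_x += w * x
    if sum_w == 0.0:
        return center  # no overlap with the target model: stay in place
    return round_half_up(sum_y / sum_w), round_half_up(sum_x / sum_w)


def track(image, target_hist, start, radius, bins, max_iter=20):
    """Iterate mean shift steps from y0 = start until the location is stable."""
    center = start
    for _ in range(max_iter):
        new_center = mean_shift_step(image, target_hist, center, radius, bins)
        if new_center == center:
            break
        center = new_center
    return center
```

In use, `histogram` would be evaluated once on the target in the first frame, and `track` would then be started in the successive frame at the previous target location; the returned location is the one at which the weighted mean no longer moves.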