In recent years, the digitalization of dynamic image data has advanced, and it has become common to handle a dynamic image signal after performing compression coding on it. Technologies for coding a dynamic image signal at a low bit rate, a high compression rate, and high definition to generate coded data, and for decoding the coded dynamic image, include H.261 and H.263, standardized by the ITU (International Telecommunication Union); MPEG-1 (Moving Picture Experts Group-1), MPEG-2 and MPEG-4, standardized by the ISO (International Organization for Standardization); and VC-1, standardized by the SMPTE (Society of Motion Picture and Television Engineers). These technologies are widely employed as international standards. Furthermore, there is H.264/MPEG-4 AVC, which the ITU and the ISO have jointly standardized in recent years (see non-patent document 1). H.264 is known to provide better compression efficiency and image quality than the related dynamic image coding technologies.
In these dynamic image coding technologies, in order to compress a dynamic image signal efficiently, interframe predictive coding, which uses the temporal correlation between frames, is widely used. In the interframe predictive coding, an image signal of a present frame is predicted from an image signal of a frame which has already been coded, and a prediction error signal, that is, the difference between the predicted signal and the present signal, is coded. Because a high correlation exists between the image signals of temporally adjacent frames in a general dynamic image, this technology is effective for improving the compression efficiency. In the dynamic image coding technologies such as the above-mentioned MPEG-1, MPEG-2, MPEG-4 and H.264, the dynamic image is coded using a combination of an I picture (intra-frame coded image), which does not use the interframe predictive coding, a P picture (unidirectional predictive coded image), which uses the interframe predictive coding from one frame which has already been coded, and a B picture (bidirectional predictive coded image), which uses the interframe predictive coding from two frames which have already been coded. A frame of an I picture can be decoded independently. However, because the P picture and the B picture require, in advance, the image data used as the reference for the interframe prediction, such a frame cannot be decoded independently.
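The relationship between the predicted signal, the prediction error signal and the decoded frame can be illustrated with a minimal sketch. It assumes the simplest possible prediction, in which the previously coded frame is used as the predicted signal without motion compensation; the function names and the sample values are hypothetical and are not taken from any of the standards named above.

```python
def prediction_error(reference, current):
    """Return the prediction error (current minus predicted) per pixel.

    Assumption: the predicted signal is simply the reference frame,
    i.e. zero-motion prediction. Real codecs use motion compensation.
    """
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]


def reconstruct(reference, residual):
    """Rebuild the current frame from the reference frame and the residual."""
    return [[r + e for r, e in zip(ref_row, res_row)]
            for ref_row, res_row in zip(reference, residual)]


# Two tiny 2x3 "frames" with hypothetical luminance values.
reference = [[10, 10, 12], [11, 10, 12]]   # already coded frame
current = [[10, 11, 12], [11, 10, 13]]     # present frame

residual = prediction_error(reference, current)
decoded = reconstruct(reference, residual)
```

Because adjacent frames are highly correlated, most residual values are zero or small, which is what makes them cheap to code. The sketch also shows why such a frame cannot be decoded independently: `reconstruct` needs `reference` as an input.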
The dynamic image compression technologies such as the above-mentioned MPEG-2 and H.264 are used in many applications, such as digital broadcasting, image distribution via an optical disc medium, and image delivery via the Internet or the like.
Although these dynamic image compression technologies have facilitated the storage, transmission and the like of a dynamic image, interactive processing, such as cutting out and viewing only a certain area of an image or dynamically changing the viewing range, cannot be realized as easily as with an uncompressed image.
In recent years, demands for larger screens and for higher image definition have been increasing, and a resolution higher than that of the related art (1920×1080 pixels, 1280×720 pixels or the like), called HD (High Definition), is becoming mainstream for images used in broadcasting, in image content sold on optical discs, and in image delivery. Furthermore, support for images with an even higher definition, such as 4K×2K (4096×2048 pixels) or 8K×4K (8192×4096 pixels), has also been under development. On the other hand, the situations in which image content is viewed are becoming more varied. Besides watching an image on a big-screen television (TV) in a home living room, there is an increasing demand for viewing an image under various screen sizes, viewing distances and ambient surroundings, such as on a small TV, on a PC (personal computer), on a cellular phone, or on a portable image player.
Here, assume, for example, that a high-resolution image of 8K×4K (8192×4096 pixels) is watched on the small screen (640×480 pixels, for example) of a cellular phone. When the whole image content is displayed scaled to the screen size, an object of interest is displayed very small compared with the background image, and visual recognition of the object may be difficult. In such a case, instead of displaying the whole image content, cutting out and displaying only the area in which the user is interested makes the viewing comfortable. Moreover, interactively changing the display region in response to the user's request further improves the convenience for the user.
As an example, for an image of a soccer game, there is a style of viewing in which an image of the whole stadium is recorded with a high definition of 8192×4096 pixels, and, when the image is watched on a cellular phone, only a certain area around a goal or around a player with the ball is cut out and watched, and the viewing range is changed appropriately as the player moves.
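The effect of displaying the whole 8192×4096-pixel image on a 640×480-pixel screen can be worked through numerically. The sketch below assumes that the image is scaled uniformly so that it fits entirely within the screen, and it assumes a hypothetical player who is 200 pixels tall in the source image; neither assumption comes from the description above.

```python
def fit_scale(src_w, src_h, dst_w, dst_h):
    """Uniform scale factor that makes the whole source fit the screen."""
    return min(dst_w / src_w, dst_h / src_h)


# Source image and screen sizes taken from the example in the text.
scale = fit_scale(8192, 4096, 640, 480)

# Hypothetical object size: a player 200 pixels tall in the source image.
player_height_src = 200
player_height_on_screen = player_height_src * scale
```

Under these assumptions the scale factor is 5/64, so the player occupies fewer than 16 pixels of screen height, whereas a 640×480-pixel cut-out at the original resolution would show the same player at the full 200 pixels.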
For an uncompressed video signal, cutting out a certain area of an image as mentioned above is easily realized. However, for a video signal on which compression coding has been performed, because the video signal has been coded by using the spatial and temporal correlations in the dynamic image, it is difficult to extract only a part of the video signal and to decode and display it. For example, delivering an image of a 640×480-pixel area cut out from a video signal of 8192×4096 pixels requires processing by a method such as:
(A) the whole dynamic image of 8192×4096 pixels is transmitted, received and decoded, and a part of the uncompressed image obtained as the decoding result is cut out and displayed, or
(B) the whole dynamic image of 8192×4096 pixels is decoded, an image of the area to be watched is cut out from the uncompressed image obtained as the decoding result, the coding processing is performed on the image data of that area, and the coded data is transmitted, received, decoded and displayed.
However, the method (A) requires a high-performance decoding device in the receiver, capable of decoding a dynamic image larger than the display size of the receiving device, and the band of the transmission path is consumed unnecessarily. In the method (B), because the decoding, cutting-out and coding processings are all performed on the transmitter side, a high-performance decoding device and a high-performance coding device are required there. Furthermore, there is a problem in that as many coding devices must be provided as the maximum number of receivers connected simultaneously.
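The scale of the overhead in the methods (A) and (B) can be estimated with simple arithmetic. The sketch below uses the raw pixel count per frame as a rough proxy for the decoding workload, which is an assumption made only for illustration; the actual cost depends on the codec, the bit rate and the implementation. The function name `encoders_for_method_b` is hypothetical.

```python
FULL_W, FULL_H = 8192, 4096   # whole dynamic image, from the text
VIEW_W, VIEW_H = 640, 480     # area actually displayed, from the text

full_pixels = FULL_W * FULL_H   # pixels decoded per frame in (A) and (B)
view_pixels = VIEW_W * VIEW_H   # pixels that are finally displayed
ratio = full_pixels / view_pixels


def encoders_for_method_b(max_simultaneous_receivers):
    """Method (B): one coding device per simultaneously connected receiver."""
    return max_simultaneous_receivers
```

Here `full_pixels` is 33,554,432 and `view_pixels` is 307,200, so each frame decoded in full carries roughly 109 times as many pixels as are displayed. In the method (A) this excess falls on every receiver and on the transmission path; in the method (B) it falls on the transmitter, together with a number of coding devices that grows with the number of receivers.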
Regarding the above-described problems, patent document 1 discloses a technology for combining multiple MPEG-coded video streams, related to interactive video communication media such as interactive television. In this technology, plural MPEG-coded video streams are received, and, for each video stream, the display position code of each slice, which corresponds to the display position of the slice, is corrected if correction is needed. By interleaving the slices of the MPEG-coded video streams, including any video stream corrected as above, into a single synthesized video stream, the user can select plural video streams to be displayed on the TV set and can select the display position of each of the plural video streams.
Furthermore, patent document 2 discloses a technology relating to a coding system for scrolling a coded MPEG still picture. In this technology, a compressed picture is divided into slices so that the image can be scrolled smoothly, and only the data of the slices corresponding to the range to be viewed is decoded and displayed. In the scroll operation, the data of any slice which is in the range to be currently viewed but was not included in the image viewed before is added, and the data of any slice which was included in the image viewed before but is not in the range to be currently viewed is deleted.
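The slice bookkeeping during such a scroll operation can be sketched as follows. This is only an illustration of the add-and-delete idea described above, not the method of patent document 2 itself: it assumes vertical scrolling, slices that are horizontal bands of a fixed height of 16 pixel rows, and hypothetical function names.

```python
SLICE_HEIGHT = 16  # assumed slice height in pixel rows


def visible_slices(top, view_height, slice_height=SLICE_HEIGHT):
    """Indices of the slices that overlap rows top .. top+view_height-1."""
    first = top // slice_height
    last = (top + view_height - 1) // slice_height
    return set(range(first, last + 1))


def scroll_update(old_top, new_top, view_height):
    """Return (slices to add, slices to delete) when the view scrolls."""
    old = visible_slices(old_top, view_height)
    new = visible_slices(new_top, view_height)
    return sorted(new - old), sorted(old - new)


# Scroll a 480-row viewing range down by 32 rows (two slices).
to_add, to_delete = scroll_update(0, 32, 480)
```

In this example the viewing range initially covers slices 0 to 29. After scrolling down by 32 rows it covers slices 2 to 31, so slices 30 and 31 are added and slices 0 and 1 are deleted, while the 28 slices common to both ranges are left untouched.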
Furthermore, patent document 3 discloses a technology in which, when coding an image in which a still picture is scrolled in the longitudinal direction, the still picture is coded as an MPEG I picture, the coded picture is stored as slice data from which the header has been deleted, and an MPEG I picture is generated by adding a header to the slice data corresponding to the display region. Patent document 3 also discloses a technology in which, for a subsequent picture, a motion vector reflecting the movement corresponding to the scrolling is generated with reference to the display data of the prior picture, and, for the part of the display region which was not displayed in the prior picture, slice data of the new display region is read, and an MPEG P picture is generated.