1. Field of the Invention
The present invention relates to image data transmitting apparatuses and methods and image data reproducing apparatuses and methods. More specifically, the present invention relates to an image data transmitting apparatus and method for transmitting images by using encoded image data, and an image data reproducing apparatus and method for performing high-speed replay of images by using encoded image data.
2. Description of the Background Art
The recent remarkable progress of digital image/audio signal processing technologies has brought various systems and devices into practical use. Such systems and devices include a system for transmitting high-quality digital image signals via a network, such as the Internet, and a device for recording digital image signals on a hard disk (HDD) or DVD-RAM (Digital Versatile Disc-RAM) for replay.
One main example of those digital signal processing technologies is an encoding technology for compressing and encoding digitized signals. There are various standards for the encoding technology. Particularly in the field of moving pictures, MPEG (Moving Picture Experts Group) standards are mainstream. Such standards include, for example, MPEG-2 typically used in digital broadcasting systems and MPEG-4 typically used in image transmission by camera-equipped cellular phones.
As is well known, a move from narrowband to broadband Internet access is accelerating. Therefore, in recent years, image data can be used not only after being entirely downloaded, but also on a real-time basis in the order of reception, which is typically implemented by a streaming technique.
In the streaming technique, a technology for image distribution on a real-time basis plays an important role. Therefore, it is crucial to address excessive communication loads on the network, that is, congestion, which is typically caused by a large number of accesses. Conventionally, in order to get around such problems, the receiving side temporarily stops replaying until necessary image data arrives, or the transmitting side discards part of the image data so as to adjust the transfer rate in accordance with the state of congestion. At the time of congestion, however, these measures cause other disadvantages to the receiving side, such as interruption of image viewing and degradation of image quality.
In order to overcome such disadvantages, hierarchical encoding techniques are useful, for example. FIG. 19 is an illustration for describing an outline of a first hierarchical encoding technique. An encoder 101 performs a predetermined encoding process on input image data to generate a plurality of encoded image data pieces (streams) of different image qualities for one piece of image data. In the example of FIG. 19, three types of encoded image data are generated: high-quality image data for 1 Mbps, intermediate-quality image data for 500 kbps, and low-quality image data for 200 kbps. Those pieces of data are stored in an image storage section 102. A distribution server 103 checks the state of a network 106, reads from the image storage section 102 one type of encoded image data that is transmittable at a transfer rate in accordance with the state, and then transmits the read data to a client 105.
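The selection step performed by the distribution server 103 can be illustrated with a minimal sketch. The function name, the dictionary, and the rule "pick the highest-rate stream that does not exceed the available rate" are assumptions made for illustration; the source states only that one transmittable type of encoded image data is read in accordance with the state of the network.

```python
# Hypothetical sketch: names and the selection rule are illustrative assumptions.
# Rates are those of the three streams in the FIG. 19 example, in kbps.
STREAM_RATES_KBPS = {"high": 1000, "intermediate": 500, "low": 200}


def select_stream(available_kbps, streams=STREAM_RATES_KBPS):
    """Return the name of the highest-rate stream that fits, or None if none fits."""
    fitting = {name: rate for name, rate in streams.items() if rate <= available_kbps}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(select_stream(1200))  # high
print(select_stream(600))   # intermediate
print(select_stream(150))   # None
```

Because the server can only choose among the pre-encoded streams, any available rate between two stored rates falls back to the lower one.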
FIG. 20 is an illustration for describing an outline of a second hierarchical encoding technology using a slicing technique defined by MPEG. An encoder 111 performs a predetermined encoding process on input image data to generate encoded image data. At this time, the encoder 111 divides the input image data into a plurality of slices of appropriate quality based on the transfer rate in accordance with the state of the network 106 reported from a distribution server 113, and then gives a different priority to each of these slices. The distribution server 113 transmits these slices and their priorities to the client 105.
The concept of the slicing technique defined by MPEG is briefly described below. In MPEG-4, each slice is called a video packet (VP), and therefore is hereinafter referred to as VP.
Moving images encoded with MPEG-4 are composed of a series of images as illustrated in FIG. 21, which is called a video object (VO) in MPEG-4. Each VO is composed of a plurality of video object planes (VOPs). A VOP is a basic unit of image data corresponding to one frame. In the drawing, I denotes a VOP of an I frame (intraframe) representing intraframe encoded image data, P denotes a VOP of a P frame (predicted frame) representing encoded image data obtained by previous frame prediction, and B denotes a VOP of a B frame (bi-directional frame) representing encoded image data obtained by bi-directional prediction. A plurality of frames starting from an I-VOP forms a GOV (group of VOPs). Each VOP is divided into a plurality of areas called video packets (VPs) (FIG. 22). Each VP contains a plurality of macroblocks (MBs) (FIG. 23). Each MB contains a luminance block of 16×16 pixels that is composed of four blocks of 8×8 pixels each, and two color-difference blocks of 8×8 pixels each (FIG. 24). The amount of encoded data of each MB varies depending on the contents of the image.
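The macroblock layout and the GOV grouping described above can be checked with a short sketch. All names are hypothetical, and the grouping function assumes that the frame types are supplied in stream order as a sequence of "I", "P", and "B" characters.

```python
# Hypothetical sketch: constants follow the description above; names are illustrative.
BLOCK_PIXELS = 8 * 8          # each block is 8x8 pixels
LUMA_BLOCKS_PER_MB = 4        # four 8x8 blocks form the 16x16 luminance block
CHROMA_BLOCKS_PER_MB = 2      # two 8x8 color-difference blocks


def samples_per_macroblock():
    """Total number of samples carried by one macroblock."""
    return (LUMA_BLOCKS_PER_MB + CHROMA_BLOCKS_PER_MB) * BLOCK_PIXELS


def group_into_govs(frame_types):
    """Split a sequence of VOP types into GOVs, each starting at an I-VOP.

    VOPs that precede the first I-VOP (if any) are kept together in a first group.
    """
    govs = []
    for frame_type in frame_types:
        if frame_type == "I" or not govs:
            govs.append([])
        govs[-1].append(frame_type)
    return govs


print(samples_per_macroblock())     # 384
print(group_into_govs("IPBBPIBP"))  # [['I', 'P', 'B', 'B', 'P'], ['I', 'B', 'P']]
```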
In a VP dividing technique, VPs are generated for each VOP so as to have a size in accordance with the amount of movement of the image. Specifically, the VOPs are divided so that the total amount of encoded data (the number of bits) for one VP is equal or approximately equal to that for every other VP (FIG. 25). For an area containing a small motion, a large VP composed of a large number of MBs is generated. For an area containing a large motion, a small VP composed of a small number of MBs is generated. For example, in FIG. 22, VP1 and VP7 each represent an area containing a small motion, while VP3, VP4, and VP5 each represent an area containing a large motion. FIG. 26 is an illustration showing the structures of a VOP stream and each VP.
As such, with the amounts of encoded data contained in the VPs being set approximately equal to each other, each VP has a similar probability of inclusion of error. Therefore, VPs containing a large motion, which have a great impact on the image, occupy a small space, thereby making it possible to better localize portions degraded in image quality due to error.
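The VP dividing technique described above can be sketched as a greedy grouping of consecutive macroblocks. This is a minimal illustration under stated assumptions, not the procedure defined by the standard: the function name, the threshold rule (close a VP once its accumulated bits reach a target), and the per-macroblock bit counts are all hypothetical.

```python
# Hypothetical sketch: a greedy, threshold-based division into video packets.
def divide_into_vps(mb_bits, target_bits):
    """Group consecutive macroblocks so that each VP holds about target_bits.

    mb_bits is a list of encoded sizes (in bits), one per macroblock in scan order.
    Returns a list of VPs, each a list of the macroblock sizes it contains.
    """
    vps, current, total = [], [], 0
    for bits in mb_bits:
        current.append(bits)
        total += bits
        if total >= target_bits:   # VP is full: close it and start a new one
            vps.append(current)
            current, total = [], 0
    if current:                    # remaining macroblocks form the last VP
        vps.append(current)
    return vps


# Made-up sizes: small-motion MBs cost 10 bits, large-motion MBs cost 50 bits.
mb_bits = [10] * 10 + [50] * 4 + [10] * 10
vps = divide_into_vps(mb_bits, target_bits=100)
print([len(vp) for vp in vps])  # [10, 2, 2, 10]
print([sum(vp) for vp in vps])  # [100, 100, 100, 100]
```

In this example, every VP carries the same number of bits, but the VPs covering the large-motion area contain only two macroblocks each, so an error in one of them affects a smaller area of the image.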
The above-described conventional hierarchical encoding technologies are disclosed in, for example, Japanese Patent Laid-Open Publication Nos. 2000-78573 and 8-242445 (1996-242445).
Meanwhile, with the advance of digital network technologies, encoded image data of MPEG-4 and MPEG-2 may be mixed for use. In order to handle such mixture of encoded data, Japanese Patent Laid-Open Publication 2003-32617 discloses a recording and reproducing device capable of recording a plurality of pieces of encoded image data of different standards, such as MPEG-2 and MPEG-4, in a single recording medium and reproducing these pieces of data individually.
FIG. 27 is a block diagram illustrating the configuration of the recording and reproducing device disclosed in the above Japanese Patent Laid-Open Publication 2003-32617. In FIG. 27, an MPEG-2 encoder 121 encodes an input image signal based on MPEG-2 standard. An MPEG-4 encoder 122 encodes an input image signal based on MPEG-4 standard. A controller 123 causes the encoded data supplied by these encoders to be recorded on an HDD (image storage section) 124. The controller 123 also reproduces the encoded data recorded in the HDD 124. To reproduce the MPEG-2 encoded data, the controller 123 supplies the encoded data reproduced from the HDD 124 to an MPEG-2 decoder 125. To reproduce the MPEG-4 encoded data, the controller 123 supplies the encoded data reproduced from the HDD 124 to an MPEG-4 decoder 126. With this, the conventional recording and reproducing device allows recording and reproducing of moving images based on MPEG-2 and MPEG-4 standards.
However, the above-described conventional technologies and the above-described recording and reproducing device have the following drawbacks. In the above first hierarchical encoding technology, a plurality of pieces of encoded image data have to be redundantly retained, thereby requiring a recording medium of large capacity. Moreover, the image that can be transmitted to the client is restricted to one of these redundantly retained pieces of data. Therefore, it cannot be ensured that transmission is always performed at the transfer rate best suited to the state of the network. For example, even a slight change in the transmission band that can be allocated on the network, from 1 Mbps to 800 kbps, leads to a disproportionate reduction in image quality to that of the intermediate-quality image data for 500 kbps.
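The two drawbacks of the first hierarchical encoding technology can be put into numbers with a small sketch. The storage figure assumes, purely for illustration, that all three streams have the same duration and a constant bit rate, so that storage is proportional to the sum of the rates; the variable and function names are hypothetical.

```python
# Hypothetical sketch: quantifying storage redundancy and unused bandwidth.
stored_rates_kbps = [1000, 500, 200]   # the three streams of the FIG. 19 example


def best_fit_rate(available_kbps, rates):
    """Highest stored rate that does not exceed the available rate, or None."""
    candidates = [rate for rate in rates if rate <= available_kbps]
    return max(candidates) if candidates else None


# Storage relative to keeping only the high-quality stream
# (assumes equal duration and constant bit rate).
storage_ratio = sum(stored_rates_kbps) / max(stored_rates_kbps)

# The 800 kbps case mentioned above.
available_kbps = 800
chosen_kbps = best_fit_rate(available_kbps, stored_rates_kbps)
unused_kbps = available_kbps - chosen_kbps
utilization = chosen_kbps / available_kbps

print(storage_ratio)  # 1.7
print(chosen_kbps)    # 500
print(unused_kbps)    # 300
print(utilization)    # 0.625
```

Under these assumptions, a 20% drop in available bandwidth results in a 50% drop in the transmitted rate, with 300 kbps of the allocated band left unused.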
Furthermore, in the above second hierarchical encoding technology, encoding and VP dividing processes are always performed on the input image signal in consideration of the state of the network. This puts the encoding process under a large load. Particularly when a plurality of clients make requests through Video On Demand (VOD) for image distribution, different encoding processes have to be performed for those clients. This significantly increases the amount of encoding processing, and therefore is not practical.
Still further, in the above conventional recording and reproducing device, when a high-speed replay function suggested in MPEG-2 is applied to the encoded image data of MPEG-4, images cannot be smoothly replayed at high speed. This is because an I frame for high-speed replay is inserted approximately every 0.5 seconds in MPEG-2, while an I-VOP, which is equivalent to the I frame, is generally inserted at a relatively low frequency, for example, approximately every five seconds, in MPEG-4. For this reason, a change in image between I frames is small in MPEG-2 and does not visually affect the image replayed at high speed, while a change in image between I-VOPs is large in MPEG-4 and visually affects the image replayed at high speed.
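The effect of the I-frame interval on high-speed replay can be illustrated with a rough calculation. The sketch assumes that high-speed replay displays only intra-coded pictures (I frames or I-VOPs) and that the replay speed is 10 times normal; both the speed value and the function name are assumptions made for illustration.

```python
# Hypothetical sketch: how many intra-coded pictures are available per second of
# viewing time when only I frames / I-VOPs are displayed during high-speed replay.
def intra_pictures_per_second(speed, intra_interval_s):
    """Intra-coded pictures reached per second of viewing at the given speed."""
    return speed / intra_interval_s


speed = 10  # assumed replay speed (10x), for illustration only

mpeg2_rate = intra_pictures_per_second(speed, 0.5)  # I frame about every 0.5 s
mpeg4_rate = intra_pictures_per_second(speed, 5.0)  # I-VOP about every 5 s

print(mpeg2_rate)  # 20.0
print(mpeg4_rate)  # 2.0
```

Under these assumptions, each displayed picture in the MPEG-4 case is about five seconds of content away from the previous one, compared with about 0.5 seconds in the MPEG-2 case, which is consistent with the larger visible change between pictures described above.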