Recently, with the advent of the multimedia age, in which audio, pictures, and other information are handled in an integrated manner, conventional information media through which information is carried to people, such as newspapers, journals, TVs, radios, and telephones, have come under the scope of multimedia. Generally, multimedia refers to a representation in which not only text but also graphics, audio, and particularly pictures are simultaneously associated with one another. The information for the above conventional information media must first be digitized before it can be handled as multimedia information.
However, when the amount of each type of multimedia information is estimated as digital data, text requires only 1 or 2 bytes per character, whereas audio requires 64 Kbit/s (telephone quality) and video requires 100 Mbit/s or more (current television receiver quality). It is therefore not practical to handle such massive amounts of multimedia information directly in digital form. For example, video telephony service is available over Integrated Services Digital Network (ISDN) lines with a transmission speed of 64 Kbit/s to 1.5 Mbit/s, but video for televisions and cameras cannot be sent as it is over ISDN lines.
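As a rough illustration of the gap described above, the following sketch computes how strongly television-quality video would have to be compressed to fit the ISDN rates mentioned. The rates are the approximate figures from the text; the function name is hypothetical.

```python
# Approximate figures taken from the text above.
VIDEO_BPS = 100_000_000   # television-quality video, about 100 Mbit/s
ISDN_MIN_BPS = 64_000     # slowest ISDN rate mentioned (64 Kbit/s)
ISDN_MAX_BPS = 1_500_000  # fastest ISDN rate mentioned (1.5 Mbit/s)


def required_compression_ratio(source_bps: float, channel_bps: float) -> float:
    """Return how many times the source must shrink to fit the channel."""
    return source_bps / channel_bps


if __name__ == "__main__":
    for channel_bps in (ISDN_MAX_BPS, ISDN_MIN_BPS):
        ratio = required_compression_ratio(VIDEO_BPS, channel_bps)
        print(f"{channel_bps / 1000:.0f} kbit/s channel: compress about {ratio:.0f}x")
```

Even at the fastest ISDN rate, the video would need to shrink by well over an order of magnitude, which is why compression is treated as essential.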
Data compression therefore becomes essential. Video telephony service, for example, is implemented using video compression techniques standardized in International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) Recommendations H.261 and H.263. Using the data compression techniques defined in MPEG-1, picture information can be recorded together with audio information on a conventional audio compact disc (CD).
MPEG (Moving Picture Experts Group) is a family of international standards for compressing moving picture signals, standardized by the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC). MPEG-1 is a standard that enables transmission of a moving picture signal at 1.5 Mbit/s, that is, compression of the information in a television signal to approximately one hundredth of its original size. MPEG-1 targets moderate picture quality because the transmission speed for MPEG-1 moving pictures is limited to approximately 1.5 Mbit/s. MPEG-2, which was standardized to meet the demand for higher picture quality, therefore enables transmission of a moving picture signal at 2 to 15 Mbit/s to satisfy television broadcast quality.
Furthermore, the working group (ISO/IEC JTC1/SC29/WG11) that worked on the standardization of MPEG-1 and MPEG-2 has standardized MPEG-4, which achieves a compression rate higher than those of MPEG-1 and MPEG-2. MPEG-4 not only enables coding, decoding, and operating on a per object basis, but also introduces new capabilities required in the multimedia age.
MPEG-4 was initially developed to standardize a coding method for lower bit rates. It was then extended to a more versatile coding method, including a method for coding even interlaced pictures at higher bit rates. MPEG-4 AVC and ITU-T H.264 have been standardized, through collaboration between the ISO/IEC and the ITU-T, as a method for coding a picture at a higher compression rate.
Here, a picture signal can be regarded as consecutive pictures (also referred to as frames or fields), each of which is a group of pixels at the same time. Since pixels have a strong correlation with adjacent pixels in each picture, pictures are compressed using the correlation within each picture. Furthermore, since consecutive pictures have a strong correlation between their pixels, the consecutive pictures are also compressed using the correlation between pixels in different pictures. Here, compression using both a correlation between pixels in different pictures and a correlation between pixels in a picture is referred to as inter picture coding, whereas compression using the correlation between pixels in a picture without using the correlation between pixels in different pictures is referred to as intra-picture coding. The inter picture coding, which uses the correlation between pictures, can achieve a compression rate higher than that of the intra-picture coding.
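The difference between the two kinds of prediction can be illustrated with a toy example. The sketch below is not how any MPEG codec works internally. It uses a one-dimensional "picture", predicts each pixel either from its left neighbor (a stand-in for intra-picture prediction) or from the co-located pixel of the previous picture (a stand-in for inter picture prediction), and uses the sum of absolute residuals as a crude proxy for the number of bits. All names and sample values are hypothetical.

```python
def intra_cost(picture, first_prediction=128):
    """Sum of absolute residuals when each pixel is predicted from its left neighbor."""
    cost = 0
    prediction = first_prediction  # assumed fixed predictor for the first pixel
    for pixel in picture:
        cost += abs(pixel - prediction)
        prediction = pixel
    return cost


def inter_cost(previous_picture, current_picture):
    """Sum of absolute residuals when each pixel is predicted from the previous picture."""
    return sum(abs(c - p) for p, c in zip(previous_picture, current_picture))


# Two nearly identical consecutive "pictures" (hypothetical sample values).
previous_picture = [10, 50, 90, 130, 170, 210]
current_picture = [11, 51, 90, 131, 171, 209]

if __name__ == "__main__":
    print("intra residual:", intra_cost(current_picture))
    print("inter residual:", inter_cost(previous_picture, current_picture))
```

For a nearly static scene like this one, the temporal prediction leaves a much smaller residual, which is the intuition behind the higher compression rate of inter picture coding.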
Furthermore, in accordance with MPEG-1, MPEG-2, MPEG-4, MPEG-4 AVC, and H.264, each picture includes blocks (or macroblocks) that are groups of pixels in a two-dimensional rectangular area, and the inter picture coding and the intra-picture coding are switched per block.
On the other hand, with the spread of high-speed network environments using Asymmetric Digital Subscriber Lines (ADSLs) and optical fibers, general households can transmit and receive information at bit rates over several Mbit/s. Furthermore, it is expected that information can be transmitted and received at several tens of Mbit/s in the next few years. Accordingly, it is expected that, with the picture coding technique, not only companies using dedicated lines but also general households will introduce video telephony services and teleconferencing systems that guarantee television broadcast quality and HDTV broadcast quality.
When coded picture data, that is, a stream, is transmitted through a network, a part of the stream may be lost due to network congestion or other causes. When a part of the stream is lost, the receiver cannot accurately decode the picture corresponding to the lost part of the stream. Furthermore, with compression using the correlation between pixels in different pictures, the state in which the receiver cannot accurately decode the picture continues in the subsequent pictures. In other words, the picture quality of the subsequent pictures continues to be deteriorated. Thus, slices are defined as units of coding, each of which includes blocks. The slice is the minimum unit per which coding and decoding are independently possible. Even when a part of a stream is lost, pictures can be decoded per slice.
FIG. 1 illustrates a relationship between slices and blocks using a slice division method in accordance with MPEG-2. The picture (1 frame) in FIG. 1 includes blocks. Furthermore, blocks in a same row compose a slice, from among the blocks included in the picture. For example, the diagonally shaded slice is an I-slice, and the remaining slices are P slices. An I-slice contains only intra-coded blocks, and a P slice contains inter coded blocks or intra-coded blocks.
In accordance with H.264, generally, the I-slice is coded using only a correlation between pixels within the I-slice, whereas the P slice is coded using a correlation between pixels within the P slice and a correlation between pixels of different slices. Here, “of different slices” means between a current slice and a slice other than the current slice, and may mean between the current slice and slices of different pictures. In other words, the I-slice is a slice that does not employ predictive coding using adjacent picture signals (signals outside the current slice), that is, a slice including only intra-macroblocks to be intra-coded. In contrast, the P slice is a slice that enhances compression efficiency with the predictive coding, that is, a slice including both inter macroblocks to be inter coded and the intra-macroblocks.
Unlike H.264, there are operational standards that do not allow both I-slices and P slices to be included within a picture. Thus, in this Description, I-slices and specific P slices that are intentionally coded using only the correlation between pixels within a slice are collectively referred to as I-slices for convenience.
FIG. 2 illustrates a coding order of blocks included in a picture. The blocks in the picture in FIG. 1 are coded in the order indicated in FIG. 2, that is, from left to right and from top to bottom per slice in the picture, to generate a coded stream.
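The row-based slice division and the coding order described above can be sketched as follows. This is a minimal illustration that assumes blocks are numbered in raster order starting from 0; the function names are hypothetical.

```python
def slice_index_of_block(block_number: int, blocks_per_row: int) -> int:
    """With one slice per row of blocks, the slice index equals the block row."""
    return block_number // blocks_per_row


def coding_order(rows: int, blocks_per_row: int):
    """Return (row, column) block positions, left to right and top to bottom."""
    return [(row, col) for row in range(rows) for col in range(blocks_per_row)]


if __name__ == "__main__":
    # A small picture of 2 rows by 3 blocks, chosen only for illustration.
    for position in coding_order(2, 3):
        print(position)
```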
Even when a decoder receives the slices of the stream without any loss in the transmission path and decodes the stream per slice, there is no guarantee that the decoded pixels are accurate. When slices that could not be accurately decoded because of an earlier loss in a part of the stream are intra-coded in the current picture, the decoder can accurately decode their pixels using only the intra-coded slices in the stream. However, when a picture subsequent to a lost part of the stream is decoded and the slices of the picture are inter coded, the picture is decoded with reference to the immediately previously decoded picture. Thus, when the referenced picture has deteriorated picture quality due to an earlier loss in a part of the stream, the pixels of the current picture cannot be accurately decoded.
Thus, there is a problem that, when a part of a stream is lost and a picture subsequent to the lost part of the stream is inter coded, the picture cannot be accurately decoded, and, recursively, the pictures subsequent to that picture cannot be accurately decoded either.
Accordingly, a method to be described hereinafter prevents recursive occurrence of deteriorated picture quality of pictures.
FIG. 3 illustrates an example of divided slices in pictures that are temporally consecutive. Here, the diagonally shaded slices are I-slices, and the remaining slices are P slices as in FIG. 1. Slices are on a per row basis. Furthermore, (a) to (l) in FIG. 3 are the pictures that are temporally consecutive. Furthermore, in time order, (a) in FIG. 3 is the first picture, and (l) in FIG. 3 is the last picture. In FIG. 3, an I-slice is moved down one row in the subsequent picture in time order. When the I-slice is moved to the lowest row, it comes back to the highest row (from (j) to (k) in FIG. 3).
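The cyclic movement of the I-slice can be expressed with a modulo operation. The sketch below assumes that rows are numbered from 0 at the top and that the first picture has its I-slice in the top row. The value of 10 rows is inferred from the wrap-around between (j) and (k) and is an assumption, as is the function name.

```python
def i_slice_row(picture_index: int, num_rows: int) -> int:
    """Row occupied by the I-slice in the given picture (0 is the top row)."""
    return picture_index % num_rows


if __name__ == "__main__":
    labels = "abcdefghijkl"  # pictures (a) to (l) in time order
    for index, label in enumerate(labels):
        print(f"({label}) I-slice in row {i_slice_row(index, 10)}")
```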
Each picture includes an I-slice that is resilient to loss in a part of a stream, and P slices that are not resilient to such loss but include inter coded blocks and thus have a higher compression rate. Here, the positions of the I-slices circulate within the pictures in time order. Even when a part of a stream is lost at some point in time and the picture quality of a P slice is deteriorated, once the slice at the position of that P slice becomes an I-slice in a subsequent picture in time order, that slice is accurately decoded in the picture including the I-slice and in the pictures subsequent to it. In other words, the stream can be restored from the deteriorated picture quality. Thus, the picture quality of subsequent pictures can be prevented from being continuously deteriorated.
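The recovery behavior described above can be simulated under a simplifying assumption: each P slice refers only to the slice at the same position in the previous picture, so that corruption does not spread between rows. Real inter prediction may refer to other regions, so this is only an idealized sketch, and all names in it are hypothetical.

```python
def simulate_corruption(num_rows, num_pictures, lost):
    """Return the number of corrupted slices in each picture.

    lost is a set of (picture_index, row) pairs whose slice data is lost.
    Assumption: a P slice inherits the state of the co-located slice in the
    previous picture, and a received I-slice is always decoded correctly.
    """
    corrupted = [False] * num_rows
    history = []
    for picture in range(num_pictures):
        i_row = picture % num_rows  # cyclic I-slice position
        for row in range(num_rows):
            if (picture, row) in lost:
                corrupted[row] = True    # data lost: slice cannot be decoded
            elif row == i_row:
                corrupted[row] = False   # I-slice refreshes this row
            # otherwise a P slice keeps the state of its reference
        history.append(sum(corrupted))
    return history


if __name__ == "__main__":
    # Row 3 of picture 1 is lost; it recovers when row 3 becomes an I-slice.
    print(simulate_corruption(num_rows=5, num_pictures=8, lost={(1, 3)}))
```

In this model the damaged row stays corrupted only until the I-slice reaches its position, which is the bounded recovery time that the cyclic arrangement is meant to provide.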
Next, a picture coding apparatus 800 that prevents the continuation of deteriorated picture quality in FIG. 3 will be described.
FIG. 4 illustrates a block diagram of a configuration of the picture coding apparatus 800 using a slice division method in accordance with MPEG-2.
The picture coding apparatus 800 includes a block number counting unit 802, an intra/inter determining unit 804, a slice determining unit 806, and a video encoder 808.
The block number counting unit 802 counts the number of blocks in a picture to be coded by the video encoder 808. Furthermore, the block number counting unit 802 notifies the intra/inter determining unit 804 and the slice determining unit 806 of a position of a block to be coded by the video encoder 808, in the picture.
The intra/inter determining unit 804 determines whether a slice, in the picture, to be coded by the video encoder 808 is an I-slice or a P slice, from the position of the block notified by the block number counting unit 802. The intra/inter determining unit 804 notifies the video encoder 808 of the determined slice type (I-slice or P slice).
The slice determining unit 806 includes a block position obtaining unit 8062 and a slice boundary determining unit 8066.
The slice determining unit 806 determines, from the position of the block notified by the block number counting unit 802, whether or not the notified block is at a slice boundary, that is, whether the notified block is the last block in the row of blocks that corresponds to a slice. In the slice determining unit 806, the block position obtaining unit 8062 obtains the position of the block notified by the block number counting unit 802, and the slice boundary determining unit 8066 determines whether or not the notified block is the last block in a slice. The slice boundary determining unit 8066 notifies the video encoder 808 of the result of the determination (whether the block is the last block in a slice).
The video encoder 808 codes an input picture VIN per block using a coding method available by a corresponding slice (intra-picture coding method or inter picture coding method), based on the slice boundary (on a per slice basis) notified by the slice boundary determining unit 8066 in the slice determining unit 806, and on the slice type notified by the intra/inter determining unit 804. The video encoder 808 provides the coded input picture VIN to a packetizing unit 820 as a stream STR.
Since the slices in accordance with MPEG-2 are used as described hereinbefore, the unit of slices in a picture to be coded by the video encoder 808 is based on a unit of rows each including blocks included in the picture.
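The control decisions made around the video encoder 808 can be sketched as follows. This is a simplified model, not the apparatus itself. It assumes one slice per row of blocks and the cyclic I-slice position of FIG. 3, and the names `BlockDecision` and `plan_picture` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockDecision:
    block_number: int
    slice_type: str          # "I" or "P"
    is_last_in_slice: bool   # True when the block closes its slice


def plan_picture(picture_index: int, rows: int, blocks_per_row: int):
    """Return the per-block decisions that would be passed to the encoder."""
    i_row = picture_index % rows  # assumed cyclic I-slice row
    decisions = []
    # Role of the block number counting unit 802: walk the blocks in order.
    for block_number in range(rows * blocks_per_row):
        row, col = divmod(block_number, blocks_per_row)
        # Role of the intra/inter determining unit 804.
        slice_type = "I" if row == i_row else "P"
        # Role of the slice determining unit 806: last block of the row.
        is_last = col == blocks_per_row - 1
        decisions.append(BlockDecision(block_number, slice_type, is_last))
    return decisions


if __name__ == "__main__":
    for decision in plan_picture(picture_index=1, rows=3, blocks_per_row=2):
        print(decision)
```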
The packetizing unit 820 converts the stream STR into a format appropriate for its transmission through a network.
FIG. 5 illustrates an example of a relationship between slices and blocks using a slice division method in accordance with MPEG-4. Although slices can be composed on a per row basis as in MPEG-2 in FIG. 1, slices are often composed so that the number of bits obtained by coding each slice becomes constant. Thus, the shape of each slice is not limited to a rectangle such as that illustrated in FIG. 1. Furthermore, the number of blocks included in each slice is variable, depending on the image pattern indicated by the input picture signal.
Here, a picture (1 frame) in FIG. 5 is composed of blocks. Furthermore, from among the blocks included in the picture, blocks enclosed by a thick line compose a slice. For example, the blocks that are diagonally shaded and enclosed by a thick line compose an I-slice, whereas the remaining blocks, which are simply enclosed by thick lines, compose P slices.
Furthermore, the diagonally shaded blocks are indicative of blocks to be intra-coded. FIG. 5 indicates the example that the diagonally shaded positions of the blocks to be intra-coded do not need to match the unit of slices in accordance with MPEG-4.
FIG. 6 illustrates an example of divided slices in pictures that are temporally consecutive. The diagonally shaded blocks compose I-slices, and are blocks to be intra-coded. The rest of blocks enclosed by thick lines compose P slices. Furthermore, (a) to (l) in FIG. 6 are pictures that are temporally consecutive. As illustrated in FIG. 3, each picture includes an I-slice that is resilient to loss in a part of a stream, and P slices that are not resilient to loss in a part of the stream but include inter coded blocks each having a higher compression rate. Here, the positions of slices to be intra-coded and including an I-slice circulate within the pictures in time order. Thus, deteriorated picture quality can be prevented from being recursively continued in the subsequent pictures.
Next, a picture coding apparatus 900 that prevents the continuation of deteriorated picture quality in FIG. 6 will be described.
FIG. 7 illustrates a block diagram of a configuration of a picture coding apparatus 900 using the slice division method in accordance with MPEG-4.
The picture coding apparatus 900 includes the block number counting unit 802, the intra/inter determining unit 804, a slice size determining unit 906, and a video encoder 908. The units included in the block diagram of the picture coding apparatus in accordance with MPEG-2 in FIG. 4 are numbered by the same numerals in FIG. 7 when the units operate in the same manners, and thus the descriptions in FIG. 7 are omitted hereinafter.
The video encoder 908 codes the input picture VIN per block using an available coding method (intra-picture coding method or inter picture coding method), based on the slice boundary (on a per slice basis) notified by the slice size determining unit 906 and the slice type notified by the intra/inter determining unit 804. The video encoder 908 provides the coded input picture VIN to the packetizing unit 820 as a stream STR, and simultaneously, notifies the slice size determining unit 906 of the coded number of bits.
The slice size determining unit 906 includes a slice boundary determining unit 9066 and a coded bit number counting unit 9068. In the slice size determining unit 906, the coded bit number counting unit 9068 obtains the number of bits notified by the video encoder 908, and the slice boundary determining unit 9066 determines, based on the notified number of bits, whether or not a block coded by the video encoder 908 is the last block in a slice.
The coded bit number counting unit 9068 notifies the slice boundary determining unit 9066 of the number of bits notified by the video encoder 908 or information obtained when the notified number of bits becomes a predetermined number of bits.
The slice boundary determining unit 9066 determines, based on the information on the number of bits notified by the coded bit number counting unit 9068, whether or not a current block is at a slice boundary, that is, whether it is the last block in a slice. The slice boundary determining unit 9066 notifies the video encoder 908 of the result of the determination, which indicates whether the block is at the slice boundary.
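The bit-count-based boundary decision can be sketched as a running total that closes a slice once a predetermined number of bits is reached. The per-block bit counts and the threshold below are hypothetical. Closing the slice on the block that reaches the threshold is an assumption, since an actual encoder might instead close it before a limit is exceeded.

```python
def split_into_slices(block_bits, target_bits):
    """Group consecutive block numbers into slices of roughly target_bits each."""
    slices = []
    current_slice = []
    accumulated_bits = 0  # role of the coded bit number counting unit 9068
    for block_number, bits in enumerate(block_bits):
        current_slice.append(block_number)
        accumulated_bits += bits
        # Role of the slice boundary determining unit 9066: this block is the
        # last block in the slice once the predetermined count is reached.
        if accumulated_bits >= target_bits:
            slices.append(current_slice)
            current_slice = []
            accumulated_bits = 0
    if current_slice:
        slices.append(current_slice)  # remaining blocks form the final slice
    return slices


if __name__ == "__main__":
    print(split_into_slices([300, 300, 500, 100, 900, 200], target_bits=1000))
```

Because the number of blocks needed to reach the threshold depends on how many bits each block produces, the resulting slices contain a variable number of blocks, consistent with the non-rectangular slices of FIG. 5.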
Although the diagonally shaded positions of blocks to be intra-coded move from the top to the bottom per picture as illustrated in FIGS. 5 and 6, the positions do not match the unit of slices.
Dividing a picture into slices is effective at preventing the picture quality from being deteriorated due to loss in a part of a stream in a network.
Hereinafter, the number of bits when a stream is transmitted per slice through a network will be described using an example.
FIG. 8 illustrates an example of divided slices in pictures that are temporally consecutive, and FIGS. 9A and 9B illustrate examples of number of bits obtained by coding each slice in time order. More specifically, FIGS. 9A and 9B illustrate the examples of the number of bits obtained by coding each slice in time order, when the stream is transmitted per slice through a network as illustrated in FIG. 8.
For example, each picture is composed of 5 slices, and includes an I-slice and 4 P slices, as illustrated in FIG. 8.
The I-slice has a larger number of bits, generally speaking, approximately several times to ten times larger than those of the P slices. FIG. 9A illustrates time on a horizontal axis, and the sizes of the number of bits obtained by coding slices on a vertical axis. The slices are included in each picture and coded in an order from the top to the bottom in each of the pictures. As seen from FIGS. 8 and 9A, the I-slices included in the pictures are positioned from the top to the bottom in time order, that is, cyclically circulate.
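The pattern of slice sizes over time can be sketched as follows. The bit counts are hypothetical: the I-slice is assumed to be five times the size of a P slice, which lies within the range of several times to ten times stated above. The function name is also hypothetical.

```python
def slice_bits_in_time_order(num_pictures, slices_per_picture, i_bits, p_bits):
    """Return, per picture, the coded size of each slice from top to bottom."""
    sequence = []
    for picture in range(num_pictures):
        i_position = picture % slices_per_picture  # cyclic I-slice position
        sequence.append(
            [i_bits if s == i_position else p_bits for s in range(slices_per_picture)]
        )
    return sequence


if __name__ == "__main__":
    pictures = slice_bits_in_time_order(
        num_pictures=6, slices_per_picture=5, i_bits=10_000, p_bits=2_000
    )
    for index, sizes in enumerate(pictures):
        print(f"picture {index}: slices {sizes}, total {sum(sizes)} bits")
```

Under these assumptions every picture contains exactly one large slice, so the total per picture stays the same while the position of the large slice moves down by one slice per picture.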
Furthermore, FIG. 9B illustrates that the pictures that are temporally consecutive are coded and the coded pictures are transmitted at a constant bit rate in coding order. Here, in FIG. 9B, the width (length) in the horizontal direction shows the size of the number of bits, representing the time necessary for transmitting data corresponding to the slices. Since each picture includes the same number of I-slices, the number of bits per picture is almost the same.

Non Patent Reference 1: MPEG-2 standard: ISO/IEC 13818-2, “Information Technology-Generic Coding Of Moving Pictures And Associated Audio Information: Video”, International Standard, Second Edition, December 2000

Non Patent Reference 2: MPEG-4 standard: ISO/IEC 14496-2, “Information Technology-Coding Of Audio-Visual Objects-Part 2: Visual”, International Standard, Third Edition, July 2004