1. Field of the Invention
The present invention relates to image processing apparatuses and methods, and programs, and more particularly, to an image processing apparatus and method, and a program that allow accurate conversion of an input image into a high-quality image having a number of pixels different from that of the input image.
2. Description of the Related Art
The assignee of this application previously proposed conversion processing for converting a standard definition (SD) image into a high definition (HD) image in, for example, Japanese Unexamined Patent Application Publication No. 7-79418. In this conversion processing, an HD image is predicted from an input SD image. More specifically, features of a plurality of pixels in a predetermined area in the input SD image are determined by performing adaptive dynamic range coding (ADRC) processing. Then, in accordance with the determined features, subject pixels of the HD image to be determined from the SD image are allocated into classes. A linear expression using predictive coefficients, which have been determined for the individual classes by learning processing, and the pixel values of the plurality of pixels in the predetermined area of the input SD image is then calculated, so that the HD image can be predicted from the input SD image.
FIG. 1 is a block diagram illustrating a typical example of a conversion device 1 that performs known conversion processing.
The conversion device 1 shown in FIG. 1 includes a class tap extracting unit 11, an ADRC processor 12, a prediction coefficient memory 13, a prediction tap extracting unit 14, and a prediction computation unit 15.
An interlace SD image is input into the conversion device 1, and is then supplied to the class tap extracting unit 11 and the prediction tap extracting unit 14.
The class tap extracting unit 11 sequentially selects the pixels forming an interlace HD image to be determined from the input interlace SD image as subject pixels, and extracts some of the pixels forming the SD image as class taps, which are used for classifying the subject pixels. The class tap extracting unit 11 then supplies the extracted class taps to the ADRC processor 12.
The ADRC processor 12 performs ADRC processing on the pixel values of the pixels forming the class taps supplied from the class tap extracting unit 11 to detect the ADRC code as the feature of the waveform of the class taps.
In K-bit ADRC processing, the maximum value MAX and the minimum value MIN of the pixel values of the pixels forming the class taps are detected, and DR=MAX−MIN is set as the local dynamic range of the class taps. The pixel values of the pixels forming the class taps are then re-quantized into K bits based on this dynamic range. That is, the minimum value MIN is subtracted from the pixel value of each pixel forming the class taps, and the resulting value is divided by DR/2^K.
Then, the K-bit pixel values of the pixels forming the class taps are arranged in a predetermined order, resulting in a bit string, which is then output as the ADRC code. Accordingly, if one-bit ADRC processing is performed on the class taps, the minimum value MIN is subtracted from the pixel value of each pixel forming the class taps, and the resulting value is divided by DR/2 with the decimal fractions truncated, so that each pixel value is re-quantized into one bit. That is, the pixel value of each pixel is binarized according to whether it is smaller than the average of the maximum value MAX and the minimum value MIN. Then, a bit string of the one-bit pixel values arranged in a predetermined order is output as the ADRC code.
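The K-bit re-quantization and bit-string construction described above can be illustrated by the following sketch. The function name and the clamping of the maximum pixel value into the top quantization level are illustrative assumptions, not details taken from the source:

```python
def adrc_code(taps, k=1):
    """Compute the K-bit ADRC code of a sequence of tap pixel values.

    Each pixel value is re-quantized to K bits relative to the local
    dynamic range DR = MAX - MIN, and the K-bit values are concatenated
    in tap order into a single integer code.
    """
    mx, mn = max(taps), min(taps)
    dr = mx - mn
    levels = 1 << k  # 2^K quantization levels
    code = 0
    for p in taps:
        if dr == 0:
            q = 0  # flat taps: every pixel falls into the same level
        else:
            # subtract MIN, divide by DR/2^K, truncate; clamp MAX into the top level
            q = min(int((p - mn) / (dr / levels)), levels - 1)
        code = (code << k) | q
    return code
```

For one-bit ADRC this reduces to binarizing each pixel about the midpoint of the dynamic range, as described above.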
The ADRC processor 12 determines the class based on the detected ADRC code to classify each subject pixel, and then supplies the determined class to the prediction coefficient memory 13. For example, the ADRC processor 12 directly supplies the ADRC code to the prediction coefficient memory 13 as the class.
The prediction coefficient memory 13 stores a prediction coefficient for each class obtained by learning discussed below with reference to FIG. 7. The prediction coefficient memory 13 reads out the prediction coefficient according to the class supplied from the ADRC processor 12, and supplies the read prediction coefficient to the prediction computation unit 15.
The prediction tap extracting unit 14 extracts, from the input interlace SD image, as prediction taps, some of the pixels forming the SD image used for predicting the pixel value of a subject pixel. More specifically, the prediction tap extracting unit 14 extracts, from the SD image, as prediction taps, pixels corresponding to the subject pixel, for example, a plurality of pixels of the SD image spatially closer to the subject pixel. The prediction tap extracting unit 14 then supplies the extracted prediction taps to the prediction computation unit 15.
The prediction taps and the class taps may have the same tap structure or different tap structures.
The prediction computation unit 15 performs prediction computation, such as linear expression computation, for determining the prediction value of the true value of the subject pixel by using the prediction taps supplied from the prediction tap extracting unit 14 and the prediction coefficient supplied from the prediction coefficient memory 13. Then, the prediction computation unit 15 predicts the pixel value of the subject pixel, i.e., the pixel value of a pixel forming the interlace HD image, and outputs the predicted pixel value.
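The linear prediction computation can be illustrated by the following sketch; the function name and argument order are illustrative assumptions, and an actual implementation would operate on the tap structure of FIG. 3:

```python
def predict_pixel(prediction_taps, coefficients):
    """Predict an HD pixel value as the linear combination
    sum_i w_i * x_i of the prediction-tap pixel values x_i and the
    per-class prediction coefficients w_i."""
    assert len(prediction_taps) == len(coefficients)
    return sum(w * x for w, x in zip(coefficients, prediction_taps))
```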
FIG. 2 illustrates an example of the tap structure of the class taps extracted by the class tap extracting unit 11 shown in FIG. 1. In FIG. 2, the white circles indicate the pixels of the SD image, and the rhomboids designate the pixels of the HD image. The same applies to FIG. 3.
In FIG. 2, the class taps are formed of nine pixels, and more specifically, pixels in the m-th (m=1, 2, . . . ,) field of the SD image, such as a pixel 23 corresponding to a subject pixel 27, pixels 20 and 26 that are adjacent to the pixel 23 in the upward direction and the downward direction, respectively, pixels 21 and 22 adjacent to the pixel 23 in the leftward direction, and pixels 24 and 25 adjacent to the pixel 23 in the rightward direction, and pixels in the (m−1)-th field of the SD image, such as pixels 29 and 30 adjacent to a position 28 corresponding to the pixel 23 in the upward direction and the downward direction, respectively.
FIG. 3 illustrates an example of the tap structure of the prediction taps extracted by the prediction tap extracting unit 14 shown in FIG. 1.
In FIG. 3, the prediction taps are formed of 13 pixels, and more specifically, pixels in the m-th (m=1, 2, . . . ,) field of the SD image, such as a pixel 43 corresponding to a subject pixel 47, pixels 40 and 46 that are adjacent to the pixel 43 in the upward direction and the downward direction, respectively, pixels 41 and 42 adjacent to the pixel 43 in the leftward direction, and pixels 44 and 45 adjacent to the pixel 43 in the rightward direction, and pixels in the (m−1)-th field of the SD image, such as pixels 50 and 53 adjacent to a position 48 corresponding to the pixel 43 in the upward direction and the downward direction, respectively, pixels 49 and 51 adjacent to the pixel 50 in the leftward direction and the rightward direction, respectively, and pixels 52 and 54 adjacent to the pixel 53 in the leftward direction and the rightward direction, respectively.
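As one possible representation, the tap structures of FIGS. 2 and 3 can be encoded as coordinate offsets relative to the SD pixel corresponding to the subject pixel. The following hypothetical sketch encodes the nine class taps of FIG. 2 as (field offset, vertical offset, horizontal offset) triplets and clips tap coordinates at the image border, which is one common convention not specified in the source; the 13 prediction taps of FIG. 3 could be encoded in the same manner:

```python
# Hypothetical encoding of the FIG. 2 class taps: seven pixels in the
# m-th field and two in the (m-1)-th field, as (field, dy, dx) offsets.
CLASS_TAP_OFFSETS = [
    (0, -1, 0),               # pixel 20, above pixel 23
    (0, 0, -2), (0, 0, -1),   # pixels 21 and 22, to the left
    (0, 0, 0),                # pixel 23, corresponding to the subject pixel
    (0, 0, 1), (0, 0, 2),     # pixels 24 and 25, to the right
    (0, 1, 0),                # pixel 26, below pixel 23
    (-1, -1, 0), (-1, 1, 0),  # pixels 29 and 30, in the (m-1)-th field
]

def extract_taps(fields, field_index, y, x, offsets):
    """Gather tap pixel values from a list of 2-D field arrays,
    clipping out-of-range coordinates at the image border."""
    taps = []
    for df, dy, dx in offsets:
        f = fields[field_index + df]
        h, w = len(f), len(f[0])
        taps.append(f[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)])
    return taps
```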
FIGS. 4 and 5 illustrate positional relationships between the pixels of the interlace SD image input into the conversion device 1 and the pixels of the interlace HD image output from the conversion device 1.
In FIGS. 4 and 5, the white circles indicate the pixels of odd-numbered fields of the SD image, while the black circles designate the pixels of even-numbered fields of the SD image. The white rhomboids indicate the pixels of the odd-numbered fields forming a predetermined frame of the HD image, while the black rhomboids designate the pixels of the even-numbered fields of the HD image. The interval between the pixels of the HD image in the vertical and horizontal directions is 1.
FIG. 4 illustrates the positional relationship between the pixels of the HD image and the pixels of the SD image in the vertical direction. In FIG. 4, the horizontal axis represents the time, and the vertical axis designates the vertical position of the pixels.
A pixel 71 of the odd-numbered field of the SD image is vertically located, as shown in FIG. 4, at a position away from a pixel 61 of the HD image, which is positioned vertically closest to the pixel 71, by 1/2, i.e., a position away from a pixel 62 immediately under the pixel 61 by 3/2.
A pixel 72 of the even-numbered field of the SD image is vertically located, as shown in FIG. 4, at a position away from a pixel 64 of the HD image, which is positioned vertically closest to the pixel 72, by 1/2, i.e., a position away from a pixel 63 immediately above the pixel 64 by 3/2.
FIG. 5 illustrates the positional relationship between the pixels of the HD image and the pixels of the SD image in the horizontal direction. For the convenience of representation, the image in which odd-numbered fields and even-numbered fields are combined is shown in FIG. 5.
The pixel 71 of an odd-numbered field and the pixel 72 of an even-numbered field of the SD image are horizontally positioned between the pixel 61 of the HD image positioned horizontally closest to the pixels 71 and 72 and a pixel 81 positioned right-adjacent to the pixel 61. That is, the pixels 71 and 72 are horizontally located at a position away from the pixel 61 by 1/2 in the rightward direction and away from the pixel 81 by 1/2 in the leftward direction.
Accordingly, since the number of pixels of the SD image before conversion differs from that of the HD image after conversion, i.e., the sampling frequencies of the two images differ, the positions of the pixels of the HD image are displaced from those of the pixels of the SD image.
Prediction processing performed by the conversion device 1 shown in FIG. 1 for predicting an interlace HD image is described below with reference to the flowchart in FIG. 6. This prediction processing is started when, for example, an interlace SD image is input into the conversion device 1.
In step S1, the class tap extracting unit 11 selects, as a subject pixel, one of the pixels forming the interlace HD image to be determined from the input interlace SD image.
In step S2, the class tap extracting unit 11 then extracts, as class taps, some of the pixels forming the input SD image, such as those shown in FIG. 2, used for classifying the subject pixel selected in step S1, and supplies the extracted class taps to the ADRC processor 12.
In step S3, the ADRC processor 12 performs ADRC processing on the pixel values of the pixels forming the class taps supplied from the class tap extracting unit 11, and sets the resulting ADRC code as the feature of the class taps.
In step S4, the ADRC processor 12 determines the class based on the ADRC code to classify the subject pixel, and then supplies the determined class to the prediction coefficient memory 13.
In step S5, the prediction tap extracting unit 14 extracts, as prediction taps, some of the pixels forming the input SD image, such as those shown in FIG. 3, used for predicting the pixel value of the subject pixel. The prediction tap extracting unit 14 then supplies the extracted prediction taps to the prediction computation unit 15.
In step S6, based on the class supplied from the ADRC processor 12, the prediction coefficient memory 13 reads out the prediction coefficient corresponding to the class and supplies the prediction coefficient to the prediction computation unit 15.
In step S7, the prediction computation unit 15 performs prediction computation, for example, linear expression computation, for determining the prediction value of the true value of the subject pixel by using the prediction taps supplied from the prediction tap extracting unit 14 and the prediction coefficient supplied from the prediction coefficient memory 13.
In step S8, the prediction computation unit 15 outputs the predicted pixel value of the subject pixel as a result of the prediction computation, i.e., the pixel value of the corresponding pixel forming the interlace HD image.
In step S9, the class tap extracting unit 11 determines whether all the pixels forming the interlace HD image determined from the input interlace SD image have been selected as the subject pixels.
If it is determined in step S9 that not all the pixels forming the HD image have been selected as the subject pixels, the process proceeds to step S10. In step S10, the class tap extracting unit 11 selects, as a new subject pixel, a pixel which has not yet been selected, and the process returns to step S2. Step S2 and the subsequent steps are then repeated. If it is determined in step S9 that all the pixels forming the HD image have been selected as the subject pixels, the prediction processing is completed.
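The flow of steps S1 through S10 can be sketched as a single loop over the subject pixels. The component object and its method names below are hypothetical placeholders for the units of FIG. 1, not names taken from the source:

```python
def convert_sd_to_hd(sd_image, conversion):
    """Sketch of the prediction processing of FIG. 6: for every subject
    pixel of the HD image, classify it from its class taps (steps S2-S4),
    extract prediction taps (step S5), look up the per-class prediction
    coefficients (step S6), and compute the predicted value (steps S7-S8)."""
    hd_image = {}
    for subject_pixel in conversion.hd_pixel_positions(sd_image):          # S1, S9, S10
        class_taps = conversion.extract_class_taps(sd_image, subject_pixel)      # S2
        cls = conversion.adrc_class(class_taps)                                  # S3, S4
        pred_taps = conversion.extract_prediction_taps(sd_image, subject_pixel)  # S5
        coeffs = conversion.coefficients[cls]                                    # S6
        hd_image[subject_pixel] = sum(w * x for w, x in zip(coeffs, pred_taps))  # S7
    return hd_image                                                              # S8
```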
As discussed above, the conversion device 1 predicts an HD image from an input SD image and outputs the predicted HD image. That is, the conversion device 1 converts an SD image into an HD image and outputs the converted HD image.
FIG. 7 is a block diagram illustrating the configuration of a learning device 90 that conducts learning for determining a prediction coefficient for each class to be stored in the prediction coefficient memory 13 shown in FIG. 1.
The learning device 90 shown in FIG. 7 includes a two-dimensional decimation filter 91, a class tap extracting unit 92, an ADRC processor 93, a prediction tap extracting unit 94, a normal equation generator 95, a prediction coefficient generator 96, and a prediction coefficient memory 97.
An interlace HD image serving as a supervisor image, i.e., a target image to be obtained after prediction, is read from a database (not shown), input into the learning device 90, and then supplied to the two-dimensional decimation filter 91 and the normal equation generator 95.
The two-dimensional decimation filter 91 decimates the pixels of the input interlace HD image in the horizontal and vertical directions to reduce the number of pixels to 1/2 in each direction. That is, the two-dimensional decimation filter 91 generates a learner image, which is an interlace SD image corresponding to the original image before prediction, from the input interlace HD image. The two-dimensional decimation filter 91 then supplies the learner image to the class tap extracting unit 92 and the prediction tap extracting unit 94.
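The 2:1 decimation can be sketched as follows. This sketch simply subsamples every other pixel in each direction; an actual decimation filter would typically apply low-pass filtering first to suppress aliasing, a step not detailed in the source:

```python
def decimate_2x(hd_frame):
    """Reduce a 2-D frame to 1/2 the pixel count in each direction by
    keeping every other row and every other column (no anti-alias
    filtering in this simplified sketch)."""
    return [row[::2] for row in hd_frame[::2]]
```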
The class tap extracting unit 92, which is similarly configured to the class tap extracting unit 11 shown in FIG. 1, sequentially selects the pixels forming the supervisor image as subject supervisor pixels, and extracts class taps, such as those shown in FIG. 2. The class tap extracting unit 92 then supplies the class taps to the ADRC processor 93.
The ADRC processor 93, which is similarly configured to the ADRC processor 12 shown in FIG. 1, performs ADRC processing on the pixel values of the pixels forming the class taps supplied from the class tap extracting unit 92, and sets the resulting ADRC code as the feature of the class taps. The ADRC processor 93 determines the class based on the ADRC code and supplies the determined class to the normal equation generator 95.
The prediction tap extracting unit 94, which is similarly configured to the prediction tap extracting unit 14 shown in FIG. 1, extracts, from the learner image supplied from the two-dimensional decimation filter 91, as prediction taps, such as those shown in FIG. 3, some of the pixels forming the learner image used for predicting the pixel value of the subject supervisor pixel. The prediction tap extracting unit 94 then supplies the prediction taps to the normal equation generator 95.
The normal equation generator 95 establishes normal equations for each class supplied from the ADRC processor 93 by using, as learning pairs for learning the prediction coefficient, the pixel values of the subject supervisor pixels of the input supervisor image and the corresponding prediction taps supplied from the prediction tap extracting unit 94. The normal equation generator 95 then supplies the normal equations to the prediction coefficient generator 96.
The prediction coefficient generator 96 solves the normal equations for each class supplied from the normal equation generator 95 to determine the prediction coefficient that statistically minimizes a prediction error for each class. The prediction coefficient generator 96 then supplies the prediction coefficient to the prediction coefficient memory 97 and stores it. The prediction coefficient stored in the prediction coefficient memory 97 is to be stored in the prediction coefficient memory 13 shown in FIG. 1.
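For each class, the normal equations can be accumulated from the learning pairs and solved by least squares. The following sketch (using NumPy, an illustrative choice) determines the coefficients that minimize the summed squared prediction error over the pairs of prediction taps and supervisor pixel values:

```python
import numpy as np

def learn_coefficients(learning_pairs):
    """For one class, accumulate the normal equations (X^T X) w = X^T y
    from (prediction_taps, supervisor_pixel) learning pairs, then solve
    for the coefficients w that minimize sum_j (y_j - w . x_j)^2."""
    n = len(learning_pairs[0][0])
    a = np.zeros((n, n))
    b = np.zeros(n)
    for taps, target in learning_pairs:
        x = np.asarray(taps, dtype=float)
        a += np.outer(x, x)   # accumulate X^T X
        b += x * target       # accumulate X^T y
    return np.linalg.solve(a, b)
```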
In this manner, the conversion device 1 converts an SD image into an HD image by using the prediction coefficient that minimizes the prediction error, which is obtained by the learning device 90, thereby achieving high-precision conversion processing.