The present invention relates to a picture conversion apparatus and a picture conversion method. In particular, the invention relates to a picture conversion apparatus and a picture conversion method which make it possible to obtain a picture having better picture quality.
In converting a standard-resolution or low-resolution picture (hereinafter referred to as an SD (standard definition) picture where appropriate) into a high-resolution picture (hereinafter referred to as an HD (high definition) picture where appropriate), or in enlarging a picture, pixel values of absent pixels are interpolated (compensated for) by using what is called an interpolation filter or the like.
However, even if pixels are interpolated by using an interpolation filter, it is difficult to obtain a high-resolution picture because HD picture components (high-frequency components) that are not included in an SD picture cannot be restored.
In view of the above, the present applicant previously proposed a picture conversion apparatus which converts an SD picture into an HD picture including high-frequency components that are not included in the SD picture.
In this picture conversion apparatus, high-frequency components that are not included in an SD picture are restored by executing an adaptive process for determining a prediction value of a pixel of an HD picture by a linear combination of the SD picture and predetermined prediction coefficients.
Specifically, for instance, consider the case of determining a prediction value E[y] of the pixel value y of a pixel (hereinafter referred to as an HD pixel where appropriate) constituting an HD picture by using a linear first-order combination model that is prescribed by linear combinations of pixel values (hereinafter referred to as learning data where appropriate) $x_1, x_2, \ldots$ of a certain number of SD pixels (pixels constituting an SD picture) and predetermined prediction coefficients $w_1, w_2, \ldots$. In this case, the prediction value E[y] can be expressed by the following formula:

$$E[y] = w_1 x_1 + w_2 x_2 + \cdots \qquad (1)$$
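The linear first-order combination of Formula (1) is simply an inner product of tap values and coefficients. The following Python sketch illustrates it with made-up numbers (the values are hypothetical, not taken from the patent):

```python
# Prediction value E[y] as a linear combination of SD pixel values
# (learning data) x_i and prediction coefficients w_i -- Formula (1).

def predict(x, w):
    """Return E[y] = w1*x1 + w2*x2 + ... for equal-length sequences."""
    assert len(x) == len(w)
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical values: three SD pixel values and three coefficients.
x = [100.0, 110.0, 120.0]
w = [0.25, 0.50, 0.25]
print(predict(x, w))  # 110.0
```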
For generalization, a matrix W that is a set of prediction coefficients w, a matrix X that is a set of learning data, and a matrix Y' that is a set of prediction values E[y] are defined as follows:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix},\quad W = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix},\quad Y' = \begin{pmatrix} E[y_1] \\ E[y_2] \\ \vdots \\ E[y_m] \end{pmatrix} \qquad (2)$$
The following observation equation holds:

$$XW = Y' \qquad (3)$$
Consider the case of determining prediction values E[y] that are close to the pixel values y of the HD pixels by applying a least squares method to this observation equation. In this case, a matrix Y that is a set of true pixel values y of the HD pixels as teacher data and a matrix E that is a set of residuals e of the prediction values E[y] with respect to the pixel values y of the HD pixels are defined as follows:

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix},\quad E = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{pmatrix} \qquad (4)$$
From Formula (3), the following residual equation holds:

$$XW = Y + E \qquad (5)$$
In this case, prediction coefficients $w_i$ for determining prediction values E[y] that are close to the pixel values y of the HD pixels are determined by minimizing the following squared error:

$$\sum_{i=1}^{m} e_i^2 \qquad (6)$$
Therefore, prediction coefficients $w_i$ that satisfy the following equations (i.e., the derivatives of the above squared error with respect to the prediction coefficients $w_i$ are 0) are optimum values for determining prediction values E[y] close to the pixel values y of the HD pixels:

$$e_1 \frac{\partial e_1}{\partial w_i} + e_2 \frac{\partial e_2}{\partial w_i} + \cdots + e_m \frac{\partial e_m}{\partial w_i} = 0 \quad (i = 1, 2, \ldots, n) \qquad (7)$$
In view of the above, first, the following equations are obtained by differentiating Formula (5) with respect to the prediction coefficients $w_i$:

$$\frac{\partial e_i}{\partial w_1} = x_{i1},\ \frac{\partial e_i}{\partial w_2} = x_{i2},\ \ldots,\ \frac{\partial e_i}{\partial w_n} = x_{in} \quad (i = 1, 2, \ldots, m) \qquad (8)$$
Formula (9) is obtained from Formulas (7) and (8):

$$\sum_{i=1}^{m} e_i x_{i1} = 0,\quad \sum_{i=1}^{m} e_i x_{i2} = 0,\quad \ldots,\quad \sum_{i=1}^{m} e_i x_{in} = 0 \qquad (9)$$
By considering the relationship between the learning data x, the prediction coefficients w, the teacher data y, and the residuals e in the residual equation of Formula (5), the following normal equations can be obtained from Formula (9):

$$\left\{\begin{array}{c}
\left(\displaystyle\sum_{i=1}^{m} x_{i1}x_{i1}\right)w_1 + \left(\displaystyle\sum_{i=1}^{m} x_{i1}x_{i2}\right)w_2 + \cdots + \left(\displaystyle\sum_{i=1}^{m} x_{i1}x_{in}\right)w_n = \displaystyle\sum_{i=1}^{m} x_{i1}y_i \\
\left(\displaystyle\sum_{i=1}^{m} x_{i2}x_{i1}\right)w_1 + \left(\displaystyle\sum_{i=1}^{m} x_{i2}x_{i2}\right)w_2 + \cdots + \left(\displaystyle\sum_{i=1}^{m} x_{i2}x_{in}\right)w_n = \displaystyle\sum_{i=1}^{m} x_{i2}y_i \\
\vdots \\
\left(\displaystyle\sum_{i=1}^{m} x_{in}x_{i1}\right)w_1 + \left(\displaystyle\sum_{i=1}^{m} x_{in}x_{i2}\right)w_2 + \cdots + \left(\displaystyle\sum_{i=1}^{m} x_{in}x_{in}\right)w_n = \displaystyle\sum_{i=1}^{m} x_{in}y_i
\end{array}\right. \qquad (10)$$
The normal equations of Formula (10) can be obtained in the same number as the number of prediction coefficients w to be determined. Therefore, optimum prediction coefficients w can be determined by solving Formula (10) (for Formula (10) to be soluble, the matrix of the coefficients of the prediction coefficients w needs to be regular). To solve Formula (10), it is possible to use a sweep-out method (Gauss-Jordan elimination method) or the like.
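The normal equations of Formula (10) form a linear system A w = b, where A holds the sums of products of learning data and b the sums of products of learning data and teacher data. A minimal sketch of the sweep-out (Gauss-Jordan) method mentioned above, with partial pivoting added for numerical safety, might look as follows (illustrative only, not the patented implementation):

```python
def gauss_jordan(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting.
    A must be regular (non-singular), as the text requires of Formula (10)."""
    n = len(A)
    # Build the augmented matrix [A | b].
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Partial pivoting: bring the largest-magnitude entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Normalize the pivot row, then eliminate the column elsewhere.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# Hypothetical 2x2 normal equations: 2w1 + w2 = 5, w1 + 3w2 = 10.
print(gauss_jordan([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))  # [1.0, 3.0]
```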
The adaptive process is a process for determining optimum prediction coefficients w in the above manner and then determining prediction values E[y] that are close to the pixel values y according to Formula (1) by using the optimum prediction coefficients w (the adaptive process includes a case of determining prediction coefficients w in advance and determining prediction values by using those prediction coefficients w).
The adaptive process is different from the interpolation process in that components not included in an SD picture but included in an HD picture are reproduced. That is, judging from Formula (1) alone, the adaptive process appears the same as the interpolation process using an interpolation filter. However, the adaptive process can reproduce components of an HD picture because the prediction coefficients w, which correspond to the tap coefficients of the interpolation filter, are determined by what is called learning by using teacher data y. That is, a high-resolution picture can be obtained easily. From this fact, it can be said that the adaptive process is a process having a function of creating resolution of a picture.
FIG. 9 shows an example of the configuration of a picture conversion apparatus which converts an SD picture as a digital signal into an HD picture.
An SD picture is supplied to a delay line 107, and blocking circuits 1 and 2. The SD picture is delayed by, for instance, one frame by the delay line 107 and then supplied to the blocking circuits 1 and 2. Therefore, the blocking circuits 1 and 2 are supplied with an SD picture of a current frame (hereinafter referred to as a subject frame where appropriate) as a subject of conversion into an HD picture and an SD picture of a 1-frame preceding frame (hereinafter referred to as a preceding frame where appropriate).
In the blocking circuit 1 or 2, HD pixels which constitute an HD picture of the subject frame are sequentially employed as the subject pixel and prediction taps or class taps for the subject pixel are formed from the SD pictures of the subject frame and the preceding frame.
It is assumed here that, for example, HD pixels and SD pixels have a relationship as shown in FIG. 10. That is, in this case, one SD pixel (indicated by mark "□" in the figure) corresponds to four HD pixels (indicated by mark "○" in the figure) located at top-left, top-right, bottom-left, and bottom-right positions of the SD pixel and adjacent thereto. Therefore, the SD pixels are pixels obtained by decimating the HD pixels at a rate of one for every two in both the horizontal and vertical directions.
In the blocking circuit 1 or 2, for example, when some HD pixel is employed as the subject pixel, a block (processing block) of 3×3 pixels (horizontal/vertical) having, as the center, the SD pixel in the subject frame corresponding to the subject pixel (HD pixel) is formed as shown in FIG. 10, and a block of 3×3 pixels having, as the center, the SD pixel in the preceding frame corresponding to the subject pixel is formed as shown in FIG. 11. These 18 pixels (SD pixels) in total are employed as prediction taps or class taps. FIG. 10 shows SD pixels and HD pixels in the subject frame by marks "□" and "○" respectively, and FIG. 11 shows SD pixels and HD pixels in the preceding frame by marks "■" and "●" respectively.
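The tap formation described above can be sketched as follows (a hypothetical helper, not the blocking circuits themselves): the 3×3 neighbourhoods in the subject frame and the preceding frame are gathered, giving 18 SD pixels per subject pixel; border handling is ignored here for brevity.

```python
def form_taps(subject_frame, preceding_frame, row, col):
    """Collect the 3x3 SD-pixel block centered at (row, col) from the
    subject frame and from the preceding frame: 18 tap values in total.
    Frames are 2-D lists of pixel values; frame borders are ignored here."""
    taps = []
    for frame in (subject_frame, preceding_frame):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                taps.append(frame[row + dr][col + dc])
    return taps

# Hypothetical 4x4 frames; taps for the SD pixel at position (1, 1).
cur = [[c + 4 * r for c in range(4)] for r in range(4)]
prev = [[100 + c + 4 * r for c in range(4)] for r in range(4)]
taps = form_taps(cur, prev, 1, 1)
print(len(taps))  # 18
```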
The prediction taps obtained by the blocking circuit 1 are supplied to a prediction operation circuit 6, and the class taps obtained by the blocking circuit 2 are supplied to a class code generation circuit 4 via an ADRC circuit 3.
In the above case, prediction taps and class taps are formed by the 3×3 SD pixels having, as the center, the SD pixel in the subject frame corresponding to the subject pixel and the 3×3 SD pixels having, as the center, the SD pixel in the preceding frame corresponding to the subject pixel. Therefore, the same prediction taps and class taps are formed when any of HD pixels a, b, c, and d shown in FIG. 10 is employed as the subject pixel.
The class taps that have been supplied to the class code generation circuit 4 via the ADRC circuit 3 are classified there. That is, the class code generation circuit 4 outputs, as a class of the class taps (or subject pixel), a value corresponding to a pattern of pixel values of the SD pixels (as described above, 18 SD pixels) that constitute the class taps.
Where a large number of bits, for instance, 8 bits, are allocated to represent the pixel value of each SD pixel, the number of patterns of pixel values of 18 SD pixels is enormous, that is, $(2^8)^{18}$, making it difficult to increase the speed of the following process.
In view of the above, for example, an ADRC (adaptive dynamic range coding) process that is a process for decreasing the number of bits of SD pixels constituting class taps is executed on the class taps in the ADRC circuit 3 as a pre-process for the classification.
Specifically, in the ADRC circuit 3, first a pixel having the maximum pixel value (hereinafter referred to as a maximum pixel where appropriate) and a pixel having the minimum pixel value (hereinafter referred to as a minimum pixel where appropriate) among the 18 SD pixels constituting the class taps are detected. The difference DR (= MAX − MIN) between the pixel value MAX of the maximum pixel and the pixel value MIN of the minimum pixel is calculated and employed as a local dynamic range of the processing block. Based on the dynamic range DR, the respective pixel values constituting the processing block are re-quantized into K bits, fewer than the originally allocated number of bits. That is, the pixel value MIN of the minimum pixel is subtracted from each pixel value constituting the processing block, and the resulting differences are divided by $DR/2^K$.
As a result, the respective pixel values constituting the processing block are expressed by K bits. Therefore, where, for example, K = 1, the number of patterns of pixel values of 18 SD pixels becomes $(2^1)^{18}$, which is much smaller than in the case where the ADRC process is not executed. The ADRC process for causing pixel values to be expressed by K bits will be hereinafter referred to as a K-bit ADRC process, where appropriate.
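The K-bit ADRC re-quantization described above can be sketched as follows. This is a minimal illustration assuming numeric pixel values; the handling of a flat block (DR = 0) and the clamping of the maximum pixel to the top code are assumptions, not details taken from the text:

```python
def adrc(pixels, k):
    """Re-quantize pixel values to k bits using the local dynamic range:
    subtract the minimum MIN, then divide by DR / 2**k (DR = MAX - MIN)."""
    mn, mx = min(pixels), max(pixels)
    dr = mx - mn
    if dr == 0:
        return [0] * len(pixels)  # flat block: every code is 0 (assumption)
    step = dr / (2 ** k)
    # Clamp so the maximum pixel maps to the top code 2**k - 1 (assumption).
    return [min(int((p - mn) / step), 2 ** k - 1) for p in pixels]

# Hypothetical taps, 1-bit ADRC: values split around the mid-range.
print(adrc([10, 200, 120, 90], 1))  # [0, 1, 1, 0]
```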
The class code generation circuit 4 executes a classification process on the class taps that have been subjected to the above ADRC process, whereby a value corresponding to the pattern of the SD pixel values constituting the class taps is supplied to a prediction coefficients memory 5 as a class of the class taps (or the corresponding subject pixel).
The prediction coefficients memory 5 stores, for each class, prediction coefficients that have been determined in advance through learning. When supplied with a class from the class code generation circuit 4, the prediction coefficients memory 5 reads out the prediction coefficients that are stored at an address corresponding to the class and supplies them to the prediction operation circuit 6.
In the prediction operation circuit 6, the operation represented by Formula (1), that is, an adaptive process, is performed by using the prediction taps (pixel values of the SD pixels constituting the prediction taps) $x_1, x_2, \ldots$ that are supplied from the blocking circuit 1 and prediction coefficients adapted to the prediction taps, that is, the prediction coefficients $w_1, w_2, \ldots$ corresponding to the class of the subject pixel that are supplied from the prediction coefficients memory 5. A prediction value E[y] of the subject pixel y is thereby determined and output as a pixel value of the subject pixel (HD pixel).
Thereafter, similar processes are sequentially executed while the other HD pixels of the subject frame are employed as the subject pixel, whereby the SD picture is converted into an HD picture.
FIG. 12 shows an example of a configuration of a learning apparatus which executes a learning process for calculating prediction coefficients to be stored in the prediction coefficients memory 5 shown in FIG. 9.
An HD picture (HD picture for learning) to serve as teacher data y of learning is supplied to a decimation circuit 101 and a teacher data extraction circuit 27. In the decimation circuit 101, the HD picture is reduced in the number of pixels by decimation and is thereby converted into an SD picture (SD picture for learning). Specifically, since one SD pixel corresponds to four HD pixels adjacent thereto as described above in connection with FIG. 10, for example, in the decimation circuit 101 the HD picture is divided into blocks of 2×2 HD pixels and the average value of those pixels is employed as the pixel value of the SD pixel located at the center of each block of 2×2 HD pixels (i.e., the SD pixel corresponding to the 2×2 HD pixels).
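The 2×2-average decimation described above can be sketched as follows (a simplified stand-in for the decimation circuit 101; frame dimensions are assumed even):

```python
def decimate(hd):
    """Convert an HD picture (2-D list) into an SD picture by averaging
    each non-overlapping 2x2 block of HD pixels into one SD pixel."""
    rows, cols = len(hd), len(hd[0])
    return [[(hd[r][c] + hd[r][c + 1] + hd[r + 1][c] + hd[r + 1][c + 1]) / 4
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

# Hypothetical 2x4 HD picture -> 1x2 SD picture.
print(decimate([[10, 20, 30, 40],
                [10, 20, 30, 40]]))  # [[15.0, 35.0]]
```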
The SD picture obtained by the decimation circuit 101 is supplied to a delay line 128 and blocking circuits 21 and 22.
The delay line 128, the blocking circuits 21 and 22, an ADRC circuit 23, and a class code generation circuit 24 execute the same processes as the delay line 107, the blocking circuits 1 and 2, the ADRC circuit 3, and the class code generation circuit 4 shown in FIG. 9, respectively. As a result, the blocking circuit 21 outputs prediction taps that have been formed for the subject pixel, and the class code generation circuit 24 outputs a class of the subject pixel.
The class that is output from the class code generation circuit 24 is supplied to respective address terminals (AD) of a prediction taps memory 25 and a teacher data memory 26. The prediction taps that are output from the blocking circuit 21 are supplied to the prediction taps memory 25. The prediction taps memory 25 stores, as learning data, the prediction taps that are supplied from the blocking circuit 21 at an address corresponding to the class that is supplied from the class code generation circuit 24.
On the other hand, the teacher data extraction circuit 27 extracts the HD pixel serving as the subject pixel from the HD picture supplied thereto, and supplies it to the teacher data memory 26 as teacher data. The teacher data memory 26 stores the teacher data that is supplied from the teacher data extraction circuit 27 at an address corresponding to the class that is supplied from the class code generation circuit 24.
Thereafter, similar processes are executed until all HD pixels constituting the HD picture that is prepared for the learning in advance are employed as the subject pixel.
As a result, SD pixels and an HD pixel that are in the positional relationships described above in connection with FIGS. 10 and 11 are stored, as learning data x and teacher data y, at the same addresses of the prediction taps memory 25 and the teacher data memory 26, respectively.
The prediction taps memory 25 and the teacher data memory 26 can store plural pieces of information at the same address, whereby a plurality of learning data x and a plurality of teacher data y that are classified as the same class can be stored at the same addresses.
Then, an operation circuit 29 reads out, from the prediction taps memory 25 and the teacher data memory 26, pixel values of SD pixels constituting prediction taps as learning data and pixel values of HD pixels as teacher data that are stored at the same addresses, respectively. The operation circuit 29 calculates prediction coefficients that minimize errors between prediction values and the teacher data by a least squares method by using those pixel values. That is, the operation circuit 29 establishes normal equations of Formula (10) for each class and determines prediction coefficients for each class by solving the normal equations.
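The per-class learning step can be sketched as ordinary least squares on the learning-data/teacher-data pairs stored for each class: for each class, the normal equations of Formula (10) are built and solved. This is a simplified, hypothetical stand-in for the operation circuit 29, not the patented implementation:

```python
def solve(A, b):
    """Gaussian elimination with back substitution (A assumed regular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def learn_coefficients(samples_by_class):
    """For each class, build and solve the normal equations of Formula (10):
    rows of X are prediction taps (learning data), y holds teacher data."""
    coeffs = {}
    for cls, (X, y) in samples_by_class.items():
        n = len(X[0])
        # Accumulate the sums of products that appear in Formula (10).
        A = [[sum(r[j] * r[k] for r in X) for k in range(n)] for j in range(n)]
        b = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(n)]
        coeffs[cls] = solve(A, b)
    return coeffs

# Hypothetical data: one class whose teacher data is exactly x1 + 2*x2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w = learn_coefficients({0: (X, y)})[0]
print([round(v, 6) for v in w])  # [1.0, 2.0]
```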
The prediction coefficients for the respective classes that have been determined by the operation circuit 29 in the above manner are stored in the prediction coefficients memory 5 shown in FIG. 9 at an address corresponding to the class.
Where prediction coefficients obtained by the learning apparatus of FIG. 12 are stored in the prediction coefficients memory 5 of the picture conversion apparatus of FIG. 9 and a conversion from an SD picture to an HD picture is performed, basically the picture quality of a resulting HD picture can be improved by increasing the number of SD pixels constituting class taps and prediction taps.
However, as the number of SD pixels constituting class taps and prediction taps is increased, the SD pixels come to include ones that are distant from the subject pixel spatially or temporally. In such a case, SD pixels having no correlation with the subject pixel come to be included in class taps and prediction taps. Once this situation occurs, it is difficult to improve the picture quality of an HD picture by further adding SD pixels having no correlation to class taps and prediction taps.