The present disclosure relates to a signal processing apparatus, a signal processing method, an output apparatus, an output method, and a program, and particularly to a signal processing apparatus, a signal processing method, an output apparatus, an output method, and a program that enable an accurate base signal to be obtained.
Recently, various image restoration technologies using sparse coding have been studied. Sparse coding is a method that models the human visual system by decomposing a signal into base signals and representing the signal with those base signals.
Specifically, in the human visual system, an image captured by the retina is not transmitted as it is to a higher recognition mechanism. Instead, at the stage of early vision, the image is decomposed into a linear sum of a plurality of base images, as represented by the following expression 1, and is then transmitted.
$$\text{(Image)} = \sum \left[\text{(Coefficient)} \times \text{(Base Image)}\right] \tag{1}$$
In expression 1, most of the coefficients become 0 and only a small number of coefficients take large values. That is, the coefficients become sparse. For this reason, the method of modeling the human visual system by decomposing a signal into base signals and representing the signal with them is called sparse coding.
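As an illustration of expression 1, the following minimal sketch rebuilds a small image from a sparse set of coefficients. The four 2x2 base images, the coefficient values, and the function name `reconstruct` are hypothetical and chosen only for this example; they do not come from the disclosure.

```python
def reconstruct(coefficients, base_images):
    """Return sum_i coefficients[i] * base_images[i] as a flat list of pixels."""
    n_pixels = len(base_images[0])
    image = [0.0] * n_pixels
    for coeff, base in zip(coefficients, base_images):
        if coeff == 0.0:
            continue  # sparse case: this base image contributes nothing
        for p in range(n_pixels):
            image[p] += coeff * base[p]
    return image


# Hypothetical 2x2 base images, each flattened to a length-4 list.
base_images = [
    [1.0, 1.0, 1.0, 1.0],    # flat
    [1.0, -1.0, 1.0, -1.0],  # vertical edge
    [1.0, 1.0, -1.0, -1.0],  # horizontal edge
    [1.0, -1.0, -1.0, 1.0],  # checkerboard
]

# Only two of the four coefficients are non-zero, so the code is sparse.
coefficients = [0.5, 0.0, 2.0, 0.0]

image = reconstruct(coefficients, base_images)
nonzero = sum(1 for c in coefficients if c != 0.0)
print(image, nonzero)
```

Here the image is described by two non-zero coefficients instead of four pixel values, which is the sense in which the representation is sparse.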
In sparse coding, first, the base signals modeled by expression 1 are learned using a cost function represented by the following expression 2. Here, it is assumed that the signal subject to sparse coding is an image.
$$L = \operatorname{argmin}\,\{\|D\alpha - Y\|^2 + \mu\|\alpha\|_0\} \tag{2}$$
In expression 2, L denotes the cost function. D denotes a matrix (hereinafter referred to as a base image matrix) in which the pixel values of each base image are arranged in the column direction and these columns are arranged in the row direction, one column per base image. α denotes a vector (hereinafter referred to as a base image coefficient vector) in which the coefficients of the individual base images (hereinafter referred to as base image coefficients) are arranged in the column direction. Y denotes a vector (hereinafter referred to as a learning image vector) in which the pixel values of the individual pixels of the learning images are arranged in the column direction. μ denotes a previously set parameter.
Next, the base image coefficients are calculated such that the cost function of expression 2, evaluated using the learned base images and the sparse coding object image instead of the learning image, becomes equal to or smaller than a predetermined value.
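The text does not say how the coefficients are searched for. As one possible illustration, the sketch below uses greedy matching pursuit, a technique that is not named in the source, and stops once the L0 cost of expression 2 is at or below a threshold. The function name `matching_pursuit`, the parameter names, the assumption of unit-norm base images, and the sample data are all hypothetical.

```python
def matching_pursuit(base_images, y, mu, threshold, max_steps=None):
    """Greedy search for sparse coefficients.

    base_images: list of flattened base images (the columns of D),
                 assumed here to have unit L2 norm.
    y:           flattened sparse coding object image.
    Stops when ||D*alpha - y||^2 + mu*||alpha||_0 <= threshold.
    """
    n = len(base_images)
    alpha = [0.0] * n
    residual = list(y)  # equals y - D*alpha throughout
    if max_steps is None:
        max_steps = n

    def cost():
        l0 = sum(1 for a in alpha if a != 0.0)
        return sum(r * r for r in residual) + mu * l0

    for _ in range(max_steps):
        if cost() <= threshold:
            break
        # Correlate every base image with the current residual.
        corr = [sum(d * r for d, r in zip(col, residual)) for col in base_images]
        k = max(range(n), key=lambda i: abs(corr[i]))
        if corr[k] == 0.0:
            break  # no base image can reduce the residual further
        alpha[k] += corr[k]
        residual = [r - corr[k] * d for r, d in zip(residual, base_images[k])]
    return alpha, cost()


# Hypothetical base images of three pixels each (unit norm up to rounding).
base_images = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.6, 0.8, 0.0],
]
y = [3.0, 4.0, 0.0]  # equals 5 times the last base image

alpha, final_cost = matching_pursuit(base_images, y, mu=0.1, threshold=0.5)
print(alpha, final_cost)
```

In this example a single base image explains the whole signal, so the search stops after one step with one non-zero coefficient.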
Recently, a method of dividing the sparse coding object image into blocks and calculating base image coefficients in units of the blocks has been devised (for example, refer to Michal Aharon, Michael Elad, and Alfred Bruckstein, “K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation”, IEEE Transactions on Signal Processing, vol. 54, no. 11, November 2006, pp. 4311-4322).
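The block division described above can be sketched as follows. This is a minimal example that assumes square, non-overlapping blocks and drops any partial blocks at the right and bottom edges; the source does not state these details, and the function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(image, block_size):
    """Split a 2D image (list of rows) into flattened square blocks.

    Blocks are non-overlapping and returned in raster order. Each block
    is flattened row by row so that it can serve as one vector Y for
    which base image coefficients are calculated.
    """
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            block = [
                image[top + r][left + c]
                for r in range(block_size)
                for c in range(block_size)
            ]
            blocks.append(block)
    return blocks


# A hypothetical 4x4 image whose pixel values are 0..15 in raster order.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
blocks = split_into_blocks(image, 2)
print(blocks)
```

Each returned vector would then be coded independently, so the base images only need to be as large as one block rather than the whole image.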
As a restriction on the base image coefficients in the cost function, in addition to the L0 norm used in expression 2, an L1 norm or an approximate expression of the L1 norm may be used (for example, refer to Libo Ma and Liqing Zhang, “Overcomplete topographic independent component analysis”, Neurocomputing, 10 Mar. 2008, pp. 2217-2223). When the base image coefficients are restricted by the L1 norm, the cost function is represented by the following expression 3. When the base image coefficients are restricted by the approximate expression of the L1 norm, the cost function is represented by the following expression 4.
$$L = \operatorname{argmin}\,\{\|D\alpha - Y\|^2 + \mu\|\alpha\|_1\} \tag{3}$$
$$L = \operatorname{argmin}\,\{\|D\alpha - Y\|^2 + \mu F(\alpha^T\alpha)\}, \qquad F(y) = a\sqrt{y} + b \tag{4}$$
In expressions 3 and 4, L denotes the cost function, D denotes the base image matrix, α denotes the base image coefficient vector, Y denotes the learning image vector, and μ denotes a previously set parameter. In expression 4, a and b denote previously set parameters, and y denotes the argument of the function F.
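To compare the three restrictions, the following sketch evaluates the bracketed cost of expressions 2, 3, and 4 for one fixed coefficient vector. It reads expression 4 literally, applying F to the scalar αᵀα. The function names, the small matrix D, and the parameter values are hypothetical.

```python
import math


def residual_sq(D, alpha, Y):
    """Squared L2 norm of D*alpha - Y, with D given as a list of rows."""
    total = 0.0
    for row, y in zip(D, Y):
        predicted = sum(d * a for d, a in zip(row, alpha))
        total += (predicted - y) ** 2
    return total


def cost_l0(D, alpha, Y, mu):
    """Expression 2: penalty is the number of non-zero coefficients."""
    return residual_sq(D, alpha, Y) + mu * sum(1 for a in alpha if a != 0.0)


def cost_l1(D, alpha, Y, mu):
    """Expression 3: penalty is the sum of absolute coefficient values."""
    return residual_sq(D, alpha, Y) + mu * sum(abs(a) for a in alpha)


def cost_l1_approx(D, alpha, Y, mu, a, b):
    """Expression 4: penalty is F(alpha^T alpha) with F(y) = a*sqrt(y) + b."""
    y = sum(c * c for c in alpha)  # alpha^T alpha
    return residual_sq(D, alpha, Y) + mu * (a * math.sqrt(y) + b)


# Hypothetical data: 2 pixels, 3 base images (one base image per column of D).
D = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
alpha = [3.0, 0.0, 4.0]
Y = [7.0, 4.0]  # D*alpha reproduces Y exactly, so only the penalty remains

c0 = cost_l0(D, alpha, Y, mu=0.5)
c1 = cost_l1(D, alpha, Y, mu=0.5)
c4 = cost_l1_approx(D, alpha, Y, mu=0.5, a=1.0, b=0.0)
print(c0, c1, c4)
```

Because the residual is zero in this example, the three values differ only in how each restriction penalizes the same coefficient vector.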
Meanwhile, the most important element of sparse coding is the learning of the base signals. In the related art, the base signals are learned in common for all signals.