Principal Components Analysis (PCA) is widely used in computer vision, pattern recognition, and signal processing. PCA simplifies analysis by reducing a multidimensional data set to a smaller number of dimensions. PCA can be considered an orthogonal linear transformation that maps data to a new coordinate system in which the greatest variance of any projection of the data lies on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. The subspace spanned by the leading principal components is called the principal subspace. Thus, PCA is conventionally used for dimensionality reduction: the lower-order principal components, which contribute most to the variance of the data set, are retained, and the higher-order principal components are discarded. The low-order components usually contain the “most important” attributes of the data.
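The transformation described above can be illustrated with a minimal sketch, assuming the samples are stored as rows of a NumPy array and that the principal directions are taken from an eigendecomposition of the sample covariance matrix. The function name `pca`, the synthetic data, and the choice of `k` are illustrative assumptions, not part of the source.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal directions.

    Hypothetical helper for illustration: returns the projected samples,
    the projection matrix W (columns are orthonormal principal directions),
    and all eigenvalues sorted from largest to smallest.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                              # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending variance
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    W = eigvecs[:, :k]                         # keep the k leading directions
    return Xc @ W, W, eigvals

# Synthetic 3-D data whose variance is concentrated along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])

Z, W, eigvals = pca(X, k=2)
```

Here the variance of each column of `Z` equals the corresponding eigenvalue, so discarding the last component removes only the direction with the least variance.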
In face recognition, for example, PCA is performed to map samples into a low-dimensional feature space where the new representations are viewed as expressive features. In a feature space, each sample is typically represented as a point in an n-dimensional space, where n is the number of features used to describe the sample. Discriminators such as Linear Discriminant Analysis (LDA), Locality Preserving Projection (LPP), and Marginal Fisher Analysis (MFA) are then applied in the PCA-transformed spaces. In active appearance models (AAM) and 3-dimensional (3D) morphable models, the textures and shapes of faces are compressed in PCA-learned texture and shape subspaces. These texture and shape features enable deformation and matching between faces.
In manifold learning, tangent spaces of a manifold are represented by the PCA subspaces, and tangent coordinates are the PCA features. Representative algorithms in manifold learning, such as Hessian Eigenmaps, local tangent space alignment (LTSA), S-Logmaps, and Riemannian normal coordinates (RNC), are all based on tangent coordinates. In addition, K-Means, the classical algorithm for clustering, was proven equivalent to PCA under a relaxed condition. Thus, PCA features can be naturally adopted for clustering. The performance of the algorithms mentioned above is determined by the subspaces and the features yielded by PCA. There are also variants of PCA, such as probabilistic PCA, kernel PCA (KPCA), robust PCA, non-negative PCA, weighted PCA, generalized PCA, and sparse PCA.
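As a rough illustration of how a tangent space can be estimated by PCA, the sketch below applies PCA to a small neighborhood of points sampled from a unit circle. The helper name `local_tangent`, the neighborhood size, and the circle data are assumptions made for this example; the algorithms named above each add their own alignment or embedding steps that are not shown.

```python
import numpy as np

def local_tangent(points, center_idx, n_neighbors, dim):
    """Estimate a tangent basis at one point by PCA on its nearest neighbors.

    Hypothetical helper: returns the basis (columns span the estimated
    tangent space) and the coordinates of the neighbors in that basis,
    measured from the mean of the local patch.
    """
    center = points[center_idx]
    dists = np.linalg.norm(points - center, axis=1)
    nbr_idx = np.argsort(dists)[:n_neighbors]       # includes the point itself
    patch = points[nbr_idx]
    patch_centered = patch - patch.mean(axis=0)
    # Right singular vectors of the centered patch are its principal directions.
    _, _, Vt = np.linalg.svd(patch_centered, full_matrices=False)
    basis = Vt[:dim].T
    coords = patch_centered @ basis                 # local PCA features
    return basis, coords

# Points on a unit circle: a 1-D manifold embedded in 2-D.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

basis, coords = local_tangent(circle, center_idx=50, n_neighbors=11, dim=1)

# Analytic tangent of the circle at the same point, for comparison.
true_tangent = np.array([-np.sin(theta[50]), np.cos(theta[50])])
alignment = abs(basis[:, 0] @ true_tangent)         # close to 1 when aligned
```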
However, PCA has some limitations. First, PCA is sensitive to noise. Noisy samples can significantly change the principal subspaces, so PCA becomes unstable when sample points are perturbed. To address this issue, robust PCA algorithms have been proposed, but these sacrifice the simplicity of PCA.
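The sensitivity to noise can be seen in a small synthetic experiment. The sketch below is an assumption-laden illustration, not a result from the source: the data, the three outliers, and the helper `first_direction` are all made up for the example.

```python
import numpy as np

def first_direction(X):
    """Return the first principal direction of the rows of X (unit vector)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)

# Clean samples: most of the variance lies along the x-axis.
clean = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])

# Three hypothetical noise samples placed far away along the y-axis.
outliers = np.array([[0.0, 60.0], [0.0, -60.0], [0.0, 55.0]])
noisy = np.vstack([clean, outliers])

d_clean = first_direction(clean)
d_noisy = first_direction(noisy)

# Angle between the two directions, ignoring the sign ambiguity of PCA.
angle = np.degrees(np.arccos(min(1.0, abs(d_clean @ d_noisy))))
```

In this setup, adding three outliers to 200 clean samples rotates the first principal direction from roughly the x-axis to roughly the y-axis.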
Weighted PCA was developed to perform smoothing on local patches of data in manifold learning. The weights are computed by an iterative procedure whose convergence cannot be guaranteed. Moreover, weighted PCA is performed only on local patches of data, and it offers no insight into how to derive a global projection matrix from locally weighted scatters.
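To make the notion of a weighted scatter concrete, the following sketch computes principal directions from a weighted scatter matrix. It assumes the weights are already given and fixed; the iterative weight computation mentioned above is not reproduced here, and the function name `weighted_pca` and the weight values are hypothetical.

```python
import numpy as np

def weighted_pca(X, w, k):
    """Principal directions of a weighted scatter matrix (illustrative only).

    X: samples as rows; w: one non-negative weight per sample (assumed given).
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                          # normalize weights to sum to 1
    mean = w @ X                             # weighted mean
    Xc = X - mean
    S = (Xc * w[:, None]).T @ Xc             # weighted scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]], eigvals[order[:k]]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) * np.array([2.0, 0.2])
X = np.vstack([X, [[0.0, 40.0]]])            # one hypothetical noise sample

uniform = np.ones(len(X))                    # equal weights: ordinary PCA
downweighted = uniform.copy()
downweighted[-1] = 1e-6                      # assumed small weight for the outlier

W_uniform, _ = weighted_pca(X, uniform, k=1)
W_down, _ = weighted_pca(X, downweighted, k=1)
```

With equal weights the single outlier dominates the first direction, while the down-weighted version recovers the direction of the bulk of the data. How to choose such weights reliably is exactly the open question raised above.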
In principle, PCA is only reasonable for samples in Euclidean spaces, where distances between samples are measured by L2 norms. For non-Euclidean sample spaces, the scatter of samples cannot be represented by a summation of Euclidean distances. For instance, histogram features are non-Euclidean, and distances between them are better measured by the chi-square distance. Therefore, the principal subspaces of such samples cannot be optimally obtained by conventional PCA. The KPCA algorithm was designed to extract principal components of samples whose underlying spaces are non-Euclidean. However, KPCA cannot explicitly produce the principal subspaces of samples, which are required in many applications. Moreover, KPCA is also sensitive to noisy data because its optimization criterion is intrinsically equivalent to that of PCA.
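The difference between the L2 and chi-square measures on histograms can be shown with a small worked example. The sketch assumes one common form of the chi-square distance, half the sum of (p - q)^2 / (p + q) over the bins; other normalizations exist, and the three histograms are invented for illustration.

```python
import numpy as np

def l2_distance(p, q):
    """Euclidean (L2) distance between two histograms."""
    return float(np.linalg.norm(p - q))

def chi_square_distance(p, q):
    """Chi-square distance, assumed here as 0.5 * sum((p - q)^2 / (p + q)).

    Bins where both histograms are zero contribute nothing.
    """
    num = (p - q) ** 2
    den = p + q
    mask = den > 0
    return float(0.5 * np.sum(num[mask] / den[mask]))

# Three normalized histograms. h2 and h3 each differ from h1 by moving
# 0.1 of the mass, but h3 moves it out of a sparsely populated bin.
h1 = np.array([0.8, 0.1, 0.1])
h2 = np.array([0.7, 0.2, 0.1])
h3 = np.array([0.8, 0.0, 0.2])

l2_12 = l2_distance(h1, h2)
l2_13 = l2_distance(h1, h3)
chi2_12 = chi_square_distance(h1, h2)
chi2_13 = chi_square_distance(h1, h3)
```

The two L2 distances are equal, whereas the chi-square distance to `h3` is larger because the same change is weighted more heavily in a sparsely populated bin. A scatter built from L2 distances therefore cannot reflect this distinction.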