Field of the Invention
The present invention relates generally to improved systems and methods for image recognition. More particularly, the present invention relates to systems and methods for pattern recognition in digital images. Even more particularly, the present invention relates to systems and methods for performing image classification and recognition functions utilizing a novel semi-metric distance measure, the Poisson-Binomial Radius (PBR), based on the Poisson-Binomial distribution.
Description of the Related Art
Machine learning methods such as Support Vector Machines (SVM), principal component analysis (PCA) and k-nearest neighbors (k-NN) use distance measures to compare relative dissimilarities between data points. Choosing an appropriate distance measure is fundamentally important. The most widely used measures are the sum of squared distances (L2 or Euclidean) and the sum of absolute differences (L1 or Manhattan).
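For concreteness, these two baseline measures can be sketched in a few lines of Python (the example vectors are illustrative only):

```python
import math

def l1(x, y):
    # Manhattan (L1) distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def l2(x, y):
    # Euclidean (L2) distance: square root of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x, y = (0.0, 0.0), (3.0, 4.0)
print(l1(x, y))  # 7.0
print(l2(x, y))  # 5.0
```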
The question of which to use can be answered from a maximum likelihood (ML) perspective. Briefly stated, L2 is appropriate for data which follows an i.i.d. Gaussian distribution, whereas L1 is appropriate in the case of i.i.d. Laplace-distributed data. See [1], [2]. Consequently, when the underlying data distribution is known or well estimated, the metric to be used can be determined.
The problem arises when the probability distributions of the input variables are unknown or non-identical. Taking image acquisition as an example, images captured by modern digital cameras are always corrupted by noise. See [3]. For example, the output of a charge-coupled device (CCD) sensor carries a variety of noise components, such as photon noise and fixed-pattern noise (FPN), along with the useful signal. See [4]. Moreover, images are prone to corruption by noise during signal amplification and transmission. See [5]. Some of the most common types of noise found in the literature are additive, impulse and signal-dependent noise. However, the type and amount of noise generated by modern digital cameras tends to depend on specific details such as the brand and series name of the camera, in addition to camera settings (aperture, shutter speed, ISO). See [6]. Further, image file format conversions and file transfers resulting in the loss of metadata can add to this problem. Even if a captured image appears to be noise-free, it may still contain noise components imperceptible to the human eye. See [7]. Given that feature descriptors are subject to such heterogeneous noise sources, it is therefore reasonable to assume that such descriptors are independent but non-identically distributed (i.n.i.d.). See [8].
Inherent in most distance measures is the assumption that the input variables are independent and identically distributed (i.i.d.). Recent progress in biological sequencing data analysis and other fields has demonstrated that, in reality, input data often does not follow the i.i.d. assumption. Accounting for this discrepancy has been shown to lead to more accurate decision-based algorithms.
Several threads have contributed to the development of semi-metric distance measures. The first relates to the axioms which need to be satisfied by distance measures in order to qualify as distance metrics. These are the axioms of non-negativity, symmetry, reflexivity and the triangle inequality. Measures which do not satisfy the triangle inequality axiom are by definition called semi-metric distances.
Although distance metrics are widely used in most applications, there have been good reasons to doubt the necessity of some of the axioms, especially the triangle inequality. For example, it has been shown that the triangle inequality axiom is violated in a statistically significant manner when human subjects are asked to perform image recognition tasks. See [9]. In another example, distance scores produced by the top performing algorithms for image recognition using the Labelled Faces in the Wild (LFW) and Caltech101 datasets have also been shown to violate the triangle inequality. See [10].
Another thread involves the “curse of dimensionality.” As the number of dimensions in feature space increases, the ratio of the distances of the nearest and farthest neighbors to any given query tends to converge to unity for most reasonable data distributions and distance functions. See [11]. This poor contrast between data points implies that nearest neighbor searches in high-dimensional space become meaningless. Consequently, the fractional Lp semi-metric [12] was created as a means of preserving contrast. For two n-dimensional vectors x=(x1, . . . , xn) and y=(y1, . . . , yn), the Lp distance is defined as:
L_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}        (1)

Taking p=1 gives the Manhattan distance and p=2 the Euclidean distance. For values of p ∈ (0,1), Lp gives the fractional Lp distance measure.
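A direct implementation of Eq. (1), covering the metric cases p=1 and p=2 as well as the fractional case p ∈ (0,1), can be sketched as follows (the example vectors are illustrative only):

```python
def lp_distance(x, y, p):
    # Lp distance of Eq. (1): (sum_i |x_i - y_i|^p)^(1/p).
    # p in (0, 1) gives the fractional Lp semi-metric.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (1.0, 1.0)
print(lp_distance(x, y, 1))    # Manhattan: 2.0
print(lp_distance(x, y, 2))    # Euclidean: ~1.414
print(lp_distance(x, y, 0.5))  # fractional: 4.0
```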
In a template matching study for face and synthetic images comparing the Lp and L2 distances, it was concluded that values of p ∈ (0.25, 0.75) outperformed L2 when images were degraded with noise and occlusions. See [13]. Other groups have also used the Lp distance to match synthetic and real images. See [14]. The idea of using the Lp distance for content-based image retrieval has been explored by Howarth et al. [15], and results suggest that p=0.5 may yield improvements in retrieval performance and consistently outperform both the L1 and L2 norms.
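A quick numerical check, assuming the definition in Eq. (1), illustrates why the fractional Lp measure is only a semi-metric: for p=0.5 the triangle inequality fails even for simple points in the plane.

```python
def lp_distance(x, y, p):
    # Lp distance of Eq. (1); p in (0, 1) gives the fractional semi-metric
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# Three points in the plane (illustrative values)
a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
p = 0.5
d_ab = lp_distance(a, b, p)  # 1.0
d_bc = lp_distance(b, c, p)  # 1.0
d_ac = lp_distance(a, c, p)  # (1 + 1)^2 = 4.0
print(d_ac > d_ab + d_bc)    # True: triangle inequality violated
```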
Other semi-metric distances worth mentioning are the Dynamic Partial Function (DPF) [16], Jeffrey Divergence (JD) [17] and Normalized Edit Distance (NED) [18].
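As one illustration of these measures, the Jeffrey Divergence compares two normalized histograms via a symmetrized, numerically stable relative of the Kullback-Leibler divergence. A minimal sketch, assuming strictly positive, normalized histogram bins (the example histograms are illustrative only):

```python
import math

def jeffrey_divergence(h, k):
    # Symmetric divergence between normalized histograms h and k,
    # with each bin compared against the bin-wise mean m_i = (h_i + k_i) / 2.
    # Bins are assumed strictly positive to keep the logarithms finite.
    return sum(
        hi * math.log(2 * hi / (hi + ki)) + ki * math.log(2 * ki / (hi + ki))
        for hi, ki in zip(h, k)
    )

h = (0.5, 0.5)
k = (0.9, 0.1)
print(jeffrey_divergence(h, h))  # 0.0 for identical histograms
print(jeffrey_divergence(h, k))  # positive for differing histograms
```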
To date, no distance measure has been demonstrated in pattern recognition to handle i.n.i.d. distributions. Thus, there is a need for improved systems and methods for pattern recognition.