1. Field
The proposed method is used for classification in open-set scenarios, in which it is often not possible to obtain training data beforehand for all classes that can occur in the test phase (the phase in which the proposed method is applied). During the test phase, test samples belonging to one of the classes used in the training phase must be classified as the correct known class, and test samples belonging to any other class must be rejected and classified as unknown.
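The open-set requirement stated above can be illustrated with a minimal sketch. The label "unknown", the function names and the toy class names are hypothetical and are used only for illustration; they are not part of the proposed method.

```python
UNKNOWN = "unknown"  # hypothetical label assigned to rejected test samples


def expected_label(true_label, known_classes):
    """Label an open-set classifier should output for a test sample."""
    return true_label if true_label in known_classes else UNKNOWN


def open_set_accuracy(true_labels, predicted_labels, known_classes):
    """Fraction of test samples that received the expected open-set label."""
    hits = sum(
        expected_label(t, known_classes) == p
        for t, p in zip(true_labels, predicted_labels)
    )
    return hits / len(true_labels)


# Toy data: "bird" and "fish" never appeared in the training phase.
known = {"cat", "dog"}
truth = ["cat", "dog", "bird", "fish"]
preds = ["cat", "dog", "unknown", "dog"]
accuracy = open_set_accuracy(truth, preds, known)  # 3 of 4 samples handled correctly
```

In this toy case the last sample is an error because a sample of an untrained class was accepted as a known class instead of being rejected.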
In comparison to existing solutions, the present invention adds value to solutions/products by improving the performance of classification and recognition systems, such as: fingerprint recognition, face recognition, speech recognition, object recognition, scene recognition, character recognition, action recognition, remote sensing image classification, and other general pattern recognition applications. Medical applications can also take advantage of the present invention, considering that most real medical cases must deal with unknown classes (e.g., a new type of cancer, a view of a heart ultrasound image in which the doctors are not interested, unknown types of diseases, etc.).
2. Description of the Related Art
A method known in the prior art is the Optimum-Path Forest (OPF) classifier, a graph-based classifier that was developed as a generalization of another method, the Image Foresting Transform (IFT). OPF is inherently multiclass and independent of parameterization, and it is similar to the well-known k-Nearest Neighbors (kNN) method. OPF makes no assumption about the shapes of the classes and can support some degree of intersection/overlapping between the classes. OPF has shown good results in many classification problems. Notice that OPF is inherently closed-set, i.e., a test sample is always classified as one of the trained classes.
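The closed-set behavior noted above can be illustrated with a minimal sketch. It uses a plain 1-nearest-neighbor rule as a stand-in, not OPF itself, and the training points and function name are hypothetical.

```python
import math


def nearest_neighbor_label(sample, training_set):
    """Closed-set 1-NN: return the label of the closest training sample."""
    _, label = min(training_set, key=lambda pair: math.dist(sample, pair[0]))
    return label


# Hypothetical two-class training set of (point, label) pairs.
training = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((5.0, 5.0), "B")]

near = nearest_neighbor_label((0.2, 0.1), training)
# A sample very far from every training sample still receives a known label.
far = nearest_neighbor_label((1000.0, 1000.0), training)
```

Because the rule has no rejection option, the distant sample is assigned to class "B" even though it resembles none of the training samples.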
Another known method in the prior art is the traditional binary Support Vector Machine (SVM) classifier, which can assign a test sample to a certain class even if the test sample is very different from the training samples of that class. The SVM defines half-spaces and does not verify how far the test sample is from the training samples. This strong generalization may not be useful in the open-set scenario: a test sample far away from the hyperplane should probably be classified as unknown instead of as one of the known classes. Therefore, the SVM can be considered a binary closed-set classifier. However, applying the One-vs-All approach to the SVM yields a multiclass SVM (MCSVM) classifier, which can be considered suitable for the open-set scenario.
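The half-space behavior and the One-vs-All adaptation can be sketched as follows. The hyperplane parameters are hand-picked rather than trained, and the rejection rule (classify as unknown when no binary classifier accepts the sample) is an assumption made for illustration; the text above does not specify how the MCSVM rejects.

```python
def linear_score(weights, bias, sample):
    """Signed score of a sample with respect to the hyperplane w.x + b = 0."""
    return sum(w * x for w, x in zip(weights, sample)) + bias


def binary_predict(weights, bias, sample):
    """Binary half-space decision: the distance to the training data is ignored."""
    return "positive" if linear_score(weights, bias, sample) >= 0 else "negative"


def one_vs_all_predict(models, sample):
    """Assumed rule: best accepting class, or 'unknown' if every classifier rejects."""
    accepted = {}
    for label, (weights, bias) in models.items():
        score = linear_score(weights, bias, sample)
        if score >= 0:
            accepted[label] = score
    return max(accepted, key=accepted.get) if accepted else "unknown"


# Hypothetical hyperplanes: class A accepts x >= 1, class B accepts x <= -1.
models = {"A": ((1.0, 0.0), -1.0), "B": ((-1.0, 0.0), -1.0)}

far_binary = binary_predict((1.0, 0.0), -1.0, (1e6, 0.0))
between = one_vs_all_predict(models, (0.0, 0.0))
far_multi = one_vs_all_predict(models, (1e6, 0.0))
```

In this sketch the sample between the two half-spaces is rejected, while the sample at x = 1e6 is still accepted as class "A", since each half-space extends without limit.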
Patent document JP 2993826 B2, titled: “Method and Device for Recognizing Signal and Learning Method and Device of Signal Recognizing Device”, published on Dec. 27, 1999, describes a recognition method that handles the open-set scenario in the trivial way: by simply defining a threshold on the classification output. Similarly, the work of Phillips et al. (P. J. Phillips, P. Grother, R. Micheals, “Evaluation methods in face recognition”, in: S. Z. Li, A. K. Jain (Eds.), Handbook of Face Recognition, Springer, 2011, pp. 551-574) defines a threshold on the “similarity score” to classify a sample as unknown. However, this approach has drawbacks such as performance degradation, considering the difficulty of establishing/computing similarity scores in a high-dimensional space. The proposed method of the present invention defines a threshold on the “ratio of similarity scores”, which is different from, and advantageous in comparison to, a “threshold on the similarity score” (proposed by JP 2993826 B2, and the most used approach to handle open-set classification problems), as will be demonstrated throughout the specification of the present invention. Another difference is that the solution proposed in document JP 2993826 B2 allows updating the “rejection threshold” and other parameters, while the method of the present invention obtains the threshold on the ratio of similarity scores once, and this threshold is not updated during the testing phase.
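The difference between the two kinds of threshold can be sketched as follows. This is only one plausible reading: the use of Euclidean nearest-neighbor distances as similarity scores, the function names, the toy data and the threshold values are all assumptions, and the threshold is passed in as a fixed value rather than learned.

```python
import math


def nearest(sample, points):
    """Return (distance, point) of the entry in points closest to sample."""
    return min((math.dist(sample, p), p) for p in points)


def score_threshold_predict(sample, training_set, max_distance):
    """Prior-art style: reject when the best similarity score itself is too poor."""
    point, label = min(training_set, key=lambda pair: math.dist(sample, pair[0]))
    return label if math.dist(sample, point) <= max_distance else "unknown"


def score_ratio_predict(sample, training_set, max_ratio):
    """Ratio style: compare the best score with the best score of another class."""
    point, label = min(training_set, key=lambda pair: math.dist(sample, pair[0]))
    others = [p for p, other_label in training_set if other_label != label]
    if not others:
        raise ValueError("at least two training classes are required")
    d_same = math.dist(sample, point)
    d_other, _ = nearest(sample, others)
    if d_other == 0.0:  # sample coincides with samples of two classes: ambiguous
        return "unknown"
    return label if d_same / d_other <= max_ratio else "unknown"


# Hypothetical training set of (point, label) pairs.
training = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((10.0, 0.0), "B"), ((11.0, 0.0), "B")]

inside = score_ratio_predict((0.5, 0.5), training, max_ratio=0.5)
ambiguous = score_ratio_predict((5.4, 0.0), training, max_ratio=0.5)
far_away = score_ratio_predict((-100.0, 0.0), training, max_ratio=0.5)
```

Under these assumptions, a sample roughly equidistant from two classes, or far from all of them, yields a ratio close to 1 and is rejected, without requiring an absolute distance scale to be chosen.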
Patent document US 2013/0144937 A1, titled: “Apparatus and Method for Sharing User's Emotion”, published on Jun. 6, 2013, describes a closed-set classification method. Unlike the method proposed in the present invention, document US 2013/0144937 A1 does not propose an open-set recognition method and does not use a ratio of similarity scores for classification. A possible similarity with the method proposed in the present invention is the fact that the method of document US 2013/0144937 A1 uses an “emotion rate”, based on two “emotional states” (see claim 1 of US 2013/0144937 A1). However, the “emotional state” disclosed in document US 2013/0144937 A1 is not the same as the “similarity score” of the method proposed in the present invention. From FIG. 2, claim 1 and paragraphs [0056]-[0058] of the specification of US 2013/0144937 A1, it becomes clear that the “emotion rate” is used to give an answer (classification) to the user with a degree (rate) within a range of possible emotional states, i.e., the “emotion rate” is not used for recognition purposes. Document US 2013/0144937 A1 does not mention or suggest a “ratio” or comparison of different classification scores for the purpose of automatically learning the relationship among known classes in order to reject samples of classes that are unknown at the testing phase, as proposed by the method of the present invention.
The next four patent documents, U.S. Pat. No. 7,308,133, U.S. Pat. No. 8,306,818, U.S. Pat. No. 8,515,758 and KR 2013/0006030, do not propose open-set recognition methods; rather, they aim at solutions to closed-set classification/recognition scenarios.
Patent document U.S. Pat. No. 7,308,133 B2, titled: “System and Method of Face Recognition Using Proportions of Learned Model”, published on Dec. 11, 2007, proposes a system and method for performing face recognition using proportions of the learned model. It refers to a classifier that classifies multiple profiles of individuals in addition to the frontal face. Based on the image used in the testing phase, the matter disclosed in document U.S. Pat. No. 7,308,133 B2 generates different versions (proportions) of that image to match against the training images. A voting scheme, whereby each proportion of the image generates a vote, is used to decide the class of the testing image. However, the document does not mention how cases in which the testing image belongs to none of the training classes are treated (i.e., it is a closed-set scenario). The term “unknown” used in patent document U.S. Pat. No. 7,308,133 B2 refers to a sample that appears during the testing phase and whose class is not known before the classification is performed, but the document assumes that the test sample belongs to at least one of the training classes and that it will be classified as such. In contrast, in the context of the present invention, the term “unknown” refers to test samples that belong to none of the training classes. Furthermore, in contrast with the present invention, document U.S. Pat. No. 7,308,133 B2 does not propose an open-set recognition method and does not use a ratio of similarity scores for classification.
Patent document U.S. Pat. No. 8,306,818 B2, titled: “Discriminative Training of Language Models for Text and Speech Classification”, published on Nov. 6, 2012, describes a statistical classifier for the specific problem of speech and text classification. This classifier does not perform open-set recognition, as can be seen in FIG. 4 of document U.S. Pat. No. 8,306,818 B2. In fact, the test sample is assigned to the “class with the highest resulting value” (column 8, line 46), i.e., it is a closed-set classifier. It is also explained that the classifier can classify the test sample into one or more classes, depending on the probability (column 1, line 44). The term “unknown” mentioned in patent document U.S. Pat. No. 8,306,818 B2 refers to the classes of words not considered by the system, i.e., explicitly eliminated from it. In contrast, in the context of the present invention, the term “unknown” refers to test samples that belong to none of the training classes. Furthermore, in contrast with the present invention, document U.S. Pat. No. 8,306,818 B2 does not propose an open-set recognition method and does not use a ratio of similarity scores for classification.
Patent document U.S. Pat. No. 8,515,758 B2, titled: “Speech Recognition Including Removal of Irrelevant Information”, published on Aug. 20, 2013, presents a speech recognition system based on a statistical classifier responsible for classifying an input utterance. Its classification method is based neither on similarity scores nor on a “ratio of similarity scores”, as proposed in the method of the present invention. The term “unknown” mentioned in document U.S. Pat. No. 8,515,758 B2 refers to a test sample whose correct class is not known a priori, i.e., it is not known to which of the training classes the test sample belongs. In contrast, in the present invention, the term “unknown” refers to a test sample that belongs to none of the training classes.
Patent document KR 2013/0006030 A, titled: “Construction Method of Classification Model for Emotion Recognition and Apparatus Thereof”, published on Jan. 16, 2013, presents a method and apparatus to classify a plurality of emotions (emotion recognition) from biometric data by using a binary system. In contrast with the present invention, patent document KR 2013/0006030 A does not present an open-set recognition method, i.e., a method to classify a test sample as belonging to none of the trained classes. The document KR 2013/0006030 A does not present any kind of “ratio of similarity scores” to use in classification.
Patent document US 2006/0933208 A1, titled: “Open-Set Recognition Using Transduction”, published on May 4, 2006, proposes the TCM-kNN (Transductive Confidence Machine-k Nearest Neighbors), a method for biometric open-set recognition. The system is designed specifically for a face recognition application; the document mentions that the method can be applied to other recognition problems, but does not detail how this would be done. To allow classification as unknown, a rejection threshold is automatically defined on what the inventors call the “peak-to-side ratio”. The peak-to-side ratio is a value that can be obtained based on p-values for each training class. The threshold is obtained based on several peak-to-side ratios (one for each training class). The method of document US 2006/0933208 A1 can be used in applications in which each training class can be represented by a template sample (or sample identifier). In contrast, the method proposed in the present invention can be applied to general open-set recognition problems, not only to those in which all training samples of a certain class can be condensed into a template sample.
A contribution of the US 2006/0933208 A1 solution is the automatic definition of a threshold on the training template samples to allow classification as unknown. In contrast, the method of the present invention does not define the threshold on the distance function; rather, the threshold is defined on the ratio of the distances (similarity scores) of two different classes. Furthermore, it is not clear from document US 2006/0933208 A1 whether the peak-to-side ratio remains meaningful, or works at all, when the classifier is trained with several samples for each class.
Patent document WO 2008008142 A2, titled: “Machine Learning Techniques and Transductive Data Classification”, published on January 17, describes a binary classifier that uses transductive learning, which is a type of semi-supervised learning. The method proposed in the present invention deals neither with transductive classification nor with semi-supervised classification, i.e., it does not use the “unlabeled data points as training examples”, nor does it propagate labels of known examples to unknown ones. Finally, the method proposed in document WO 2008008142 A2 requires at least one labeled example per class, which turns the problem into a closed-set classification problem. This is outside the scope and purpose of the method proposed in the present invention.
In the report titled: “Estimating the Support of a High-Dimensional Distribution” (B. Schölkopf; J. Platt; J. Shawe-Taylor; A. Smola; R. Williamson; Technical Report MSR-TR-99-87; Microsoft Research; 1999), Schölkopf et al. (1999) propose an extension of the SVM called the one-class SVM (OCSVM). This classifier is trained on just one known class, and finds the best margin with respect to the origin. This is the most reliable approach in cases where access to a second class is very difficult or even impossible. Although this approach is very suitable for the open-set scenario, it is a one-class classifier (binary-based) and therefore does not take advantage of all classes available for training (since it uses just one known class for training, even if other classes are available).
In fact, the paper “Relevance Feedback in Image Retrieval: a Comprehensive Review” (X. Zhou and T. Huang; Multimedia Systems 8; pp. 536-544; Springer-Verlag; 2003) mentions that the OCSVM has limited use because it does not provide good generalization or specialization ability. Unlike the method disclosed by Schölkopf et al. (1999), the method proposed in the present invention is a multiclass classifier, and it is not SVM-based.
In the paper “Open-Set Source Camera Attribution” (F. Costa, M. Eckmann, W. Scheirer and A. Rocha; XXV SIBGRAPI Conf. on Graphics, Patterns and Images, 2012, pp. 71-78), Costa et al. (2012) present a camera source attribution algorithm that considers the open-set scenario. As the original binary SVM risk minimization is based only on the known classes, it can misclassify the negative and unknown classes that can appear in the test phase. Therefore, Costa et al. (2012) proposed the method known in the state of the art as SVM with Decision Boundary Carving (SVMDBC), which minimizes the risk of the unknown instead of finding the maximum-margin separating hyperplane. The risk of the unknown is minimized by moving the decision hyperplane found by the traditional SVM by a value c inwards or outwards with respect to the positive class. The value c is defined by an exhaustive search that minimizes the training data error. The SVMDBC method disclosed by Costa et al. (2012) is SVM-based, binary and open-set, in contrast with the method of the present invention, which is not SVM-based and is multiclass and open-set. Effectively, the method presented by Costa et al. (2012) is a method for controlling false positives.
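The exhaustive search for the value c can be sketched as follows. The sketch operates on precomputed decision scores of an already trained binary classifier; the sign convention (a positive offset moves the boundary towards the positive class), the candidate grid and the toy scores are assumptions made for illustration and are not taken from Costa et al. (2012).

```python
def training_error(scores, labels, offset):
    """Fraction of samples misclassified when the boundary is shifted by offset."""
    errors = sum(
        (score - offset >= 0) != is_positive
        for score, is_positive in zip(scores, labels)
    )
    return errors / len(scores)


def search_offset(scores, labels, candidates):
    """Exhaustive search: return the first candidate with the lowest training error."""
    return min(candidates, key=lambda c: training_error(scores, labels, c))


# Hypothetical decision scores; True marks samples of the positive class.
scores = [2.0, 1.5, 0.4, 0.2, -1.0]
labels = [True, True, False, False, False]
candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]

best = search_offset(scores, labels, candidates)
```

With these toy values, the unshifted boundary (offset 0.0) accepts two negative samples, and the search selects an offset of 0.5, which removes both false positives.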
In the paper titled: “Towards Open-Set Recognition” (W. Scheirer, A. Rocha, A. Sapkota, T. Boult; IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), July 2013, vol. 35, no. 7, pp. 1757-1772), Scheirer et al. (2013) introduced the 1-vs-Set Machine with a linear kernel formulation that can be applied to both the binary and one-class SVMs. Here, too, the objective is to minimize the risk of the unknown, which is done by minimizing the positively labeled region (i.e., the open space risk) combined with margin constraints to minimize the empirical risk (measured on training data). Scheirer et al. (2013), similarly to Costa et al. (2012), also move the original SVM hyperplane inwards with respect to the positive class, but additionally add a far hyperplane “after” the positive samples, aiming to decrease the open space risk. The hyperplanes are initialized to contain all the positive samples. Then, a refinement step adjusts the hyperplanes in order to generalize or specialize the classifier according to the user parameters. As noted by Scheirer et al. (2013), better results are usually obtained when the original SVM hyperplane is near the positive boundary, aiming at specialization, and the added hyperplane is adjusted aiming at generalization. Despite the generalization of the second hyperplane, this is a form of specialization when compared to the original SVM, in which the second hyperplane can be considered to be at infinity. According to the authors of the article, the negative samples beyond the second (added) hyperplane are not close to the positive samples, which is the reason for the generalization of this hyperplane. The method proposed in the present invention goes further, since it can handle multiclass open-set classification problems and defines a bounded open space of risk.
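The effect of the added far hyperplane can be sketched as a decision over a slab between two parallel hyperplanes. The weights and offsets are hand-picked, hypothetical values, and the refinement step described above is not reproduced.

```python
def slab_predict(weights, bias, sample, near_offset, far_offset):
    """Accept a sample only if its score lies between the two parallel hyperplanes."""
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    return near_offset <= score <= far_offset


# Hypothetical slab: positive region is 1 <= x <= 4 along the first axis.
weights, bias = (1.0, 0.0), 0.0

in_slab = slab_predict(weights, bias, (2.0, 0.0), 1.0, 4.0)
beyond_far_plane = slab_predict(weights, bias, (100.0, 0.0), 1.0, 4.0)
# Far away along the second axis, but still between the two hyperplanes.
far_along_slab = slab_predict(weights, bias, (2.0, 1e6), 1.0, 4.0)
```

In this sketch the sample beyond the far hyperplane is rejected, but the slab still extends without limit in directions parallel to the hyperplanes, so the positively labeled region remains unbounded.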
The open space of risk is the region of the feature space in which any sample is always classified as one of the known classes instead of as unknown. In the state of the art, the solutions proposed to handle the open-set recognition problem are mainly SVM-based. In these solutions, the objective is to minimize the risk of the unknown by minimizing the open space of risk. As SVM methods define half-spaces (i.e., a single borderline), it is not trivial to create a bounded open space. Every SVM extension or improvement for the open-set scenario found in the state of the art maintains an unbounded open space of unlimited risk. The challenge for potential solutions is to minimize the open space of risk, preferably creating a finite open space of risk, which is what the present invention does.
According to the known state of the art and the analyzed solutions, there are four types of methods for the classification/recognition problem:
(1) Multiclass and closed-set: OPF, kNN;
(2) Binary and closed-set: SVM;
(3) Binary and open-set: SVM, OCSVM (Schölkopf et al., 1999), SVMDBC (Costa et al., 2012), 1-vs-Set Machine (Scheirer et al., 2013);
(4) Multiclass and open-set: the method of the present invention; MCSVM (adapted SVM); adjusted kNN using a threshold on the similarity score (adapted kNN); MCSVMDBC (SVMDBC adapted using One-vs-All approach); and One-vs-All Machine (when adapted to open-set scenario using One-vs-All approach).
No paper or patent document discloses a method that is inherently multiclass and designed for the open-set scenario, although prior-art methods can be adapted to be multiclass and open-set (however, without good results). As will be described in detail hereinafter, the method proposed in the present invention has novel and distinct features and aspects in comparison to prior-art solutions.