This invention relates generally to data classification and more particularly to classifying unknown data patterns in a multi-variate feature space.
Data classification is used in many different applications. One example is the remote monitoring of a complex system, in which the system under observation and the monitoring equipment and personnel are at different locations. In this type of remote monitoring application, a sensor suite is typically connected to the system to acquire operational data and transfer the data to the monitoring equipment and personnel. The data comprise data features, which are either raw data that have been cleaned of errors incompatible with further processing in the system, or data that have been transformed to extract information from the raw data (e.g., statistical properties, transformations into a different domain, principal component analysis, etc.). The remote monitoring equipment and personnel can use the data features to distinguish among an array of possible fault conditions, such as tripped performance thresholds, faulty actuators, certain material fatigue faults, etc.

Generally, the monitoring equipment and personnel use algorithms that employ methods such as threshold detection, classification, case-based reasoning, expert systems, etc., to recognize known faults. For the most part, these algorithms, when properly tuned, work reasonably well at recognizing known faults. However, problems arise when a new condition is encountered that these algorithms have not been trained to recognize. In this type of situation, misclassifications by the algorithms are often unavoidable. Misclassifications can result in an incorrect assessment of the operating conditions, which may lead to an undesirable or even damaging outcome. Therefore, there is a need for a mechanism that can detect previously unknown data patterns and set up classes for these patterns so that, when they are encountered again, they can be recognized quickly and accurately.
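To make the notion of a data feature concrete, the following is a minimal sketch that cleans one window of raw sensor readings and reduces it to a few statistical properties. The function name `extract_features`, the choice to treat NaN or infinite readings as the "errors incompatible with further processing", and the particular statistics computed are all illustrative assumptions, not part of the described system.

```python
import math
import statistics
from typing import Dict, Sequence


def extract_features(raw: Sequence[float]) -> Dict[str, float]:
    """Hypothetical feature extractor for one window of raw sensor readings.

    Assumption: "cleaning" means dropping NaN or infinite readings, and the
    features are simple statistical properties of the cleaned window.
    """
    # Remove readings that would break downstream arithmetic.
    clean = [x for x in raw if math.isfinite(x)]
    if len(clean) < 2:
        raise ValueError("need at least two valid readings")
    return {
        "mean": statistics.fmean(clean),
        "stdev": statistics.stdev(clean),  # sample standard deviation
        "min": min(clean),
        "max": max(clean),
    }
```

For example, `extract_features([1.0, 2.0, float("nan"), 3.0])` discards the NaN reading and summarizes the remaining three values. Other transformations named above, such as principal component analysis, would replace or extend these statistics.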
In accordance with this invention, there is provided a system and a method for classifying, in a multi-variate feature space, data obtained from a process. In this embodiment, an observer evaluates the closeness of the data to any one of a plurality of known classes defined in the multi-variate feature space. A classifier classifies the evaluated data into one of the plurality of known classes. A flagger flags data having an unknown classification. A label requester requests that the flagged data be interpreted as belonging to either a new class or one of the plurality of known classes. A classifier adjuster adjusts the classifier according to the interpretation.
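The interplay of observer, classifier, flagger, label requester, and classifier adjuster can be illustrated with a small sketch. It assumes, purely for illustration, a nearest-centroid classifier with Euclidean distance, a single fixed `radius` beyond which data are flagged as unknown, and a callback `request_label` standing in for the human or expert system that interprets flagged data. The class `NearestCentroidMonitor` and all of its method names are hypothetical; the invention does not prescribe any particular classifier.

```python
import math
from typing import Callable, Dict, Optional, Sequence

Vector = Sequence[float]


class NearestCentroidMonitor:
    """Hypothetical sketch of the observe/classify/flag/label/adjust loop.

    Assumptions: each known class is summarized by one centroid, and a point
    farther than `radius` from every centroid has an unknown classification.
    """

    def __init__(self, centroids: Dict[str, Vector], radius: float,
                 request_label: Callable[[Vector], str]) -> None:
        self.centroids = {k: list(v) for k, v in centroids.items()}
        # Assumption: each initial centroid counts as one absorbed sample.
        self.counts = {k: 1 for k in centroids}
        self.radius = radius
        self.request_label = request_label  # stands in for the interpreter

    def observe(self, x: Vector) -> Dict[str, float]:
        # Observer: closeness of x to every known class.
        return {k: math.dist(x, c) for k, c in self.centroids.items()}

    def classify(self, x: Vector) -> Optional[str]:
        distances = self.observe(x)
        best = min(distances, key=lambda k: distances[k])
        # Flagger: None marks data with an unknown classification.
        return best if distances[best] <= self.radius else None

    def process(self, x: Vector) -> str:
        label = self.classify(x)
        if label is not None:
            return label
        # Label requester: ask for an interpretation of the flagged data.
        label = self.request_label(x)
        self.adjust(label, x)
        return label

    def adjust(self, label: str, x: Vector) -> None:
        # Classifier adjuster: create a new class, or move an existing
        # centroid toward x using a running mean.
        if label not in self.centroids:
            self.centroids[label] = list(x)
            self.counts[label] = 1
            return
        n = self.counts[label] + 1
        self.centroids[label] = [
            c + (xi - c) / n for c, xi in zip(self.centroids[label], x)
        ]
        self.counts[label] = n
```

With `NearestCentroidMonitor({"normal": [0.0, 0.0]}, radius=1.0, request_label=...)`, a point at `[5.0, 5.0]` is flagged, interpreted (say, as `"fault_A"`), and added as a new class, so a later point at `[5.3, 4.9]` is recognized as `"fault_A"` without another request. A running-mean update was chosen only because it is the simplest adjustment that keeps each centroid equal to the mean of the samples it has absorbed.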
In accordance with another embodiment, there is provided a multi-variate data assessment tool and method for assessing data obtained from a process. In this embodiment, an observer evaluates the closeness of the data to any one of a plurality of known classes defined in a multi-variate feature space for the process. A classifier classifies the evaluated data into one of the plurality of known classes. A flagger flags data having an unknown classification. A label requester requests that the flagged data be interpreted as belonging to either a new class or one of the plurality of known classes. A new class adder adds a data cluster representative of a new class into the classifier. A class resetter resets the plurality of known classes in the classifier to accommodate the new class boundaries.
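One way to picture the new class adder and the class resetter is sketched below, under loudly stated assumptions: each class is represented by a centroid and a circular boundary radius; a new class is summarized from its labeled cluster as the cluster mean plus the distance to its farthest member; and "resetting" means shrinking every radius to at most half the distance to the nearest other centroid so that no two class regions overlap. The functions `add_new_class` and `reset_boundaries` and this particular boundary rule are hypothetical illustrations, not the claimed method.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def add_new_class(centroids: Dict[str, List[float]], radii: Dict[str, float],
                  name: str, cluster: List[Vector]) -> None:
    """New class adder (hypothetical): summarize a labeled cluster as a class."""
    if not cluster:
        raise ValueError("cluster must contain at least one point")
    dim = len(cluster[0])
    # Centroid: component-wise mean of the cluster members.
    centroid = [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]
    centroids[name] = centroid
    # Assumed boundary: distance from the centroid to the farthest member.
    radii[name] = max(math.dist(p, centroid) for p in cluster)


def reset_boundaries(centroids: Dict[str, List[float]],
                     radii: Dict[str, float]) -> None:
    """Class resetter (hypothetical): shrink radii so class regions don't overlap."""
    for name, c in centroids.items():
        others = [math.dist(c, o) for k, o in centroids.items() if k != name]
        if others:
            # Cap each radius at half the distance to the nearest other class.
            radii[name] = min(radii[name], min(others) / 2)
```

For instance, starting from a `"normal"` class at `[0.0, 0.0]` with radius `4.0` and adding a `"fault_B"` cluster of `[5.0, 0.0]` and `[7.0, 0.0]` yields a new centroid at `[6.0, 0.0]` with radius `1.0`; resetting then shrinks the `"normal"` radius to `3.0`, half the distance between the two centroids, while `"fault_B"` keeps its radius of `1.0`.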