1) Field of the Disclosure
The disclosure relates generally to machine learning data classification systems and methods, and more particularly, to a computer implemented data driven classification system and method having systems and methods for troubleshooting an associative memory.
2) Description of Related Art
Data driven classification or management systems, such as machine learning systems, may use associative memory systems or approaches using artificial intelligence, neural networks, fuzzy logic, and/or other suitable technologies capable of forming associations between pieces of data and then retrieving different pieces of data based on the associations. The different pieces of data in the associative memory may come from various data sources.
Associative memory systems or approaches used in data driven classification and data management systems may be used to manipulate data for use in downstream systems to provide information that may be used in making technical data or industry decisions. The accuracy and performance of the associative memory are important for providing better, more accurate data results to such downstream systems. To address accuracy and performance related problems that occur within the associative memory, or simply to improve its accuracy and performance, it is desirable to have efficient, reliable, and low cost troubleshooting systems and methods to effectively identify, evaluate, and/or correct such problems within the associative memory.
Text based data sets, such as records, stored within an associative memory system of a data driven classification or management system may require additional data-related work by analysts, or other system-support personnel, to address underlying accuracy-related problems. Identifying such text based data sets requiring accuracy-related attention that are stored among the large amount of data contained in the associative memory system may be challenging.
Known associative memory troubleshooting systems and methods to identify text based data sets or records within an associative memory exist. Such known troubleshooting systems and methods may employ combinations of statistical methods, keyword-based groupings, and intensive manual analysis of individual records. However, such known troubleshooting systems and methods may require many hours of labor to perform, which may result in increased labor costs and increased work time. In addition, identifying records of below-threshold accuracy using such known troubleshooting systems and methods may require an extensive amount of effort from teams of analysts, which may also result in increased labor costs.
Thus, it would be advantageous to have an efficient and low cost data driven classification and troubleshooting system and method that provide useful insight to enable and guide analysts, or other system-support personnel, to easily and quickly identify specific areas within the associative memory system of text based data sets, such as records, requiring accuracy-related attention, in order to improve the accuracy of such text based data sets, and in turn, to improve the accuracy of the associative memory system.
Similarly, individual classifications stored within an associative memory system of a data driven classification or management system may require additional work by analysts, or other system-support personnel, to address underlying accuracy-related problems. Identifying such individual classifications requiring accuracy-related attention that are stored among the large number of classifications contained in the associative memory system may prove challenging.
Known associative memory troubleshooting systems and methods to identify classifications within an associative memory exist. Such known troubleshooting systems and methods may include reporting of overall classification error rates. However, such reporting of overall classification error rates may not be useful for identifying individual classifications within the associative memory. Moreover, known troubleshooting systems and methods may employ combinations of statistical methods, keyword-based groupings, and intensive manual analysis of individual records, which may require many hours of labor to perform and may result in increased labor costs and increased work time.
Thus, it would be advantageous to have an efficient and low cost data driven classification and troubleshooting system and method that do not simply report overall classification error rates, and that provide useful insight to enable and guide analysts, or other system-support personnel, to easily and quickly identify individual classifications within the associative memory system requiring accuracy-related attention, in order to improve the accuracy of such individual classifications, and in turn, to improve the accuracy of the associative memory system.
Moreover, it may be challenging to know where in the associative memory system to look to repair or remedy problems that contribute to an underperforming data driven classification or management system and an underperforming associative memory, where the associative memory system is used as a data source to the underperforming system. Typically, analysts, or other system-support personnel, may only be able to observe that individual results are not producing good similarity matches, but may not be able to get a system-level view of the associative memory system to pinpoint root causes for poor matches or mismatches. In addition, because the associative memory system is used as the only data source, no method exists to query it directly with common computer programming languages, such as standard SQL (Structured Query Language), that might be used if a relational database were the only data source.
Analysts, or other system-support personnel, who use an associative memory system may gain some level of insight into an underperforming memory by examining individual records. From a given individual record, the associative memory system may present other results in descending order of similarity relevance. The analyst, or other system-support personnel, must look at those results and then determine which associative memory categories and values are producing good similarity matches and which are producing bad matches. This process may be labor-intensive and may not provide an adequate way to obtain a global view of contributions by all categories and values across the associative memory system.
Thus, it would be advantageous to have a quick and efficient data driven classification and troubleshooting system and method that provide a system-level view of the associative memory system to pinpoint root causes for poor matches or mismatches of data, and that provide a means for data from the associative memory system to be queried by common computer programming languages, such as standard SQL (Structured Query Language).
Further, an associative memory system of a data driven classification or management system may contain a previously created domain vocabulary consisting of canonical designations and their corresponding variants that are specific to a domain (i.e., a given sphere of knowledge or activity) and that have been generated from free text data or other data sources. Such free text data may need to be “cleaned” and/or normalized into a domain vocabulary that enables downstream systems that utilize free text data to generate more effective results.
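By way of illustration only, the relationship between canonical designations and their variants may be sketched as a lookup that rewrites each variant found in free text to its canonical designation. The vocabulary entries, the function names, and the regular-expression matching strategy below are all hypothetical assumptions made for this sketch; the disclosure does not specify how a domain vocabulary is stored or applied.

```python
import re

# Hypothetical domain vocabulary: canonical designation -> known variants.
# These entries are invented for illustration and come from no real data set.
DOMAIN_VOCABULARY = {
    "hydraulic pump": ["hyd pump", "hydr. pump", "hydraulic pmp"],
    "landing gear": ["ldg gear", "lndg gear"],
}


def build_variant_pattern(vocabulary):
    """Return a compiled pattern matching any variant, plus a variant lookup."""
    lookup = {}
    for canonical, variants in vocabulary.items():
        for variant in variants:
            lookup[variant.lower()] = canonical
    # Try longer variants first so a long phrase wins over a shorter prefix.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(v) for v in alternatives) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern, lookup


def normalize(text, vocabulary=DOMAIN_VOCABULARY):
    """Replace every known variant in the text with its canonical designation."""
    pattern, lookup = build_variant_pattern(vocabulary)
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)


if __name__ == "__main__":
    print(normalize("Replaced HYD PUMP near ldg gear"))
```

In this sketch, records that describe the same item with different abbreviations are rewritten to the same canonical text, which is the property that allows similar records to be grouped together.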
Adequately capturing the domain vocabulary greatly improves the ability of associative memory queries to find all respective groupings of similar records that exist across the entire associative memory data set. An optimized domain vocabulary that maximizes such query ability of the associative memory system is desirable.
Thus, it would be advantageous to have a data driven classification and troubleshooting system and method that identify how well a domain vocabulary matches across an associative memory's data set, in order to assess whether the domain vocabulary needs improvement or optimization.
In addition, an associative memory system of a data driven classification or management system may have source records that are missing key information. An associative memory system generates results based upon the amount of detail present in a source record. If a source record is sparse, in that minimal information is present to utilize for comparison against other records in a data set, then results returned by the associative memory system may not be well correlated. The more information the source record has, the better correlated the results returned by the associative memory system will be.
Thus, it would be advantageous to have a data driven classification and troubleshooting system and method that provide additional or clarifying information to a sparse source record to enable the associative memory system to produce relevant and highly-correlated similarity matches from the rest of the records in a data set.
Further, an associative memory system of a data driven classification or management system may typically use predictive models to provide predicted information regarding future behavior or outcomes. The predicted information is only as effective as its quality and correctness. A quality rating metric may be calculated to measure the accuracy of a given prediction.
Known associative memory troubleshooting systems and methods exist for computing a quality rating metric. One such known system and method for computing a quality rating metric includes an associative memory system using a nearest neighbor algorithm that only uses the proportion of similar or nearest neighbors as the quality rating metric. However, such known system and method for computing the quality rating metric does not measure the absolute similarity between the object to be categorized and the similar or nearest neighbors. The resulting quality rating metric may be inaccurate because it does not take into consideration how similar the object to be categorized is to its similar or nearest neighbors.
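By way of illustration only, the limitation of a proportion-only quality rating metric may be sketched as follows. The function names, the neighbor data, and the assumption that similarity scores lie between 0 and 1 are hypothetical. The similarity-weighted variant is merely one assumed way of taking absolute similarity into account and is not presented as the metric of the disclosure.

```python
from collections import Counter


def proportion_quality(neighbors):
    """Proportion-only metric: share of neighbors voting for the top label.

    neighbors is a list of (label, similarity) pairs; similarity is ignored.
    """
    counts = Counter(label for label, _ in neighbors)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(neighbors)


def similarity_weighted_quality(neighbors):
    """Assumed variant: sum of the top label's similarities over neighbor count.

    Similarities are assumed to lie in [0, 1], so the result also does.
    """
    totals = {}
    for label, similarity in neighbors:
        totals[label] = totals.get(label, 0.0) + similarity
    label = max(totals, key=totals.get)
    return label, totals[label] / len(neighbors)


if __name__ == "__main__":
    # Same labels in both cases, but very different closeness to the object.
    close_neighbors = [("A", 0.95), ("A", 0.90), ("B", 0.85)]
    far_neighbors = [("A", 0.20), ("A", 0.15), ("B", 0.10)]

    print(proportion_quality(close_neighbors), proportion_quality(far_neighbors))
    print(similarity_weighted_quality(close_neighbors),
          similarity_weighted_quality(far_neighbors))
```

In this sketch the proportion-only metric assigns the same rating to both neighbor sets, because two of three neighbors share a label in each case, whereas the assumed weighted variant assigns a lower rating when the neighbors are only weakly similar to the object to be categorized.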
Thus, it would be advantageous to have a data driven classification and troubleshooting system and method that provide an associative memory system using a nearest neighbor algorithm that calculates a quality rating metric based on the absolute similarity between an object to be categorized and its nearest neighbors, rather than on the proportion of nearest neighbors alone.
Accordingly, there is a need in the art for a data driven classification and troubleshooting system and method that have improved accuracy and performance, are reliable and efficient, and that provide advantages over known systems and methods.