1. Field of the Invention
This application is related to the field of data mining and text mining.
2. Description of the Prior Art
Data mining is a process for selecting, analyzing and modeling data generally found in structured forms and having determinative content. Text mining is a complementary process for modeling data generally found in unstructured forms and having contextual content. Combined systems treat both types of mining, employing hybrid solutions for better-informed decisions. More particularly, document clustering and classification techniques based on these types of mining can provide an overview of, or identify, a set of documents based upon criteria that amplify or detect patterns within the content of a representative sample of documents.
One of the most serious problems in today's digital economy concerns the increasing volume of electronic media (including but not limited to documents, databases, emails, letters and files), much of it containing non-structured data. Non-structured data presents a semantic problem in identifying meaning and relationships, especially where a group of documents contains a certain class of information expressed in non-equivalent ways. A claim file, whether for a financial, legal, fraud or insurance claim, is but one of several types of files that contain information in a certain class that may be expressed in non-equivalent ways. Assessing insurance subrogation claims manually requires a significant expenditure of time and labor. At least part of the inefficiency stems from the volume of documents, the use of non-standard reporting systems, the use of unstructured data in describing information about the claim, and the use of varying terms to describe like events.
Efforts to improve the automated assessment of claims through better recognition of the meaning in unstructured data have targeted conventional search engine technology (e.g., using parameterized Boolean query logic operations such as “AND,” “OR” and “NOT”). In some instances a user can train an expert system to use Boolean logical operators to recognize key word patterns or combinations. However, these technologies have proved inadequate, when used alone, for sorting insurance and fraud claims into classes of collection potential. Other approaches have developed ontologies and taxonomies that assist in recognizing patterns in language. The World Wide Web Consortium (W3C) Web Ontology Language (OWL) is an example of a semantic markup language for publishing and sharing ontologies on the World Wide Web. These approaches develop structured informal descriptions of language constructs and serve users who need to construct ontologies. Therefore, a need exists for an automated process for screening commercial files for purposes of classifying them into a range of outcomes utilizing a wide range of techniques to improve commercial viability.
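By way of illustration only, the Boolean key word approach described above may be sketched as follows. This is a minimal sketch and not any particular prior art system: the function names `tokens` and `matches`, the parameters `all_of`, `any_of` and `none_of`, and the two sample claim notes are all hypothetical and chosen for this example. It shows how two notes that plausibly describe a like event in varying terms can be treated differently by a key word query.

```python
import re

def tokens(text):
    """Return the set of lowercase alphabetic words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """Hypothetical Boolean key word query.

    all_of  -> every term must appear (AND)
    any_of  -> at least one term must appear, if any are given (OR)
    none_of -> no term may appear (NOT)
    """
    words = tokens(text)
    return (all(t in words for t in all_of)
            and (not any_of or any(t in words for t in any_of))
            and not any(t in words for t in none_of))

# Two invented claim notes describing a like event in varying terms.
notes = [
    "Insured vehicle was rear-ended by another driver at a stop light.",
    "Claimant struck from behind while stopped; other party cited.",
]

# The query "rear AND ended" finds the first note but misses the second.
hits = [matches(n, all_of=("rear", "ended")) for n in notes]
```

In this sketch the second note is missed solely because it uses different wording, which is the kind of limitation attributed above to key word techniques when used alone.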