Advancements in technology have reduced the cost of computers to the point where many events in one's day are recorded by a computer. Events recorded by computer are numerous and include, for example, every transaction made by an individual. Computers store the data associated with the transactions they process, resulting in very large databases of information. Also, companies and individuals frequently use computers to record events related to a specific domain. For example, a meteorologist may enter into a computer database many records of data relating to weather occurrences.
The problem therefore arises of how to make efficient use of the tremendous amount of information in these databases. When the number of records in a database rises to a certain level, simply sorting the information in the database provides no meaningful results. While statistical analysis of the records in a database may yield useful information, such analysis must generally be performed by persons with advanced training in math or computer science. Typically, these people are also needed to understand the results of the analyses. Additionally, translating the statistical analysis of the information in a large database into a useful form is difficult. For example, a strategic business activity such as marketing may require analytical information to be converted into a form specifically suited to the activity of marketing. Difficulties in providing or obtaining information in a useful form may prevent the effective use of the information in a database and preclude the use of a possibly valuable data resource.
Organizations of all types commonly collect and store business and technical data in various types of databases. Strategic and/or technical knowledge may be contained in the databases. In some instances, based on many years of experience, experts are able to glean knowledge from databases existing in their particular domain of expertise. In the absence of such experts, however, strategically useful information may not be available to the organization controlling or accessing a given database. The inability to obtain this knowledge may be detrimental to the business objectives of the organization. For example, if a business cannot extract useful knowledge from the data it possesses, it will likely be at a competitive disadvantage compared to a business that can discover such knowledge. Thus, the ability to discover knowledge from data contained in databases would be a valuable asset to any organization.
Certain tools are available that help a nonexpert gain some knowledge from a database. For example, some data analysis tools respond to queries input by the user. A query might be: "How many people within the database are within a certain age range?" The data analysis tool looks at all the records in which an age field meets the age range requirement of the query. Then, the tool simply counts the number of such records. Query tools require the user to have an extensive knowledge of the database domain, and the queries generally are very rigid in their structure. Thus, query tools are very limited in their ability to enable a user to analyze data.
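By way of illustration only, the age-range query described above might be sketched in Python as follows. The record layout, the field name "age", and the function name count_in_age_range are assumptions made for this sketch and do not correspond to any particular query tool.

```python
from typing import Iterable, Mapping


def count_in_age_range(records: Iterable[Mapping], min_age: int, max_age: int) -> int:
    """Count records whose 'age' field lies in [min_age, max_age].

    The field name 'age' is hypothetical; the user must already know
    that such a field exists and what range is of interest.
    """
    return sum(
        1
        for record in records
        if record.get("age") is not None and min_age <= record["age"] <= max_age
    )


# Hypothetical sample records, for illustration only.
people = [
    {"name": "A", "age": 23},
    {"name": "B", "age": 37},
    {"name": "C", "age": 41},
    {"name": "D", "age": None},  # missing age: never matches the query
]

count = count_in_age_range(people, 30, 45)
print(count)
```

The sketch shows the rigidity noted above: the user must supply the field and both bounds in advance, and the tool returns only a count.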
One improvement over query tools is the development of data mining tools. Such tools, however, still require the user to have prior knowledge of the domain of the database. Also, such tools generally require the user to input one or more hypotheses, assumptions or goals in connection with analyzing the database to determine knowledge. For instance, a retail data mining tool might be used to analyze a retail database to determine the concept: "Diapers and beer are generally purchased at the same time." This knowledge would be useful to retail executives who plan marketing strategies. However, typical data mining tools require a user to first propose one or more hypotheses in connection with the data. One hypothesis in this example might be that products are purchased together. Another hypothesis might be that something is purchased together with diapers. For instance, a user would likely have to select a first product (e.g., diapers) from many products contained in the database. Then, the user would have to make the assumption that a second product was purchased, and that it was purchased at the same time as the first product. Alternatively, the user might begin with the first product and then ask the database how often the second product was purchased at the same time. Each of these assumptions requires that the user of the data mining tool have prior knowledge of the retail domain and of the particular database being analyzed.
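As a minimal sketch of the hypothesis-driven analysis described above, the following Python fragment checks how often a user-selected second product appears in the same transaction as a user-selected first product. The representation of transactions as sets of product names, the function name co_purchase_rate, and the sample data are all assumptions made for illustration; they are not drawn from any particular data mining tool.

```python
from typing import Iterable, Set, Tuple


def co_purchase_rate(
    transactions: Iterable[Set[str]], first: str, second: str
) -> Tuple[int, int, float]:
    """Return (transactions with `first`, transactions with both, fraction).

    Both product names must be chosen by the user beforehand, which is
    the prior hypothesis the tool depends on.
    """
    with_first = 0
    with_both = 0
    for basket in transactions:
        if first in basket:
            with_first += 1
            if second in basket:
                with_both += 1
    # Avoid division by zero when the first product never appears.
    rate = with_both / with_first if with_first else 0.0
    return with_first, with_both, rate


# Hypothetical transactions, for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

result = co_purchase_rate(baskets, "diapers", "beer")
print(result)
```

Note that the function can only confirm or refute the pairing the user already suspected; it does not search for pairings on its own.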
Data mining can also be performed based on goals. In connection with the previous example, a goal that would be input by a user might be: "Improve sales of beer." With that goal "in mind," the data mining tool might respond by offering: "Position the beer adjacent to the diapers." The development of goals, however, also requires prior knowledge of the domain and the database, and the formation of intelligent input by the user. Thus, known data analysis tools cannot autonomously discover knowledge within a database.
The aforementioned problems are not intended to be exhaustive. They are merely examples. Those having ordinary skill in the data analysis art will appreciate that there are other problems associated with known data analysis tools.