The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
Business organizations typically generate, store, and analyze huge amounts of data as part of their normal business activities. Organizations that process large amounts of data typically rely on large storage resources and integrate their various databases into data warehouses through data warehousing techniques that centralize data management and retrieval tasks to maintain a central repository of all organizational data. Although such centralization of data is helpful in maximizing data access and analysis, in many organizations data resides in different locations and may be managed by different database engines. This situation is relatively common, since many enterprises actually decentralize many of their departmental operations. The creation of disparate data sources and database management systems also occurs due to merger/acquisition activities of companies with dissimilar IT (information technology) systems and databases.
The process of data mining involves analyzing data from different perspectives and summarizing it into useful information that can be used to improve performance, such as by increasing revenue and/or cutting costs. As a functional process, data mining involves finding correlations or patterns among dozens of fields in large relational databases. These patterns themselves comprise useful information about the data, and various data mining programs have been developed to allow users to analyze and categorize data, and to summarize the relationships among the data elements.
Present data mining systems are usually proprietary and tied to specific databases. As such, analytic flows and the models generated by these flows are tightly coupled to a particular source database. In an industry or organization where different data sources and database engines are used, this tight coupling results in models that are not portable or reusable. For example, in many retail industries, such as telecom or finance, churn models are useful in forecasting customer turnover and retention. Using present data miners that are specific to a particular database results in churn models that are applicable only to a particular business or data set. This limits the utility of the model, since it cannot be shared among different enterprises.
What is needed, therefore, is a data mining process that is data source-agnostic by being less data-dependent and more process oriented in order to produce analytic flow models that can be shared between organizations using different data sources and database engines.
Present data mining systems also require data to be moved out of the database environment and native data source to another location in order for analytic flow operations to be performed on the data. The resulting model data is often then saved in a separate data repository using a proprietary format. This further restricts portability and also adds a great deal of processing overhead and latency in performing data mining operations. It also prevents the analytic flow operations from being executed on the entire data set if a large amount of data is involved, since the wholesale movement of data can be expensive in terms of memory and processor requirements. Such movement of data also introduces potential data security risks, as the number of data fetch and store cycles is greatly increased during data mining operations.
What is further needed, therefore, is a data miner architecture that reduces unnecessary data movement and latency, and supports optimal data governance and data mining process standardization, to achieve fast and efficient analytic data processing.
What is yet further needed is a data miner that performs analytic flow within the database so as to leverage a database's common security, auditing and administration capabilities, in order to reduce data movement and increase data utilization, and that performs the analytic flow operations across all of the data in a data set.