1. Field of the Invention
The present invention relates to a data analysis method and apparatus for data warehouses and databases. More particularly, the invention pertains to a data analysis method and apparatus for data mining.
2. Description of Related Art
In the present-day field of information processing systems, there are two different kinds of information processing implementations: On-Line Transaction Processing (OLTP), which handles information and data in transaction processing applications, and On-Line Analytical Processing (OLAP), which works with information and data in analytical processing applications. In the OLTP, online realtime data management processing is carried out through use of databases; e.g., the OLTP is employed for updating data in repetitive routine tasks. In the OLAP, data analytical processing is performed through use of data warehouses; e.g., the OLAP is employed for supporting decision-making in end-user computing.
In the technical literature “DATA WAREHOUSE” (Y. Ishii, Japan Management Science Institute, 1996, pp. 232-237), the positioning of each of the OLTP and OLAP is reported as mentioned below: Conventionally, in extraction of information from a large-scale database, the OLTP system has been used to carry out such analysis as comparison between certain statistic data values and variable data values. Recently, however, since trends in the analytical needs of end users have been toward analyses of more complex and dynamic historical data, the OLTP system featuring centralized computing resources has become unsatisfactory due to difficulty in letting the end users be free to access and process desired data from anywhere whenever necessary. Therefore, at present, the OLAP system is becoming increasingly prevalent through which necessary data is extracted from a database and then transformed to meet particular requirements for individual users' applications.
Although the OLAP system mentioned above as a known arrangement is capable of accomplishing complex analyses in a diversity of applications, in most cases only the data transformed after extraction from the database is subjected to analytical processing. Hence, there is a problem that analytical processing cannot reflect updated data in realtime, and it is also difficult to construct a highly responsive system capable of operating efficiently based on analytical results.
It is therefore an object of the present invention to overcome the abovementioned disadvantages by enabling analytical processing while reflecting updated data in realtime and utilization of analytical results in realtime.
In accomplishing this object of the present invention and according to one aspect thereof, there is provided a data analysis method, comprising the following steps of:
a) generating summary data by summarizing transaction data input to a first server online, and storing the thus generated summary data into the first server;
b) reading in summary data from a second server connected with the first server, and updating the summary data stored in the first server by joining the thus read-in summary data to the summary data stored therein;
c) generating rules using the summary data stored in the first server, and storing the thus generated rules into the first server;
d) detecting data characteristics online using the summary data and rules generated and stored in the first server; and
e) outputting results of detection of data characteristics.
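By way of illustration only, steps a) through e) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class name `FirstServer`, the use of a per-key sum as the summary, and the threshold-style rule are all hypothetical choices made for the example.

```python
from collections import defaultdict


class FirstServer:
    """Hypothetical stand-in for the first server, holding summary data and rules."""

    def __init__(self):
        self.summary = defaultdict(float)  # key -> summarized amount
        self.rules = {}                    # key -> threshold derived from summary data

    def summarize(self, transactions):
        # Step a): summarize transaction data input online (assumed here: sum per key).
        for key, amount in transactions:
            self.summary[key] += amount

    def join(self, remote_summary):
        # Step b): join summary data read in from the second server.
        for key, amount in remote_summary.items():
            self.summary[key] += amount

    def generate_rules(self, factor=1.5):
        # Step c): generate and store rules (assumed here: a simple threshold per key).
        self.rules = {key: total * factor for key, total in self.summary.items()}

    def detect(self):
        # Step d): detect keys whose current summary exceeds the stored threshold.
        return sorted(
            key for key, total in self.summary.items()
            if key in self.rules and total > self.rules[key]
        )


server = FirstServer()
server.summarize([("store_1", 100.0), ("store_2", 50.0)])  # step a)
server.join({"store_1": 20.0})                             # step b)
server.generate_rules()                                    # step c)
server.summarize([("store_2", 40.0)])                      # new online input
alerts = server.detect()                                   # step d)
print(alerts)                                              # step e)
```

In this sketch the rules are generated from the summary data as it stood before the latest input, so detection compares updated summary data against rules derived from non-updated summary data.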
In accordance with another aspect of the present invention, at step a), there is included a step of selecting a record of the input transaction data for adding only the record thus selected, and data summarization is performed upon completion of adding the thus selected record. Further, according to another aspect of the present invention, data summarization at step a) is accomplished by deriving a sum value, a maximum value, a minimum value, a mode value or a weighted sum value from the input transaction data.
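The record selection and summarization at step a) might look like the following minimal sketch. The field names `amount`, `weight` and `valid`, and the function name `summarize`, are assumptions introduced for illustration; only the five summary values themselves come from the description above.

```python
from collections import Counter


def summarize(records, select=lambda r: True):
    """Add only the selected records, then derive the summary values."""
    selected = [r for r in records if select(r)]
    values = [r["amount"] for r in selected]
    if not values:
        return None  # nothing was selected, so there is nothing to summarize
    return {
        "sum": sum(values),
        "max": max(values),
        "min": min(values),
        # Mode: the most frequent value among the selected records.
        "mode": Counter(values).most_common(1)[0][0],
        # Weighted sum: each amount multiplied by its (hypothetical) weight field.
        "weighted_sum": sum(r["amount"] * r.get("weight", 1.0) for r in selected),
    }


records = [
    {"amount": 10, "weight": 0.5, "valid": True},
    {"amount": 10, "weight": 1.0, "valid": True},
    {"amount": 30, "weight": 2.0, "valid": True},
    {"amount": 99, "weight": 1.0, "valid": False},  # excluded by the selection
]
summary = summarize(records, select=lambda r: r["valid"])
print(summary)
```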
Further, in accordance with another aspect of the present invention, a timing point of summary data joining at step b) is determined on the basis of a specific condition. Further, according to another aspect of the present invention, in summary data joining at step b), only the record of the summary data read from the second server which has been updated after the previous joining is joined to the summary data stored in the first server. Further, according to another aspect of the present invention, in summary data joining at step b), non-summary data read from the second server is temporarily transformed to summary data, which is then joined to the summary data stored in the first server.
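The joining behavior at step b) can be sketched as below. This is an illustrative sketch only: it assumes that each summary record carries an `updated_at` value, that joining replaces the local record with the same key, and that the specific timing condition is an elapsed-time interval. None of these details is specified above, and the function names are hypothetical.

```python
def should_join(now, last_join_time, interval):
    # Assumed timing condition: join once a fixed interval has elapsed.
    return now - last_join_time >= interval


def join_updated(local, remote, last_join_time):
    """Join into `local` only the remote records updated after the previous joining."""
    joined = 0
    for key, record in remote.items():
        if record["updated_at"] > last_join_time:
            local[key] = dict(record)  # copy so later remote changes do not leak in
            joined += 1
    return joined


local = {"A": {"total": 5, "updated_at": 10}}
remote = {
    "A": {"total": 8, "updated_at": 25},  # updated since the previous joining
    "B": {"total": 3, "updated_at": 15},  # unchanged since the previous joining
    "C": {"total": 7, "updated_at": 30},  # new since the previous joining
}

count = 0
if should_join(now=40, last_join_time=20, interval=15):
    count = join_updated(local, remote, last_join_time=20)
print(count, sorted(local))
```

Non-summary data read from the second server would first be passed through a summarization routine and the result joined in the same way.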
Further, in accordance with another aspect of the present invention, a timing point of rule generation at step c) is determined on the basis of a specific condition. Further, according to another aspect of the present invention, in rule generation at step c), an If-Then rule (at least one If-Then rule) is extracted to represent such factors as regularity and causal relation latent in the summary data. Further, according to another aspect of the present invention, in rule generation at step c), an association rule (at least one association rule) for attributes latent in the summary data is extracted.
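As one possible illustration of association rule extraction at step c), the following sketch derives single-attribute rules of the form "if A then B" using support and confidence thresholds. The use of support and confidence, the threshold values, and the representation of each summary record as a set of attribute values are assumptions made for the example; the description above does not prescribe a particular extraction algorithm.

```python
from itertools import permutations


def association_rules(records, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) tuples."""
    n = len(records)
    if n == 0:
        return []
    items = sorted(set().union(*records))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for r in records if a in r)
        count_ab = sum(1 for r in records if a in r and b in r)
        support = count_ab / n           # fraction of records containing both
        confidence = count_ab / count_a  # fraction of A-records also containing B
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical summary records, each reduced to a set of attribute values.
records = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk", "bread"},
]
rules = association_rules(records)
print(rules)
```

With these thresholds only the rule "if milk then bread" is retained: "if bread then milk" has the same support but a confidence of 0.75, which falls below the assumed minimum.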
Further, in accordance with another aspect of the present invention, data characteristic detection at step d) is carried out upon completion of summarization processing of data selected on the basis of a specific condition. Further, according to another aspect of the present invention, in data characteristic detection at step d), it is judged whether a record of the updated summary data satisfies a condition part of a rule generated using non-updated summary data. Further, according to another aspect of the present invention, in data characteristic detection at step d), it is judged whether a record of the updated summary data satisfies a condition part and a conclusion part of a rule generated using non-updated summary data.
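The judgment at step d) can be illustrated with the following minimal sketch, in which a rule is modeled as a pair of predicates (a condition part and a conclusion part). The status labels `not_applicable`, `conforms` and `deviates`, the field names, and the example rule are hypothetical; how a satisfied or unsatisfied conclusion part is interpreted is an assumption of this example.

```python
def check_record(record, rule):
    """Judge a record of updated summary data against a previously generated rule."""
    condition, conclusion = rule
    if not condition(record):
        return "not_applicable"  # condition part not satisfied
    # Condition part satisfied: also judge the conclusion part.
    return "conforms" if conclusion(record) else "deviates"


# Hypothetical rule generated from non-updated summary data:
# "If visits >= 10 Then total >= 500".
rule = (lambda r: r["visits"] >= 10, lambda r: r["total"] >= 500)

updated_summary = [
    {"id": 1, "visits": 12, "total": 650},
    {"id": 2, "visits": 15, "total": 120},
    {"id": 3, "visits": 2, "total": 40},
]
results = {r["id"]: check_record(r, rule) for r in updated_summary}
print(results)
```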
Further, in accordance with another aspect of the present invention, at step e), an output of results of data characteristic detection and a destination of output are determined on the basis of the results of data characteristic detection.
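A minimal sketch of how the output and its destination at step e) might be chosen from a detection result is given below. The destination names, the message wording, and the shape of the `result` dictionary are hypothetical and serve only to illustrate the idea.

```python
def route(result):
    """Choose the output content and destination from a detection result."""
    if result["status"] == "deviates":
        # Assumed policy: deviations are sent immediately to an operator.
        return ("operator_console", f"ALERT: record {result['id']} deviates from the rule")
    if result["status"] == "conforms":
        # Assumed policy: conforming records are collected for a periodic report.
        return ("daily_report", f"record {result['id']} conforms to the rule")
    return (None, "")  # nothing to output


destination, message = route({"id": 2, "status": "deviates"})
print(destination, message)
```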
The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description with reference to the accompanying drawings.