Herein the phrase “data store architecture” refers to the relationship between the columns of data store tables. Information about the initial design of the architecture is usually stored in a graphic or text document that is not part of the data store itself. This document is usually written at the initial stage of designing the data store and is usually not updated after upgrades or changes, so it rapidly becomes less and less accurate. Using the wrong columns, or applying the wrong operation to columns, in an application will cause inaccurate or wrong results. Many applications that use the data store are not developed at the same time as the initial data store architecture, and each such application causes some changes to the data store architecture. The end result is that the original design document does not accurately reflect the actual architecture of the data store, and the difference grows each time another application is implemented on the data store.
Herein the word “user” is used to refer either to a person who is responsible for improving applications, or to an automatic software application which uses information about the data store to improve the performance of the data store.
Herein the phrase “end-user” is used to refer to a person who submits a query and expects to receive an answer.
Herein the phrase “architecture approximation” is used to refer to an analysis report which is generated by a “Data Store Architecture Analyzer”. An architecture approximation includes a technical description of the data store architecture, i.e. the relationship between objects and columns, together with useful information about data store objects, e.g. the percentage of object activity and the level of relative object performance (relative, for example, to the object's size or to the best possible performance).
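By way of non-limiting illustration only, the contents of an architecture approximation as defined above could be held in a structure such as the following Python sketch. All class names, field names, and the 0 to 100 and 0 to 1 value scales are assumptions made for the example; the definition above does not prescribe any particular representation.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnRelationship:
    """A relationship between two columns, each written as "object.column"."""
    source: str  # hypothetical: the column that refers to another column
    target: str  # hypothetical: the column being referred to


@dataclass
class ObjectInfo:
    """Descriptive information about one data store object (e.g. a table)."""
    name: str
    activity_percent: float      # assumed scale: share of total activity, 0-100
    relative_performance: float  # assumed scale: 1.0 = best possible performance


@dataclass
class ArchitectureApproximation:
    """Illustrative container for an analyzer's report."""
    relationships: list[ColumnRelationship] = field(default_factory=list)
    objects: list[ObjectInfo] = field(default_factory=list)

    def related_columns(self, column: str) -> list[str]:
        """Return every column linked to `column`, in either direction."""
        linked = []
        for rel in self.relationships:
            if rel.source == column:
                linked.append(rel.target)
            elif rel.target == column:
                linked.append(rel.source)
        return linked


# Example report with made-up objects and figures.
report = ArchitectureApproximation(
    relationships=[ColumnRelationship("orders.customer_id", "customers.id")],
    objects=[
        ObjectInfo("orders", activity_percent=70.0, relative_performance=0.8),
        ObjectInfo("customers", activity_percent=30.0, relative_performance=0.95),
    ],
)
print(report.related_columns("customers.id"))
```

Keeping the relationships and the per-object information in separate lists mirrors the two parts of the definition: the technical description of the architecture, and the additional information about each object.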
Successful use of a data store by users requires a complete understanding of its architecture. Many alternative representations of the same data store can be developed and used. These representations differ in semantics, symbols, and means of representing relationships. If a company's requirements are simple, the standard tools for data management satisfy all of the company's needs. However, as the company's needs become more complicated, it will need more sophisticated data store management packages with more capabilities. Certain business processes are often managed using specialist data store products or applications which are specifically designed for managing and manipulating information within a specific business. Similarly, many business types, such as manufacturing, publishing, and insurance, have data store solutions specifically targeted at their precise needs and requirements. Data store architecture is continuously updated, reconstructed and renewed. Over time the data store architecture becomes extremely complicated, and a great deal of human effort is needed even to determine an approximation of it. In an effort to solve this problem, research has evolved in the direction of creating “autonomic databases”. The goal of this research is to develop self-managing databases or, more generally, self-managing data stores, i.e. data stores which can be self-configuring, self-optimizing, self-protecting and self-healing. One example of this type of research is the DB2 Autonomic Computing project, also known as SMART (Self-Managing And Resource Tuning) [http://www.almaden.ibm.com/cs/projects/autonomic/].
In most situations the typical user is someone who is not involved in the development or maintenance of the data store architecture, or in data mining, and who works with only a part of the data store. To use the data store efficiently, the user needs an accurate understanding of the architecture of the data store, or at least of the part of the data store that the user needs at a particular time. To automatically define the architecture of a data store, existing systems (called “analyzers”) rely on two sources of knowledge: exploration of the data store and analysis of a dataset of users' queries. An efficient model of the data store architecture is not generated unless the “analyzer” has examined both of these sources. The problem with this approach is that the user must first work with the data store, i.e. insert, remove or request data, before an estimate of the architecture of the data store can be received from the “analyzer”.
It is therefore a purpose of the present invention to provide a method and a system for automatic recognition of data store architecture and for tracking dynamic changes and evolution in that architecture.
It is another purpose of the present invention to provide a method and a system which can automatically generate a data store architecture approximation.
It is yet another purpose of the present invention to provide a method and a system which can generate a data store architecture approximation by working only with the data store and its data, without knowledge of previously asked queries.
It is still another purpose of the present invention to provide a method and a system which can track changes and evolution in data store architecture.
Further purposes and advantages of this invention will appear as the description proceeds.
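By way of non-limiting illustration only, the following Python sketch shows one simple way in which candidate relationships between columns might be estimated from the data alone, with no record of previously asked queries. It is not presented as the method of the invention: it uses a common value-containment heuristic (a column is reported as possibly referring to a column of another table when all of its non-null values also occur in that column), and the table names, column names, and function names are hypothetical. On real data such a heuristic can report coincidental matches, so its output is only a starting approximation.

```python
import sqlite3
from itertools import permutations


def list_columns(conn):
    """Return (table, column) pairs for every table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    columns = []
    for table in tables:
        for info in conn.execute(f'PRAGMA table_info("{table}")'):
            columns.append((table, info[1]))  # info[1] is the column name
    return columns


def distinct_values(conn, table, column):
    """Return the set of distinct non-null values stored in one column."""
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL')
    return {row[0] for row in rows}


def candidate_relationships(conn):
    """Guess column relationships from stored data only (no query log).

    Column A is reported as possibly referring to column B of a different
    table when A is non-empty and every value of A also occurs in B.
    """
    values = {col: distinct_values(conn, *col) for col in list_columns(conn)}
    found = []
    for a, b in permutations(values, 2):
        if a[0] != b[0] and values[a] and values[a] <= values[b]:
            found.append((f"{a[0]}.{a[1]}", f"{b[0]}.{b[1]}"))
    return found


# Small in-memory example with made-up tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (order_no INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (101, 1), (102, 3), (103, 1);
""")
print(candidate_relationships(conn))
```

In this example no relationship between the two tables is declared in the schema, yet the sketch reports orders.customer_id as possibly referring to customers.id, purely because of the values the columns contain.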