One of the most important assets of an organization is its information, and one of the most common ways of keeping this asset today is a Data Warehouse. The term Data Warehouse was coined by Bill Inmon in 1990. He defined it as follows: “A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process”. The fundamental goals of the Data Warehouse are:
To make an organization's information accessible;
To make the organization's information consistent;
To be an adaptive and resilient source of information;
To be a secure bastion that protects the organization's information;
To be the foundation for decision making.
Data warehousing has not always delivered on its promise. There is a range of unmet challenges when dealing with data warehouses:
Identifying reporting needs by subject area and organizational role;
Bridging the gap between reporting needs and technical specifications;
Meeting expectations regarding the time it takes to implement;
Defining an effective implementation strategy;
Measuring impact against goals that have been too broadly defined;
Providing all the promised benefits, notably easy end-user access to corporate data.
These problems have been further compounded because IT departments operate under pressure driven by quarterly corporate fiscal goals. This outlook is not conducive to the long-term process of fully implementing a data warehouse, which in some cases can take from 18 months to three years. End users are hindered in their acceptance of data warehousing because they do not understand how it applies to their business and everyday jobs. They are not getting what they want or what they need. The problem lies in the lack of an adequate Customer Adoption Process: understanding the end users and their real needs, providing adequate resources to support their selection process, and offering the follow-through implementation necessary to provide them with what they need, when they need it.
Despite the problems, as demand for responsive Business Intelligence (BI) and Business Performance Management (BPM) grows, global enterprises are still turning to data warehouses as their preferred source of data for analysis. The principle of gathering corporate data into a single, consistent store remains perfectly valid, but as businesses are constantly changing, the practice of traditional data warehousing can prove complex, costly and prone to failure. The fundamental problem is that traditional data warehousing methodology promotes stasis of the business model, but businesses thrive on change. The difficulty of reconciling these opposites is a major contributor to the fact that four in every ten data warehouse implementations are expected to fail. Conventional data warehousing wisdom says that you should plan for a lengthy and expensive implementation, that you will need an army of skilled project managers and technicians, and that you can forget about trying to reflect the changing state of your business: a data warehouse is static data in a static model, custom-built to meet fixed user requirements. However, in order to be able to adapt intelligently and at high speed to new competitive challenges, business users need access to information that remains consistent however much their organization is changing. The cost and time overheads of re-coding a conventional data warehouse to track every change in the business are prohibitive, so reporting in such an environment will always be delayed or inaccurate, and Business Intelligence initiatives will not deliver actionable conclusions.
Companies are moving to larger data warehouses and giving access to more internal and external users. With more people accessing more data, issues like scalability and performance are driving larger data warehouse investments. There is a lot of attention being focused on BI and its promise. Delivering on this promise requires sophisticated data warehousing strategies.
When the underlying data changes, the data warehouse structure needs to change with it in a way that guarantees flexibility, consistency and synchronization. A valid question for a manager faced with choosing a data warehouse system for a company is therefore: “Will your data warehouse technology choice be flexible and easily configurable to accommodate changes in business rules, requirements and data flow?”
A data warehouse requires a considerable amount of time to fully develop, and it likewise takes a long time to gain experience with the usual problems that arise at different phases of a data warehousing effort. Despite the best efforts to architect a data warehouse so that “maintenance” demands are minimized, many data warehouses by their very nature require a great deal of care and feeding once they are in “production”. It is important to note that a successful data warehouse requires a lot of maintenance. Organizations that cannot or will not staff to meet these maintenance demands should think twice before they jump into the data warehousing business.
A data warehouse cannot be static. New business requirements arise. New managers and executives place unexpected demands on the data warehouse. New data sources become available. At the very least, a data warehouse needs to evolve as fast as the surrounding organization evolves. Dynamic, turbulent organizations make the data warehouse task more challenging. Given the churning, evolving nature of the data warehouse, expectations and techniques from the original idealistic static view must be adjusted. Flexible and adaptive techniques need to be designed into it.
One of the prior art solutions that attempts to solve the aforementioned problems is the data warehouse appliance. As its name suggests, it is a preconfigured stack of hardware and software that includes an operating system, a dedicated storage platform, a relational database and a parallel processing engine. A data warehouse appliance derives its processing power from a parallel architecture. These appliances have evolved to allow administrators to scale processing and data store size on demand without being subjected to diminishing returns as the system is scaled out. Specifically, MPP (Massively Parallel Processing) based appliances, also commonly referred to as shared-nothing systems, are designed around the concept that data warehousing workloads and queries can be cleanly divided into separate, independently executable and parallelized operations across a federated system. MPP systems are clusters of two or more Symmetric Multiprocessing (SMP) server nodes, where each node has its own operating system, memory and exclusive access to a partitioned data set. Queries sent to the data warehouse are de-constructed into parallel queries that are executed by individual nodes. The results of these parallelized queries are rolled up and summarized after each node completes its processing. Traditionally, MPP systems have offered near-unlimited scalability, but at the cost of high overall management effort and OPEX (Operating Expenses) to maintain an MPP system of significant scale. The setup of an MPP system is typically more complicated, requiring thought about how to partition a common data store among processors and how to assign work among them. Moreover, once the data is partitioned, administrators must engage in ongoing tuning to ensure that it is redistributed and partitioned optimally across all nodes in the MPP system.
This re-partitioning can be quite challenging if the data warehouse is growing rapidly and users periodically perform new and different types of queries. As a result, the implementation and ongoing management costs of an MPP data store can run into the tens of millions of dollars. A further drawback of data warehouse appliances is that they do not use mainstream data store engines, do not integrate easily into existing environments, and often impose a rigid partitioning scheme on the data that limits the types of queries for which they deliver optimal results.
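The scatter-gather flow of an MPP appliance (partition the data, run the sub-query on every node, roll up the partial results) can be sketched as follows. This is a minimal illustration under stated assumptions, not the design of any particular appliance: rows are placed on nodes by a CRC32 hash of a hypothetical distribution key, threads stand in for independent SMP nodes, and the names `partition`, `local_aggregate`, `roll_up` and `mpp_group_by` are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes, dist_key):
    """Shared-nothing layout: each row lives on exactly one node, chosen by hashing dist_key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        node = zlib.crc32(str(row[dist_key]).encode()) % num_nodes
        parts[node].append(row)
    return parts


def local_aggregate(rows, group_by, measure):
    """Runs on one node, over that node's partition only: partial SUM and COUNT per group."""
    partial = defaultdict(lambda: [0, 0])
    for row in rows:
        acc = partial[row[group_by]]
        acc[0] += row[measure]
        acc[1] += 1
    return dict(partial)


def roll_up(partials):
    """Coordinator step: merge the per-node partial results into the final answer."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            total[group][0] += s
            total[group][1] += c
    # AVG is derived from the merged SUM and COUNT, never by averaging per-node averages
    return {g: {"sum": s, "count": c, "avg": s / c} for g, (s, c) in total.items()}


def mpp_group_by(rows, num_nodes, group_by, measure, dist_key):
    parts = partition(rows, num_nodes, dist_key)
    # Threads stand in for independent SMP nodes executing their sub-queries in parallel
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda p: local_aggregate(p, group_by, measure), parts))
    return roll_up(partials)


sales = [
    {"order_id": 1, "region": "EU", "amount": 100},
    {"order_id": 2, "region": "US", "amount": 250},
    {"order_id": 3, "region": "EU", "amount": 50},
    {"order_id": 4, "region": "US", "amount": 150},
]
result = mpp_group_by(sales, num_nodes=3, group_by="region", measure="amount", dist_key="order_id")
print(result["EU"])  # {'sum': 150, 'count': 2, 'avg': 75.0}
```

Each node ships back only a SUM and a COUNT per group, so the roll-up stays correct however the rows happen to be spread across nodes; this is what allows the sub-queries to run independently.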
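Why re-partitioning is costly can be seen with a small calculation. The sketch below assumes a simple modulo placement rule on an integer distribution key (one common scheme; real appliances may use others) and measures how many rows change their home node when a four-node cluster grows to five. The helper names `node_for` and `moved_fraction` are hypothetical.

```python
def node_for(key, num_nodes):
    # Assumed placement rule: modulo of an integer distribution key
    return key % num_nodes


def moved_fraction(keys, old_nodes, new_nodes):
    """Fraction of rows whose home node changes when the cluster is resized."""
    keys = list(keys)
    moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
    return moved / len(keys)


frac = moved_fraction(range(10_000), old_nodes=4, new_nodes=5)
print(f"{frac:.0%} of rows must be redistributed")  # 80% of rows must be redistributed
```

Under this rule, a key keeps its node only when `k % 4 == k % 5`, which holds for 4 out of every 20 consecutive keys. Adding a single node therefore forces roughly four fifths of the data to move across the network.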
Typically, prior art systems exhibit limitations in query processing, since they adapt neither to changes in the queries nor to changes in metadata and data properties (density, correlations, etc.). These systems are not effectively “self-learning”. Today, most systems calculate results on the fly and do not pre-aggregate the data; those that do pre-aggregate do so only on a very limited scale (i.e. selectively, for specific queries) because of performance issues. Various prior art systems rely on “typical” queries, i.e. pre-defined queries (template reports), which cover only a very narrow range of requests. When a new (not pre-defined) query arrives, such systems use on-the-fly aggregation to answer it. This means that, in the best case, costly full data scans and calculations are performed each time a new query arrives, even if the query is a repeat. In the worst case, the process may be so lengthy that users often abort the query before an answer is provided. To remedy the situation, the system's supervisor intervenes by creating an appropriate summary table to handle the new query. This is not a satisfactory solution, since summary tables that suit today's query stream might become useless for tomorrow's query stream, owing to the highly dynamic nature of activity in information systems.
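The difference between answering from a pre-built summary table and falling back to on-the-fly aggregation can be illustrated with a toy engine that counts rows read. This is a hypothetical sketch, not any prior art product: the fact table `FACTS`, the class `SummaryTableEngine` and its pre-defined dimensions are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, amount)
FACTS = [("EU", "A", 10), ("EU", "B", 20), ("US", "A", 30), ("US", "B", 40), ("US", "A", 5)]
COLUMN = {"region": 0, "product": 1}


def full_scan(facts, dim):
    """On-the-fly aggregation: every fact row is read to answer the query."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[COLUMN[dim]]] += row[2]
    return dict(totals), len(facts)


class SummaryTableEngine:
    def __init__(self, facts, predefined_dims):
        self.facts = facts
        # Summary tables are built in advance, but only for the pre-defined ("typical") queries
        self.summaries = {d: full_scan(facts, d)[0] for d in predefined_dims}
        self.rows_scanned = 0  # rows read while answering queries (build cost not counted)

    def query(self, dim):
        if dim in self.summaries:
            # Pre-defined query: read the small pre-aggregated table
            self.rows_scanned += len(self.summaries[dim])
            return self.summaries[dim]
        # New query: full scan, repeated every time the same query arrives again
        result, scanned = full_scan(self.facts, dim)
        self.rows_scanned += scanned
        return result


engine = SummaryTableEngine(FACTS, predefined_dims=["region"])
by_region = engine.query("region")    # served from the summary table: 2 rows read
by_product = engine.query("product")  # not pre-defined: 5 rows read
engine.query("product")               # the same new query again: another 5 rows read
print(engine.rows_scanned)  # 12
```

On a five-row table the gap is trivial, but the pattern is the one described above: the summary table helps only the queries it was built for, and a query outside that set pays the full scan on every repetition.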
A very good summary of how the expectations of data warehousing users have developed is the following remark by Marc Andrews, IBM's program director of data warehousing: “We would characterize BI as having three generations. The first generation was about understanding the past. The second was about analyzing why things happened and making recommendations about the future. That's better than first, but I still liken this to driving a car by looking in the rear view mirror. The new, third generation is about making information available to the people in front of the customer.”
This is a truly significant shift in the way enterprises use data warehouses. First- and second-generation systems needed to support a limited number of people who ran large, complex analytical queries, whereas a third-generation system must make information available to the many people who deal directly with customers.
If a system based on the assumption that both the business model and reporting requirements are ever-changing were available, enterprise leaders seeking to improve the Return on Investment (ROI) of their management information initiatives would no longer need to feel that BI and analysis reporting hold them back.
U.S. Pat. No. 6,438,537 describes a system that treats each query as it arrives, including all of the query's specific parameters. This approach may be useful in small systems with a low rate of arriving queries. Applying the query treatment taught in U.S. Pat. No. 6,438,537 to real-world systems, in which queries arrive at rates of tens or even hundreds per second, will result in poor performance in the best case and will be infeasible in the worst case. This prior art system can also set a parameter that instructs it to capture only one out of every X queries. Working in this manner, important information about the content of the query stream may be lost, reducing the effectiveness of the whole system workflow.
It is therefore a purpose of the current invention to provide a system, complementary to an existing data store system, that overcomes the deficiencies of the prior art regarding BI and analysis by being based on the assumption that both the business model and reporting requirements are ever-changing.
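How capturing only one out of every X queries can hide part of the query stream is shown by the sketch below. It assumes deterministic periodic sampling (positions 0, X, 2X, ...); whether the cited system samples this way is not stated here, and random sampling would miss rare patterns probabilistically rather than systematically. The stream and the function `sample_every_x` are hypothetical.

```python
from collections import Counter


def sample_every_x(stream, x):
    # Keep one out of every x queries: positions 0, x, 2x, ...
    return [q for i, q in enumerate(stream) if i % x == 0]


# Hypothetical query stream: a periodic report ("by_month") arrives at positions 3, 13, 23, ...
stream = ["by_month" if i % 10 == 3 else "by_region" for i in range(100)]

full_profile = Counter(stream)
sampled_profile = Counter(sample_every_x(stream, 5))
print(full_profile["by_month"], sampled_profile["by_month"])  # 10 0
```

The "by_month" query makes up 10% of the real workload yet never appears in the captured sample, because its arrival period never lines up with the capture positions. Any tuning decision driven by the sample would ignore it entirely.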
It is also a purpose of the current invention to enable data store users not only to obtain up-to-date business intelligence, but also to compare present, past and predicted performance, no matter what the business structure is at any given time.
It is yet another purpose of the current invention to deliver a consistent view of the past and the present without requiring any costly changes to existing source systems.
It is a further purpose of the current invention to provide a system and method for more efficient data arrangement and query management.
It is another purpose of the current invention to provide a system and method for automatic dynamic update of arranged data for effective execution of continuously changing queries.
Further purposes and advantages of this invention will appear as the description proceeds.