1. Field of the Invention
The present invention relates to data warehousing and decision support systems. More particularly, the invention relates to a system and method of operation to support data query and retrieval in a data warehousing environment including a plurality of data providing computers accessible via a suitable communication network, wherein the information appears as though it may be emanating from a single (logical) data source.
2. Background and Objects of the Invention
The advent of mainframe computing provided the ability to manage large amounts of data in a manner that had not been previously possible. The advantages of managing such volumes of data by computer were obvious, especially to government and large corporate entities. As can be seen in FIG. 1A, the architecture of such prior art systems, such as system 10a, initially provided a user accessing the system with a terminal 12 that was coupled to the mainframe 20 by way of a cable 12a. The cable 12a establishes what may be termed a `communication channel`. These systems generally provided very simple ASCII-type user interfaces, wherein data was presented in tables (or lists) on monochrome screens. The early terminal interfaces enabled a user to enter, view, and modify the stored data, which could later be printed for hardcopy records. Importantly, even with the significant expense of deploying and maintaining these systems, they provided a cost-effective means to store, maintain, and manipulate very large amounts of data in a manner never before possible.
Soon thereafter, the advent of the mini-computer resulted in the introduction of reduced-cost, low-end database systems. Now it was possible for relatively small and medium sized companies to enjoy the benefits of computer based data handling and management. Again, the interface screens employed by users of such systems were rather mundane by today's high resolution graphical user interface (GUI) standards. However, the availability of mini-computers also made it possible to have a plurality of such systems employed within an organization. For example, different departments or geographical locations within a single organization could employ one or more such systems.
Eventually, the advent of server based systems resulted from a merging of low-cost microprocessor based technology, and complex network and multi-user operating systems. These system architectures were, of course, driven by the introduction and proliferation of the personal computer or PC. As the computing power of these personal computer systems rapidly increased, and as prices continued to drop, the ubiquitous PC led to the wide-scale deployment of LAN based client/server computing environments. These systems provided advantages with regard to scalability and redundancy not previously available (especially when considering cost). Another significant advantage of these systems was their ability to run a variety of business application programs, which evolved and steadily improved in power and ease-of-use. The cost of these application programs was also quite low when compared to the cost of mainframe and mini-computer software. The PC based applications were generally simple and intuitive to use, and provided a `standard interface` (GUI) having excellent graphical capabilities. Once a user understood the basics of the interface, new applications, or updated versions of older applications (having additional feature sets), were easy to learn and use.
However, the significant investment in mainframe and mini-computer based database systems, termed by some as `legacy` database systems, combined with their stability and relative reliability, led to a desire by some to mix legacy systems with newer server based data storage systems, which are often referred to as data warehousing systems. As depicted in FIG. 1B, when desiring to maintain older or original data providing systems, the result was the birth of hybrid systems. Whether hybrid (consisting of a plurality of mixed physical servers) or homogeneous (consisting of a plurality of the same or equivalent physical servers) these systems may require the analysis of information stored therein, and may accordingly be termed `decision support systems` or DSS environments. An illustrative DSS 10b, as shown in FIG. 1B, includes clients such as client 14a, 14b, etc., with the ability to access data stored in one or more data marts 20a, data warehouses 20b, and/or legacy database systems 20c. It is important to note that the DSS environment may be formed by a mixed or hybrid arrangement (as illustrated), or a homogeneous arrangement, say consisting only of data warehouses 20b. In each instance, however, a number of problems resulted from the desire to establish and maintain such DSS architectures. For example, certain database systems may employ accessing formats (or query languages) that differ from later PC and client/server based data warehouse query formats. This results in a need for query translation and reformatting. Also, as data within an organization could be located on any one of a number of data providing physical servers, there was a need to determine where the specific data being requested was actually located. This need to locate data, which has been exacerbated by the introduction of Web and enterprise resource planning (ERP) applications (which are typically distributed throughout an enterprise or organization), may be termed `query routing`.
That is, query requests are steered or routed to a particular database, data warehouse, or data mart, wherein the data required is located. Query routing generally works best when based on cost and business rules.
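By way of illustration only, the notion of routing a query on the basis of cost and business rules may be sketched as follows. The sketch is a minimal example under stated assumptions and does not describe any particular prior art system: the names `DataSource` and `route_query`, the numeric cost values, the table names, and the role-based rule are all hypothetical, introduced solely for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """A physical data provider (data mart, data warehouse, or legacy database)."""
    name: str
    tables: set
    cost: float  # hypothetical relative cost of answering a query at this source
    allowed_roles: set = field(default_factory=lambda: {"analyst"})


def route_query(table, role, sources):
    """Return the cheapest source that holds `table` and that `role` may access."""
    # Business rule (assumed): a source is eligible only if the role is permitted.
    candidates = [
        s for s in sources
        if table in s.tables and role in s.allowed_roles
    ]
    if not candidates:
        raise LookupError(f"no data source can serve {table!r} for role {role!r}")
    # Cost rule (assumed): among eligible sources, prefer the lowest cost.
    return min(candidates, key=lambda s: s.cost)


# Hypothetical sources, loosely named after the elements of FIG. 1B.
sources = [
    DataSource("data_mart_20a", {"sales"}, cost=1.0),
    DataSource("data_warehouse_20b", {"sales", "inventory"}, cost=3.0),
    DataSource("legacy_db_20c", {"payroll"}, cost=5.0, allowed_roles={"hr"}),
]

print(route_query("sales", "analyst", sources).name)
```

In this sketch the `sales` table is held by two sources, and the request is steered to the lower-cost data mart; a request for `payroll` by a role other than `hr` is rejected by the assumed business rule.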
Yet other problems existed in these distributed DSS architectures, such as the need to manage redundancy, replication, and load-balancing, as well as to provide security for confidential or sensitive information. In short, the DSS landscape was complex and full of pitfalls. The recent explosion in the use of the Internet as a data distributing medium has certainly further complicated the situation.
One solution which was initially embraced to solve the above listed problems and concerns is illustrated in FIG. 2. The arrangement therein employed a middle-tier server 22, which was interposed between the client PCs, such as 16aa and 16bb, and the data sources, such as data mart 20a, data warehouse 20b, and/or legacy databases 20c. In theory, all the above listed problems, including translation, query routing, security, etc., would simply be handled by the middle-tier server 22 employing suitable `middle-ware` software. A request for data would be passed to the middle-tier server 22, translated if necessary, and passed to the appropriate system to satisfy the query. The queried data would then be passed back to the requesting application program (executing on a respective client PC 16) via the middle-tier server. As one skilled in the art will recognize, the architecture of FIG. 2 has several major drawbacks. First, the middle-tier server 22 is clearly a bottleneck for the flow of information. Not only do the queries flow through the middle-tier server 22, all requested (queried) data also passes through it. If very large volumes of data are requested by a client PC 16, the utilization demands placed on such a (middle-tier) server increase quickly. If a large number of users are to be serviced by the arrangement of FIG. 2, the middle-ware server may need to be a very high throughput, high-cost system. Accordingly, the middle-tier server 22 may require more computing power than the actual data mart 20a or data warehouse 20b systems. In addition, if the middle-tier server 22 were to fail and go down, the entire system would generally be rendered inoperable with regard to delivering queried data to the client PCs 16. Skilled individuals will also appreciate the difficulty in scaling the arrangement of FIG. 2 when a large number of client PCs must be added over a period of time.
Eventually, regardless of the actual middle-tier server computing power available, scalability of the arrangement of FIG. 2 becomes a significant issue.
Therefore, when considering prior art DSS architectures, there is a need for more fault tolerant, scalable, and secure arrangements to overcome the above stated problems. Objects of the present invention are, therefore, to provide new and improved decision support and data warehousing system methods and architectures to support the rapid access to large amounts of distributed data having one or more of the following capabilities, features, characteristics, and/or advantages:
employs an architecture not requiring a bottleneck-producing middle-tier server configuration;
significantly improved system throughput over many presently available DSS architectures;
incorporates a `logical server`, which may be used to implement resource management and dynamically balance the loading of a plurality of physical data providing sources including data marts, data warehouses, legacy databases, Web based information servers, etc.;
supports query routing that may be dynamically altered in a manner that is transparent to the system users (i.e., client PCs and client applications);
a scalable and dynamically alterable architecture incorporating redundancy, load balancing, and security;
enables queried data to be directly transmitted from a data providing source computer, such as a data warehouse or a data mart, to the requesting client computer by way of virtually any available means to support networking and communication;
enables easy management and reconfiguration, possibly from a single location;
supports automatic and transparent client driver software updates, as needed or determined based on client generated query requests; and
simple, relatively low-cost architecture employing embodiments based on many off-the-shelf hardware components.
The above listed objects, advantages, and associated novel features of the present invention, as well as others, will become more apparent with a careful review of the description and figures provided herein. Attention is called to the fact, however, that the drawings and the associated description are illustrative and exemplary only, and variations are certainly possible.