1. Field of the Invention
The invention described herein relates to dataset access and processing, and in particular relates to the organization of networked servers in support of dataset processing.
2. Background Art
A user commonly needs to access a large body of information in some organized manner. Typically, the user acts through a graphical user interface and a client machine or process to access a server. The server may retrieve data from a database and build a report that is then forwarded to the client. To accomplish this, the server needs to have or obtain all the data needed to create the report. Such an architecture generally functions smoothly for smaller information retrieval problems. Scalability to larger datasets and larger reports, however, is problematic, because the server must then have significant storage and processing capability. In particular, memory requirements and processing throughput can strain the operation of such an architecture, particularly if the user requires frequent or near real-time updates.
One way of dealing with the scalability problem is to have multiple servers at the disposal of a given client. While this would seem to multiply the capabilities of such a system, the fact remains that for any given requested report, the accessed server still needs to have all the data required for that report. Therefore, scalability problems remain: the same issues of throughput, responsiveness, and storage capacity persist. Moreover, such an architecture can be inefficient, because the same data may have to be loaded on multiple servers. Consider, for example, a user trying to create a report dealing with financial investment information. If a database contains information relating to four investment portfolios, different reports may require different portfolios. Call the available portfolios 1, 2, 3, and 4, and assume that a first report A requires portfolios 1, 2, and 3, while a second report B requires portfolios 2, 3, and 4. A first server tasked with creating report A would require portfolios 1 through 3. If a second user requires report B, then a second server would require portfolios 2 through 4. Portfolios 2 and 3 are therefore necessarily loaded twice, once on the first server and once on the second, which represents an inefficiency. Thus, even though multiple servers may be available for a community of users, problems remain as to inefficiency, scalability, and throughput.
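By way of illustration only, the redundancy in the portfolio example above can be counted with a short sketch. The sketch assumes one dedicated server per report, as in the example. The names REPORT_REQUIREMENTS, redundant_loads, and total_loads are hypothetical and are introduced here solely for illustration; they do not describe any particular implementation.

```python
from collections import Counter

# Portfolios required by each report, taken from the example above.
# Assumption: each report is built by its own dedicated server.
REPORT_REQUIREMENTS = {
    "A": {1, 2, 3},
    "B": {2, 3, 4},
}


def redundant_loads(requirements):
    """Map each portfolio loaded on more than one server to its load count."""
    counts = Counter(
        portfolio
        for portfolios in requirements.values()
        for portfolio in portfolios
    )
    return {p: n for p, n in sorted(counts.items()) if n > 1}


def total_loads(requirements):
    """Total number of portfolio loads across all servers."""
    return sum(len(portfolios) for portfolios in requirements.values())


# Distinct portfolios that exist in the database and are needed by any report.
distinct = set().union(*REPORT_REQUIREMENTS.values())

print("portfolios loaded more than once:", redundant_loads(REPORT_REQUIREMENTS))
print("total loads:", total_loads(REPORT_REQUIREMENTS))
print("distinct portfolios needed:", len(distinct))
```

Under these assumptions, six portfolio loads are performed to serve four distinct portfolios, with portfolios 2 and 3 each held on both servers.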
What is needed, therefore, is a flexible architecture that allows fast and responsive processing of large datasets. Such an architecture should minimize redundancy and inefficiency. Moreover, such an architecture should be scalable, so that larger communities of users and larger datasets may be accommodated.
Further embodiments, features, and advantages of the present invention, as well as the operation of the various embodiments of the present invention, are described below with reference to the accompanying drawings.