The quantity of information available in electronic form continues to grow at a rapid pace. Managing electronic information in a manner that permits responsive distribution to users is a key design consideration in many information system architectures. This design consideration is counterbalanced by an underlying requirement that the information be supplied in a cost-effective manner.
There remains a challenge to provide cost-effective, high-performance access to a very large quantity of information. It would be ideal if every piece of information were immediately available to each requesting user. However, this ideal is impractical. Storage costs, access costs, transaction fees, data management costs, and so on render it implausible (if not impossible) to offer every piece of electronic data in a timely fashion.
Additionally, due to the cost of maintaining and distributing electronic information, it is imprudent to make every piece of information equally available. Generally, demand for different items of information varies significantly. This demand can change rapidly as user preferences shift. For instance, the demand for information on a hot new movie might be very high during its initial release, and then decrease dramatically in the following few weeks. As another example, market-related information is in high demand during market hours between 9:00 AM and 4:00 PM (EST), and in low demand during the other hours of the day.
The challenge of providing cost-effective, high-performance access to large pools of information is particularly relevant to public networks, such as the Internet. Public networks consist of a large number of sites interconnected by a data communications network (e.g., phone lines, cable lines, satellite links, etc.). Each site provides various quantities of resources. A university site typically permits access to a vast amount of resources, whereas a regional site servicing rural areas might offer only limited resources. Yet, with a public network, all of these resources are made available to a user, regardless of where the resources are physically located or where the user resides.
For example, a Seattle-based user desiring information on giant pandas might use the public network to access resources maintained by the Smithsonian Institution and National Zoo in Washington, D.C. This action typically involves the user accessing the public network through a local Seattle-based network site (e.g., an Internet Service Provider), which then routes communication over the public network to the Smithsonian Institution. The same user might also be interested in information on the Seattle Mariners baseball team, which can be accessed locally from the Seattle-based network site.
Access to the resources on the public network has associated costs, which may vary depending upon the location and access frequency of the resources. From the perspective of the operator of the Seattle-based network site, for example, there is a cost associated with making the resources of the Smithsonian Institution and the Seattle Mariners available to the user. These costs include charges for using the public network, the number of connections allocated to users at the local site, management fees, storage expenses for local resources, and so on.
It is therefore desirable to design a network system that meets user demand for resources by making the resources available within acceptable time frames, while satisfying the site operators' desire to contain costs.
In database theory, there is a concept known as hierarchical storage management (or HSM), which contemplates shifting data among various storage devices of different performance levels as related to data availability to a user. An HSM system might include, for example, cache memory, RAM, disk drives, a CD-ROM carousel, and tape back-up. These various storage devices range from high performance/high cost (e.g., cache memory) to low performance/low cost (e.g., tape back-up). HSM systems typically shift data among the storage hierarchy according to criteria such as access frequency, wherein more frequently used data is stored on higher performance/higher cost devices, or access recency, wherein more recently used data is stored on higher performance/higher cost devices. HSM systems do not, however, make any value determination that is useful to an operator who provides the data. Moreover, HSM systems are not conveniently scalable to large network systems, such as the Internet, where vast pools of information are widely dispersed among many sites.
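By way of illustration only, the frequency-based and recency-based placement criteria described above might be sketched as follows. The sketch is not drawn from any particular HSM product. The tier names, the `TieredStore` class, the fixed per-tier capacity, and the logical clock used to measure recency are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical storage tiers, ordered from highest performance/highest cost
# to lowest performance/lowest cost.
TIERS = ["cache", "ram", "disk", "cdrom", "tape"]


@dataclass
class Item:
    name: str
    access_count: int = 0  # how often the item has been requested
    last_access: int = 0   # logical clock tick of the most recent request


class TieredStore:
    """Illustrative model of HSM-style placement (assumed design)."""

    def __init__(self, capacity_per_tier=2):
        # Assumption: every tier except the last holds the same number of items.
        self.capacity = capacity_per_tier
        self.items = {}
        self.clock = 0

    def access(self, name):
        """Record one user request for the named item."""
        self.clock += 1
        item = self.items.setdefault(name, Item(name))
        item.access_count += 1
        item.last_access = self.clock

    def placement(self, policy="frequency"):
        """Return a mapping of item name to tier under the given policy."""
        if policy == "frequency":
            # More frequently used data ranks higher; ties go to the more recent.
            key = lambda it: (it.access_count, it.last_access)
        elif policy == "recency":
            # More recently used data ranks higher.
            key = lambda it: it.last_access
        else:
            raise ValueError(f"unknown policy: {policy}")

        ranked = sorted(self.items.values(), key=key, reverse=True)
        result = {}
        for rank, item in enumerate(ranked):
            # Fill tiers from the top; the lowest tier absorbs any overflow.
            tier_index = min(rank // self.capacity, len(TIERS) - 1)
            result[item.name] = TIERS[tier_index]
        return result


store = TieredStore(capacity_per_tier=1)
for requested in ["pandas", "pandas", "pandas", "mariners", "mariners", "stocks"]:
    store.access(requested)

by_frequency = store.placement("frequency")
by_recency = store.placement("recency")
```

In this example the two criteria disagree: the most frequently requested item ("pandas") occupies the top tier under the frequency policy, whereas the most recently requested item ("stocks") occupies it under the recency policy. Neither policy takes into account the cost or value of an item to the operator, which is the limitation noted above.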