The present invention relates to a hierarchical data storage system that stores data in a library apparatus, caches part of the stored data in a high-speed cache device, and supplies the stored and cached data on request, more particularly to an improvement in the caching method.
This type of hierarchical data storage system is used by, for example, video-on-demand providers who supply audio and video data to paying customers. The library apparatus is similar to a jukebox, but instead of storing short musical selections, it stores compressed audio and video data on optical discs with capacities ranging, at present, from several hundred megabytes to several gigabytes each. A very large amount of data can thus be stored. To reproduce the stored data, the library apparatus typically has a plurality of optical drives, enabling it to provide output on a plurality of channels simultaneously.
The high-speed cache device, typically a magnetic hard disk drive, improves the performance of the system in several ways. Once audio and video data have been cached in the cache device, they can be supplied to users as soon as requested, without the delay (typically ten seconds or more) occasioned by the physical transport of an optical disc from its storage location to an optical drive in the library apparatus. The number of output channels can also be increased, some channels being served with data from the cache while other channels are served from the library apparatus. Furthermore, the cache device can reproduce a single cached copy of an audio-video program or `title` on several output channels at once, asynchronously, thereby reducing the need to store multiple copies of popular titles in the library apparatus, and allowing more different titles to be stored.
A conventional method of caching data employs the least-recently-used algorithm. When a request for a particular title is received, the control unit of the system first checks whether the data for the requested title are already cached. If so, the request is served from the cache device. If not, an optical disc storing the requested title is loaded into an available optical drive, and the request is served from the library apparatus. In the latter case, while being supplied to the user, the requested audio and video data are also copied into the cache, so that the next request for the same title can be served from the cache device. If the cache has free space, the data are copied into the free space. If the cache does not have free space, the cached title that has been least recently requested is deleted to make space.
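The least-recently-used replacement policy described above can be sketched in Python as follows. This is an illustrative model only, not the system of the invention: the class name and the title-count capacity are hypothetical, and a real cache would track bytes and actually copy data from the library apparatus rather than merely record title names.

```python
from collections import OrderedDict

class LRUTitleCache:
    """Minimal sketch of conventional least-recently-used caching.
    Capacity is counted in whole titles for simplicity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.titles = OrderedDict()  # ordered oldest-request first

    def request(self, title):
        """Serve a request and return the source that supplies the data."""
        if title in self.titles:
            self.titles.move_to_end(title)   # mark as most recently used
            return "cache"
        if len(self.titles) >= self.capacity:
            self.titles.popitem(last=False)  # evict least recently used title
        self.titles[title] = True            # copy the title into free space
        return "library"                     # served from an optical drive
```

For example, a second request for a title already copied into the cache returns "cache", while a request that forces an eviction returns "library".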
If the library apparatus has multiple drives, two or more titles can be copied from the library apparatus to the cache concurrently. With the conventional caching method, however, there are problems related to the access speed and storage capacity of the cache.
The cache access speed is high, but not unlimited. Because of this limited access speed, whenever a given number of titles are being copied from the library apparatus into the cache, the copying consumes cache bandwidth, and the number of output channels that can be served from the cache is reduced by that same number.
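The trade-off above can be modeled by treating the cache's access speed as a fixed budget of per-stream bandwidth slots. The 60 Mbps aggregate throughput and 3 Mbps stream rate below are hypothetical figures chosen for illustration, not values taken from the system described:

```python
# Hedged model: cache bandwidth expressed as a number of equal-rate "slots".
cache_bandwidth_mbps = 60    # hypothetical aggregate cache throughput
stream_rate_mbps = 3         # hypothetical per-channel data rate

total_slots = cache_bandwidth_mbps // stream_rate_mbps  # 20 concurrent streams

# Each title being copied in from the library consumes one slot,
# leaving fewer slots available as output channels.
copies_in_progress = 2
output_channels = total_slots - copies_in_progress
print(output_channels)       # 18 channels can still be served from the cache
```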
The storage capacity of the cache is also limited, in part by cost considerations. Because of the limited cache capacity, with the conventional caching method, titles have to be copied to and deleted from the cache frequently, especially when requests are varied, the amount of data per title is large, and only a few titles can be cached at a time. These conditions are typical of actual video-on-demand systems. If the well-known MPEG-2 video compression method recommended by the Moving Picture Experts Group is used with a compressed data rate of three megabits per second (3 Mbps), for example, then two hours of video, which is a typical length per title, requires 2.7 gigabytes of storage space. High-speed cache facilities that can be provided at a reasonable cost cannot store a large number of titles of this length. As for the variety of requests, some titles are more popular than others, but different users have different preferences, and by no means are all requests concentrated on the few most popular titles.
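The 2.7-gigabyte figure follows directly from the stated data rate, as the following arithmetic (using decimal megabits and gigabytes) confirms:

```python
# Storage required for one two-hour title at a constant 3 Mbps.
bit_rate_bps = 3_000_000        # 3 megabits per second
duration_s = 2 * 60 * 60        # two hours, in seconds
total_bytes = bit_rate_bps * duration_s / 8
print(total_bytes / 1e9)        # prints 2.7 (gigabytes)
```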
FIG. 1 illustrates a hypothetical case in which the library apparatus has two drives and the cache device (a magnetic disk drive) can store only two titles at once. Initially, titles A and C are cached, title A being the least recently used. When a request for title B is received, title A is deleted from the cache and title B is stored in its place, while being reproduced by the first optical drive in the library apparatus. A short time later, title D is requested, so title D is reproduced by the second optical drive in the library apparatus and copied into the cache, replacing title C. While titles B and D are being copied into the cache, no other titles can be reproduced from the cache because none are stored in the cache.
Shortly after the caching of title D is completed, titles A and C are requested again, so they are recopied into the cache, replacing titles B and D. In this example, the caching of titles B and D has served no useful purpose. Furthermore, if titles A and C had not been deleted to make room in the cache for titles B and D, then the requests for titles A and C could have been served from the cache, leaving the two optical drives free to serve other requests.
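The sequence of FIG. 1 can be reproduced with a simple simulation of a two-title least-recently-used cache. This is a simplified model that ignores copy times and drive scheduling, and the helper function is illustrative only:

```python
from collections import OrderedDict

def serve(requests, capacity=2, initial=("A", "C")):
    """Return the source ('cache' or 'library') that serves each request
    under least-recently-used replacement.  The initial contents list
    the cached titles oldest-first (title A least recently used)."""
    cache = OrderedDict((t, True) for t in initial)
    sources = []
    for title in requests:
        if title in cache:
            cache.move_to_end(title)        # hit: refresh recency
            sources.append("cache")
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict least recently used
            cache[title] = True             # copy in from the library
            sources.append("library")
    return sources

# Titles A and C start in the cache; B, D, A, C are then requested
# in turn, as in FIG. 1.  Every request misses and must be served
# from an optical drive in the library apparatus.
print(serve(["B", "D", "A", "C"]))
# prints ['library', 'library', 'library', 'library']
```

Had titles B and D not displaced A and C, the same simulation with requests ["A", "C"] would serve both from the cache, leaving the two optical drives free.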
As this example illustrates, with the conventional least-recently-used caching method, when requests for many different titles arrive frequently and many of these titles are not already cached, the cache device is kept busy copying data from the library apparatus. Many unnecessary cache replacements are performed, overall efficiency is lowered, the number of output channels that can be supported is reduced, and potential revenue is lost.