A growing number of database applications require on-line access to large volumes of data. For example, telecommunication service providers store large amounts of phone call data to satisfy the requirements of billing, call statistics, and fraud detection. This information is accessed regularly by telecommunication providers. Other examples of users of on-line data access include digital libraries and video-on-demand servers.
Despite the rapid decline in magnetic disk prices, tape storage systems continue to be far more economical than disk storage systems (disk arrays) for storing large amounts of data. A tape-based tertiary storage system is the usual solution for a very large-scale storage system, which would otherwise require a large and impractical number of disk drives to contain the entire database. Tape-based systems are also cost-efficient and provide a wide variety of speed rates.
Even though tape-based systems have many advantages, a major disadvantage of tape systems compared to disk arrays is the high access delay, known as latency, for random retrievals. A typical tertiary storage system is made of several tape drives accessing a large pool of tapes. In high-end mass storage systems this usually takes the form of large "tape silos" that contain several drives and thousands of tapes. Robotic arms locate requested tapes and insert and remove tape from drives as needed. This tape-switching operation consumes a significant amount of time. The switching and positioning time in a randomly accessed tape-based system is measured in tens of seconds or minutes.
In addition to the latency resulting from the tape switching operation, locating and reading the information stored on tapes also consumes time.
Various scheduling processes have been developed to handle incoming data-block requests in an efficient manner. A conventional scheduling process is implemented by the combination of a major rescheduler, which chooses a tape and forms a retrieval schedule (service list) for the selected tape, and an incremental scheduler, which handles requests arriving after the formation of the retrieval schedule by the major rescheduler. The incremental scheduler either schedules the newly arriving requests at the instant they arrive, or defers them to a "pending list" until the next invocation of the major rescheduler. In a typical scheduling process, all the incoming requests that have not been scheduled are sent to the pending list. Next, a desired tape is selected and a retrieval schedule is formed by selecting requests from the pending list that are located on the selected tape. Conventional methods select tapes in an order, starting with the tape after currently mounted tape and then moving in a sequential order.
Three commonly used conventional scheduling processes are the null scheduling process, the static scheduling processes, the dynamic scheduling processes. A null scheduling process is based upon the "first in first out" principle which simply services requests in their order of arrival. This process gives reduced performance for random requests because the system may have to switch tapes many times to satisfy all of the requests, wasting valuable time. These multiple switches may include switches from a first tape to a second tape, and then back to the first tape, which is very inefficient.
The static scheduling processes freeze the service list once the major rescheduler has selected a tape and formed the service list. During the execution of the service list, newly arriving requests are appended to the pending list. Once the service list for the currently selected tape has been executed, then the requests remaining on the pending list are evaluated, a new tape is selected, and the process continues until all pending requests have been executed. This results in inefficient operation since during a sweep of the tape head, every newly arriving request is deferred to the pending list, even if the request is for a block that the tape head will pass over during the sweep.
Dynamic scheduling processes use a dynamic incremental scheduler that can insert a newly arriving request into the service list at the moment it arrives and takes advantage of forward (e.g., left-to-right) and reverse (e.g., right-to-left) locate phases of a locate operation. During the forward phase, any request for a data block on the currently loaded tape is executed. If the requested data block is at or beyond the position of the tape head (e.g., to the right of the tape head) at the time the request is received, it is placed into the forward locate phase of the service list and is read when the tape head reaches the data block. If the new request is for a data block on an area of the tape where the forward-moving tape head has already passed (e.g., to the left of the tape head), it is placed into the reverse phase of the service list and is read during the reverse phase. However, the tape head will not switch back to the forward phase to go back and read a newly-requested data block that it has already passed in the reverse locate phase. Such a request is deferred to a pending list.
All tape systems expend a significant amount of time in ejecting the old tape, putting away the old tape and retrieving a new tape, and loading the new tape into the drive. For example, in a system using an Exabyte EXB-8505 XL drive and an Exabyte EXB-210 tape library, this process can take 81 seconds.
Thus, there exists a need for a system which can decrease the access latency by minimizing the number of tape switches that need to be made and by globally monitoring all requests as they arrive, adjusting the schedule as needed, and optimizing the order in which the tapes are selected for reading.