The application relates to memory storage systems for digital computers and, more particularly, to disk drive access control apparatus for connection between a host computer and a plurality of disk drives to provide an asynchronously operating, high-speed, high-capacity, fault-tolerant, error-correcting storage system to receive read and write requests from the host computer, read and write data from and to the plurality of disk drives, and transfer data to and from the host computer, comprising, a plurality of disk drive controller channels connected to respective ones of the plurality of disk drives and controlling transfers of data to and from an associated one of the plurality of disk drives in response to received high level commands, each of the plurality of disk drive controller channels including a cache/buffer memory and a micro-processor unit for controlling the transfers of data; an interface and driver unit interfacing with the host computer; a central cache memory; cache memory control logic controlling transfers of data from the cache/buffer memory of the plurality of disk drive controller channels to the cache memory and from the cache memory to the cache/buffer memory of the plurality of disk drive controller channels and from the cache memory to the host computer through the interface and driver unit; a central processing unit managing the use of the cache memory by requesting data transfers only with respect to ones of the plurality of disk drives where data associated therewith is not presently in the cache memory and by sending high level commands to the plurality of disk drive controller channels to effect data transfers thereby; a first (data) bus interconnecting the plurality of disk drive controller cache/buffer memories, the interface and driver unit, and the cache memory for the transfer of information therebetween; and, a second (information and commands) bus interconnecting the plurality of disk drive controller channels, the interface 
and driver unit, the cache memory control logic, and the central processing unit for the transfer of control and information therebetween.
As systems employing digital computers have evolved, so have their requirements for storage systems associated therewith. Early computer systems typically had a drum or disk storage device for rapid access to required files which could not be maintained in the random-access main memory. If large quantities of data were being processed that could not fit on the drum or disk, the data were stored on removable magnetic tape reels that could be mounted (upon request from the computer program) on a tape transport mechanism and read into the computer. Such systems were an improvement over non-computer approaches for applications requiring only occasional access to individual files such as in banking, bookkeeping, payroll, insurance, and similar undertakings. As many applications became more computationally intensive and systems became more complex, the limitations and requirements of the storage systems needed to change. In most instances, however, storage technology has not kept up with the needs of the computer systems. In the original digital computers, a program and its data were loaded, the program was executed, and the answers were printed out. The programs involved were straightforward and relatively easy to write. If a program contained an error (i.e. a "bug"), it simply failed to produce an executable program when compiled or assembled, or failed to produce an answer when ultimately executed. In a computer system where multiple computers are linked together and run multiple programs on multiple priority levels under the control of an interrupt structure, the problem is not so easy. A program improperly programmed in so-called "re-entrant" coding may run for days or even weeks before the right set of circumstances causes an error. Not only that, such systems are designed to run continuously and only produce results in response to some external stimulus. A bug in such a system may actually be fatal to the system and cause the entire computer to stop executing its programs. 
If the system in question is an air traffic control system, for example, the term "fatal" bug could be true both literally and figuratively.
If there is a problem with the disk drive system, it should be transparent to the users and the disk drive system should continue to operate despite errors. Moreover, it should correct the errors so as to provide reconstructed, error-free data. Likewise, if the disk drive system is unable to contain all the data files, the required data should be retrieved automatically from a near line archival storage system in as short a time as possible. Preferably, the various functions should be under the control of an artificial intelligence type of system so that needs of the users can be learned and anticipated so as to optimize the system's performance capabilities.
A given system's overall performance capability is the result of the integration of a number of independent technologies whose individual growth in performance/capability over time differs markedly from each other. FIGS. 1(a) to (g) show the historical performance of computer systems' underlying technologies. FIG. 1a shows the exponential growth in semiconductor component performance. The factors behind this are well known and include advances in process technology that increase circuit density and speed. Shrinking geometries and increased wafer yields combined with circuit design innovations mean semiconductor performance should continue its exponential growth. FIG. 1b reflects the exponential growth in CPU hardware performance as measured in MIPS. CPUs are the direct beneficiaries of semiconductors as well as circuit design improvements and architectural innovations such as massively parallel CPUs. FIG. 1c shows the trend in performance for operating systems. OS performance is being flattened by several factors such as the additions of user interfaces, graphics support, and the sheer growth in size over the years, which has made operating systems one of the most voracious consumers of computer resources. FIG. 1d shows the capacity/performance improvements of disk drives. This curve could be best described as "leap-linear." Disk drive device performance and capacities tend to grow linearly until a new technological event occurs. Such events in the past have been the introduction of sealed disk Winchester technology in the early 70s, the introduction of thin film heads and plated media in the 80s, and, in the 90s, the general introduction of 5400 and 7200 RPM drives to cut latency delays and contact recording technology that may push track densities to 100,000 per inch. FIG. 1e demonstrates the performance of I/O systems. This curve represents the composite effect of CPU and controller hardware, operating systems, and disk drives. 
It should be noted that the exponential growth in semiconductor performance has not been reflected in I/O system performance, which has seen near linear growth. FIG. 1f reflects a commonly understood phenomenon with applications programs: over time, additions, changes, and maintenance to a program tend to lead to a decrease in its performance. FIG. 1g demonstrates that overall system performance has shown continued improvement, but at a much slower rate than its underlying technologies. In fact, without hardware upgrades and improvements, overall system performance declines in response to the performance of operating systems and application programs.
FIG. 1(i) shows a typical prior art computer system employing disk drives for storage. The host computer 10 (i.e. the one interfacing with the computer operators) includes an operating system 12. As known to those skilled in the art, the operating system is a set of computer programs that run continuously while the computer has its power on. The operating system controls all the functions of the computer including requests for operating portions of the memory, error response, and input/output (I/O) requests. The computer 10 has a disk controller 14 connected thereto and the disk controller 14, in turn, is connected to four disk drives 16. In use, an applications program (not shown) makes a request for data from the operating system 12. The location of the data is completely transparent to the applications program; that is, the applications program has no idea where the data is physically located. At system setup time (or possibly subsequently through operator input), the locations of the data are stored in tables (not shown) which are part of or accessible by the operating system 12. Knowing from the tables that the requested data is on a particular disk drive 16 at a particular track between starting and ending sectors, the operating system 12 outputs a disk read request on line 18 to the disk controller 14. The disk controller 14, in turn, then issues a read request to the appropriate disk drive 16 on its connecting line 20, which causes the read head (not shown) within the disk drive 16 to move to the designated track and then read data and output it to the disk controller 14 on the line 20 from the starting sector to the ending sector. When the data has been received by the disk controller 14 (into an appropriate cache/buffer memory), the operating system 12 is informed by an appropriate signal on line 18.
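The prior art read path described above can be illustrated with a minimal sketch. It is not taken from the specification: the class names (`Extent`, `DiskDrive`, `DiskController`, `OperatingSystem`), the table layout, and the boolean completion signal are all hypothetical stand-ins for the operating system 12, the disk controller 14, the disk drives 16, and the signals on lines 18 and 20.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical location-table entry: where a file physically resides."""
    drive: int
    track: int
    start_sector: int
    end_sector: int  # inclusive


class DiskDrive:
    """Stand-in for a disk drive 16; tracks maps track number -> list of sectors."""

    def __init__(self, tracks):
        self.tracks = tracks

    def read(self, track, start_sector, end_sector):
        # Seek to the designated track, then read start..end sectors inclusive.
        return self.tracks[track][start_sector:end_sector + 1]


class DiskController:
    """Stand-in for the disk controller 14 with its cache/buffer memory."""

    def __init__(self, drives):
        self.drives = drives
        self.buffer = []

    def read_request(self, extent):
        # Forward the request to the appropriate drive (line 20 in the text).
        drive = self.drives[extent.drive]
        self.buffer = drive.read(extent.track, extent.start_sector, extent.end_sector)
        return True  # assumed completion signal back to the OS (line 18)


class OperatingSystem:
    """Stand-in for the operating system 12 and its location tables."""

    def __init__(self, controller, location_table):
        self.controller = controller
        self.location_table = location_table

    def read_file(self, name):
        # The application supplies only a name; the OS resolves the location.
        extent = self.location_table[name]
        if not self.controller.read_request(extent):
            raise IOError("disk controller did not signal completion")
        return b"".join(self.controller.buffer)
```

In this sketch the application calls only `read_file` with a file name, which mirrors the point that the physical location of the data is transparent to the applications program.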
As can be appreciated, if one wants the operating system 12 to do more, the programming of the operating system 12 must get more complex. Given the present state of complexity of the typical operating system and the capabilities of the average systems computer programmer with respect to such esoteric matters as re-entrant coding and "run anywhere" coding, to ask the operating system to do more is to ask for trouble because of information handling bottlenecks.
There is also the problem of system overhead. If you ask the operating system to do more, it will add to the overhead of the operating system and, therefore, to the overhead of every program which accesses it.
For any given OS and computer system, implementation of any real time function will cause the OS to consume a large portion of the computing resource, rapidly degrade the performance of the system from the user's perspective, and severely limit the work product computing potential.
As those skilled in the art will also readily recognize and appreciate, even if the penalty of added overhead is imposed on the operating system so as to achieve convenience in other areas, such an approach includes no means of ever reducing the added overhead.
File Maintenance, Management, and Archival Copy (FMMAC) are tasks essential to the reliability, usability, and integrity of computer stored data. These tasks are now performed by Operating System functions, separately run applications programs, operator or system manager manual intervention, or a combination of these techniques.
These FMMAC tasks almost always require a manual operator decision to initiate and complete. Often they require the computer system to be taken offline and therefore not available to users during the time it takes to complete these tasks. Some larger, sophisticated Operating Systems allow a designated File to be taken offline leaving most of the computer resource available. However, manual intervention is still required to initiate file maintenance and archival copy.
Because these crucial FMMAC tasks rely on manual intervention, arbitrary circumstances, and schedules, the predictability of these tasks being performed is low. This is especially true outside of centralized "Mainframe" Computer centers (FMMAC tasks are typically performed here by a dedicated maintenance shift at great additional operating expense). However, most computers (and by extension, most computer stored data) are not located inside "computer centers" and do not have the benefit of dedicated file maintenance staffs. Therefore the reliability, usability, and integrity of most computer stored data now rests on human nature and motivation and the dubious assumption that the circumstances surrounding the computer system itself are immune from intervening events such as device failures or rush jobs that take priority over FMMAC tasks.
Continuous duty computer systems such as Real Time monitoring and control systems or Online Transaction Processing systems present additional barriers to FMMAC tasks. In theory, no time is available to perform the tasks. In reality, such systems or files are simply shut off for FMMAC. In "Critical Mission" applications (for example Nuclear Power Plant Monitoring and Control) the FMMAC problem is often abated by duplicating hardware systems at great economic cost.
In the above-referenced parent application of which this is a continuation-in-part, a high-speed, high-capacity, fault-tolerant, error-correcting storage system was disclosed which provides a solution for many of the above-described needs of modern computer systems (both military and commercial). Since its original filing which contained the best mode as contemplated at that time, continued work has developed new embodiments which include novel improvements over the teachings contained therein. In particular, the performance and capacity potential has been significantly increased as has its flexibility as a system component. Moreover, additional improvements to the overall storage system have been incorporated therein.
Wherefore, it is an object of this application to provide significant and patentably distinct improvements to the high-speed, high-capacity, fault-tolerant, error-correcting storage system of the above-referenced parent application of which this is a continuation-in-part.
It is another object of this application to provide significant and patentably distinct improvements to high-speed, high-capacity, storage systems for digital computers in general.
It is yet another object of the present invention to provide a means to maintain, manage, and make archival copies of files on an automatic and User transparent basis and to provide high predictability for the reliability, usability, and integrity of computer stored data.
Computer stored Files are generally classified by their application use such as "payroll files" or "inventory files" much in the same way that a typical office filing cabinet and the folders inside would be labeled. Those skilled in the art will recognize that files can also be classified by their usage in time. That is to say that some files are used more often than others and can be reliably classified on that basis:
______________________________________
CLASSIFICATION   USAGE IN TIME
______________________________________
Continuous:      A file that is heavily accessed. An example would be
                 a directory or VTOC file. Accesses could be seconds,
                 minutes, or even hours apart.
Systematic:      A file opened & accessed as a consequence of other
                 files being used.
Periodic:        A file that is used on a predictable schedule such as
                 daily, weekly, monthly, 1st of month, 15th of month,
                 etc.
Occasional:      A file whose usage in time cannot be predicted or
                 classified by the above criteria.
Transient:       A file created and deleted in a short time interval.
                 A scratch file not intended for any future use.
______________________________________
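One way the usage-in-time classification above might be applied to a recorded access history is sketched below. This is only an illustration under stated assumptions: the specification gives no thresholds or algorithm, so the `FileHistory` record, the `classify` function, and every numeric cutoff (a one-hour transient lifetime, a six-hour "heavily accessed" gap, a 10% regularity tolerance) are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional

# All thresholds below are illustrative assumptions, not values from the text.
TRANSIENT_MAX_LIFETIME = 3600.0      # seconds between creation and deletion
CONTINUOUS_MAX_MEAN_GAP = 6 * 3600.0  # "seconds, minutes, or even hours apart"
PERIODIC_MAX_VARIATION = 0.10         # allowed relative spread of access gaps


@dataclass
class FileHistory:
    """Hypothetical per-file usage record; times are seconds, ascending."""
    created: float
    access_times: List[float]
    deleted: Optional[float] = None
    opened_due_to_other_file: bool = False


def classify(history: FileHistory) -> str:
    """Return one of the five usage-in-time classes for a file history."""
    # Transient: created and deleted within a short time interval.
    if history.deleted is not None and \
            history.deleted - history.created <= TRANSIENT_MAX_LIFETIME:
        return "Transient"
    # Systematic: accessed as a consequence of other files being used.
    if history.opened_due_to_other_file:
        return "Systematic"
    times = history.access_times
    if len(times) < 3:
        return "Occasional"  # too little history to predict anything
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    average_gap = mean(gaps)
    # Continuous: heavily accessed, i.e. short average gap between accesses.
    if average_gap <= CONTINUOUS_MAX_MEAN_GAP:
        return "Continuous"
    # Periodic: gaps are nearly constant, so the next access is predictable.
    if pstdev(gaps) / average_gap <= PERIODIC_MAX_VARIATION:
        return "Periodic"
    # Occasional: none of the above criteria apply.
    return "Occasional"
```

The order of the checks is itself a design assumption: Transient and Systematic are tested first because they depend on how a file is created or opened rather than on the spacing of its accesses.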