This disclosure relates to multi-processing and multi-programming environments which require the flushing of data buffers and updating of the database structure control information.
It is recognized that modern computer and communication systems continuously utilize huge amounts of data. As a result, data management and data storage operations have become significant technical issues. Multi-programming and multi-processing operations, which are typical of complex systems, have large data requirements that call for massive amounts of storage.
In such multi-processing and multi-programming systems, the data is generally stored on a storage medium, such as magnetic disk, as blocks of data. The data is then often read from the storage medium into a temporary memory, such as a cache or buffer, which might consist of Random Access Memory. After this, it can be accessed by user application programs. One of the chief problems involved is updating the database structure control information and flushing stale information from the data buffers. It is desirable that each of these updates take the least possible time by exploiting the computer system's ability to run asynchronous processes and to overlap Input/Output operations.
The optimum situation is for Input/Output operations to take place while no user applications are actively accessing that particular data within the database. Copies of the data will exist both within the applications and within the database. This is done to maintain both physical and referential data integrity. It is consistently necessary to provide updates to the database's buffers. Rather than maintaining control as a single process, or initiating multiple new independent processes to perform the updates, it is possible to use local environments that can access shared data and so take advantage of the user application infrastructures that are already present. These types of procedures can, in effect, run "on top" of the user programs.
Prior systems that flushed data buffers and updated the database structures operated on a relatively slow serial basis. This type of mechanism was responsible for determining which of the structures had data buffers to be flushed, then writing to the disk, then testing for Input/Output completion, then writing the control information, then restarting the other applications. Because this was a serialized process, it was not only inefficient but also failed to take advantage of the performance that is inherently possible in multi-processor technology.
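The serial sequence described above can be pictured with a minimal sketch. It is illustrative only: the function name, the dictionaries standing in for the disk and the control information, and the choice to record flushed block numbers as "control information" are all assumptions, not details taken from any prior system.

```python
def flush_serially(structures, disk, control):
    """Flush one structure at a time, one write at a time (illustrative only)."""
    for name, buffers in structures.items():
        if not buffers:
            continue                      # this structure has nothing to flush
        for block_id, data in buffers.items():
            disk[block_id] = data         # simulated write; completes before the next one starts
        control[name] = sorted(buffers)   # hypothetical control info, written after all writes finish
    return control
```

Every step waits for the one before it, so no write overlaps any other write and no second processor has anything to do.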
The presently described method operates to eliminate the relatively slow serial process and provide a fast high-speed flushing of data buffers and operations for updating the database structure control information. The described method and system is completely asynchronous. Rather than simply waiting, the user tasks are utilized concurrently in time as workers and as such, user tasks participate in the process of independently writing the data buffers, then testing for I/O completions, and finally updating the structure control information.
In the presently described method, various tasks may enter the process and engage at any particular phase or change roles at any time. These processes are "First-In, First-Out" (FIFO) in nature. As an example, the task initiating the WRITES for a given set of data buffers is not necessarily the task that assures their completion.
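One way to picture tasks sharing the phases is the following sketch, in which several threads stand in for user tasks and two FIFO queues carry the work from one phase to the next. The names (`flush_cooperatively`, `to_write`, `pending`), the thread pool that simulates asynchronous disk I/O, and the dictionary used as a disk are all assumptions made for illustration; the disclosure does not specify these details.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def flush_cooperatively(buffers, disk, num_tasks=3):
    """Hypothetical sketch: user tasks share the write and completion phases."""
    to_write = queue.Queue()   # FIFO of dirty buffers awaiting a write
    pending = queue.Queue()    # FIFO of writes that have been initiated
    for item in buffers.items():
        to_write.put(item)

    io_pool = ThreadPoolExecutor(max_workers=4)  # stands in for asynchronous disk I/O
    completed = []
    completed_lock = threading.Lock()

    def user_task():
        # Phase 1: initiate writes for as long as any remain.
        while True:
            try:
                block_id, data = to_write.get_nowait()
            except queue.Empty:
                break
            future = io_pool.submit(disk.__setitem__, block_id, data)
            pending.put((block_id, future))
        # Phase 2: test completions in FIFO order, including writes
        # that a different task initiated.
        while True:
            try:
                block_id, future = pending.get_nowait()
            except queue.Empty:
                break
            future.result()            # returns at once if the write already finished
            with completed_lock:
                completed.append(block_id)

    tasks = [threading.Thread(target=user_task) for _ in range(num_tasks)]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()
    io_pool.shutdown()
    return sorted(completed)
```

Because a task only moves to the completion phase after it has stopped initiating writes, the last task to leave the first phase always finds every outstanding write on the `pending` queue, so nothing is left untested.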
Further, coordination is achieved by selecting a single process to perform only those housekeeping functions that absolutely require serialization. The use of shared data is limited to those instances where it is required to drive the process forward. Asynchrony is further assured by restricting access, via a software data lock, only in those instances where the shared variables require alteration. This mechanism limits serialization to the absolute minimum necessary to ensure integrity. For a given database configuration, the previously used systems and methods gained no significant performance beyond what the addition of a second processor afforded. Thus, the operations did not scale in proportion to the number of processors added. By contrast, the presently described system and method has the advantage of being about 30% faster for two and three processor configurations, and of continuing to provide increased throughput even as the workload increases and additional processors are added.
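The idea of locking only while shared variables change, and leaving the serialized housekeeping to one task, might be sketched as follows. How the single housekeeping process is actually selected is not stated here; this sketch assumes, purely for illustration, that the task completing the last buffer takes that role. The class and attribute names are hypothetical.

```python
import threading

class FlushState:
    """Hypothetical shared state for one flush cycle."""

    def __init__(self, total_buffers):
        self.lock = threading.Lock()
        self.remaining = total_buffers   # shared variable, altered only under the lock
        self.control_updates = 0         # counts how often housekeeping ran

    def buffer_done(self):
        with self.lock:                  # lock held only while the counter changes
            self.remaining -= 1
            is_last = self.remaining == 0
        if is_last:                      # exactly one task observes the count reach zero
            self.update_control_info()

    def update_control_info(self):
        # Serialized housekeeping; runs once, outside the lock.
        self.control_updates += 1
```

All other work, such as the writes themselves, proceeds without the lock, so tasks contend only for the brief moment in which the counter is decremented.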
The present method describes an asynchronous mechanism for distributing the operating workload of flushing data buffers over "N" number of processes. One distinct advantage involved is that of making use of the user tasks already running on the computer system, so that the time required to accomplish a given task is minimized in two specialized ways.
Firstly, all of the write operations are initiated: this means that there is a high probability that the first set of Write operations will have finished even before the last set of Write operations has been initiated. Then, in a similar fashion, the requisite testing for the correct completion of each of these operations is handled. That is to say, because completions are tested in the same orderly sequence in which the Writes were initiated, the majority, if not all, of the Writes will have completed by the time their completion information is examined. As a result, the probability of having to wait for some operation to complete is extremely low.
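The initiate-everything-first, then test-in-the-same-order pattern can be illustrated with a small sketch. The thread pool, the short sleep that simulates disk latency, and all names are assumptions made for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_write(disk, block_id, data):
    time.sleep(0.001)          # stand-in for disk latency
    disk[block_id] = data
    return block_id

def flush_in_order(buffers, disk):
    """Initiate every write, then test completions in initiation order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Step 1: initiate all writes without waiting on any of them.
        initiated = [pool.submit(simulated_write, disk, block_id, data)
                     for block_id, data in buffers.items()]
        # Step 2: examine completions in the same order.
        already_done = 0
        finished = []
        for future in initiated:
            if future.done():
                already_done += 1              # no waiting was needed for this write
            finished.append(future.result())   # waits only if the write is still in flight
    return finished, already_done
```

The `already_done` count varies from run to run, but it tends to be high because the earliest writes have had the longest time to finish before they are examined.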
Secondly, the workload is spread across the active update users of the database. This allows the multi-processor system to also take advantage of its ability to perform multiple tasks simultaneously. A concomitant advantage is that using the existing application tasks does not burden the system with the overhead of initiating and managing specialized tasks to perform these functions. Thus, the method takes full advantage of the parallelism inherent in multi-processor systems.