This disclosure relates to multi-processor systems that use a multi-programming environment to manage file structures in a database.
Modern computer and communication systems continuously use huge amounts of data. As a result, data management and data storage have become significant technical issues. The multi-programming and multi-processing operations typical of complex systems have large data requirements that call for massive amounts of storage.
In such multi-processing and multi-programming systems, data is generally stored as blocks on a storage medium, such as a magnetic disk. The data is then read from the storage medium into temporary memory, such as a cache or buffer in Random Access Memory, where user application programs can access it. One of the chief problems is updating the database structure control information and flushing stale information from the data buffers. It is desirable that each update take the least possible time by exploiting the computer system's ability to run asynchronous processes and to overlap Input/Output operations.
Input/Output operations ideally take place while no user application is actively accessing the affected data within the database. Copies of the data exist both within the applications and within the database; this is done to maintain both physical and referential data integrity. The database's buffers must be updated regularly. Rather than maintaining control in a single process, or initiating multiple new independent processes to perform the updates, it is possible to use local environments that can access shared data and so take advantage of the user application infrastructure that is already present. Procedures of this type can, in effect, run "on top" of the user programs.
Prior systems that flushed data buffers and updated the database structures operated on a relatively slow serial basis. The mechanism determined which structures had data buffers to be flushed, wrote those buffers to disk, tested for Input/Output completion, wrote the control information, and then restarted the other applications. Because the process was serialized, it was inefficient and did not take advantage of the performance inherently possible with multi-processor technology.
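For contrast with what follows, the serial sequence just described might be sketched as below. This is an illustrative assumption only, not code from any prior system: the function name flush_serially, the dictionary standing in for the disk, and the shape of the structures argument are all hypothetical.

```python
def flush_serially(structures, disk):
    """Flush dirty buffers one step at a time, then record control information.

    `structures` maps a structure name to its dirty buffers ({block_id: data});
    `disk` is a plain dict standing in for the storage medium (an assumption).
    """
    control_info = {}
    for name, dirty_buffers in structures.items():
        if not dirty_buffers:                  # step 1: anything to flush here?
            continue
        for block_id, data in dirty_buffers.items():
            disk[block_id] = data              # step 2: write (blocking in this model)
            if disk.get(block_id) != data:     # step 3: test for I/O completion
                raise IOError(f"write of block {block_id} failed")
        control_info[name] = len(dirty_buffers)  # step 4: write control information
    return control_info                        # step 5: applications resume only now
```

Every step waits for the one before it, so additional processors have nothing to do while a single flush is in progress.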
The presently described system eliminates the relatively slow serial process and provides high-speed flushing of data buffers and updating of the database structure control information. The described system is completely asynchronous. Rather than simply waiting, the user tasks are used concurrently as workers: they participate in independently writing the data buffers, then testing for I/O completions, and finally updating the structure control information.
In the presently described system, a task may enter the process and engage at any particular phase, or change roles at any time. These processes are "First-In, First-Out" (FIFO) in nature. For example, the task that initiates the WRITES for a given set of data buffers is not necessarily the task that assures their completion.
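One way to picture tasks entering at any phase and changing roles is a pair of FIFO queues, one for buffers awaiting a write and one for writes awaiting a completion test. The sketch below is a minimal model under stated assumptions: the function flush_with_user_tasks, the task names, the thread pool standing in for the I/O device, and the dictionary standing in for the disk are all hypothetical and are not taken from the described system.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def flush_with_user_tasks(buffers, disk, n_tasks=3):
    """Let n_tasks hypothetical user tasks share one flush via two FIFO queues."""
    to_write = queue.Queue()     # FIFO: buffers whose write has not been initiated
    in_flight = queue.Queue()    # FIFO: initiated writes awaiting a completion test
    for item in buffers:
        to_write.put(item)
    history = []                 # (block_id, initiating task, completing task)

    def user_task(name, device):
        while True:
            try:
                block_id, data = to_write.get_nowait()
            except queue.Empty:
                pass
            else:
                # Phase 1: initiate the write and move on without waiting.
                future = device.submit(disk.__setitem__, block_id, data)
                in_flight.put((block_id, name, future))
                continue
            try:
                block_id, initiator, future = in_flight.get_nowait()
            except queue.Empty:
                return           # nothing left in either phase for this task
            # Phase 2: test completion of the oldest write, whoever started it.
            future.result()
            history.append((block_id, initiator, name))

    # The "device" pool simulates asynchronous disk I/O (an assumption).
    with ThreadPoolExecutor(max_workers=2) as device, \
         ThreadPoolExecutor(max_workers=n_tasks) as tasks:
        runs = [tasks.submit(user_task, f"task-{i}", device) for i in range(n_tasks)]
        for run in runs:
            run.result()         # re-raise any error raised inside a task
    return history
```

In the returned history, the initiating and completing task names for a block may differ, which mirrors the point that the task starting a WRITE need not be the one that confirms it.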
Further, coordination is achieved by selecting a single process to perform only those housekeeping functions that absolutely require serialization. The use of shared data is limited to those instances where it is required to drive the process forward. Asynchrony is further assured by restricting access, via a software data lock, only in those instances where the shared variables require alteration. This mechanism limits serialization to the absolute minimum necessary to ensure integrity. For a given database configuration, the previously used systems and methods gained little performance beyond what the addition of a second processor afforded. Thus, their operations did not scale in proportion to the number of processors added. By contrast, the presently described system has the advantage of being about 30% faster for two- and three-processor configurations, and of continuing to provide increased throughput as the workload increases and additional processors are added.
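The locking discipline described above, in which the lock is held only while a shared variable changes and one task alone performs the serialized housekeeping, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class FlushCoordinator, its counter, and the rule that the task finishing the last buffer is the one selected for housekeeping are hypothetical choices, not the selection method of the described system.

```python
import threading


class FlushCoordinator:
    """Shared state for one flush; the lock guards only the counter update."""

    def __init__(self, total_buffers):
        self._lock = threading.Lock()
        self._remaining = total_buffers
        self.control_updates = 0

    def buffer_flushed(self):
        """Record one finished buffer; return True only for the final one."""
        with self._lock:                 # held just long enough to alter shared data
            self._remaining -= 1
            return self._remaining == 0

    def update_control_info(self):
        self.control_updates += 1        # housekeeping, reached by exactly one task


def user_task(coordinator, my_buffers, disk):
    for block_id, data in my_buffers:
        disk[block_id] = data            # unlocked: tasks write disjoint blocks
        if coordinator.buffer_flushed():
            coordinator.update_control_info()


def run_flush(buffers, n_tasks=4):
    disk = {}                            # dict standing in for the disk (assumption)
    coordinator = FlushCoordinator(len(buffers))
    shares = [buffers[i::n_tasks] for i in range(n_tasks)]
    threads = [threading.Thread(target=user_task, args=(coordinator, share, disk))
               for share in shares]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return disk, coordinator.control_updates
```

Because only the decrement is serialized, the tasks never queue behind one another for the writes themselves, and the control update still happens exactly once.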
A plurality of processors and database engines are inter-related through a memory system to provide an asynchronous mechanism that distributes the workload of flushing data buffers over a number (N) of worker tasks. Multiple database engines, whose access routines are available to multiple user application programs, are connected to different sets of data file structures in databases composed of multiple physical files. The system provides a considerable advantage by using the user tasks already running on multiple processors (CPUs), so that the time required to accomplish any given task is minimized in several specialized ways.
First, all of the Write operations to the various buffer units of a buffer pool are initiated concurrently, so there is a high probability that the first set of Write operations initiated will finish even before the last set is initiated. Each I/O operation is then tested for correct completion in a similar fashion. Because the tests follow the same orderly sequence used to initiate the Writes, most if not all of the Write operations will have completed by the time their completion information is examined to verify proper Input/Output transfer to the database file structures. As a result, the probability of having to wait for any one operation to complete is extremely low.
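The ordering argument above can be sketched in a few lines: initiate every write first, then examine completions in the same order. The function flush_in_two_passes, the thread pool used to simulate asynchronous device I/O, and the dictionary standing in for the database files are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def flush_in_two_passes(buffers, disk, io_threads=4):
    """Initiate every write, then test completions in the same order."""
    with ThreadPoolExecutor(max_workers=io_threads) as device:
        # Pass 1: initiate all writes without waiting on any of them.
        pending = [(block_id, device.submit(disk.__setitem__, block_id, data))
                   for block_id, data in buffers]
        # Pass 2: examine completions in initiation order.
        already_done = 0
        checked = []
        for block_id, future in pending:
            if future.done():
                already_done += 1      # this write needed no waiting at all
            future.result()            # blocks only if the write is still running
            checked.append(block_id)
    return checked, already_done
```

The already_done count varies from run to run. It is returned only to show that the earliest writes often finish before their turn to be tested arrives.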
Second, the workload of completing the various tasks is spread across the active update users of the database. This lets the multi-processor system exploit its ability to operate on multiple tasks simultaneously. Another advantage is that using the existing application tasks does not burden the system with the overhead of initiating and managing specialized tasks that might otherwise be required to perform these functions. The system thus takes full advantage of the parallelism inherent in multi-processor systems.