(1) Field of the Invention
This invention relates to records processing and more particularly to processing records held in the memory files of a digital computer.
(2) Description of the Prior Art
Businessmen and establishments in particular have a very active need to keep detailed records of customers, suppliers, parts, equipment and literally myriads of other matters and items. Such records in the past were kept in card files and the like where they could be inspected one at a time in sequence or in some other predetermined order such as alphabetical order, numerical order and the like. Specific items of information could be arranged on certain portions of such cards to facilitate an orderly search through a large collection or batch of cards in sequence to locate a specific bit or item of information. In later systems, the file data was entered upon punched cards by means of an appropriate punch code. The appropriate cards containing certain information could then be found by sorting through such punched cards with appropriate means which in one way or another would detect the holes or openings in the cards and remove or segregate the cards having the appropriate holes. It obviously would take a significant time to sort through stacks of cards even when done by automatic mechanical or optical-mechanical sorting means.
More recently, with the advent of digital electronic computers, data has been filed either in the memory of the central processing unit (CPU) or in peripheral or mass memory devices such as magnetic recording tapes, disks and the like. A great number of digital computer programs have been developed to handle such masses or files of data in digital computer apparatus. While such computer programs have been written and used in almost all the basic computer languages such as, for example, BASIC itself, FORTRAN, COBOL and other higher level digital computer languages, most have not been designed especially for the efficient handling of file data and consequently special computer languages have been developed especially adapted for the handling of large amounts of file-type data. One widely used example is so called dBASE and its elaborations dBASE II, dBASE III, dBASE III Plus, etc., which have been widely marketed by the Ashton-Tate Company especially for handling masses of file data expeditiously and efficiently. Such programs can be referred to as database management programs which allow the user to manipulate the information to obtain reports and printouts with various desired combinations of the data. While every higher level programming language has some method or other for filing information and later retrieving it, database management languages such as the dBASE languages are particularly designed to make such task easy and efficient.
Even with the great speed of computers, however, and the use of special languages developed for expeditious database management, when there are large masses of data to be sorted through or processed, which processing constitutes the pre-eminent function of a database management system, a great deal of time may be expended in aggregate in sorting or processing such data. For example, even if one record of a database can be examined for pertinent data in a fraction of a second, the consecutive examination of thousands of records can take a number of hours. Since many large business databases may contain thousands or tens of thousands of records, it is not unusual for one complete examination of such a database to take five or six hours or more. Such lengthy periods are expensive both in computer time and in operator time, particularly as the operator usually has little to do during the sorting. The CPU is also largely unavailable during such periods for other tasks and, in fact, full scale sorting is frequently conducted during off-hours to avoid, as much as possible, interference with other tasks. Normally the individual records of the database are transferred from storage memory devices to the working memory of the computer one at a time and processed for data, whereupon the record is transferred back to the storage means from the working memory and the next record is retrieved for processing.
The previously available programs provide rather intricate systems for the user to access the information in the system. Such a system, or "user interface", can take the form of a menu, a question and answer dialogue, various displays and the like. The true value of a program, however, always lies basically in how fast the system can process the data it contains and output such data in a desired usable form. Programs such as the Ashton-Tate dBASE programs offer a program language through which a skilled programmer can create, to a limited extent, his or her own interface. However, the actual processing of data through the program may not be very fast and, in fact, it is not infrequent with modern computers that the program makes only limited use of the computer's actual capabilities.
The dBASE language and most other database programming packages use random access files to store their information. The random access technique allows the user, via the user interface, to recall a single record at random from among thousands without waiting very long. In contrast, with sequential file access, which regardless of its actual memory storage mode behaves basically like a magnetic tape, it is very hard to go to a specific part of the records without winding through the tape to the part desired. Random access allows speedy access to all records irrespective of the location of the record in the file. Sequential database management systems usually read the entire file, possibly composed of thousands of records, into memory. Once in memory, the file is processed and the database package then rewrites the entire file. This technique wastes time if only one record is to be accessed and also limits database size to that of the available computer working memory. To change one record on such a system could often take hours.
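The difference between the two access modes can be illustrated with a minimal sketch, given here in Python purely for illustration. The fixed-length record layout (a 4-byte identifier followed by a 20-byte name), the in-memory file, and the function names build_file, read_random and read_sequential are all hypothetical and are not taken from dBASE or any other package discussed above. With fixed-length records, random access computes the byte offset of the desired record and reads only that record, whereas sequential access must read every preceding record first.

```python
import io
import struct

# Hypothetical fixed-length record layout: 4-byte id + 20-byte name.
RECORD = struct.Struct("<I20s")


def build_file(count):
    """Create an in-memory file of `count` fixed-length records."""
    buf = io.BytesIO()
    for i in range(count):
        buf.write(RECORD.pack(i, f"name{i}".encode()))
    return buf


def _decode(raw):
    """Unpack one raw record into (id, name), dropping the padding."""
    rec_id, name = RECORD.unpack(raw)
    return rec_id, name.rstrip(b"\0").decode()


def read_random(f, index):
    """Random access: seek straight to the record and read only it."""
    f.seek(index * RECORD.size)
    return _decode(f.read(RECORD.size))


def read_sequential(f, index):
    """Sequential access: read from the start until the record is found.

    Returns the record and the number of records that had to be read.
    """
    f.seek(0)
    reads = 0
    while True:
        raw = f.read(RECORD.size)
        if len(raw) < RECORD.size:
            raise IndexError(index)
        reads += 1
        record = _decode(raw)
        if record[0] == index:
            return record, reads
```

In this sketch, locating record 742 of 1,000 by random access takes a single read, while the sequential routine reads 743 records to reach the same one.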
Random file or data access has the advantage of speedy access to individual records, low memory requirements, and ease in maintaining the database. But, since the system is built around accessing only one record at a time, performing operations on many records wastes considerable time and with modern high capacity computers is a waste of the computer's capabilities.
The operating system (OS) of a computer is the set of more or less standard routines, more or less built into the computer and accessible to the operator or to the programmer for performing certain functions. These routines provide the programmer a more or less standard environment in which to work so that the program can be used in all computers that use the same operating system even though different hardware, storage devices, display devices and the like may be used. Each OS, however, has what is sometimes referred to as "overhead". This is essentially the time taken for each operation. If an operation has to be performed a number of times, the overhead for this particular operation of the operating system is a multiple of the single operation time. Operating systems have a relatively high overhead for random access, which is particularly detrimental in database management systems or programs where large amounts of data are stored and processed in the form of a large number of individual records. Such overhead is responsible, in large part, for the long periods frequently required even by the most modern computers to process records using the popular database management systems.
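The way such overhead accumulates can be shown with a short worked sketch in Python. The figures used (50,000 records, two operating system operations per record for the read and the write back, and 150 milliseconds of overhead per operation) are assumptions chosen only to illustrate the arithmetic and are not measurements of any particular system; the function name total_overhead_ms is likewise hypothetical.

```python
def total_overhead_ms(records, ops_per_record, overhead_ms_per_op):
    """Total overhead as a multiple of the single-operation overhead.

    Each record costs `ops_per_record` operating system operations, and
    each operation costs `overhead_ms_per_op` milliseconds of overhead.
    """
    return records * ops_per_record * overhead_ms_per_op


# Assumed, illustrative figures only.
total_ms = total_overhead_ms(records=50_000, ops_per_record=2,
                             overhead_ms_per_op=150)
hours = total_ms / 3_600_000  # milliseconds per hour
print(f"{total_ms} ms of overhead, about {hours:.1f} hours")
```

Under these assumed figures the overhead alone amounts to 15,000,000 milliseconds, or a little over four hours, even though each individual operation takes only a fraction of a second.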
There has been a need, therefore, for faster sorting or processing of databases in database management systems in general, and, in particular, in the popular database systems such as dBASE and other higher level database languages.