The present invention relates to a system for controlling the storing of data in a storage of an electronic computer system, and more particularly to a data storing control system in which data are stored and read in by a program.
In a system for storing files, blocks and databases in a storage, for example in a nonvolatile storage such as a magnetic disk, each of the files, blocks and databases is composed as a set of records, and each record is composed as a set of data items. The program requires inputs and outputs in units of records. Input and output processes between the magnetic disk and a main memory device are performed in units of pages or blocks (hereinafter called pages), each being a set of records.
FIG. 19 shows a conventional network database (NDBA). The database is composed as a set of a record A1, a record A2, a record B1, a record B2, and a record C1. The record A1 is a set of data items A11, A12, A13 and A14, the record B1 is a set of data items B11, B12, B13 and B14, and the record C1 is a set of data items C11, C12, C13 and C14. Each of the records A1, B1 and C1 is an owner record of indexed sequential organization. The records A2 and B2 are designated as via sets of the records A1 and B1, respectively. Therefore, the records A1, A2, B1, B2 and C1 are stored in a storage area 21b of the magnetic disk of FIG. 20 in the arrangement shown in the figure. The program requests input and output in units of records (A1, A2, B1, B2, C1). However, between the storage area 21b and the main memory device, the input and output are carried out in units of pages (page 1, page 2, page 3).
In order for the program to obtain a desired record in a file or database, it is necessary to determine the head address of the location where the record is stored. The program must be able to read all records exactly even if the number of records changes. Therefore, the storing position of a record is decided by a positioning logic which is commonly applied to the respective records in files or databases. For example, in a sequential organization file, the positioning logic decides the leading position of a record as the address obtained by adding the length of the preceding record to the leading address of the preceding record.
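The sequential positioning logic described above can be illustrated by a minimal sketch. The function name, the base address and the record lengths are hypothetical values chosen for illustration only and do not appear in the original description.

```python
def record_start_addresses(base_address, record_lengths):
    """Return the leading address of each record in a sequential file.

    Each record starts at the leading address of the preceding record
    plus the length of the preceding record. The first record starts
    at base_address (an assumed value for this sketch).
    """
    addresses = []
    next_address = base_address
    for length in record_lengths:
        addresses.append(next_address)
        next_address += length
    return addresses


# Hypothetical file: three records of 40, 25 and 60 bytes starting at 1000.
addresses = record_start_addresses(1000, [40, 25, 60])
print(addresses)  # [1000, 1040, 1065]
```

Because every address depends on the lengths of all preceding records, a record cannot be placed at an arbitrary position without breaking the logic for the records that follow it.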
As another positioning logic, there is a logic for direct organization. In this logic, a desired record is stored in a page which is decided by hashing key information for identifying the desired record. In the case that a plurality of records are stored in the page decided by the hashing, the records are searched by using the key information so as to identify the desired record. The key information for identifying a desired record among a plurality of records stored in the same page decided by the positioning logic is hereinafter called record identifying information.
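The direct organization described above may be sketched as follows. The hash function (CRC-32 from the standard library), the number of pages, and the names page_for_key, store and fetch are assumptions made for this illustration; the original description does not specify them.

```python
import zlib


def page_for_key(key, num_pages):
    """Decide the page by hashing the key information (CRC-32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_pages


def store(pages, key, record):
    """Store the record, together with its key, in the page chosen by the hash."""
    pages[page_for_key(key, len(pages))].append((key, record))


def fetch(pages, key):
    """Read the hashed page, then use the key as record identifying information."""
    for stored_key, record in pages[page_for_key(key, len(pages))]:
        if stored_key == key:
            return record
    return None  # no record with this key in the page


# Hypothetical storage area with four empty pages.
pages = [[] for _ in range(4)]
for key in ["A1", "A2", "B1", "B2", "C1"]:
    store(pages, key, "contents of " + key)

print(fetch(pages, "B1"))  # contents of B1
```

In this sketch the page of a record follows entirely from its key, which is why the program cannot choose the storage area of an individual record.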
In the case that a program uses a data item in a record, the record is read into a predetermined position of a main memory, such as an input buffer or a user working area. The record has a fixed structure in which the respective data items are arranged in a predetermined order based on physical continuity, and the relative address from the head of the record is fixed for each data item. Therefore, the head position of a data item is decided by using its relative address from the head of the read-in record.
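The following minimal sketch illustrates how a data item is located by its relative address in a record of fixed structure. The layout table, the item lengths and the buffer contents are hypothetical; only the item names A11 to A14 come from the description.

```python
# Hypothetical fixed layout of record A1: item -> (relative address, length).
LAYOUT = {
    "A11": (0, 4),
    "A12": (4, 8),
    "A13": (12, 2),
    "A14": (14, 6),
}


def read_item(buffer, record_head, item):
    """Return the bytes of a data item, located by its relative address."""
    offset, length = LAYOUT[item]
    start = record_head + offset  # head of record + relative address
    return buffer[start:start + length]


# Input buffer in which the record was read in at position 10 (assumed).
record = b"AAAA" + b"BBBBBBBB" + b"CC" + b"DDDDDD"
buffer = b"\x00" * 10 + record

print(read_item(buffer, 10, "A13"))  # b'CC'
```

The lookup works only because the data items are physically contiguous in a fixed order, which is the precondition that prevents a single data item from being stored elsewhere.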
Meanwhile, in the electronic computer system, shortening the access time to data stored in the above described storage shortens the process time. A shortened process time has important value for industry in aspects such as the improvement of process efficiency and the improvement of productivity.
A first example of a conventional method for shortening the access time will be described hereinafter. Assuming that the data necessary for a program are the records A1, B1 and C1 in the database (NDBA) having the structure shown in FIG. 19, and that the database (NDBA) has the storing structure of FIG. 20, it is necessary to read three pages, namely "page 1", "page 2" and "page 3", in the storage area 21b.
Assume that, as shown in FIG. 21, there are two storage areas 21b and 21c, that the records A1 and B1 are stored in the page 1 of the storage area 21b, that the record C1 is stored in the page 2 of the storage area 21b, and that the remaining records are stored in the storage area 21c. If the page storing a desired record has already been read and stored in the main memory, it is not necessary to read the page again from the storage area. (This operation is described, for example, in Japanese Patent Application Publication 7-89334.) Therefore, only two pages, page 1 and page 2, are read from the storage area 21b. As a result, the average access time for the records A1, B1 and C1 is shortened to 2/3.
There is a case that, in the database NDBA of FIG. 19, a program requires three data items, namely A11 in the record A1, B11 in the record B1 and C11 in the record C1. In the case that the database NDBA has the storing structure of FIG. 20, it is necessary to input three pages, namely the page 1 including the data item A11, the page 2 including the data item B11 and the page 3 including the data item C11. If there are two storage areas 21b and 21c as shown in FIG. 22, the data items A11, B11 and C11 are stored in the page 1 of the storage area 21b, and the remaining data items are stored in the storage area 21c, then only the page 1 of the storage area 21b needs to be read to obtain the data items A11, B11 and C11, so that the average access time is shortened to 1/3.
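The page counts in the two preceding examples can be checked with a short sketch. It assumes, as a simplification, that the average access time is proportional to the number of distinct pages read. The placement tables and the function name are hypothetical, and only the placements stated in the text are modeled.

```python
from fractions import Fraction


def pages_to_read(placement, wanted):
    """Return the set of distinct (storage area, page) pairs holding the wanted data."""
    return {placement[name] for name in wanted}


# Record placement as in FIG. 20 and FIG. 21 (records A1, B1, C1 only).
fig20_records = {"A1": ("21b", 1), "B1": ("21b", 2), "C1": ("21b", 3)}
fig21_records = {"A1": ("21b", 1), "B1": ("21b", 1), "C1": ("21b", 2)}
# Data item placement as in FIG. 20 and FIG. 22 (items A11, B11, C11 only).
fig20_items = {"A11": ("21b", 1), "B11": ("21b", 2), "C11": ("21b", 3)}
fig22_items = {"A11": ("21b", 1), "B11": ("21b", 1), "C11": ("21b", 1)}

records = ["A1", "B1", "C1"]
items = ["A11", "B11", "C11"]

record_ratio = Fraction(len(pages_to_read(fig21_records, records)),
                        len(pages_to_read(fig20_records, records)))
item_ratio = Fraction(len(pages_to_read(fig22_items, items)),
                      len(pages_to_read(fig20_items, items)))

print(record_ratio)  # 2/3
print(item_ratio)    # 1/3
```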
The effect of shortening the average access time that results from the adjacent location of records or data items is hereinafter called the adjacent location effect. The combination of the records A1 and B1 has a higher adjacent location effect than the combination of the records A1 and A2. The combination of the data items A11 and B11 has a higher adjacent location effect than the combination of the data items B11 and B12. The aggregating of data having a high adjacent location effect is hereinafter called adjacent concentration. As an example of using the adjacent concentration in record units, there is the via set specification for a member record of the network database. It is possible to obtain the adjacent concentration effect by a specification in the database schema. However, there is no technique capable of adjacently aggregating data in units of data items.
As a second method of shortening the access time, a disk cache is used. In this method, when a record on a magnetic disk is used, a copy of the page including the record is located on the disk cache, which is an upper storage hierarchy. After the copy is located there, the record can be used at the speed of the access time of the disk cache.
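The behavior of such a disk cache may be sketched as follows. The class name PageCache, the counter disk_reads and the page contents are hypothetical, and eviction is omitted for brevity; the sketch shows only that the first access to a page goes to the disk while later accesses to the same page are served from the cache.

```python
class PageCache:
    """Read-through page cache without eviction (a simplification)."""

    def __init__(self, disk_pages):
        self.disk_pages = disk_pages  # page number -> list of records on disk
        self.cache = {}               # copies of pages in the upper hierarchy
        self.disk_reads = 0           # number of accesses to the magnetic disk

    def read_page(self, page_no):
        if page_no not in self.cache:
            # First access: the page must be read from the magnetic disk.
            self.disk_reads += 1
            self.cache[page_no] = self.disk_pages[page_no]
        return self.cache[page_no]


# Hypothetical disk contents.
disk = {1: ["A1", "A2"], 2: ["B1", "B2"], 3: ["C1"]}
cache = PageCache(disk)

cache.read_page(1)  # read from the disk
cache.read_page(1)  # served from the cache
cache.read_page(2)  # read from the disk
print(cache.disk_reads)  # 2
```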
A third method for shortening the access time is to move a file or database to a memory of a faster storage hierarchy when the file or database is used in units of the file or database. For example, when a file which is ordinarily located on the magnetic disk is used, the file is moved to an electronic disk. Japanese Patent Laid Open 6-44108 discloses an example of the third method.
The above described second and third access time shortening methods use the storage hierarchy. In the use of the storage hierarchy, the access time can be shortened by locating data having a high probability of use on a higher storage hierarchy. Since the memory capacity decreases as the storage hierarchy becomes higher, it is important for the shortening of the access time to locate the data on the storage hierarchy in accordance with the probability of use of the data.
It is possible to grasp the adjacent location effect in units of the records or data items and the probability of use, by analyzing the internal structure of the program used for the data or by statistically analyzing the past history of action of the data.
In accordance with the above described first method for shortening the access time, although the access time is shortened in all records and data items dependent on the adjacent concentration, the effect is limited to the member records of the network database. In addition, since the schema must be designated in the design stage, only one fixed pattern of adjacent concentration is implemented for all clusters. In other words, the first problem, which concerns the first method for shortening the access time, is that the adjacent concentration cannot be implemented in arbitrary record units or in arbitrary data item units.
In the second method, the first access to a page including a desired record must be performed on the magnetic disk. Therefore, the access time is shortened only for accesses to the same page after the first access. Consequently, the page including the data to be used cannot be located in advance on a memory of an upper storage hierarchy. Namely, the storage hierarchy cannot be utilized effectively.
In accordance with the third method, although data can be located on an upper storage hierarchy in advance, the location is performed in file or database units. As a result, even if only a part of the data, for example 20% of a file, is used by a program, the upper hierarchy must have a capacity capable of storing 100% of the file. In addition, the upper storage hierarchy is wastefully occupied by the unused 80% of the data.
The second problem common to the second and third methods for shortening the access time is that the file or the database can not "previously and partially" be located in an upper hierarchy, and hence the access time shortening can not be more effectively implemented.
The reason for the above described first and second problems is that data cannot be stored in a desired designated storage area in record or data item units. The reason why data cannot be stored in a desired designated storage area in record units is that the position where a record is stored is decided by the positioning logic. On the other hand, the reason why data cannot be stored in a desired designated storage area in data item units is that the physical continuity between data items in a record is a precondition.