1. Field of the Invention
The invention relates to a storing apparatus and a storing structure for constructing an address space on a secondary storage by a hard disk drive or the like and, more particularly, to a storing apparatus and a data storing structure for realizing an offset space on a secondary storage by a data storing structure of a hypercube.
2. Description of the Related Art
Hitherto, as address spaces of a computer, in addition to an address space realized on a main storage, an address space realized on a secondary storage using an external storing apparatus such as a disk drive has been used. In the following description, the address space on the secondary storage is called an offset space in order to distinguish it from the address space on the main storage. An address in the offset space is simply called an offset. A typical example of the offset space is the file system of UNIX, a representative operating system (OS). It is assumed that the offsets are allocated sequentially as 0, 1, 2, 3, . . . on a byte unit basis, in a manner similar to a general address space or a UNIX file. A recent database is required to store CAD data, complicated data such as a molecular structure, and long data such as multimedia data like an image or a document. At the same time, it must naturally handle relatively simple data such as conventional numerical values, character data, and records, as well as small amounts of data. The offset space can handle those requests in a unified manner. Therefore, it is desirable to realize an offset space of a high access efficiency. The reasons why the offset space can store a small amount of simple data together with complicated and long data in a unified manner will now be described in more detail.
(Storage of Complicated Data)
In a recent database, it is required that CAD data or complicated data such as a molecular structure can also be handled. Complicated data generally has a tree-shaped or network-shaped structure. To realize such data on a secondary storage, it is desirable, as with a data structure on the main storage, that addresses are allocated to the respective areas and that the data structures can be linked by the addresses. In this instance, it is desirable that the addresses are as simple as possible from the viewpoints of ease of programming and space efficiency. The offset space meets such a requirement.
(Storage of Long Data)
In a recent database, it is required to store long data such as an image or a document. Such data is often stored as files. The offset space conceptually includes a file as mentioned above and can store long data. Specifically speaking, long data of (n) bytes can be stored by placing it at offsets 0 to n−1 of the offset space. The offset space meets the above requirement.
(Storage of Simple Data)
Since complicated data can be stored in the offset space, conventional character strings, numerical values, and records can obviously be stored therein as well. In the case of simple data, however, the amount of data is also generally small. When data fits entirely in one page, the database naturally stores it in one page. In such a case, therefore, it is desirable that the offset space can also be realized on a page unit basis. Hitherto, a database has had the problem of how to deal with an overflow. The offset space can also provide a possible solution to such a problem.
(Page is Made Unconscious)
A database is divided into areas called pages of a fixed length of, for example, 4 kB and stores data therein. "Page" is a unit of input/output between the secondary storage and the main storage. The unit of input/output in the OS is also generally such a page. As shown in FIG. 1, a page 100 is usually divided into a header portion 102 and a data portion 104. When user data is stored, information on the system side is necessary independently of the data. For example, information such as the size of the user data and the amount of remaining free area is necessary. That information is called management information. User data and, in a special case, management information are stored in the data portion 104. The management information is stored in the header portion 102.
FIG. 2 shows a state of storage of data in the database. In this example, a record R1 and a record R2 have been stored in a 0-page 100-0. Subsequently, when the user intends to insert a record R3 and it does not fit in the 0-page 100-0, a new 1-page 100-1 is acquired and the record R3 is stored in the 1-page 100-1. When a database system is constructed, programming is usually performed while the user is conscious of the pages. For example, consider the case where the record R1 in FIG. 2 is updated. If the length of the record R1 increases and the record R1 no longer fits in the 0-page 100-0, one countermeasure is to newly secure a 2-page 100-2 and to transfer a portion of the updated record R1, or the whole record R1, to it as shown in FIG. 3. FIG. 3 shows a case where the record R1 becomes two partial records R11 and R12 after the updating, the record R11 has been stored in a 0-page 104-0, and the record R12 has been stored in a 2-page 104-2. A state where a unit of data does not fit in one page as mentioned above is called "overflow". The overflow is an example in which the user is conscious of the page. When the offset space is used, data can generally be handled without making the user aware of the overflow or the page. However, since there is a problem of continuity of areas, situations arise, depending on the realizing method, in which the data has to be copied. FIG. 4 shows a state where the overflowed records in FIG. 3 are stored in an offset space of offsets 0 to 7999.
Hitherto, there is a file system in the OS such as UNIX or the like as a typical example of the offset space. The file system of UNIX has the following features.
(1) Offset space
The file of UNIX is considered as an offset space. The offsets of 0, 1, 2, . . . are allocated to respective bytes of the data of the UNIX file from the head.
(2) Management page is separately managed
In UNIX, the user data and management information are managed by different pages. A page to store the management information is called a management page.
(3) Management page has a tree structure
The management page is realized as an unbalanced tree structure.
(4) There is no function for area management
A function for area management is provided in the address space on the main storage. Specifically speaking, functions of malloc ( ) and free ( ) are supported in UNIX. When the user wants to secure an area whose dimensions are "size", by calling this function as
address=malloc(size);
the system side secures a continuous area on the main storage and returns its address as "address". By calling
free(address);
this area is released. So long as the area corresponding to "address" is not released, this area is not handed out again by malloc ( ). However, the area managing function of the main storage space does not exist in the file serving as an offset space of UNIX. UNIX has a process for writing the contents of a continuous area of (n) bytes on the main storage designated by the user (such an area is called a "user buffer") into an area of (n) bytes starting from some offset and, conversely, a process for reading out the contents of (n) bytes starting from some offset into a user buffer of (n) bytes designated by the user. In a database or the like, however, area management is performed with regard to an area of a record or the like.
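The two UNIX file processes described above, writing a user buffer to an offset and reading (n) bytes from an offset back into a user buffer, can be illustrated with a minimal Python sketch. The helper names `write_at` and `read_at` are hypothetical, and the sketch uses ordinary seek-based file I/O rather than any particular system call. Note that neither helper allocates or releases areas; that bookkeeping is exactly what the file as an offset space lacks.

```python
import os
import tempfile

def write_at(path: str, offset: int, user_buffer: bytes) -> None:
    """Write the contents of the user buffer into the file starting at `offset`."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(user_buffer)

def read_at(path: str, offset: int, n: int) -> bytes:
    """Read n bytes starting at `offset` into a new user buffer."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(n)

# Usage: the file plays the role of the offset space.
fd, path = tempfile.mkstemp()
os.close(fd)
write_at(path, 100, b"record R1")
print(read_at(path, 100, 9))  # b'record R1'
os.remove(path)
```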
(5) Copy into buffer
Although pages on the secondary storage are copied into the buffer which is managed by the system, they are further copied therefrom to the user buffer on the main storage. This is because continuity of an area existing over the pages is not guaranteed.
However, the following problems remain in order to realize an offset space of a high access efficiency in the conventional system as mentioned above.
(Efficiency of Data Access)
As a simplest method of realizing the offset space, first, a method whereby pages constructing the offset space are linked in a line shape is possible.
FIG. 5 shows an example in which pages constructing an offset space are linked in a line shape. In the example, data of offsets 0 to 3999 is stored in the first 0-page 100-0, data of offsets 4000 to 7999 is stored in the next 1-page 100-1, and data of offsets 8000 to 11999 is stored in the last 2-page 100-2. When the pages are linked in a line shape as mentioned above, the user has to trace the intermediate pages in order to access data further along the link, so that the number of inputting/outputting times, namely, the costs for input/output, increases. For example, in FIG. 5, if the user intends to access an area of an offset 10000, he has to first access the 0-page 100-0, subsequently access the 1-page 100-1, and then access the 2-page 100-2. That is, it is necessary to input three times.
FIG. 6 shows an example in which pages constructing the offset space are linked like a tree. In this case, link information of all subsequent pages has been stored in the head page. By linking the pages like a tree as mentioned above, when accessing the area of offset 10000, the user first accesses the head 0-page 100-0 and can thereafter directly access the 2-page 100-2 by the link information L02, so that the number of inputting times can be reduced to 2. In the file system of UNIX, the offset space is realized by an unbalanced tree-like link.
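The input counts quoted for FIG. 5 and FIG. 6 can be checked with a short sketch. The function names and the constant `DATA_BYTES_PER_PAGE` are illustrative assumptions; the value of 4000 data bytes per page is taken from the example offsets above.

```python
DATA_BYTES_PER_PAGE = 4000  # assumed from the example: offsets 0 to 3999 in the 0-page

def page_of(offset: int) -> int:
    """Page number that holds the given byte offset."""
    return offset // DATA_BYTES_PER_PAGE

def inputs_linear(offset: int) -> int:
    """Line-shaped link (FIG. 5): every page up to the target page is read."""
    return page_of(offset) + 1

def inputs_one_stage_tree(offset: int) -> int:
    """One-stage tree (FIG. 6): the head page links directly to every other page."""
    return 1 if page_of(offset) == 0 else 2

print(inputs_linear(10000), inputs_one_stage_tree(10000))  # 3 2
```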
(Coexistence of User Data and Management Information)
In the file system of UNIX, the management information and user data are stored in different pages. Therefore, when accessing the user data, the user first accesses the management page and subsequently accesses the user page. That is, it is necessary to input at least twice. If the management information and the user data are stored in the same page, such a problem can be solved. FIGS. 5 and 6 show examples in which the management information and the user data are stored in the same page, which is the system usually used in a database.
(Realization of Offset Space Which can be Expanded)
Although the amount of user data is initially small, the user data is often added to and updated later, and its amount increases. That is, it is desirable that the offset space can be expanded. If the maximum size of the offset space is small, the user data can no longer be added or updated. It is, therefore, desirable that the limit on the maximum size is as loose as possible. Although the tree is constructed by one stage in the case of, for example, FIG. 6, the offset space can be enlarged by constructing such a structure by a plurality of stages. The system of expanding the offset space by increasing the number of stages is also used in UNIX.
(Balance of User Data and Management Information)
In the case of a relatively small amount of data, it is desirable from the viewpoint of access efficiency to store as much of the data as possible in the head 0-page. If the amount of management information is large, it crowds out the user data, data that would otherwise fit in the head 0-page is pushed to other pages, and the access efficiency deteriorates. On the other hand, in the case of long data, the total number of inputting/outputting times becomes the problem, so that, conversely, it is desirable to place as much management information as possible in the head 0-page or in the upper pages of the tree structure. That is, it is desirable that the balance between the user data and the management information can be adjusted according to the data amount.
(Realization of Continuous Area and Discontinuous Areas)
When storing long data, for example, data of 10000 bytes, in FIG. 6, it is sufficient to store the data in a range from the offset 0 of the 0-page 100-0 to the offset 9999 of the 2-page 100-2. When accessing the data of 10000 bytes stored in the offset space, however, only the data of the offsets 0 to 3999 is continuous, since it has been stored in the same 0-page 100-0. Since the data of the offsets 4000 to 7999 and the data of the offsets 8000 to 9999 have been stored in the different pages of the 1-page 100-1 and the 2-page 100-2, respectively, it is impossible to access them as one continuous area. Therefore, in the case where the data has been divided into a plurality of pages and stored in the offset space as mentioned above, the user usually prepares a user buffer that can hold the data of 10000 bytes, and the data of the offsets 0 to 3999, the data of the offsets 4000 to 7999, and the data of the offsets 8000 to 9999 are sequentially copied from the head of the user buffer, so that they can be accessed as continuous information. Such a method of copying the data into the user buffer is generally used to continuously access long data. In this case, however, the costs of copying are high. On the other hand, in the case of small data of, for example, 100 bytes, the area of 100 bytes can be secured as a continuous area on one page. In this case, there is no need to copy the data into the user buffer, and the data can be directly accessed as a continuous area. To realize an offset space of a high access efficiency, it is desirable that both the continuous area and the discontinuous areas can be realized. It is also desirable that the maximum size of a continuous area is as large as possible.
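The copy into the user buffer described above can be sketched as follows. This is a minimal illustration under the same assumption of 4000 data bytes per page; `segments` and `read_into_user_buffer` are hypothetical helper names, and the pages are modeled simply as a list of byte strings.

```python
DATA_BYTES_PER_PAGE = 4000  # assumed from the example offsets

def segments(offset: int, size: int) -> list:
    """Split the range [offset, offset + size) into (page, start_in_page, length) pieces."""
    pieces = []
    end = offset + size
    while offset < end:
        page, start = divmod(offset, DATA_BYTES_PER_PAGE)
        length = min(DATA_BYTES_PER_PAGE - start, end - offset)
        pieces.append((page, start, length))
        offset += length
    return pieces

def read_into_user_buffer(pages: list, offset: int, size: int) -> bytes:
    """Copy each piece, in order, from the head of a user buffer."""
    user_buffer = bytearray()
    for page, start, length in segments(offset, size):
        user_buffer += pages[page][start:start + length]
    return bytes(user_buffer)

print(segments(0, 10000))   # [(0, 0, 4000), (1, 0, 4000), (2, 0, 2000)]
print(segments(100, 100))   # [(0, 100, 100)]
```

A request that yields a single piece, such as the 100-byte case, lies within one page and could be accessed in place without any copy.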
According to the invention, there are provided a storing apparatus and a storing structure for realizing an offset space of high access efficiency by satisfying various conditions which are required for the offset space.
(Hypercube Structure)
A storing apparatus of the invention comprises: a hypercube constructing unit for realizing an offset space of a secondary storage as a data storing structure of a hypercube; and an access processing unit for accessing to an area of the requested offset space at a high speed by using a data structure of a hypercube. The hypercube constructing unit defines a hypercube by a dimension (d), a node (apex), and a side, allocates each of the pages in a range from the head page (top page) to the last page which divisionally constructs the offset space to the node of the hypercube, and sets a side for linking the pages from the node of the head page toward the node of the subsequent page. On the basis of the requested offset and size (the number of bytes), the access processing unit accesses to the requested page in accordance with a route determined by the dimension (d), node, and side of the hypercube.
According to the invention, by constructing the offset space in a hypercube shape as mentioned above, an efficient data access is realized. With respect to the sides, not all of the ordinary sides of the hypercube are used; only those sides which form a tree in the state where the pages are allocated are used. This is because not all of the sides are necessary for data access, and so long as there is a tree portion including all of the nodes, a highly efficient data access can be realized without deteriorating the access efficiency.
The nodes of the hypercube correspond to the pages of the offset space. The numbers allocated to the respective nodes of the hypercube are page numbers. The data is stored sequentially, starting from the page of the smallest number. For example, assuming that a page consists of 4 kB (4096 bytes) and that its data portion holds 4000 bytes, the offsets 0 to 3999 are stored in the 0-page, the offsets 4000 to 7999 are stored in the 1-page, and the offsets 8000 to 11999 are stored in the 2-page. While the data fits in the 0-page, only the 0-page exists. When the data no longer fits in the 0-page, the hypercube constructing unit secures the 1-page, thereby expanding the offset space. In a manner similar to the above, the offset space is sequentially expanded in order of 2-page, 3-page, . . . , n-page.
A hypercube 34 has the property that even if the number of nodes increases, the number of sides extending from one node does not increase much. For example, in the case of an n-dimensional hypercube, the number of nodes is equal to the nth power of 2 (2^n). However, the number of sides extending from one node is at most (n), and the total number of sides, considering the tree structure, is equal to (the nth power of 2) minus 1 (= 2^n − 1), which is smaller than the number of nodes by 1. There is also the preferable property that the distance from the original node to the farthest node, namely, the number of sides which are traced, does not increase much. Therefore, the structure of the hypercube is used in a parallel computer or the like. In a parallel computer, it is undesirable to increase the number of communication lines extending from one node, namely, from a processor, and yet it is also required to shorten the average distance between the nodes in order to reduce communication costs. The hypercube has a balanced structure in this respect. According to the invention, it is intended to balance the distance and the number of sides by using such a property of the hypercube, thereby realizing the offset space of a high access efficiency.
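The counts stated above can be verified numerically. The sketch below assumes one particular spanning tree of the hypercube, in which the parent of a node is obtained by clearing the lowest set bit of its number; this rule is an illustrative assumption, not necessarily the tree used by the invention. The names `parent` and `tree_stats` are hypothetical.

```python
def parent(node: int) -> int:
    """Assumed tree rule: clear the lowest set bit of the node number."""
    return node & (node - 1)

def tree_stats(n: int) -> tuple:
    """Return (nodes, tree sides, most sides leaving one node, longest distance from node 0)."""
    nodes = 1 << n
    children = [0] * nodes
    depth = [0] * nodes
    for node in range(1, nodes):
        p = parent(node)          # p < node, so depth[p] is already known
        children[p] += 1
        depth[node] = depth[p] + 1
    return nodes, sum(children), max(children), max(depth)

print(tree_stats(4))  # (16, 15, 4, 4)
```

For n = 4 this gives 16 nodes, 2^4 − 1 = 15 tree sides, at most 4 sides leaving one node, and at most 4 sides traced from node 0 to the farthest node.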
(Coexistence of User Data and Management Information)
The hypercube constructing unit constructs each page by a header portion and a data portion and stores, in the header portion, link information to the subsequent pages allocated to the nodes of the hypercube. Therefore, the access processing unit sequentially accesses from the head page toward the requested page with reference to the link information of each page. As link information in the header portion, the hypercube constructing unit sets a page identifier of a link destination, the obtainable maximum area size in the subsequent page, and the last offset of the subsequent partial hypercube. Therefore, when the requested offset is included in the area of the current page, the access processing unit finishes the route search of the hypercube and accesses the data in the relevant area. When the requested offset is not included in the area of the current page, the access processing unit selects and accesses the page of the subsequent route in which the requested offset is included. As mentioned above, the present invention fundamentally has a structure such that the user data and the management information are placed in the same page. This is because it is intended to realize the coexistence of the user data and the management information in consideration of the access efficiency. Further, according to the invention, the ratio between the user data and the management information can be customized. In the extreme case, the top page can hold only the management information.
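The route search described above can be sketched in memory. The sketch is hedged in several ways: it assumes 4000 data bytes per page, a side length of 2, and a tree in which the parent of a page is found by clearing the lowest set bit of its page number, so that each link covers a contiguous run of offsets. The `Link` record keeps only the page identifier and the last offset of the partial hypercube; the maximum area size field is omitted. All names are hypothetical.

```python
from dataclasses import dataclass, field

DATA_BYTES_PER_PAGE = 4000  # assumed data portion size

@dataclass
class Link:
    page_id: int      # page identifier of the link destination
    last_offset: int  # last offset of the partial hypercube below that page

@dataclass
class Page:
    first_offset: int
    links: list = field(default_factory=list)

def build(d: int) -> dict:
    """Allocate pages 0 .. 2**d - 1 and fill in the link information of each header."""
    pages = {k: Page(k * DATA_BYTES_PER_PAGE) for k in range(1 << d)}
    for k in range(1 << d):
        low = d if k == 0 else (k & -k).bit_length() - 1  # index of lowest set bit
        for j in range(low):
            child = k + (1 << j)
            last_page = child + (1 << j) - 1
            pages[k].links.append(Link(child, (last_page + 1) * DATA_BYTES_PER_PAGE - 1))
    return pages

def route(pages: dict, offset: int) -> list:
    """Pages read, in order, until the page holding `offset` is reached."""
    current, path = 0, [0]
    while not (pages[current].first_offset <= offset
               < pages[current].first_offset + DATA_BYTES_PER_PAGE):
        # Links are in ascending offset order; follow the first one that covers the offset.
        link = next(l for l in pages[current].links if offset <= l.last_offset)
        current = link.page_id
        path.append(current)
    return path

pages = build(4)
print(route(pages, 10000))  # [0, 2]
print(route(pages, 28000))  # [0, 4, 6, 7]
```

An offset beyond the last page is not handled here; `next` would raise `StopIteration`, which a fuller version would turn into an expansion of the offset space or an error.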
(Customization by the Dimension and the Length of Side)
In the case where the number of pages which are allocated onto the side including the nodes of the hypercube is defined as a length (e) of side, the hypercube constructing unit can set the length (e) of side to 3 or more and set the number of pages existing on one side to 3 or more. The hypercube constructing unit realizes an offset space adapted to various requests by adjusting the dimension (d) of the hypercube and the length (e) of side including the nodes. In a so-called hypercube, although two nodes exist on one side, a construction in which three or more nodes exist on one side is further possible in the invention. In the case where (e) nodes exist on the side, it is assumed that a length of this side is equal to (e). When three pages are arranged on the side, a length (e) of side is equal to 3. When two pages are arranged on the side, the length (e) of side is equal to 2. In the invention, the dimension (d) and the length (e) of side can be adjusted.
The pages existing on a side can be accessed by a single input from the page of the node at the end of the side that is nearer to the 0-page. For example, both the 1-page and the 2-page can be reached from the 0-page by a single input. This is because the link information to the 1-page and the 2-page existing on the same side is stored in the 0-page. This will now be explained in further detail. As a case where (e=2) nodes exist on one side, it is now assumed that there is a first node at the end of the side extending from the start node serving as node 0, a second node at the end of the side extending from the first node, and further a third node at the end of the side extending from the second node. In this case, for each side, one piece of link information is necessary in the node nearer to the start node, for example, the first node, in order to access the other node of this side, for example, the second node. In the second node, link information regarding the first node in the direction returning to the start node is unnecessary, and it is sufficient that there is only link information regarding the third node on the next side.
When (e=3) nodes exist on one side, for example, when attention is paid to a certain side on which three nodes, namely, the first, second, and third nodes exist, (e−1), namely, two pieces of link information pointing to the second and third nodes are necessary in the first node at the end of this side near the start node. On the contrary, with respect to the second and third nodes, link information in the direction returning to the first node is unnecessary. By enabling the length (e) of side to be set to 3 or more and enabling the dimension (d) and the length (e) of side to be adjusted as mentioned above, it is possible to cope with various requests for the offset space. For example, it is possible to cope with the following different requests.
I. It is known from the beginning that the data amount fits in one page and no link is necessary, and the user wants to insert as much user data as possible into the top page.
II. In future, there is a possibility that a data amount increases. Although a link for this purpose is necessary, since the data amount is initially small, the user wants to also insert data into the top page.
III. Since the data amount is large, although it is sufficient that the contents in the top page are constructed only by the link information, the user wants to raise the access efficiency as high as possible by reducing a troublesomeness for tracing the link.
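The trade-off between the dimension (d) and the length (e) of side can be illustrated with a few lines of arithmetic. The sketch rests on assumptions that are not stated explicitly in the text: that a structure with dimension d and side length e holds e to the power d pages, that the head page is the near end of d sides and therefore carries d × (e−1) links, and that one input is needed per dimension after the head page. The function names are hypothetical.

```python
def page_count(d: int, e: int) -> int:
    """Assumed number of pages: e nodes along each of d dimensions."""
    return e ** d

def links_in_head_page(d: int, e: int) -> int:
    """e - 1 links per side, for each of the d sides that start at the head page."""
    return d * (e - 1)

def worst_case_inputs(d: int) -> int:
    """Head page plus, under the stated assumption, one input per dimension."""
    return d + 1

for d, e in [(0, 2), (4, 2), (2, 4)]:
    print(d, e, page_count(d, e), links_in_head_page(d, e), worst_case_inputs(d))
# 0 2 1 0 1
# 4 2 16 4 5
# 2 4 16 6 3
```

Under these assumptions, d = 0 corresponds to request I (one page, no links), while (d=4, e=2) and (d=2, e=4) both give 16 pages but trade header space in the top page against the number of inputs, which is the kind of adjustment requests II and III call for.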
(Expansion of Space)
When the offset space in which the pages have been allocated onto the previously constructed hypercube becomes full, the hypercube constructing unit can expand the offset space by linking a new hypercube. According to the invention, the offset space is first enlarged by adding pages to the hypercube designated by the dimension (d) and the length (e) of side. It is unnecessary to prepare those pages from the beginning. For example, the offset space can be expanded from the 0-page to the 15-page. However, when the 15-page is reached, the offset space becomes full. If the offset space could not be expanded any further, then in a case where the amount by which the data will grow cannot be known in advance, the dimension (d) and the length (e) of side would have to be designated larger than needed, which is inconvenient. Therefore, when the offset space constructed in the beginning becomes full, the offset space can be expanded by linking a new hypercube. However, expansion beyond the offset space of the initially set hypercube should not be used often because the access efficiency deteriorates. When the size of data can be estimated, the dimension (d) and the length (e) of side should be designated appropriately.
(Realization of Continuous Area and Discontinuous Areas)
The hypercube constructing unit can make the sizes of the pages which are allocated to the hypercube different. For example, the sizes of the pages which are allocated to the hypercube are set to a minimum size and to multiples of the minimum size, for instance, the minimum size multiplied by a power of 2. For example, now assuming that the minimum size is set to 4 kB, the larger page sizes are set to 8 kB, 16 kB, 32 kB, . . . , each being the minimum size multiplied by a power of 2. Even if the sizes of the pages are fixed to, for example, 4 kB, the data generally becomes discontinuous on the secondary storage. If the user prepares a continuous area in the buffer, the data can still be accessed as a continuous area by copying it from the pages to the buffer. However, the costs of copying are high. Therefore, in the invention, by providing the page sizes of 8 kB, 16 kB, 32 kB, . . . , data which does not fit in a page of 4 kB can be directly accessed as a continuous area without copying it.
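Choosing a page size for a given continuous area can be sketched as below. The sketch assumes a minimum page size of 4096 bytes doubled repeatedly, and for simplicity it ignores the space taken by the header portion; `page_size_for` is a hypothetical name.

```python
MIN_PAGE_BYTES = 4096  # assumed minimum page size (4 kB)

def page_size_for(size: int) -> int:
    """Smallest page size of the form 4 kB x 2**k that can hold `size` bytes contiguously."""
    page = MIN_PAGE_BYTES
    while page < size:
        page *= 2
    return page

print(page_size_for(100))    # 4096
print(page_size_for(10000))  # 16384
```

With such a choice, the 10000-byte data of the earlier example would occupy a single 16 kB page and could be accessed in place.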
The invention provides a data storing structure itself and comprises: an offset space of a secondary storage constructed by an external storing apparatus; and a hypercube constructing unit for realizing the offset space as a data storing structure of a hypercube. The details of the hypercube constructing unit are substantially the same as those in case of the storing apparatus.
The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description with reference to the drawings.