1. Field of the Invention
The present invention relates to a computer system using a virtual memory, and more particularly, to the memory management and the protection for controlling memory accesses to the virtual memory in the computer system.
2. Description of the Background Art
Conventionally, in a computer system using a virtual memory, a so called MMU (Memory Management Unit) has been used in translating the logical addresses to the physical addresses and protecting the memory region specified by the physical addresses. In a usual MMU, a program number uniquely assigned to each program is utilized for distinguishing a plurality of programs which can make accesses to the memory, such that the same logical address can be translated into different physical addresses for the different programs.
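By way of illustration only, the role of the program number in the address translation described above can be sketched as follows. The page size, the table contents, and the names `PAGE_SIZE`, `address_table`, and `translate` are assumptions introduced for this sketch; they are not taken from any particular MMU.

```python
PAGE_SIZE = 4096  # assumed page size; the text does not specify one

# Hypothetical address table: (program number, logical page) -> physical page.
address_table = {
    (1, 0x10): 0x2A,  # program 1 maps logical page 0x10 to physical page 0x2A
    (2, 0x10): 0x7F,  # program 2 maps the same logical page elsewhere
}

def translate(program_number, logical_address):
    """Return the physical address, or None on an address table miss."""
    logical_page, offset = divmod(logical_address, PAGE_SIZE)
    physical_page = address_table.get((program_number, logical_page))
    if physical_page is None:
        return None
    return physical_page * PAGE_SIZE + offset
```

In this sketch the same logical address 0x10004 yields different physical addresses for program 1 and program 2, and a program with no matching entry obtains a miss.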
In such a computer system using a virtual memory, it has become popular in recent years to employ a programming mode such as a server-client type programming in which a plurality of programs sharing the same data are executed in mutual cooperation, as a result of the recent progress made in network techniques and parallel processing techniques. In such a programming mode, a plurality of memory accesses are made from a plurality of programs to the identical physical address storing the shared data, so that it becomes necessary to provide a memory region protection in order to limit the memory accesses made from the plurality of programs to only those which are judged as proper ones.
To this end, a conventionally employed memory region protection method has been that which utilizes the program number as the identifier for indicating a program from which each memory access originates. Namely, in such a conventional memory region protection method, each data is accompanied by the program number of the program which may make accesses to this data, while the program numbers available to each program are appropriately assigned to each program as an identifier in advance. Then, the access to each data is permitted only when this access is judged as a proper one having the identifier indicating the program number which matches with the program number accompanying each data. The well known examples of this type of a conventional memory region protection method include a segmentation method and a ring protection method.
In a segmentation method, a dedicated memory region can be secured for each program by assigning a unique program number available only to that program, while a shared memory region for a plurality of programs can be secured by assigning a program number commonly available to these plurality of programs, so that the highly flexible memory region protection can be realized.
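By way of illustration only, the program number matching described above, with dedicated and shared regions, might be modelled as follows. The region names, the program names, and the function `access_permitted` are hypothetical and are introduced solely for this sketch.

```python
# Hypothetical assignment: each memory region is accompanied by one program number.
region_program_number = {
    "dedicated_a": 1,  # number available only to program_a
    "dedicated_b": 2,  # number available only to program_b
    "shared": 3,       # number commonly available to both programs
}

# Program numbers made available to each program in advance.
available_numbers = {
    "program_a": {1, 3},
    "program_b": {2, 3},
}

def access_permitted(program, region):
    """Permit the access only when the program holds the region's program number."""
    return region_program_number[region] in available_numbers[program]
```

Each program reaches its own dedicated region and the shared region, while the dedicated region of the other program is refused. Note that each program must hold more than one program number, which corresponds to the plurality of identifier storage registers and comparators discussed next.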
However, in order to deal with a plurality of program numbers, it becomes necessary in this segmentation method to provide a plurality of identifier storage registers and a plurality of identifier comparators. As a consequence, this segmentation method has been associated with the following problems.
Firstly, there is a case in which a number of identifier storage registers are occupied with respect to a single physical address, such that the address translation for the other physical addresses cannot be carried out efficiently. For example, in a case where the number of entries in the address table is fixed, when a plurality of entries are occupied by a number of identical logical and physical address pairs with different program numbers assigned, the number of distinct physical addresses that can be stored in the address table is reduced considerably. Such a situation is equivalent to a case in which the address table covers only a limited range of the address space. When the covered range of the address space is limited, the probability that an address given from the processor hits in the address table becomes lower and the number of address table misses increases, such that the time required for the address table miss recovery operation increases. This time required for the address table miss recovery operation is counted as a part of the overall address translation time, so that the average overall address translation time is increased considerably in such a case.
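By way of illustration only, the loss of coverage described above can be observed in a small simulation. The least-recently-used replacement policy, the table capacity, the access trace, and the name `hit_rate` are all assumptions made for this sketch. The case in which one entry serves every program is a hypothetical baseline for comparison and offers no protection.

```python
from collections import OrderedDict

def hit_rate(trace, capacity, per_program_entries):
    """Simulate a fixed-size address table with LRU replacement (assumed policy)."""
    table = OrderedDict()
    hits = 0
    for program_number, page in trace:
        # With per-program entries, the same page occupies one entry per program number.
        key = (program_number, page) if per_program_entries else page
        if key in table:
            hits += 1
            table.move_to_end(key)          # mark as most recently used
        else:
            if len(table) >= capacity:
                table.popitem(last=False)   # evict the least recently used entry
            table[key] = True
    return hits / len(trace)

# Four programs repeatedly touch the same 16 shared pages.
trace = [(p, page) for _ in range(10) for p in range(4) for page in range(16)]

duplicated = hit_rate(trace, capacity=32, per_program_entries=True)
single = hit_rate(trace, capacity=32, per_program_entries=False)
```

In this trace the 16 shared pages need 64 entries when every program number has its own entry, which exceeds the 32-entry table, so `duplicated` is lower than `single`.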
Secondly, when the program to be executed is switched from one program to another, the entries accompanied by the program numbers used in the previously executed program are invalidated regardless of whether the program numbers are shared by the next program to be executed, so that the operation for filling the address table entries is required even for the shared program numbers. For example, in a case where two programs sharing the logical address space are to be executed alternately, when a new entry is required, the entries accompanied by the program numbers used in the previously executed program are invalidated regardless of whether these entries belong to the shared logical address space or not. In order to fill these invalidated entries anew, it becomes necessary to carry out the operations of address translation and program number matching for each entry all over again, and the time required for these address translation and program number matching operations is counted as a part of the overall address translation time, so that in a case of the switching of the program to be executed, the average overall address translation time is increased considerably. In the worst case, the time required for these address translation and program number matching operations can occupy the major part of the overall address translation time, as the invalidation of the entries occurs every time the program to be executed is switched.
Thirdly, it becomes necessary to check the properness of the rights to make accesses for a plurality of programs simultaneously, so that the operation for checking the properness of the rights to make accesses can be quite complicated. For example, in a case where a plurality of programs share the identical logical address space, an entry in the address table is set up for each of the program numbers. In such a case, when a page swapping occurs, it becomes necessary to invalidate all the entries related to the specific logical address or physical address, whereas when a plurality of entries make hits, it becomes necessary to select the valid one among the plurality of entries making hits, and the time required for these operations for invalidating all the related entries and selecting the valid entry is counted as a part of the overall address translation time. Here, the number of related entries is unknown, so that the operation can be quite complicated as a great number of different states must be accounted for.
Now, there is an alternative manner of handling a plurality of program numbers in the segmentation method in which a specialized instruction is provided in the processor side to specify the program number at a time of execution. The switching operation using such a specialized instruction is effective in reducing the number of identifier storage registers so that the time required for the exchange of the segment registers can be shortened. However, in this case, each execution of each instruction becomes quite time consuming, so that it has been difficult to improve the throughput of the program. Thus, in the segmentation method, it is easy to separate the different programs completely, but it requires a large amount of additional hardware and complicated operations to share the data among the programs.
In addition, in the segmentation method, there is a problem that it is necessary to provide a flexible protection check mechanism capable of dealing with different types of accesses differently permitted to different programs and different levels of the rights to make accesses differently endowed to different programs. For example, in a case a plurality of programs share the data, very frequently, there is a case in which only one program is permitted to update the data and the other programs are only permitted to read this data.
Also, there is a case in which it is necessary to provide the hierarchical protection among the programs according to the difference of the content and the level of the programs such as Kernel, OS, and application programs. The method to achieve such a hierarchical protection is known as a ring protection method.
In order to cope with such cases, it is necessary to provide each entry of the address table with a field for specifying a data access type and a field for specifying a ring level, so as to distinguish the data in different specified levels as different entries.
In the ring protection method, the program numbers are hierarchically ordered to establish strength relationships in which the transition of the control toward the stronger region is limited to only those which are made through a proper procedure, and the data access toward the stronger region is prohibited, such that the directionality can be provided in the accesses among the protected regions. In this ring protection method, one program number can have a right to make accesses to a plurality of regions, so that the number of identifiers can be reduced and the time required for switching the identifiers can be shortened. In addition, the check of the properness of the program number can be made by comparing the sizes of the identifier of the accessing side and the identifier of the accessed side at a time of the access, so that there is no need to provide a specialized instruction to the processor side in this ring protection method.
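By way of illustration only, the comparison of identifier sizes described above can be sketched as a single comparison of levels. The numbering convention (a smaller number denotes a stronger level), the level names, and the function `data_access_permitted` are assumptions made for this sketch.

```python
# Assumed convention: a smaller number denotes a stronger protection level.
KERNEL, OS, LIBRARY, APPLICATION = 0, 1, 2, 3

def data_access_permitted(accessor_level, target_level):
    """Data access toward a stronger region is prohibited; all other accesses pass."""
    return accessor_level <= target_level
```

A single comparison suffices, so no specialized instruction is needed. The same comparison also means that a region at any level is reachable from every stronger level.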
However, this ring protection method has a drawback that the set up of the protection regions is not very flexible because it is predetermined that the overlapping region between the region for one identifier and the region for another identifier is to be regarded as the region for the stronger identifier.
Now, in general, in the ring protection method, a kernel program is positioned at the strongest level, and the protection levels of the other programs are determined according to the absolute strength relationship of each program with respect to the kernel program.
However, in a case of the server-client type programming in which a plurality of programs are executed in mutual cooperation, there is a case in which the hierarchical relationship among the program changes in every execution, so that it has been difficult to set up the absolute protection level for each program in advance.
The strength relationships established among the memory regions in the ring protection method are shown in FIG. 1, in which segment-1 to segment-4 correspond to different programs such as kernel, OS, library, and application programs. Here, in order for each of the kernel program and the OS program to have a dedicated data region, the segment-3 and the segment-4 corresponding to the OS program and the kernel program must be assigned to the same protection level as shown in FIG. 1, and the dedicated data regions of these two programs must be managed by a method other than the ring protection method, so as to make the dedicated data region of one program hidden from the other program.
However, in FIG. 1, there are four protection levels provided in correspondence to four programs, so that when the kernel program and the OS program are assigned to the same protection level, there will be a protection level to which no program is assigned. In other words, there are cases in which programs having no hierarchical relationship with each other must be assigned to the same protection level, and a protection level having no program assigned exists. Such a conflict of the hierarchical relationships occurs with a higher probability as the number of programs to be allocated into the protection space increases, and makes the management of the different protection levels difficult.
Thus, the ring protection method is suitable for a case in which the cooperative relationships among the programs are simple and fixed, but lacks the ability to express semi-ordered hierarchical relationships among the programs.
As a consequence, in the ring protection method, it is impossible for the memory protection device to set up the strength relationships among the programs dynamically according to the progress of the programs. In addition, in the ring protection method, a region belonging to a certain protection level is accessible from the regions belonging to the stronger protection levels, so that it has been impossible to provide a dedicated region for a program belonging to an intermediate protection level.
Moreover, in the ring protection method, there is a possibility for the entries of the address table to be occupied by the identical logical and physical address pairs, so that the ring protection method also has the drawback of the extended average memory access time, similar to the segmentation method. Furthermore, in the ring protection method, there is a case in which a plurality of entries make hits in an address table look up, and it becomes necessary in such a case to select the valid one among the plurality of entries making hits. Also, when a number of programs share the OS region, it is required to manage a certain group of programs collectively, but this has been impossible in the conventional ring protection method and it would have been necessary to manage each entry separately.
Now, the virtual memory to be used in a computer system adopts either a single virtual space scheme in which a plurality of programs to be executed in parallel are loaded into one virtual space, or a multiple virtual space scheme in which each of the plurality of programs to be executed in parallel is allocated to a separate one of a plurality of virtual spaces.
For example, in the UNIX system, the different processes are allocated to different virtual spaces. For this reason, when the multiple processes are executed, the virtual space must be switched when the process is switched and the hit rates for the TLB (Translation Look-aside Buffer) and cache devices can be deteriorated considerably. In other words, the TLB and cache devices which are provided for the purpose of increasing the execution speed would not operate effectively in such a multiple process environment.
Here, it is noted that, in the TLB and cache devices, the data and the physical addresses are searched by using the virtual addresses used in each program as keys, and the address positions are usually fixed regardless of the program so that the virtual address positions are common to all the programs. For this reason, when the data of the previously executed program are left in the TLB and cache devices, there is a possibility for causing an erroneous hit during the execution of the new program after the program switching.
In order to prevent the occurrence of such an erroneous hit, conventionally, a space number for identifying each virtual space has been provided in addition to the virtual addresses, and the consistency of the TLB and cache devices with respect to the main memory device are checked every time the context is switched. However, these conventional provisions require an extra amount of hardware, and also make the software operation to be used in the context switching very complicated, so that the increase of the execution speed could not have been realized without introducing considerable disadvantages in other practical aspects.
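By way of illustration only, the erroneous hit and its prevention by a space number can be sketched with a toy TLB. The class `Tlb`, its methods, and the page values are hypothetical and are not modelled on any particular device.

```python
class Tlb:
    """Toy TLB keyed by virtual page, optionally combined with a space number."""

    def __init__(self, use_space_number):
        self.use_space_number = use_space_number
        self.entries = {}

    def _key(self, space_number, virtual_page):
        if self.use_space_number:
            return (space_number, virtual_page)
        return virtual_page

    def fill(self, space_number, virtual_page, physical_page):
        self.entries[self._key(space_number, virtual_page)] = physical_page

    def lookup(self, space_number, virtual_page):
        """Return the cached physical page, or None on a miss."""
        return self.entries.get(self._key(space_number, virtual_page))

# Without a space number, an entry left by space 1 is hit by space 2.
untagged = Tlb(use_space_number=False)
untagged.fill(space_number=1, virtual_page=0x40, physical_page=0x100)
erroneous = untagged.lookup(space_number=2, virtual_page=0x40)

# With a space number, the same lookup from space 2 misses.
tagged = Tlb(use_space_number=True)
tagged.fill(space_number=1, virtual_page=0x40, physical_page=0x100)
safe = tagged.lookup(space_number=2, virtual_page=0x40)
```

Here `erroneous` returns the physical page belonging to the other virtual space, whereas `safe` is a miss. The extra key field corresponds to the additional hardware mentioned above.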
Despite this difficulty, the performance under such a multiple process environment has become increasingly important because of the increasing use of the server-client type programming in which a plurality of programs sharing the same data are executed in mutual cooperation, due to the considerable advantages of the server-client type programming that the server program and the client program can be developed separately and flexibly such that each program can be developed to have a wider applicability and a longer lifetime.
In this regard, the size of the OS has been increased so much recently, because of the increasing number of functions to be supported by the OS, such that the OS is no longer provided as a single program as it used to be but separated into a plurality of mutually cooperating programs according to the types of the functions to be supported. Similarly, there are cases in which the application programs are also provided as a plurality of mutually hidden programs, in order to improve the software productivity. In such a case in which a plurality of programs are to be operated in mutual cooperation, the operations related to the context switching and the copying of the argument which are to be carried out by the data processing unit become quite time consuming.
In order to resolve this problem, it has been considered desirable to omit the operation related to the switching of the programs by allocating a plurality of programs to a single virtual space. Here, in a conventional scheme such as that shown in FIG. 2 in which a plurality of user programs 601 to 605 are separately allocated to separate virtual spaces under the OS 600 while these user programs 601 to 605 are using the same address region, it is impossible for one user program to make an improper access to the data of the other user program as each user program is allocated to separate virtual space. However, when the scheme to allocate a plurality of programs to a single virtual space is adopted, the data of the other programs, which have been inaccessible in the multiple virtual space scheme, become accessible as all the programs are allocated to the same virtual space, so that there is a need to provide a mechanism to control the accesses according to the properness of each access.
However, when the conventional segmentation method is utilized for this purpose, an enormous amount of hardware would be required such that the switching of the access regions would be quite complicated, while when the conventional ring protection method is utilized for this purpose, the protection ranges cannot be set up freely among the programs because the strength relationships cannot be changed dynamically in the ring protection method. Consequently, it has conventionally been difficult to realize the simple and flexible protection of the memory regions among a plurality of programs in which the protection ranges can be set up freely.
In summary, there have been the following three major problems in the conventional memory management and protection system for controlling memory accesses to the virtual memory in a computer system adopting a server-client type programming mode in which a plurality of programs sharing the same data are executed in mutual cooperation.
First of all, as the programs are allocated to different logical address spaces separately, it becomes necessary for the programs allocated to different address spaces to exchange the data through the operating system (OS) in order to share the data among them, but this causes a considerable slow down of the processing speed due to the overhead of the OS.
Here, it is possible to devise a method in which a plurality of threads are executed on a single logical address space such that the sharing of the data among the threads can be achieved without causing any overhead to the OS.
However, such a method does not account for the protection of the data among the threads, so that in order to protect the data used by one thread from the other threads, it becomes necessary to employ a scheme requiring a large overhead to the OS. Namely, to this end, it becomes necessary to allocate each thread to different logical address space along with those data which are accessible from this thread. This scheme actually requires a considerably large overhead to the OS so that it is quite impractical to adopt this scheme.
Thus, the first major problem is that it has been impossible for each program to have the different types of accesses that are permitted to each thread.
Secondly, in order to provide a logical address space for each thread separately, it becomes necessary to provide a page table for each logical address space and allocate the same program to the same address region in the different logical address spaces. However, this in turn requires all the threads which are commonly accessible to a certain address region to have the same logical address and physical address pair in respective page tables, such that the memory capacity to be used by the page tables of all the threads must be provided redundantly to a large extent.
Here, when the types of accesses permitted to all the threads which can access the commonly accessible address region are identical, it becomes possible to share a part of the page tables among a plurality of threads, but in a case of using a TLB (Translation Look-aside Buffer) as a cache device for the page table, the redundancy cannot be avoided and the average processing speed for the address translation is inevitably slowed down accordingly.
The reason for this drawback is as follows. Namely, each entry of the cache of the page table has a pair of a page table information and a thread number, so that different cache entries are required for different threads even when the page table information is identical. Consequently, a number of cache entries having the same page table information but different thread numbers will occupy a large portion of the cache, such that a number of actually different physical addresses that can be stored in the cache will be reduced, and therefore the processing speed for the address translation will be slowed down.
In addition, there is also a drawback that the information related to the same page must be placed and managed on a plurality of page tables, such that the processing for the paging becomes quite complicated. For instance, in a case a plurality of threads are mapping the identical physical address to different logical address spaces, when it becomes necessary to page out a certain page, it becomes necessary to search and invalidate all the entries of the page tables which are mapped to that certain page. Similarly, in a case of page in, it becomes necessary to search and validate all the entries of the page tables which are mapped to that certain page. The additional time required for these operations will be added to the average address translation processing time, so that the processing speed for the address translation will be slowed down.
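By way of illustration only, the search required at a page out can be sketched as follows. The per-thread page table layout, the thread names, and the function `page_out` are hypothetical and are introduced solely for this sketch.

```python
# Hypothetical per-thread page tables: thread -> {logical page: [physical page, valid]}.
page_tables = {
    "thread_a": {0x10: [0x2A, True], 0x11: [0x2B, True]},
    "thread_b": {0x20: [0x2A, True]},
    "thread_c": {0x30: [0x2A, True], 0x31: [0x2C, True]},
}

def page_out(physical_page):
    """Invalidate every mapping of physical_page.

    Returns (entries visited, entries invalidated).
    """
    visited = invalidated = 0
    for table in page_tables.values():
        for entry in table.values():
            visited += 1
            if entry[0] == physical_page and entry[1]:
                entry[1] = False
                invalidated += 1
    return visited, invalidated

# Physical page 0x2A is mapped by all three threads at different logical pages.
result = page_out(0x2A)
```

Since the number of mappings of a page is not known in advance, every entry of every page table is visited in this sketch, even though only three of them refer to the page in question.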
Thus, the second major problem is that the memory capacities required for the page tables and the page table caches are redundant and the management of the page tables and the page table caches becomes quite complicated and inefficient.
Thirdly, it has been considered effective in some applications to change the access rights permitted to each thread depending on which thread is executing a program on which address region in each logical address space. For example, when the data on a certain address region can be changed only by the thread executing the programs on another certain address region, the data on that certain address region can be changed while maintaining certain conditions. In a case the data of the database are allocated to that certain address region while the access routines for the data of the database are allocated to that another certain address region, it becomes possible to make accesses to the data of the database without requiring the intermediate use of the OS, while protecting the data of the database sufficiently, so that the high speed access processing becomes possible.
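By way of illustration only, the database example above can be modelled as access rights that depend on the address region in which the thread is currently executing. The region names, the table `rights`, and the function `permitted` are hypothetical; this sketch shows the desired behaviour and not any particular mechanism for realizing it.

```python
# Hypothetical rights table:
# (region being executed, region being accessed) -> permitted access types.
rights = {
    ("db_routines", "db_data"): {"read", "write"},
    ("application", "db_data"): set(),
    ("application", "app_data"): {"read", "write"},
}

def permitted(executing_region, target_region, access_type):
    """Decide the access from where the thread is executing, not from who it is."""
    return access_type in rights.get((executing_region, target_region), set())
```

The same thread may change the database data while executing the access routines, but is refused once it returns to the application region, without any intermediate use of the OS.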
On the other hand, a conventional memory management unit such as that of Intel's 80486 processor has been capable only of changing the access rights by limiting the range of the programs that can be executed by each thread by using the segmentation scheme and shifting from one access level to another access level by using a call gate. In other words, such a conventional memory management unit has been lacking in flexibility in that the access rights can only be controlled among the elements related by an order relationship, such as the application programs with respect to the OS and the kernel, such that the element at a higher access level is able to access all the elements at lower access levels, because the number of different access levels realizable has been limited by the hardware.
Thus, the third major problem is that it has been impossible to flexibly change the permitted access rights for each address region, in a case the programs and data are allocated to a plurality of divided address regions within the single logical address space.