This invention relates to digital computer systems, and specifically to a virtual memory having recovery capabilities.
In the future, as users of state-of-the-art symbolic computing machines develop large-scale, knowledge-based applications, they are expected to encounter major storage management problems in supporting large and complex knowledge/data bases. The word storage is used herein in a broad sense to encompass virtual memory, file systems and databases. The problems can be primarily attributed to the dichotomy by which today's computers, including state-of-the-art symbolic computers such as the Texas Instruments EXPLORER and the Symbolics 3670, manage storage along two entirely different organizations. These organizations can be referred to as the computational storage and the long-term storage.
In symbolic/artificial intelligence (AI) processing, a representation of knowledge is a combination of data structures and interpretive procedures that, if used in the right way in a program, will lead to "knowledgeable" behavior. The goals of AI systems can be described in terms of cognitive tasks like recognizing objects, answering questions, and manipulating robotic devices. The most important consideration in formulating a knowledge representation scheme is the eventual use of the knowledge. The actual use of the knowledge in symbolic/AI programs involves three stages: (1) acquiring more knowledge, (2) retrieving facts from the knowledge base relevant to the problem at hand, and (3) reasoning about these facts in search of solutions. A number of different knowledge representation schemes, such as state-space representation, logic, procedural representation, semantic nets, production systems, and frames, have been developed by the knowledge representation community. The choice of the knowledge representation scheme very much depends on the application requirements.
No matter which knowledge representation scheme is used, at some sufficiently low level of representation the knowledge is represented by memory objects interconnected by pointers. These objects exhibit a structure which is defined by the interconnection graph of pointers connecting the objects. The structure of objects created and manipulated by symbolic AI applications is usually very rich and complex. Moreover, both the information in objects, as well as the structure of objects, can undergo rapid changes.
In symbolic computing, objects representing a knowledge base are created and manipulated in the computational storage. As its name implies, the computational storage contains objects to be manipulated by the processor of a computer system. These objects can be numbers, strings, vectors, arrays, records, linked lists, instructions, procedures, etc. These objects, both small and large, are usually identified by names. The names of objects serve as convenient handles or pointers that can be passed as procedure parameters, returned as procedure results, and stored in other objects as components. The names of objects are typically implemented as their virtual addresses. Programmers create and manipulate objects by using programming languages, such as LISP and PROLOG.
Typically, the computational storage is implemented as virtual memory, which consists of a hierarchy of memories: a fast, small semiconductor main memory, backed up by a slow, large disk to support paging. Objects in the computational storage are accessed very rapidly as the processor can directly access them by specifying their addresses (real or virtual), often at a speed that matches the basic processor cycle time. The information stored in these objects is also processed and manipulated very efficiently as it is stored in a format defined by the processor architecture, and can therefore be directly interpreted by the processor hardware or microcode.
Often, the information stored in the computational storage has a very rich structure; i.e., objects in the computational storage are interconnected by a rich and complex structure of pointers to match the requirements of applications at hand. The structure of these objects is often dynamic. However, objects in the computational storage do not exist beyond the lifetimes of the programs that create them. When a program terminates, or when a system shutdown or crash occurs, these objects cease to exist. Therefore, they are called short-lived or transient objects. To make these objects survive beyond the lifetimes of the programs that created them, i.e., to make them long-lived or persistent, they must be moved to the other storage organization, i.e., the long-term storage.
As its name implies, the long-term storage is used to keep information for long periods of time. It is typically implemented on a disk-resident file system. The disk file system is logically different from the paging disk of the computational storage, even though the physical disk media may be shared by both. Examples of information stored in the long-term storage are files, directories, libraries, and databases. The long-term storage retains information in a reliable fashion for long periods of time. In order to store information beyond the lifetime of the program that creates it in the computational storage, the information needs to be first mapped into a representation expected by the long-term storage and then transferred to it for long-term retention using a file input/output (I/O) operation or a database operation.
The types of objects supported by the long-term storage are very restrictive (essentially files, directories, relations, etc.), and may not match the data structure requirements of many applications. The representation of information in the long-term storage is quite "flat." For example, a file may consist of a sequential stream of bits or bytes, such as ASCII characters. Files or relations usually can hold neither procedural objects nor pointers to other objects in the long-term storage. Information in these objects can neither be directly addressed nor directly processed by the processor, because its representation is not compatible with the processor architecture. The information can be processed only after it is mapped into a representation expected by the computational storage and then transferred to it for processing. The translation overhead in mapping these objects to/from a collection of files is also quite substantial.
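By way of illustration only, the translation described above can be sketched in Python. The sketch assumes a hypothetical `Node` object whose components are direct pointers, and shows that before such a structure can be written to a flat file every pointer must be replaced by a record number, and that every record number must be turned back into a pointer when the file is read. JSON text stands in for the file; the names `Node`, `to_flat_text`, and `from_flat_text` are illustrative and do not describe any particular file system or DBMS.

```python
import json
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class Node:
    """Hypothetical in-memory object whose components are direct pointers."""
    value: str
    refs: list["Node"] = field(default_factory=list)


def to_flat_text(root: Node) -> str:
    """Translate a pointer graph into flat records; each pointer becomes a record number."""
    index: dict[Node, int] = {}
    order: list[Node] = []
    stack = [root]
    while stack:  # depth-first walk; the root always receives record number 0
        node = stack.pop()
        if node in index:
            continue  # already numbered (shared or cyclic reference)
        index[node] = len(order)
        order.append(node)
        stack.extend(node.refs)
    records = [[n.value, [index[r] for r in n.refs]] for n in order]
    return json.dumps(records)


def from_flat_text(text: str) -> Node:
    """Reverse translation: rebuild objects, then turn record numbers back into pointers."""
    records = json.loads(text)
    nodes = [Node(value) for value, _ in records]
    for node, (_, ref_numbers) in zip(nodes, records):
        node.refs = [nodes[i] for i in ref_numbers]
    return nodes[0]
```

Both directions must visit every object and rewrite every pointer, which is the time and space overhead attributed above to the storage dichotomy.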
In addition to the time overhead for translation and mapping of objects between the computational and long-term storages, there is additional space overhead, as the information is essentially duplicated in virtual memory and the file system. There is an apparent paradox in that the computational storage, usually implemented as a virtual memory, hides the existence of the paging disk store; on the other hand, the long-term storage makes the existence of the disk explicit to the programmer. Thus, the programmer is faced with a nonuniform storage model, where differences in addressing, function, and retention characteristics between the computational and long-term storages are visible above the processor architecture level.
Programming languages, such as FORTRAN, Pascal, LISP, and Prolog, strongly reflect the dichotomy in storage organization. The specification of these languages almost invariably assumes long-term storage objects (files) to have entirely different characteristics from computational objects. As a result, these programming languages cannot directly process information in the long-term storage the way they can process information in the computational storage. This dichotomy propagates throughout the whole system and cannot be hidden from the user. It shows up in differences between names used for programming language objects and names used for files and databases.
The dichotomy also shows up in a different set of languages that has evolved to process information in the long-term storage. These languages include various so-called command languages, such as the UNIX shell language and the IBM TSO Command Language, that are responsible, among other things, for performing operations on files. The other class of languages which operate on persistent objects are various database languages, such as Square, Sequel, and Quel. These languages can define database objects, and perform queries and updates on them. Such languages are often interpreted, and are restrictive and arcane in nature compared to the more familiar programming languages, which also enjoy the efficiency of compiled rather than interpreted execution.
As a consequence, the programmer must be aware of the nonuniform storage model, and must explicitly move information among storage media, based on the addressing mechanisms, functions and retention characteristics desired. Another consequence is that the nonuniform storage model is an obstacle to programming generality and modularity, as it increases the number of potential types of interfaces among programs. The hodgepodge of mode-dependent programming languages, such as command languages, programming languages, debugging languages, and editing languages, makes fast and efficient interaction with the system difficult.
The mapping between transient and persistent objects is usually done in part by the file system or the data base management system (DBMS) and in part by explicit user translation code which has to be written and included in each program. This task imposes both space and time penalties, and degrades system performance. Frequently the programmer is distracted from his task by the difficulties of understanding the mapping and managing the additional burden of coping with two disparate worlds: the programming language world and the DBMS world.
In large data-intensive programs there is usually a considerable amount of code, which has been estimated to be as high as 30% of the total, concerned with transferring data between files or a database, and the computational storage. Much space and time is wasted by code that performs translations between the transient and persistent object worlds, which adversely affects performance. This is unsatisfactory because the effort and time required to develop and execute the translation code can be considerable, and also because the quality and reliability of the application programs may be impaired by the mapping. The storage dichotomy also gives rise to much duplication of effort in the operating system design and DBMS design.
These problems, created by the storage dichotomy, are considerably further complicated for symbolic/AI computing. Processes on current symbolic machines share a single address space: i.e., there is no per-process address space. Moreover, the address space is not segmented, but is a single, linear address space. Such a model of the computational storage allows easy, efficient and flexible sharing of objects among multiple processes. Any object can point to any other object by simply holding a pointer to that object (usually implemented as a virtual address of the object being pointed to). Arbitrarily complex structures of objects interconnected by pointers can be created and manipulated. Such powerful structuring of objects is very important for the development of the highly integrated and powerful software development environments available on these symbolic computers.
Unfortunately, current symbolic computers make a distinction between the computational and long-term storages, similar to today's conventional computers. In symbolic computers, making a single object persistent by moving it to a file system is not very meaningful; all objects that can be reached from an object by following all out-going pointers also need to be made persistent as a single entity, and all in-coming pointers pointing to the entity must be "properly taken care of." Such an entity, however, can be very large and moving it to a file system would be a complicated and expensive operation. Conversely, the reverse move from a file system to the computational storage would be equally complicated and expensive.
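The entity that must be made persistent together is the transitive closure of out-going pointers from the chosen object. A minimal Python sketch, assuming a hypothetical heap modelled as a dictionary from object names to the names of the objects they point to, computes that closure with a breadth-first traversal and lists the in-coming pointers from objects outside the closure that would still have to be dealt with. The function names are illustrative only.

```python
from collections import deque


def persistent_closure(heap: dict[str, list[str]], root: str) -> set[str]:
    """Return every object reachable from `root` by following out-going pointers."""
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first traversal of the pointer graph
        obj = queue.popleft()
        for target in heap[obj]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def incoming_from_outside(heap: dict[str, list[str]], closure: set[str]) -> list[tuple[str, str]]:
    """List (source, target) pointers that enter the closure from objects outside it."""
    return sorted(
        (src, dst)
        for src, targets in heap.items()
        if src not in closure
        for dst in targets
        if dst in closure
    )
```

Even when only one object is requested, the closure may cover a large part of the heap, which is why moving it to a file system is expensive.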
Many current advanced programming techniques, especially as practiced in the symbolic/AI community, do not distinguish between procedures and data; procedures are just data, which are themselves active. As the body of information being dealt with grows and becomes more active, it becomes critical that the program environment, consisting of complex objects interconnected with rich pointer structures, survives for long periods of time. Mapping and moving of such rich environments into today's file system or database for long-term retention would involve substantial translation overhead, both in space and time.
Thus, there is a substantial difference between the representations of objects in the computational and long-term storages for symbolic/AI applications. The richer the structure of computational objects, the greater the difference and the bigger the effort needed to perform translation between these two representations. Emerging symbolic and AI applications will employ increasingly sophisticated and complex structures on a large number of objects on which retrievals, queries, inferences, reasoning, deductions, and computations will be performed. As can be anticipated, the overhead to map long-term objects into computational objects and vice-versa for large knowledge-intensive applications could be substantial.
The current approach taken by many researchers to facilitate knowledge-based applications is based on connecting a symbolic computer to a database machine. This approach is not based on persistent memory, as it neither addresses the storage dichotomy issues nor deals with the lifetime or interchangeability of procedure and data issues. There will be a mismatch between the data model requirements of symbolic/AI applications and the rigid data models supported by database machines. Therefore, such an approach appears to be inadequate for expert database systems. These reservations are shared by other researchers in the field.
The persistent memory approach is based on a fundamentally different foundation. The literature on persistent memory dates back to 1962, when Kilburn proposed single-level storage, in which all programs and data are named in a single context. (T. Kilburn, "One Level Storage System", IRE Trans. Electronic Comput., vol. EC-11, no. 2, Apr. 1962.) Saltzer proposed a direct-access storage architecture, where there is only a single context to bind and interpret all objects. (J. H. Saltzer, "Naming and Binding of Objects", in R. Bayer et al., editors, Operating Systems, An Advanced Course, p. 99, Springer-Verlag, New York, N.Y., 1978.)
Traiger proposed mapping databases into virtual address space. (I. L. Traiger, "Virtual Memory Management for Database Systems", ACM Operating Systems Review, pp. 26-48, Oct. 1982.) It seems that the simple data modeling requirements of the FORTRAN and COBOL worlds discouraged productization of these proposals because they are much more difficult to implement than the conventional virtual memory and database systems.
The MIT MULTICS system and the IBM System/38 have attempted to reduce the storage dichotomy. However, both have major shortcomings for symbolic computing: unlike LISP machines, each process has its own address space. All persistent information is in files. A file mapped into the address space of a process cannot hold a machine pointer to a file mapped in the address space of a different process. Thus, sharing of information among different processes is more difficult than with LISP machines. Furthermore, there is no automatic garbage collection, which is essential for supporting symbolic languages.
Recently, many researchers have proposed implementing persistent objects on top of a file system provided by the host operating system. Though persistent and transient objects still reside in two separate storage organizations, persistent objects can be of any general type, such as number, vector, array, record, or list, and can be manipulated with a common programming language such as ALGOL or LISP. However, there is a large overhead to access persistent objects because their pointers must be dereferenced by software, taking several machine cycles.
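The software dereferencing overhead can be illustrated with a toy Python model. It assumes a hypothetical persistent pointer that holds an object identifier rather than a machine address, and a store that faults objects in from a "file" (a dictionary here) into a resident table. Every dereference goes through a table lookup in software, even when the object is already in memory. `PersistentRef` and `PersistentStore` are illustrative names, not the interface of any particular persistent-object system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistentRef:
    """Hypothetical persistent pointer: an object identifier, not a machine address."""
    oid: int


class PersistentStore:
    """Toy model: `file` holds persistent objects, `resident` caches those already loaded."""

    def __init__(self, file_contents: dict[int, object]):
        self.file = file_contents
        self.resident: dict[int, object] = {}
        self.loads = 0  # number of objects faulted in from the file

    def deref(self, ref: PersistentRef) -> object:
        # Every dereference costs a software table lookup, even when the object
        # is already resident; a hardware pointer would need none of this.
        if ref.oid not in self.resident:
            self.resident[ref.oid] = self.file[ref.oid]  # fault the object in
            self.loads += 1
        return self.resident[ref.oid]
```

For example, dereferencing a list stored under identifier 2 and then the reference it contains loads two objects; later dereferences of either object still pay for the lookup but load nothing further.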
Systems having two-level memory storage can easily recover from a power failure, hardware failure, software error, or the like, collectively referred to herein as "system crashes". After a system crash, any hardware problems are repaired and the software is reloaded from long-term storage. All data and procedures which were in the virtual memory at the time of the crash are discarded, the system is restarted, and those items that have been stored in files or a DBMS are considered to be valid.
A system which implements a large uniform memory is especially vulnerable to system crashes. Because persistent objects are stored in the virtual memory, they can be corrupted by the crash. The most recent version of a particular persistent object may or may not be stored on the paging disk. The current value of large objects may be partially on disk, and partially in RAM. Thus, the values stored on disk cannot be relied on, and cannot merely be used to reload and restart the system after a crash.
Thus, if it is desired to restore a virtual memory after a crash, prior art file and DBMS systems cannot be used. It is necessary to devise some mechanism for preserving the state of the virtual memory.
It is an object of the present invention to provide a virtual memory which can recover from hardware failures and software errors. It is a further object to provide a virtual memory which can be restored to an earlier, valid state to minimize loss of work. It is another object to provide a means for taking regular checkpoints of the virtual memory to preserve valid states which can be restored. It is another object to provide an improved recoverable paging scheme for virtual memories.
In order to provide for system recovery in case of a power failure, hardware failure or software error, checkpoints are periodically taken of the state of the system. These checkpoints are marked and stored on disk. Changes made between a checkpoint and the next checkpoint are also stored and marked, but are discarded in the event of a system failure. When there is a system failure, the system is rolled back to the checkpoint state, and processing resumes in a normal manner. Virtual memory pages which are updated after the most recent checkpoint are stored on disk as sibling pages. An efficient state indicator mechanism is provided for determining which sibling page is to be read from or written to disk when the corresponding virtual page is referenced. This state indicator mechanism indicates which pages on disk are included in the checkpoint state, and which contain information modified since the checkpoint. In normal operation, the most recent version is used, but when a system failure occurs, only the most recent version stored before the checkpoint is used.
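The sibling-page arrangement can be illustrated with a simplified Python model. It assumes two disk slots per virtual page and a per-page state indicator made up of the slot holding the checkpointed version plus a single "modified since checkpoint" flag. This encoding, the class names, and the omission of main memory and of durably recording the indicator itself at each checkpoint are simplifying assumptions, not the specific mechanism of the embodiments.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    """Hypothetical state indicator for one virtual page's pair of sibling disk slots."""
    ckpt_slot: int = 0      # slot (0 or 1) holding the version in the checkpoint state
    modified: bool = False  # True if the other slot holds a newer, post-checkpoint version


class SiblingPagedDisk:
    """Toy paging disk in which each virtual page has two sibling slots."""

    def __init__(self, num_pages: int):
        self.slots: list[list[object]] = [[None, None] for _ in range(num_pages)]
        self.state = [PageState() for _ in range(num_pages)]

    def _current_slot(self, page: int) -> int:
        s = self.state[page]
        # The newest version is in the non-checkpoint sibling if the page was modified.
        return 1 - s.ckpt_slot if s.modified else s.ckpt_slot

    def read(self, page: int) -> object:
        return self.slots[page][self._current_slot(page)]

    def write(self, page: int, data: object) -> None:
        s = self.state[page]
        # Never overwrite the checkpointed sibling; new data goes to the other slot.
        self.slots[page][1 - s.ckpt_slot] = data
        s.modified = True

    def checkpoint(self) -> None:
        # The newest version of each modified page becomes part of the checkpoint state.
        for s in self.state:
            if s.modified:
                s.ckpt_slot = 1 - s.ckpt_slot
                s.modified = False

    def rollback(self) -> None:
        # After a system crash, discard every post-checkpoint version.
        for s in self.state:
            s.modified = False
```

In normal operation `read` returns the newest version of a page; after `rollback` only the sibling belonging to the most recent checkpoint is visible, so the virtual memory reverts to its checkpoint state without copying any page data.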
The novel features which characterize the present invention are defined by the appended claims. The foregoing and other objects and advantages of the invention will hereinafter appear, and for purposes of illustration, but not limitation, two preferred embodiments are shown in the drawings.