The problem of moving programs from one computer architecture to another has given rise to a number of migration strategies that minimize human effort. Between systems that support the same application interfaces and exhibit the same security and data integrity characteristics, simple recompilation of an application may suffice. However, recompilation does not work when the high-level language source code is unavailable or when no compiler exists for the target platform. In these cases, a more general solution to the problem of application migration is necessary.
Application migration is not the only problem. Application interfaces are built upon application enablement layers and operating systems whose attributes are highly dependent on processor architecture and thus may vary greatly from machine to machine. To move an application, it may also be necessary to port the application enablement environment, and possibly the underlying control program. Such an undertaking requires a great deal of human effort, and often fails to preserve the security and data integrity characteristics that users of the applications have come to rely on from operation on the original platform. A general solution to this second problem, one that minimizes human effort and preserves desirable architectural attributes, would be quite beneficial in such cases.
A general solution for the first problem, application migration, is provided by an emulation technique known as binary translation, which automatically converts each machine instruction in a program written for a source processor architecture into one or more target processor machine instructions, effectively translating the source machine program, instruction by instruction, into a target machine program.
Often, binary translation is incorporated into a run-time environment, in which a previously unencountered source instruction is dynamically translated into target instructions as needed, and the translation is saved and reused the next time that particular source instruction is to be executed. Of course, data structures must be employed to track which instructions of the source program have been translated, and where the translations are stored. Also, the original source machine program instructions must be maintained, in addition to the target translations of those source instructions. This use of binary translation provides a relatively efficient means for emulating the computational and logical characteristics of a source architecture on a target processor.
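The bookkeeping described above can be illustrated with a small sketch. It assumes a toy source instruction set (the mnemonics `AR` and `A` follow the S/390 add-register and add-from-storage instructions, while `MV4` is a hypothetical simplified 4-byte storage-to-storage move) and PowerPC-style target mnemonics chosen only for readability. The names `TranslationCache`, `translate_one`, and `EXPANSION` are hypothetical and are not the design described in this document. The original source instructions are retained, an index records which source addresses have been translated, and each index entry records where the corresponding target instructions were stored.

```python
from dataclasses import dataclass, field

# Hypothetical one-to-many expansion from toy source opcodes to
# PowerPC-style target instructions (illustrative only).
EXPANSION = {
    "AR": ["add {0},{0},{1}"],              # register add: one target instruction
    "A": ["lwz r0,{1}", "add {0},{0},r0"],  # add from storage: load, then add
    "MV4": ["lwz r0,{1}", "stw r0,{0}"],    # hypothetical 4-byte storage move
}


def translate_one(insn):
    """Expand one source instruction into a list of target instructions."""
    op, *operands = insn
    return [template.format(*operands) for template in EXPANSION[op]]


@dataclass
class TranslationCache:
    """Hypothetical run-time translation cache (a sketch, not the patented design)."""
    source: dict                                     # source address -> original instruction (retained)
    code_buffer: list = field(default_factory=list)  # where target translations are stored
    index: dict = field(default_factory=dict)        # source address -> offset into code_buffer

    def lookup_or_translate(self, addr):
        """Return the target code offset for addr, translating on first encounter."""
        offset = self.index.get(addr)
        if offset is None:                           # previously unencountered instruction
            offset = len(self.code_buffer)
            self.code_buffer.extend(translate_one(self.source[addr]))
            self.index[addr] = offset
        return offset                                # later encounters reuse the translation
```

Looking up addresses 0x100, 0x102, and 0x106 a second time returns the same offsets without growing `code_buffer`, which is the reuse the paragraph describes.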
Binary translation is far more efficient than brute-force emulation techniques, which interpret source instructions as they are encountered, extracting parameters such as register specifications and immediate data fields, and calling subroutines based on the opcode. Brute-force approaches incur these penalties every time a source instruction is executed, while binary translation techniques incur them only the first time a given instance of a source instruction is executed. Every subsequent execution of that particular instruction instance proceeds with the efficiency afforded by the translated code. In general, binary translation produces target machine code that has little dependence on source architecture state information and bears a strong resemblance to code compiled natively for the target architecture.
While binary translation addresses the first problem, providing a general approach to application migration, it does not address the second problem, providing a general approach to migrating the application enablement environment and control program. Binary translation exploits the fact that most processor architectures provide similar computational and logical operations, which are the aspects of an architecture that application programs explicitly use. The layers below the application interface, which provide security and data integrity to the applications and manage system resources shared by multiple independent applications, must use other facilities provided by a processor architecture. These facilities may vary greatly from architecture to architecture, particularly between CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) architectures, and the facilities provided by one architecture often have no analog in another. The inefficiencies associated with emulating these facilities in software have historically been prohibitive, thereby restricting the use of binary translation to the application domain.
For this reason, other means have generally been used to provide the interface characteristics to which emulated applications are written. The software layers underlying this interface are most often rewritten specifically for the target architecture, at great expense. Depending on the differences between the underlying processor architectures, many of the security and data integrity characteristics of the environment may be sacrificed in the process, and unique characteristics of the original operating system may also be lost. In other words, no general solution has yet addressed the second problem.
The approach described in application Ser. No. 08/349,771, entitled "Storage Access Authorization Controls In A Computer System Using Dynamic Translation of Large Addresses", assigned to the same assignee as the present invention, filed on the same day as the present invention, and incorporated herein by reference, does provide a general approach to solving the second problem. By utilizing large virtual addressing in a target processor, that approach incorporates all the authorization mechanisms of a source processor architecture. The target processor dynamic address translation hardware is used to check the legality of storage accesses, and to map legal storage accesses so that, once legality is established, they proceed with the efficiency of the target processor hardware. By providing an efficient means of implementing the authorization facilities of one architecture on another, it provides the impetus to extend binary translation techniques to cover the entire environment: the application layer, the application enablement environment, and the control program.
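To make the cost difference concrete, the following sketch runs the same toy loop in two ways and counts how often instruction fields are decoded. The textual instruction format, the opcodes `LI`, `ADD`, `DEC`, `JNZ`, and `HALT`, and all function names are hypothetical. A pre-decoded `functools.partial` object stands in for generated target code, so the sketch models only the repeated decode penalty, not the speed of native target instructions.

```python
from functools import partial

# Hypothetical toy source program: textual instructions that must be decoded.
PROGRAM = {
    0: "LI 0,5",   # r0 = 5 (loop counter)
    1: "LI 1,0",   # r1 = 0 (accumulator)
    2: "ADD 1,0",  # r1 += r0
    3: "DEC 0",    # r0 -= 1
    4: "JNZ 0,2",  # if r0 != 0, branch to address 2
    5: "HALT",
}


def decode(text):
    """Extract the opcode and operand fields (the per-execution cost of interpretation)."""
    op, _, rest = text.partition(" ")
    fields = tuple(int(f) for f in rest.split(",")) if rest else ()
    return op, fields


def execute(op, args, regs, pc):
    """Perform one decoded instruction; return the next pc, or None to stop."""
    if op == "LI":
        regs[args[0]] = args[1]
    elif op == "ADD":
        regs[args[0]] += regs[args[1]]
    elif op == "DEC":
        regs[args[0]] -= 1
    elif op == "JNZ":
        if regs[args[0]] != 0:
            return args[1]
    elif op == "HALT":
        return None
    return pc + 1


def interpret(program):
    """Brute-force emulation: decode every time an instruction executes."""
    regs, pc, decodes = {}, 0, 0
    while pc is not None:
        op, args = decode(program[pc])
        decodes += 1
        pc = execute(op, args, regs, pc)
    return regs, decodes


def run_translated(program):
    """Translation-style emulation: decode once per instruction instance, then reuse."""
    cache, regs, pc, decodes = {}, {}, 0, 0
    while pc is not None:
        if pc not in cache:
            op, args = decode(program[pc])
            decodes += 1
            cache[pc] = partial(execute, op, args)  # stands in for target code
        pc = cache[pc](regs, pc)
    return regs, decodes
```

Both runs leave r1 = 15. The loop executes 18 source instructions, so `interpret` decodes 18 times while `run_translated` decodes 6 times, once per instruction instance.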
It is the goal of this invention to have one computer, the target, provide the necessary program execution environment of another computer, the source, by means of the binary translation of the instructions of the source computer into those of the target computer, such that the control program, application enablement environment, and application programs perform their functions on the target computer in the same manner as they do on the source computer. From the point of view of the computing establishment, the characteristics of the source machine, including its security, integrity, and functional aspects and those provided by the source operating system, are observed in the operation of the target machine, all without the cost of implementing the desired elements of the source as native elements of the target.
Due to their limited scope, existing binary translation techniques are heavily optimized around the issues which exist in the application domain. They do not need to manage the additional complexities which are handled in the software layers beneath the application interface. The approach described in this invention re-focuses the implementation of these techniques, making them feasible for the emulation of the entire system environment.
When the emulated portion of the system environment is extended beyond the application layer, to include the application enablement and control program layers, the ground rules change drastically. The purpose of a multitasking operating system is to maximize the value provided by computing resources by efficiently distributing them among multiple applications. To do so, it must manage the asynchronous events that are part of the total computing environment. As a consequence of such events, the operating system will interrupt the flow of one program and divert computing resources to another. Of course, it must keep track of the current instruction address of the interrupted program, so that the program can resume execution where it left off.
As stated above, most existing binary translation approaches emulate the application domain only, and run on an application interface which runs natively on the target processor. From the perspective of the target operating system, the emulation environment built around each emulated source application is just another target application. Thus, when the multitasking target operating system preempts an emulated source machine program, the return address is actually a target machine address.
However, when the emulated portion of the system environment includes the source operating system, and the interruptions themselves are emulated, the return addresses for preempted programs are source machine addresses. This is because the applications are running under the direct control of the source operating system, which dynamically allocates the system resources among them. Likewise, synchronous and asynchronous exceptions, such as page faults and I/O interrupts, require that a source address be used to determine the point of re-entry into the interrupted program. This is because the source operating system handles these events within the emulated source operating environment. Essentially, even though the application is being executed by means of binary translation, it is executing in a source machine operating environment.
When limited to the application domain, binary translation requires code entry points identified by source addresses only for branches within an application whose targets cannot be determined by static analysis. Existing techniques focus heavily on this, going to great lengths to minimize the number of source entry points. When extended to include the operating system, binary translation requires a code entry point following every possible point of program preemption. This results in a huge increase in the number of possible entry points, specified by source machine addresses, in an emulated program. The invention described herein alleviates the severe performance repercussions that such an increase would cause in conventional binary translation environments. Also, existing techniques map source instructions in a single application address space to a set of translations for those instructions. When the entire system environment is emulated, there are multiple address spaces, each of which may contain instructions that must be emulated.
The ramifications of dynamic address translation and the possibility of programs shared among multiple address spaces add complexity and potential for new optimizations, while vastly increasing the scope of the translation management algorithms. For example, some operating system structures map the same system code into all instruction address spaces in order to improve the performance of linkage to, and data access by, that code. If applied to a multi-space emulation environment, traditional techniques would produce multiple translations of the same code, one for each address space of which the code is a part. It is advantageous to minimize such duplication. The method described here uses source DAT to discover such multiple mappings of code, and causes an existing translation of the code to be used for all of its uses in the source.
Finally, the existing techniques are static in nature, since the code in an application environment is generally not modified during its execution. An entire system environment, on the other hand, is highly dynamic. Given the objective of providing the full operational characteristics of the source operating system and its native application enablement programs in the emulated environment, full multitasking of multiple applications and application enablement programs must be supported. Application environments are instantiated and discarded. Programs are loaded and overwritten. The source operating system controls the allocation and deallocation of the assigned source storage, which is itself allocated within the target machine storage. Real page frames are allocated in the source storage to back virtual pages, and are eventually deallocated. Thus, the algorithms that maintain an emulated system environment must manage this volatility by creating and destroying target translations as source instructions are created and destroyed in the emulated environment.
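The effect of emulated preemption on entry points can be sketched with a round-robin scheduler that interrupts each task after a fixed number of source instructions. This is a simplified model under stated assumptions: the toy opcodes, `Task`, and `run_round_robin` are hypothetical, each source instruction is translated separately, and the saved resume point is a source address, as it would be in an emulated program-status word. The quantity of interest is the set of source addresses at which tasks are re-entered.

```python
from dataclasses import dataclass, field
from functools import partial


def execute(insn, regs, pc):
    """Hypothetical toy instruction semantics; returns the next source pc or None."""
    op = insn[0]
    if op == "LI":
        regs[insn[1]] = insn[2]
    elif op == "ADD":
        regs[insn[1]] += regs[insn[2]]
    elif op == "DEC":
        regs[insn[1]] -= 1
    elif op == "JNZ":
        if regs[insn[1]] != 0:
            return insn[2]
    elif op == "HALT":
        return None
    return pc + 1


@dataclass
class Task:
    name: str
    pc: int                                   # saved *source* instruction address
    regs: dict = field(default_factory=dict)


def run_round_robin(program, tasks, slice_len):
    """Emulate timer preemption every slice_len source instructions."""
    cache = {}             # source address -> translation (a code entry point)
    resume_points = set()  # source addresses at which a task was (re)entered
    ready = list(tasks)
    while ready:
        task = ready.pop(0)
        resume_points.add(task.pc)
        pc = task.pc
        for _ in range(slice_len):
            if pc not in cache:
                cache[pc] = partial(execute, program[pc], pc=pc)
            pc = cache[pc](task.regs)
            if pc is None:
                break
        task.pc = pc       # the emulated interruption saves a source address
        if pc is not None:
            ready.append(task)
    return resume_points
```

Running two copies of a countdown loop (task A at addresses 0 to 5 counting from 5, task B at addresses 10 to 15 counting from 3) with a slice of four instructions re-enters the tasks at source addresses 3, 4, and 14, none of which is a branch target or a program start. A translator that provided entry points only at statically discovered branch targets would have no translation at which to resume for those addresses.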
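One way to obtain the sharing described above is to key translations by source real address, obtained through the emulated source DAT, rather than by address space and virtual address. The sketch below compares the two keying choices. The single-level page table, the 4 KB page size, and the names `AddressSpace`, `dat`, and `TranslationStore` are assumptions for illustration; they do not reproduce the multi-level DAT tables of an actual source architecture.

```python
PAGE_SIZE = 4096  # assumed source page size


class AddressSpace:
    """A source address space with its own (hypothetical, single-level) DAT table."""

    def __init__(self, asid, page_table):
        self.asid = asid
        self.page_table = page_table  # virtual page number -> real frame number

    def dat(self, vaddr):
        """Translate a source virtual address to a source real address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.page_table[vpn] * PAGE_SIZE + offset


class TranslationStore:
    """Translation lookup keyed by real address or by (space, virtual address)."""

    def __init__(self, real_storage, key_by_real):
        self.real_storage = real_storage  # source real address -> source instruction
        self.key_by_real = key_by_real
        self.translations = {}

    def lookup(self, space, vaddr):
        raddr = space.dat(vaddr)
        key = raddr if self.key_by_real else (space.asid, vaddr)
        if key not in self.translations:
            # Placeholder for generated target code.
            self.translations[key] = ["target code for", self.real_storage[raddr]]
        return self.translations[key]
```

With one system code page (real frame 7) mapped into three address spaces, two of them at virtual page 0x10 and one at virtual page 0x20, keying by real address yields a single shared translation, while keying by (address space, virtual address) yields three. Keying by real address presumes the generated code embeds no address-space-specific state; a fuller design would have to account for any such state.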
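The create-and-destroy discipline described above can be sketched as a translation table indexed by real page frame, so that a store into a frame holding translated code, or the source operating system freeing that frame, discards the stale translations. The names and the whole-frame invalidation granularity are assumptions for this sketch. Invalidating an entire frame on any store is deliberately coarse: it also discards translations when data sharing a page with code is modified.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed source page size


class TranslationManager:
    """Hypothetical manager that keeps translations consistent with source storage."""

    def __init__(self):
        self.storage = {}                 # source real address -> contents
        self.translations = {}            # source real address -> translation
        self.by_frame = defaultdict(set)  # real frame -> real addresses with translations

    def translate(self, raddr):
        """Create a translation on demand and record which frame it depends on."""
        if raddr not in self.translations:
            # Placeholder for generated target code.
            self.translations[raddr] = ("target code for", self.storage[raddr])
            self.by_frame[raddr // PAGE_SIZE].add(raddr)
        return self.translations[raddr]

    def invalidate_frame(self, frame):
        """Destroy every translation derived from instructions in this frame."""
        for raddr in self.by_frame.pop(frame, set()):
            del self.translations[raddr]

    def store(self, raddr, value):
        """Emulated source store: overwriting a code frame discards its translations."""
        self.storage[raddr] = value
        self.invalidate_frame(raddr // PAGE_SIZE)

    def free_frame(self, frame):
        """The source operating system deallocates a real page frame."""
        self.invalidate_frame(frame)
```

Checking every emulated store against a dictionary is only a model; an implementation might instead use target page protection to trap stores into frames that contain translated code, so that ordinary stores pay no checking cost.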