The ×86 processor architecture, originally developed by Intel Corporation of Santa Clara, Calif., and the Advanced RISC Machines (ARM) architecture, originally developed by ARM Ltd. of Cambridge, UK, are well known in the art of computing. Many computing systems exist that include an ARM or ×86 processor, and the demand for them appears to be increasing rapidly. Presently, the demand for ARM architecture processing cores appears to dominate low power, low cost segments of the computing market, such as cell phones, PDA's, tablet PCs, network routers and hubs, and set-top boxes (for example, the main processing power of the Apple iPhone and iPad is supplied by an ARM architecture processor core), while the demand for ×86 architecture processors appears to dominate market segments that require higher performance that justifies higher cost, such as in laptops, desktops and servers. However, as the performance of ARM cores increases and the power consumption and cost of certain models of ×86 processors decreases, the line between the different markets is evidently fading, and the two architectures are beginning to compete head-to-head, for example in mobile computing markets such as smart cellular phones, and it is likely they will begin to compete more frequently in the laptop, desktop and server markets.
This situation may leave computing device manufacturers and consumers in a dilemma over which of the two architectures will predominate and, more specifically, for which of the two architectures software developers will develop more software. For example, some entities purchase very large amounts of computing systems each month or year. These entities are highly motivated to buy systems that are the same configuration due to the cost efficiencies associated with purchasing large quantities of the same system and the simplification of system maintenance and repair, for example. However, the user population of these large entities may have diverse computing needs for these single configuration systems. More specifically, some of the users have computing needs in which they want to run software on an ARM architecture processor, and some have computing needs in which they want to run software on an ×86 architecture processor, and some may even want to run software on both. Still further, new previously-unanticipated computing needs may emerge that demand one architecture or the other. In these situations, a portion of the extremely large investment made by these large entities may have been wasted. For another example, a given user may have a crucial application that only runs on the ×86 architecture so he purchases an ×86 architecture system, but a version of the application is subsequently developed for the ARM architecture that is superior to the ×86 version (or vice versa) and therefore the user would like to switch. Unfortunately, he has already made the investment in the architecture that he does not prefer. Still further, a given user may have invested in applications that only run on the ARM architecture, but the user would also like to take advantage of fact that applications in other areas have been developed for the ×86 architecture that do not exist for the ARM architecture or that are superior to comparable software developed for the ARM architecture, or vice versa. It should be noted that although the investment made by a small entity or an individual user may not be as great as by the large entity in terms of magnitude, nevertheless in relative terms the investment wasted may be even larger. Many other similar examples of wasted investment may exist or arise in the context of a switch in dominance from the ×86 architecture to the ARM architecture, or vice versa, in various computing device markets. Finally, computing device manufacturers, such as OEMs, invest large amounts of resources into developing new products. They are caught in the dilemma also and may waste some of their valuable development resources if they develop and manufacture mass quantities of a system around the ×86 or ARM architecture and then the user demand changes relatively suddenly.
It would be beneficial for manufacturers and consumers of computing devices to be able to preserve their investment regardless of which of the two architectures prevails. Therefore, what is needed is a solution that would allow system manufacturers to develop computing devices that enable users to run both ×86 architecture and ARM architecture programs.
The desire to have a system that is capable of running programs of more than one instruction set has long existed, primarily because customers may make a significant investment in software that runs on old hardware whose instruction set is different from that of the new hardware. For example, the IBM System/360 Model 30 included an IBM System 1401 compatibility feature to ease the pain of conversion to the higher performance and feature-enhanced System/360. The Model 30 included both a System/360 and a 1401 Read Only Storage (ROS) Control, which gave it the capability of being used in 1401 mode if the Auxiliary Storage was loaded with needed information beforehand. Furthermore, where the software was developed in a high-level language, the new hardware developer may have little or no control over the software compiled for the old hardware, and the software developer may not have a motivation to re-compile the source code for the new hardware, particularly if the software developer and the hardware developer are not the same entity. Silberman and Ebcioglu proposed techniques for improving performance of existing (“base”) CISC architecture (e.g., IBM S/390) software by running it on RISC, superscalar, and Very Long Instruction Word (VLIW) architecture (“native”) systems by including a native engine that executes native code and a migrant engine that executes base object code, with the ability to switch between the code types as necessary depending upon the effectiveness of translation software that translates the base object code into native code. See “An Architectural Framework for Supporting Heterogeneous Instruction-Set Architectures,” Siberman and Ebcioglu, Computer, June 1993, No. 6. Van Dyke et al. disclosed a processor having an execution pipeline that executes native RISC (Tapestry) program instructions and which also translates ×86 program instructions into the native RISC instructions through a combination of hardware translation and software translation, in U.S. Pat. No. 7,047,394, issued May 16, 2006. Nakada et al. proposed a heterogeneous SMT processor with an Advanced RISC Machines (ARM) architecture front-end pipeline for irregular (e.g., OS) programs and a Fujitsu FR-V (VLIW) architecture front-end pipeline for multimedia applications that feed an FR-V VLIW back-end pipeline with an added VLIW queue to hold instructions from the front-end pipelines. See “OROCHI: A Multiple Instruction Set SMT Processor,” Proceedings of the First International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'08), Lake Como, Italy, November 2008 (In conjunction with MICRO-41), Buchty and Weib, eds, Universitatsverlag Karlsruhe, ISBN 978-3-86644-298-6. This approach was proposed in order to reduce the total system footprint over heterogeneous System on Chip (SOC) devices, such as the Texas Instruments OMAP that includes an ARM processor core plus one or more co-processors (such as the TMS320, various digital signal processors, or various GPUs) that do not share instruction execution resources but are instead essentially distinct processing cores integrated onto a single chip.
Software translators, also referred to as software emulators, software simulators, dynamic binary translators and the like, have also been employed to support the ability to run programs of one architecture on a processor of a different architecture. A popular commercial example is the Motorola 68K-to-PowerPC emulator that accompanied Apple Macintosh computers to permit 68K programs to run on a Macintosh with a PowerPC processor, and a PowerPC-to-×86 emulator was later developed to permit PowerPC programs to run on a Macintosh with an ×86 processor. Transmeta Corporation of Santa Clara, Calif., coupled VLIW core hardware and “a pure software-based instruction translator [referred to as “Code Morphing Software”] [that] dynamically compiles or emulates ×86 code sequences” to execute ×86 code. “Transmeta.” Wikipedia. 2011. Wikimedia Foundation, Inc. <http://en.wikipedia.org/wiki/Transmeta>. See also, for example, U.S. Pat. No. 5,832,205, issued Nov. 3, 1998 to Kelly et al. The IBM DAISY (Dynamically Architected Instruction Set from Yorktown) system includes a VLIW machine and dynamic binary software translation to provide 100% software compatible emulation of old architectures. DAISY includes a Virtual Machine Monitor residing in ROM that parallelizes and saves the VLIW primitives to a portion of main memory not visible to the old architecture in hopes of avoiding re-translation on subsequent instances of the same old architecture code fragments. DAISY includes fast compiler optimization algorithms to increase performance. QEMU is a machine emulator that includes a software dynamic translator. QEMU emulates a number of CPUs (e.g., ×86, PowerPC, ARM and SPARC) on various hosts (e.g., ×86, PowerPC, ARM, SPARC, Alpha and MIPS). As stated by its originator, the “dynamic translator performs a runtime conversion of the target CPU instructions into the host instruction set. The resulting binary code is stored in a translation cache so that it can be reused . . . . QEMU is much simpler [than other dynamic translators] because it just concatenates pieces of machine code generated off line by the GNU C Compiler.” QEMU, a Fast and Portable Dynamic Translator, Fabrice Bellard, USENIX Association, FREENIX Track: 2005 USENIX Annual Technical Conference. See also, “ARM Instruction Set Simulation on Multi-Core ×86 Hardware,” Lee Wang Hao, thesis, University of Adelaide, Jun. 19, 2009. However, while software translator-based solutions may provide sufficient performance for a subset of computing needs, they are unlikely to provide the performance required by many users.
Static binary translation is another technique that has the potential for high performance. However, there are technical considerations (e.g., self-modifying code, indirect branches whose value is known only at run-time) and commercial/legal barriers (e.g., may require the hardware developer to develop channels for distribution of the new programs; potential license or copyright violations with the original program distributors) associated with static binary translation.
One feature of the ARM ISA is conditional instruction execution. As the ARM Architecture Reference Manual states at page A4-3:                Most ARM instructions can be conditionally executed. This means that they only have their normal effect on the programmer's model operation, memory and coprocessors if the N, Z, C and V flags in the APSR satisfy a condition specified in the instruction. If the flags do not satisfy the condition, the instruction acts as a NOP, that is, execution advances to the next instruction as normal, including any relevant checks for exceptions being taken, but has no other effect.        
Benefits of the conditional execution feature are that it potentially facilitates smaller code size and may improve performance by reducing the number of branch instructions and concomitantly the performance penalties associated with mispredicting them. Therefore, what is needed is a way to efficiently perform conditional instructions, particularly in a fashion that supports high microprocessor clock rates.