Currently, when one processor core must execute programs belonging to different instruction sets, the most common method is to use a software virtual machine (or a virtual layer). The role of the virtual machine is to translate or interpret a program written in an instruction set that is not supported by the processor core (the guest instruction set) into the corresponding instructions of the instruction set that is supported by the processor core (the host instruction set), so that the processor core can execute them. In general, with an interpretation method, the virtual machine fetches all fields of each guest instruction, including opcode and operands, in software, in order and in real time during execution. The operation corresponding to the opcode is then performed on the operands using a stack structure in the memory. Therefore, multiple host instructions need to be executed to realize the function of any one guest instruction, and efficiency is low. With a translation method, by contrast, a process similar to software compilation is carried out before the program is executed, and the program is converted into a form composed entirely of host instructions. Thus, when the program is executed, the efficiency is higher. However, there is still performance overhead associated with the software compiling process.
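The cost of the interpretation method can be illustrated with a minimal sketch. The three-opcode guest instruction set (`PUSH`, `ADD`, `MUL`), the function name `interpret`, and the per-instruction step counts are all hypothetical and chosen only for illustration; they do not correspond to any real instruction set or virtual machine.

```python
def interpret(program):
    """Interpret a toy guest program one instruction at a time.

    Returns (result, host_steps). host_steps is a rough, assumed count of
    host-level operations (fetch, decode, stack accesses, arithmetic).
    """
    stack = []       # in-memory operand stack used by the interpreter
    host_steps = 0
    for instr in program:
        opcode, *operands = instr   # fetch the instruction and split its fields
        host_steps += 2             # assumed cost: one fetch + one decode/dispatch
        if opcode == "PUSH":
            stack.append(operands[0])
            host_steps += 1         # one stack write
        elif opcode in ("ADD", "MUL"):
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if opcode == "ADD" else a * b)
            host_steps += 4         # two pops, one arithmetic op, one push
        else:
            raise ValueError(f"unknown guest opcode: {opcode}")
    return stack.pop(), host_steps


# Guest program computing (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
result, host_steps = interpret(program)
print(result, host_steps, len(program))
```

Under these assumed costs, the five guest instructions require roughly four times as many host-level steps, which is the inefficiency described above.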
With the second method, instruction decoders corresponding to different instruction sets are added in the processor core. When instructions of the different instruction sets are executed, the appropriate instruction decoder decodes the instructions, and the decoded instructions are passed to a subsequent pipeline to perform the corresponding operations. This method has almost no loss in efficiency, but the extra instruction decoders increase hardware cost and therefore the cost of the processor chip. In addition, because the set of instruction decoders is fixed in advance within the processor core, the method lacks scalability and a new instruction set cannot be supported.
With the third method, a conversion module is added outside of the processor core. The conversion module converts guest instructions to host instructions for processor core execution. The conversion module can be implemented in software, in which case the method is generally easy to expand but its efficiency is too low. The conversion module can also be implemented in hardware, in which case the method is difficult to expand and cannot take full advantage of a cache memory to supply the host instructions.
Specifically, when the conversion module is located between the cache memory and the processor core, the instructions stored in the cache memory are guest instructions, and the guest instructions need to be converted to host instructions for processor core execution. Thus, the conversion operation needs to be performed regardless of whether there is a cache hit. As a result, the conversion operation for the same guest instruction is performed repeatedly, which increases power consumption. The conversion stage also deepens the pipeline of the processor core, which further increases hardware cost and increases the performance loss when a branch prediction fails.
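The repeated conversion can be made concrete with a small counting sketch. The function `count_conversions`, the addresses, and the loop length are hypothetical; the sketch also assumes, for the comparison case, a cache large enough that no converted instruction is ever evicted.

```python
def count_conversions(fetch_trace, converter_after_cache):
    """Count conversion operations for a sequence of guest fetch addresses.

    converter_after_cache=True models a conversion module between the cache
    memory and the processor core: every fetch is converted, hit or miss.
    converter_after_cache=False models a cache that stores already converted
    instructions: each distinct address is converted once, on its first miss
    (assuming nothing is evicted).
    """
    if converter_after_cache:
        return len(fetch_trace)
    return len(set(fetch_trace))


# A three-instruction loop body executed 100 times (illustrative addresses)
loop_body = [0x100, 0x104, 0x108]
trace = loop_body * 100

print(count_conversions(trace, converter_after_cache=True))   # one conversion per fetch
print(count_conversions(trace, converter_after_cache=False))  # one conversion per distinct address
```

With the conversion module placed after the cache memory, the number of conversions grows with the number of fetches rather than with the number of distinct instructions.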
When the conversion module is located outside of the processor core (that is, the cache memory is located between the conversion module and the processor core), the instructions stored in the cache memory are the converted host instructions. That is, the cache memory is addressed by host instruction address. However, the branch target instruction address obtained when the processor core executes a branch instruction is a guest instruction address. Because there is no one-to-one correspondence between host instructions and guest instructions (for example, one guest instruction may correspond to multiple host instructions), the correspondence between host instruction addresses and guest instruction addresses must be recorded. Thus, when a branch is taken, the guest instruction address of the branch target instruction is converted to the host instruction address. Then, based on the obtained host instruction address, the processor core can fetch the correct host instruction from the cache memory for execution. The difficulty in recording the mapping relationships between guest instruction addresses and host instruction addresses lies in how to convert the guest instruction addresses efficiently and how to store the converted instructions. Without such a mapping, each time a branch is taken, the guest instruction must be read from the lower-level memory outside the conversion module based on its guest instruction address and converted by the conversion module, and the converted instruction must then be stored in the cache memory for processor core execution, which greatly degrades execution performance. One solution to this problem is to use a trace cache based on a program execution trace instead of the traditional cache based on address matching. However, the trace cache stores a large number of instructions that have duplicate addresses but are located in different traces, resulting in considerable waste of memory capacity and low performance of the trace cache.
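The address correspondence described above can be sketched as a simple lookup table. This is only an illustration of the bookkeeping involved, not the mechanism of the present disclosure; the opcodes, the expansion counts, and the function name `build_address_map` are hypothetical, and host addresses are assumed to count host instructions from zero.

```python
def build_address_map(guest_program, expansion):
    """Map each guest instruction address to its first host instruction address.

    guest_program: list of (guest_addr, opcode) in program order.
    expansion: opcode -> number of host instructions it converts to
               (hypothetical values, chosen for illustration).
    """
    mapping = {}
    host_addr = 0
    for guest_addr, opcode in guest_program:
        mapping[guest_addr] = host_addr   # first host instruction for this guest instruction
        host_addr += expansion[opcode]    # a guest instruction may expand to several
    return mapping


guest_program = [(0, "LOAD"), (1, "ADDM"), (2, "STORE"), (3, "JMP")]
expansion = {"LOAD": 1, "ADDM": 3, "STORE": 1, "JMP": 1}

addr_map = build_address_map(guest_program, expansion)

# A taken branch yields a guest target address; the cache is addressed by host address.
guest_target = 2
host_target = addr_map[guest_target]
print(addr_map, host_target)
```

Because `ADDM` is assumed to expand to three host instructions, guest address 2 no longer corresponds to host address 2, which is why the mapping has to be recorded and consulted on every taken branch.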
The disclosed methods and systems are directed to solving one or more of the problems set forth above, as well as other problems. It should be noted that the content in this section is part of the disclosure and, unless explicitly indicated, shall not be considered as prior art.