The efficiency and performance of a processor are measured in IPC, the number of instructions executed per cycle. In a superscalar processor, instructions of the same or different types are dispatched and executed in parallel in multiple execution units. Each instruction dispatch port is typically connected to one execution pipe, and certain types of instructions are always issued to a specific port because they can only be executed in a specific execution unit. The execution units work independently and in parallel, and any dependencies among instructions are detected before a group of instructions is formed and dispatched. In a typical superscalar processor, a decoder feeds an instruction queue, from which up to the maximum allowable number of instructions is issued per cycle to the available execution units. This is called the grouping of the instructions. The average number of instructions in a group, called the group size, depends on the degree of instruction-level parallelism (ILP) that exists in a program. In a typical processor, the Fixed Point Unit (FXU) is designed to handle the most frequent simple instructions, which require only one cycle of execution. Typically such instructions are Loads, Stores, and Binary Arithmetic and Logical operations. Complicated instructions that require many cycles of execution are cracked (as described in papers of Intel Corporation and as present in IBM's Power 4 processors used in IBM's pSeries workstations) into many simple instructions that can be dispatched in parallel to the many execution pipes. The execution of such complicated instructions may still require many execution cycles. During each of these cycles a subset of the cracked instructions is executed.
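The grouping step described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the dispatch logic of any actual processor: the names Instr, form_groups and average_group_size are hypothetical, every instruction is assumed to take one cycle, and a group is closed either when it reaches the maximum width or when the next instruction reads or writes a register written earlier in the same group.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    name: str
    dest: Optional[str]            # register written, or None (e.g. a store)
    srcs: Tuple[str, ...] = ()     # registers read


def form_groups(instrs: List[Instr], max_width: int) -> List[List[Instr]]:
    """Form in-order dispatch groups.

    Assumption: a group is closed when it is full, or when the next
    instruction depends on a register written inside the current group.
    """
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    written = set()                # registers written by the current group
    for ins in instrs:
        depends = any(s in written for s in ins.srcs) or ins.dest in written
        if current and (len(current) == max_width or depends):
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        if ins.dest is not None:
            written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


def average_group_size(groups: List[List[Instr]]) -> float:
    """Average number of instructions per group."""
    return sum(len(g) for g in groups) / len(groups)


program = [
    Instr("L  r1", "r1", ("r9",)),
    Instr("AR r2", "r2", ("r3",)),   # independent of the load
    Instr("AR r4", "r4", ("r1",)),   # reads r1, so it starts a new group
    Instr("ST r4", None, ("r4",)),   # reads r4, so it starts a new group
    Instr("L  r5", "r5", ("r9",)),   # independent of the store
]
groups = form_groups(program, max_width=3)
print([len(g) for g in groups])      # group sizes
print(average_group_size(groups))    # average group size
```

If one group is dispatched per cycle and every instruction completes in one cycle, as assumed here, the average group size equals the IPC of the sequence. The less ILP the program has, the more often a dependency closes a group early and the smaller the average becomes.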
The cracking of instructions increases the decode area and the decode time, and is hard to apply to complicated architectures such as the IBM mainframe S/390 ESAME instruction set, which was described in IBM's Enterprise Systems Architecture Principles of Operation (SA22-7201-06) and repeated in IBM's zArchitecture Principles of Operation, SA22-7832-00, December 2000. Other complications are exception detection and reporting, serialization, and error detection and recovery. Due to the listed complications, cracking is found to degrade performance and to add significant logic area and complexity if applied to the S/390 architecture. Another solution for the complicated multi-cycle instructions is to assign or dispatch them to one dedicated FXU pipe, which isolates them from the commonly executed single-cycle instructions. The logic in the multi-cycle FXU pipe, which can be pipelined or not, is allowed to spin for as many cycles as it requires to execute these instructions. The multi-cycle FXU pipe is good for out-of-order processors, but it does not add any benefit to in-order processors, since no new instructions are allowed to be dispatched until the multi-cycle instruction has finished executing. While the z900 IBM microprocessor could execute 64-bit instructions of the zArchitecture with a millicode implementation, it would be desirable to execute IBM's ESAME and IBM's zArchitecture instructions in a more efficient manner.
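The difference between in-order and out-of-order use of a dedicated multi-cycle pipe can be made concrete with a rough cycle count. The sketch below is an idealized model, not a description of the z900 or of any real pipeline: the function names are hypothetical, all instructions are assumed to be independent of one another, the in-order case dispatches one instruction at a time and blocks until it finishes, and the overlapped case assumes one single-cycle pipe plus one unpipelined multi-cycle pipe that run fully in parallel.

```python
from typing import List


def cycles_in_order(latencies: List[int]) -> int:
    """In-order model (assumption): no new instruction is dispatched until
    the one in flight has finished, so the latencies simply add up."""
    return sum(latencies)


def cycles_with_overlap(latencies: List[int]) -> int:
    """Idealized out-of-order model (assumption): multi-cycle instructions
    run back to back on a dedicated pipe while single-cycle instructions
    run on a second pipe, so the longer of the two pipes sets the total."""
    multi_pipe = sum(lat for lat in latencies if lat > 1)
    single_pipe = sum(1 for lat in latencies if lat == 1)
    return max(multi_pipe, single_pipe)


# Five single-cycle instructions and one 4-cycle instruction.
latencies = [1, 1, 4, 1, 1, 1]
n = len(latencies)
print(cycles_in_order(latencies), n / cycles_in_order(latencies))
print(cycles_with_overlap(latencies), n / cycles_with_overlap(latencies))
```

In this model the in-order count is the same whether or not the multi-cycle instruction has its own pipe, which is the point made above: the dedicated pipe helps only when younger instructions are allowed to proceed while it spins.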