There is a constant demand for performance improvement of stored-program processors, commonly referred to as central processing units (CPUs) and microprocessors. Historically, some processors have included microcode to perform at least some architectural instructions of the instruction set architecture (ISA) of the processor and to service exceptions. Conventional processors fetch a single microcode instruction from a microcode memory of the processor per clock cycle, which may limit the performance of microcoded architectural instructions and/or exception service routines, particularly in processors that have the ability to process multiple instructions per clock cycle.