A conventional digital signal processor or microprocessor needs to be fed information (data and instructions) coming from memories to execute or perform tasks. It is further noted that some tasks, such as digital signal processing tasks, require multiple bytes or words of information per instruction, with the bytes or words stored at different memory locations. In such a case, conventional processors require several memory accesses per instruction. This presents a problem if it is desired to execute an instruction in a single cycle in a unified memory space, where information (data and instructions) can be stored in the same block of memory.
For example, in a conventional processor architecture, the primary memory must be accessed twice in a single cycle to realize the execution of an instruction in a single cycle. That is, the processor must fetch the new instruction following the current one and read or write all the primary memory data needed for the execution of the current instruction during the single cycle.
In the conventional processor architecture, the memory accesses are performed during the instruction's execute phase, which in a synchronous system runs, for example, from one rising edge of the main processor clock to the next rising edge. The address for the data to be written or read is available at the beginning of the execute phase (usually having been computed during a previous instruction's execute phase). Data access cycles thus run from rising edge to rising edge, with the access triggered in the middle of the execute phase on the falling edge.
Similarly, in the conventional processor architecture, the address of the instruction to be fetched from the primary memory is available at the beginning of the cycle and the instruction read from the primary memory is loaded into a register at the end of the cycle. This causes the access of the instruction to also happen in the middle of the cycle.
Therefore, in the conventional processor architecture, the primary memory access for the instruction fetch would be in conflict with the concurrent data access. This is particularly true if the accesses are directed to the same block of memory or if the accesses are accomplished using one unique bus.
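By way of illustration only, the conflict described above can be sketched as a small model in which each single port memory block is assumed to accept at most one access per triggering clock edge. The names `Access` and `find_conflicts` are hypothetical and introduced solely for this sketch; the shared-bus case is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    kind: str   # "fetch" (instruction) or "data" (operand read or write)
    block: int  # index of the memory block being addressed (assumed numbering)
    edge: str   # clock edge that triggers the access: "rising" or "falling"


def find_conflicts(accesses):
    """Return pairs of accesses that hit the same block on the same clock edge.

    Assumption: a single port block can serve only one access per edge.
    """
    conflicts = []
    for i, first in enumerate(accesses):
        for second in accesses[i + 1:]:
            if first.block == second.block and first.edge == second.edge:
                conflicts.append((first, second))
    return conflicts


# Conventional timing: fetch and data access are both triggered mid-cycle,
# on the falling edge.
unified = [Access("fetch", 0, "falling"), Access("data", 0, "falling")]
separate = [Access("fetch", 0, "falling"), Access("data", 1, "falling")]

print(len(find_conflicts(unified)))   # same block, same edge
print(len(find_conflicts(separate)))  # different blocks
```

In this sketch the unified case reports one conflicting pair, while the case with separate blocks reports none, which mirrors the observation that the problem arises when both accesses are directed to the same block of memory.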
To address this problem, it has been proposed to use a dual port memory that allows two concurrent reads. It has been further proposed to use a higher frequency clock to squeeze two accesses into a single cycle and still leave enough time for the address to set up and the data to set up.
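As a rough illustration of the timing pressure created by the higher frequency clock approach, the following sketch computes how much time remains for the memory access itself once the instruction cycle is divided into equal slots and the address and data set-up times are subtracted. The function name and all numeric values are assumptions chosen only for the example and are not taken from any particular device.

```python
def max_access_time_ns(cycle_ns, accesses_per_cycle, addr_setup_ns, data_setup_ns):
    """Time left for the memory access in each slot of the instruction cycle.

    Assumption: the cycle is split into equal slots, and each slot must also
    cover the address set-up time and the data set-up time.
    """
    slot_ns = cycle_ns / accesses_per_cycle
    return slot_ns - addr_setup_ns - data_setup_ns


# Hypothetical figures: 20 ns instruction cycle, 2 ns for each set-up time.
one_access = max_access_time_ns(20.0, 1, 2.0, 2.0)
two_accesses = max_access_time_ns(20.0, 2, 2.0, 2.0)

print(one_access)    # 16.0
print(two_accesses)  # 6.0
```

Under these assumed figures, fitting two accesses into one cycle leaves well under half of the original access time, because the set-up overhead is paid twice.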
The two proposed solutions above have their own disadvantages: they are expensive, consume more power, and limit the overall performance.
Another proposed alternative is to change the pipeline and increase the number of pipe stages. This alternative is unsuitable where it is desirable to maintain single cycle execution for all instructions, including branches, jumps, etc.
Therefore, it is desirable to provide a micro-architecture that enables two memory accesses per memory block per instruction cycle and does not negatively impact the cost or performance of the processor or require higher power consumption. It is also desirable to provide a micro-architecture that enables two memory accesses to a single memory block per instruction cycle, while maintaining single cycle execution for all instructions, including branches, jumps, etc. More specifically, it is desirable to provide a micro-architecture that maintains single cycle execution of all instructions while enabling the use of single port synchronous memories to store both data and instructions, improving overall speed performance, and keeping the power consumption low.