The conventional general-purpose processor (CPU) and the Digital Signal Processor (DSP) are both flexible: they handle different applications by running different programs. However, because of limited resources, the processing power and throughput of a general-purpose processor are insufficient. A conventional multi-core processor integrates multiple processor cores that can execute programs in parallel to improve chip performance. However, using a conventional multi-core processor to its full capacity requires a parallel-programming mindset. In practice, the operating system (OS) usually partitions and manages resources evenly rather than according to need. Compared with general-purpose CPUs, a DSP has more computational units, but its computational resources are still often insufficient. Therefore, improving parallelism, and in particular dynamically scheduling computing resources according to program execution so that they are better allocated, is one of the keys to improving a CPU's efficiency.
In today's processor architectures, a cache usually stores part of the contents of lower-level memory so that this content can be quickly fetched by higher-level memory or the processor core, keeping the pipeline flowing. A basic cache usually refills itself with content from lower-level memory only after a cache miss, which forces the pipeline to wait until the missing content has been filled into the cache. Newer cache structures, such as the victim cache, the trace cache, and prefetching, are all improvements built on top of the basic cache. Nevertheless, the speed gap between processors and memory keeps widening. In the current architecture, cache misses in particular have become the most serious bottleneck limiting the performance improvement of modern processors.
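To illustrate why refill-on-miss stalls the pipeline, the following is a minimal Python sketch of a basic direct-mapped cache that fetches a line from lower-level memory only after a miss. The class name `BasicCache` and its parameters (4 lines, 16-byte lines, a 1-cycle hit, a 20-cycle miss penalty) are hypothetical values chosen for illustration, not taken from any particular processor, and the model only counts cycles rather than simulating a real pipeline.

```python
class BasicCache:
    """Direct-mapped cache that refills a line only after a miss (no prefetching).

    All sizes and latencies are hypothetical, for illustration only.
    """

    def __init__(self, num_lines=4, line_size=16, hit_cycles=1, miss_penalty=20):
        self.num_lines = num_lines        # number of cache lines
        self.line_size = line_size        # bytes per line
        self.hit_cycles = hit_cycles      # latency of a hit
        self.miss_penalty = miss_penalty  # stall cycles to refill from lower-level memory
        self.tags = [None] * num_lines    # None marks an empty line
        self.hits = 0
        self.misses = 0
        self.cycles = 0

    def access(self, addr):
        """Return True on a hit; on a miss, stall, refill the line, then complete."""
        block = addr // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            self.cycles += self.hit_cycles
            return True
        # Miss: the pipeline waits for the refill before the access can complete.
        self.misses += 1
        self.cycles += self.miss_penalty + self.hit_cycles
        self.tags[index] = tag
        return False


if __name__ == "__main__":
    seq = BasicCache()
    for addr in range(0, 64, 4):        # sequential scan of 4-byte words
        seq.access(addr)
    print(seq.hits, seq.misses, seq.cycles)   # 12 4 96

    conflict = BasicCache()
    for addr in [0, 64, 0, 64]:         # both addresses map to line 0
        conflict.access(addr)
    print(conflict.hits, conflict.misses)     # 0 4
```

In the sequential scan, each 16-byte line misses once, so 80 of the 96 cycles are spent stalled on refills; alternating between two addresses that map to the same line misses on every access. These are the kinds of misses that prefetching and victim caches, respectively, try to reduce.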