1. Microprocessor challenges
Microprocessors and DSPs are widely used in electronic systems. However, executing instructions sequentially limits their performance, so designers today try to exploit instruction-level parallelism (ILP). The three most common architectures that can execute more than one instruction at a time are superscalar processors, VLIW (very long instruction word) processors, and chip-multiprocessors. However, none of these three approaches is well suited to future designs. Because they do not scale, their performance gains will soon saturate. We explain their problems below.
Superscalar processors use dynamic scheduling, which allows multiple instructions to be issued simultaneously. This approach relies on techniques such as register renaming, forwarding, and branch prediction, and these impose several limitations. The first limitation is the instruction window size and the maximum issue count. The processor must examine as many instructions as possible in order to find and issue those that have no dependences, and this analysis is expensive. For example, determining whether n issuing instructions have any register dependences requires n² − n comparisons: with two source operands per instruction, each source must be compared with the destination of every earlier instruction in the group. The dependence-checking circuit therefore grows quadratically with the issue width, and it becomes very large and complex when many instructions are issued per cycle. The second problem is prediction accuracy. No predictor is perfect. A tournament predictor is among the better ones, but it requires prediction buffers and a great deal of complex circuitry, and its accuracy is still not good enough. The third problem is the limited number of renaming registers. If we try to raise a superscalar processor's performance by adding more hardware resources, the circuit becomes extremely complicated. Since "smaller is faster", a complex superscalar processor cannot run as fast as a simple processor.
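The quadratic cost of dependence checking can be made concrete with a short count. The sketch below assumes that every instruction has exactly two source registers and one destination register. It counts only the read-after-write comparisons inside a single issue group. Real issue logic also checks against instructions already in flight, so this is a lower bound. The function name `dependence_comparisons` is hypothetical.

```python
def dependence_comparisons(n, srcs_per_instr=2):
    """Count register comparisons needed to detect RAW dependences among
    n instructions issued together (assumes one destination per instruction)."""
    count = 0
    for i in range(n):  # i = position of the instruction within the issue group
        # Each source register is compared with the destination register
        # of every earlier instruction in the same group (i of them).
        count += srcs_per_instr * i
    return count


if __name__ == "__main__":
    for width in (2, 4, 8, 16):
        print(f"issue width {width:2d}: {dependence_comparisons(width)} comparisons")
```

With two sources per instruction the sum is 2 × (0 + 1 + ... + (n − 1)) = n² − n. Doubling the issue width from 8 to 16 therefore raises the count from 56 to 240 comparators.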
VLIW is one alternative to superscalar. It uses static scheduling instead of dynamic scheduling: the compiler performs hazard detection, which greatly simplifies the hardware. However, this approach also has problems, and code size is probably the most significant one. Code size grows for two reasons. First, to increase ILP, compilers often unroll loops. Second, the compiler must place no-op (no operation) instructions in the slots of processing elements that are not used. In practice, only some of the processing elements are used in a given cycle, and a no-op must be fetched for each of the others. The problem worsens as processing elements are added: the more processing elements there are, the more of them sit idle, and the more no-ops are needed. Large code no longer fits in the instruction cache and causes congestion at the off-chip instruction interface. Although the microelectronics industry advances very quickly, the number of pins increases only slightly. Pin count grows mainly when the chip gets larger, because the pins themselves are hard to shrink. Off-chip bandwidth is therefore a critical resource, and it is likely to remain difficult to increase in the future. As a result, off-chip interface congestion will limit the performance of VLIW processors.
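The growth of no-ops with machine width can be illustrated with a toy list scheduler. The sketch below makes several assumptions that do not come from the text:

* every operation has latency 1;
* any operation may occupy any slot;
* ready operations are chosen alphabetically;
* every bundle is stored uncompressed, so each empty slot costs a full no-op.

The dependency graph `DEPS` and the helper names are hypothetical. Real VLIW compilers use more elaborate priority heuristics, and some VLIW encodings compress no-ops.

```python
def schedule(deps, width):
    """Pack operations into VLIW bundles of `width` slots, padding with "nop"."""
    done = set()
    remaining = set(deps)
    bundles = []
    while remaining:
        # An operation is ready once all its predecessors finished in earlier cycles.
        ready = sorted(op for op in remaining if all(p in done for p in deps[op]))
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        issued = ready[:width]
        bundles.append(issued + ["nop"] * (width - len(issued)))
        done.update(issued)
        remaining.difference_update(issued)
    return bundles


def count_nops(bundles):
    """Number of slots that had to be filled with a no-op."""
    return sum(slot == "nop" for bundle in bundles for slot in bundle)


# Hypothetical dependency graph: d needs a and b, e needs c, f needs d and e.
DEPS = {"a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["c"], "f": ["d", "e"]}

if __name__ == "__main__":
    for width in (1, 2, 4, 8):
        bundles = schedule(DEPS, width)
        print(f"width={width}: cycles={len(bundles)}, nops={count_nops(bundles)}")
```

Under these assumptions the six operations need 6, 4, 3 and 3 cycles for widths 1, 2, 4 and 8, with 0, 2, 6 and 18 no-ops. Once the width exceeds the three-way parallelism available in the graph, extra processing elements no longer shorten the schedule. They only add no-ops to the code.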
Another alternative is the chip-multiprocessor approach, which integrates many processor cores on a single chip. The idea appears quite scalable and is very attractive: with advances in microelectronics, it is entirely feasible to integrate many processor cores and a communication network on one chip. In practice, however, the approach does not scale, and the reason is again bandwidth, as in the VLIW case. All of the cores must share the same off-chip bandwidth. That bandwidth is sufficient for one or two processor cores but entirely insufficient for hundreds of them. Moreover, a chip-multiprocessor is poor at exploiting the ILP of a single thread.
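The bandwidth argument can be expressed as a simple bound. The sketch assumes that each core needs a fixed off-chip bandwidth to run at full speed and that all cores share one link. Under that assumption, aggregate throughput is capped at the number of cores the link can feed. The model ignores caching effects and contention overheads. The function name and the figures of 100 GB/s total and 10 GB/s per core are hypothetical, chosen only for illustration.

```python
def effective_speedup(n_cores, offchip_bw, per_core_demand):
    """Upper bound on speedup over one core when n_cores share one off-chip link.

    Assumes each core's throughput is proportional to the bandwidth it receives.
    """
    # Number of cores the shared link can keep busy at full speed
    sustainable_cores = offchip_bw / per_core_demand
    return min(n_cores, sustainable_cores)


if __name__ == "__main__":
    # Hypothetical figures: 100 GB/s off-chip, 10 GB/s needed per core
    for cores in (1, 2, 10, 100, 1000):
        print(f"{cores:4d} cores -> speedup <= {effective_speedup(cores, 100, 10)}")
```

With these numbers the bound grows linearly up to 10 cores and then stays at 10, however many more cores are integrated on the chip.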