A processor including a pipeline is designed to, when an instruction to execute is a branch instruction, cause a branch prediction mechanism to predict whether the branch instruction is taken or not taken and advance processing. When the branch prediction fails, the processor cancels all processing speculatively executed based on the branch prediction result and re-executes the correct processing, resulting in performance loss. Therefore, improving the accuracy of branch prediction is important for improving the performance of the processor.
As a first method of the branch prediction mechanism, there is the following branch prediction method. In the first method, the branch prediction mechanism holds branch destination addresses (target addresses) of branch instructions that were taken in the past as a branch history. The branch prediction mechanism searches the branch history using an instruction fetch address as an index in parallel with fetching (reading) of an instruction, to thereby predict success or failure of a branch (branch-taken or branch-not-taken) and a branch destination address (for example, Patent Document 1). In the first method, because the time until the branch destination prediction is decided is short, it is possible to nearly eliminate useless instruction fetches when a branch is taken. However, the prediction uses no information corresponding to the flow of the instruction sequence, resulting in low branch prediction accuracy.
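The first method can be illustrated by the following minimal sketch in Python. The table size, the tag/index split, and the function names are illustrative assumptions, not details taken from Patent Document 1.

```python
# Sketch of the first method: a branch history table holding target addresses
# of past taken branches, searched with the fetch address in parallel with
# the instruction fetch. Sizes and the tag/index split are illustrative.

TABLE_BITS = 10
TABLE_SIZE = 1 << TABLE_BITS

# Each entry holds (tag, target_address); None means "no taken branch recorded".
branch_history = [None] * TABLE_SIZE

def split(addr):
    """Split a fetch address into (tag, table index)."""
    return (addr >> TABLE_BITS, addr & (TABLE_SIZE - 1))

def predict(fetch_addr):
    """Return a predicted target address, or None to predict not-taken."""
    tag, idx = split(fetch_addr)
    entry = branch_history[idx]
    if entry is not None and entry[0] == tag:
        return entry[1]          # hit: predict taken, with this target
    return None                  # miss: predict not-taken / fall-through

def update(fetch_addr, taken, target):
    """Record a resolved branch: taken branches are entered, not-taken evicted."""
    tag, idx = split(fetch_addr)
    if taken:
        branch_history[idx] = (tag, target)
    elif branch_history[idx] is not None and branch_history[idx][0] == tag:
        branch_history[idx] = None
```

Because the lookup needs only the fetch address, it can complete while the instruction itself is still being read, which is why the prediction is decided quickly.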
As a second method of the branch prediction mechanism, there is a branch prediction method called g-share (for example, Non-Patent Document 1). In the second method, the branch prediction mechanism holds branch-taken accuracy (a history of branch outcomes) and branch destination addresses of branch instructions as a branch history. When a fetched instruction is decided to be a conditional branch instruction, the branch prediction mechanism searches the history of branch-taken accuracy using, as an index, the exclusive OR of a global history, in which success or failure of recent branches is recorded in chronological order, and the instruction fetch address, to thereby predict success or failure of the branch and a branch destination address. The second method makes it possible to obtain higher branch prediction accuracy than the first method, which merely searches the history of branch destinations.
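A minimal sketch of the second (g-share) method follows, assuming the common realization with a table of 2-bit saturating counters indexed by the exclusive OR of the fetch address and the global history. The sizes and names are illustrative assumptions.

```python
# Sketch of the g-share method: 2-bit saturating counters indexed by
# (fetch address XOR global history). Sizes are illustrative.

HIST_BITS = 12
TABLE_SIZE = 1 << HIST_BITS

counters = [1] * TABLE_SIZE   # 2-bit counters; 0-1 predict not-taken, 2-3 taken
global_history = 0            # recent taken/not-taken outcomes, newest in bit 0

def index(fetch_addr):
    """XOR the fetch address with the global history to form the table index."""
    return (fetch_addr ^ global_history) & (TABLE_SIZE - 1)

def predict(fetch_addr):
    """True = predict branch-taken, False = predict branch-not-taken."""
    return counters[index(fetch_addr)] >= 2

def update(fetch_addr, taken):
    """Train the counter toward the outcome and shift it into the history."""
    global global_history
    idx = index(fetch_addr)
    if taken:
        counters[idx] = min(counters[idx] + 1, 3)
    else:
        counters[idx] = max(counters[idx] - 1, 0)
    global_history = ((global_history << 1) | int(taken)) & (TABLE_SIZE - 1)
```

The XOR mixes path information into the index, which is what gives g-share its advantage over indexing by the fetch address alone.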
In the second method, as the global history in which success or failure of branches is recorded in chronological order becomes longer, branch prediction performance improves. However, the length of the global history depends on the size of the history of branch-taken accuracy, so that the size of the history of branch-taken accuracy doubles every time the length of the global history is increased by one bit. Therefore, it is not easy to increase the length of the global history, and the mounting area cost for the improvement in branch prediction accuracy obtained by expanding the branch history is large.
As a third method of the branch prediction mechanism, there is a branch prediction method called perceptron (for example, Non-Patent Document 2). In the third method, the branch prediction mechanism performs branch prediction based on the association between success or failure of a branch of a branch prediction target instruction and success or failure of branches of instructions fetched before the branch prediction target instruction. The branch prediction mechanism stores this association in a weight table as weight values. The branch prediction mechanism searches the weight table using a fetch address of the branch prediction target instruction as an index, performs a product-sum operation of the obtained weight values and the corresponding entries of a global history, and performs branch prediction based on the result.
Concretely, the branch prediction mechanism computes the product-sum operation W(0)+W(1)×X(1)+W(2)×X(2)+ . . . +W(n)×X(n), where W(i), i being a natural number of 1≤i≤n, is the weight of the previous i-th instruction from the branch prediction target instruction, and X(i) is a global history value that is “+1” when the branch of the previous i-th instruction is taken and “−1” when it is not taken. Then, the branch prediction mechanism predicts branch-taken in a case where the result of the product-sum operation is positive and predicts branch-not-taken in a case where the result of the product-sum operation is negative.
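The product-sum operation above can be written directly as a short Python sketch. The function names and the treatment of a zero result (here treated as taken) are illustrative assumptions.

```python
# Sketch of the third (perceptron) method's product-sum operation:
# weights [W(0), W(1), ..., W(n)] are read from a weight table indexed by the
# fetch address of the branch prediction target instruction, and X(i) is
# +1 (taken) or -1 (not taken) from the global history.

def perceptron_predict(weights, global_history):
    """weights: [W(0), W(1), ..., W(n)]; global_history: [X(1), ..., X(n)].
    Returns True for branch-taken, False for branch-not-taken."""
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], global_history))
    return y >= 0   # positive -> taken, negative -> not taken (0 treated as taken)
```

For example, with weights [1, 2, -1] and history [+1, -1], the sum is 1 + 2 + 1 = 4, so the branch is predicted taken.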
In the third method, the length of the global history depends on the number of weight tables, and in order to increase the length of the global history by one bit, it is only necessary to add one weight table. When the size of each weight table is sufficiently small, the mounting area cost caused by an increase in the global history is reduced compared to the second method. However, in the third method, the weights are obtained by referring to the weight tables using the address of the branch prediction target instruction, and the result of adding these weights together is used as the branch prediction result. This processing takes time, which necessitates extending the latency of branch prediction or reducing the operating frequency.
As a fourth method of the branch prediction mechanism, there is a branch prediction method called piecewise-linear (for example, Non-Patent Documents 3, 4). The fourth method can improve the branch prediction accuracy over the third method by using the instruction execution path leading to a branch prediction target instruction for branch prediction. Concretely, the branch prediction mechanism searches for the index of the weight table of the previous i-th instruction from the branch prediction target instruction using the fetch address of the previous i-th instruction, to thereby reflect the instruction execution path in the branch prediction.
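The path-based indexing of the fourth method can be sketched as follows. The history length, table size, and the choice of hashing the branch address together with each path address are illustrative assumptions, not details taken from Non-Patent Documents 3 and 4.

```python
# Sketch of the fourth (piecewise-linear) method: the weight for the previous
# i-th instruction is looked up using that instruction's own fetch address,
# so the execution path to the branch is reflected in the prediction.

N = 8                      # global history length (illustrative)
TABLE_SIZE = 256           # entries per weight table (illustrative)

# One weight table per history position i = 0..N; index 0 holds the bias W(0).
weight_tables = [[0] * TABLE_SIZE for _ in range(N + 1)]

def pw_predict(branch_addr, path_addrs, history):
    """path_addrs[i-1]: fetch address of the previous i-th instruction;
    history[i-1]: +1 if its branch was taken, -1 if not taken.
    Returns True for branch-taken, False for branch-not-taken."""
    y = weight_tables[0][branch_addr % TABLE_SIZE]          # bias weight W(0)
    for i in range(1, N + 1):
        # Index the i-th table by the path address, not the branch address alone.
        idx = (branch_addr ^ path_addrs[i - 1]) % TABLE_SIZE
        y += weight_tables[i][idx] * history[i - 1]
    return y >= 0
```

Two executions that reach the same branch along different paths therefore read different weight entries, which is the source of the accuracy gain over the third method.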
Further, the following branch prediction technique is proposed. The branch prediction mechanism performs prediction processing in a pipeline of two stages, stage 1 and stage 0. In the pipeline of stage 1, the branch prediction mechanism weights each branch result of a global history by a weight selected from a weight table and performs a product-sum operation of the global history and the weights, to thereby calculate a product-sum operation value of weighted branch results. In the pipeline of stage 0, the branch prediction mechanism calculates a prediction value using the product-sum operation value of the weighted branch results. The branch prediction mechanism performs the processing of stage 0 using the result of the processing of stage 1 performed when the previous branch instruction was input (for example, Patent Document 2).
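The two-stage flow above can be sketched as follows, with stage 1 computing the product-sum and stage 0 consuming the product-sum produced for the previous branch. The class and method names are illustrative assumptions; a hardware pipeline would hold the carried value in a stage register rather than an object field.

```python
# Sketch of the two-stage prediction pipeline: stage 1 computes the
# product-sum of weighted branch results; stage 0 turns the product-sum
# computed when the previous branch was input into a prediction.

class TwoStagePredictor:
    def __init__(self):
        self.pending_sum = 0     # stage-1 result carried to the next stage 0

    def stage1(self, weights, history):
        """Weight each branch result of the global history and accumulate."""
        self.pending_sum = weights[0] + sum(
            w * x for w, x in zip(weights[1:], history))

    def stage0(self):
        """Decide the prediction from the previously computed product-sum."""
        return self.pending_sum >= 0
```

Splitting the work this way shortens each stage's critical path, at the cost of stage 0 working from slightly stale stage-1 data.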
[Patent Document 1] Japanese Laid-open Patent Publication No. 06-89173
[Patent Document 2] Japanese Laid-open Patent Publication No. 2009-37305
[Non-Patent Document 1] S. McFarling, “Combining Branch Predictors”, Western Research Laboratory Technical Note TN-36, June 1993.
[Non-Patent Document 2] D. A. Jimenez and C. Lin, “Dynamic branch prediction with perceptrons”, In Proceedings of the 7th International Symposium on High Performance Computer Architecture (HPCA-7), pp. 197-206, January 2001.
[Non-Patent Document 3] D. A. Jimenez, “Piecewise linear branch prediction”, In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), June 2005.
[Non-Patent Document 4] D. A. Jimenez, “Oh-snap: Optimized hybrid scaled neural analog predictor”, In Proceedings of the 3rd Championship on Branch Prediction, http://www.jilp.org/jwac-2/, 2011.
The above-described piecewise-linear branch prediction method achieves high branch prediction accuracy, but, as will be described later, the amount of circuits in the branch prediction mechanism becomes huge and the latency of branch prediction increases.