A high-performance computer architecture is usually provided with an accelerated computation unit that has a very strong capability of processing intensive computation tasks, such as a general-purpose graphics processing unit (GPGPU) or a field-programmable gate array (FPGA). When processing an intensive computation task, a central processing unit (CPU) allocates a large amount of parallel computing work to the accelerated computation unit to alleviate computing pressure on the CPU, so as to improve overall computing efficiency of the computer.
As shown in FIG. 1, in a computer architecture including an accelerated computation unit, a CPU and the accelerated computation unit have respective storage units. Generally, a memory of the CPU is defined as a main memory and a memory of the accelerated computation unit is defined as a device memory, and data transmission between the main memory and the device memory is implemented by using a bus.
Iterative computation is a typical intensive computation task. To improve computing efficiency, the iterative computation is usually allocated to an accelerated computation unit. Iterative computation is generally applied in solving a set of equations, computing matrix eigenvalues, singular value decomposition (SVD), and the like. As shown in FIG. 2, the basic idea of iterative computation is successive approximation. A rough initial value is selected, and the same iterative formula is then applied repeatedly, with each intermediate result substituted back into the formula for loop computation, until the computation result converges to meet a precision requirement.
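The successive approximation described above can be illustrated with a minimal sketch. The Jacobi method for a set of linear equations is used here only as one example of an iterative formula; the function name `jacobi`, the tolerance `tol`, and the iteration limit `max_iter` are assumptions made for illustration and do not come from the described architecture.

```python
def jacobi(a, b, tol=1e-10, max_iter=1000):
    """Solve a*x = b by Jacobi iteration (illustrative sketch only).

    a is a square matrix given as a list of rows, b is the right-hand side.
    Convergence is assumed, for example because a is diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n  # rough initial value
    for iteration in range(1, max_iter + 1):
        # Apply the same iterative formula to the previous intermediate result.
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        # Stop once successive results differ by less than the precision requirement.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("iteration did not converge within max_iter steps")


# Example system: 4x + y = 9 and 2x + 5y = 9, whose exact solution is x = 2, y = 1.
solution, steps = jacobi([[4.0, 1.0], [2.0, 5.0]], [9.0, 9.0])
```

Because each pass reuses the previous intermediate result, any precision lost in that result is carried into every later pass, which is why such tasks are sensitive to the data format of the intermediate result.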
Because an intensive computation task such as iterative computation places a very high requirement on the data precision of the intermediate result, data in a high-precision format is generally used both in the accelerated computing process and in the data transmission process, so as to implement effective computation convergence. Although this satisfies the requirement for computation precision, it increases the amount of data transmission. The increased amount of data transmission leads to an increase in the delay of data transmission, and the overall computation time for the CPU is also increased.
In a solution of the prior art, as shown in FIG. 3, two data format conversion units, that is, a unit for converting high-precision data to low-precision data and a unit for converting low-precision data to high-precision data, are added into the accelerated computation unit, so that the CPU transmits low-precision data to the accelerated computation unit by using the bus. After receiving the low-precision data, the accelerated computation unit performs zero padding on the data to convert the data to a high-precision data format, and then performs computation. When the accelerated computation unit needs to transmit the data to the CPU, the accelerated computation unit converts the high-precision data to the low-precision data, and then sends the low-precision data to the CPU by using the bus.
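The round trip in this prior-art solution can be sketched as follows. The source does not name the data formats, so this sketch assumes IEEE 754 32-bit floating point as the low-precision format and 64-bit floating point as the high-precision format; the helper names `to_low_precision`, `to_high_precision`, and `padded_bits_are_zero` are hypothetical.

```python
import struct


def to_low_precision(values):
    """Pack values as 32-bit floats, the assumed low-precision bus format."""
    return struct.pack(f"<{len(values)}f", *values)


def to_high_precision(payload):
    """Widen received 32-bit floats to 64-bit floats for computation."""
    count = len(payload) // 4
    return list(struct.unpack(f"<{count}f", payload))


def padded_bits_are_zero(value):
    """Check that the 29 extra mantissa bits of a widened value are zero."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & ((1 << 29) - 1) == 0


values = [0.1, 1.0 / 3.0, 2.5]

# Transmitting in the high-precision format costs 8 bytes per value.
high_payload = struct.pack(f"<{len(values)}d", *values)

# Transmitting in the low-precision format costs 4 bytes per value.
low_payload = to_low_precision(values)

# The receiver widens the data again; the added mantissa bits carry no
# information, so precision lost in the narrowing step is not recovered.
restored = to_high_precision(low_payload)
```

Under these assumptions the low-precision payload is half the size of the high-precision one, while a value such as 0.1 no longer matches its original 64-bit representation after the round trip. Both conversion steps are extra work for whichever unit performs them.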
In the foregoing solution of the prior art, the amount of data transmission is reduced by transmitting low-precision data, and the delay of data transmission is decreased accordingly. However, because the two data format conversion units are added into the accelerated computation unit, data format conversion additionally occupies computing resources and computation time of the accelerated computation unit. Consequently, efficiency of accelerated computation is decreased.