In a computing system, an instruction buffer is generally used to store instructions that are to be decoded and executed by an execution engine, such as a processing unit. For example, an artificial neural network, such as a deep neural network, may include multiple layers of processing nodes. Each processing node on a layer can perform computations on input data generated by processing nodes on the preceding layer to generate output data. For instance, a convolutional neural network may include multiple convolution layers, activation layers, and pooling layers, where each of the layers may be implemented using an execution engine, such as a convolution engine, an activation engine, a pooling engine, or a stream processor. Each of these execution engines may use an instruction buffer to temporarily store instructions that are decoded by an instruction decoder and executed by an execution unit of the execution engine to perform various functions. In general, a neural network may be developed, trained, and made available to many end users. The end users can then use the trained neural network to continuously perform various tasks (which may be referred to as the inference process) on input data.
In many cases, due to hardware limitations, the size of the instruction buffer may be smaller than the size of the instruction code used to perform a user function, such as an inference. Therefore, the instruction buffer may need to be refilled during each inference. In addition, because the user function (e.g., the inference process) may be performed continuously for different sets of input data, the instruction code for the user function may need to be reloaded into the instruction buffer for each inference. In general, it is desirable to refill the instruction buffer while the execution engine is executing other instructions already stored in the instruction buffer, so that the user function can be performed more quickly and more efficiently using the available resources of the computing system.
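The refill behavior described above can be illustrated with a minimal software sketch. It is a single-threaded simulation, not a description of any particular hardware: the function name `run_with_refill` and the parameters `buffer_size` and `refill_chunk` are hypothetical, and the refill policy (top up whenever at least `refill_chunk` slots are free) is an assumption chosen only for illustration.

```python
from collections import deque


def run_with_refill(program, buffer_size, refill_chunk):
    """Simulate executing `program` through a buffer smaller than the program.

    Hypothetical model: the engine consumes one instruction per step, and
    the buffer is topped up as soon as `refill_chunk` slots are free, so
    refills interleave with execution instead of waiting for it to finish.
    """
    buffer = deque()
    next_to_load = 0   # index of the next instruction not yet loaded
    executed = []      # instructions in the order the engine ran them
    refills = 0        # refills performed after the initial fill

    # Initial fill: load as much of the program as the buffer can hold.
    while len(buffer) < buffer_size and next_to_load < len(program):
        buffer.append(program[next_to_load])
        next_to_load += 1

    while buffer:
        # The execution engine consumes the oldest buffered instruction.
        executed.append(buffer.popleft())

        free = buffer_size - len(buffer)
        remaining = len(program) - next_to_load
        # Refill when enough space has opened up, or when the buffer has
        # run dry and instructions are still pending.
        if remaining and (free >= refill_chunk or not buffer):
            count = min(free, refill_chunk, remaining)
            buffer.extend(program[next_to_load:next_to_load + count])
            next_to_load += count
            refills += 1

    return executed, refills


if __name__ == "__main__":
    # A 10-instruction "inference" run through a 4-entry buffer.
    program = [f"instr_{i}" for i in range(10)]
    for inference in range(2):
        # The same instruction code is reloaded for every inference.
        executed, refills = run_with_refill(program, buffer_size=4, refill_chunk=2)
        print(f"inference {inference}: ran {len(executed)} instructions, {refills} refills")
```

In this sketch the refill and the execution alternate within one loop; in a real system they would proceed concurrently, which is the source of the speedup the passage describes.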