US 12,169,786 B1
Neural network accelerator with reconfigurable memory
Tariq Afzal, San Jose, CA (US); Arvind Mandhani, San Francisco, CA (US); and Shiva Navab, Mountain View, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 27, 2019, as Appl. No. 16/455,334.
Claims priority of provisional application 62/772,359, filed on Nov. 28, 2018.
Int. Cl. G06N 3/10 (2006.01); G06N 3/02 (2006.01); G06N 3/08 (2023.01)
CPC G06N 3/10 (2013.01) [G06N 3/08 (2013.01); G06N 3/02 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computing system, comprising:
at least one processing unit that executes program code implementing a first neural network layer and program code implementing a second neural network layer, wherein the first neural network layer is a convolutional layer, and wherein the second neural network layer is a fully connected layer;
a plurality of memory banks; and
a memory manager configured to allocate the plurality of memory banks according to a first configuration and a second configuration, the first configuration and the second configuration each including an activation buffer that is dedicated to storing values representing input activations produced by an activation function of a neural network, a weight buffer that is dedicated to storing weights of the neural network, and an output buffer that is dedicated to storing values representing output activations of the neural network, each output activation being a result of a computation involving the input activations and the weights,
wherein, in the first configuration, the memory manager allocates different numbers of memory banks to the activation buffer, the weight buffer, and the output buffer than in the second configuration, resulting in a different-sized activation buffer, a different-sized weight buffer, and a different-sized output buffer in the first configuration compared to the second configuration, wherein at least some of the memory banks change from being allocated to the activation buffer to being allocated to the weight buffer upon switching from the first configuration to the second configuration, wherein the first configuration is used during execution of the program code implementing the first neural network layer, and wherein the second configuration is used during execution of the program code implementing the second neural network layer.
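The claim describes a memory manager that repartitions a shared pool of memory banks among a dedicated activation buffer, weight buffer, and output buffer, choosing one partition while a convolutional layer executes and a different partition while a fully connected layer executes. The sketch below is a minimal, hypothetical illustration of that idea and is not taken from the patent: the bank counts, the names BankConfig, MemoryManager, CONV_CONFIG, and FC_CONFIG, and the contiguous bank-index assignment are all assumptions made for clarity.

```python
# Illustrative sketch only: repartition a fixed pool of memory banks among an
# activation buffer, a weight buffer, and an output buffer per layer type.
from dataclasses import dataclass

TOTAL_BANKS = 16  # hypothetical number of physical memory banks


@dataclass(frozen=True)
class BankConfig:
    activation_banks: int
    weight_banks: int
    output_banks: int

    def __post_init__(self):
        # Every bank in the pool must be assigned to exactly one buffer.
        assert (self.activation_banks + self.weight_banks
                + self.output_banks) == TOTAL_BANKS


# Hypothetical configurations: a convolutional layer reuses a small weight set
# over many activations, while a fully connected layer needs far more weight
# storage, so banks shift from the activation buffer to the weight buffer when
# switching from the first configuration to the second.
CONV_CONFIG = BankConfig(activation_banks=8, weight_banks=4, output_banks=4)
FC_CONFIG = BankConfig(activation_banks=2, weight_banks=10, output_banks=4)


class MemoryManager:
    """Records which physical banks back each dedicated buffer."""

    def apply(self, config: BankConfig) -> dict:
        # In hardware this would reprogram bank-to-buffer routing; here we
        # simply return the resulting bank-index assignment.
        a, w = config.activation_banks, config.weight_banks
        return {
            "activation_buffer": list(range(0, a)),
            "weight_buffer": list(range(a, a + w)),
            "output_buffer": list(range(a + w, TOTAL_BANKS)),
        }


mgr = MemoryManager()
print(mgr.apply(CONV_CONFIG))  # used while the convolutional layer executes
print(mgr.apply(FC_CONFIG))    # used while the fully connected layer executes
```

In this sketch the total number of banks never changes; only the split among the three buffers does, which mirrors the claim's requirement that the two configurations give the buffers different sizes while some banks move from the activation buffer to the weight buffer.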