Pipelining is a method of accelerating the performance of a computing device by dividing tasks into a plurality of “stages,” each of which may contain one or more “threads” that are not necessarily mutually exclusive of other threads in the stage. For example, a pipeline stage may include a first thread that reads a value from memory for use in a later stage, a second thread that operates on a value read from memory in a previous stage, and a third thread that stores to memory a value that is the result of an operation performed in a previous stage. In some cases, the speed and efficiency of pipelined execution may be better than those of linear execution, wherein each instruction must sequentially load, execute, and then store.
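The contrast between linear and pipelined execution can be illustrated with a minimal sketch. It is a sequential simulation only, not an implementation of any particular device: the names `run_linear` and `run_pipelined`, the list standing in for memory, and the `op` callback are all hypothetical and chosen for illustration. Each iteration of the pipelined loop models one stage in which a store thread, an operate thread, and a load thread are active on different items.

```python
def run_linear(memory, op):
    """Linear execution: each item is loaded, operated on, then stored in turn."""
    out = []
    for value in memory:
        loaded = value        # load
        result = op(loaded)   # execute
        out.append(result)    # store
    return out


def run_pipelined(memory, op):
    """Simulated pipeline: each stage overlaps work on up to three items.

    Returns the stored results and a per-stage trace of the active threads.
    """
    n = len(memory)
    out = []
    stages = []
    loaded = None    # value read from memory in the previous stage
    computed = None  # result of the operation performed in the previous stage
    for s in range(n + 2):
        active = []
        # Store thread: writes the result computed in the previous stage.
        if 0 <= s - 2 < n:
            out.append(computed)
            active.append(f"store[{s - 2}]")
        # Operate thread: works on the value loaded in the previous stage.
        if 0 <= s - 1 < n:
            computed = op(loaded)
            active.append(f"op[{s - 1}]")
        # Load thread: reads a value for use in a later stage.
        if s < n:
            loaded = memory[s]
            active.append(f"load[{s}]")
        stages.append(active)
    return out, stages


if __name__ == "__main__":
    data = [1, 2, 3]
    results, trace = run_pipelined(data, lambda x: x * 10)
    print(results)
    for number, threads in enumerate(trace):
        print(number, threads)
```

In this sketch, n items complete in n + 2 stages, whereas the linear loop performs 3n sequential steps. Any real gain depends on the threads within a stage actually executing concurrently, which this simulation does not model.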