This specification relates to image processors.
Image processors are programmable, domain-specific parallel processing devices that are designed to take advantage of two-dimensional spatial locality in image data. Image processors are designed to efficiently process existing image data, which distinguishes them from graphics processing units (GPUs), which are designed to generate images in the first instance from an internal representation.
Image processors are designed for high-efficiency, low-power, parallel execution of workloads with two-dimensional spatial locality. A computing task has two-dimensional spatial locality when output data for a location in the input data depends on data that neighbors or is near that location in the input data. For example, a 3×3 blur filter can use data in a 9-pixel square region of input image data in order to compute an output value for a pixel at the center of the square region. In this specification, the input region needed to generate an output pixel is referred to as an input support region. This example blur filter has spatial locality because the output value uses data from neighboring pixels. Image processors can also be used for high-performance parallel execution of workloads in other domains, including computer vision, object recognition, neural networks, and other machine learning tasks.
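The 3×3 blur filter and its input support region can be sketched as follows. This is an illustrative Python sketch, not the kernel program an image processor would actually execute; the border-handling policy (leaving edge pixels unchanged) is an assumption for simplicity.

```python
def blur_3x3(image):
    """3x3 box blur over a 2D list of grayscale pixel values.

    Each output pixel is the mean of the 9-pixel input support
    region centered on it. Border pixels, which lack a full
    support region, are left unchanged here (an assumption made
    for this sketch).
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Input support region: the 3x3 neighborhood around (x, y).
            total = sum(image[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total / 9.0
    return out
```

The support region here has the same shape and the same offsets for every output pixel, which is what later makes this workload expressible with a simple transfer function.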
Programming an image processor typically requires writing and compiling a kernel program, which is then executed concurrently by each of a plurality of execution lanes of the image processor. Each execution lane is itself a component that can execute instructions and store data in one or more registers.
Some image processors take advantage of spatial locality by coupling an array of execution lanes to an array of shift registers. Each execution lane can access data required for its kernel program by shifting the input data within the array of shift registers rather than performing memory accesses. Conceptually, this can be thought of as shifting an array of image data beneath an array of execution lanes. For example, an execution lane can access data required to compute a blur filter by repeatedly reading data shifted in snake-scan order: two pixels to the left, one pixel down, two pixels to the right, one pixel down, and two pixels to the left.
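The snake-scan shift sequence described above can be checked with a short sketch. This is an abstraction over relative positions, not a model of the shift-register hardware; the sign conventions for "left" and "down" and the choice of starting corner are assumptions made for illustration.

```python
def snake_scan_shifts():
    """Unit shifts for sweeping a 3x3 support region in snake-scan
    order: two left, one down, two right, one down, two left.
    Each entry is a (dx, dy) step (signs are illustrative)."""
    return ([(-1, 0)] * 2 + [(0, 1)] +   # two pixels to the left, one down
            [(1, 0)] * 2 + [(0, 1)] +    # two pixels to the right, one down
            [(-1, 0)] * 2)               # two pixels to the left

def visited_positions(start=(1, -1)):
    """Positions, relative to the execution lane, read during the
    snake scan; starting at the region's top-right corner (assumed)."""
    x, y = start
    positions = [(x, y)]
    for dx, dy in snake_scan_shifts():
        x, y = x + dx, y + dy
        positions.append((x, y))
    return positions
```

Eight unit shifts plus the starting position visit each of the nine pixels in the 3×3 support region exactly once, which is why a blur kernel can be computed entirely from shifted data without memory accesses.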
Many kernel programs that implement image processing algorithms are executed by systematically stepping through an input image in a fixed traversal pattern to read an input support region needed to generate each output pixel. In this specification, a transfer function is a function that defines a relationship between a location of an output pixel and a location of an input support region needed to generate a value for the output pixel according to a particular kernel program. In other words, a transfer function defines the inputs for a particular output pixel.
Many image processing algorithms use simple transfer functions that rely on global integer offsets, which means that the same integer offsets are applied for all output pixels regardless of the location of the output pixel. For such simple transfer functions, the position of an input pixel in the input support region can be expressed using simple integer offsets from the position of the output pixel (x, y), e.g., using transfer functions of the following form: f(x, y)=(x+x_offset, y+y_offset). For these simple transfer functions, the value of x_offset and the value of y_offset are the same for all output pixels.
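A simple transfer function of the form f(x, y) = (x + x_offset, y + y_offset) can be sketched directly. The particular offset values below are hypothetical; the point is that they are the same constants for every output pixel.

```python
def simple_transfer(x, y, x_offset=-1, y_offset=-1):
    """Simple transfer function with global integer offsets:
    f(x, y) = (x + x_offset, y + y_offset).

    The (hypothetical) default offsets of -1, -1 would locate the
    top-left corner of a 3x3 support region relative to the output
    pixel. The same offsets apply for all output pixels.
    """
    return (x + x_offset, y + y_offset)
```

Because x_offset and y_offset do not depend on (x, y), this mapping matches the fixed traversal pattern an image processor steps through when executing such kernels.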
However, some image processing algorithms have a complex transfer function, which is a transfer function that cannot be expressed in terms of global integer offsets. Non-integer rescaling is one example of a complex transfer function. For example, if an image is to be rescaled by a factor of 1.3, the transfer function cannot be expressed in terms of global integer offsets.
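The non-integer rescaling example can be made concrete with a sketch. The use of floor to pick the nearest input pixel is an assumption for illustration (a real rescaler might interpolate over several input pixels), but it suffices to show why no global integer offset exists.

```python
import math

def rescale_transfer(x, y, scale=1.3):
    """Complex transfer function for rescaling by a non-integer
    factor: output pixel (x, y) reads input pixel
    (floor(x / scale), floor(y / scale)).

    The effective offset (input coordinate minus output coordinate)
    changes with the output pixel's position, so the mapping cannot
    be written as f(x, y) = (x + x_offset, y + y_offset) with
    constant integer offsets.
    """
    return (math.floor(x / scale), math.floor(y / scale))
```

For a scale factor of 1.3, the offset x − floor(x / 1.3) drifts as x grows, which is exactly the property that rules out a simple transfer function.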
Some complex transfer functions can also vary the size and location of the input support region depending on the location of the output pixel in the image. For example, for an image processing algorithm designed to correct camera lens distortion, the output pixels on the edges of the image, e.g., where the distortion is most severe, will rely on input support regions that are both larger in size and have larger offsets than the input support regions used to compute output pixels near the center of the image, where the distortion is least severe. Therefore, these kinds of image processing algorithms typically must be executed by a CPU, which is slower and less energy efficient than executing them directly on an image processor, or must be executed by a specially designed separate hardware device, which makes the chip design more complicated, larger, and more expensive.
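The lens-distortion example can be illustrated with a toy model. Everything below is hypothetical: the growth rate of the support region and the offset formula are invented for this sketch and do not correspond to any particular lens model; the sketch only demonstrates a support region whose size and offsets depend on the output pixel's location.

```python
import math

def distortion_support(x, y, width, height, base=3):
    """Toy support-region model for lens-distortion correction.

    Returns (region_size, x_offset, y_offset) for output pixel
    (x, y). The region grows, and the offsets grow, with the output
    pixel's normalized distance from the image center, mimicking
    distortion that is most severe at the edges. The specific
    formulas are illustrative assumptions, not a real lens model.
    """
    cx, cy = width / 2.0, height / 2.0
    # Normalized distance from the image center: 0 at the center,
    # 1 at the corners.
    r = math.hypot(x - cx, y - cy) / math.hypot(cx, cy)
    size = base + 2 * int(2 * r)          # larger region toward the edges
    x_off = int(r * (x - cx) / 4)         # larger offsets toward the edges
    y_off = int(r * (y - cy) / 4)
    return (size, x_off, y_off)
```

Because both the region size and the offsets vary per pixel, no fixed shift pattern over the register array serves every output pixel, which is why such algorithms have traditionally fallen back to a CPU or to dedicated hardware.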