Computer processing elements (or “processors,” such as central processing units (CPUs), graphics processing units (GPUs), host processors, co-processors, etc.) typically include a processor register file, referred to herein as general registers. A processing element generally includes a set of general registers that are each capable of storing some data value (e.g., each general register may be capable of storing a 32-bit data element). General registers are where data values are held while the processing element is performing computation using those data values. Thus, for instance, operands used in performing computations, as well as the results of such computations, are typically stored in general registers of a processing element while the processing element is performing such computations. An instruction set executing on the processing element typically expressly manages the processing element's general register set (e.g., expressly manages the movement of data into and out of the general registers).
Certain processing elements may further include additional local data storage, such as a cache. As is well-known in the art, a processing element's cache is a data storage mechanism used by the processing element to reduce the average time to access main memory. The cache is typically a smaller, faster memory (than main memory) which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory. Typically, when a processing element that has a cache needs to read from or write to a location in main memory, it first checks whether a copy of that data is in its cache. If so, the processing element immediately reads from or writes to the cache, which is typically much faster than reading from or writing to main memory.
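The relationship between hit rate and average latency described above can be illustrated with a simplified model. The following is a minimal sketch, assuming a single-level cache in which every access costs either the cache latency (on a hit) or the main memory latency (on a miss); the function name and the latency figures are hypothetical and are chosen only for illustration.

```python
def average_access_latency(hit_rate, cache_latency, memory_latency):
    """Weighted average latency for a simplified single-level cache model.

    Assumption: a hit costs cache_latency and a miss costs memory_latency;
    real hardware has additional effects that this sketch ignores.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency + (1.0 - hit_rate) * memory_latency


# Hypothetical latencies, in arbitrary time units.
CACHE_LATENCY = 2
MEMORY_LATENCY = 100

for rate in (0.0, 0.5, 0.95, 1.0):
    print(rate, average_access_latency(rate, CACHE_LATENCY, MEMORY_LATENCY))
```

Under these assumed figures, a hit rate above one half already places the average nearer to the cache latency than to the main memory latency, which is the behavior the paragraph above describes.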
The cache is often associatively assigned to the main memory, and it is typically managed automatically by the hardware implemented in the system. So, an application or software process typically does not know expressly when data is coming into the cache or being evicted from the cache, but instead the cache operation is generally implemented by the hardware to be transparent in the path to main memory. Thus, unlike its express management of the general registers, an instruction set executing on the processing element typically does not expressly manage the processing element's cache.
One type of data that is desirable to process in many applications is three-dimensional (“3D”) data structures. In general, 3D data structures contain data representing a 3D object. For example, in many applications, computer modeling of a 3D object is performed to enable analysis and/or computations concerning the 3D object that may be dangerous, costly, difficult, and/or impossible to perform on the physical 3D object itself. For instance, 3D computer modeling is used in many medical imaging applications, seismic exploration applications, flight simulator applications, and many other types of applications. As one example, two-dimensional (“2D”) image data acquired for at least a portion of a patient's body, such as through X-ray, sonogram, computed tomography (CT), etc., may be processed in known ways to generate 3D data structures, which may then be used to perform 3D computer modeling of the imaged portion of the patient's body. For instance, the 3D data structures may be used in certain applications to display a 3D image of the portion of the patient's body. As another example, seismic data may be acquired for a portion of the earth, and the data may be processed in known ways to generate 3D data structures representing the earth as a 3D object. The 3D data structures may then be used by applications to aid in the search for and evaluation of subterranean hydrocarbon and/or other mineral deposits.
Irrespective of how the 3D data structures are generated (e.g., whether through computed tomography, processing of seismic signals, etc.), what physical object(s) the 3D data represents (e.g., whether representing portion(s) of a human body, the earth, or other physical object), or what an application desires to use the data for (e.g., whether aiding in the treatment of a patient, searching for subterranean mineral deposits, flight training, entertainment, etc.), processing of such 3D data structures by processing elements is often complex. Typically, a desire for a processing element to process 3D data structures dictates certain characteristics/requirements in the design of the processing element in order for the processing element to be suitable for efficiently processing the 3D data structures.
One consideration for a processing element that is to process 3D data structures concerns the amount of local storage on the processing element, and particularly the size of the processing element's general register set. Further, as discussed below, not only is the full storage size of the general register set an important consideration, but often the size of one or more dimensions of the general register set is important for maintaining a desired data arrangement. For instance, as is well known in the art, for many types of compute operations on 3D data, the data is effectively organized into neighborhoods, and many computational operations rely upon some number of “nearest neighbors.” Thus, for instance, when performing certain processing operations on a 3D data structure, a neighborhood type of processing is often employed which may require using k nearest neighbor points, which are those data points closest to (or neighboring) the data point being computed.
Such neighborhood type of processing implicitly relies upon the relative positional arrangement of data. Thus, typically 3D data comprises data points that are arranged in main memory relative to each other, and such relative arrangement is important to maintain for many neighborhood-based computations. Further, it is generally desirable to maintain the relative positional arrangement of the data points within a processing element's general register set for performance of nearest-neighbor type of computational operations.
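As an illustration of why relative positional arrangement matters, the following is a minimal sketch of one neighborhood-type operation: averaging the six face-adjacent neighbors of a point in a 3D grid. It assumes, purely for illustration, that the grid is stored as a flat list in row-major order (X varying fastest); the function names and the storage layout are hypothetical and are not taken from any particular processing element.

```python
def flat_index(x, y, z, nx, ny):
    """Map 3D grid coordinates to a flat row-major index (X varies fastest)."""
    return (z * ny + y) * nx + x


def six_neighbor_average(data, x, y, z, nx, ny, nz):
    """Average the face-adjacent neighbors of (x, y, z) that lie inside the grid."""
    offsets = [(-1, 0, 0), (1, 0, 0),
               (0, -1, 0), (0, 1, 0),
               (0, 0, -1), (0, 0, 1)]
    total = 0.0
    count = 0
    for dx, dy, dz in offsets:
        px, py, pz = x + dx, y + dy, z + dz
        # Skip neighbors that fall outside the grid boundary.
        if 0 <= px < nx and 0 <= py < ny and 0 <= pz < nz:
            total += data[flat_index(px, py, pz, nx, ny)]
            count += 1
    return total / count


# A 3x3x3 grid whose stored value equals its own flat index.
NX = NY = NZ = 3
grid = [float(i) for i in range(NX * NY * NZ)]
center_average = six_neighbor_average(grid, 1, 1, 1, NX, NY, NZ)
```

In this assumed layout, each neighbor is located by a fixed offset from the point being computed (1 in X, nx in Y, and nx * ny in Z), so the computation is only correct if the stored data preserves that relative arrangement.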
For example, suppose that a nearest-neighbor computational operation uses the 7 nearest neighbors on either side of a data point that is being computed, and further suppose that a general register set provides an 8×8 grid of general registers for storing data points. In this example, the total 64 data points that can be stored in the 8×8 grid would be sufficient for storing the total 15 data points that would be needed in the computation (i.e., the 7 points to the left, the 7 points to the right, and the center point under evaluation), but the data points would have to be “wrapped” around the 8×8 grid in some way and thus the relative positional arrangement of the data points in the neighborhood would not be maintained. Accordingly, at least one dimension of the general register set in this example is of insufficient size for storing the neighborhood of data in a manner that maintains the relative positional arrangement of data points within the neighborhood. That is, because the 8×8 grid provides only 8 positions in the “X” dimension, fewer than the desired 15 points, the relative positional arrangement of the data would not be maintained within the general register set in this example.
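The arithmetic of this example can be sketched as follows. This is a minimal illustration only, assuming the neighborhood is written into the grid row by row so that points beyond the end of a row spill onto the next row; the function names and the row-by-row wrapping scheme are hypothetical.

```python
def fits_without_wrapping(neighbors_per_side, row_width):
    """True if the center point plus its neighbors on both sides fit in one row."""
    return 2 * neighbors_per_side + 1 <= row_width


def wrapped_positions(num_points, row_width):
    """(row, column) of each point when a line of points is stored row by row."""
    return [divmod(i, row_width) for i in range(num_points)]


# 7 neighbors on either side plus the center point gives 15 points.
NEIGHBORS_PER_SIDE = 7
ROW_WIDTH = 8

needs_wrap = not fits_without_wrapping(NEIGHBORS_PER_SIDE, ROW_WIDTH)
positions = wrapped_positions(2 * NEIGHBORS_PER_SIDE + 1, ROW_WIDTH)

center = positions[NEIGHBORS_PER_SIDE]              # (0, 7)
right_neighbor = positions[NEIGHBORS_PER_SIDE + 1]  # (1, 0)
```

Under this assumed wrapping scheme, the center point lands in the last column of the first row while its immediate right-hand neighbor lands in the first column of the second row, so the two points are no longer adjacent along the “X” dimension even though all 15 points fit within the 64 available registers.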
As such, traditional processing element designs have attempted to implement a general register set with dimensions of sufficient size to fully store a neighborhood of data that is likely to be required for performing certain operations when processing 3D data, wherein the dimensions of the general register set are of sufficient size to maintain the relative positional arrangement of the data within the neighborhood.