Many portable products, such as cell phones, laptop computers, personal data assistants (PDAs) and the like, utilize a processing system that executes programs, such as communication and multimedia programs. A processing system for such products may include multiple processors, multi-threaded processors, complex memory systems including multiple levels of caches for storing instructions and data, controllers, peripheral devices such as communication interfaces, and fixed function logic blocks configured, for example, on a single chip.
Data to be received in and operated on by a processor are values of information that are quantized in binary form according to the level of measurement precision required to represent the information. Standard classes of data, or data types, are grouped according to a number of binary bits, such as integer values represented as 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words, and 128-bit quad-words, and floating point values represented as 32-bit single precision values, 64-bit double precision values, and the like. Many processors support multiple data types and require an efficient way of accessing data for processing. Generally, each datum is assigned an address representing a location in a memory system of a processor. In many memory systems, the memory is organized according to a standard precision bit width, such as 32 bits, allowing four bytes, two half-words, or one word to be stored in each 32-bit location. In other processing systems, such as those associated with a single instruction multiple data (SIMD) processor operating on packed data sets or with a vector processor, such as a SIMD vector processor, the memory system may be organized around larger bit widths based on groups of standard precision values, such as widths of 256 bits, 512 bits, or the like. For example, in a memory system having 512-bit wide memory locations, each location may store sixty-four bytes, thirty-two half-words, sixteen words, eight double-words, or four quad-words. Such wide memory locations may be located, for example, in a SIMD vector processor's register file. In such systems, aligning data to addressable memory locations is important for efficient access using standard processor memory access instructions. However, such data alignment is not necessarily easy to achieve. For example, the size of a data structure may not be a multiple of a memory location's width. Also, there is no guarantee that a data structure of any given size starts or ends on a memory location boundary.
Thus, efficiently accessing unaligned data is a difficult problem.
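The arithmetic behind the 512-bit example above, and the cost of an unaligned access, can be illustrated with a minimal C sketch. It assumes a byte-addressed memory whose locations are 64 bytes (512 bits) wide; the constant LOCATION_BYTES and the helper names elements_per_location, is_aligned, and locations_spanned are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: one memory location is 512 bits = 64 bytes. */
#define LOCATION_BYTES 64u

/* Number of elements of the given size (in bytes) that fit in one location. */
static unsigned elements_per_location(unsigned element_bytes)
{
    /* Only element sizes that evenly divide the location width are handled. */
    assert(element_bytes != 0 && LOCATION_BYTES % element_bytes == 0);
    return LOCATION_BYTES / element_bytes;
}

/* Returns 1 if the byte address falls on a location boundary, else 0. */
static int is_aligned(uint64_t byte_addr)
{
    return (byte_addr % LOCATION_BYTES) == 0;
}

/* Number of memory locations touched by an access of `length` bytes
 * (length > 0) that starts at byte address `byte_addr`. */
static unsigned locations_spanned(uint64_t byte_addr, uint64_t length)
{
    assert(length > 0);
    uint64_t first = byte_addr / LOCATION_BYTES;
    uint64_t last  = (byte_addr + length - 1) / LOCATION_BYTES;
    return (unsigned)(last - first + 1);
}
```

Under these assumptions, a 64-byte access starting at byte address 0 or 64 touches a single location, whereas the same access starting at byte address 32 touches two locations, so both must be read and the wanted bytes extracted and merged.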