The present invention relates to computers and, in particular, to storing data in a block-based memory arrangement.
Current memory arrangements for image/video data are linear across the image plane, as shown in FIG. 1. A cache line typically holds parts of several basic image blocks. For example, a cache line could hold one line from each of four separate basic blocks. Conversely, a basic image block is typically spread across multiple cache lines. That is, a single basic block could be contained in four separate cache lines.
Cache/memory lines are typically divided into boundaries. In the example illustrated in FIG. 2, the cache/memory lines are sub-divided into 8-byte boundaries. If a memory address corresponds to a boundary line, an access to that address is considered an "aligned access". If a memory address does not correspond to a boundary line, the access is considered an unaligned access, and can typically take 2.5 times longer to complete.
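The alignment test described above can be sketched as follows. This is a minimal illustration only, assuming the 8-byte boundaries of the FIG. 2 example; the names BOUNDARY_BYTES and is_aligned are hypothetical and do not appear in the original description.

```python
# Assumed boundary size, taken from the 8-byte example of FIG. 2.
BOUNDARY_BYTES = 8


def is_aligned(address: int, boundary: int = BOUNDARY_BYTES) -> bool:
    """Return True when the byte address falls exactly on a boundary line."""
    return address % boundary == 0


if __name__ == "__main__":
    for addr in (0, 8, 13, 64):
        kind = "aligned" if is_aligned(addr) else "unaligned"
        print(f"address {addr}: {kind}")
```

Under these assumptions, addresses 0, 8 and 64 are aligned accesses, while address 13 is an unaligned access.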
The number of unaligned accesses is typically high in computer applications. As a result, the memory latency associated with the unaligned accesses creates a bottleneck that limits the performance of image/video processing and other applications.
For example, accessing an 8×8 pixel block of image data is essentially equivalent to accessing several cache lines (e.g., at least 8 cache lines in the architecture shown in FIG. 1). Therefore, processing an 8×8 block of image data would require 8 memory accesses, as shown in FIG. 3a. However, it is not guaranteed that each access to a block will be 8-byte aligned, as shown in FIG. 3b. As a result, accessing an 8×8 block could take up to 16 memory accesses.
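The arithmetic behind the figures of 8 and 16 accesses can be sketched as follows. This is an illustrative count only, under assumptions not stated in the original: one byte per pixel, 8-byte boundaries, and an image row length that is a multiple of 8 bytes, so that every row of the block has the same alignment. The function name accesses_for_block is hypothetical.

```python
def accesses_for_block(x_offset: int, rows: int = 8,
                       row_bytes: int = 8, boundary: int = 8) -> int:
    """Count boundary-sized accesses needed to read a block in a linear layout.

    x_offset is the byte offset of the block's left edge within an image row.
    Each block row spans the bytes x_offset .. x_offset + row_bytes - 1.
    """
    first_unit = x_offset // boundary
    last_unit = (x_offset + row_bytes - 1) // boundary
    units_per_row = last_unit - first_unit + 1
    return rows * units_per_row


if __name__ == "__main__":
    print(accesses_for_block(0))  # aligned block: one access per row
    print(accesses_for_block(3))  # unaligned block: two accesses per row
```

With these assumptions, an aligned block costs 8 accesses and a block whose left edge is offset by 3 bytes costs 16, which matches the range given above.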
The present invention provides a method and apparatus for storing image data in a memory device. The method includes receiving an image consisting of a plurality of pixels. The method further includes generating addresses in the memory device for pixels from the image, wherein the memory addresses are generated within memory blocks consisting of multiple rows, wherein each row of the memory block is shorter in length than a full line of the memory device.
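One possible way to generate addresses within multi-row memory blocks is sketched below. It is not the mapping of the invention, only an illustration under assumptions not given in the original: one byte per pixel, 8×8 pixel blocks stored contiguously in 64 bytes, blocks laid out in raster order, and an image width that is a multiple of the block width. The name block_address and its parameters are hypothetical.

```python
def block_address(x: int, y: int, image_width: int,
                  block_w: int = 8, block_h: int = 8, base: int = 0) -> int:
    """Map pixel (x, y) to a byte address in a block-based layout.

    Each block occupies block_w * block_h consecutive bytes, so every row
    of a block is only block_w bytes long and starts on a block_w boundary.
    """
    if image_width % block_w != 0:
        raise ValueError("image_width must be a multiple of block_w")
    blocks_per_row = image_width // block_w
    block_index = (y // block_h) * blocks_per_row + (x // block_w)
    offset_in_block = (y % block_h) * block_w + (x % block_w)
    return base + block_index * block_w * block_h + offset_in_block


if __name__ == "__main__":
    # All 64 pixels of the first block land in one contiguous 64-byte range.
    addrs = sorted(block_address(x, y, image_width=16)
                   for y in range(8) for x in range(8))
    print(addrs[0], addrs[-1])
```

In this sketch, each 8-byte block row begins on an 8-byte boundary whenever the base address does, so reading a whole block that starts at a block origin needs no unaligned accesses.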