Processors in present-day computer systems were designed for compute-intensive operations, but present-day applications issue a large number of memory accesses. Memory transactions (e.g. reads and writes) are a bottleneck to system performance, and there is a need to improve them. For example, in conventional pipelined processor architectures the number of clock cycles per instruction (CPI) is close to 1 for most instructions, the exception being Load/Store instructions (memory transactions).
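The effect of Load/Store stalls on CPI can be illustrated with the standard stall model, in which the average CPI equals the base CPI plus memory references per instruction times miss rate times miss penalty. The following Python sketch is illustrative only: the function name effective_cpi and all numbers are hypothetical, and instruction fetches are assumed always to hit, so that only data references (loads and stores) cause stalls.

```python
def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """Average cycles per instruction once cache-miss stalls are included.

    Stall model: CPI = base CPI + (data memory references per instruction)
    x (miss rate) x (miss penalty in clock cycles). Instruction fetches are
    assumed to always hit (a simplifying assumption for this sketch).
    """
    stall_cycles_per_instr = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles_per_instr


# Hypothetical workload: ideal pipeline (base CPI = 1), 0.3 loads/stores per
# instruction, 5% of them miss in the cache, and each miss stalls 100 cycles.
print(effective_cpi(1.0, 0.3, 0.05, 100))
```

With these assumed figures, cache-miss stalls alone would raise the average CPI from 1 to about 2.5.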
A cache 120, as illustrated in FIG. 1a, is a small, fast memory that holds recently used data from the main memory 140 of a system 100, for example a computer system, a portable electronic device, a handheld device and the like. A processor 110 initiates a memory read/write transaction from/to the main memory 140 by placing a request containing an address on a memory bus. The request is processed by the cache 120. If the data is present in the cache 120 (a cache hit), the request is serviced quickly. If the data is not present in the cache 120 (a cache miss), a unit of data called a cache block, containing the requested data, is fetched from the main memory 140 and placed in the cache 120. The memory controller 135 manages the movement of data to and from the main memory 140. A TAG is a portion of the request address that uniquely identifies the cache block containing the requested data; the cache 120 uses it to determine quickly whether a request is a hit or a miss.
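The hit/miss decision made with the TAG can be sketched as follows. This is a minimal Python model that assumes a direct-mapped organization (each block can reside in exactly one line), byte addresses, and one block per line; none of these details are specified above, and the class and method names are hypothetical. In this organization the address is split into a TAG, a line index and a byte offset, and only the TAG stored in the indexed line needs to be compared.

```python
class DirectMappedCache:
    """Minimal direct-mapped cache model that tracks TAGs only (no data).

    Hypothetical illustration: byte addresses, one block per line. Real
    hardware uses power-of-two sizes so the fields below are plain bit
    slices of the address; integer division and modulo give the same result.
    """

    def __init__(self, num_lines, block_size_bytes):
        self.num_lines = num_lines
        self.block_size = block_size_bytes
        self.tags = [None] * num_lines  # None marks an empty (invalid) line

    def split(self, address):
        """Split an address into (TAG, line index, byte offset within block)."""
        offset = address % self.block_size
        block_number = address // self.block_size
        index = block_number % self.num_lines
        tag = block_number // self.num_lines
        return tag, index, offset

    def access(self, address):
        """Return True on a hit; on a miss, 'fetch' the block by storing its TAG."""
        tag, index, _ = self.split(address)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag  # block fetched from main memory replaces the old one
        return False


cache = DirectMappedCache(num_lines=4, block_size_bytes=16)
print([cache.access(a) for a in (0x00, 0x04, 0x40, 0x00)])  # [False, True, False, False]
```

The second access hits because it falls in the block fetched by the first; the third access maps to the same line with a different TAG and evicts that block, so the final access misses again.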
Only a limited area is available on the processor chip for cache memory. As illustrated in FIG. 1b, the cache memory 120 is divided into lines, e.g. 130, each line being configured to store one block of data, and all blocks are of equal size. The block is the unit of information transferred between the cache 120 and the main memory 140. However, in most cases the processor 110 reads/writes only data equal in size to its registers 115 (one word) from/to the cache 120. The block size is therefore a multiple of the processor register size, so that one block holds several words.
Cache design is based on the principle of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.
The use of a cache block (i.e. a cache ‘length’ greater than one data word) follows from the principle of spatial locality, and the use of multiple blocks (i.e. a cache ‘depth’ greater than one line) follows from the principle of temporal locality, as illustrated in FIG. 1b.
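How block length serves spatial locality and cache depth serves temporal locality can be checked with a small miss-counting sketch. It assumes a direct-mapped, word-addressed cache that tracks TAGs only; the function name count_misses and both access traces are hypothetical illustrations, not workloads from the source.

```python
def count_misses(trace, num_lines, words_per_block):
    """Count misses for a word-addressed, direct-mapped cache (TAGs only)."""
    tags = [None] * num_lines  # None marks an empty line
    misses = 0
    for word_address in trace:
        block = word_address // words_per_block  # block containing this word
        index, tag = block % num_lines, block // num_lines
        if tags[index] != tag:
            tags[index] = tag  # miss: fetch the whole block into its line
            misses += 1
    return misses


# Spatial locality: 64 consecutive words. A longer block means fewer misses.
sequential = list(range(64))
print(count_misses(sequential, num_lines=1, words_per_block=1))  # 64: one miss per word
print(count_misses(sequential, num_lines=1, words_per_block=8))  # 8: one miss per block

# Temporal locality: the same 4 scattered words reused 10 times.
# More lines (greater depth) let all of them stay resident.
reuse = [0, 101, 202, 303] * 10
print(count_misses(reuse, num_lines=1, words_per_block=1))  # 40: every access misses
print(count_misses(reuse, num_lines=4, words_per_block=1))  # 4: first references only
```

Lengthening the block from one word to eight cuts the misses of the sequential trace from 64 to 8, while increasing the depth from one line to four cuts the misses of the reuse trace from 40 to the 4 unavoidable first-reference misses.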
Spatial locality favors making the cache block as large as possible, whereas temporal locality favors having as many blocks as possible. For a given area of cache memory these two requirements conflict, because the capacity is the number of blocks multiplied by the block size, so a balance must be struck between them. Hence, for a given area, the cache memory aspect ratio (length vs. depth) is chosen as a compromise between the two principles of locality, appropriate to the particular process(es) run on the processor.
The common way to reduce cache misses during sequential read/write access is to increase the cache block size. However, if the block size is (say) doubled to avoid frequent cache misses during sequential access, the number of blocks that can be stored in the same area is halved, which weakens the cache's ability to exploit temporal locality. Alternatively, if the block size is doubled while the number of blocks is kept constant, the cache area must be doubled, which is undesirable given the limited chip area available for cache memory.
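The trade-off for a fixed cache area can be sketched by holding the capacity (lines × words per block) constant and sweeping the aspect ratio. This Python sketch assumes a fully associative cache with least-recently-used (LRU) replacement, a hypothetical 16-word capacity and hypothetical traces; full associativity is used only so that line-mapping conflicts do not obscure the length-versus-depth effect, and is not implied by the text above. The function name count_misses_lru is likewise hypothetical.

```python
from collections import OrderedDict


def count_misses_lru(trace, num_lines, words_per_block):
    """Misses for a fully associative LRU cache holding num_lines blocks."""
    resident = OrderedDict()  # block number -> None, ordered oldest to newest use
    misses = 0
    for word_address in trace:
        block = word_address // words_per_block
        if block in resident:
            resident.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            if len(resident) == num_lines:
                resident.popitem(last=False)  # evict the least recently used block
            resident[block] = None
    return misses


CAPACITY_WORDS = 16                        # hypothetical fixed cache area, in words
sequential = list(range(64))               # streaming access (spatial locality)
reuse = [100 * i for i in range(8)] * 10   # 8 scattered words reused (temporal locality)

for words_per_block in (1, 2, 4, 8):
    num_lines = CAPACITY_WORDS // words_per_block  # doubling the length halves the depth
    print(num_lines, words_per_block,
          count_misses_lru(sequential, num_lines, words_per_block),
          count_misses_lru(reuse, num_lines, words_per_block))
# prints: 16 1 64 8 / 8 2 32 8 / 4 4 16 80 / 2 8 8 80 (one row per line)
```

Each doubling of the block length halves the misses of the sequential trace, but once the depth falls below the 8 blocks the reuse trace needs, every one of its accesses misses. Keeping 8 lines while doubling the block length to 4 words would avoid this, but only by doubling the capacity to 32 words.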