Many modern microprocessors utilize a virtually-addressed cache. In order to use a virtually-addressed cache, virtual addresses are typically translated to their corresponding physical addresses. This translation essentially adds another stage to the instruction pipeline, which in turn decreases the performance of the microprocessor.
In addition, in response to increasing demand for longer battery life for mobile devices, power profiles of microprocessors have become increasingly critical. One of the largest components of power utilization within a storage unit is searching a physically-tagged tag array for each operation as operations move down the instruction pipeline. This search of the physically-tagged tag array is necessary in order to retire loads directly from the instruction pipeline such that the average load-use penalty is as low as possible. The problem is that the power associated with accessing the physically-tagged tag array, which is typically a large structure, is quite significant.
As such, there is a need for a data cache and a method of operation thereof that enables the instruction pipeline to achieve the same performance while limiting the number of accesses to the physically-tagged tag array.