A recent trend for packing more functions into a small form factor is so-called system-in-package (SiP) technology, which encloses a number of integrated circuit (IC) dies in a single package or module. The dies may be stacked vertically or placed horizontally alongside one another inside the package. They are internally connected by fine wires that are buried in the package, or joined by solder bumps using flip-chip technology. FIGS. 1A and 1B illustrate such SiP devices. Referring to FIG. 1A, two core dies 110 and 120 are mounted on top of a package substrate 100. The core dies contain processing units as well as memories serving as Level 1 caches to the processing units. On top of the core dies 110 and 120, an additional SiP memory 130 is mounted to serve as a Level 2 cache to the dual processing units or cores.
Referring to FIG. 1B, besides the dual cores 110 and 120 and the SiP memory 130, there is another memory die 140 mounted on the same layer as the dual cores 110 and 120. In this case, because the memory die 140 is located closer to the dual cores 110 and 120 than the SiP memory 130 is, the memory die 140 may serve as a Level 2 cache, and the SiP memory 130 may then serve as a Level 3 cache.
These SiPs can greatly extend cache capacity in a computer system. But with added levels of caches, memory management becomes more complicated. FIG. 2 shows how a microprocessor fetches and processes data. In this computer system, a memory hierarchy 200 includes a hard drive 210, a main memory 220, Level 2 caches 230, Level 1 caches 242 and a register file 244, which is closest to an execution unit 246 (an Arithmetic-Logic Unit or ALU, for example). The main memory 220 typically comprises dynamic random access memory (DRAM). The caches 230 and 242 are smaller, faster memories, usually made of static random access memory (SRAM), that store copies of the data from the most frequently used main memory locations. Moreover, the Level 1 cache 242, the register file 244 and the execution unit 246 usually reside in the same central processing unit (CPU) die 240. Data are fetched through the memory hierarchy 200 from the hard drive 210, through the main memory 220, the caches 230 and 242, and the register file 244, to the execution unit 246 for processing. The data held in each storage device tend to be a subset of the data held in the storage device farther away from the execution unit 246. The farther a storage device is from the execution unit 246, the larger its capacity, the slower its speed, and the narrower its bandwidth. This pyramid scheme trades off speed against capacity by relying on temporal and spatial locality, namely that data blocks used now are likely to be used again later, and that data blocks located near those used now are likely to be used later. This memory hierarchy 200 is applied to instructions as well as data for caches, main memory and disk storage. For the lowest level cache, instructions and data tend to be held in separate entities (separate caches). At other cache levels, they are stored in the same storage (unified cache). The memory hierarchy 200 is a commonly used technique in the computer art to achieve high performance while reducing costs.
Cache memories work like temporary storage. When the execution unit 246 wishes to read from or write to a location in the main memory 220, it first checks whether that memory location is in the Level 1 cache 242. This is accomplished by comparing the address of the memory location to all tags stored in the Level 1 cache 242 that might contain that address. If the execution unit 246 finds that the memory location is in the cache, then the data corresponding to the address are accessed directly from the Level 1 cache 242, and a cache hit has occurred. Otherwise, the data are not in the Level 1 cache 242, and a cache miss has occurred.
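By way of illustration only, the tag comparison described above may be sketched in software as follows. The sketch assumes a small set-associative organization; the line size, number of sets, associativity, and all names (`L1Cache`, `lookup`, `fill`) are hypothetical and are not taken from the figures.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed geometry, chosen only for illustration.
LINE_SIZE = 64   # bytes per cache line
NUM_SETS = 256   # number of sets
WAYS = 4         # lines per set (associativity)


@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    data: bytes = b""


class L1Cache:
    """Hypothetical model of a set-associative Level 1 cache."""

    def __init__(self) -> None:
        self.sets = [[CacheLine() for _ in range(WAYS)] for _ in range(NUM_SETS)]

    @staticmethod
    def split(address: int) -> tuple[int, int, int]:
        # Decompose an address into (tag, set index, byte offset).
        offset = address % LINE_SIZE
        index = (address // LINE_SIZE) % NUM_SETS
        tag = address // (LINE_SIZE * NUM_SETS)
        return tag, index, offset

    def lookup(self, address: int) -> Optional[int]:
        # Compare the address tag against every tag in the one set
        # that might contain the address.
        tag, index, offset = self.split(address)
        for line in self.sets[index]:
            if line.valid and line.tag == tag:
                return line.data[offset]  # cache hit
        return None  # cache miss

    def fill(self, address: int, block: bytes) -> None:
        # Install a block after a miss; take a free way if one exists,
        # otherwise overwrite way 0 (a deliberately naive replacement choice).
        tag, index, _ = self.split(address)
        ways = self.sets[index]
        victim = next((line for line in ways if not line.valid), ways[0])
        victim.valid, victim.tag, victim.data = True, tag, block
```

In this sketch a return value of `None` from `lookup` corresponds to a cache miss, and any other return value corresponds to a cache hit.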
SiP extends computer cache capacity; however, with the aforementioned hierarchical memory management approach, the Level 2 cache 230 cannot be checked simultaneously with the Level 1 cache 242. The execution unit 246 can only check the Level 1 cache 242 directly. For data to be accessed, they must first be transferred to the lower memories in the hierarchy. This lowers memory management efficiency.
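The sequential, level-by-level behavior described above may be sketched as follows. This is a simplified illustration under stated assumptions: each level is modeled as a plain dictionary keyed by address, capacity limits and eviction are ignored, and the function name `read` and the returned labels are hypothetical.

```python
def read(address, l1, l2, main_memory):
    """Look up an address one level at a time, as in the hierarchy above.

    Returns (value, source), where source names the level that supplied
    the data. Each level is a dict mapping addresses to values.
    """
    # Only the Level 1 cache is checked directly.
    if address in l1:
        return l1[address], "L1 hit"

    # The Level 2 cache is consulted only after the Level 1 miss.
    if address in l2:
        value = l2[address]
        source = "L2 hit"
    else:
        # Miss in both caches: fetch from main memory and copy into Level 2.
        value = main_memory[address]
        l2[address] = value
        source = "main memory"

    # The data must be transferred into Level 1 before being accessed.
    l1[address] = value
    return value, source
```

For example, with `main_memory = {0x10: 7}` and both caches empty, a first call to `read(0x10, l1, l2, main_memory)` is served from main memory and copies the value into both caches, while a second call is served by the Level 1 cache.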
As such, what is desired is a memory management system and method that can simultaneously check multiple memories, in either the same or different levels, and hence directly access data stored in those memories.