With a wide variety of workloads emerging for cloud servers, such as batch processing, real-time analytics, and high-performance databases, it becomes necessary to accelerate commonly used kernels using custom hardware/silicon intellectual property (IP) blocks inside the computing system. To preserve modularity and reduce time to market (TTM), these custom accelerator IP blocks are typically integrated with the main central processing unit (CPU) die in a multi-chip package as companion dies (CDs). These CDs execute firmware that must be loaded at system boot time for the CDs to function correctly. Firmware images for such companion dies are typically stored in a shared platform resource, such as a flash storage device attached to the chipset, and loaded at boot time. The cloud server business demands optimal provisioning techniques for compute, storage, and networking services, with an emphasis on high performance and low total cost of ownership (TCO). Increased boot times in a cloud server computing platform due to long firmware loading times are detrimental and sometimes prohibitively expensive for real-time provisioning.
Consider a general case of an 8-socket server with two instances of a CD attached to each CPU, for a total of 16 CD instances in this configuration. Each of these 16 CD instances requires firmware to be loaded at boot time, and all 16 need access to their firmware concurrently during the platform boot flow. This leads to contention at the interface to the flash storage device among the various agents (e.g., CDs) concurrently trying to access the same platform resource (i.e., flash memory storage behind a chipset interface). The resulting bottleneck slows boot to the point that the computing platform no longer meets its boot time requirements. The problem worsens as the number of CPUs and CDs in the system increases for higher-performance computing platforms.
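The scaling behind this bottleneck can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not taken from the description above: every CD instance fetches its own full copy of the firmware image, and the single flash interface serves those fetches one at a time with no caching. The function names, the image size, and the flash bandwidth are hypothetical placeholders chosen only for illustration.

```python
def cd_instance_count(sockets: int, cds_per_socket: int) -> int:
    """Total number of companion-die instances in the platform."""
    return sockets * cds_per_socket


def serialized_load_time_s(instances: int, image_bytes: int,
                           flash_bytes_per_s: float) -> float:
    """Time for every instance to fetch its firmware image.

    Simplifying assumption: the shared flash interface serves one
    instance at a time, so the individual fetch times add up.
    """
    return instances * image_bytes / flash_bytes_per_s


if __name__ == "__main__":
    # The 8-socket, two-CDs-per-CPU configuration from the text.
    instances = cd_instance_count(sockets=8, cds_per_socket=2)

    # Hypothetical values, used only to show how the total scales.
    image_bytes = 1_000_000
    flash_bytes_per_s = 1_000_000.0

    single = serialized_load_time_s(1, image_bytes, flash_bytes_per_s)
    total = serialized_load_time_s(instances, image_bytes, flash_bytes_per_s)
    print(f"{instances} instances take {total / single:.0f}x "
          f"the load time of one instance")
```

Under this model the total firmware load time grows linearly with the number of CD instances, which is consistent with the observation that the problem worsens as CPUs and CDs are added. A real platform may overlap or arbitrate accesses differently, so the sketch indicates the trend rather than predicting actual boot times.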
Two approaches may be used to attempt to overcome this problem. In a first approach, a separate interface to the shared flash storage device is added to each CD socket. However, this results in increased silicon area for the CDs and additional pinouts to the motherboard. This approach is expensive, increases the bill of materials (BOM) cost of the system, and does not provide the desired performance. In a second approach, a small read-only memory (ROM) is added to the system for each CD and made accessible to that CD to store its firmware image. Thus, each instance of a CD has its own ROM. This is impractical because it results in multiple copies of the firmware being stored in the system (one per ROM). This approach also increases the BOM cost, silicon area, and power requirements. Neither of these approaches is a workable solution.