Until recently, gains in computer performance have tracked Moore's law, which states that transistor integration densities will double every 18 months. Although the ability to shrink the transistor has led to higher switching speeds and lower operating voltages, the ultra-large scale integration densities achievable through modern manufacturing methods have led to a leveling off in the corresponding improvements in computer performance, because of the large currents needed to power the ultra-large numbers of transistors. Silicon chips manufactured at the 22 nm technology node will draw 700 W/inch² of semiconductor die. The large current draw needed to refresh and move data between die and across the surface of a single die has pushed conventional power management circuits, which are restricted to significantly lower switching speeds, to their limits. The large thermal loads generated by conventional power management systems further reduce system efficiency by requiring power management to be located at significant distances from the processor and memory die, thereby adding loss through the power distribution network. Therefore, methods that reduce system losses by providing means to fabricate a hybrid computing module comprising power management systems that generate sufficiently low thermal loads to be situated in close proximity to the memory and microprocessor die are desirable.
As is typically the case with transistors, higher power switching speeds are achieved in conventional power management by shrinking the surface area of the transistor gate electrode in power FETs. In conventional transistor architectures, switching speeds are limited by gate capacitance, according to the following:

$$f = \frac{I_{ON}}{C_{OX} \times W \times L \times V_{dd}} \qquad (1)$$

where,
$f$ ≡ limiting switch frequency (1a)
$I_{ON}$ ≡ source current (1b)
$C_{OX}$ ≡ gate capacitance (1c)
$W$ ≡ gate width (1d)
$L$ ≡ gate length (1e)
$V_{dd}$ ≡ drain voltage (1f)
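The inverse dependence of equation (1) on gate electrode area can be illustrated with a minimal Python sketch. The function name and all numerical values below are hypothetical and chosen only for illustration. The sketch also assumes that the gate capacitance in (1c) is expressed per unit gate area, so that the result has units of frequency.

```python
def limiting_switch_frequency(i_on, c_ox, width, length, v_dd):
    """Evaluate equation (1): f = I_ON / (C_OX * W * L * V_dd).

    Assumed units (not stated in the text): i_on in amperes, c_ox in
    farads per square meter, width and length in meters, v_dd in volts.
    """
    return i_on / (c_ox * width * length * v_dd)


# Hypothetical device values, chosen only to show the scaling behavior.
baseline = limiting_switch_frequency(
    i_on=1.0, c_ox=1.0e-2, width=1.0e-3, length=1.0e-6, v_dd=1.0
)

# Same device with the gate width halved, so the gate area W x L is halved.
half_area = limiting_switch_frequency(
    i_on=1.0, c_ox=1.0e-2, width=0.5e-3, length=1.0e-6, v_dd=1.0
)

# Ratio of the two limiting frequencies; equation (1) predicts a doubling.
ratio = half_area / baseline
print(f"frequency ratio after halving gate area: {ratio:.2f}")
```

Under these assumptions, halving the gate area doubles the limiting switch frequency while the same source current is carried by a smaller gate region.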
Switching speed/frequency is increased by minimizing gate capacitance, which is achieved by minimizing the gate electrode surface area (W×L). However, minimizing gate electrode surface areas to achieve high switching speeds imposes self-limiting constraints in high power systems (>100 Watts) when managing large low voltage currents, as the large switched current is forced through small semiconductor volumes. The resultant high current densities generate higher On-resistance, which becomes a principal source of undesirable high thermal loads. Modern computing platforms require very large supply currents to operate due to the ultra-large number of transistors assembled into the processor cores. Higher speed processor cores require power management systems to function at higher speeds. Achieving higher speeds in the power management system's power FET by minimizing gate electrode surface areas creates very high current densities, which in turn generate high thermal loads. The high thermal loads require complex thermal management devices to be designed into the assembled system and usually require the power management and processor systems to be physically separated from one another for optimal thermal management. Therefore, methods and means to produce a hybrid computing module that embeds power management devices in close proximity to the processor cores to reduce loss, and that contains power FETs that switch large currents of several tens to hundreds of amperes at high speeds without generating large thermal loads, are desirable.
The inability of modern power management to switch large currents at speeds that keep pace with ultra-large scale integration (“ULSI”) transistor switching speeds has led to on-chip and off-chip data bottlenecks, as there is insufficient power to transfer data from random-access memory stacks into the processor cores. These bottlenecks leave the individual cores in multi-core microprocessor systems under-utilized as they wait for the data to be delivered. Low core utilization rates (<25%) in multi-core microprocessors (quad core and greater) with minimal cache memory have forced manufacturers to add large cache memory banks to the processor die. The popular solution to this problem has been to allocate 30% or more of the modern microprocessor chip to cache memory circuits. In essence, this approach only masks the “data bottleneck” problem caused by having insufficient power to switch data stored nearby in physical random-access memory banks. This requirement weakens the economic impact of Moore's Law by reducing the processor die yield per wafer, as the microprocessor die must allocate a substantial surface area to transistor banks that serve non-processor functions compared to the surface area reserved exclusively for logic functionality. The large loss of available processor real estate to cache memory in multi-core x86 processor chips is illustrated in FIGS. 1A, 1B, 1C. FIG. 1A presents a scaled representation of a Nehalem quad-core microprocessor chip 1 fabricated using the 45 nm technology node. The chip's surface area is allocated to 4 microprocessor cores 2A, 2B, 2C, 2D, an integrated 3-channel DDR3 memory controller 3, and shared L3 cache memory 4. L3 cache memory 4 occupies roughly 40% of the surface area not allocated to system interconnect circuits 5A, 5B, or approximately 30% of the total die surface area. Similarly, the Westmere dual-core microprocessor chip 6 (FIG. 1B) fabricated using the 32 nm technology node allocates approximately 35% of its total available surface area to L3 cache memory 7 to serve its 2 microprocessor cores 8A, 8B. The Westmere-EP 6-core microprocessor chip 9 (FIG. 1C) fabricated using the 32 nm technology node allocates approximately 35% of its total available surface area to L3 cache memory 10 to serve its 6 microprocessor cores 11A, 11B, 11C, 11D, 11E, 11F. Higher semiconductor chip yields (more die per wafer) and lower system costs can be achieved in computing modules that increase the ratio of transistor real estate dedicated to logic functionality over cache memory. Large on-chip cache memory can be eliminated by integrating power management systems into the computing module that switch large currents at speeds that match microprocessor core duty cycles. Therefore, methods and means that boost microprocessor core utilization rates to levels in excess of 50%, preferably in excess of 75%, while limiting the real estate allocated to cache memory to less than 20%, preferably to less than 10%, of the total die surface area are desirable to minimize module size and cost.
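The surface area figures quoted above for the Nehalem chip, and the die-per-wafer benefit of a smaller cache allocation, can be checked with a short Python sketch. The function names are hypothetical. The die-count estimate rests on simplifying assumptions that are not part of the description: the non-cache circuitry keeps its area, the die shrinks by the cache area removed, and the number of die per wafer scales inversely with die area (wafer edge loss and defect-limited yield are ignored).

```python
def interconnect_fraction(cache_share_of_non_interconnect, cache_share_of_total):
    """Die fraction taken by interconnect circuits, implied by two cache shares."""
    # Fraction of the die that is not interconnect:
    # cache_share_of_total = cache_share_of_non_interconnect * non_interconnect
    non_interconnect = cache_share_of_total / cache_share_of_non_interconnect
    return 1.0 - non_interconnect


def relative_die_count(cache_fraction_before, cache_fraction_after):
    """Ratio of die per wafer after shrinking the cache allocation.

    Assumes the non-cache area is unchanged and die count scales as
    1 / die area (a simplification that ignores edge loss and defects).
    """
    # Non-cache area, in units of the original die area.
    non_cache_area = 1.0 - cache_fraction_before
    # New die area in which the cache is the smaller target fraction.
    new_die_area = non_cache_area / (1.0 - cache_fraction_after)
    return 1.0 / new_die_area


# Nehalem figures from the text: 40% of non-interconnect area, 30% of total.
implied_interconnect = interconnect_fraction(0.40, 0.30)
print(f"implied interconnect share of die: {implied_interconnect:.2f}")  # 0.25

# Reducing the cache allocation from 30% to the preferred 10% of die area.
gain = relative_die_count(0.30, 0.10)
print(f"relative die per wafer: {gain:.2f}")  # 1.29
```

Under these assumptions, the two Nehalem percentages are mutually consistent if the interconnect circuits occupy about one quarter of the die, and reducing the cache allocation from 30% to 10% would yield roughly 29% more die per wafer.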
Another major drawback to Moore's Law is the extremely high manufacturing costs at the smaller technology nodes. These extreme costs have the potential to restrict low-cost computing to only the highest-volume applications. FIG. 2A shows the average costs of masks used to photolithographically pattern an individual material layer embedded within an integrated circuit assembly as a function of the manufacturing technology node. A key technology objective has been to integrate entire electronic systems on a chip. However, the significantly higher mask costs cause design and lithography costs to skyrocket at the more advanced technology nodes (45 nm and 32 nm). FIG. 2B shows the variation of design and lithography costs per function (memory, processor, controller, etc.) among the different technology nodes (65 nm, 45 nm, 32 nm), normalized to the fabrication cost at the 90 nm technology node, for system-on-chip (“SoC”) devices serving low-volume 20, medium-volume 22, and high-volume (general purpose) 24 technology applications. The increasing design and lithography costs cause SoC applications fabricated to the more advanced technology nodes (45 nm and 32 nm) to be more expensive in low-volume 20 and medium-volume 22 markets than they would be when fabricated to the less advanced technology nodes (90 nm and 65 nm). These cost constraints cause general purpose SoC applications 24 to be the only instance in which cost, size, and power benefits can be simultaneously achieved with the more advanced technology nodes. Markets are not monolithic, which causes low-volume and medium-volume applications to dominate overall market volumes in the aggregate.
Therefore, methods and means that allow the cost, size, and power savings achieved with general purpose semiconductor systems made through the more advanced technology nodes (45 nm, 32 nm, and beyond) to be integrated into hybridized SoC designs serving the wider range of low-volume and medium-volume market applications are desirable.