Microelectronic components, such as semiconductors, generate a substantial amount of heat during operation. The heat is removed from the die of a component via a thermal assembly to ensure that the die does not reach a temperature, often referred to as a junction temperature, that prevents proper operation of the die or physically damages the die. For many components, the thermal assembly may comprise a thermally conductive, dielectric interface material, often referred to as a thermal interface material (TIM), and a heat sink. In components that are more susceptible to damage, such as processors, the thermal assembly may include, e.g., a first TIM, a lid, a second TIM, and a heat sink.
Today's demand for faster operations is forcing designers and manufacturers to increase the frequency or speed of operations of the components. As the frequency or speed of operations in a component increases, the amount of heat generated by the component increases. One major problem faced by manufacturers is that the various layers of the thermal assembly trap heat against the die, raising the operating temperature of the die unless the thermal assembly can be built to dissipate heat more rapidly.
As the thermal performance requirements for the thermal assemblies rise, manufacturers are forced to maintain higher manufacturing standards for the thermal assemblies to ensure that heat is dissipated rapidly. The higher standards cause more thermal assemblies to fail and/or require rework before components may be shipped to customers. Failing and reworking thermal assemblies at a manufacturer's facility increases the cost of manufacturing the components.
One way to combat the lower yields and increased costs is to improve the accuracy of tests performed on the thermal assemblies to determine whether the thermal performance or conductivity of the assemblies falls within an acceptable heat dissipation range. Historically, manufacturers have gauged the performance of thermal assemblies based upon mechanical measurements of the thermal gap(s) between the die of a component and each subcomponent of the thermal assembly, such as the heat sink. Thermal gaps may vary as a result of chip tilt, chip height, chip-to-chip co-planarity, heat-sink-to-chip-surface stack-up tolerances, and thermal interface material (TIM) thermal property variances, among other parameters. Often, mechanical measurements of processor modules are made in the bond, assembly, and test (BAT) line to either directly or indirectly ensure that mechanical gap criteria are met for the subcomponent interfaces.
IBM and other manufacturers have discovered over the last couple of processor generations that, depending upon the module design and its second-level attachment to a card and heat sink, the accuracy of the metrics used to determine acceptable thermal gap sizes is very poor. Even with statistical approaches, some of the indirect methods for measuring gap thicknesses are inadequate. As a result, more margin is required in the processor speed sorts (hence fewer yielded parts that meet frequency requirements) to ensure that the system will operate flawlessly. This impact is very expensive.
To improve the accuracy of measurement of the thermal performance of thermal assemblies, several manufacturers have changed the primary quality metric for determining the integrity of thermal assemblies from a mechanical one to a thermal one. In particular, manufacturers measure the resulting thermal performance of the thermal assembly rather than the thermal gap(s), and pass or fail components based upon a comparison of the thermal performance against fixed thermal performance criteria.
The fixed thermal performance criteria may take into account the maximum junction temperature allowable for flawless operation, calculated for the average die under worst-case conditions. More specifically, a customer or die manufacturer may calculate a maximum junction temperature for the average die based upon the predicted performance range of the thermal assemblies under the worst-case conditions and fail dies that run hotter than the calculated junction temperature. The processor manufacturer or a customer may then provide the card assembly manufacturer with the calculated junction temperature for the average die to test the thermal assemblies under certain power conditions. After attaching the thermal assembly to the die, the card assembly manufacturer may determine the rate of heat dissipation of the thermal assembly based upon the calculated junction temperature and a measured junction temperature. The card assembly manufacturer may then determine whether the thermal assembly of a component dissipates heat rapidly enough to maintain the junction temperature for the average die within operable limits under the worst-case conditions. If the thermal assembly dissipates heat rapidly enough, the component passes. If the thermal assembly fails to dissipate heat rapidly enough, the component fails.
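The fixed-criterion test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure any particular manufacturer uses: it assumes a simple steady-state model in which thermal resistance equals the temperature rise above a reference temperature divided by the dissipated power, and the names `FixedCriterion`, `thermal_resistance`, and `passes_fixed_test`, as well as all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedCriterion:
    """Hypothetical fixed test criterion derived for the average die."""
    tj_limit_c: float   # calculated maximum junction temperature (deg C)
    ref_temp_c: float   # assumed worst-case reference temperature (deg C)
    avg_power_w: float  # power assumed for the average die (W)

    @property
    def max_resistance_c_per_w(self) -> float:
        # Largest thermal resistance that keeps the average die at or
        # below the calculated junction temperature (assumed model).
        return (self.tj_limit_c - self.ref_temp_c) / self.avg_power_w


def thermal_resistance(tj_measured_c: float, ref_temp_c: float,
                       power_w: float) -> float:
    """Estimate thermal resistance (deg C per W) from one measurement."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (tj_measured_c - ref_temp_c) / power_w


def passes_fixed_test(criterion: FixedCriterion, tj_measured_c: float,
                      power_w: float) -> bool:
    """Pass if the measured resistance does not exceed the fixed limit."""
    measured = thermal_resistance(tj_measured_c, criterion.ref_temp_c, power_w)
    return measured <= criterion.max_resistance_c_per_w


# Illustrative values only; none of these numbers come from the source.
criterion = FixedCriterion(tj_limit_c=85.0, ref_temp_c=35.0, avg_power_w=100.0)
good_assembly = passes_fixed_test(criterion, tj_measured_c=80.0, power_w=100.0)
poor_assembly = passes_fixed_test(criterion, tj_measured_c=90.0, power_w=100.0)
```

Under these assumed numbers the first assembly passes and the second fails. The same threshold is applied to every part, regardless of how much power the individual die actually draws.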
However, such current approaches fail many components that may operate within adequate operational parameters for flawless operation. For instance, dies may run hotter or cooler due to variances in production conditions along an assembly line. Dies that tend to run cooler may fail the test criteria after assembly because the thermal assembly does not dissipate heat rapidly enough for the average die, even though it does dissipate heat rapidly enough for those cooler-running dies. Further, higher-power dies may fail the test criteria after assembly because the measured junction temperature is higher than that predicted for the average die but is still low enough for the higher-power part to operate flawlessly.
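A small numeric illustration may make the cooler-running case concrete. The sketch below assumes a simple steady-state model in which the junction temperature equals a reference temperature plus thermal resistance times power. Every name and number in it is hypothetical and chosen only for illustration.

```python
def junction_temp_c(ref_temp_c: float, resistance_c_per_w: float,
                    power_w: float) -> float:
    """Steady-state junction temperature under the assumed linear model."""
    return ref_temp_c + resistance_c_per_w * power_w


# Hypothetical fixed criterion, derived for the average die.
TJ_LIMIT_C = 85.0    # assumed maximum allowable junction temperature
REF_TEMP_C = 35.0    # assumed worst-case reference temperature
AVG_POWER_W = 100.0  # assumed power of the average die
MAX_RESISTANCE = (TJ_LIMIT_C - REF_TEMP_C) / AVG_POWER_W  # deg C per W

# Hypothetical part: a slightly poor assembly on a cooler-running die.
assembly_resistance = 0.55  # deg C per W, above the fixed limit
cool_die_power_w = 80.0     # this die draws less power than average

# The fixed test rejects the part because the assembly is too resistive
# for the average die...
fails_fixed_test = assembly_resistance > MAX_RESISTANCE

# ...yet this particular die would stay below the junction limit.
cool_die_tj = junction_temp_c(REF_TEMP_C, assembly_resistance, cool_die_power_w)
operates_within_limit = cool_die_tj <= TJ_LIMIT_C
```

With these assumed values the part is rejected by the fixed criterion even though its own junction temperature would settle at roughly 79 degrees C, below the 85 degree limit.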
Current approaches also fail many components that may operate within adequate operational parameters for flawless operation in a customer installation. For example, a worst-case condition may assume installation of a component in, e.g., a server located at high altitude in a small, enclosed space that has no air conditioning system or other climate controls. However, the designated or typical customers for the components may install the components in a large, air-conditioned space at low altitude. Thus, utilizing the calculated junction temperature for the average die under the worst-case conditions may fail a significant number of components that would operate reliably in the designated or typical customer's installation.
Further, current approaches do not satisfactorily facilitate optimization of yield in accordance with project objectives. In particular, a given project objective may be to maximize yield, maximize yield of high-performance components, maximize yield of components based upon reliability criteria, and/or fine-tune a balance between yield and high performance or reliability. However, current approaches place such a high weight on performance under worst-case conditions that they provide very little flexibility, if any, to accommodate such project objectives.