Main memory is one of the most vulnerable hardware components in computing systems. In existing terascale systems, hardware errors account for up to 60% of all failures, and about 40% of those hardware failures are memory related, so memory alone accounts for up to roughly 24% of all failures. Memory-related failures are likely to increase in future systems, not only because memory capacity in such systems is growing explosively, but also because of the adoption of new technologies such as 3D stacking, higher device density, and lower operating voltages.
Memory reliability is even more complex in systems that use tagged memory. Tagged memory adds one or more extension bits to each memory word to describe the word's state, and these tag bits must be protected along with the data they describe. Tagged memory is especially effective for graph-oriented problems that involve intensive communication and synchronization between data items, as well as irregular thread and memory behavior. Examples of such graph-oriented problems include applications that model, analyze, and/or study interactions between proteins in the human body, linked information on the Internet, and/or intelligence data about the communications and movements of potential adversaries.
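To illustrate why tag bits complicate reliability, the following Python sketch models a single memory word that carries one tag bit. It is a sketch under stated assumptions: the 64-bit word width, the full/empty interpretation of the tag, and the names `TaggedWord`, `write_if_empty`, `read_if_full`, `stored_bits`, `from_bits`, and `flip_bit` are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# Hypothetical layout (assumption): 64 data bits plus one tag bit stored above them.
# Treating the tag as a full/empty synchronization bit is also an assumption.
DATA_BITS = 64
DATA_MASK = (1 << DATA_BITS) - 1


@dataclass
class TaggedWord:
    data: int = 0
    full: bool = False  # tag bit: True means the word holds a value ready to consume

    def write_if_empty(self, value: int) -> bool:
        """Synchronized write: succeeds only when the tag marks the word empty."""
        if self.full:
            return False
        self.data = value & DATA_MASK
        self.full = True
        return True

    def read_if_full(self):
        """Synchronized read: returns the value and marks the word empty, else None."""
        if not self.full:
            return None
        self.full = False
        return self.data

    def stored_bits(self) -> int:
        """Physical image of the word: the tag bit sits just above the data bits."""
        return (int(self.full) << DATA_BITS) | self.data


def from_bits(image: int) -> TaggedWord:
    """Rebuild a TaggedWord from its physical bit image."""
    return TaggedWord(data=image & DATA_MASK, full=bool((image >> DATA_BITS) & 1))


def flip_bit(word: TaggedWord, position: int) -> TaggedWord:
    """Model a single-bit memory error at bit `position` of the stored image."""
    return from_bits(word.stored_bits() ^ (1 << position))


if __name__ == "__main__":
    w = TaggedWord()
    w.write_if_empty(42)
    corrupted = flip_bit(w, DATA_BITS)  # error lands in the tag bit
    print(corrupted.data, corrupted.full, corrupted.read_if_full())
```

In this model, a flip in the tag bit leaves the data value intact but marks a full word as empty, so a consumer waiting on it would never proceed and a producer could overwrite it. A flip in a data bit, by contrast, corrupts only the stored value.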