High-performance computing (HPC), whether on supercomputers, computer clusters, or real-time systems, offers many advantages, but it also presents serious challenges to HPC programmers. One such challenge is tuning critical code sections for optimal performance. Tuning is complicated because the best choices depend heavily on the hardware the code will run on. The difficulty is often compounded because performance optimizations are not portable: an optimization tuned for one architecture may not carry over to another.