Performance optimization and acceleration of software applications are highly desired and heavily pursued activities in many areas of computing. They are particularly desirable in business and scientific applications that involve highly complex and computationally intensive data processing. Business organizations gain a competitive advantage from such optimization and acceleration schemes by reducing costs, improving turnaround times, and increasing the overall profitability and efficiency of the organization.
To increase the throughput of systems that handle complex and computationally intensive data processing problems, such systems have used homogeneous, conventional multi-processors and/or cluster platforms. Consequently, the vast majority of software applications that have been developed for the scientific, financial, and other communities have been developed for these conventional processor-based machines. Software-controlled conventional processor-based machines provide great flexibility in that they can be adapted for many different purposes through the use of suitable software. Additionally, methodologies for developing software applications for these machines are well established and well understood by a large majority of professionals in the art of software application development.
However, scaling the number of conventional processors in homogeneous systems or platforms to reach high performance levels adds significant cost and dramatically increases the management complexity of the system, in particular its control and communication management. This typically leads to specialized maintenance and operation requirements, which are handled by a dedicated team of information technology professionals. It is well known that as the number of conventional processors increases, the incremental benefit of adding processors decreases and can approach zero as the system management overhead begins to dominate.
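The diminishing return described above can be illustrated with a toy model. The sketch below is an assumption made for illustration only and is not drawn from any particular system: it takes a fixed serial fraction of the work (as in Amdahl's law) and adds a hypothetical management overhead that grows linearly with the number of processors. The function names `speedup` and `marginal_gain` and the default parameter values are likewise hypothetical.

```python
def speedup(n, parallel_fraction=0.95, overhead_per_processor=0.002):
    """Speedup on n processors relative to one processor (toy model).

    Assumptions (illustrative, not measured):
      - a fixed fraction of the work parallelizes perfectly;
      - each processor beyond the first adds a constant amount of
        control/communication overhead to the total run time.
    """
    serial = 1.0 - parallel_fraction
    run_time = serial + parallel_fraction / n + overhead_per_processor * (n - 1)
    return 1.0 / run_time


def marginal_gain(n, **model_params):
    """Extra speedup obtained by going from n to n + 1 processors."""
    return speedup(n + 1, **model_params) - speedup(n, **model_params)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:3d} processors: speedup {speedup(n):6.2f}, "
              f"gain from one more {marginal_gain(n):+.3f}")
```

Under these assumed parameters, the gain from each added processor shrinks steadily as the overhead term grows relative to the parallel term; with a large enough overhead the gain in this model can even fall below zero. Real systems will differ in both the shape and the size of the overhead.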
While in some environments the flexibility of conventional processors is an advantage, the manner in which conventional processors are designed and constructed causes problems in other environments. Conventional processors are designed around a very limited number of fairly generic computational resources such as instruction and data caches, registers, arithmetic logic units, and floating point units. Conventional processors also typically have a fixed word size, e.g., 32 or 64 bits. These features of conventional processors cannot be changed or altered in a real time processing environment to fit the precise requirements of a given application. Consequently, a set of instructions and tools is provided to map application requirements onto the fixed number of available resources inside a conventional processor. This mapping limits application performance to varying degrees, depending on how closely the available conventional processor resources match the ideal number and type of resources required by the application for optimal or peak performance.
To overcome these limitations of conventional processors, some systems have used coprocessors having a large number of highly specialized resources such as fast floating point units, flexible interconnects and pipelines, hardware multipliers and accumulators, and optimized math functions. Moreover, in many cases, such coprocessors provide the ability to adapt or dynamically change hardware circuits, interconnects, and/or bit lengths to meet the exact requirements of a particular application. Such techniques are common in the case of programmable logic devices such as Field Programmable Gate Arrays (FPGAs).
A distinct difference between conventional processor systems with their generic computational resources and coprocessor systems having a large number of highly specialized resources is the speed at which they perform a function. For a given set of related functions, a software-controlled conventional processor is usually significantly slower than a specialized processor or coprocessor that is specifically configured for the desired functionality and that has dedicated parameters and resources for optimal, high-speed operation of the given functions. These special resources available within coprocessors, when properly utilized by a given application, typically result in a significant performance improvement over the traditional approach of using only conventional processors and associated development methods.
However, specialized coprocessors in and of themselves are not a panacea for the ills of general processors vis-à-vis processing throughput. Whereas specialized processors increase the speed of computing particular functions, they lack the flexibility of a conventional processor and introduce a very different set of programming methodologies, tools, and instructions. Moreover, when compared to conventional processor programming methods, the methods for specialized processors are cumbersome, error-prone, complex, and lacking in the high-level abstractions and libraries that are needed for ease of development and use. There is also a relative paucity of professionals who are skilled in programming such specialized processors.
To reap the benefits of both conventional processors and specialized coprocessors, attempts have been made to combine conventional processors and specialized coprocessors in a single system. However, the challenges associated with integrating specialized processors and coprocessors with conventional processors, especially as such integration relates to software development and acceleration of high performance computing applications, have severely limited the use of specialized coprocessors in mainstream computing systems and platforms. Additionally, such systems have generally relied solely on the speed of the coprocessor (or on adding multiple processors or coprocessors) to increase throughput, and therefore such systems lack any overall operational efficiency.