Embodiments of the present invention relate to improving communications in a processor-based system, and more particularly to a system including multiple sequencers.
Computer systems include various components to process and communicate data. Typical systems include one or more processors, each of which may include multiple cores, along with associated memories, input/output (I/O) devices and other such components. To improve computational efficiency, computation accelerators, special-purpose I/O devices and other such specialized units may be provided via one or more specialized components, referred to generically herein as helper units. However, inefficiencies may occur in using such helper units, because in a typical computing environment that implements a general-purpose processor and an industry-standard operating system (OS) environment, the software stack can impede efficient usage. That is, in a typical OS environment, system software is isolated from application software via different privilege levels, and operations in each of these different privilege levels are subject to OS context save and restore operations, among other limitations.
Thus, whenever a helper unit such as a special-purpose accelerator is incorporated, it is usually exposed as a device, and a user-level application can use the helper unit only indirectly via the OS's device driver software stack, which has direct access to the raw physical helper unit resource. Consequently, the helper unit, accessed via its associated device driver, is a system-wide resource rather than an application-level resource such as general-purpose registers, virtual memory or sequencers, which are virtualized across context switches.
The problems with having to use a device driver to access a helper unit are inefficiency (in terms of path length from application to driver to helper unit) and inflexibility due to OS-imposed restrictions related to “standardized” driver interfaces.
Classic examples of a computation accelerator are coprocessors such as math coprocessors (like so-called x87 floating point coprocessors for early Intel® Architecture (IA)-32 processors). Typically, such coprocessors are coupled to a main processor (e.g., a central processing unit (CPU)) via a coprocessor interface, and share a common instruction set architecture (ISA) with the main processor. Furthermore, the interaction between these resources is via a traditional escape/wait signal protocol, in which the main processor is placed in a wait state while the coprocessor performs its requested function, at the conclusion of which control returns to the main processor. As a result, the main processor cannot perform useful work while waiting for a result from the coprocessor. That is, the coprocessor is integrated such that it architecturally operates sequentially with the program order of the control flow of the main processor. This leads to inefficiencies in processor utilization, especially when the coprocessors are capable of operations that are concurrent with computation on the main processor. A need thus exists for an improved manner of communicating with and using such helper units.
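By way of illustration only, the contrast between the escape/wait protocol and concurrent operation may be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any particular hardware: a worker thread stands in for the coprocessor, and the names `run_escape_wait`, `run_concurrent` and `log`, as well as the returned value 49, are hypothetical and chosen only for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_escape_wait(log):
    """Model of escape/wait: the main processor stalls until the result returns."""
    def op():
        log.append("coprocessor: op done")
        return 49  # hypothetical result of the offloaded function

    with ThreadPoolExecutor(max_workers=1) as coprocessor:
        pending = coprocessor.submit(op)
        result = pending.result()             # wait state: no useful work here
        log.append("main: independent work")  # can start only after the return
    return result


def run_concurrent(log):
    """Model of concurrent use: the main processor works while the op is pending."""
    main_work_done = threading.Event()

    def op():
        # Assumption for the demo: the op is held open until the main thread
        # has finished its own work, so the overlap is visible in the log.
        main_work_done.wait()
        log.append("helper: op done")
        return 49

    with ThreadPoolExecutor(max_workers=1) as helper:
        pending = helper.submit(op)
        log.append("main: independent work")  # proceeds while the op is pending
        main_work_done.set()
        result = pending.result()
    return result
```

In the first function, the independent work of the main processor is logged only after the coprocessor has finished, which mirrors the sequential behavior described above. In the second function, the same work completes while the offloaded operation is still outstanding.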