1. Field of the Invention:
The invention broadly relates to maintaining compatibility with legacy software and hardware in modern computer systems, and more specifically to mitigating the latencies that commonly occur in the input/output (I/O) read and write paths of "virtualized" devices.
2. Description of Related Art:
Emulation, the concept of manipulating a device with software to provide the functionality of another similarly functioning device, is ubiquitous. For example, a typical application for emulation is in the art of so-called "dumb" terminals. In this regard, compatibility is provided by interpreting or mapping keystrokes and display formats intended for one particular type of terminal to another type of terminal. Emulation, therefore, permits software commands and queries to transcend hardware platforms.
By way of contrast, the Assignee of the present invention has pioneered a new concept known as a "virtual subsystem architecture", which differs starkly from emulation in that it provides the functionality of, or compatibility with, popular legacy hardware subsystems such as modems, sound cards, and display cards while employing only the general purpose resources of a modern processor. The general scheme for the virtual subsystem architecture is described in U.S. patent application Ser. No. 08/540,351, which is herein incorporated by reference. A particular application to audio generation and capture is described in U.S. patent application Ser. No. 08/458,326, which is also herein incorporated by reference. An enhanced system management mode suitable for use with the present invention is described in commonly assigned U.S. patent application Ser. No. 08/541,359 (Docket No: CX00258), which is also herein incorporated by reference.
Through observation and experimentation, the inventor of the present invention has found that real-time applications in the virtual subsystem architecture appear to have lackluster performance due to the combination of: i) limitations of legacy hardware that require a delay to be inserted between two successive write operations; ii) idiosyncratic techniques that application and driver software use to accomplish these delays; and iii) the overhead of entering and exiting virtualization, which is large relative to the simple operations being virtualized.
One illustrative, but not limiting, example of lackluster performance in the virtual subsystem architecture occurs when a sound card such as a SoundBlaster™ from Creative Labs Corporation of Milpitas, Calif. is virtualized. The sound card typically performs so-called "FM-synthesis" sound generation by employing a dedicated FM-synthesis integrated circuit known as an OPL-2 or OPL-3 (a later generation) chip from the Yamaha Corporation of Japan. The OPL-3 chip (which is essentially a superset of the OPL-2) is programmed through writes to its internal registers to define characteristics of tones and timbres for the desired output sound. The sound card maps the internal registers of the OPL-3 chip to specific I/O addresses recognized by the personal computer (PC). Hence, the OPL-3 chip is programmed through the I/O space of the PC by application/driver software which writes an "index and data" pair to program a single internal register within the OPL-3. The index identifies which internal register within the OPL-3 chip is being addressed, while the data sets the new value for that internal register.
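The "index and data" pair described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions and is not taken from any actual driver: the class `SimulatedOpl`, the helper `program_register`, the port numbers 0x388 and 0x389 (the addresses conventionally used by AdLib-compatible cards), and the register values are all chosen for illustration.

```python
# Minimal simulation of "index and data" register programming.
# INDEX_PORT and DATA_PORT are assumed values (conventional AdLib-compatible
# addresses); the text above does not name specific port numbers.
INDEX_PORT = 0x388
DATA_PORT = 0x389


class SimulatedOpl:
    """Hypothetical stand-in for an OPL chip's internal register file."""

    def __init__(self):
        self.registers = {}   # internal register number -> current value
        self.selected = None  # register chosen by the most recent index write

    def io_write(self, port, value):
        if port == INDEX_PORT:
            # The index write only selects a register; nothing is stored yet.
            self.selected = value & 0xFF
        elif port == DATA_PORT:
            if self.selected is None:
                raise RuntimeError("data write with no index selected")
            # The data write sets the new value of the selected register.
            self.registers[self.selected] = value & 0xFF
        else:
            raise ValueError(f"unmapped port {port:#x}")


def program_register(chip, index, data):
    """Program one internal register with a single index/data pair."""
    chip.io_write(INDEX_PORT, index)
    chip.io_write(DATA_PORT, data)


chip = SimulatedOpl()
program_register(chip, 0x20, 0x01)  # arbitrary illustrative values
```

Note that the simulation imposes no timing constraint between the two writes; on the actual chip a delay is needed between them.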
As a consequence of the OPL chip design and the speed of modern processing systems, a delay is required between the write operation of the index and the write operation of the data. The required delay, however, varies among vintages of OPL chips. Consequently, existing application/driver software inserts a delay of at least six microseconds as a "safe harbor", more typically fifteen microseconds, and sometimes twenty-five microseconds or more. The delay is ordinarily supplied by making multiple so-called "faux-reads" (i.e. read instructions whose result is irrelevant) to a "status" register on the OPL-3 chip, with each faux-read presumed to have the duration of an ISA I/O cycle, approximately one microsecond. Accordingly, by way of example and not limitation, executing fifteen faux-reads between the index and data writes would induce roughly fifteen microseconds of delay based on the expected I/O cycle latency. It should be noted at this point that some reads by the application/driver software are not faux-reads and actually rely upon the status information returned from the read of the OPL-3 chip.
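The faux-read delay technique can be sketched as follows. The code models elapsed time on a simulated bus rather than touching hardware; `SimulatedBus`, `write_register_with_delay`, and the register values are hypothetical names and numbers introduced for illustration, and the one-microsecond cycle is the approximate figure given above.

```python
# Sketch of the legacy delay technique: faux-reads of a status register
# between the index write and the data write. All names are hypothetical.
ISA_IO_CYCLE_US = 1.0  # approximate duration of one ISA I/O cycle (from the text)


class SimulatedBus:
    """Accumulates elapsed time instead of performing real I/O."""

    def __init__(self, cycle_us=ISA_IO_CYCLE_US):
        self.cycle_us = cycle_us
        self.elapsed_us = 0.0
        self.log = []  # ordered record of simulated I/O operations

    def write(self, name, value):
        self.elapsed_us += self.cycle_us
        self.log.append(("write", name, value))

    def read_status(self):
        self.elapsed_us += self.cycle_us
        self.log.append(("read", "status", None))
        return 0  # placeholder; a faux-read ignores the returned value


def write_register_with_delay(bus, index, data, faux_reads=15):
    """Write index, burn time with faux-reads, then write data.

    Returns the delay (in microseconds) induced between the two writes.
    """
    bus.write("index", index)
    start = bus.elapsed_us
    for _ in range(faux_reads):
        bus.read_status()  # result discarded: only the elapsed time matters
    delay_us = bus.elapsed_us - start
    bus.write("data", data)
    return delay_us


bus = SimulatedBus()
induced_delay_us = write_register_with_delay(bus, 0x20, 0x01)
```

With the default of fifteen faux-reads at one microsecond each, `induced_delay_us` is 15.0, matching the example in the text.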
A drawback with this method of delay inducement (i.e. relying on the duration of an ISA I/O cycle) arises when employing the virtual subsystem architecture. Specifically, if faux-reads are trapped and acted upon under the system management mode mechanism as part of the virtualization process, an excessive amount of delay is induced because the entrance/exit overhead is large relative to the simple faux-read operation. By way of example and not limitation, it is common for applications/drivers to insert fifteen faux-reads between the index write and the data write in order to induce a delay of approximately fifteen microseconds. Assuming arguendo that the entrance/exit overhead of the system management mode mechanism for each virtualized faux-read instruction is three microseconds, a delay loop comprising fifteen faux-reads actually induces a delay of forty-five microseconds, thirty microseconds longer than intended. Therefore, depending on the amount of audio processing, the unintended additional delay may cut into the bandwidth of the processor to the point where the motion of graphics drawn to the display becomes jerky.
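The arithmetic of the example above can be checked with a few lines of code. The function and constant names are hypothetical, and the three-microsecond overhead is the figure assumed arguendo in the text, not a measured value.

```python
# Worked arithmetic for the delay example. Names are illustrative only.
def faux_read_delay_us(faux_reads, per_read_us):
    """Total delay induced by a loop of faux-reads."""
    return faux_reads * per_read_us


FAUX_READS = 15      # faux-reads between index write and data write
ISA_CYCLE_US = 1     # expected cost of one faux-read on real hardware
SMM_OVERHEAD_US = 3  # assumed entrance/exit overhead per virtualized faux-read

intended_us = faux_read_delay_us(FAUX_READS, ISA_CYCLE_US)
virtualized_us = faux_read_delay_us(FAUX_READS, SMM_OVERHEAD_US)
excess_us = virtualized_us - intended_us
```

Under these assumptions `intended_us` is 15, `virtualized_us` is 45, and `excess_us` is 30, and the excess grows linearly with the number of faux-reads in the loop.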
Ignoring the trap on all I/O read instructions to improve performance is not an acceptable solution since, as previously mentioned, some application and driver software relies on the status information returned from the read of the OPL-3 chip. Other options to improve performance may include adding a coprocessor or some other form of substantially dedicated hardware, which increases cost and diminishes any savings yielded from eliminating the original device. Moreover, simply increasing the speed of the processor offers some improvement but also increases attendant system complexity, power consumption, and cost.
Accordingly, it can be seen from the foregoing that there is a need for an accelerated virtual subsystem architecture that speeds the virtualization of subsystems with minimal additional hardware resources and overcomes the obstacles presented by the delay inducement technique of legacy software.