1. Field of the Invention
This invention relates to sound production systems and more particularly to a sound engine that preserves requests for synthesis of sound waveforms such that if its capacity for real-time synthesis is exceeded, then impact on an ongoing sound performance will be minimized.
2. Discussion of the Prior Art
In processor-based sound production systems, such as personal computers ("PCs") utilizing real-time synthesis, sound generating processes, such as multimedia programs, send requests for sound to a sound engine. The sound engine manages the sound requests and synthesizes corresponding digital waveforms, and an audio output system converts the digital waveforms into audible sounds.
Although sound-generating processes may request sound as digitally recorded segments to be played back as with a conventional tape recorder, event-based sound production is favored as being more responsive for producing interactive sounds while producing acceptable ongoing sound such as soundtracks. It is also favored as requiring the generation and storage of far less data. With event-based sound production, such as according to the Musical Instrument Digital Interface ("MIDI") and General Musical Instrument Digital Interface ("General MIDI") standards, sound-generating processes send sound requests, including instructions for how to create requested sounds, to the sound engine. The sound engine creates or "synthesizes" digital waveforms according to the instructions, similarly to a player piano pressing keys according to the arrangement of holes in a piano roll. A General MIDI sound request might, for example, include an instruction to start synthesizing ("note-on") a specific pitch ("note") for a specific sound quality or "instrument", such as middle-C on a piano. Upon receipt of the sound request, the sound engine not only initiates but continues to synthesize a piano sound throughout the sound's characteristic duration unless it receives a request to stop synthesizing the sound ("note-off").
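By way of illustration only, the following minimal sketch shows how a note-on and note-off sound request for middle-C on a piano might be encoded as bytes. The status byte values (0x90, 0x80, 0xC0) and note number 60 for middle-C follow the MIDI channel-voice message format; the function names and the choice of velocity are assumptions made for this example and are not part of any sound engine described here.

```python
# Illustrative sketch only; function names are hypothetical.
NOTE_OFF = 0x80        # status nibble: stop synthesizing a note
NOTE_ON = 0x90         # status nibble: start synthesizing a note
PROGRAM_CHANGE = 0xC0  # status nibble: select an instrument
MIDDLE_C = 60          # MIDI note number for middle-C
ACOUSTIC_GRAND_PIANO = 0  # General MIDI program 1, transmitted as 0


def _check(channel, *data):
    # Channels are 0-15; data bytes are 7-bit values (0-127).
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("data bytes must be 0-127")


def program_change(channel, program):
    """Select the instrument used by later notes on this channel."""
    _check(channel, program)
    return bytes([PROGRAM_CHANGE | channel, program])


def note_on(channel, note, velocity):
    """Request that synthesis of a note begin."""
    _check(channel, note, velocity)
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Request that synthesis of a note stop."""
    _check(channel, note, velocity)
    return bytes([NOTE_OFF | channel, note, velocity])


# A piano middle-C request on channel 0, followed by its release.
request = (program_change(0, ACOUSTIC_GRAND_PIANO)
           + note_on(0, MIDDLE_C, 100)
           + note_off(0, MIDDLE_C))
```

The whole request occupies eight bytes, which illustrates why event-based requests require far less data than digitally recorded segments.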
Since a processor-based sound engine necessarily synthesizes waveforms serially, the audio output system uses a conventional double-buffering system to create a continuous waveform that will produce continuous sound. A double-buffering system includes two buffers that alternate input and output tasks on a regular and continuous basis. While one buffer is available to receive a waveform for a given period of time ("one cycle"), the other buffer is continually outputting individual parts ("samples") of the previous waveform to a digital-to-analog converter. When a cycle ends ("times-out"), the buffers switch tasks and continue outputting samples without interruption.
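The alternation of buffer tasks described above may be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the class and function names are hypothetical, each cycle is represented by one loop iteration rather than by a hardware timer, and the output buffer is simply copied rather than streamed to a digital-to-analog converter.

```python
# Illustrative model of double buffering; names are hypothetical.
class DoubleBuffer:
    def __init__(self, samples_per_cycle):
        self.samples_per_cycle = samples_per_cycle
        # Both buffers start out holding silence.
        self.buffers = [[0.0] * samples_per_cycle,
                        [0.0] * samples_per_cycle]
        self.fill_index = 0  # buffer currently receiving a waveform

    def write(self, segment):
        """Store one cycle's worth of samples in the receiving buffer."""
        if len(segment) != self.samples_per_cycle:
            raise ValueError("segment must last exactly one cycle")
        self.buffers[self.fill_index][:] = segment

    def output(self):
        """Return the samples of the buffer that is being played."""
        return list(self.buffers[self.fill_index ^ 1])

    def swap(self):
        """Called when the cycle times out: the buffers switch tasks."""
        self.fill_index ^= 1


def run_cycles(segments, samples_per_cycle):
    """Feed segments through the buffers and collect the output stream."""
    db = DoubleBuffer(samples_per_cycle)
    stream = []
    for segment in segments:
        db.write(segment)            # one buffer receives this cycle's waveform
        stream.extend(db.output())   # the other outputs the previous waveform
        db.swap()                    # cycle times out
    stream.extend(db.output())       # play out the final segment
    return stream
```

In this model the output lags the input by one cycle, so the stream begins with one cycle of silence and thereafter continues without interruption.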
While each double-buffering cycle times out in only several milliseconds, a typical sound lasts several seconds. Thus, the sound engine will ideally synthesize a waveform corresponding to each of any number of sounds requested in segments ("waveform segments") with each segment having a duration of one cycle and each complete waveform having a duration of typically a few to several hundreds of cycles. In addition to continuing waveforms, the sound engine will ideally synthesize a waveform segment corresponding to each request for a new sound as it receives the request. Inconveniently, since PCs are not exclusively dedicated to producing sound, the processor may be interrupted at an unpredictable time for an unpredictable interval during any given cycle.
Thus, the need to produce sounds interactively, combined with the need to produce continuous sound using processor-based systems that are not exclusively dedicated to producing sound, causes problems if the sound engine receives more sound requests than it can fully synthesize in real-time.
In prior PC-based sound production systems, excess sound requests have been summarily and irrevocably discarded both upon receipt and again during synthesis. At the start of each double buffering cycle the total number ("playable number") of synthesizable waveform segments is calculated as a presumed processor-available time divided by the sum of a mean time for fully synthesizing an average complexity waveform plus additional time to account for potentially more complex waveforms and processor interruptions. The sound engine then irrevocably discards any requests for sound in excess of the playable number and begins fully synthesizing the remaining ("planned") sound requests. When the current cycle times out, the sound engine irrevocably discards all planned sound requests that it has not yet synthesized.
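The conventional calculation and the two discarding steps described above may be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the use of integer microseconds, and the example figures are hypothetical, and the sketch assumes that the earliest requests are the ones retained, which the foregoing description does not specify.

```python
# Illustrative sketch of conventional discarding; names are hypothetical.
def playable_number(available_us, mean_synthesis_us, margin_us):
    """Presumed processor-available time divided by the sum of the mean
    synthesis time and the additional allowance, rounded down."""
    return available_us // (mean_synthesis_us + margin_us)


def conventional_cycle(requests, playable, synthesized_before_timeout):
    """Model one cycle of a conventional sound engine.

    requests: sound requests pending at the start of the cycle
    playable: result of playable_number() for this cycle
    synthesized_before_timeout: how many planned requests were actually
        synthesized before the cycle timed out (fewer than planned if
        the processor was interrupted)
    Returns (synthesized, discarded).
    """
    planned = requests[:playable]                 # assumed: earliest kept
    discarded_at_start = requests[playable:]      # irrevocably dropped
    synthesized = planned[:synthesized_before_timeout]
    discarded_at_timeout = planned[synthesized_before_timeout:]
    return synthesized, discarded_at_start + discarded_at_timeout


# Example: 10,000 us presumed available, 400 us mean synthesis time and
# a 100 us allowance give a playable number of 20.
n = playable_number(10_000, 400, 100)
requests = [f"request-{i}" for i in range(24)]
done, lost = conventional_cycle(requests, n, synthesized_before_timeout=15)
```

In the example, four requests are discarded at the start of the cycle and five more when the cycle times out, so nine of twenty-four requests are lost in a single cycle.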
While summarily discarding sound requests assures that the capacity of a sound engine is not exceeded due to processor interruptions, it conflicts with the sound engine's very purpose: synthesizing requested sounds as accurately and completely as is possible. Accurate synthesis is important not only in initiating requested sounds, but also in sustaining and concluding sounds that were initiated earlier.
The FIG. 1a graph shows how conventional non-discriminating, irrevocable and abrupt discarding of sound requests results in the loss of auditory cues and sound textures. Most naturally occurring sounds 110, such as piano sounds, have a characteristic attack portion 110a, sustain portion 110b and concluding portion or "release" 110c. In the piano example, a hammer hitting the strings causes attack portion 110a, sound 110 sustains 110b while the key is depressed and sound 110 concludes 110c after sufficient time has elapsed or when the key is released. Portions of sound 110 not only provide sonic texture, but also provide a listener with sonic cues. When conventional sound engines discard a sound request, the corresponding sound 110 ends abruptly, thereby preempting cues provided by remaining portions of sound 110. Thus, if a sound request is discarded at time t1 prior to a sound's attack at time t2, the entire sound 110 is lost. If the sound 110 provided a critical system warning, musical event or interactive cue, then that is lost as well. If a sound request is discarded at time t4 during a sound's sustaining portion, then the attack portion 110a of sound 110 is not affected. However, the remaining duration 120b of sound 110 is lost along with the texture and cues provided by the remainder of sustaining portion 110b and release 110c. Thus, during each double-buffering cycle conventional sound engines discard many sounds, along with the respective textures and cues they provide, indiscriminately at any point during their duration and without consideration of their sonic importance. Therefore, the integrity of a sound performance might be severely compromised.
The FIG. 1b graph, by magnifying portion 200 of the FIG. 1a graph, shows how conventional, non-discriminating, abrupt, and irrevocable discarding 115b of a sound request during the corresponding sound sustaining portion 110b or concluding portion 110c introduces readily perceived noise. Abruptly discarding 115b a sound request immediately ceases all synthesis of a corresponding sound 110, resulting in a total lack of the sound 110 or equivalently a waveform 110e having a constant zero amplitude. Since it is extremely unlikely that the amplitude of the last sample prior to discarding 115b will be zero, a difference in amplitudes ("waveform discontinuity") occurs at time t4. This waveform discontinuity, after mixing with other sounds and amplification, is perceived by listeners as a loud, obnoxious, popping sound followed by the sudden absence of the remainder of the sound as well as a loss of the textures and cues that the request-initiating process intended the sound to provide. To make matters worse, conventional sound engines may discard any number of sound requests during each typically ten to twelve millisecond cycle.
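The waveform discontinuity described above may be demonstrated numerically. The following sketch is illustrative only: it assumes a sustained 440 Hz sine tone sampled at 44,100 samples per second in place of an actual instrument waveform, and the function names are hypothetical.

```python
# Illustrative sketch of a waveform discontinuity; names are hypothetical.
import math


def sustained_tone(num_samples, freq_hz=440.0, sample_rate=44_100, amplitude=1.0):
    """A plain sine wave standing in for the sustaining portion of a sound."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def discard_at(waveform, index):
    """Model abrupt discarding: every sample from index onward becomes zero."""
    return waveform[:index] + [0.0] * (len(waveform) - index)


def largest_step(waveform):
    """Largest amplitude difference between any two adjacent samples."""
    return max(abs(b - a) for a, b in zip(waveform, waveform[1:]))


tone = sustained_tone(200)
cut = discard_at(tone, 25)         # request discarded while the tone is near its peak

smooth_step = largest_step(tone)   # small: adjacent samples differ only slightly
abrupt_step = largest_step(cut)    # large: the jump from near full amplitude to zero
```

In this example the uninterrupted tone never changes by more than about 0.07 of full scale between adjacent samples, whereas the discarded tone jumps by more than 0.9 of full scale in a single sample, which corresponds to the popping sound perceived by listeners.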
The conventional sound engine solution of completely and irrevocably discarding sound requests is also inconsistent with the problem that it is intended to solve, i.e., temporarily exceeded sound engine capacity. Exceeded sound engine capacity is most often due to processor interruption during synthesis. While conventional sound engines discard excess sound requests during each cycle in anticipation of processor interruption, such processor interruption might not, and in many cases does not, occur. Further, processor interruption and resultant exceeded sound engine capacity during a current cycle indicate a greater likelihood of sufficient capacity during successive cycles, when synthesis of such sound requests might be continued. Thus, conventional discarding of sound requests is in both cases premature, wastes time better reserved for fully synthesizing more requests, and might needlessly, severely and detrimentally impact an ongoing sound performance.
Thus, there is a need for a sound engine that makes better use of time and equipment resources to synthesize sounds.