Multimedia programs present content to a user through both audio and video events while a user interacts with a program via a keyboard, joystick, or other interactive input device. A user associates elements and occurrences of a video presentation with the associated audio representation. A common implementation is to associate audio with movement of characters or objects in a video game. When a new character or object appears, the audio associated with that entity is incorporated into the overall presentation for a more dynamic representation of the video presentation.
Audio representation is an essential component of electronic and multimedia products such as computer based and stand-alone video games, computer-based slide show presentations, computer animation, and other similar products and applications. As a result, audio generating devices and components are integrated into electronic and multimedia products for composing and providing graphically associated audio representations. These audio representations can be dynamically generated and varied in response to various input parameters, real-time events, and conditions. Thus, a user can experience the sensation of live audio or musical accompaniment with a multimedia experience.
Conventionally, computer audio is produced in one of two fundamentally different ways. One way is to reproduce an audio waveform from a digital sample of an audio source, which is typically stored in a wave file (i.e., a .wav file). A digital sample can reproduce any sound, and the output is very similar on all sound cards and similar computer audio rendering devices. However, a file of digital samples consumes a substantial amount of memory and resources when streaming the audio content. As a result, the variety of audio samples that can be provided using this approach is limited. Another disadvantage of this approach is that the stored digital samples cannot be easily varied.
Another way to produce computer audio is to synthesize musical instrument sounds, typically in response to instructions in a Musical Instrument Digital Interface (MIDI) file, to generate audio sound waves. MIDI is a protocol for recording and playing back music and audio on digital synthesizers incorporated with computer sound cards. Rather than representing musical sound directly, MIDI transmits information and instructions about how music is produced. The MIDI command set includes note-on, note-off, key velocity, pitch bend, and other commands to control a synthesizer.
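As an illustration of how compact such commands are, the following minimal sketch builds three of the MIDI channel messages named above as raw bytes. The status byte values (0x90 for note-on, 0x80 for note-off, 0xE0 for pitch bend) follow the MIDI 1.0 specification; the function names are hypothetical and chosen only for this example.

```python
def note_on(channel, note, velocity):
    """Build a note-on message. channel is 0-15 (usually shown to users as 1-16)."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def note_off(channel, note, velocity=0):
    """Build a note-off message for the same channel and note."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def pitch_bend(channel, value=8192):
    """Build a pitch bend message. value is 14 bits (0-16383); 8192 means no bend."""
    lsb = value & 0x7F          # low 7 bits are sent first
    msb = (value >> 7) & 0x7F   # high 7 bits are sent second
    return bytes([0xE0 | (channel & 0x0F), lsb, msb])


# Middle C (note 60) pressed at velocity 100 on the first channel, then released.
pressed = note_on(0, 60, 100)
released = note_off(0, 60)
```

Each message occupies three bytes regardless of how long the resulting sound lasts, which is why a MIDI file is far smaller than the equivalent digital samples.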
The audio sound waves produced with a synthesizer are those already stored in a wavetable in the receiving instrument or sound card. A wavetable is a table of stored sound waves that are digitized samples of actual recorded sound. A wavetable can be stored in read-only memory (ROM) on a sound card chip, or provided with software. Prestoring sound waveforms in a lookup table improves rendered audio quality and throughput. An advantage of MIDI files is that they are compact and require few audio streaming resources, but the output is limited to the number of instruments available in the designated General MIDI set and in the synthesizer, and may sound very different on different computer systems.
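The table lookup can be sketched as follows. This is a simplified illustration, not a description of any particular sound card: the table here holds one cycle of a computed sine wave as a stand-in for a digitized recording, and the names `SINE_TABLE` and `render` are assumptions made for the example.

```python
import math

TABLE_SIZE = 256
# Stand-in for a stored waveform: one cycle of a sine wave.
# A real wavetable would hold digitized samples of recorded sound.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def render(table, freq_hz, sample_rate, num_samples):
    """Read the table at a rate that yields freq_hz at the given sample rate."""
    step = freq_hz * len(table) / sample_rate  # table positions per output sample
    phase = 0.0
    out = []
    for _ in range(num_samples):
        # Nearest-lower-index lookup; no interpolation in this sketch.
        out.append(table[int(phase) % len(table)])
        phase += step
    return out


tone = render(SINE_TABLE, 440.0, 44100, 1000)
```

Because each output sample is a table read rather than a fresh computation, rendering cost stays low and does not depend on how complex the stored waveform is.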
MIDI instructions sent from one device to another indicate actions to be taken by the controlled device, such as identifying a musical instrument (e.g., piano, flute, drums, etc.) for music generation, turning on a note, and/or altering a parameter in order to generate or control a sound. In this way, MIDI instructions control the generation of sound by remote instruments without the MIDI control instructions themselves carrying sound or digitized information. A MIDI sequencer stores, edits, and coordinates the MIDI information and instructions. A synthesizer connected to a sequencer generates audio based on the MIDI information and instructions received from the sequencer. Many sounds and sound effects are a combination of multiple simple sounds generated in response to the MIDI instructions.
A MIDI system allows audio and music to be represented with only a few digital samples rather than converting an analog signal to many digital samples. The MIDI standard supports different channels that can each simultaneously provide an output of audio sound wave data. There are sixteen defined MIDI channels, meaning that no more than sixteen instruments can be playing at one time. Typically, the command input for each MIDI channel represents the notes corresponding to an instrument. However, MIDI instructions can program a channel to be a particular instrument. Once programmed, the note instructions for a channel will be played or recorded as the instrument for which the channel has been programmed. During a particular piece of music, a channel can be dynamically reprogrammed to be a different instrument.
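The channel programming described above can be modeled with a small amount of state. The sketch below is hypothetical: the class name `ChannelBank`, the default instrument, and the use of instrument names as strings are assumptions made for illustration rather than part of the MIDI standard.

```python
NUM_CHANNELS = 16  # the MIDI standard defines sixteen channels


class ChannelBank:
    """Tracks which instrument each MIDI channel is currently programmed to play."""

    def __init__(self):
        # Every channel starts on a placeholder default instrument.
        self.instrument = ["piano"] * NUM_CHANNELS

    def program_change(self, channel, instrument):
        """Program (or reprogram) a channel to be a particular instrument."""
        self._check(channel)
        self.instrument[channel] = instrument

    def note_on(self, channel, note):
        """Return (instrument, note) describing what the channel would sound."""
        self._check(channel)
        return (self.instrument[channel], note)

    @staticmethod
    def _check(channel):
        if not 0 <= channel < NUM_CHANNELS:
            raise ValueError("MIDI channel must be in the range 0-15")


bank = ChannelBank()
bank.program_change(0, "guitar")
first = bank.note_on(0, 60)       # played as a guitar
bank.program_change(0, "flute")   # reprogrammed during the piece
second = bank.note_on(0, 60)      # the same note is now played as a flute
```

The note instruction itself never names an instrument; the sound depends entirely on how the channel was last programmed.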
A Downloadable Sounds (DLS) standard published by the MIDI Manufacturers Association allows wavetable synthesis to be based on digital samples of audio content provided at run-time rather than stored in memory. The data describing an instrument can be downloaded to a synthesizer and then played like any other MIDI instrument. Because DLS data can be distributed as part of an application, developers can be assured that the audio content will be delivered uniformly on all computer systems. Moreover, developers are not limited in their choice of instruments.
A DLS instrument is created from one or more digital samples, typically representing single pitches, which are then modified by a synthesizer to create other pitches. Multiple samples are used to make an instrument sound realistic over a wide range of pitches. DLS instruments respond to MIDI instructions and commands just like other MIDI instruments. However, a DLS instrument does not have to belong to the General MIDI set or represent a musical instrument at all. Any sound, such as a fragment of speech or a fully composed measure of music, can be associated with a DLS instrument.
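One common way for a synthesizer to derive other pitches from a single recorded pitch is to read the sample back at a different rate. The sketch below assumes equal temperament, where each semitone changes the rate by a factor of 2^(1/12), and uses linear interpolation; the DLS standard does not mandate this particular resampling method, and the function names are hypothetical.

```python
def pitch_ratio(root_note, target_note):
    """Playback-rate ratio that shifts a sample recorded at root_note to target_note."""
    return 2.0 ** ((target_note - root_note) / 12.0)


def resample(sample, ratio):
    """Read `sample` at `ratio` times normal speed using linear interpolation."""
    out = []
    pos = 0.0
    while pos < len(sample) - 1:
        i = int(pos)
        frac = pos - i
        # Blend the two neighboring stored samples.
        out.append(sample[i] * (1.0 - frac) + sample[i + 1] * frac)
        pos += ratio
    return out


recorded = [0.0, 1.0, 2.0, 3.0, 4.0]  # placeholder data recorded at note 60
octave_up = resample(recorded, pitch_ratio(60, 72))
```

Reading twice as fast raises the pitch by an octave but also halves the duration, which is one reason multiple samples are recorded across an instrument's range rather than stretching a single sample too far.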
Conventional Audio and Music System
FIG. 1 illustrates a conventional audio and music generation system 100 that includes a synthesizer 102, a sound effects input source 104, and a buffers component 106. Typically, a synthesizer is implemented in computer software, in hardware as part of a computer's internal sound card, or as an external device such as a MIDI keyboard or module. Synthesizer 102 receives MIDI inputs on sixteen channels 108 that conform to the MIDI standard. Synthesizer 102 includes a mixing component 110 that mixes the audio sound wave data output from synthesizer channels 108. An output 112 of mixing component 110 is input to an audio buffer in the buffers component 106.
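The role of a mixing component such as mixing component 110 can be sketched as a sample-by-sample sum of the channel outputs. This is an assumed, simplified model: samples are floating-point values in the range -1.0 to 1.0, and the sum is clamped to that range. The function name `mix` is hypothetical and does not describe the internals of any particular synthesizer.

```python
def mix(channel_outputs):
    """Sum per-channel sample lists into one output list, clamped to [-1.0, 1.0]."""
    length = max((len(ch) for ch in channel_outputs), default=0)
    mixed = []
    for i in range(length):
        # Channels that have run out of samples contribute silence.
        total = sum(ch[i] for ch in channel_outputs if i < len(ch))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


guitar = [0.5, 0.5, 0.25]
bass = [0.25, 0.75]
output = mix([guitar, bass])
```

The single mixed list corresponds to the one output that is written to an audio buffer, so the individual channel signals are no longer separable after this step.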
MIDI inputs to synthesizer 102 are in the form of individual instructions, each of which designates the MIDI channel to which it applies. Within synthesizer 102, instructions associated with different channels 108 are processed in different ways, depending on the programming for the various channels. A MIDI input is typically a serial data stream that is parsed in synthesizer 102 into MIDI instructions and synthesizer control information. A MIDI command or instruction is represented as a data structure containing information about the sound effect or music piece such as the pitch, relative volume, duration, and the like.
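Parsing a serial stream into per-channel instructions can be sketched as below. The message lengths and the status/data byte distinction (status bytes have the high bit set) follow the MIDI 1.0 specification. As a stated simplification, this sketch skips system messages entirely and treats any of them as resetting the parser; the function name `parse_stream` and the tuple layout are assumptions.

```python
# Number of data bytes that follow each channel message type.
DATA_LENGTHS = {0x80: 2, 0x90: 2, 0xA0: 2, 0xB0: 2, 0xC0: 1, 0xD0: 1, 0xE0: 2}


def parse_stream(data):
    """Split a MIDI byte stream into (message_type, channel, data_bytes) tuples."""
    messages = []
    status = None
    pending = []
    for byte in data:
        if byte & 0x80:  # high bit set: this is a status byte
            # Simplification: system messages (0xF0 and above) are skipped.
            status = None if byte >= 0xF0 else byte
            pending = []
            continue
        if status is None:
            continue  # data byte with no channel message in progress
        pending.append(byte)
        if len(pending) == DATA_LENGTHS[status & 0xF0]:
            # The low four bits of the status byte designate the channel.
            messages.append((status & 0xF0, status & 0x0F, tuple(pending)))
            pending = []  # status is kept, so "running status" is supported
    return messages


stream = bytes([0x90, 60, 100, 0x80, 60, 0])
parsed = parse_stream(stream)
```

Here `parsed` holds a note-on followed by a note-off, both for the first channel, which the synthesizer would then handle according to that channel's programming.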
A MIDI instruction, such as a “note-on”, directs synthesizer 102 to play a particular note, or notes, on a synthesizer channel 108 having a designated instrument. The General MIDI standard defines standard sounds that can be combined and mapped into the sixteen separate instrument and sound channels. A MIDI event on a synthesizer channel 108 corresponds to a particular sound and can represent a keyboard key stroke, for example. The “note-on” MIDI instruction can be generated with a keyboard when a key is pressed and the “note-on” instruction is sent to synthesizer 102. When the key on the keyboard is released, a corresponding “note-off” instruction is sent to stop the generation of the sound corresponding to the keyboard key.
As an example, consider the audio representation for an interactive video game involving a car, presented from the perspective of a person in the car. The sound effects input source 104 has audio data that represents various sounds that a driver in a car might hear. A MIDI formatted music piece 114 represents the audio of the car's stereo. Input source 104 also has digital audio sample inputs that are sound effects representing the car's horn 116, the car's tires 118, and the car's engine 120.
The MIDI formatted input 114 has sound effect instructions 122(1–3) to generate musical instrument sounds. Instruction 122(1) designates that a guitar sound be generated on MIDI channel one (1) in synthesizer 102, instruction 122(2) designates that a bass sound be generated on MIDI channel two (2), and instruction 122(3) designates that drums be generated on MIDI channel ten (10). The MIDI channel assignments are designated when MIDI input 114 is authored, or created.
A conventional software synthesizer that translates MIDI instructions into audio signals does not support distinctly separate sets of MIDI channels. The number of sounds that can be played simultaneously is limited by the number of channels and resources available in the synthesizer. In the event that there are more MIDI inputs than there are available channels and resources, one or more inputs are suppressed by the synthesizer.
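The suppression behavior can be illustrated with a simple allocation sketch. The policy shown, in which inputs are served in arrival order and any input beyond the channel limit is dropped, is an assumption made for illustration; the function name `assign_channels` is hypothetical, and an actual synthesizer may choose which inputs to suppress differently.

```python
def assign_channels(inputs, max_channels=16):
    """Assign each input a channel until channels run out; the rest are suppressed."""
    assigned = {}
    suppressed = []
    for name in inputs:
        if len(assigned) < max_channels:
            assigned[name] = len(assigned)  # next free channel index
        else:
            suppressed.append(name)         # no channel left for this input
    return assigned, suppressed


# Eighteen hypothetical inputs competing for sixteen channels.
inputs = ["input_%d" % i for i in range(18)]
assigned, suppressed = assign_channels(inputs)
```

With a single set of sixteen channels, the last two inputs in this example are never heard, regardless of how important they are to the presentation.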
The buffers component 106 of audio system 100 includes multiple buffers 124(1–4). Typically, a buffer is an allocated area of memory that temporarily holds sequential samples of audio sound wave data that will be subsequently communicated to a sound card or similar audio rendering device to produce audible sound. The output 112 of synthesizer mixing component 110 is input to buffer 124(1) in buffers component 106. Similarly, each of the other digital sample sources are input to a buffer 124 in buffers component 106. The car horn sound effect 116 is input to buffer 124(2), the tires sound effect 118 is input to buffer 124(3), and the engine sound effect 120 is input to buffer 124(4).
Another problem with conventional audio generation systems is the extent to which system resources have to be allocated to support an audio representation for a video presentation. In the above example, each buffer 124 requires separate hardware channels, such as in a sound card, to render the audio sound effects from input source 104.
Similarly, other three-dimensional (3-D) audio spatialization effects are difficult to create and require an allocation of system resources that may not be available when processing a video game that requires an extensive audio presentation. For example, to represent more than one car from a perspective of standing near a road in a video game, a pre-authored car engine sound effect 120 has to be stored in memory once for each car that will be represented. Additionally, a separate buffer 124 and separate hardware channels will need to be allocated for each representation of a car. If a computer that is processing the video game does not have the resources available to generate the audio representation that accompanies the video presentation, the quality of the presentation will be deficient.
Developing Interactive Audio
When developing audio content for a multimedia application in a development environment, such as when developing a video game program, the audio content is typically created by a composer or sound designer, and most of the implementation and integration of the audio content into the multimedia application is performed by an application developer, or game programmer. The audio sounds and music that are associated with a video presentation of the application are created by the sound designer and then implemented and encoded into the application code by the programmer.
The iterative process between a sound designer and an application developer to generate, provide, and implement audio content for a video application can be slow. The sound designer has to adjust volume levels, modify sound effects, change the music for a particular variable level, etc. for each audio rendition of a video event. Subsequently, the application developer has to encode each audio rendition into the video application in the right sequence and at the right time, depending on video game variables. For example, the music associated with a particular character in a video game might change when variables such as the character's health level, status, situation, environment, and the like change. Further, a scene change may necessitate a transition to new music, or the intensity of the music can be increased or decreased based on the activity or excitement level in the game.
Accordingly, there is a need for techniques to abstract the development of an audio rendition corresponding to a video event in an encoded video application so that a sound designer and an application developer are not restricted to the conventional iterative application audio design process.