Current video game system hardware almost universally includes a main processor and a graphics processor. The main processor may be a Pentium processor such as in a personal computer (PC). Alternatively, the main processor may be any processor involved in the transmission of program information to a graphics processor. The graphics processor is tightly coupled to the main processor by a very high performance bus with data throughput capability meeting or exceeding that of an Accelerated Graphics Port (AGP). The graphics processor is also generally coupled to an I/O bus that serves an audio processor and includes network connectors and a PCI port. The main processor and graphics processor are tightly coupled to minimize any performance degradation that could accompany the transfer of data from the main processor and memory system to the graphics processor.
The audio system components are usually not viewed as performance critical. Hence the audio system usually resides on a lower performance peripheral bus. This is perfectly acceptable for the audio in current systems. Currently, the highest performing game audio systems have two chief characteristic features.
The first characteristic of high performance game audio systems is a positional audio scheme. A positional audio system performs dynamic channel gain/attenuation in real time based on the user input and the character perspective on the screen. Multi-channel speaker systems typically include five main speakers: front left, center, and front right, plus rear left and rear right. Such systems also include a separate subwoofer, which is a non-positional speaker for bass reproduction. Such an audio system with five main speakers and a subwoofer is referred to as a ‘5.1 level’ system.
If a sound source is located to the left of the on-screen camera position, the gains on the left speakers are increased for that sound and the gains on the right speakers are attenuated. If the user moves the joystick and changes the relative camera position, the channel gains are dynamically modified. The positional audio algorithm will be enhanced in new designs to sound good on a living room quality multi-channel system.
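The gain adjustment described above can be illustrated with a minimal sketch. It assumes a flat 2D scene, hypothetical speaker azimuths of ±30 and ±110 degrees, and a simple cosine weighting normalized to constant total power. The function name `channel_gains` and the weighting law are illustrative assumptions, not the algorithm of any particular game system, and distance attenuation is ignored.

```python
import math

# Hypothetical speaker azimuths in degrees (0 = straight ahead, negative = left).
SPEAKER_AZIMUTHS = {
    "front_left": -30.0,
    "center": 0.0,
    "front_right": 30.0,
    "rear_left": -110.0,
    "rear_right": 110.0,
}


def channel_gains(source_xy, camera_xy, camera_heading_deg):
    """Return per-speaker gains for one sound source relative to the camera.

    Heading 0 means the camera faces +y; positive angles turn toward +x (right).
    A source exactly at the camera position is treated as straight ahead.
    """
    dx = source_xy[0] - camera_xy[0]
    dy = source_xy[1] - camera_xy[1]
    # Azimuth of the source relative to the direction the camera is facing.
    source_az = math.degrees(math.atan2(dx, dy)) - camera_heading_deg

    raw = {}
    for name, speaker_az in SPEAKER_AZIMUTHS.items():
        diff = math.radians(source_az - speaker_az)
        # Cosine lobe: speakers pointing toward the source get more gain,
        # speakers more than 90 degrees away get none.
        raw[name] = max(0.0, math.cos(diff))

    # Normalize so the summed power (sum of squared gains) stays at 1.
    norm = math.sqrt(sum(g * g for g in raw.values()))
    if norm == 0.0:
        return {name: 0.0 for name in raw}
    return {name: g / norm for name, g in raw.items()}


# A source directly to the left of a camera facing +y.
gains = channel_gains((-1.0, 0.0), (0.0, 0.0), 0.0)
# After the user turns the camera 90 degrees to the left, the same source is ahead.
turned = channel_gains((-1.0, 0.0), (0.0, 0.0), -90.0)
```

Calling the function again whenever the joystick changes the camera heading gives the dynamic modification of channel gains; the subwoofer is left out because it is non-positional.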
The second characteristic component is real time reverb. Real time reverb is not mixed into the sound track in advance but is rendered during game play. This creates a sound field effect based on the user's environment within the game. For example, if the game moves from an outdoor scene into a cavern, a cavern reverb is applied to all new game produced sounds. Thus a gunshot will have an echo because it now occurs inside the cavern instead of outside. Several competing game system providers employ this type of technology.
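As a rough illustration of an effect rendered at run time rather than mixed into the track, the sketch below applies a feedback comb filter (a basic echo, y[n] = x[n] + g·y[n − D]) to a dry sound whenever the current environment calls for it. The preset table, the names `apply_echo` and `render_sound`, and the tiny delay values are hypothetical; a production reverb would combine many such filters with delays of thousands of samples.

```python
# Hypothetical environment presets: (delay in samples, feedback gain).
# None means the dry sound is played without any reverb.
ENVIRONMENT_PRESETS = {
    "outdoor": None,
    "cavern": (4, 0.5),
}


def apply_echo(dry, delay_samples, feedback, tail_samples):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    if delay_samples <= 0:
        raise ValueError("delay_samples must be positive")
    if not -1.0 < feedback < 1.0:
        raise ValueError("feedback magnitude must be below 1 for a decaying echo")
    # Extend the buffer so the echo tail can ring out after the dry sound ends.
    out = [float(s) for s in dry] + [0.0] * tail_samples
    for n in range(delay_samples, len(out)):
        out[n] += feedback * out[n - delay_samples]
    return out


def render_sound(dry, environment):
    """Render a dry sound for the environment the game is currently in."""
    preset = ENVIRONMENT_PRESETS[environment]
    if preset is None:
        return [float(s) for s in dry]
    delay, feedback = preset
    return apply_echo(dry, delay, feedback, tail_samples=2 * delay)


gunshot = [1.0, 0.0]  # a single impulse standing in for a gunshot sample
outside = render_sound(gunshot, "outdoor")
inside = render_sound(gunshot, "cavern")
```

The same dry gunshot data yields a plain sound outdoors and a decaying series of echoes in the cavern, with no separately mixed cavern track.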
Both the positional audio and the real time reverb enhancements require the game designer to create the desired effect at game creation time. The effects are then applied during run time by the audio processor. For example, a cavern hall effect must be added to the game code in the form of “when this level is loaded, apply the cavern effect.” The game developer provides this effect, which does not require a separately mixed track to be heard. The effect is produced as processing applied to the fundamental sound during run time. Thus a normal gunshot could be mixed for only the front left/right speakers.
Additionally, it is possible in a computer game to apply a different reverb to each sound primitive based on the sound source location. Suppose a sound comes from a cave but the listener position is outside the cave. The sound source will have the cave reverb applied, while any sound generated by the listener will not. These real-time effects must be set by the audio designer at game creation time by tagging each sound with the reverb to be applied.
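The tagging approach can be pictured with a small sketch in which each sound primitive carries a reverb tag fixed at authoring time, and the run-time mixer simply looks the tag up. The `SoundPrimitive` class, the `plan_effects` helper, and the tag names are hypothetical illustrations, not part of any specific audio API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SoundPrimitive:
    name: str
    # Assigned by the audio designer at game creation time; None means dry.
    reverb_tag: Optional[str]


def plan_effects(sounds):
    """Map each sound name to the reverb selected by that sound's own tag.

    The listener position plays no part here: the effect follows the source.
    """
    return {sound.name: sound.reverb_tag for sound in sounds}


# A drip inside the cave and a footstep made by a listener standing outside it.
water_drip = SoundPrimitive("water_drip", reverb_tag="cave")
footstep = SoundPrimitive("listener_footstep", reverb_tag=None)
plan = plan_effects([water_drip, footstep])
```

Because the tags are frozen into the data, nothing in this scheme reacts to conditions the designer did not anticipate.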
In contrast to the moderate sophistication of current audio techniques, video techniques have advanced at a much more rapid pace. Video game manufacturers have committed ever increasing levels of hardware and software technology to the video image. Video information for game systems is assembled from elementary data and layered in levels to allow for image processing according to superposition principles. Increasing detail is supplied to the image with the inclusion of additional layer information. In a landscape scene, the lowest level is a wire-mesh structure that forms the spatial coordinates upon which objects may be placed. Higher levels contain polygon objects and yet higher levels contain refinements on the shapes of these objects such as rounding corners. With more levels the landscape scene and objects are further refined and shaped to:
1. Add texture to shapes, taking them from stark geometrical figures to a more realistic appearance;
2. Mix in reflective properties allowing reflective effects to be observed;
3. Modify lighting to add subtle illumination features;
4. Add perspective so that far away objects appear to be smaller in size;
5. Add depth of field so that position in depth within the image may be perceived; and
6. Provide anti-aliasing to remove jagged edges from curves.
These are only a few basic features added in layers superimposed to form the finished image. The amount of image processing required to accomplish this refinement of the video data is enormous. The game starts from a suite of data describing polygons, their placement on a wire mesh, and the characteristics of each polygon. This data implicitly creates a video landscape from which the processor can generate highly refined effects.
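To make one of the listed refinements concrete, the perspective step (item 4) can be sketched as a pinhole-style projection in which screen coordinates are divided by depth. The function name `project` and the `focal_length` parameter are assumptions chosen for illustration; real graphics pipelines perform this with homogeneous matrices in hardware.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (x, y, z) onto a 2D image plane by dividing by depth z."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# The same polygon corner placed at two depths on the wire mesh.
near_corner = project((1.0, 1.0, 2.0))
far_corner = project((1.0, 1.0, 4.0))
```

Doubling the depth halves the projected coordinates, which is why far away objects appear smaller.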
Multi-channel surround sound is becoming a standard function in gaming systems. Multi-channel surround sound enables a much wider array of effects than is possible in a standard 2-speaker stereo system. Many standards and applications have been created that take advantage of this in modern game systems. Some of these support positional audio, commonly referred to as 3D audio. Some apply various post-processing effects to a base sound file. For example, a reverb models the sound in a closed environment. These models allow a game developer to pre-determine, at game creation time, how a sound should be heard in a given environment. The game developer creates a single sound file. The sound levels on the multi-channel speaker system are adjusted via the positional audio application program interface (API) based on the relative position of the listener to the sound source. Various post-processing effects such as a reverb can also be applied to a single sound source file in real time based on the pre-programmed environment state information. This creates a better listening experience during game play.
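A minimal sketch of this model follows, assuming a clamped inverse-distance rolloff for the level adjustment and a lookup table for the pre-programmed environment effects. The names `distance_gain`, `render_request`, and `PREPROGRAMMED_EFFECTS`, as well as the file name, are hypothetical and do not correspond to any particular positional audio API.

```python
import math

# Hypothetical table authored at game creation time: environment state -> effect.
PREPROGRAMMED_EFFECTS = {
    "outdoor": "none",
    "cavern": "cavern_reverb",
}


def distance_gain(source_pos, listener_pos, reference_distance=1.0):
    """Clamped inverse-distance rolloff: full level inside the reference distance."""
    distance = math.dist(source_pos, listener_pos)
    return reference_distance / max(distance, reference_distance)


def render_request(sound_file, source_pos, listener_pos, environment_state):
    """Describe how a single sound file should be played in the current state."""
    return {
        "file": sound_file,
        "gain": distance_gain(source_pos, listener_pos),
        "effect": PREPROGRAMMED_EFFECTS[environment_state],
    }


request = render_request("gunshot.wav", (0.0, 0.0, 0.0), (0.0, 0.0, 4.0), "cavern")
```

The level is computed from positions at run time, while the effect comes from a table fixed in advance: an environment state missing from the table cannot be rendered at all.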
However, all these models assume that the game environment itself is static. Although speaker levels can be dynamically adjusted, the sound properties cannot be adjusted unless pre-programmed beforehand as described above. This places a fairly large burden on the game designer, who must have enough audio knowledge to know what various effects are supposed to sound like in a given environment, particularly physics based effects. These models also do not use any information regarding changes in the sound environment, particularly the creation of multiple sound sources and how they interact with each other. In the static model, these effects must be pre-determined at game design time.
Next generation game console audio requirements will fall into one of two major operational modes: Bit Stream Playback Operational Mode and Game Operational Mode. Two game manufacturers have indicated that their next consoles will be more than game systems; they will be living room entertainment systems. The key audio component in the current living room entertainment system is the audio-visual reproduction (AVR). The soon to be introduced consoles will need to support some AVR functionality. Direct un-amplified multi-channel audio output may be present.