The invention relates to a process for graphical visualisation and real-time control of virtual and/or real objects for the purpose of generating and/or influencing image sequences or tone sequences, with the aid of which objects represented on a screen or real objects, such as, for example, robots or the like, can be manipulated or controlled in their characteristics and/or actions in a convenient and reliable manner in (almost) real time by interactive control commands of a user. In particular, the present invention relates to a process for animation and motion control of a three-dimensional (real or virtual) articulated object in real time, to a process for freely specifiable control of animated graphics, video data or audio data with the aid of optical or acoustic parameters, to a computer-software product for executing such a process and also to a system for real-time motion control of virtual and/or real objects.
Let it be noted that “objects” in the sense of the invention may be virtual objects but may also be real objects such as robots or other remote-controlled objects having several degrees of freedom.
Within the scope of interactive communication between man and machine, problems frequently arise that are based on the inadequate adaptation of the machine to the characteristics of reception, processing and output of information by the human user. This mismatch may, on the one hand, result in a flood of information which can no longer be handled by the user, above all if several tasks have to be performed. On the other hand, too little demand on the user, for example in the case of a highly automated system in which the human is merely given a monitoring function, may lead to a decline in performance as a result of the monotony of the working situation, with the consequence that faults can no longer be dealt with because the procedure for such situations has been insufficiently practised. Deficient consideration of the knowledge and level of training of the user of a machine should also be mentioned here. In many cases, human behaviour, for example in the selection, evaluation and linkage of information, in arriving at a decision, in problem-solving and also in the planning and execution of actions, is still only insufficiently taken into account and supported when it comes to the design of technical systems.
Although the systems that are currently used for the representation and control of objects in virtual environments take account in increasing measure of the capabilities of man for receiving and processing information, they have a major drawback: in connection with the input of control commands with a view to exerting a direct influence on the scene that is represented, the user is still reliant on conventional methods for the manual input of information, such as, for example, via a mouse, trackball, joystick, graphics tablet with pen or touch-screen. The input mechanisms that are necessary in this connection firstly have to be learnt by the user, in order to be capable of being executed also at an appropriate speed of reaction. On the other hand, the innate or acquired capabilities of man that are already present for communication by means of acoustic signals (e.g. speech) or optical signals (e.g. facial expressions, gestures, demeanours and movements) in connection with the input of information for the purpose of controlling objects are only insufficiently taken into account.
For the adaptation of technical systems to man, prior knowledge of his characteristics, his behavioural patterns, his skills and his level of knowledge is consequently necessary. In connection with the exchange of information between a user and an information system, the sensory, cognitive and motoric characteristics of man are of interest in particular.
With regard to the sensory characteristics of man which are predetermined by the sensory channels, essentially the following perceptive channels are addressed by conventional machines and devices for the output of information:
    - the visual channel (eyes) by means of optical signals,
    - the auditory channel (ears) by means of acoustic signals and
    - the tactile channel (sense of touch) by means of haptic signals.
After processing of the signals in the brain (cognition), with respect to the motoric characteristics of man which are predetermined by the output channels the following channels are essentially available:
    - the motor functions of the arms, hands and fingers, and of the legs and feet, and also movements of the body, head, eyes or mouth (that is to say, physical movements, demeanours, gestures and facial expressions) for the purpose of generating mechanical or optical signals,
    - the motor functions of speech for the purpose of generating acoustic signals.
Via these channels, signals can be input into an information system in order to trigger a desired action of the system.
An ideal medium for communication between a user and an information system should be matched both to the sensory and perceptual capabilities and to the motoric capabilities and also to the specific characteristics of the human user. In this connection the information should be structured in such a way that an optimal correspondence is achieved between the representation of the output information and the mental model of the user: if the information to be displayed to the user is presented in such a way that, for example, his spatial perceptivity is addressed, the user can deal with astonishingly complex amounts of information per unit time. Similarly, the information system should be capable of receiving, understanding and processing as many types of information transmitted by a user as possible, and of transforming them into corresponding actions. Associated with this is the advantage that the user is able to react more efficiently and more quickly to new events and situations. User-friendliness and appropriateness to the task are consequently typical features which such an ideal communication medium is provided with. 
These features may be manifested as follows:
    - correspondence of the type, volume, output speed and presentation of the output information with the sensory characteristics of the human user,
    - consideration of all the information channels of the user in connection with the reception, recognition and interpretation of received control signals of the user,
    - easy learnability and intuitive operability of the medium,
    - high bandwidth of the transfer of information to the brain and high throughput of information,
    - dynamic adaptation of the application to the individual characteristics, capabilities, tasks, working and organisational techniques of the user,
    - use of a natural interactive language having high semantic content,
    - reliability, robustness and maintainability of the medium,
    - social acceptance of the medium in the population,
    - consideration of health, ergonomic and safety-relevant aspects etc.
It is the aim of the development of suitable interfaces between man and machine to start from the characteristics of human communications channels and skills in order to make available devices, interactive techniques and interfaces that guarantee an effective reciprocal communication via these channels. So-called virtual realities (VR) are particularly suitable in order to attain this aim. The term “virtual reality” (VR) is understood to mean the computer-based generation of an intuitively perceptible or sensible scene consisting of its graphical representation and the interactive possibilities for the user. A virtual environment affords a user access to information which otherwise would not be available at the given location or at the given time. It is based on natural aspects of human perception, inasmuch as it employs visual information in three spatial dimensions. This information can, for example, be selectively changed or enhanced with further sensory stimuli. Essential preconditions in this connection are the monitoring of perspective in real time and the possibility of the active exertion of influence by the user of the system on the scene that is represented.
In the course of navigation through virtual environments the user can employ the type of control that is natural for him. This may be, for example, appropriate arm or leg movements, movements for positioning the head or eyes, rotary movements of the body or movements directed towards an object. Through the use of already existing skills of the user for the purpose of control, the cognitive loading during the interaction between man and machine can be reduced. As a result, the bandwidth of the communication between man and machine can be increased and the operability of the machine can be improved. Whereas in the case of the conventional forms of man/machine communication the control of the machine is undertaken in command-oriented manner, in the case of the control of objects in virtual environments no specific commands have to be freshly learnt and employed: the computer “observes” the user passively and reacts in appropriate manner to movements of the eyes, head and/or hands of the user under real-time conditions.
The manipulation of the characteristics and the influencing of the actions of objects of a scene that is represented presupposes a complicated interplay of sensory perception, cognitive processing and motor functions, on which many factors act (individual behavioural patterns and capabilities, experiences, environmental influences etc.). In the case of interactions in a virtual world, there are additional difficulties. For the control, manipulation and influencing of objects, a reflex-type or cognitive sensory-motoric feedback is particularly important; it originates, for example, from receptors in the skin, kinaesthetic sensations, the sense of balance and also visual and/or acoustic sensations. In this connection, in many cases a necessary redundancy arises which does not always obtain in VR applications. By reason of the often insufficient sensory feedback in VR applications, the learning of motoric skills is in addition rendered difficult.
In the commercial VR applications a distinction is made between systems in which the user is completely integrated into the virtual environment (“immersion”) and systems that offer only a “window” onto virtual reality. In addition to the known forms of man/machine communication, such as
    - direct manipulation of objects by fine manual motor operations (pointing, touching, grasping, moving, holding firmly etc.),
    - formal interactive languages (programming languages, command languages and formal query languages),
    - natural-language interaction,
    - gesticulatory interaction by means of non-verbal symbolic commands (facial expressions, gestures, demeanours, movements) and also
    - hybrid task-orientated forms of interaction,
virtual realities can also be interpreted as a new form of man/machine communication. As the name “virtual reality” already suggests, for this purpose a certain fidelity to the reality of the presentation is necessary: the sensory information that is required for processing a task or for attaining an objective is to be presented to the user. Visual perception provides not only information about the location, movement, form, structure, contour, texture, colour or patterning of objects etc., but also information about the relative position of the body of the observer and the movements thereof and also about the nature of the three-dimensional environment. In this connection, synthetically generated environments can be fashioned more realistically if as much as possible of the information arising in natural environments (parallaxes due to movement, vanishing-points of the perspective representation, spatial depth effect and plasticity, illumination and casting of shadows, masking, brilliance effect, reflection effects and diffuse reflection etc.) is simulated. How much and which information is to be presented depends on the particular task that has been set. The differences between real and virtual worlds determine how realistic the simulation is perceived to be.
For the purpose of realising virtual realities, the visual information has to be simulated by a computer. In this connection, aspects similar to those in painting are relevant. In the case of the computer-assisted simulation of three-dimensional worlds, ordinarily the projection of individual beams of light is simulated. The starting-point of such a simulation is the specification of the environment to be simulated. To this end, the individual objects with their characteristics and their locations have to be established. For the purpose of visualisation, the intensities of individual image-points are then computed and projected onto the output medium.
With the aid of these simulations it is possible for totally new types of learning and drilling to be realised (examples: driving simulator or flight simulator); on the other hand, in this connection particular aspects of the real world are always abstracted. VR applications therefore simultaneously bring about an enhancement and a restriction of the possibilities for experience on the part of the user.
In principle, VR systems consist of sensors and actuators and also the coupling thereof. Important hardware constituents are, inter alia, the following:
    - Displays for presentation of the virtual environment: within the context of visual presentation, nowadays monitors, head-mounted displays (HMD), binocular omni-oriented monitors (BOOM) and projection systems find application above all; but use is also made of auditive or tactile displays which react to acoustic or manual user inputs.
    - Positioning and orientation systems for recording the location and perspective of the user: in this connection a distinction is made between the determination of the absolute position (position tracking) and the measurement of the deflection of articulations (angle measurement). Electromagnetic, kinematic, acoustic, optical and also image-processing procedures find application.
    - Interactive and manipulative systems for the action and reaction of the user in the virtual environment: for this purpose, use is made of pointing devices (2D or 3D mice, trackballs, joysticks etc.) or tactile devices (touch-screens, electromagnetic graphics tablets with pen etc.); so-called “data gloves” with deflection sensors and pressure sensors are also being employed to an increasing extent. Furthermore, voice control should also be mentioned in this context.
    - Computation systems and software for generating the virtual environment, subject to real-time requirements.
    - Networks for the integration of various users, by virtue of which new forms of collaboration may evolve.
The diverse technical variants of helmet-based or head-based systems for visualising virtual realities are designated synoptically in English as “visually coupled systems” (VCS). They consist of the following important components:
    1. a display attached to the head or helmet,
    2. a device for determining the movements of the head and/or eyes of the user,
    3. a source of visual information, which depends on the direction of the head and/or vision of the user.
When a system of such a type is employed for VR applications, information from both the real environment and the virtual environment can be presented at the same time. In this connection one speaks of “see-through displays” for the presentation of enhanced realities.
The tracking of movements of the head is an important component of VR applications. Ordinarily the position and orientation of the head in space are ascertained; advanced systems can, in addition, track the direction of vision. To this end, most systems employ ultrasound, magnetic energy or light energy for communication between the transmitters fitted to the head and the receivers. Important technical data that play a role in the selection of these systems are:
    - the number of degrees of freedom for the directions of motion that can be registered and tracked,
    - the recordable angular range,
    - the static precision (sensitivity to vibration),
    - the resolution,
    - the reliability,
    - the throughput of data and the scanning-frequency of the screen,
    - the interface to the computer and also
    - further performance aspects.
VR applications can be used successfully in practice in a number of different fields. In the following, a number of possible applications will be outlined in exemplary manner.
    - Use in the training field: through learning to deal with (virtual) objects, interactive demonstrations, visualisation of abstract concepts, virtual training of behaviour in dangerous situations, virtual exploration of remote locations or epochs, knowledge can be imparted, creative skills can be taught, and behavioural patterns can be trained.
    - Use in driving training and flight training in appropriate simulators: behaviour, particularly in emergency situations, can be taught through the use of simulators.
    - Use in the field of computer games: through the possibility of navigation through a virtual scene and the possibility of selective control and influence on virtual objects, an impression arises that is close to reality, as a result of which the attractiveness of a computer game for the user can be substantially increased.
The technologies that are available nowadays for the input of information into a data-processing system can be divided into four groups, according to the sensors that are used:
    1. mechanical input systems (e.g. keyboards, mice, trackballs and joysticks),
    2. electrical input systems (e.g. tactile displays and graphics tablets),
    3. optical input systems (e.g. light pens) and
    4. acoustic input systems (e.g. voice-input and voice-interpretation systems).
In the following, the aids that are customary for the input of information according to the present-day state of the art and that are employed for the purpose of controlling objects in the field of VR applications will be briefly considered.
Conventional input systems, such as keyboards, mice, trackballs and joysticks, are in widespread use nowadays. They are used in order to control position-markers (cursors), mouse pointers etc., in order, for example, to be able to navigate through a virtual scene or to move virtual objects on the screen. The disadvantage of these input systems is that they require a surface to rest on (that is to say, a permanent location) in order to be able to be used efficiently.
With a touch-screen, on the other hand, it is possible to point with the finger directly to objects that are illustrated on the screen without requiring further space-consuming ancillary equipment on the desk. Low-resolution touch-screens have 10 to 50 positions in the horizontal and vertical directions and utilise horizontal and vertical series of infrared light-emitting diodes and photoelectric sensors in order to build up a grid of invisible beams of light immediately in front of the screen. When the screen is touched, both vertical and horizontal beams of light are interrupted. From this information the current finger position can be ascertained.
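The evaluation of the interrupted beams described above can be illustrated by the following minimal sketch in Python. It is not taken from any particular touch-screen: the function name locate_finger, the representation of the sensor states as Boolean lists and the choice of the centre of the interrupted beams as the finger position are assumptions made purely for illustration.

```python
from typing import Optional, Sequence, Tuple


def locate_finger(
    vertical_beam_blocked: Sequence[bool],
    horizontal_beam_blocked: Sequence[bool],
) -> Optional[Tuple[float, float]]:
    """Estimate the finger position on an infrared beam grid (hypothetical helper).

    vertical_beam_blocked[i] is True when the i-th vertical beam (counted
    along the horizontal direction) is interrupted; horizontal_beam_blocked[j]
    is the same for the j-th horizontal beam (counted along the vertical
    direction). Returns grid coordinates, or None when there is no touch.
    """
    columns = [i for i, blocked in enumerate(vertical_beam_blocked) if blocked]
    rows = [j for j, blocked in enumerate(horizontal_beam_blocked) if blocked]
    # A touch interrupts beams in both directions; otherwise nothing is reported.
    if not columns or not rows:
        return None
    # Assumption: a finger may cover several adjacent beams, so the centre
    # of the interrupted beams is taken as the position.
    return (sum(columns) / len(columns), sum(rows) / len(rows))


position = locate_finger([False, True, True, False], [False, False, True])
print(position)  # (1.5, 2.0)
```

With 10 to 50 beams per direction, the resolution of such a grid is limited to the beam spacing, which is consistent with the description of these screens as low-resolution.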
Another known embodiment of touch-sensitive information-input devices is the capacitively coupled touch-panel. The latter provides a resolution of about 100 positions in each direction. If a user touches the conductively coated glass plate of the touch-screen with a finger, the current finger position can be ascertained by reason of the change in impedance. Other high-resolution panels make use of two transparent layers which are minimally spaced from one another. One of these layers is conductively coated; the other is coated with a resistive material. By virtue of the contact pressure of the finger, these two layers touch one another, and by measuring the drop in voltage resulting therefrom the current finger position can then be ascertained. A lower-resolution and cheaper variant of this technology uses a grid of fine wires instead of these layers.
According to the state of the art, nowadays various solutions to the problem of real-time motion control of virtual objects are available, each of these solutions being optimised for a special application. Certain limitations are therefore associated with each of these solutions. In order to be able to explain the rudiments of some of the most important of these solutions, it is necessary to consider briefly their most important aspects.
One possibility for real-time motion control of virtual or real objects has recently arisen by virtue of the fact that input devices for computers have become known that enable the simultaneous input of control signals having several mutually independent degrees of freedom. The possibilities that are created thereby exceed by far those which, for example, consist in the use of a mouse which can only be controlled in two dimensions (e.g. on the desktop). Although it is also known to provide a mouse with, for example, additional switches, these switches have the disadvantage that they do not enable the input of analogue data but are restricted to binary data (on/off).
Various input devices are also known from the state of the art that are able to generate analogue control signals having various mutually independent degrees of freedom, in which case each of these analogue signals can consequently be used as a parameter value of a controlling device. For example, from patent specification U.S. Pat. No. 5,757,360 an egg-shaped input device for computers is known which can be moved freely in space by a hand of the user, which ascertains its instantaneous positions, directions of motion, velocities and accelerations, and which transmits these kinematic data to the computer in wireless manner. In this case an analogue sequence of motions in the form of a motion pattern is identified, from which motion commands are derived and converted into an animated graphical representation. The motion patterns are recognised automatically with the aid of a pattern-recognition algorithm. In addition, control commands are generated. The disadvantage of this process consists in the fact that it is not freely specifiable, since sequences of motions of the user, which are recorded by the input device in analogue manner, are assigned to corresponding sequences of motions of stored motion sequences of an animated graphical representation and can only be represented as such.
Input devices that are provided with force/moment sensors to be operated manually are known, for example, from patent specifications DE 36 11 336 C2, DE 37 64 287 and also EP 0 979 990 A2. From the last-named patent specification it is known, moreover, to use a force/moment sensor of such a type for controlling a real or virtual mixing console or control console, for example in order to create and to fashion novel colour, light and/or sound compositions. In this case, once again the intuitive spatial control in three translatory and also three rotatory degrees of freedom can be transmitted in advantageous manner to a stage for continuously variable spatial mixing or control of a large number of optical and/or acoustic parameters.
Manually controllable input systems that permit navigation in three dimensions are nowadays employed successfully in a number of extremely diverse technical fields of application. One such field of application is constituted by, for example, control devices for controlling the functions of electronic musical instruments (above all, in the case of synthesisers and master keyboards) that are provided with a so-called MIDI interface (Musical Instrument Digital Interface).
Such a three-dimensional control device for electronic musical instruments is realised by, for example, the D-Beam controller which is integrated into the EM-30 and EM-50 keyboards manufactured by Roland. Through the use of an extremely sensitive beam of infrared light, in this case the movements of the hand and/or body of a user above a surface coated with infrared sensor elements can be detected in contact-free manner. By virtue of the D-Beam technology, individual acoustic parameters of recorded improvisations or compositions can be modified or controlled in real time by these movements of the hand and/or body of the user. To this end, the D-Beam controller converts the movements of the user into MIDI signals and analogue control signals.
Since one of the preferred exemplary embodiments of the present invention is likewise based on a control device for controlling stored parametrised audio data with the aid of parametrised control signals which, with the aid of controllable virtual objects, are transmitted to at least one electronic musical instrument via a MIDI interface, in the following the aspects of the MIDI standard that are important for an appreciation of the invention will be briefly presented.
The MIDI interface is a format for digital data transmission between electronic musical instruments, computers and peripheral devices. The MIDI standard is in widespread use nowadays and has been used by many musicians and composers since its introduction in 1983. MIDI offers a very efficient method for representing audio data, and this makes MIDI a very attractive data-transmission protocol not only for composers or artists but also for a large number of computer applications that are capable of generating sounds or sound patterns, such as multimedia applications or computer games. Thanks to the publication of the General MIDI System Specification, nowadays the commonest PC/MIDI interfaces enjoy widespread acceptance amongst users. Furthermore, MIDI is supported by the Microsoft Windows operating system and by other operating systems. By reason of the development and marketing of inexpensive synthesisers, the MIDI standard is enjoying growing popularity, with a steadily increasing number of applications.
MIDI was originally developed in order to be able to couple two or more keyboards of different manufacturers to one another. However, at that time no-one foresaw that, with the aid of the MIDI data format, complete musical productions would be created by sequencer systems. Nowadays MIDI finds application, above all, as a transmission medium, in order to replace or to supplement digitised audio data in computer games or multimedia applications.
MIDI was standardised by the MIDI Manufacturers Association (MMA), to which all manufacturers of digital musical instruments throughout the world belong. This committee defines the standard that is binding on all members; inter alia, it also defines the command structure of the MIDI protocol which is laid down in the MIDI standard. Without this standard, incompatibilities would have arisen between the devices of different manufacturers.
In contrast with the transmission of analogue audio data, when sound patterns are transmitted from one or more keyboards to a computer (or in the opposite direction) via a MIDI interface merely bit sequences (so-called MIDI events) are transmitted which comprise, in electronically readable form, the significant acoustic parameters of the pieces of music that have been played on the individual keyboards or that are to be reproduced on them. These programming commands comprise MIDI sequences which, for example, instruct the synthesiser which soundtracks are to be recorded, which solo instruments or accompanying instruments are to be used for an arrangement, and which musical parameters are being transmitted. In detail, the expression “acoustic parameters” is to be understood to mean, for example, pitches, note-values or rest-values, loudness-levels, tempi, articulation instructions, timbres, pedal effects, vibrato effects, chorus effects, echo effects, overtone effects and/or other special effects. In the following these acoustic parameters will be designated as “MIDI playing information”. When the audio data are reproduced on a keyboard it is accordingly a question not of analogue recordings of pieces of music previously recorded on a keyboard but of the exact reproduction of the recording event itself. The polyphonic voices of a reproduction synthesiser are at least partially assigned.
In comparison with the use of sampled audio data, which are stored on a diskette or on a CD-ROM, the generation of sounds or sound patterns with the aid of MIDI synthesisers has many advantages. One of these advantages concerns the memory space that is required for storage of the parametrised audio data. Files in which digitally sampled audio data are normally stored in a PCM format (such as, for example, “.WAV” files) are, as a rule, rather large. This is true, in particular, of long pieces of music that have been recorded in stereo quality with a high sampling-frequency. In contrast, MIDI files are extremely small. For example, files in which high-quality sampled audio data are stored in stereo quality contain about 10 Mbytes per minute of played-back music, whereas a typical MIDI sequence has a size of less than 10 kbytes per minute of played-back music. This is the case because MIDI files, as already mentioned, do not contain the sampled audio data but merely contain the programming commands that are required by a synthesiser in order to generate the desired sound.
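The orders of magnitude quoted above can be checked with a short calculation. The sketch below assumes uncompressed PCM audio with a sampling-frequency of 44.1 kHz, 16 bits per sample and two channels; these parameters are assumptions, since the text speaks only of high-quality stereo recordings. The figure of 10 kbytes per minute for MIDI is the upper bound given above.

```python
def pcm_bytes_per_minute(
    sample_rate_hz: int = 44_100,  # assumed sampling-frequency
    bytes_per_sample: int = 2,     # assumed 16-bit samples
    channels: int = 2,             # stereo
) -> int:
    """Size in bytes of one minute of uncompressed PCM audio."""
    return sample_rate_hz * bytes_per_sample * channels * 60


MIDI_BYTES_PER_MINUTE = 10_000  # upper bound quoted in the text (10 kbytes)

pcm = pcm_bytes_per_minute()
print(pcm)                                 # 10584000 bytes, about 10 Mbytes
print(round(pcm / MIDI_BYTES_PER_MINUTE))  # 1058
```

Under these assumptions the sampled recording is roughly a thousand times larger than the corresponding MIDI sequence.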
Since the MIDI playing information transports no direct information about the type of the audio data represented, individual acoustic parameters of this MIDI playing information are arbitrarily interchangeable retrospectively. This likewise affords great advantages:
    - A composer can retrospectively orchestrate or rearrange his work in variable manner.
    - The errors that have possibly arisen in the course of recording (e.g. “wrong notes”) can be corrected retrospectively.
    - Several synthesisers can, for example, reproduce one and the same voice (unisono), in order to achieve more richness of sound and many other effects.
The additional possibility of editing recorded pieces of music on the screen, using one or more synthesisers or other electronic musical instruments—that is to say, of being able to change, supplement, delete, move or transpose individual notes or rests, groups of notes or entire staves—simplifies the work of a composer considerably.
A complete MIDI word consists, as a rule, of three bytes. The so-called status byte is transmitted first; it indicates the type of message. The status byte is followed by two data bytes, which contain data about the respective content of the message. The following example relates to the MIDI representation for the “switching on” of a note of medium pitch (middle C, c′) which is to sound with medium loudness (Ital.: mezzoforte, mf):
                      Type of message   Pitch information   Loudness information
                      (status byte)     (1st data byte)     (2nd data byte)
MIDI word (binary)    10010000          00111100            01000110
MIDI word (decimal)   144               60                  70
Musical information   “Note On”         Middle “C” (c′)     “Mezzoforte” (mf)
The first bit (most significant bit, MSB) in the binary representation of the status byte is always assigned the value “1”; in the case of data bytes, the MSB always has the value “0”. In this way it is possible for status bytes and data bytes to be distinguished unambiguously. For the decimal value $Z_S$ of a status byte, as a consequence of the “1” in the MSB, the following holds: $Z_S \in [128_{10}; 255_{10}]$. Since in the case of the data bytes the MSB is set to “0” (that is to say, it can no longer be drawn upon as a value indicator), only seven bits remain in each instance for the data bytes, so that for the decimal values $Z_{D1}$ and $Z_{D2}$ of the two data bytes the following consequently holds: $Z_{D1} \in [0_{10}; 127_{10}]$ and $Z_{D2} \in [0_{10}; 127_{10}]$. These 128 different “Note On” or “Note Off” combinations are fully sufficient, since 128 different pitches, arranged in an equally tempered, chromatic scale (i.e. spaced at semitone intervals), far exceed the tonal range (compass) of a modern concert grand piano with 88 keys.
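The bit layout described above can be made concrete with a minimal sketch. The helper names is_status_byte and note_on_word are hypothetical and serve only as illustration; the status value 10010000 (binary) is the “Note On” status byte from the example above.

```python
NOTE_ON_STATUS = 0b10010000  # 144 decimal, status byte from the example


def is_status_byte(byte: int) -> bool:
    """A byte is a status byte exactly when its most significant bit is 1."""
    return bool(byte & 0b10000000)


def note_on_word(pitch: int, loudness: int) -> bytes:
    """Assemble a three-byte "Note On" word from two 7-bit data values."""
    for value in (pitch, loudness):
        if not 0 <= value <= 127:
            raise ValueError("data bytes are limited to 0..127 (MSB must be 0)")
    return bytes([NOTE_ON_STATUS, pitch, loudness])


word = note_on_word(60, 70)  # middle C, mezzoforte
print([format(b, "08b") for b in word])
# ['10010000', '00111100', '01000110']
print([is_status_byte(b) for b in word])
# [True, False, False]
```

Because a data byte may never have its MSB set, a value such as 128 is rejected rather than silently transmitted, since the receiver would otherwise interpret it as a new status byte.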
In order to be able to address individual devices within a MIDI system selectively, there exist a total of 16 MIDI channels. The MIDI channel of the transmitter has to be identical with that of the respective receiver. In principle, it holds true in this connection that one MIDI data line transports all the playing information on all 16 MIDI channels, with the connected tone generators selecting the messages that are intended for them in the given case. With the aid of the last four bits of the status byte the address A_k of a selected MIDI channel k (where k ∈ {0₁₀, ..., 15₁₀} or A_k ∈ {0000₂, ..., 1111₂}) is transmitted. This means that only the first four bits of the status byte comprise the status information of a MIDI sequence (e.g. “Note On”, “Note Off” etc.).
                                          1st quartet of the      2nd quartet of the
                                          status byte             status byte
                                          (status information)    (channel address)
Type of message (status byte, binary)     1001₂                   0000₂
Type of message (status byte, decimal)    9₁₀                     0₁₀
For the example of the “Note On” command described above, in concrete terms this means that the first channel (k = 0) was selected with the address A_0 = 0000₂.
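The division of the status byte into two groups of four bits can be sketched as follows. The helper names `split_status_byte` and `join_status_byte` are hypothetical and serve only to illustrate the bit arithmetic.

```python
from typing import Tuple


def split_status_byte(status: int) -> Tuple[int, int]:
    """Return (status information, channel address) of a status byte."""
    if not 128 <= status <= 255:
        raise ValueError("a status byte lies in 128..255")
    status_information = status >> 4    # first four bits
    channel_address = status & 0x0F     # last four bits
    return status_information, channel_address


def join_status_byte(status_information: int, channel_address: int) -> int:
    """Inverse operation: assemble a status byte from its two halves."""
    if not 8 <= status_information <= 15:
        raise ValueError("status information must have its top bit set")
    if not 0 <= channel_address <= 15:
        raise ValueError("channel address must lie in 0..15")
    return (status_information << 4) | channel_address


# The "Note On" status byte 144 from the example above.
print(split_status_byte(144))
```

For the status byte 144 the sketch yields the status information 9 and the channel address 0, in agreement with the table above.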
If a byte of such a MIDI word is received by a synthesiser, a check is first made on the basis of the MSB as to whether a status byte or a data byte is present. The receiver must furthermore check the channel address of every status byte in order to determine whether, according to its own MIDI-channel setting, the message is one that it has to receive. If a MIDI event with the address intended for it is discovered, the receiver decodes the data bytes following the status byte and generates the corresponding tones. This principle may be compared with the reading of a choir singer or an instrumentalist who, when singing or playing a piece of music, picks out only the voice intended for him from a polyphonic composition, an arrangement or a score.
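The selection procedure carried out by the receiver may be sketched as follows. This is a simplified illustration and not a complete MIDI parser: it assumes that every word consists of one status byte followed by exactly two data bytes, and the function name `receive` is hypothetical.

```python
def receive(stream, my_channel):
    """Return (status information, 1st data byte, 2nd data byte) for every
    word in the byte stream that is addressed to my_channel."""
    accepted = []
    listening = False
    status_information = None
    data = []
    for byte in stream:
        if byte & 0x80:
            # MSB = 1: status byte. Compare its channel address with our own.
            status_information = byte >> 4
            listening = (byte & 0x0F) == my_channel
            data = []
        elif listening:
            # MSB = 0: data byte of a word intended for this receiver.
            data.append(byte)
            if len(data) == 2:
                accepted.append((status_information, data[0], data[1]))
                listening = False  # assumption: wait for the next status byte
    return accepted


# Three "Note On" words: two on the first channel, one on the second.
stream = bytes([0x90, 60, 70, 0x91, 64, 50, 0x90, 67, 80])
print(receive(stream, 0))
print(receive(stream, 1))
```

Data bytes that follow a status byte with a foreign channel address are simply skipped, which corresponds to the singer who ignores the voices not intended for him.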
In the following, the functions of two of the commonest control devices encountered in synthesisers and master keyboards will be briefly described: the “pitch bend wheel” for continuously variable detuning of the tones of struck keys of a synthesiser, and the “modulation wheel” for modulating the timbre of the tones of struck keys of a synthesiser.
The MIDI data format provides, by way of smallest possible interval between different pitches, equally tempered semitone steps or “chromas” (that is to say, enharmonically equivalent intervals, such as augmented unisons, minor seconds or doubly diminished thirds). In order to obtain a continuously variable frequency variation (“pitch bending”), so-called pitch-bend-wheel data are required which can be generated with a corresponding control instrument of the synthesiser (the pitch bend wheel). In this connection it is generally a question of a wheel or (more rarely) a joystick which can be moved in four directions. In the case of synthesisers that are equipped with these pitch bend wheels, it is possible for the pitches of the keys that are pressed down on the keyboard of the synthesiser to be detuned by a rotary motion of the pitch bend wheel in the direction of higher or lower frequencies. Detunings of up to an equally tempered whole tone in the direction of higher or lower frequencies can normally be generated by this means. The pitch bend wheel is, as a rule, equipped with a restoring mechanism which springs back again to the middle or normal position when the wheel is released. This position corresponds to the pitches of the keys pressed down when equally tempered tuning is taken as a basis.
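The effect of the pitch bend wheel on the frequency of a sounding tone can be illustrated numerically. The sketch below rests on several assumptions that are not taken from the description: the wheel position is normalised to the range from -1.0 to +1.0 with 0.0 as the sprung middle position, note number 69 is tuned to 440 Hz, and the function names are hypothetical. The maximum detuning of one whole tone (two semitones) follows the description.

```python
def note_frequency(note_number: int) -> float:
    """Equally tempered frequency in Hz, assuming note 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note_number - 69) / 12)


def detuned_frequency(note_number: int, wheel_position: float,
                      bend_range_semitones: float = 2.0) -> float:
    """Frequency of a held key while the wheel is deflected.

    wheel_position: -1.0 (fully down) .. 0.0 (middle) .. +1.0 (fully up).
    """
    if not -1.0 <= wheel_position <= 1.0:
        raise ValueError("wheel position must lie in -1.0..1.0")
    detuning = wheel_position * bend_range_semitones  # in semitones
    return note_frequency(note_number) * 2.0 ** (detuning / 12)


# Middle C with the wheel released, half raised and fully raised.
for position in (0.0, 0.5, 1.0):
    print(position, round(detuned_frequency(60, position), 2))
```

With the wheel in the middle position the frequency equals that of the equally tempered key; at full deflection it reaches the frequency of the key a whole tone higher or lower.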
With the aid of a modulation function, which ordinarily can be performed with another control instrument of the synthesiser (the modulation wheel), the tones of the keys struck on the keyboard of the synthesiser may optionally be provided with a vibrato effect. Alternatively, this control instrument may also be employed for the purpose of achieving other effects, for instance in order to modify the brilliance of the sound or the resonance of the tones played by virtue of an alteration to their overtone spectrum. If the modulation wheel is rotated as far as a first stop-point, the depth of effect is minimal; if it is rotated as far as an opposite, second stop-point, the depth of effect is maximal.
In order to be able to explain the control of individual MIDI functions with the aid of parametrised control signals, in the following the most important of the MIDI controllers that are necessary for this will be briefly discussed.
With the aid of the first data byte of a MIDI word, a maximum of 128 different controller addresses and hence up to 128 different playing aids or other MIDI functions can be addressed. The second data byte is responsible for the control range. For example, with the aid of controller No. 1, which is responsible for frequency modulations, a vibrato effect or tremolo effect can be added to the sounds generated by the synthesiser. In the following table the commonest controllers are listed with their numbers (addresses) and designations:
Controller Number (decimal)    Controller Designation
1                              Modulation
2                              Breath Controller
4                              Foot Controller
5                              Portamento Time
6                              Data-Entry Slider
7                              Volume
8                              Balance
10                             Panorama
11                             Expression Pedal
12 to 31                       User-Defined
91                             External-Effects Depth
92                             Tremolo Depth
93                             Chorus Depth
94                             Detune Depth
95                             Phaser Depth
Addresses 12 to 31 are not assigned and offer the user opportunities for free assignments of MIDI functions. Depending on the nature of the respective synthesiser, occasionally very extraordinary physical parameters can be allocated to these addresses, for example oscillator frequency or pulse width of the generated oscillations.
Controllers 33 to 38 serve to resolve the range of values of controller addresses 1 to 6 more finely. The same is brought about by the controllers numbered 39 to 63 for addresses 7 to 31.
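How a second controller can refine the range of values of a first one may be sketched as follows. The sketch assumes the usual MIDI convention, which is not spelled out in the description, that controller n + 32 supplies the seven less significant bits for controller n, so that the two data bytes together form a 14-bit value. The function names are hypothetical.

```python
def fine_controller_for(coarse_controller: int) -> int:
    """Number of the controller that refines the given coarse controller."""
    if not 0 <= coarse_controller <= 31:
        raise ValueError("only controllers 0..31 have a fine counterpart")
    return coarse_controller + 32


def combine(coarse_value: int, fine_value: int) -> int:
    """Merge two seven-bit data bytes into one 14-bit value (0..16383)."""
    for value in (coarse_value, fine_value):
        if not 0 <= value <= 127:
            raise ValueError("data bytes must lie in 0..127")
    return (coarse_value << 7) | fine_value


# Modulation (controller 1) and its fine counterpart.
print(fine_controller_for(1))
print(combine(64, 0), combine(127, 127))
```

Instead of 128 steps, the combined value distinguishes 16384 steps, which is what is meant by a finer resolution of the control range.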
The playing aids described hitherto are distinguished by a common characteristic: they can be controlled in continuously variable manner (in 128 or more individual steps). The generic term for these controllers is “continuous controllers”. In contrast, other controllers exist that undertake switch functions and are therefore generally called “switch controllers”. The following table gives an overview of the most important of these controllers:
Controller Number (decimal)    Controller Designation
64                             Sustain Pedal
65                             Portamento Switch
66                             Sostenuto
67                             Soft Pedal
69                             Hold-Pedal
70 to 90                       undefined
96                             Data Entry (−/No)
97                             Data Entry (+/Yes)
On closer inspection of the second data byte of Controller 64, only two words are to be found, namely
    00000000₂ (= 0₁₀) for “Pedal Off” and
    01111111₂ (= 127₁₀) for “Pedal On”.
However, in principle the MIDI data format permits a differentiated interpretation of the range of values 1₁₀ to 126₁₀. For example, in this way a multi-level sustain effect can also be provided. In the case of synthesisers that permit this multi-level sustain effect, the decay phase of sound events with “half” depressed pedal is correspondingly shortened in comparison with the decay phase of sound events of the same pitch and loudness in the case of a pedal that has been depressed as far as the stop.
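The two interpretations of the second data byte of Controller 64 can be contrasted in a short sketch. Several details are assumptions made purely for illustration: the threshold of 64 for the switch interpretation, the linear relation between pedal value and decay time, the default times in seconds and all function names.

```python
def check_data_byte(value: int) -> None:
    if not 0 <= value <= 127:
        raise ValueError("data byte must lie in 0..127")


def pedal_is_on(value: int) -> bool:
    """Switch interpretation: 0 is "Pedal Off", 127 is "Pedal On"."""
    check_data_byte(value)
    return value >= 64  # assumed threshold for intermediate values


def sustain_depth(value: int) -> float:
    """Multi-level interpretation: 0.0 (released) .. 1.0 (depressed to the stop)."""
    check_data_byte(value)
    return value / 127


def decay_time(value: int, released_s: float = 0.5, full_s: float = 4.0) -> float:
    """Hypothetical linear model of the decay time in seconds."""
    return released_s + (full_s - released_s) * sustain_depth(value)


for value in (0, 64, 127):
    print(value, pedal_is_on(value), round(decay_time(value), 2))
```

Under this model a half depressed pedal yields a decay phase that lies between the decay with the pedal released and the decay with the pedal depressed as far as the stop.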
Normally the controller addresses, for example for the modulation wheel, the hold-pedal or swell boxes that can be operated with the foot for the purpose of influencing the dynamics, are already defined.
If one of these playing aids is being used, the controller address that is transmitted is fixed. Modern synthesisers and master keyboards, however, are also provided furthermore with freely definable controllers, i.e. an arbitrary controller-number can be allocated to the playing aids (pedals, swell boxes, wheels and sliding controls) provided for said controllers. Some tone generators permit, in turn, the redefinition of the controller functions in the device itself. In this connection the received controller-number can be freely allocated to an internal function.
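The free allocation of a received controller number to an internal function of a tone generator can be pictured as a lookup table. The sketch below is only one conceivable arrangement; the table contents, the function names `remap` and `dispatch` and the internal function labels are hypothetical.

```python
# Predefined assignment of controller numbers to internal functions.
DEFAULT_ASSIGNMENT = {
    1: "modulation",
    7: "volume",
    64: "sustain_pedal",
}


def remap(assignment, controller_number, internal_function):
    """Return a new table in which controller_number drives internal_function."""
    if not 0 <= controller_number <= 127:
        raise ValueError("controller number must lie in 0..127")
    updated = dict(assignment)
    updated[controller_number] = internal_function
    return updated


def dispatch(assignment, controller_number, value):
    """Look up the internal function for a received controller message."""
    internal_function = assignment.get(controller_number)
    if internal_function is None:
        return None  # controller not assigned in this device
    return internal_function, value


# Allocate the user-defined controller 12 to the oscillator frequency.
custom = remap(DEFAULT_ASSIGNMENT, 12, "oscillator_frequency")
print(dispatch(custom, 12, 100))
print(dispatch(DEFAULT_ASSIGNMENT, 12, 100))
```

Because `remap` returns a copy, the predefined assignment remains available unchanged alongside any user-defined variant.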
For the purpose of generating sounds with the aid of electronic musical instruments, computers and sound cards, nowadays a large number of different technologies and processes are in use. Two widespread techniques for generating sounds with the aid of electronic musical instruments are frequency modulation (FM synthesis) and the use of wave tables (WAV synthesis).
FM synthesis is a process which is utilised by sound cards and computers in order to imitate the timbres of acoustic musical instruments by electronic means. Sounds generated with the aid of FM synthesis are easily recognisable as such, in contrast with sound patterns that have been generated by WAV synthesis with the aid of wave tables. Master keyboards and synthesisers that are provided with wave tables are relatively expensive but are preferred by many professional and amateur musicians on account of their high play-back quality. In this case the sound patterns are generated by wave tables which combine stored, digitally sampled original sounds (samples) of acoustic musical instruments with one another and/or reproduce them. Compared with FM synthesis, in which the sounds are generated purely electronically with the aid of the computer, wave-table sound patterns consequently appear substantially more realistic.
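The difference between the two techniques can be made concrete with a minimal sketch. It assumes the simplest textbook form of each: FM synthesis with a single sine carrier whose phase is modulated by a single sine modulator, and wave-table play-back that reads a stored sample table cyclically with a selectable step width. Neither function is taken from the description, and all names and parameter values are hypothetical.

```python
import math


def fm_samples(carrier_hz, modulator_hz, modulation_index, sample_rate, count):
    """Compute samples of sin(2*pi*fc*t + I*sin(2*pi*fm*t))."""
    samples = []
    for n in range(count):
        t = n / sample_rate
        phase = 2.0 * math.pi * carrier_hz * t
        modulation = modulation_index * math.sin(2.0 * math.pi * modulator_hz * t)
        samples.append(math.sin(phase + modulation))
    return samples


def play_wavetable(table, step, count):
    """Read a stored table cyclically; a larger step gives a higher pitch."""
    samples = []
    position = 0.0
    for _ in range(count):
        samples.append(table[int(position) % len(table)])
        position += step
    return samples


# A computed FM tone and a (very short) stored table played back at two pitches.
fm_tone = fm_samples(440.0, 110.0, 2.0, 8000, 8)
stored = [0.0, 1.0, 0.0, -1.0]
print([round(s, 3) for s in fm_tone])
print(play_wavetable(stored, 1.0, 6))
print(play_wavetable(stored, 2.0, 4))
```

The FM tone is calculated entirely from a formula, whereas the wave-table output consists solely of values that were stored beforehand, which is why recordings of acoustic instruments can be reproduced with it.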
With the aid of sound cards, the possibilities of conventional computers for generating audio signals and sound patterns can be extended. Sound cards are indispensable for any application that makes use of sound effects. In order to be able to translate analogue audio data into the digital computer language, sound cards are provided with appropriate devices for digitising analogue sounds. In this connection the generation of sounds by a sound card is based either on FM synthesis or on WAV synthesis.