1. Field of the Invention
The present invention relates in general to the field of digital audio systems and, in particular, to systems which include MIDI synthesizers. Still more particularly the present invention relates to a method and apparatus for outputting digital audio and MIDI synthesized music with efficient memory utilization.
2. Description of the Related Art
MIDI, the "Musical Instrument Digital Interface," was established as a hardware and software specification which would make it possible to exchange information including musical notes, program changes, expression control, etc. between different musical instruments or other devices such as sequencers, computers, lighting controllers, mixers, etc. This ability to transmit and receive data was originally conceived for live performances, although subsequent developments have had enormous impact in recording studios, audio and video production, and composition environments.
A standard for the MIDI interface has been prepared and published as a joint effort between the MIDI Manufacturers Association (MMA) and the Japan MIDI Standards Committee (JMSC). This standard is subject to change by agreement between JMSC and MMA and is currently published as the MIDI 1.0 Detailed Specification, Document Version 4.1, January 1989.
The hardware portion of the MIDI interface operates at 31.25 KBaud, asynchronous, with a start bit, eight data bits and a stop bit. This makes a total of ten bits for a period of 320 microseconds per serial byte. The start bit is a logical zero and the stop bit is a logical one. Bytes are transmitted by sending the least significant bit first. Data bits are transmitted in the MIDI interface by utilizing a five milliamp current loop. A logical zero is represented by the current being turned on and a logical one is represented by the current being turned off. Rise times and fall times for this current loop are less than two microseconds. A five pin DIN connector is utilized to provide a connection for this current loop with only two pins being utilized to transmit the current loop signal. Typically, an opto-isolator is utilized to provide isolation between devices which are coupled together utilizing a MIDI format.
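The framing and timing just described may be illustrated by the following minimal sketch. The names byte_period_us, frame_bits and current_on are hypothetical and are introduced here for illustration only; they form no part of the MIDI specification.

```python
BAUD_RATE = 31_250    # bits per second, as stated above
BITS_PER_FRAME = 10   # one start bit + eight data bits + one stop bit


def byte_period_us(baud=BAUD_RATE, bits=BITS_PER_FRAME):
    """Time in microseconds needed to send one serial byte."""
    return bits * 1_000_000 / baud


def frame_bits(value):
    """Logical bits for one byte: start (0), data LSB first, stop (1)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in eight bits")
    data = [(value >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data + [1]


def current_on(bits):
    """Current-loop state per bit: a logical zero turns the current on."""
    return [bit == 0 for bit in bits]


period = byte_period_us()   # 320.0 microseconds
frame = frame_bits(0x90)    # ten logical bits for the byte 0x90
loop = current_on(frame)    # True wherever current flows
```

Because the start bit is a logical zero, the current is on at the beginning of every frame and off during the stop bit.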
Communication utilizing the MIDI interface is achieved through multi-byte "messages" which consist of one status byte followed by one or two data bytes. There are certain exceptions to this rule. MIDI messages are sent over any of sixteen channels which may be utilized for a variety of performance information. There are five major types of MIDI messages: Channel Voice; Channel Mode; System Common; System Real-Time; and, System Exclusive. A MIDI event is transmitted as a message and consists of one or more bytes.
A channel message in the MIDI system utilizes four bits in the status byte to address the message to one of sixteen MIDI channels and four bits to define the message. Channel messages are thereby intended for the receivers in a system whose channel number matches the channel number encoded in the status byte. An instrument may receive a MIDI message on more than one channel. The channel in which it receives its main instructions, such as which program number to be on and what mode to be in, is often referred to as its "Basic Channel." There are two basic types of channel messages, a Voice message and a Mode message. A Voice message is utilized to control an instrument's voices, and Voice messages are typically sent over voice channels. A Mode message is utilized to define the instrument's response to Voice messages. Mode messages are generally sent over the instrument's Basic Channel.
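The division of a channel status byte into its two four-bit fields may be sketched as follows. Under MIDI 1.0 the upper four bits define the message and the lower four bits carry the channel. The function name parse_channel_status is hypothetical, and the numbering of channels from 1 to 16 is a presentation choice assumed for this example.

```python
def parse_channel_status(status):
    """Split a channel-message status byte into (message_type, channel).

    Channel status bytes lie in the range 0x80..0xEF; bytes from 0xF0
    upward are system messages and carry no channel number.
    """
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel message status byte")
    message_type = status >> 4       # upper four bits define the message
    channel = (status & 0x0F) + 1    # lower four bits, shown here as 1..16
    return message_type, channel


def addressed_to(status, receiver_channel):
    """True when the status byte is addressed to the receiver's channel."""
    return parse_channel_status(status)[1] == receiver_channel


message_type, channel = parse_channel_status(0x92)  # "note on", channel 3
```

A receiver assigned to channel 3 would act on this message, while receivers on other channels would disregard it.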
System messages within the MIDI system may include Common messages, Real-Time messages, and Exclusive messages. Common messages are intended for all receivers in a system regardless of the channel that receiver is associated with. Real-Time messages are utilized for synchronization and are intended for all clock based units in a system. Real-Time messages contain status bytes only, and do not include data bytes. Real-Time messages may be sent at any time, even between bytes of a message which has a different status. Exclusive messages may contain any number of data bytes and can be terminated either by an end of exclusive or any other status byte, with the exception of Real-Time messages. An end of exclusive should be sent at the end of a system exclusive message. System exclusive messages always include a manufacturer's identification code. If a receiver does not recognize the identification code it will ignore the following data.
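The handling of Exclusive and Real-Time messages described above may be sketched as follows. The byte values 0xF0 (start of exclusive), 0xF7 (end of exclusive) and 0xF8 through 0xFF (Real-Time) are those of MIDI 1.0. The function read_sysex and the sample byte stream are hypothetical, and the identification byte 0x7D is merely an illustrative value.

```python
SYSEX_START = 0xF0
END_OF_EXCLUSIVE = 0xF7


def is_realtime(byte):
    """Real-Time messages are single status bytes in 0xF8..0xFF."""
    return 0xF8 <= byte <= 0xFF


def read_sysex(stream):
    """Collect Exclusive payloads and Real-Time bytes from a byte stream.

    Returns (payloads, realtime). The first byte of each payload is the
    manufacturer's identification code.
    """
    payloads, realtime = [], []
    current = None  # None while no Exclusive message is open
    for byte in stream:
        if is_realtime(byte):
            realtime.append(byte)  # may arrive anywhere; never terminates
            continue
        if byte & 0x80:            # any other status byte
            if current is not None:
                payloads.append(bytes(current))  # closes the open message
            current = [] if byte == SYSEX_START else None
        elif current is not None:
            current.append(byte)   # data byte inside an Exclusive message
    return payloads, realtime


payloads, realtime = read_sysex([0xF0, 0x7D, 0x01, 0xF8, 0x02, 0xF7])
```

In this example the clock byte 0xF8 arrives between two data bytes of the Exclusive message and is set aside without ending that message.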
As those skilled in the art will appreciate upon reference to the foregoing, musical compositions may be encoded utilizing the MIDI standard and stored and/or transmitted utilizing substantially less data. The MIDI standard permits the use of a serial listing of program status messages and channel messages, such as "note on" and "note off" as control messages.
When utilized in conjunction with various MIDI-controlled sound-generating devices or modules, musical compositions may be recorded and played.
As will hereinafter be detailed, these sound generators or "modules" have taken many forms. In one form, referred to as "wavetable" or subtractive synthesis, stored wave forms (shorter than an entire sampled sound discussed below) are operated upon by filters, voltage controlled amplifiers, and the like to generate or "synthesize" sound. One benefit of this approach in addition to creating new and unusual sound forms not present in nature was that relatively little memory was required, which, in low-end computer systems, can be an extremely precious commodity.
Yet another form of sound generation took the form of sampling, digitizing, and storing an analog acoustic signal, and then subsequently converting it back to analog form during playback. A distinct advantage to this approach was that it frequently could emulate complex acoustic wave forms in a far more realistic and convincing manner than other techniques known in the art. However there was a price to be paid for such realism. The data rate required for such simple sampling systems can be quite enormous with several tens of thousands of bits of data and associated memory being required for each second of audio signal.
As a consequence, many different encoding systems have been developed to decrease the amount of data required in such systems. For example, many modern digital audio systems utilize pulse code modulation (PCM) which employs a variation of a digital signal to represent analog information. Such systems may utilize pulse amplitude modulation (PAM), pulse duration modulation (PDM) or pulse position modulation (PPM) to represent variations in an analog signal.
One variation of pulse code modulation, Delta Pulse Code Modulation (DPCM), achieves still further data compression by encoding only the difference between one sample and the next sample. Thus, despite the fact that an analog signal may have a substantial dynamic range, if the sampling rate is sufficiently high so that adjacent samples do not differ greatly, encoding only the difference between two adjacent samples can save substantial data. Further, adaptive or predictive techniques are often utilized to further decrease the amount of data necessary to represent an analog signal by attempting to predict the value of a signal based upon a weighted sum of previous samples or by some similar algorithm.
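The difference encoding described above may be illustrated by the following minimal sketch. For simplicity it assumes integer samples and stores each difference exactly, without the quantization or prediction that a practical coder would add. The function names and sample values are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the preceding sample."""
    previous = 0  # assumed starting reference, so the first delta is the full value
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def dpcm_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for delta in deltas:
        previous += delta
        samples.append(previous)
    return samples


samples = [1000, 1004, 1009, 1007, 1002]
deltas = dpcm_encode(samples)    # [1000, 4, 5, -2, -5]
restored = dpcm_decode(deltas)   # equal to the original samples
```

After the first value, each difference in this example fits in a few bits although the samples themselves require at least ten, which is the source of the saving.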
In each of these digital audio techniques speech or an audio signal may be sampled and digitized utilizing straightforward processing and digital-to-analog or analog-to-digital conversion techniques to store or recreate the signal.
While the aforementioned digital audio systems may be utilized to accurately store speech or other audio signal samples, even with data compression a substantial penalty in storage requirements must be paid as compared with those required in the MIDI-controlled synthesized systems described above. However, in systems where it is desired to recreate realistic human speech or other acoustic sounds, there often exists no appropriate alternative.
Several hybrid approaches were attempted in the prior art seeking to obtain the benefits of synthesized sound such as wave table synthesis and sampled sound hereinbefore discussed. In one such attempt, a parallel implementation of both wavetable synthesis and sampled sounds was provided in hardware, a representative example being the SY77 Synthesizer manufactured by the Yamaha Corporation. Such a synthesizer provided for switching between wavetable or sample-generated sounds and in some limited instances cross-connection between features of each (such as using the VFO of the wavetable synthesizer with a playback of a sampled sound). While thus providing the benefits of both sampled and wavetable synthesis, the obvious limitation of this parallel implementation was the requirement of dual parallel implementations having attendant cost increases.
In still another attempt to provide a hybrid approach offering benefits of wavetable and sampled synthesis, referred to in the art as "LA" synthesis and as implemented representatively by various synthesizers manufactured by the Roland Corporation, the generated waveform was a combination of a sampled and wavetable-generated waveform. It has been found psychoacoustically that much of the character of a sound is identified in the human ear by the information carried in the attack portion of a waveform. Accordingly, in accordance with this technique, a first attack portion of a waveform was generated by means of playback of an actual sampled attack of the desired instrument, thereby lending the necessary realism to the implementation of the sound. This was of course at the cost of memory in that, as previously discussed, such sampled waveforms, for any reasonable resolution and signal-to-noise ratio, require relatively more memory than a corresponding sound generation technique utilizing synthesis such as wavetable synthesis. Nevertheless, because only the attack portion of the sound was generated by an actual sampled sound, memory was saved which would otherwise have to be used if the entire waveform were a sample playback. The remaining portion of the desired waveform was thence generated by means of the second technique, namely wavetable synthesis, which provided more or less the sustained or steady state portion of the desired waveform. Inasmuch as this portion was generated by wavetable synthesis with less severe memory requirements than would otherwise be necessary if this portion of the waveform were generated from a stored sample, a savings in memory was thereby realized. Although there were distinct benefits to this hybrid approach, such as the ability to generate new sounds which were combinations of sampled and wavetable-generated artificial sounds, there were nevertheless serious drawbacks to this approach as well.
First, provision was not made for selecting one or the other mode of sound generation for generating the entirety of the sound. One reason, of course, was that this would defeat the purpose of such a hybrid approach inasmuch as for the sampling case, for example, it would require storage not only of the attack portion of the sampled waveform but also of the rest of the waveform (the very portion whose memory the whole approach was directed to saving). Yet another serious drawback to this approach was that there was no provision made for uploading, altering, or otherwise upgrading the sounds by way of altering and adding to the existing sample portions and wavetable parameters.
In yet another attempt to avoid the problems of the aforementioned approaches, namely the requirement of dual hardware, the limitations in upgrading to new sounds, and the inability to provide a complete sampled or wavetable sound implementation if desired, development also focused on digital signal processor or DSP sound generation. In such an approach, wherein the DSP could implement the sound generation, attempts have been made to reconfigure the DSP dynamically to generate either sampled or synthesized sound as desired. In such an implementation, particularly wherein an expensive multi-tasking DSP system was not provided, it was found necessary to load DSP code implementing either the wavetable or sample-based sound generation on the fly, and to switch between these various forms of code dynamically, determining from the incoming MIDI datastream which mode the DSP should switch to.
Such a system was found to be extremely difficult to implement, one alternative being to provide multiple copies of DSP code simultaneously available depending upon the mode desired. The problems with the approach of dynamically loading DSP code, depending upon the sound-generation technique desired, were compounded in multi-tasking operating systems since it was difficult if not impossible to know, due to the ongoing task switching, when the appropriate time was and how to coordinate the loading and switching of the DSP code, again resulting in a need to load complete sets of DSP code and permit the multi-tasking system to perform the switching.
Multimedia is an emerging market wherein MIDI capability is a key multimedia element. However, as previously noted, a serious problem for low-end systems which may become prevalent in homes and school environments is in maintaining low cost of the system which characteristically results in relatively small memory systems, giving rise to the aforementioned problems. As the use of MIDI increases as well, it is likely to further increase adoption by low-end users where equipment expenditures in this area are extremely limited. Thus, techniques are highly sought which will provide for multimedia function to operate on smaller, less expensive systems such as techniques for saving memory. Such memory costs in low-end systems may be the critical difference in successfully providing systems in the high volume, low price market. Specifically, a means was needed to provide for MIDI, including sampled sounds on limited hardware while nevertheless providing the highest quality sound possible within these constraints of low price systems.
It was thus apparent that a need existed for a method and apparatus whereby certain digitized audio samples, such as human speech and acoustical musical sounds, could be recreated and combined with synthesized music utilizing a MIDI data file in such a way as to obtain the benefits of both approaches, while at the same time accounting for the severe limitations imposed on memory availability by low-end systems.
More particularly, it was found highly desirable to provide a single hardware configuration implementing multiple modes of sound generation, and in particular, either synthesized (such as wavetable) sounds or sampled sound generation. Still further, it was found desirable to provide for such a system which would not require dynamic reloading of code such as DSP code and which would not require inordinate time to be spent trying to determine which modules of DSP code to execute. Yet a further object was to provide a system providing the benefits of both synthesized and sampled sounds wherein it was nevertheless possible to upgrade the system with improved synthesized and sampled sounds. Still further, it was desired to implement the system wherein a basic set of acceptable sounds was provided (such as the standard 175 general MIDI implementation sounds) implemented with a reasonably cost effective yet pleasing system such as wavetable synthesis, and wherein, if desired, the user might nevertheless upgrade the quality of these sounds to sampled sounds which could be automatically substituted for the corresponding general MIDI wavetable synthesized sounds if available as desired and as the system resources permitted.
These and other benefits are provided by the invention which will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings wherein: