The present invention relates generally to tone synthesis apparatus and methods for synthesizing tones, voices or other desired sounds on the basis of waveform sample data stored in a waveform memory or the like, and programs therefor. More particularly, the present invention relates to an improved tone synthesis apparatus and method for performing tone synthesis in a note connecting (or waveform connecting) portion, provided for continuously connecting adjoining or successive tones or notes with no discontinuity or break therebetween, without involving an auditory tone generating delay.
Heretofore, the so-called AEM (Articulation Element Modeling) technique has been known as a technique for facilitating realistic reproduction and reproduction control of various rendition styles (various types of articulation) peculiar to natural musical instruments. As known in the art, the AEM technique can generate a continuous tone waveform with high quality by time-serially combining a plurality of rendition style modules corresponding to various portions of tones, such as attack rendition style modules each representative of a rise (i.e., attack) portion of a tone, body rendition style modules each representative of a steady portion (or body portion) of a tone, release rendition style modules each representative of a fall (i.e., release) portion of a tone, and joint rendition style modules each representative of a note (or waveform) connecting portion (or joint portion) for continuously connecting successive notes, with no break therebetween, using a desired rendition style such as a legato rendition style. Note that, throughout this specification, the term “tone waveform” is used to mean a waveform of a voice or any desired sound rather than being limited only to a waveform of a musical tone. One of various examples of inventions pertaining to such an AEM technique is disclosed in U.S. Patent Application Publication No. 2002-0143545 corresponding to Japanese Patent Application Laid-open Publication No. 2002-287759.
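By way of illustration only, the time-serial combination of rendition style modules described above may be pictured as an ordered list of module kinds. The following is a minimal sketch and is not taken from the cited publication; the function name `module_sequence`, the string labels and the representation of a phrase as a count of legato-connected notes are all assumptions made for this example.

```python
def module_sequence(num_connected_notes):
    """Return the time-serial order of rendition style module kinds.

    Hypothetical helper: models a phrase of `num_connected_notes` notes that
    are connected to one another with no break (e.g., legato).
    """
    if num_connected_notes < 1:
        raise ValueError("at least one note is required")
    modules = ["attack", "body"]        # rise and steady portion of first note
    for _ in range(num_connected_notes - 1):
        modules += ["joint", "body"]    # connecting portion, then next body
    modules.append("release")           # fall portion of the last note
    return modules
```

Under these assumptions, `module_sequence(1)` yields an attack, a body and a release module, while `module_sequence(2)` places a single joint module between the body modules of the two notes.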
FIG. 8 shows an example of a continuous tone waveform connecting successive notes, with no break therebetween, using a conventionally-known joint rendition style module. As shown in (a) of FIG. 8, when note-on information of a succeeding one of successive notes has been acquired, as performance information (e.g., MIDI information), prior to acquisition of note-off information of a preceding one of the notes, a continuous tone waveform connecting the successive notes, with no break therebetween, using a desired rendition style is provided by representing the preceding note (i.e., tone to be generated first) with an attack rendition style module and a body rendition style module, representing the succeeding note (i.e., tone to be generated following the preceding note) with a body rendition style module and a release rendition style module, and further interconnecting the respective body rendition style modules with a joint rendition style module. Further, in forming a continuous tone waveform by time-serially combining a plurality of rendition style modules, the AEM technique uses crossfade synthesis (or crossfade connection) to interconnect the rendition style modules in a crossfading manner without introducing unnaturalness into the tone. Thus, in this case too, a body rendition style module and joint rendition style module of a preceding note, and another joint rendition style module and body rendition style module of a succeeding note, are connected together in a crossfade fashion (i.e., “crossfade-connected”) using loop waveforms L1, L2, L3 and L4 (indicated by vertically-elongated rectangular blocks) adjoining the respective joint rendition style modules.
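For reference, crossfade connection between two loop waveforms may be sketched as a weighted sum whose weights change gradually over the crossfade period. The sketch below is illustrative only: the function name `crossfade`, the linear weighting curve and the use of plain lists of samples are assumptions, and the actual crossfade characteristics used by the AEM technique are not specified in this section.

```python
def crossfade(loop_a, loop_b, num_samples):
    """Linearly crossfade from loop_a to loop_b over num_samples samples.

    Both loop waveforms are read out repetitively (looped) during the fade.
    Assumption: a linear fade curve; other curves are equally possible.
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    out = []
    for n in range(num_samples):
        w = n / (num_samples - 1)        # weight of loop_b, rising 0.0 -> 1.0
        a = loop_a[n % len(loop_a)]      # looped read-out of fading-out loop
        b = loop_b[n % len(loop_b)]      # looped read-out of fading-in loop
        out.append((1.0 - w) * a + w * b)
    return out
```

For example, `crossfade([1.0], [0.0], 5)` returns `[1.0, 0.75, 0.5, 0.25, 0.0]`, i.e., the first loop fades out while the second fades in.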
To facilitate clear understanding, a non-loop waveform of a first half section of a conventionally-known joint rendition style module, corresponding to a region of the module where the tone pitch of a preceding (or first) tone is mainly heard or auditorily perceived as compared to the tone pitch of a succeeding (or second) tone, will be referred to in this specification as a “preceding-note region” (indicated in the figure by a hatched section PR), and a non-loop waveform of a second half section of the joint rendition style module, corresponding to a region of the module where the tone pitch of the succeeding note is mainly heard or auditorily perceived as compared to the tone pitch of the preceding note (more specifically, a region following a point where the tone pitch of the preceding note shifts to the tone pitch of the succeeding note), will be referred to as a “succeeding-note region”. Also, note that the term “loop waveform” is used to refer to a waveform that is read out in a repetitive (or looped) fashion.
In a case where a tone is synthesized using a joint rendition style module, an auditory tone generating delay may sometimes be caused before a succeeding one of successive notes starts to be heard; such an auditory tone generating delay will hereinafter be referred to also as “latency”. As seen in (a) of FIG. 8, tone synthesis based on a joint rendition style module is started after receipt of note-on information of a succeeding one of the successive notes in question. Further, with the joint rendition style module, a human listener cannot auditorily perceive that sounding of the succeeding note has started (i.e., that there has occurred a shift from the tone pitch of the preceding note to the tone pitch of the succeeding note) before tone synthesis timing shifts from the preceding-note region to the succeeding-note region. Therefore, if the preceding-note region of the joint rendition style module has a relatively long time length (see the hatched section PR in (b) of FIG. 8), it would be a long time before synthesis of a tone of the succeeding-note region is started after start of synthesis of a tone of the preceding-note region, so that a human player etc. may feel a tone generating delay (latency) of the succeeding-note region. In particular, the time length of the preceding-note region of the joint rendition style module depends on the tone pitch of the preceding note; that is, if the preceding note has a high tone pitch, the preceding-note region would have a shorter time length, while, if the preceding note has a low tone pitch, the preceding-note region would have a longer time length. Further, depending on the type of the musical instrument in question, it is necessary to set the preceding-note region to a rather long time length in view of a possible influence of the tone pitch shift or transition (e.g., where the musical instrument is a trombone).
Therefore, a low-pitch tone tends to cause a latency more notably than a high-pitch tone, and, depending on the type of the musical instrument, such a latency tends always to be felt. Thus, it has been conventional to acquire (or pre-read) performance information (e.g., MIDI information) of a succeeding note prior to arrival of predetermined performance timing, so as to synthesize a tone by allotting a joint rendition style module to an appropriate time position in consideration of a time length of a preceding-note region based on the pre-read performance information of the succeeding note (so-called “playback performance”). In such a playback performance, a latency resulting from the use of a joint rendition style module seldom becomes a problem because time adjustments are made as noted above.
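The time adjustment made in a playback performance may be pictured with the following minimal sketch. It is illustrative only and rests on two loudly stated assumptions that do not appear in the description above: (1) the preceding-note region is modeled as spanning a fixed number of waveform cycles (`cycles`, hypothetical default 20), so that its time length is inversely proportional to the tone pitch of the preceding note; and (2) the joint rendition style module is allotted so that the end of the preceding-note region coincides with the note-on timing of the succeeding note. The function names are likewise hypothetical.

```python
def preceding_region_ms(pitch_hz, cycles=20):
    """Time length of the preceding-note region in milliseconds.

    Assumed model: the region spans a fixed number of waveform cycles, so a
    lower tone pitch gives a longer region.
    """
    if pitch_hz <= 0:
        raise ValueError("pitch_hz must be positive")
    return 1000.0 * cycles / pitch_hz


def joint_start_ms(next_note_on_ms, preceding_pitch_hz, cycles=20):
    """Start time of the joint module when the succeeding note is pre-read.

    The module is started early by the length of the preceding-note region,
    so that the pitch shift is heard at the succeeding note's note-on timing.
    """
    return next_note_on_ms - preceding_region_ms(preceding_pitch_hz, cycles)
```

Under these assumptions, with a succeeding note-on at 1000 ms, `joint_start_ms(1000.0, 100.0)` gives 800.0 for a 100 Hz preceding note and `joint_start_ms(1000.0, 400.0)` gives 950.0 for a 400 Hz preceding note; the lower-pitched note requires the joint module to be started earlier.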
However, in a real-time performance where tones are sequentially synthesized in response to actual performance operation by a human player, a latency in a connecting portion between tones or notes would become a problem. Namely, in a real-time performance, unlike in the aforementioned playback performance, performance information, such as note-on information and note-off information, corresponding to actual performance operation cannot, of course, be acquired prior to the actual performance operation, and the performance information is supplied in real time in response to the actual performance operation. Therefore, a succeeding one of successive notes is inevitably influenced by the time length of the preceding-note region of the joint rendition style module used, so that a latency would be undesirably produced for the succeeding note in a connecting portion between the notes or tones.
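The difference between the two kinds of performance may be summarized by the following sketch, which is illustrative only. It assumes a simplified model, not stated in this section, in which the note-on information of the succeeding note becomes available `lookahead_ms` ahead of its performance timing (zero in a real-time performance) and in which the joint rendition style module can be started as soon as that information is available. The function name and parameter names are hypothetical.

```python
def perceived_latency_ms(preceding_region_ms, lookahead_ms):
    """Delay between note-on timing and the audible shift to the new pitch.

    Assumed model: the joint module starts as early as the pre-read allows,
    and the succeeding note is heard only after the preceding-note region.
    """
    if preceding_region_ms < 0 or lookahead_ms < 0:
        raise ValueError("time lengths must be non-negative")
    # Any part of the preceding-note region not covered by the pre-read
    # is heard as latency.
    return max(0.0, preceding_region_ms - lookahead_ms)
```

Under this model, `perceived_latency_ms(200.0, 500.0)` is 0.0 (playback performance with sufficient pre-read), whereas `perceived_latency_ms(200.0, 0.0)` is 200.0 (real-time performance), i.e., the full time length of the preceding-note region appears as latency.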