The present invention relates to tone synthesis apparatus, methods and programs for generating waveforms of tones, voices or other desired sounds, for example on the basis of readout of waveform data from a memory or the like, while varying the timbre and rendition style (or articulation) of the tones, voices or other sounds. More particularly, the present invention relates to an improved tone synthesis apparatus, method and program which perform control to reduce delays, such as a delay in tone generation (i.e., tone generation delay), that may occur during, for example, a real-time performance.
In recent years, there has been known a tone waveform control technique called “SAEM” (Sound Articulation Element Modeling), which is intended for realistic reproduction and control of various rendition styles (various types of articulation) peculiar to natural musical instruments. Among examples of equipment using the SAEM technique is an apparatus disclosed in Japanese Patent Application Laid-open Publication No. HEI-11-167382 (hereinafter referred to as “patent literature 1”). The conventionally-known apparatus equipped with a tone generator using the SAEM technique, such as the one disclosed in patent literature 1, are arranged to generate a continuous tone waveform by time-serially combining a plurality of rendition style modules prepared in advance for individual portions of tones, such as an attack-related rendition style module defining an attack waveform, a release-related rendition style module defining a release waveform, a body-related rendition style module defining a body waveform (intermediate waveform) constituting a steady portion of a tone, and a joint-related rendition style module defining a joint waveform interconnecting tones. For example, the apparatus can generate a waveform of an entire tone by crossfade-synthesizing the waveforms of the individual portions of the tone, using an attack-related rendition style module for the attack portion (i.e., rise portion) of the tone, one or more body-related rendition style modules for the body portion (i.e., steady portion) of the tone, and a release-related rendition style module for the release portion (i.e., fall portion) of the tone. Also, by using a joint-related rendition style module in place of the release-related rendition style module, the apparatus can generate a series of waveforms of a plurality of successive tones (or tone portions) connected together with a desired rendition style.
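As a rough illustration of the time-serial combination described above, the following Python sketch assembles the module sequence for one tone: an attack module, one or more body modules, and a release module (or a joint module when the tone connects to the next one). The class `RenditionStyleModule`, the function `build_note` and the module-kind strings are hypothetical names introduced for this sketch; actual SAEM modules also carry waveform data, data lengths and timing parameters that are omitted here.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for the tone portions a rendition style module can define.
ATTACK, BODY, RELEASE, JOINT = "attack", "body", "release", "joint"


@dataclass(frozen=True)
class RenditionStyleModule:
    kind: str  # which portion of the tone the module defines
    name: str  # label of the waveform the module refers to


def build_note(body_names: List[str], end: str = RELEASE) -> List[RenditionStyleModule]:
    """Combine modules time-serially: attack, one or more bodies, then release (or joint)."""
    if not body_names:
        raise ValueError("at least one body-related module is required")
    if end not in (RELEASE, JOINT):
        raise ValueError("the tone must end with a release- or joint-related module")
    modules = [RenditionStyleModule(ATTACK, "attack")]
    modules += [RenditionStyleModule(BODY, name) for name in body_names]
    # A joint module in place of the release connects this tone to the next one.
    modules.append(RenditionStyleModule(end, end))
    return modules


print([m.kind for m in build_note(["body-1", "body-2"])])
# ['attack', 'body', 'body', 'release']
```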
Note that, in this specification, the term “tone waveform” is used to mean a waveform of a voice or any desired sound rather than being limited only to a waveform of a musical tone.
Further, there have been known apparatus which allow a human player to selectively designate in real time rendition styles to be used, among which is the one disclosed in Japanese Patent Application Laid-open Publication No. 2004-78095 (hereinafter referred to as “patent literature 2”).
In apparatus equipped with a tone generator capable of sequentially varying the tone color and rendition style (or articulation) while sequentially crossfade-synthesizing a plurality of waveforms on the basis of a tone synthesis technique as represented by the SAEM technique, such as those disclosed in patent literature 1 and patent literature 2 mentioned above, at least two tone generating channels are used for synthesis of a tone: the waveforms allocated to the tone generating channels are additively synthesized while the output tone volumes of the individual tone generating channels are frequently faded out and faded in, to thereby output a waveform of the entire tone. An example of such tone synthesis is outlined in FIG. 9. More specifically, FIG. 9 is a conceptual diagram showing a general picture of conventionally-known tone synthesis in which a tone is synthesized using two tone generating channels, i.e. first and second tone generating channels. In FIG. 9, the horizontal axis represents time, while the vertical axis represents the respective output volumes of the first and second tone generating channels. To facilitate understanding, the output volumes of the two tone generating channels are shown in FIG. 9 as being linearly controlled between 0% and 100% within each crossfading time period. Further, in FIG. 9, time points t2, t3, t5 and t6 each represent a point at which switching between rendition style modules is completed. These rendition style switching time points t2, t3, t5 and t6, i.e. the time positions of the rendition style modules, are determined in advance in response to performance operation or operation of rendition style operators (e.g., rendition style switches) by a human player, on the basis of the data lengths specific to the rendition style modules designated by that operation, the respective start times of the rendition style modules (which correspond to the completion times of the individual crossfade syntheses and each of which can vary in accordance with a time vector value or the like that changes with the passage of time), etc.
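The linear volume control depicted in FIG. 9 can be sketched as follows, under the same simplification (linear fades, two tone generating channels). The function name `crossfade_gains` and its parameters are hypothetical; `start` and `end` stand for the beginning and completion of one crossfading period, such as t1 and t2, and `incoming` names the channel whose volume rises during that period.

```python
from typing import Tuple


def crossfade_gains(t: float, start: float, end: float, incoming: int) -> Tuple[float, float]:
    """Output volumes (in %) of channels 1 and 2 at time t during one linear crossfade.

    The `incoming` channel (1 or 2) rises from 0% to 100% between start and end,
    while the other channel falls from 100% to 0% over the same period.
    """
    if end <= start:
        raise ValueError("crossfading period must have a positive length")
    if incoming not in (1, 2):
        raise ValueError("incoming must be 1 or 2")
    frac = min(max((t - start) / (end - start), 0.0), 1.0)  # clamp outside the period
    rising = 100.0 * frac
    falling = 100.0 - rising
    return (rising, falling) if incoming == 1 else (falling, rising)


# Between t1 = 1.0 and t2 = 2.0, channel 2 fades in while channel 1 fades out.
print(crossfade_gains(1.25, 1.0, 2.0, incoming=2))  # (75.0, 25.0)
```

Because the two volumes always sum to 100%, the combined level stays constant during each linear crossfade in this simplified model.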
As seen in FIG. 9, once a note-on event is instructed (more specifically, once note-on event data is received) at time point t0 in response to performance operation by the human player, synthesis of a tone waveform in the form of a non-loop waveform corresponding to the attack portion is started in the first tone generating channel. Following the synthesis of this non-loop attack waveform, synthesis of a tone waveform A is started in the first tone generating channel; the tone waveform A is a steady waveform constituting part of the attack waveform and takes the form of a loop waveform to be read out repetitively (such loop waveforms are depicted in the figure as solid-line vertically-elongated rectangles). Then, from time point t1, at which the synthesis of the tone waveform A was started, onward, the output volume of the first tone generating channel is gradually decreased from 100% to 0% to thereby fade out the tone waveform A. Simultaneously with the fade-out of the tone waveform A, the output volume of the second tone generating channel is gradually increased from 0% to 100% to thereby fade in a tone waveform B (loop waveform) corresponding to the body portion of the tone. Under this fade-out/fade-in control, the waveforms of the first and second tone generating channels are additively synthesized into a single loop-reproduced waveform, which smoothly varies from the tone waveform A to the tone waveform B.
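A minimal sample-level sketch of the additive synthesis between t1 and t2 follows. It assumes each loop waveform is a plain list of samples read out repetitively, and it ignores pitch, loop-phase alignment and the interpolation an actual tone generator would perform; `loop_read` and `crossfade_loops` are hypothetical helper names.

```python
from typing import List, Sequence


def loop_read(loop: Sequence[float], n: int) -> float:
    """Repetitive readout: sample n of an endlessly repeated loop waveform."""
    return loop[n % len(loop)]


def crossfade_loops(loop_a: Sequence[float], loop_b: Sequence[float],
                    num_samples: int) -> List[float]:
    """Additively synthesize channel 1 (loop_a, fading out) and channel 2 (loop_b, fading in)."""
    if num_samples < 2:
        raise ValueError("need at least two samples to crossfade")
    out = []
    for n in range(num_samples):
        g_in = n / (num_samples - 1)  # channel 2 volume: 0.0 -> 1.0 linearly
        g_out = 1.0 - g_in            # channel 1 volume: 1.0 -> 0.0 linearly
        out.append(g_out * loop_read(loop_a, n) + g_in * loop_read(loop_b, n))
    return out


print(crossfade_loops([1.0], [0.0], 5))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

The output starts as pure tone waveform A (channel 1) and ends as pure tone waveform B (channel 2), which is the smooth A-to-B variation described above.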
Once the output volume of the first tone generating channel reaches 0% and that of the second tone generating channel reaches 100% (time point t2), synthesis of another tone waveform C (loop waveform) constituting the body portion is started in the first tone generating channel in a fading-in manner, and simultaneously fade-out of the tone waveform B in the second tone generating channel is started. Then, once the output volume of the first tone generating channel reaches 100% and that of the second tone generating channel reaches 0% (time point t3), synthesis of still another tone waveform D (loop waveform) constituting the body portion is started in the second tone generating channel in a fading-in manner, and simultaneously fade-out of the tone waveform C in the first tone generating channel is started. In this way, as long as the body portion lasts, the tone is synthesized while fade-in and fade-out alternate between the first and second tone generating channels, with the tone waveform being used switched sequentially from one to another. Once a note-off event is instructed (more specifically, once note-off event data is received) at time point t4 in response to performance operation by the human player, the transition to the non-loop release waveform, by way of a steady tone waveform E (loop waveform) constituting part of the release waveform, is not started until the crossfade between the tone waveform C of the first tone generating channel and the tone waveform D of the second tone generating channel has been completed, i.e. at time point t5, which is later than the note-off time point t4 by Δt. In this way, the individual waveforms defined by the successive rendition style modules are smoothly connected together by crossfade synthesis between the loop waveforms, so that a continuous tone waveform is formed as a whole.
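The channel alternation described above (A and C in the first tone generating channel, B and D in the second) can be expressed as a simple crossfade schedule. This is a hypothetical sketch: `channel_of` and `crossfade_schedule` are illustrative names, each crossfade is assumed to begin exactly where the previous one completes, as drawn in FIG. 9, the allocation of E to the first channel is an assumption that merely continues the alternation, and the numeric times are made-up stand-ins for t1, t2, t3, t5 and t6.

```python
from typing import List, Sequence, Tuple

# (start, end, outgoing waveform, its channel, incoming waveform, its channel)
Crossfade = Tuple[float, float, str, int, str, int]


def channel_of(k: int) -> int:
    """Waveform k (0-based) alternates between channel 1 (even k) and channel 2 (odd k)."""
    return 1 if k % 2 == 0 else 2


def crossfade_schedule(waveforms: Sequence[str], times: Sequence[float]) -> List[Crossfade]:
    """List the crossfades between successive loop waveforms.

    times[0] is when the first crossfade starts; times[k + 1] is when the crossfade
    from waveforms[k] to waveforms[k + 1] completes and the next one (if any) starts.
    """
    if len(times) != len(waveforms):
        raise ValueError("need exactly one time per waveform")
    schedule = []
    for k in range(len(waveforms) - 1):
        schedule.append((times[k], times[k + 1],
                         waveforms[k], channel_of(k),
                         waveforms[k + 1], channel_of(k + 1)))
    return schedule


# Illustrative stand-ins for t1, t2, t3, t5 and t6.
for row in crossfade_schedule(["A", "B", "C", "D", "E"], [1.0, 2.0, 3.0, 4.5, 5.5]):
    print(row)
```

The third row, (3.0, 4.5, 'C', 1, 'D', 2), corresponds to the crossfade in progress when the note-off arrives at t4, which must finish before the release transition can begin.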
In the conventionally-known apparatus equipped with a tone generator using the SAEM technique, as noted above, rendition style modules are allotted in advance to the time axis in response to real-time performance operation, selection instruction operation, etc. by the human player and in accordance with the respective start times of the rendition style modules, and crossfade waveform synthesis is performed between the thus-allotted rendition style modules to thereby generate a continuous tone waveform. Stated differently, the tone synthesis is carried out in accordance with predetermined crossfade time lengths. However, if the crossfade time lengths are determined in advance, it is not possible to appropriately respond to sudden performance instructions, such as a note-off operation during a real-time performance or a note-on operation for a new tone while another tone is being generated. Namely, when a sudden performance instruction has been given, the conventionally-known apparatus shift to a release waveform (or joint waveform) only after the crossfade synthesis already in progress at the time point of the instruction has been completed. As a result, complete deadening of the previous tone is delayed by the waiting time until the completion of that crossfade synthesis, and the start of generation of the next tone is delayed by the same amount.
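The waiting time described above can be computed for the conventional scheme as sketched below, assuming the predetermined crossfade completion times (such as t2, t3 and t5 in FIG. 9) are known and sorted in ascending order. `conventional_shift_time` is a hypothetical name, and the numbers in the usage line are illustrative only.

```python
import bisect
from typing import Sequence, Tuple


def conventional_shift_time(completion_times: Sequence[float],
                            t_event: float) -> Tuple[float, float]:
    """Return (shift_time, delay) under the conventional scheme.

    The shift to the release (or joint) waveform happens only at the first
    predetermined crossfade completion time at or after the instruction at t_event.
    completion_times must be sorted in ascending order.
    """
    i = bisect.bisect_left(completion_times, t_event)
    if i == len(completion_times):
        raise ValueError("no predetermined crossfade completes at or after t_event")
    shift_time = completion_times[i]
    return shift_time, shift_time - t_event  # the delay plays the role of Δt in FIG. 9


# Illustrative values: t2 = 2.0, t3 = 3.0, t5 = 4.5; note-off at t4 = 3.5.
print(conventional_shift_time([2.0, 3.0, 4.5], 3.5))  # (4.5, 1.0)
```

In this model the delay is zero only when the instruction happens to coincide with a crossfade completion time; otherwise it can approach the full length of the crossfading period in progress.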