1. Field of the Invention
The present invention relates to a method for the differentiated digital processing of voice and music, noise filtering and the creation of special effects, as well as to a device for carrying out said method.
2. Description of the Related Art
More particularly, its purpose is to transform the voice in a realistic or original manner and, more generally, to process voice, music and ambient noise in real time and to record the results obtained on a data processing medium.
It applies in particular, but not exclusively, to the general public and to sound professionals who wish to transform the voice for gaming applications, process voice and music in a differentiated manner, create special effects, reduce ambient noise, and record the results obtained in compressed digital form.
It is generally known that the vocal signal is a mixture of highly complex transient signals (consonants) and quasi-periodic signal segments (harmonic sounds). Consonants may be small explosions (P, B, T, D, K, GU), soft diffused consonants (F, V, J, Z) or hard ones (CH, S). The spectrum of the harmonic sounds varies with the type of vowel and with the speaker.
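The contrast between quasi-periodic harmonic sounds and noise-like consonants can be illustrated with the zero-crossing rate, a classic and very simple voicing cue. The following is a minimal sketch under stated assumptions, not the method of the invention: the 16 kHz sampling rate, the 512-sample frame, the synthetic test signals and the 0.25 threshold in the hypothetical `classify_frame` function are all illustrative choices.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, zcr_threshold=0.25):
    """Label a frame 'harmonic' (vowel-like) or 'noise-like' (consonant-like).

    The 0.25 threshold is a hypothetical value chosen for illustration only.
    """
    return "noise-like" if zero_crossing_rate(frame) > zcr_threshold else "harmonic"

FS = 16000  # assumed sampling rate (Hz)
N = 512     # assumed frame length (samples)

# Vowel-like frame: 150 Hz fundamental plus two weaker harmonics.
vowel = [sum(math.sin(2 * math.pi * 150 * k * t / FS) / k for k in (1, 2, 3))
         for t in range(N)]

# Consonant-like frame (e.g. an S): white noise from a seeded generator.
rng = random.Random(0)
fricative = [rng.uniform(-1.0, 1.0) for _ in range(N)]

print(classify_frame(vowel), classify_frame(fricative))
```

A white-noise frame changes sign on roughly half of its samples, whereas the vowel-like frame changes sign only about twice per pitch period. Real speech would also need energy and spectral cues, since voiced consonants such as V or Z mix both behaviours.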
The intensity ratio between consonants and vowels changes depending on whether the voice is conversational, a spoken voice of the lecturing type, a loud shouted voice or a sung voice. Loud and sung voices favour vowel sounds to the detriment of consonants.
The vocal signal simultaneously transmits two types of message: a semantic message conveyed by speech, the verbal expression of thought, and an aesthetic message perceptible through the aesthetic qualities of the voice (timbre, intonation, speed, etc.).
The semantic content of speech, which underpins good intelligibility, is practically independent of the qualities of the voice and is conveyed by its temporal acoustic forms. A whispered voice consists only of breath-like (unvoiced) sounds; an “intimate” or close voice consists of a mixture of harmonic sounds in the low frequencies and breath-like sounds in the high frequencies; the voice of a lecturer or of a singer has a rich and intense vocal spectrum.
Musical instruments are characterized by their tessitura, i.e. the frequency range of all the notes they can produce. Very few instruments, however, produce a “harmonic sound”, that is to say an intense fundamental accompanied by harmonics whose intensity decreases with rank.
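A “harmonic sound” in the sense just defined can be synthesized and checked numerically. This is a minimal sketch assuming harmonic amplitudes proportional to 1/k, one possible decreasing law chosen for illustration. `harmonic_tone` and `amplitude_at` are hypothetical helper names, and `amplitude_at` is a single-bin DFT that is exact only when the frequency falls on a DFT bin, as it does for the values used here.

```python
import math

def harmonic_tone(f0, n_harmonics, fs, n_samples):
    """Sum of harmonics k*f0 with amplitude 1/k: an intense fundamental
    followed by harmonics whose amplitude decreases with rank k."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * t / fs) / k for k in range(1, n_harmonics + 1))
        for t in range(n_samples)
    ]

def amplitude_at(signal, freq, fs):
    """Amplitude of the component at `freq`, via a single-bin DFT.

    Exact when the signal holds a whole number of periods of `freq`.
    """
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * t / fs) for t, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * t / fs) for t, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

FS = 8000  # assumed sampling rate (Hz)
tone = harmonic_tone(f0=200, n_harmonics=5, fs=FS, n_samples=800)  # 0.1 s
amplitudes = [amplitude_at(tone, 200 * k, FS) for k in range(1, 6)]
print([round(a, 3) for a in amplitudes])
```

With 800 samples at 8 kHz, every multiple of 200 Hz completes a whole number of periods, so the measured amplitudes match 1, 1/2, 1/3, 1/4, 1/5 to within rounding.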
Moreover, tessitura and spectral content are not directly related: some instruments have their energy maxima within the tessitura; others exhibit a well-defined zone of maximum energy situated at the upper limit of the tessitura and beyond; still others have widely spread energy maxima extending well beyond the upper limit of the tessitura.
It is also known that analogue processing of these complex signals, for example their amplification, causes unavoidable and irreversible degradation, which accumulates as the processing progresses.
The originality of digital technologies lies in introducing the greatest possible determinism (i.e. a priori knowledge) into the processed signals, so that special processing operations can be carried out in the form of calculations.
Thus, once the signal representing a sound, originally in its natural form of vibrations, has been converted into a digital signal having these properties, it can be processed without degradation such as background noise, distortion or bandwidth limitation. Furthermore, it can be processed to create special effects such as voice transformation, suppression of ambient noise, modification of the breathing of the voice and differentiation between voice and music.
Digital audio technology comprises the following three main stages:
- conversion of the analogue signal into a digital signal;
- the desired processing, transposed into equations to be solved;
- conversion of the digital signal back into an analogue signal, since the last link in the chain generates acoustic vibrations.
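The three stages can be sketched as follows, assuming 16-bit signed PCM and a simulated converter (real A/D and D/A conversion is performed in hardware). The helper names `to_digital`, `process` and `to_analogue` are hypothetical. The sketch shows that the only error introduced is the quantization error of at most half a step, and that once the signal is digital, processing is an exact calculation on integers.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit signed PCM code

def clip16(value):
    """Clamp an integer to the 16-bit signed range."""
    return max(-32768, min(FULL_SCALE, value))

def to_digital(samples):
    """Stage 1 (simulated A/D): quantize values in [-1, 1] to 16-bit integers."""
    return [clip16(round(x * FULL_SCALE)) for x in samples]

def process(codes, gain):
    """Stage 2: the desired processing as a calculation (here, a gain with clipping)."""
    return [clip16(round(c * gain)) for c in codes]

def to_analogue(codes):
    """Stage 3 (simulated D/A): map integer codes back to values in [-1, 1]."""
    return [c / FULL_SCALE for c in codes]

FS = 48000  # assumed sampling rate (Hz)
analogue_in = [0.5 * math.sin(2 * math.pi * 440 * t / FS) for t in range(480)]
codes = to_digital(analogue_in)
analogue_out = to_analogue(process(codes, gain=1.0))
max_error = max(abs(a - b) for a, b in zip(analogue_in, analogue_out))
print(max_error <= 0.5 / FULL_SCALE)
```

Applying `process(codes, 1.0)` any number of times returns exactly the same codes, which is the property that an analogue chain lacks.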
Sound processing devices known as vocoders generally comprise the following four functions:
- analysis;
- coding;
- decoding;
- synthesis.
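To show how the four functions fit together, here is a deliberately crude sketch of a single-band, noise-excited vocoder. It is an assumption-laden toy, not the vocoder of the invention: analysis extracts only the RMS level of each frame, coding quantizes that level in 0.5 dB steps to one byte, decoding inverts that mapping, and synthesis regenerates each frame as Gaussian noise at the decoded level. The frame length, dB floor and step size are hypothetical values.

```python
import math
import random

FRAME = 160  # hypothetical frame length: 20 ms at an assumed 8 kHz rate

def analysis(signal):
    """Analysis: extract one parameter per frame (its RMS level)."""
    frames = [signal[i:i + FRAME] for i in range(0, len(signal) - FRAME + 1, FRAME)]
    return [math.sqrt(sum(x * x for x in f) / FRAME) for f in frames]

def coding(levels, floor_db=-60.0, step_db=0.5):
    """Coding: quantize each level (in dB) to an integer code in 0..255."""
    codes = []
    for rms in levels:
        db = 20 * math.log10(max(rms, 1e-12))
        codes.append(max(0, min(255, round((db - floor_db) / step_db))))
    return codes

def decoding(codes, floor_db=-60.0, step_db=0.5):
    """Decoding: map integer codes back to RMS levels."""
    return [10 ** ((floor_db + c * step_db) / 20) for c in codes]

def synthesis(levels, seed=0):
    """Synthesis: regenerate each frame as noise scaled to the decoded level."""
    rng = random.Random(seed)
    out = []
    for rms in levels:
        noise = [rng.gauss(0.0, 1.0) for _ in range(FRAME)]
        norm = math.sqrt(sum(x * x for x in noise) / FRAME)
        out.extend(rms * x / norm for x in noise)
    return out

# Usage: a 400 Hz test tone stands in for the input voice (assumption).
speech = [0.5 * math.sin(2 * math.pi * 400 * t / 8000) for t in range(1600)]
levels = analysis(speech)
codes = coding(levels)
decoded = decoding(codes)
resynthesized = synthesis(decoded)
print(len(levels), codes[0])
```

Only the coded bytes would need to be stored or transmitted between coding and decoding. A practical vocoder would also carry the spectral envelope and pitch, which is what makes the resynthesized voice intelligible.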
The patent US 2002/184009 (HEIKKINEN Ari) of 5th Dec. 2002 proposes a method for suppressing pitch variation by individually displacing the pitch pulses of the analysis frame so as to obtain a fixed pitch.
The patent WO 01/59766A (COMSAT) of 16th Aug. 2001 proposes a noise reduction technique using linear prediction.
U.S. Pat. No. 5,684,262 A describes a method that multiplies the original voice by a tone in order to obtain a frequency shift and thus a lower or higher voice.
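The effect of multiplying a signal by a tone follows from the identity sin(a)·cos(b) = ½[sin(a + b) + sin(a − b)]: a component at frequency f reappears at f − fc and f + fc, where fc is the frequency of the tone. A minimal sketch, assuming a pure 300 Hz tone as a stand-in for the voice and a 100 Hz carrier (both hypothetical values); `ring_modulate` and `amplitude_at` are illustrative helpers, not the cited patent's implementation.

```python
import math

def ring_modulate(signal, carrier_hz, fs):
    """Multiply the signal sample by sample by a cosine tone at carrier_hz."""
    return [x * math.cos(2 * math.pi * carrier_hz * t / fs) for t, x in enumerate(signal)]

def amplitude_at(signal, freq, fs):
    """Amplitude of the component at `freq` (single-bin DFT, exact on a bin)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * t / fs) for t, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * t / fs) for t, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

FS, N = 8000, 800  # assumed rate (Hz) and a 0.1 s analysis window
voice_like = [math.sin(2 * math.pi * 300 * t / FS) for t in range(N)]
shifted = ring_modulate(voice_like, carrier_hz=100, fs=FS)

# The 300 Hz component is replaced by two half-amplitude components at 200 and 400 Hz.
for f in (200, 300, 400):
    print(f, round(amplitude_at(shifted, f, FS), 6))
```

Because every partial is shifted by the same number of hertz rather than scaled by a common ratio, the harmonic relationships of a real voice are not preserved; one of the two sidebands would typically also have to be filtered out to obtain a purely lower or purely higher voice.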
Data compression methods are used mainly for digital storage (to reduce the data volume) and for transmission (to reduce the required data rate). They involve processing before storage or transmission (coding) and processing on retrieval (decoding).
Among data compression methods, lossy perceptual methods are the most widely used, in particular the MPEG Audio method.
This method is based on the masking effect of human hearing, i.e. weak sounds become inaudible in the presence of strong sounds. This is equivalent to a raising of the hearing threshold caused by the strongest sound, which depends on the frequency and amplitude differences between the two sounds.
The number of bits per sample is therefore chosen according to the masking effect, since masked weak sounds and quantization noise are inaudible. To take full advantage of this masking effect, the audio spectrum is divided into a number of sub-bands, so that the masking level can be determined in each sub-band and bits allocated to each of them.
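The per-sub-band bit allocation can be sketched under a simplifying assumption: each additional bit lowers the quantization noise by about 6.02 dB, so a sub-band receives just enough bits to cover its signal-to-mask ratio (SMR), and fully masked sub-bands receive none. MPEG Audio actually uses an iterative allocation driven by standardized tables; the function `allocate_bits`, the 15-bit cap and the example levels below are hypothetical.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of SNR gained per extra bit

def allocate_bits(signal_db, mask_db, max_bits=15):
    """Bits per sub-band so that quantization noise stays below the masking level.

    Simplified rule: smallest number of bits with bits * 6.02 dB >= SMR.
    Sub-bands whose signal lies below the mask (SMR <= 0) get no bits.
    """
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m
        bits.append(0 if smr <= 0 else min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits

# Four hypothetical sub-bands: signal levels and masking levels in dB.
print(allocate_bits([70, 40, 55, 30], [40, 45, 30, 20]))  # [5, 0, 5, 2]
```

The second sub-band lies entirely under the mask and costs nothing, which is where the perceptual coder saves its bits.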
The MPEG Audio method thus consists in:
- digitizing at 16 bits with a sampling rate of 48 kHz;
- deriving the masking curve between 20 Hz and 20 kHz;
- dividing the signal into 32 sub-bands;
- evaluating the maximum amplitude reached in each sub-band over 24 ms;
- evaluating the amplitude of the just-inaudible quantization noise;
- allocating the number of bits for coding;
- generating the coded bits for each sub-band;
- packing this data into a data frame repeated every 24 ms.
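The figures in this list are mutually consistent: 24 ms at 48 kHz is 1152 samples, and with 32 sub-bands each sub-band carries 36 samples per frame and covers a 750 Hz slice of the 0 to 24 kHz band. The sketch below makes that arithmetic explicit and evaluates a maximum amplitude per sub-band for one frame. As a stated simplification, it uses a plain quadratic-time DFT with bins grouped into 32 equal bands instead of the polyphase filter bank of the MPEG standard, and it measures peak spectral amplitude rather than peak sub-band sample value; the helper names are hypothetical.

```python
import cmath
import math

FS = 48_000                           # sampling rate (Hz)
FRAME_MS = 24                         # frame duration (ms)
FRAME_LEN = FS * FRAME_MS // 1000     # 1152 samples per frame
N_SUBBANDS = 32
BAND_HZ = (FS / 2) / N_SUBBANDS       # 750 Hz per uniform sub-band

def split_frames(signal):
    """Cut the signal into consecutive 24 ms frames (a trailing partial frame is dropped)."""
    return [signal[i:i + FRAME_LEN] for i in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN)]

def subband_peaks(frame):
    """Peak spectral amplitude in each of the 32 uniform sub-bands of one frame.

    Stand-in for the MPEG polyphase filter bank: a plain DFT over the frame,
    with bins grouped into 32 equal-width bands.
    """
    n = len(frame)
    peaks = [0.0] * N_SUBBANDS
    for k in range(n // 2):
        bin_value = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        amplitude = 2 * abs(bin_value) / n
        band = min(int(k * FS / n // BAND_HZ), N_SUBBANDS - 1)
        peaks[band] = max(peaks[band], amplitude)
    return peaks

print(FRAME_LEN, FRAME_LEN // N_SUBBANDS, BAND_HZ)  # 1152 36 750.0
```

A 1 kHz tone, for example, falls entirely in the second sub-band (750 to 1500 Hz), so after bit allocation only that sub-band would need a significant number of bits in its frames.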
This technique thus transmits at a bit rate that varies according to the instantaneous composition of the sound.
However, this method is better suited to processing music than to processing the vocal signal. It does not make it possible to detect the presence of voice or of music, to separate the vocal or musical signal from noise, to modify the voice in real time so as to synthesize a different but realistic voice, to synthesize breathing (noise) in order to create special effects, to code a vocal signal comprising a single voice, or to reduce ambient noise.