1. Field of Invention
The present invention relates to an apparatus for acoustically improving an environment, and particularly to an electronic sound screening system for this purpose.
2. Description of Related Art
In order to understand the present invention, it is necessary first to appreciate something of the human auditory system, and the following description is based on known research conclusions and data available in handbooks on the experimental psychology of hearing, and in particular in “Auditory Scene Analysis, The Perceptual Organization of Sound” by Albert S. Bregman, published by MIT Press, Massachusetts.
The human auditory system is overwhelmingly complex, both in design and in function. It comprises thousands of receptors connected by complex neural networks to the auditory cortex in the brain. Different components of incident sound excite different receptors, which in turn channel information towards the auditory cortex through different neural network routes.
The response of an individual receptor to a sound component is not always the same; it depends on various factors such as the spectral make up of the sound signal and the preceding sounds, as these receptors can be tuned to respond to different frequencies and intensities. Furthermore, the neural network route for the sound information can change and so can the destination. All of the above, combined with the sheer number of receptors and neurones connecting them to the auditory cortex, enable the auditory system to decode simple pressure variations to create a highly complex, three-dimensional view of auditory space.
Masking Principles
Masking is an important and well-researched phenomenon in auditory perception. It is defined as the amount (or the process) by which the threshold of audibility for one sound is raised by the presence of another (masking) sound. The principles of masking are based upon the way the ear performs spectral analysis. A frequency-to-place transformation takes place in the inner ear, along the basilar membrane. Distinct regions in the cochlea, each with a set of neural receptors, are tuned to different frequency bands, which are called critical bands. The spectrum of human audition can be divided into several critical bands, which are not equal.
In simultaneous masking the masker and the target sounds coexist. The target sound specifies the critical band. The auditory system “suspects” there is a sound in that region and tries to detect it. If the masker is sufficiently wide and loud, the target sound cannot be heard. This phenomenon can be explained in simple terms on the basis that the presence of a strong noise or tone masker creates an excitation of sufficient strength on the basilar membrane at the critical band location of the inner ear effectively to block the transmission of the weaker signal.
For an average listener, the critical bandwidth can be approximated by:
            BW      c        ⁡          (      f      )        =      25    +                            75          ⁡                      [                          1              +                              1.4                ·                                                      (                                          f                      1000                                        )                                    2                                                      ]                          069            ⁢                          ⁢              (        Hz        )            where BWc is the critical bandwidth in Hz and f the frequency in Hz.
Also, Bark is associated with frequency f via the following equations:
                    Bark            =              f        100              ,          f      >              500        ⁢                                  ⁢        Hz                                Bark            =              9        +                  4          ⁢                                          ⁢                      log            2                    ⁢                      f            100                                ,          f      >              500        ⁢                                  ⁢        Hz            
A masker sound within a critical band has some predictable effect on the perceived detection of sounds in other critical bands. This effect, also known as the spread of masking, can be approximated by a triangular function, which has slopes of +25 and −10 dB per bark (distance of 1 critical band), as shown in accompanying FIG. 23.
Principles of the Perceptual Organization of Sound
The auditory system performs a complex task; sound pressure waves originating from a multiplicity of sources around the listener fuse into a single pressure variation before they enter the ear; in order to form a realistic picture of the surrounding events the listener's auditory system must break down this signal to its constituent parts so that each sound-producing event is identified. This process is based on cues, pieces of information which help the auditory system assign different parts of the signal to different sources, in a process called grouping or auditory object formation. In a complex sound environment there are a number of different cues, which aid listeners to make sense of what they hear.
These cues can be auditory and/or visual or they can be based on knowledge or previous experience. Auditory cues relate to the spectral and temporal characteristics of the blending signals. Different simultaneous sound sources can be distinguished, for example, if their spectral qualities and intensity characteristics, or if their periodicities are different. Visual cues, depending on visual evidence from the sound sources, can also affect the perception of sound.
Auditory scene analysis is a process in which the auditory system takes the mixture of sound that it derives from a complex natural environment and sorts it into packages of acoustic evidence, each probably arising from a single source of sound. It appears that our auditory system works in two ways, by the use of primitive processes of auditory grouping and by governing the listening process by schemas that incorporate our knowledge of familiar sounds.
The primitive process of grouping seems to employ a strategy of first breaking down the incoming array of energy to perform a large number of separate analyses. These are local to particular moments of time and particular frequency regions in the acoustic spectrum. Each region is described in terms of its intensity, its fluctuation pattern, the direction of frequency transitions in it, an estimate of where the sound is coming from in space and perhaps other features. After these numerous separate analyses have been done, the auditory system has the problem of deciding how to group the results so that each group is derived from the same environmental event or sound source.
The grouping has to be done in two dimensions at the least: across the spectrum (simultaneous integration or organization) and across time (temporal grouping or sequential integration). The former, which can also be referred to as spectral integration or fusion, is concerned with the organization of simultaneous components of the complex spectrum into groups, each arising from a single source. The latter (temporal grouping or sequential organization) follows those components in time and groups them into perceptual streams, each arising from a single source again. Only by putting together the right set of frequency components over time can the identity of the different simultaneous signals be recognized.
The primitive process of grouping works in tandem with schema-based organization, which takes into account past learning and experiences as well as attention, and which is therefore linked to higher order processes. Primitive segregation employs neither past learning nor voluntary attention. The relations it creates tend to be valid clues over wide classes of acoustic events. By contrast, schemas relate to particular classes of sounds. They supplement the general knowledge that is packaged in the innate heuristics by using specific learned knowledge.
Grouping
A number of auditory phenomena have been related to the grouping of sounds into auditory streams, including in particular those related to speech perception, the perception of the order and other temporal properties of sound sequences, the combining of evidence from the two ears, the detection of patterns embedded in other sounds, the perception of simultaneous “layers” of sounds (e.g., in music), the perceived continuity of sounds through interrupting noise, perceived timbre and rhythm, and the perception of tonal sequences.
Spectral integration is pertinent to the grouping of simultaneous components in a sound mixture, so that they are treated as arising from the same source. The auditory system looks for correlations or correspondences among parts of the spectrum, which would be unlikely to have occurred by chance. Certain types of relations between simultaneous components can be used as clues for grouping them together. The effect of this grouping is to allow global analyses of factors such as pitch, timbre, loudness, and even spatial origin to be performed on a set of sensory evidence coming from the same environmental event.
Many of the factors that favor the grouping of a sequence of auditory inputs are features that define the similarity and continuity of successive sounds. These include fundamental frequency, temporal proximity, shape of spectrum, intensity, and apparent spatial origin. These characteristics affect the sequential aspect of scene analysis, in other words the use of the temporal structure of sound.
Generally, it appears that the stream forming process follows principles analogous to the principle of grouping by proximity. High tones tend to group with other high tones if they are adequately close in time. In the case of continuous sounds it appears that there is a unit forming process that is sensitive to the discontinuities in sound, particularly to sudden rises in intensity, and that creates unit boundaries when such discontinuities occur. Units can occur in different time scales and smaller units can be embedded in larger ones.
In complex tones, where there are many frequency components, the situation is more complicated as the auditory system estimates the fundamental frequency of the set of harmonics present in sound in order to determine the pitch. The perceptual grouping is affected by the difference in fundamental frequency pitch) and/or by the difference in the average of partials (brightness) in a sound. They both affect the perceptual grouping and the effects are additive.
A pure tone has a different spectral content than a complex tone; so, even if the pitches of the two sounds are the same, the tones will tend to segregate into different groups from one another. However another type of grouping may take effect: a pure tone may, instead of grouping with the entire complex tone following it, group with one of the frequency components of the latter.
Location in space may be another effective similarity, which influences temporal grouping of tones. Primitive scene analysis tends to group sounds that come from the same point in space and segregate those that come from different places. Frequency separation, rate, and the spatial separation combine to influence segregation. Spatial differences seem to have their strongest effect on segregation when they are combined with other differences between the sounds.
In a complex auditory environment where distracting sounds may come from any direction on the horizontal plane, localization seems to be very important, as disrupting the localization of distracting sound sources can weaken the identity of particular streams.
Timbre is another factor that affects the similarity of tones and hence their grouping into streams. The difficulty is that timbre is not a simple one-dimensional property of sounds. One distinct dimension however is brightness. Bright tones have more of their energy concentrated towards high frequencies than dull tones do, since brightness is measured by the mean frequency obtained when all the frequency components are weighted according to their loudness. Sounds with similar brightness will tend to be assigned to the same stream. Timbre is a quality of sound that can be changed in two ways: first by offering synthetic sound components to the mixture, which will fuse with the existing components; and second by capturing components out of a mixture by offering them better components to group with.
Generally speaking, the pattern of peaks and valleys in the spectra of sounds affects their grouping. However there are two types of spectra similarity, when two tones have their harmonics peaking at exactly the same frequencies and when corresponding harmonics are of proportional intensity (if the fundamental frequency of the second tone is double that of the first, then all the peaks in the spectrum would be at double the frequency). Available evidence has shown that both forms of spectra similarity are used in auditory scene analysis to group successive tones.
Continuous sounds seem to hold better as a single stream than discontinuous sounds do. This occurs because the auditory system tends to assume that any sequence that exhibits acoustic continuity has probably arisen from one environmental event.
Competition between different factors results in different organizations; it appears that frequency proximities are competitive and that the system tries to form streams by grouping the elements that bear the greatest resemblance to one another. Because of the competition, an element can be captured out of a sequential grouping by giving it a better sound to group with.
The competition also occurs between different factors that favor grouping. For example in a four tone sequence ABXY if similarity in fundamental frequencies favors the groupings AB and XY, while similarity in spectral peaks favors the grouping AX and BY, then the actual grouping will depend on the relative sizes of the differences.
There is also collaboration as well as competition. If a number of factors all favor the grouping of sounds in the same way, the grouping will be very strong, and the sounds will always be heard as parts of the same stream. The process of collaboration and competition is easy to conceptualize. It is as if each acoustic dimension could vote for a grouping, with the number of votes cast being determined by the degree of similarity with that dimension and by the importance of that dimension. Then streams would be formed, whose elements were grouped by the most votes. Such a voting system is valuable in evaluating a natural environment, in which it is not guaranteed that sounds resembling one another in only one or two ways will always have arisen from the same acoustic source.
Schemas
Primitive processes of scene analysis are assumed to establish basic groupings amongst the sensory evidence, so that the number and the qualities of the sounds that are ultimately perceived are based on these groupings. These groupings are based on rules which take advantage of fairly constant properties of the acoustic world, such as the fact that most sounds tend to be continuous, to change location slowly and to have components that start and end together. However, auditory organization would not be complete if it ended there. The experiences of the listener are also structured by more refined knowledge of particular classes of signals, such as speech, music, animal sounds, machine noises and other familiar sounds of our environment.
This knowledge is captured in units of mental control called schemas. Each schema incorporates information about a particular regularity in our environment. Regularity can occur at different levels of size and spans of time. So, in our knowledge of language we would have one schema for the sound “a”, another for the word “apple”, one for the grammatical structure of a passive sentence, one for the give and take pattern in a conversation and so on.
It is believed that schemas become active when they detect, in the incoming sense data, the particular data that they deal with. Because many of the patterns that schemas look for extend over time, when part of the evidence is present and the schema is activated, it can prepare the perceptual process for the remainder of the pattern. This process is very important for auditory perception, especially for complex or repeated signals like speech. It can be argued that schemas, in the process of making sense of grouped sounds, occupy significant processing power in the brain. This could be one explanation for the distracting strength of intruding speech, a case where schemas are involuntarily activated to process the incoming signal. Limiting the activation of these schemas either by affecting the primitive groupings, which activate them, or by activating other competing schemas less “computationally expensive” for the brain reduces distractions.
There are cases in which primitive grouping processes seem not to be responsible for the perceptual groupings. In these cases schemas select evidence that has not been subdivided by primitive analysis. There are also examples that show another capacity: the ability to regroup evidence that has already been grouped by primitive processes.
Our voluntary attention employs schemas as well. For example, when we are listening carefully for our name being called out among many others in a list we are employing the schema for our name. Anything that is being listened for is part of a schema, and thus whenever attention is accomplishing a task, schemas are participating.
It will be appreciated from the above that the human auditory system is closely attuned to its environment, and unwanted sound or noise has been recognized as a major problem in industrial, office and domestic environments for many years now. Advances in materials technology have provided some solutions. However, the solutions have all addressed the problem in the same way, namely: the sound environment has been improved either by decreasing or by masking noise levels in a controlled space.
Conventional masking systems generally rely on decreasing the signal to noise ratio of distracting sound signals in the environment, by raising the level of the prevailing background sound. A constant component, both in frequency content and amplitude, is introduced into the environment so that peaks in a signal, such as speech, produce a low signal to noise ratio. There is a limitation on the amplitude level of such a steady contribution, defined by the user acceptance: a level of noise that would mask even the higher intruding speech signals would probably be unbearable for prolonged periods. Furthermore this component needs to be wide enough spectrally to cover most possible distracting sounds.
This, relatively inflexible approach, has been regarded hitherto as a major guideline in the design of spaces and/or systems as far as noise distraction is concerned.