A portion of this disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office Patent files or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates generally to a method and apparatus that senses stimuli, creates internal representations of them, and uses its sensor data and their representations to understand aspects of the nature of the stimuli (e.g., recognize them). More specifically, the present invention is a method and apparatus that represents sensed stimuli in a manner that is invariant under systematic transformations of the device's sensor states. This device need not recalibrate its detector and/or retrain its pattern analysis module in order to account for sensor state transformations caused by extraneous processes (e.g., processes affecting the condition of the device's detectors, the channel between the stimuli and the device, and the manner of presentation of the stimuli themselves).
Most intelligent sensory devices contain pattern recognition software for analyzing the state of the sensors that detect stimuli in the device's environment. This software is usually “trained” to classify a set of sensor states that are representative of the “unknown” sensor states to be subsequently encountered. For instance, an optical character recognition (OCR) device might be trained on letters and numbers in images of printed pages. Or, a speech recognition device may be trained to recognize the spoken words of a particular speaker. After these devices have been trained, their performance may be degraded if the correspondence between the stimuli and sensor states is altered by factors extrinsic to the stimuli of interest. For example, the OCR device may be “confused” by distortions of pixel patterns due to a derangement of the camera's optical/electronic path, or it may be unfamiliar with pixel intensity changes due to altered intensity of illumination of the printed page. Similarly, the speech recognition device may be compromised if the microphone's output signal is altered by changes in the microphone's internal response characteristics, or it may fail to recognize words if the frequency spectrum of sound is altered by changes in the transfer function of the “channel” between the speaker's lips and the microphone. These processes systematically deform the sensor states elicited by stimuli and thereby define a mapping of sensor states onto one another. If such transformations map one of the sensor states in the training set onto another one (e.g., the pixel intensity pattern of one letter is mapped onto that of another letter), the pattern recognition software will misclassify the corresponding stimuli. Likewise, the device will not recognize a stimulus in the training set if its original sensor state has been transformed into one outside of the training set.
These problems can be addressed by periodically recalibrating the device's detector to account for sensor state transformations caused by changed conditions. For example, the device can be exposed to a stimulus consisting of a test pattern that produces a known sensor state under “normal” conditions. The observed differences between the actual sensor state and ideal sensor state for this test stimulus can be used to correct subsequently encountered sensor states. Alternatively, the device's pattern analysis (e.g., pattern recognition) module can be retrained to recognize the transformed sensor states. These procedures must be implemented after each change in observational conditions in order to account for time-dependent distortions. Because the device may not be able to detect the presence of such a change, it may be necessary to recalibrate or retrain it at short fixed intervals. However, this will decrease the device's duty cycle by frequently taking it “off-line”. Furthermore, the recalibration or retraining process may be logistically impractical in some applications (e.g., computer vision and speech recognition devices at remote locations).
A similar problem occurs when the fidelity of electronic communication is degraded due to distortion of the signal as it propagates through the transmitter, receiver, and the channel between them. Most communications systems attempt to correct for these effects by periodically transmitting calibration data (e.g., test patterns) so that the receiver can characterize the distortion and then compensate for it by “unwarping” subsequently received signals. As mentioned above, these techniques may be costly because they periodically take the system “off-line” or otherwise reduce its efficiency.
The present invention substantially overcomes the disadvantages of prior sensory devices by providing a novel self-referential method and apparatus for creating stimulus representations that are invariant under systematic transformations of sensor states. Because of the invariance of the stimulus representations, the device effectively “filters out” the effects of sensor state transformations caused by extraneous processes (e.g., processes affecting the condition of the sensory device, the channel between the stimulus and the sensory device, and the manner of presentation of the stimulus itself). This means that the device can use these invariant representations to understand the nature of the stimuli (e.g., to recognize them), without explicitly accounting for the transformative processes (e.g., without recalibrating the device's detector and without retraining its pattern recognition module).
The behavior of this device mimics some aspects of human perception, which is remarkably invariant when raw signals are distorted by a variety of changes in observational conditions. This has been strikingly illustrated by experiments in which subjects wore goggles creating severe geometric distortions of the observed scene. For example, the visual input of some subjects was warped non-linearly, inverted, and/or reflected from right to left. Although the subjects initially perceived the distortion, their perceptions of the world returned to the pre-experimental baseline after several weeks of constant exposure to familiar stimuli seen through the goggles. For example, lines reported to be straight before the experiment were initially perceived to be warped, but these lines were once again reported to be straight after several weeks of viewing familiar scenes through the distorting lenses. Similar results were observed when the goggles were removed at the end of the experiment. Namely, the world initially appeared to be distorted in a manner opposite to the distortion due to the lenses, but eventually no distortion was perceived. These experiments suggest that humans utilize recent sensory experiences to adaptively “recalibrate” their perception of subsequent sensory data. There are many other examples of how our percepts are often invariant under changed observational conditions. For example, human observers are not usually confused by a different intensity of illumination of a scene. Although the raw sensory state of the observer is altered by this change, this is usually not attributed to changed intrinsic properties of the stimulus of interest (e.g., the scene). Similarly, humans perceive the information content of ordinary speech to be remarkably invariant, even though the signal may be transformed by significant alterations of the speaker's voice, the listener's auditory apparatus, and the channel between them.
Yet there is no evidence that the speaker and listener exchange calibration data in order to characterize and compensate for these distortions. Rather, these observations suggest that the speech signal is redundant in the sense that listeners extract the same content from multiple acoustic signals that are transformed versions of one another. Finally, it is worth noting the tendency of different persons to share the same perceptions of the world, despite obvious differences in their sensory organs and processing pathways. This “universality” of perception may also be due to the apparent ability of each individual to “filter out” the effects of systematic sensor state transformations, including the transformations relating his/her sensor states to those of other individuals.
The present invention is a sensory method and apparatus that creates stimulus representations that are invariant in the presence of processes that remap its sensor states. These representations may share the following properties of human percepts: immediately after the onset of such a process, they may be affected, but they eventually adapt to the presence of sensor state transformations and return to the form that would have been produced in the absence of the transformative process. In order to see how to design such a device, consider any process that systematically alters the correspondence between the stimuli and the sensor states. For example, consider: 1) changes in the performance of the device's detectors (e.g., drifting gain of a detector circuit or distortion of an electronic image in a camera), 2) alterations of observational conditions that are external to the detectors and the stimuli (e.g., different intensity of a scene's illumination or different positioning of the detectors with respect to the stimuli), 3) systematic modifications of the presentation of the stimuli themselves (e.g., systematic warping of printed pages or systematic morphing of a voice). Because of such changes, a stimulus that formerly resulted in sensor state x will now induce another sensor state x′. Let the array of numbers x corresponding to a sensor state be the coordinates of that state on the manifold of possible sensor states. In this language, the above-mentioned processes systematically transform the absolute coordinates of the sensor state associated with each stimulus. However, certain relationships between the coordinates of a collection of sensor states may remain invariant in the presence of such a process. This is analogous to the fact that the physical rotation or translation of a collection of particles in a plane does not affect the relationships among the members of the collection, even though the absolute coordinates of each particle are transformed.
For example, Euclidean coordinate geometry can be used to describe the relative positions of such particles in terms of a “natural” internal coordinate system (or scale) that is rooted in the collection's intrinsic structure; i.e., the coordinate system that originates at the collection's center of “mass” and is oriented along its principal moments of “inertia”. Such a self-referential description is invariant under global rotations and translations that change the absolute coordinates of each particle. This suggests the following strategy: if we describe stimuli in terms of the relationships among their sensor states, we may be able to represent them in a way that is not affected by the above-described transformative processes. Specifically, we show that a sufficiently dense collection of sensor states in a time series has a locally defined structure that can be used to describe the relationship between each sensor state and the whole time series. Because this description is referred to the local structure of the collection of sensor states in the time series, it is invariant under any linear or non-linear transformations of all of the states in the collection. Now consider a specific embodiment of the invention that uses this method and apparatus to describe stimuli in terms of recently encountered stimuli. If a sufficient time has elapsed since the onset of a transformative process, each stimulus will be represented by the relationship between its transformed sensor state and a collection of recently encountered transformed sensor states. The resulting representation will be identical to the one that would have been derived in the absence of the transformative process: namely, the representation describing the relationship between the corresponding untransformed sensor state and the collection of recently encountered untransformed sensor states.
Furthermore, the stimulus will be represented in the same way as it was before the onset of the transformative process, as long as both representations were referred to collections of sensor states (transformed and untransformed) that were produced by the same sets of stimuli. In essence, the temporal stability of this type of stimulus representation is due to the stability of the device's recent “experience” (i.e., the stability of the set of recently encountered stimuli to which descriptions are referred). Immediately after the onset of a transformative process, the representation of a stimulus may drift during the transitional period when the device is referring its description to a mixed collection of untransformed and transformed sensor states. However, as in the human case, the representation of each stimulus will eventually revert to its baseline form when the collection of recently encountered states is composed entirely of transformed sensor states.
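The particle analogy above can be checked numerically. The following is a minimal sketch, not part of the disclosed apparatus: it places the origin at the centroid of a set of planar points, orients the axes along the eigenvectors of the scatter matrix (the principal axes), and confirms that the resulting coordinates are unchanged by a rigid rotation and translation. The function name `intrinsic_coordinates`, the sample points, and the sign convention based on third moments are assumptions introduced for illustration only.

```python
import numpy as np

def intrinsic_coordinates(points):
    """Coordinates of 2-D points in a frame at their centroid,
    oriented along the principal axes of the collection."""
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)       # origin at the centroid
    scatter = centered.T @ centered               # 2x2 scatter matrix
    _, axes = np.linalg.eigh(scatter)             # columns are principal axes
    coords = centered @ axes
    # Eigenvector signs are arbitrary, so fix them with an intrinsic rule:
    # make the third moment along each axis positive. This assumes a generic
    # collection (distinct eigenvalues, non-zero third moments).
    signs = np.sign((coords ** 3).sum(axis=0))
    return coords * signs

# Hypothetical collection of particles in a plane.
points = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 1.0], [1.0, 3.0], [6.0, 2.0]])

# Rigidly rotate and translate every particle.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
moved = points @ rotation.T + np.array([5.0, -2.0])

# Absolute coordinates differ, intrinsic coordinates agree.
print(np.allclose(points, moved))
print(np.allclose(intrinsic_coordinates(points), intrinsic_coordinates(moved)))
```

This sketch covers only rigid motions of the plane. The invariance under general non-linear transformations described above requires a locally defined structure and is not demonstrated here.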
In sensory devices of this type, the sensor signal is represented by a non-linear function of its instantaneous level at each time, with the form of this scale function being determined by the collection of signal levels encountered during a certain time period (e.g., during a recent period of time) [Levin, D. N., “Time-dependent signal representations that are independent of sensor calibration”, Journal of the Acoustical Society of America, Vol. 108, p. 2575, 2000; Levin, D. N., “Stimulus representations that are invariant under invertible transformations of sensor data”, Proceedings of the Society of Photoelectronic Instrumentation Engineers, Vol. 4322, pp. 1677-1688, 2001; Levin, D. N., “Universal communication among systems with heterogeneous ‘voices’ and ‘ears’”, Proceedings of the International Conference on Advances in Infrastructure for Electronic Business, Science, and Education on the Internet, Scuola Superiore G. Reiss Romoli S.p.A., L'Aquila, Italy, Aug. 6-12, 2001]. This rescaled signal is invariant if the signal levels at all relevant times are invertibly transformed by the same distortion. This is because the relationship between each untransformed signal level and the scale derived from the collection of untransformed signal levels is the same as the relationship between the corresponding transformed signal level and the scale derived from the collection of transformed signal levels. This can be understood in the context of the above-described analogy, involving the positions of particles in a plane. Each particle's position with respect to the collection's intrinsic coordinate system or scale is invariant under rigid rotations and translations that change all particle coordinates in an extrinsic coordinate system. This is because each particle and the collection's intrinsic coordinate system are rotated and translated in the same manner.
According to the present invention, the signal levels detected by the sensory device in a suitable time period have an intrinsic structure that defines a non-linear coordinate system (or scale) on the manifold of possible signal levels. The “location” of the currently detected signal level with respect to this intrinsic coordinate system is invariant under any invertible transformation (linear or non-linear) of the entire signal time series. This is because the signal level at any time and the scale function at the same time point are transformed in a manner that leaves the rescaled signal level unchanged.
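The idea of referring a signal level to a scale derived from the signal's own recent history can be illustrated with a deliberately simple stand-in. The sketch below is not the scale function of the invention: it rescales a level by its empirical rank among recently observed levels, which is unchanged only by monotonically increasing distortions (a subset of the invertible transformations discussed above). The function name `rescale_by_rank`, the sample values, and the exponential distortion are assumptions chosen for illustration.

```python
import numpy as np

def rescale_by_rank(history, level):
    """Fraction of recently observed signal levels that do not exceed `level`.

    Illustrative stand-in for a history-derived scale; it is invariant only
    under monotonically increasing transformations of the signal.
    """
    history = np.asarray(history, dtype=float)
    return float(np.mean(history <= level))

# Hypothetical recent signal levels and the currently detected level.
history = np.array([0.2, 1.5, 0.9, 3.1, 2.4])
current = history[1]

# A non-linear, monotonically increasing distortion applied to every level.
distorted = np.exp(2.0 * history) + 1.0
distorted_current = distorted[1]

# The raw levels change, but the rescaled level is the same in both cases.
print(rescale_by_rank(history, current))
print(rescale_by_rank(distorted, distorted_current))
```

Because the distortion acts on the current level and on the history that defines the scale in the same way, the two effects cancel in the rescaled value, which is the property the paragraph above attributes to the scale function.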
As suggested above, the task of representing stimuli in an invariant fashion can be reduced to the mathematical task of describing sensor state relationships that are not affected by systematic transformations on the sensor state manifold. Now, assume that the change in observational conditions defines a one-to-one transformation of the sensor states. This requirement simply excludes processes (e.g., a change in the spectral content of scene illumination) that make it possible to distinguish previously indistinguishable stimuli or that obscure the difference between previously distinguishable stimuli. Such a process has exactly the same effect on sensor state coordinates as a change of the coordinate system on the manifold (x→x′) in the absence of the process. This is analogous to the fact that the physical rotation of an array of particles in a plane has the same effect on their coordinates as the inverse rotation of the axes of the coordinate system. Therefore, the task of finding sensor state relationships that are independent of transformative processes is mathematically equivalent to the task of describing sensor state relationships in a coordinate-independent manner. In other words, the relationships among the sensor states must be described in a manner that is independent of the coordinate system used to label them. In specific embodiments of the invention, differential tensor calculus and differential geometry are used to provide the mathematical machinery for deriving such coordinate-independent descriptions of a time series of points on a manifold.
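As a hedged, one-dimensional illustration of what a coordinate-independent description can look like, suppose the sensor state x is relabeled by a smooth one-to-one map f, and suppose some reference velocity h(x) has been derived from the time series at each point. Both f and h are hypothetical symbols introduced here, and the text above does not specify this construction. Velocities transform as contravariant vectors, so a displacement measured in units of h has the same value in either coordinate system:

```latex
% Illustrative sketch only: f and h are assumed symbols, not defined in the source.
\begin{align}
x \to x' &= f(x), \qquad f'(x) \neq 0, \\
\frac{dx'}{dt} &= f'(x)\,\frac{dx}{dt}, \\
h'(x') &= f'(x)\,h(x), \\
\delta s \equiv \frac{\delta x}{h(x)}
  &= \frac{f'(x)\,\delta x}{f'(x)\,h(x)}
   = \frac{\delta x'}{h'(x')} .
\end{align}
```

The factor f'(x) cancels because the displacement and the reference velocity are transformed in the same way, which is the one-dimensional counterpart of the statement that each particle and the collection's intrinsic coordinate system are rotated together.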