The instant invention pertains to pattern classification and categorical perception of real-world sensory phenomena. In the current case, the invention learns a mapping from the patterns found within at least one predetermined set of provided inputs, including but not limited to sensory observations, measurements resulting from various measurement systems, simulated or processed signals resulting from various models, and/or compilations of the above inputs, to at least one invariant perception, which may then be given a name, or label, among discrete categories. In general, such a problem may be challenging to solve, or even to advance toward a rational set of acceptable solutions, since the available sets of inputs (labeled below as “correlants”) containing information on systems of interest under observation (identified below by the labels “reality”, “real objects”, “real world”, or simply “world”) may incorporate an unknown measure of random or systemic portions that are not pertinent or correlated to the features of interest of the observed reality. It may already be part of practitioners' experience that variable, irreducible portions of error accompany virtually all input sets, reflecting the generally accepted view in the fields of measurement and observation that, in general, the world may be a random and noisy place.
This, inter alia, may make it hard for a system or a device arranged to perform at least one world-related task or to acquire, store, and exchange world-pertinent information (indicated by “a machine”) to detect and classify an event or an object (for example, to identify a person's face even when it is viewed from different angles). More particularly, a machine arranged to detect patterns in the world-pertinent information and use them for subsequent classifications is indicated by the designation “a pattern machine”. Nevertheless, even more complex machine-executable tasks, such as recognizing that distinct objects share certain common spatial-spectral features and can be so classified (e.g. as bike, car, truck, or plane), or determining that distinct sounds share certain temporal-spectral features and can be so classified (e.g. as words, phrases, and more complex speech patterns), are desirable in a plurality of applications subject to current research activities or even prototypical test implementations.
Generally, many of the above machine-executable tasks, if taken separately, can be treated as a many-to-one mapping, which may itself represent a complex problem to solve. The present focus, however, is on the even more challenging problem of learning a many-to-many mapping from a sensory continuum of any number of sensor modalities to a discrete space of labels of any fixed size. A prior art approach to similar tasks based on a “mixture of experts”, where each many-to-one subproblem is trained separately and the results are then combined linearly to solve the large many-to-many mapping, is not part of the current invention. Such an approach may be ill-suited for several reasons: it would fail to recognize and reuse the recurring patterns that many distinct objects or observations share; it may therefore not be efficient enough (neither statistically nor computationally) to scale up to increasingly complex, real-world problems; and it may not allow pooling of evidence to either support or refute competing hypotheses about the perceived invariance. The last point may be very significant, as pooling of evidence may enable reasoning under increased uncertainty, which may be done consistently and with optimized expected error within a Bayesian framework. Thus, the present invention approaches this problem using well-known Bayesian statistical inference, with the help of well-defined, newly developed tools from information theory, probability theory, and the theory of fixed points, combined in a novel way to solve this invariant mapping problem.
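By way of illustration only, the pooling of evidence for competing hypotheses within a Bayesian framework may be sketched as follows. This is a minimal sketch and not the method of the invention: the function name pool_evidence, the label set, and all numeric values are hypothetical, and the sketch assumes that the observations are conditionally independent given the label, an assumption the present description does not make.

```python
def pool_evidence(prior, likelihoods):
    """Return the posterior over labels after pooling all observations.

    prior:       list of P(label), one entry per label.
    likelihoods: list of observations; each is a list of P(obs | label).
    ASSUMPTION: observations are conditionally independent given the label.
    """
    posterior = list(prior)
    for lik in likelihoods:
        # Bayes' rule: multiply the current belief by the likelihood, then renormalize.
        unnormalized = [p * l for p, l in zip(posterior, lik)]
        total = sum(unnormalized)
        if total == 0.0:
            raise ValueError("observation has zero probability under every label")
        posterior = [u / total for u in unnormalized]
    return posterior


labels = ["bike", "car", "truck"]   # hypothetical label set
prior = [1 / 3, 1 / 3, 1 / 3]       # no initial preference among labels
observations = [
    [0.6, 0.3, 0.1],                # hypothetical P(first cue | label)
    [0.5, 0.4, 0.1],                # hypothetical P(second cue | label)
]
posterior = pool_evidence(prior, observations)
best_label = labels[posterior.index(max(posterior))]
```

In this sketch each observation either supports or weakens every hypothesis at once, so that evidence from different cues accumulates in a single posterior rather than being held in separately trained experts.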
Therefore, the current invention realizes an original paradigm for semi-supervised categorical perception as an invariance learning pattern machine. The new paradigm is novel, inter alia, in how it combines ensemble learning (also known as variational Bayesian inference, or variational Bayes) with reinforcement learning in a dynamic Bayesian network. Ensemble learning is a family of algorithms for approximating the solution of fully Bayesian network models where the integrals involved are intractable. Ensemble learning methods are approximate, but they provide a lower bound on the marginal likelihood that is multiplied with the prior to form the posterior used for prediction, PY. This allows the normalization or weighting of several hypothesized models for the purposes of model selection, which is then naturally built into the model.
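By way of illustration only, the lower-bound property of variational methods may be checked numerically on a toy model with a discrete latent variable, for which the exact marginal likelihood is still computable. This is a minimal sketch under stated assumptions and not the algorithm of the invention: all function names and numeric values are hypothetical, and the weighting of models assumes equal prior probability for each hypothesized model.

```python
import math


def log_evidence(prior, likelihood):
    """Exact log marginal likelihood: log sum_z p(z) * p(x | z)."""
    return math.log(sum(p * l for p, l in zip(prior, likelihood)))


def elbo(q, prior, likelihood):
    """Variational lower bound: sum_z q(z) * [log p(z) + log p(x | z) - log q(z)]."""
    return sum(
        qz * (math.log(p) + math.log(l) - math.log(qz))
        for qz, p, l in zip(q, prior, likelihood)
        if qz > 0.0
    )


def exact_posterior(prior, likelihood):
    """Exact p(z | x); the bound is tight when q equals this distribution."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def model_weights(log_bounds):
    """Normalize per-model log bounds into weights.

    ASSUMPTION: every hypothesized model has the same prior probability.
    """
    m = max(log_bounds)  # subtract the maximum for numerical stability
    exps = [math.exp(b - m) for b in log_bounds]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy numbers for one observed x and a three-state latent z.
prior = [0.5, 0.3, 0.2]
likelihood = [0.10, 0.40, 0.05]

log_px = log_evidence(prior, likelihood)
bound_rough = elbo([1 / 3, 1 / 3, 1 / 3], prior, likelihood)
bound_tight = elbo(exact_posterior(prior, likelihood), prior, likelihood)

# Hypothetical log bounds obtained for three competing models.
weights = model_weights([-2.1, -3.0, -2.4])
```

In this sketch any approximating distribution q yields a value no greater than the exact log marginal likelihood, and the bound becomes tight when q equals the exact posterior. Normalizing the bounds of several hypothesized models then gives one possible form of the model weighting mentioned above.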
The structure of the dynamic Bayesian network is also novel, and may be enabling for capturing the multiscale, self-similar structure of features typically found in real-world phenomena. Also, it is understood that the current invention may represent a step in the direction of achieving a universal pattern machine, which may be similar in structure to, and may execute processes that approximate and may be compared to those performed by, a neocortex portion of the human brain.
In contrast with the current invention, one problem with most known artificial intelligence (AI) and machine learning (ML) solutions of the prior art is that learning is usually based on strict assumptions about the problem, with algorithms built from overly rigid, non-adaptive rules for mapping prearranged information extracted from input signals (correlants) to desired output responses (classification targets). In addition, there are usually only two types of AI and/or ML solutions: supervised and unsupervised. The former requires that data be labeled with their corresponding targets, and such labels may be hard to obtain. Training is therefore usually limited, which may lead to insufficient performance. Moreover, such solutions may be too inflexible when given novel data that, in the real world, have non-stationary statistics, may be very noisy, and may tend to violate simplifying assumptions. So, again, the solution may perform inadequately, in part because it may fail to adapt to an uncertain and time-dependent environment. On the other hand, unsupervised learning solutions may not require labeled data, but their applicability may be limited to data density estimation and data clustering, which form a relatively limited part of a larger pattern classification solution, as opposed to providing a robust solution by themselves. While these diametric solutions may be successful on certain problems for which each may be customized, none of them merits the designation “pattern machine” in the sense indicated above. Many of them may have shortcomings that may prevent success on the complex problem of categorical perception. Thus, at least some embodiments of the machine of the current invention are conceptualized and arranged to be examples of a pattern machine for solving categorical perception problems.
AI and/or ML prior art has traditionally been based on pre-formulated rules that frequently lack the flexibility necessary to learn and predict satisfactorily under dynamic conditions. The relevant problems may be inherently non-stationary, as the world is a random place. Furthermore, it may also be inherent in a structured world that the rules may change, evolve, or morph. A pattern machine can perform pattern classification by taking cues from the hierarchical structure of spatiotemporal features and patterns of correlants. The multiscale structure of correlants may have both sequential and coincidental character in time. That is, information may be embedded and conveyed in both space and time simultaneously, with and without redundancy. So, some embodiments of the current invention are structured such that one dimension or scale may not be favored over another when extracting any or all information. At least in part because of these requirements, many embodiments of the current invention extract and process information both simultaneously and sequentially in space and time, all in a concerted effort to correlate the extracted information to invariant patterns. At least with respect to these features, the practices and structures of the known prior art do not treat such problems as embodiments of the present invention do.