1. Field of the Invention
The present invention relates generally to the field of data processing and, more particularly, to a method for hypothesis generation and verification (HyGV-method), allowing for intelligent decision-making, image and sequence recognition and machine-learning.
2. Description of the Background
2.1. Hypothesis Testing in Mathematical Statistics
The subject of this invention pertains to such a vast area of human cognitive activity that an exhaustive analysis of its background would require an analytical description of the state of the art in too diverse a domain, including many of the humanities and exact sciences. The bulk of the work in this area deals with statistical hypothesis testing. Regrettably, there has been much less interest in the testing of truly scientific hypotheses by applying the approaches proposed by scientists who either attempted to tackle the problem from supra-scientific (i.e. philosophical) positions (for instance, K. R. Popper's theory of hypothesis falsification) or encountered statistical analysis problems of such complexity (for instance, in biology, and especially in ecology) that primitive mathematical approximation could be of no use in understanding the principles of interrelation between the variables that determine the real diversity of objects and phenomena.
The foundation of modern hypothesis testing, the central issue of modern mathematical statistics, was laid by R. A. Fisher; later works by J. Neyman and E. S. Pearson introduced certain modifications that gave it its generally accepted form. The probabilistic principles of setting forth and testing a hypothesis have been described in numerous works and constitute an essential part of modern inferential statistics. In a nutshell (i.e. setting aside the abundance of methods, approaches, techniques, and interpretations of mathematical statistics presented in numerous textbooks and used as the basis for numerous statistical software products), hypothesis testing is all about comparison between a null hypothesis and an alternative hypothesis. The former is to reflect the absence of differences between population parameters, whereas the latter is to state the opposite. An alternative hypothesis is accepted if and when the null is rejected. The two hypotheses are compared based on normal curves of probability distributions, and, therefore, neither of them can be conclusively proven or rejected, but one is eventually stated to be more probable based on its higher degree of probability.
It would be hard to put it better than D. H. Johnson did in Hypothesis Testing: Statistics as Pseudoscience (presented at the Fifth Annual Conference of the Wildlife Society, Buffalo, N.Y., 26 Sep. 1998; published electronically on www.npwrc.usgs.gov), “I contend that the general acceptance of statistical hypothesis testing is one of the most unfortunate aspects of 20th century applied science. Tests for the identity of population distributions, for equality of treatment needs, for presence of interactions, for the nullity of a correlation coefficient, and so on, have been responsible for much bad science, much lazy science, much silly science. A good scientist can manage with, and will not be misled by, parameter estimates and their associated standard errors or confidence limits. A theory dealing with the statistical behavior of populations should be supported by rational argument as well as data. In such cases, accurate statistical evaluation of the data is hindered by null hypothesis testing. The scientist must always give due thought to the statistical analysis, but must never let statistical analysis be a substitute for thinking! If instead of developing theories, a researcher is involved in such practical issues as selecting the best treatment(s), then the researcher is probably confronting a complex decision problem involving inter alia economic considerations. Once again, analyses such as null hypothesis testing and multiple comparison procedures are of no benefit.”
Statistical hypothesis testing has heretofore been viewed as the only scientific approach to information processing. It determines both the process of data processing and, to a greater extent, the approach to setting up experiments and selecting data (probability/nonprobability sampling). However, as mentioned by Anderson et al. (Anderson, D. R., Burnham, K. P., and Thompson, W. L. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64(4): 912–923), “over 300 references now exist in the scientific literature that warn of the limitations of statistical null hypothesis testing”. The number of such works increased exponentially from the 1940s to the 1990s.
This invention provides a method for hypothesis generation and verification (HyGV) that involves principles fundamentally different from those employed in statistical hypothesis testing methods; it is free from the flaws of probabilistic approaches, can be applied in the processing of any type of information, and is exceptionally simple to use. The method is based on the principle of the information thyristor designed by us and described in the Detailed Description of this invention.
2.2. Hypothesis Generation and Verification
Hypothesis generation and verification is the basis of logical thinking and of well-grounded decision making. “Decision making” is one of the most frequently occurring terms in AI. Unless an explanation of what exactly is implied by it is provided on each occasion of its use, its general meaning is as fuzzy as it gets, up to a total lack of meaning. If what is meant by the term is an independent, adequate and reproducible response to a change in a set of alternative courses of action, then any reliable measuring instrument (hyperbolically speaking, even a thermometer) should qualify as a decision-making method and apparatus. Or, if the implied responsibilities involve relieving the operator (a human being) of the necessity of screening and discarding false or unverified information, and advising a human being on how to act in a particular situation, then it is more of a reference book. If a decision-making system is supposed to be a “quick learner”, then the question arises: what is it to learn, and how? If the instruction or training is to be provided by a human instructor, then such a device cannot be an independent thinker or decision-maker: it will remain a thermometer, however sophisticated it may be, or a reference book, however regularly it is updated. There is no learning part in such “training”. One and the same decision applied to particular situations that are the same in general but differ in slight yet important details may result in opposite outcomes, and, therefore, the user's failure to provide proper control over a “decision-making” artificial assistant may end poorly for the user.
Acceptance of a decision is based on acceptance of a hypothesis that provides an explanation regarding a certain phenomenon of an event, object, or person, as well as non-antagonistic alternatives thereof. A hypothesis is a verifiable statement which may include predictions. A prediction is nothing else but a continuum of analogs of a given phenomenon—even if the latter in reality may be a unique one. Value of a prediction depends on how correctly it can rank those analogs in accordance with the probability of their occurrence depending on circumstances.
Decision making involves several different stages, including the following most important ones:
(1) recognition and understanding of a problem on which a decision has to be made, or formulation of an objective of a decision-making task;
(2) hypothesis generation, i.e. construction of a series of variants of potentially applicable decisions supposedly including an optimal one;
(3) search for information that may be used for hypothesis verification;
(4) hypothesis verification.
As this invention provides a method and system for unsupervised hypothesis generation and verification, in this context it is important to elaborate on the matter of which of the stages of the computerized decision-making process can in principle be implemented as an unsupervised operation. Such an analysis of the decision-making stages will facilitate the generation of a hypothesis on the issue of why computers are still unable to make decisions on their own, and whether there may be any solutions for this problem.
We will start with the last of the aforesaid basic stages of decision making, i.e. hypothesis verification. There exist many different viewpoints regarding this part of the decision-making process: for instance, Popper's opinion that it is all about creative intuition which cannot be governed by logic, or the polar opposite view of a hypothesis as an expression of the relationship between two (or more) variables (McGuire, W. J. (1997) Creative hypothesis generating in psychology: some useful heuristics. Annual Review of Psychology, v. 48, pp. 1–30). If hypothesis verification can be reduced to a comparison of the values of different variables, then this task is well within a computer's competence. The same is true for the third stage, information search, which is by definition the area where computers outperform humans in speed and efficiency. The second stage, hypothesis generation, is closely connected with the first step in decision making, i.e. the understanding and formulation of an objective, and is therefore extremely difficult to implement on a computer. Nevertheless, there are many factors that support the feasibility of that task. See, for example, McGuire's discussion of creative hypothesis generation on strategic and tactical levels and the description of 49 heuristics, including 5 types and 13 subtypes, that are used by psychologists and can be taught (Ibid.). In the following disclosure of this invention, we will show that not only is computer-based hypothesizing possible, but it is also possible to develop a computer-implemented imitation of the approaches used in the human way of thinking. As far as the first stage of the decision-making process is concerned, it involves a unique function that can be performed only by humans and (at least as of today) not by computers. Apparently, this is the key aspect that has to be explored before taking up the challenge of the “thinking computer” idea.
One of the many issues involved in this problem is pivotal in the context of this invention, which, in its turn, has been conceived as a logical result of the developments presented in the related patent and copending application.
As is well known, different individuals can (and, more often than not, do) make different decisions regarding one and the same situation. When two experts express different or even opposite opinions on the same matter, a person seeking an expert opinion and familiar with the individual style of each expert will only gain from the obtained results. For instance, one of the experts may be too conservative and cautious in judgments, whereas the other may be overly categorical. A common feature of both of them is individuality, i.e. each of them has a unique and specific psycho-physiological response, philosophical view of the phenomena under study, preferences in logical approaches, etc., all of which can be taken into account and used in making a final decision. Even a bad expert's opinion may prove no less useful if that expert's style is reproducible. By contrast, a computer does not have individuality, and its “brain”, the software, is a composite product of humankind developed by large groups of programmers.
Individuality or “ego” can be interpreted in different ways. For instance, computers manufactured by exactly the same technology may still have slight differences, each its own, and, therefore, can be viewed as “individualities” if non-individuality is understood only as sharing exactly the same set of properties. However, there is also another understanding of individuality, as applied, for instance, to a human being who takes a road of his own and is capable of independent thinking and judgment; in the context of individuality it does not matter whether or not the thinking, judgments and decisions are correct. We imply this interpretation of individuality when stating that a computer does not have it.
2.3. What it takes to “Raise” the AI
Many AI terms that have been around for decades still lack clear and explicit definitions of what exactly is implied by a given term, which is not surprising, as the whole domain of AI is about the imitation of something which itself has not yet been fully explored by science. Thus, from its very onset, AI research has been oriented toward the effect rather than the cause, toward the imitation of the brain's unique abilities without an understanding of their nature. And, of course, AI is expected to work independently, i.e. to relieve the human operator of the necessity to control its every step. This cocktail, made of materialism and Cartesian ideas, has been served to several generations of AI students, although everybody in the field understands by now that the modern use of the term “artificial intelligence” is more marketing than science. In general, all that computer science has so far come up with on the issue of imitating human or animal brain processes is a vocabulary. Take, for instance, artificial neural network (ANN) systems after the McCulloch-Pitts model of the neuron, based on an intuitive view of how charge accumulation occurs on a cell membrane and how it influences synapse strengths. Not only readers of popular scientific literature, but also many researchers in artificial neural networks are convinced that an ANN is indeed an imitation of the work of brain neurons. Leaving aside the fact that the whole concept is purely a product of computer programming and mathematics, and that the word “neuron” in this context is just a symbol of a future goal and by no means an assertion of any real achievement, there is yet another problem: even if computer engineering can describe and simulate synapse formation and transmission, how can it describe and simulate what is still unknown to neuroscience, namely how specific information is communicated from one neuron to another?
For all its obvious interest in biological terminology, computer science fails to focus on the really important features of autonomous self-referent biological systems such as the mammalian brain, while it is well known that many of those features play key roles in the functioning of living systems. There is an undeniable truth about human brain activity, and failure to realize or remember that truth inevitably results in failure to fulfill the task of realistically simulating it. That truth is so simple and trivial on the surface that it does not catch the attention of the computer science community, whose hope for the creation of artificial intelligence (be it through a breakthrough in computation speed, an expansion of computer memory, or advances in the art of programming, for instance, products of the artificial neural network concept) never dies. However, it is obvious that there is nothing yet in the computer science field that could give hope for the development of a computer system able to make independent decisions on what is right and what is wrong. Even strong believers in the future of artificial intelligence realize that the computing power of the fifth or sixth generation cannot, by itself, guarantee a breakthrough in the AI field.
The simple and trivial truth, referred to above, consists in the fact that any living system—including, of course, the brain as the most complex domain in the system of the living substance—has a highly cooperative infrastructure. Cognition is a biological phenomenon, and it can be understood only as such. Consciousness cannot be explained by merely making a list of all its properties. Metabolic systems of living organisms involve lots of biochemical processes whose performances are ultimately coordinated. Even a small failure in a minor “department” of a metabolic system (“minor” from a biochemist's anthropomorphic viewpoint) may become a debilitating or lethal factor for a system. No computer program attempting to imitate the processes occurring in living organisms and, especially, in the brains, the most complex part of them, can provide for that level of coordination, and it is clear why. The human brain has mysterious properties, and no less mysterious are those of the human body infrastructures that support the brain functioning—for instance, the haematoencephalic barrier whose role is not to allow certain substances that can damage the brain work to penetrate the nerve cells. A computer program that can at any point sustain artificially made commands without a complete loss of its functionality will never be able to imitate the brain properties. Should it ever happen that a computer program with the functionality similar to that of human brain is created, it will consist of a set of algorithms that provides a continuous metabolic cycle with the highest level of cooperation and coordination between its constituent parts.
Complex computer programs are developed in a programming style that to a large extent corresponds to what could be defined by the eclectic notion of “compromise logic”. As software developers' key priority is the achievement of a technical objective rather than the maintenance of a coherent logic, it often happens that, starting from the very early stages of a computer program's development, a unified algorithmic core can no longer be maintained and breaks into a multitude of artificially joined individual algorithms. Execution results provided by individual algorithms are then used, ignored, or rejected, depending on how well they work towards the solution of tactical and strategic tasks in the context of a given computer program. A computer program developed in this way can be compared to music without melody; its individual components often become mutually antagonistic, and to eliminate the antagonism, developers resort to “premature mathematization” (Russell, S. [1997] Rationality and intelligence. Artificial Intelligence Journal, 94 (1–2) 57–77). The latter, while resolving particular local problems, inevitably creates new problems and swiftly fills up the whole space of a program in which the logical continuity of its components is missing. Thus, attempts to cope with the growing complexity of computer programs lead to the creation of more complex programs.
Full cooperation between all of the algorithms of a computer program is an extremely difficult task, and without its implementation, no program can qualify for the role of the brain's artificial counterpart. A truly cooperative system of algorithms does not tolerate commands that are alien to its environment, however important their execution may be in the context of a program's performance or in the view of its designer. Simply put, an algorithm that effectively imitates the brain can be emulated by no other algorithm but itself. In general, this constitutes that truth which is so trivial that it remains simply ignored.
Another simple but important truth, relevant to the efficiency of computer-implemented learning, is that cognition is a product of interaction between deduction and induction. Over two thousand years of experience and knowledge generated by mankind's best thinkers testify to the fact that these two oppositely directed processes underlie the actual process of cognition. However intensely this issue has been investigated throughout the past centuries, we have yet to understand how these two fundamental mechanisms interact in the brain. But the fact of the matter is that there is spontaneous interaction between deduction and induction, and they are inseparable.
2.4. Algorithmic Foundation of this Invention
Our research and development in AI, or, using a more correct but less common term, non-biological intelligence, NBI (see more information on the related work on http://www.matrixreasoning.com), has been based on the understanding that no imitation of brain activity is possible without the implementation of the two aforementioned features of the brain: functional cooperation, and organic spontaneity in the relationship between deductive and inductive processes (or, speaking in computer science language, without an algorithmically holistic approach). This ideology led us to the development of a system of interrelated algorithms for the identification, differentiation and classification of objects described in a high-dimensional space of attributes, which has further been used as the underlying methodology in this invention. The said methodology, comprising the evolutionary transformation of similarity matrices (U.S. Pat. No. 6,640,227, October 2003, by L. Andreev) as a new universal and holistic clustering approach that provides a solution to the most complex clustering problems, is based on quite a simple algorithm that can be defined by the commonly known principle of “the golden mean”.
The method for evolutionary transformation of similarity matrices processes each cell of a similarity matrix in one and the same fashion: the similarity coefficient between each pair of objects A and B in a data set is replaced by a mean value, taken over all objects i of the matrix, of the ratios between the similarity of object i to A and the similarity of object i to B, each ratio being taken as the smaller value over the larger. The algorithm of the process of evolutionary transformation of a similarity matrix is based on the following formula:
$$S^{T}_{A,B} = \left( \prod_{i=1}^{n} \frac{\min\left(S^{T-1}_{i(A)},\, S^{T-1}_{i(B)}\right)}{\max\left(S^{T-1}_{i(A)},\, S^{T-1}_{i(B)}\right)} \right)^{1/n}, \qquad (1)$$
where $S^{T}_{A,B}$ is a binary similarity coefficient after transformation No. T; “n” is the number of objects associated with a matrix; A, B, and i are objects associated with a matrix; “min” and “max” mean that the ratio of $S^{T-1}_{i(A)}$ to $S^{T-1}_{i(B)}$ is normalized to 1. The algorithm for such transformation is repetitively applied to a similarity matrix until each of the similarities between objects within each of the clusters reaches 100% and no longer changes. In the end, the process of successive transformations results in convergent evolution of a similarity matrix. First, the least different objects are grouped into sub-clusters; then, major sub-clusters are merged as necessary, and, finally, all objects appear to be distributed among the two main sub-clusters, which automatically ends the process. Similarities between objects within each of the main sub-clusters equal 100%, and similarities between objects of different sub-clusters equal a constant value which is less than 100%.
The entire process of transformation may occur in such a way that while similarities within one sub-cluster reach the value of 100% and stop transforming, the other sub-cluster still continues undergoing convergent changes and takes a considerable number of further transformations (in which the objects of the first sub-cluster are no longer involved). Only after the convergent transformation of the second sub-cluster is complete, i.e. when similarities between its objects reach 100% and similarities between objects of the two sub-clusters are less than 100%, is the entire process of evolutionary transformation of a similarity matrix over. In the described process, there is no alternative to the sub-division of all objects of a data set into two distinct sub-clusters. Any object that may represent a “noise point” for any of the major groups of objects in a data set of any degree of dimensionality gets allocated to one of the sub-clusters.
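The repeated application of formula (1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the stopping tolerance `tol`, the iteration cap, the threshold `eps` used to read off the two sub-clusters, and the sample matrix are all assumptions made for the example, and the input similarities are assumed to be strictly positive so that the min/max ratio is always defined.

```python
def etsm_step(S):
    """Apply formula (1) once to every cell of a square similarity matrix."""
    n = len(S)
    T = [[1.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            product = 1.0
            for i in range(n):
                lo = min(S[i][a], S[i][b])
                hi = max(S[i][a], S[i][b])
                product *= lo / hi  # ratio normalized to at most 1
            T[a][b] = product ** (1.0 / n)  # geometric mean of the n ratios
    return T


def etsm(S, tol=1e-9, max_iter=500):
    """Repeat the transformation until the matrix no longer changes.

    tol and max_iter are illustrative assumptions, not values from the source.
    """
    n = len(S)
    for _ in range(max_iter):
        T = etsm_step(S)
        change = max(abs(T[a][b] - S[a][b]) for a in range(n) for b in range(n))
        S = T
        if change < tol:
            break
    return S


def split_clusters(S, eps=1e-6):
    """Read the two sub-clusters off a converged matrix (hypothetical helper)."""
    first = [k for k in range(len(S)) if S[0][k] > 1.0 - eps]
    second = [k for k in range(len(S)) if k not in first]
    return first, second


# Hypothetical input: objects 0 and 1 resemble each other, as do 2 and 3.
initial = [
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.15, 0.20],
    [0.20, 0.15, 1.00, 0.80],
    [0.10, 0.20, 0.80, 1.00],
]
final = etsm(initial)
clusters = split_clusters(final)
```

For this sample matrix the within-group similarities rise toward 1 while the between-group similarities settle on a common value below 1, which is the end state the text describes.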
Conversely, the above described convergent evolution may also be represented as divergent evolution and reflected in the form of a hierarchical tree. However, the mechanism of the algorithm for evolutionary transformation involves the most organic combination of convergent and divergent evolution (or deduction and induction based on input information about the objects under analysis). For that purpose, each of the sub-clusters formed upon completion of the first cycle of transformation is individually subjected to transformation, which results in its division into two further sub-clusters, as described above; then, each of the newly formed four sub-clusters undergoes a new transformation, and so on. This process, referred to as ‘transformation-division-transformation’ (or TDT), provides for the most rational combination of the convergent (transformation) and divergent (division) forms of the evolution process, as a result of which an entire database undergoes multiple processing through a number of processes going in opposite directions. The said combination of processes is not regulated and is fully automated, autonomous and unsupervised; it depends on and is determined only by the properties of the target similarity matrix under analysis, i.e. by the input data and the technique applied to compute similarity-dissimilarity matrices. In other words, the ETSM algorithm is based on “uncompromising” logic that cannot be manipulated by arbitrarily introduced commands, which means that the efficiency of the ETSM-method greatly depends on how adequate and scientifically well-grounded the techniques used in the presentation of input data (i.e. the computation of similarity matrices) are. Thus, for the evolutionary transformation method to be independent of the operator's will and truly unsupervised, the similarity matrix computation must be based on a procedure that does not depend on the type of input data.
Some of the approaches applied in many widely used applications for the purpose of establishing the similarity-dissimilarity of objects described in a high-dimensional space of attributes clearly represent a forced solution used for lack of proper techniques and are simply nonsensical. For instance, there is the widely known notion of the “curse of dimensionality”, which refers to a dramatic dependency of the parameterization of distances between attributes on their dimensionality. Understandably, this dependency increases catastrophically in a super-space, resulting in a situation where the most that can be done about similarities-dissimilarities is the standardization of conditions for comparison of similarities on the presumption that “objects in a set have otherwise equal status”, which by definition cannot be considered an acceptable methodological platform. For instance, it is customary to use Euclidean distances to determine similarities between objects as vectors in n-dimensional spaces of parameters even if they are described by different dimensions, despite the elementary truth that this is grossly unscientific. This inadmissible compromise further creates a multitude of problems, from the “curse of dimensionality” up to the necessity of entering special constraints for a computer program to avoid the use of Euclidean distances where it is absurd.
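The transformation-division-transformation cycle can be sketched as a recursion over the original similarity matrix. The text does not state when the recursion stops or which similarity values a sub-cluster is re-transformed with, so the sketch below makes two labeled assumptions: each sub-cluster is re-transformed from its original (untransformed) similarities, and division stops when a group has a single object or can no longer be split. All names, tolerances, and the sample matrix are hypothetical.

```python
def transform(S, tol=1e-9, max_iter=500):
    """Iterate formula (1) until the matrix stops changing (assumed tolerance)."""
    n = len(S)
    for _ in range(max_iter):
        T = [[1.0] * n for _ in range(n)]
        for a in range(n):
            for b in range(n):
                p = 1.0
                for i in range(n):
                    p *= min(S[i][a], S[i][b]) / max(S[i][a], S[i][b])
                T[a][b] = p ** (1.0 / n)
        change = max(abs(T[a][b] - S[a][b]) for a in range(n) for b in range(n))
        S = T
        if change < tol:
            break
    return S


def tdt(S, members=None, eps=1e-6):
    """Return a nested list (hierarchical tree) of object indices."""
    if members is None:
        members = list(range(len(S)))
    if len(members) < 2:
        return members  # a single object cannot be divided further
    # Assumption: re-transform the sub-cluster from its ORIGINAL similarities.
    sub = [[S[a][b] for b in members] for a in members]
    final = transform(sub)
    # Division: objects that reached ~100% similarity with the first member.
    left = [m for k, m in enumerate(members) if final[0][k] > 1.0 - eps]
    right = [m for m in members if m not in left]
    if not right:
        return members  # all objects identical: nothing to divide
    return [tdt(S, left, eps), tdt(S, right, eps)]


# Hypothetical input with two obvious groups, {0, 1} and {2, 3}.
similarities = [
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.15, 0.20],
    [0.20, 0.15, 1.00, 0.80],
    [0.10, 0.20, 0.80, 1.00],
]
tree = tdt(similarities)
```

The nested list plays the role of the hierarchical tree mentioned above: each level of nesting corresponds to one division step.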
In the meantime, there is quite a simple solution that effectively and completely takes care of the problem of unsupervised automated computation of similarity matrices for objects described by any number of parameters. The solution, described by us in a copending patent application entitled “High-dimensional data clustering with the use of hybrid similarity matrices”, consists in the following. So-called monomer similarity matrices are computed for a given set of objects, one for each of the parameters describing the set, after which the monomer matrices (whose total number corresponds to the total number of parameters) are hybridized. If we have a set of monomer similarity matrices (M) where each of the matrices is calculated based on one of the parameters, i.e.
$$M(a), \quad a \in \{1, 2, \ldots, n\}, \qquad (2)$$
then hybridization of the matrices is performed by the formula:
$$H_{ij} = \left( \prod_{a=1}^{n} M(a)_{ij} \right)^{1/n}, \qquad (3)$$
where $H_{ij}$ is a value of hybrid similarity between objects i and j. Thus, the computations in both the ETSM algorithm and the above-referred procedure for preparing hybrid similarity matrices for the ETSM-method are based on a simple operation of calculation of mean values. Hybridization of matrices leads to the natural fusion of object patterns in terms of their variables' values. Clearly, hybridization can be done on similarity matrices that have been computed based on any type of attributes (categorical, binary, or numerical). Since attributes converted into units of a monomer similarity matrix no longer have any dimensionality, the above referred procedure for hybridization of monomer similarity matrices can be used as a methodological basis for comparison of attributes of any kind and nature.
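Formula (3) is a cell-by-cell geometric mean of the monomer matrices, which can be sketched in a few lines. The function name `hybridize` and the two sample matrices are assumptions made for illustration only.

```python
import math


def hybridize(monomer_matrices):
    """Cell-wise geometric mean of n monomer similarity matrices, per formula (3)."""
    n_params = len(monomer_matrices)
    size = len(monomer_matrices[0])
    hybrid = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            product = math.prod(m[i][j] for m in monomer_matrices)
            hybrid[i][j] = product ** (1.0 / n_params)
    return hybrid


# Hypothetical monomer matrices for two objects described by two parameters.
M1 = [[1.0, 0.25], [0.25, 1.0]]  # objects differ strongly on parameter 1
M2 = [[1.0, 1.00], [1.00, 1.0]]  # objects are identical on parameter 2
H = hybridize([M1, M2])
# H[0][1] is the square root of 0.25 * 1.0, i.e. 0.5
```

Because every monomer cell is already a dimensionless similarity, the same function applies regardless of whether the underlying attribute was categorical, binary, or numerical.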
As a result of the development of the technique for hybridization of monomer similarity matrices, it has become possible to add to a hybrid matrix any number of copies of individual parameters, and thus to determine the weights of individual parameters in the totality of all parameters that describe a given set of objects. The parameter multiplication method described in a copending application by L. Andreev (“High-dimensional data clustering with the use of hybrid similarity matrices”) has provided the grounds for the method of this invention.
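One way to read parameter multiplication, assuming that adding copies of a parameter simply repeats its monomer matrix in the product of formula (3), is as a weighted geometric mean. The symbols $k_a$ (number of copies of parameter a) and $K$ (total number of matrices after copying) are introduced here for illustration and do not appear in the source:

```latex
H_{ij} = \left( \prod_{a=1}^{n} \bigl( M(a)_{ij} \bigr)^{k_a} \right)^{1/K},
\qquad K = \sum_{a=1}^{n} k_a .
```

With all $k_a = 1$ this reduces to formula (3); increasing $k_a$ for one parameter raises its exponent $k_a / K$ and hence its weight in the hybrid similarity.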
The final issue that ought to be discussed in the context of the background of this invention is the technique for monomer similarity matrix computation. As monomer matrix computation is based on a single parameter, a Euclidean distance, for instance, automatically reduces to the city-block metric. The copending application by L. Andreev “High-dimensional data clustering with the use of hybrid similarity matrices” provides two types of metrics to be used in the computation of monomer similarity matrices: the R- and XR-metrics. The R-metric (“R” for “ratio”) is calculated by the formula:
$$R_{ij} = \frac{\min(V_i, V_j)}{\max(V_i, V_j)}, \qquad (4)$$
where $V_i$ and $V_j$ are values of parameter V for objects i and j. Here, similarity values are calculated as the ratio of the lower value to the higher value of a parameter of each of the two objects. Thus, values of the R similarity coefficient vary from 0 to 1.
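A monomer similarity matrix under the R-metric of formula (4) can be sketched as below. The function name, the sample values, and the handling of edge cases are assumptions: the sketch requires non-negative parameter values and treats two zero values as identical, neither of which is specified in the source.

```python
def r_metric_matrix(values):
    """Monomer similarity matrix with R_ij = min(V_i, V_j) / max(V_i, V_j)."""
    if any(v < 0 for v in values):
        # Assumption: the ratio is only meaningful for non-negative values.
        raise ValueError("R-metric sketch expects non-negative values")
    n = len(values)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lo = min(values[i], values[j])
            hi = max(values[i], values[j])
            # Assumption: two zero values count as fully similar.
            matrix[i][j] = 1.0 if hi == 0 else lo / hi
    return matrix


# Hypothetical intensity-type parameter, e.g. concentrations of three samples.
concentrations = [2.0, 4.0, 8.0]
R = r_metric_matrix(concentrations)
# R[0][1] = 2/4 = 0.5, R[0][2] = 2/8 = 0.25, R[1][2] = 4/8 = 0.5
```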
The XR-metric (“XR” stands for “exponential ratio”) is calculated by the formula:
$$XR_{ij} = B^{-|V_i - V_j|}, \qquad (5)$$
where $V_i$ and $V_j$ are values of parameter V for objects i and j, and B (which stands for “base”) is a constant higher than 1. Values of the XR similarity coefficient also vary from 0 to 1.
The R-metric is optimal for the description of truly or quasi-equilibrium systems where attributes reflect a signal strength, concentration, power, or other intensiveness characteristics. The XR-metric is optimal for the description of non-equilibrium systems where attributes reflect a system shape for operations in spatial databases, a distance between individual points within a system, or other extensiveness characteristics.
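The XR-metric can be sketched in the same way, reading formula (5) as the base B raised to the negative absolute difference of the two values. The function name, the default base of 2.0, and the sample positions are assumptions for illustration; the source only requires that B be greater than 1.

```python
def xr_metric_matrix(values, base=2.0):
    """Monomer similarity matrix with XR_ij = B ** (-|V_i - V_j|)."""
    if base <= 1:
        raise ValueError("base B must be greater than 1")
    n = len(values)
    return [
        [base ** -abs(values[i] - values[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical extensive-type parameter, e.g. positions along one axis.
positions = [0.0, 1.0, 3.0]
XR = xr_metric_matrix(positions, base=2.0)
# XR[0][1] = 2 ** -1 = 0.5, XR[0][2] = 2 ** -3 = 0.125, XR[1][2] = 2 ** -2 = 0.25
```

Unlike the R-metric, this similarity depends only on the difference between the two values, so shifting all values by a constant leaves the matrix unchanged.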