One of the principal goals of the general field of Artificial Intelligence (AI) is to create and program computers and machines to exhibit human intelligence. Despite tremendous technological advances in raw processing power, leading to the ability to perform calculations at speeds far surpassing those achievable by any human (e.g., on the order of trillions of operations per second), in the storage and retrieval of vast amounts of information, and in other areas, many argue that true AI has not yet been achieved. It is widely recognized that a principal reason for this is that computers operate only according to specific sequences of steps, called algorithms, to perform well-defined tasks, and the programming of such algorithms to mimic human cognitive functions has proven to be extremely difficult. Even the simplest of tasks for humans, such as the recognition of objects and features in images, the recognition of speech, and the processing of natural language, have turned out to be enormously difficult to implement in machines, as it was, and to some extent remains, unclear how to write very large sets of rules covering the many possible variations, ambiguities, and special cases prevalent in everyday human perception, communication, and processing of information.
To address shortcomings of traditional algorithmic programming, so-called artificial neural networks (ANNs) have been developed. ANNs employ different paradigms of computation in an effort to solve problems such as those mentioned above. ANNs consist of multiple layers of simple computing elements, loosely resembling and inspired by biological neurons, each of which receives inputs from multiple other such elements. The multitude of connections between individual elements, as well as between different layers of elements, constitutes the primary mechanism of computation, with inputs to initial layers being propagated and back-propagated amongst intermediate, hidden layers, and outputs being produced from a final layer.
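The layered computation described above can be illustrated with a minimal sketch of a forward pass. The layer sizes, the sigmoid activation, and the names `make_layer` and `forward` are assumptions chosen for illustration; they are not taken from any particular ANN implementation, and the weights here are random rather than trained.

```python
import math
import random


def sigmoid(x):
    """Squash a weighted sum into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_inputs, n_units, rng):
    """Create one layer: each unit holds one weight per input plus a bias."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]


def forward(layers, inputs):
    """Propagate inputs through each layer in turn and return the outputs."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            # Weighted sum over all incoming connections, plus the bias term.
            sigmoid(sum(w * a for w, a in zip(unit[:-1], activations)) + unit[-1])
            for unit in layer
        ]
    return activations


rng = random.Random(0)
# Hypothetical shape: 3 inputs, two hidden layers of 4 units, 2 outputs.
network = [make_layer(3, 4, rng), make_layer(4, 4, rng), make_layer(4, 2, rng)]
outputs = forward(network, [0.5, -0.2, 0.1])
```

Each element of `outputs` depends on every input through the connections of the intermediate layers, which is the sense in which the connections themselves carry the computation.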
Deep learning networks are a special class of ANNs with large numbers (tens, hundreds, or even thousands) of intermediate, hidden layers. Such deep learning networks have been trained to perform tasks such as autonomous control of self-driving cars (under certain conditions), recognition of objects and faces in images and videos, automated translation of text between different languages, detection of sentiment in short sequences of text, automated summarization of text, advanced game-playing in complex games such as Chess and Go (where they have reached or surpassed the abilities of top human players), and others. The multitude of such efforts, and the expenses associated therewith, illustrates the present commitment to the field of AI, with some believing that automated devices may eventually reach, or even surpass, human intelligence.
Programming ANNs involves a learning, or training, phase in which a given network is provided with sets of data. In supervised learning, an ANN is presented with known inputs that yield known outputs, which known outputs are compared to the outputs of the network being trained. Differences between the outputs of a network being trained and the known, desired outputs, called error, are computed and used to propagate small changes in values (or weights) associated with connections within the network. This process is successively repeated, with the intention of always decreasing the magnitude of the error in each iteration. The training stops when the magnitude of the error drops below a specified threshold.
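The supervised training loop described above can be sketched for the simplest possible case: a single linear unit with one weight and one bias, trained by gradient descent on the mean squared error. This is a simplification rather than a multi-layer network; the function name `train`, the learning rate, the threshold, and the sample data are all assumptions made for illustration.

```python
def train(samples, learning_rate=0.05, threshold=1e-4, max_epochs=10_000):
    """Fit y = w * x + b to (input, known output) pairs by gradient descent."""
    w, b = 0.0, 0.0
    error = float("inf")
    for _ in range(max_epochs):
        # Difference between the unit's output and the known, desired output.
        residuals = [(w * x + b) - y for x, y in samples]
        error = sum(r * r for r in residuals) / len(samples)
        if error < threshold:
            break  # Training stops once the error drops below the threshold.
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2 * sum(r * x for r, (x, _) in zip(residuals, samples)) / len(samples)
        grad_b = 2 * sum(residuals) / len(samples)
        # Small changes to the weights, in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, error


# Known inputs paired with known outputs generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, error = train(samples)
```

In this sketch the learned `w` and `b` approach 2 and 1 respectively. The `max_epochs` cap is a practical safeguard added here, since in general the error is not guaranteed to fall below a chosen threshold.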
In unsupervised learning, error is computed automatically, without any advance knowledge of desired outputs for known inputs. One example of a system that operates using what is generally considered an unsupervised learning method is word2vec. Word2vec can be used to produce word embeddings; that is, embeddings of words from large bodies of unstructured text into vector spaces of real numbers of bounded dimensionality, typically in the hundreds of dimensions. It is generally considered to employ unsupervised learning as it does not require a labeled training set. Instead, the required relationships are automatically computed from very large bodies of text. According to the distributional hypothesis, words are assumed to be similar and related if they appear in similar contexts. See Harris, Z. S., Distributional structure, Word, 10(2-3), 146-162 (1954). Word2vec assigns each unique word from a corpus of unstructured text to a corresponding vector and positions those vectors relative to one another in the vector space so that words sharing common contexts are located close to one another. Viewed differently, starting from randomly initialized vectors for the subject words, word2vec performs an optimization so as to minimize errors (the distances between words as represented in the vector space), with the result that distances between the vectors representing the words are closely proportional to ratios of relative co-occurrences of the words in similar contexts. Word2vec, and a related method called phrase2vec, thus allow seemingly syntactically different and unrelated words and phrases to be grouped within vector spaces.
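The distributional hypothesis can be illustrated without word2vec itself. The sketch below is not the word2vec algorithm: it simply counts the neighboring words of each word in a toy corpus and compares the resulting count vectors by cosine similarity. The corpus, the window size, and the function names `context_vectors` and `cosine` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Map each word to counts of the words appearing within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Toy corpus: "cat" and "dog" never co-occur, but they share the same contexts.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cars",
]
vectors = context_vectors(corpus)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_milk = cosine(vectors["cat"], vectors["milk"])
```

Here `cat_dog` exceeds `cat_milk`: "cat" and "dog" are judged similar because they appear in similar contexts, even though no label ever says so. Word2vec reaches a comparable grouping by optimizing dense, lower-dimensional vectors rather than by raw counting.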
Notwithstanding achievements such as those described above, the current state of the art in AI still falls far short of the ultimate goal of mimicking human intelligence because it lacks one of its principal aspects and capabilities: that of reasoning. The ability to reason and infer new knowledge from an existing base of knowledge and observations of the world around us is one of the principal characteristics of human intelligence. Early efforts in AI recognized this fact and focused on tasks such as proving theorems in various formal mathematical theories. Such theories are very well defined as they consist of a limited set of axioms, assumed to be correct a priori, and rules of inference describing how new knowledge is constructed from existing pieces, namely axioms and previously proved theorems. Every new theorem in such systems is justified by constructing a proof, in the form of an abstract tree, with interior nodes labeled by rules of inference and leaves labeled by axioms.
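A proof tree of the kind described above can be sketched as a small data structure with a validity check. The example assumes a toy propositional system whose only rule of inference is modus ponens (from A and A -> B, conclude B); the class `ProofNode`, the helper `implies`, and the axioms are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


def implies(a, b):
    """Represent the formula 'a -> b' as a tuple."""
    return ("->", a, b)


@dataclass
class ProofNode:
    conclusion: object                  # Formula established at this node.
    rule: str = "axiom"                 # Label: "axiom" for leaves, else a rule name.
    premises: list = field(default_factory=list)  # Child subtrees.


def is_valid(node, axioms):
    """Check that leaves are axioms and interior nodes apply modus ponens."""
    if not node.premises:
        return node.rule == "axiom" and node.conclusion in axioms
    if node.rule == "modus_ponens" and len(node.premises) == 2:
        minor, major = node.premises
        follows = major.conclusion == implies(minor.conclusion, node.conclusion)
        return follows and is_valid(minor, axioms) and is_valid(major, axioms)
    return False


axioms = {"p", implies("p", "q"), implies("q", "r")}

# Proof of "r": first derive "q" from p and p -> q, then "r" from q and q -> r.
proof_q = ProofNode("q", "modus_ponens",
                    [ProofNode("p"), ProofNode(implies("p", "q"))])
proof_r = ProofNode("r", "modus_ponens",
                    [proof_q, ProofNode(implies("q", "r"))])
bogus = ProofNode("r", "modus_ponens",
                  [ProofNode("p"), ProofNode(implies("q", "r"))])

valid = is_valid(proof_r, axioms)
invalid = is_valid(bogus, axioms)
```

The tree rooted at `proof_r` is accepted, while `bogus` is rejected because "p" together with "q -> r" does not yield "r" under modus ponens.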
In such systems, new knowledge, in the form of new theorems, is created by finding new valid proof trees. Typically, this involves a search process that goes through a multitude of possible trees, in some heuristic fashion, trying to eliminate presumably fruitless and inefficient paths. The problem is known to be of enormous complexity, such that even the most powerful computers are known to be able to handle only a very small fraction of the possible search space. It is because of this complexity, existing in even the simplest of formal mathematical theories, that such efforts have been greatly reduced.
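The search for new theorems can be sketched, in a deliberately tiny setting, as exhaustive forward chaining: modus ponens is applied repeatedly to everything known so far until nothing new appears. This is an assumption-laden illustration, not a heuristic search over proof trees; the function name `forward_chain`, the tuple encoding of implications, and the axioms are hypothetical.

```python
def forward_chain(axioms, max_rounds=10):
    """Derive new formulas by repeatedly applying modus ponens to known ones."""
    known = set(axioms)
    for _ in range(max_rounds):
        # For every known implication ("->", a, b) whose antecedent a is known,
        # the consequent b becomes a candidate theorem.
        derived = {
            f[2]
            for f in known
            if isinstance(f, tuple) and f[0] == "->" and f[1] in known
        } - known
        if not derived:
            break  # Fixed point reached: no rule application yields anything new.
        known |= derived
    return known - set(axioms)


axioms = {"p", ("->", "p", "q"), ("->", "q", "r"), ("->", "s", "t")}
theorems = forward_chain(axioms)
print(sorted(theorems))  # prints ['q', 'r']
```

With one rule and four axioms the search terminates after a few rounds. In theories with more axioms and rules of inference, the number of candidate derivations examined in each round grows very quickly, which is the complexity problem noted above.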