Publications, journals, scientific data and most other subject matter together form a highly unorganized, fragmented and unstructured repository of data, whose growth rate is ever increasing, both in the number of documents and in the sheer number of sources. As the data grow beyond centralized supervision and control, it becomes more and more difficult to find specific non-trivial information or, if such information is found, to evaluate its reliability. More specifically, no search engine or machine-based algorithmic method is presently available to search for connections between concepts expressed in homogeneous or non-homogeneous papers, where “non-homogeneous” here means “focused on the same phenomena, but studying them from the perspective of different sciences or disciplines”; “non-homogeneous” may also mean relating to different fields of technology or different technical fields. The term “concept” in the context of the present invention may relate to a term, a name, a description, a nomenclature, a denotation, a definition, an item, a pair, a triplet or a chain of words and so forth. In its simplest form, a concept may be a search item that a user inputs into an internet search engine or uses to query a public or private database.
Suitable examples are papers (literature) on human physiology written by biochemists and papers written by physicians. While the former deal with concepts such as proteins, receptors, genes and biochemical processes, the latter mention concepts such as symptoms, clinical tests, diseases, body organs, tissues and drugs. As such, some concepts may be known to the community of biochemists, whereas other concepts may be known only to physicians.
Another example is derived directly from the multitude of financial or economic information created every day about stock markets, companies' performance, market indexes, rating agencies, economic trend reports etc. This information shows a great many direct and indirect relationships, and its highly dynamic nature makes it very difficult to construct patterns. By means of the present invention it is possible to build an interlinked cloud of data that constantly shows emerging patterns and, because of their dynamics, their trends. It is, however, noted that the present disclosure does not relate to the above information as such, but to the technical means that are employed for analysing and searching the information.
Another field of application is geopolitical or social behavioural analysis, where it is important to trace paths connecting behaviours or sentiments (nodes) and to identify unexpected relationships. This could apply to social networks, sociopolitics, telecommunication “call analysis”, contact relationship analysis and others.
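By way of illustration only, tracing a path that connects two nodes can be sketched as a breadth-first search over a graph. The sketch below is not the method of the present disclosure; the function name `find_path` and the contact data are hypothetical and are introduced solely for this example.

```python
from collections import deque


def find_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if none exists.

    `graph` maps each node to a list of directly related nodes.
    """
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])  # each queue entry is a partial path
    while queue:
        path = queue.popleft()
        for neighbour in graph.get(path[-1], ()):
            if neighbour in visited:
                continue
            if neighbour == goal:
                return path + [neighbour]
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None


# Hypothetical contact relationships (illustrative data only).
contacts = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "erin": [],
}

indirect_link = find_path(contacts, "alice", "dave")
no_link = find_path(contacts, "alice", "erin")
```

In this sketch, `indirect_link` exposes a relationship between two nodes that share no direct contact, while `no_link` is `None` because no connecting path exists.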
Several solutions for data search and analysis are already used in many fields, but their limitations are severe. First, in search engines, for example, a machine receives an input consisting of a textual query, whose words are used in an index-based search that returns to the user a ranked list of documents that might contain the required information. Second, in a data mining system, a machine processes an input set of data, looking for patterns and trends by means of statistical analysis, in order to provide figures that can confirm or refute a hypothesis (e.g. a Bayesian approach).
The first approach is widely known, as it is part of the daily experience of billions of Internet users: its main strengths are its wide, almost all-encompassing applicability to the most heterogeneous requests and the great ease of formulating the question. The second is mainly known to professionals (especially in science and business), for whom it is vital to detect contexts, opportunities and risk factors among a huge set of “noisy” data: its main strength is the capability to devise a clear and focused answer to the “question” it has been asked. Yet both of them fall short of providing a meaningful answer when an elaboration of knowledge is required. In more detail:
Search engines generally cannot produce an answer that requires a structured inference, not even in their semantic variants, i.e. they do not perform a knowledge analysis. Rather, they try to provide a human being with the “leads” needed to find the answer unaided: there is no knowledge processing by the machine and, in the end, the user is forced to skim through the results to gauge whether they are really relevant and to read them in full to find the answer.
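The index-based search described above can be sketched, under simplifying assumptions, as an inverted index queried word by word. The function names `build_index` and `search`, the ranking by number of matching query words and the sample documents are hypothetical choices made for this illustration; real search engines use considerably more elaborate ranking.

```python
from collections import Counter, defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return document ids ranked by how many query words they contain."""
    scores = Counter()
    for word in query.lower().split():
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical documents (illustrative data only).
documents = {
    "d1": "receptor binds protein",
    "d2": "drug relieves symptom",
    "d3": "protein receptor gene expression",
}

index = build_index(documents)
ranked = search(index, "protein receptor")
```

The result is only a ranked list of document identifiers. Nothing in the sketch relates the concept “drug” in one document to the concept “receptor” in another, which is the kind of inference that is left to the user.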
Data mining machines show quite the opposite shortcomings, because they do provide answers, but only about very specific subjects and only after the question has been duly expressed in a format suitable for computer processing. Artificial Intelligence (AI) systems that are sometimes used for data mining do work on a knowledge base, but their knowledge representation is static: it is an ontology, i.e. a basic definition of the items the system will work with or of the rules governing those items, that must be provided to the system as an input. The machines can then improve their procedures, learning to work better with the defined items, but they cannot improve their basic knowledge. From this perspective they are still merely operating machines. Moreover, since the ontology is a complex conceptual structure whose links are hardcoded by humans, it must be built manually, and its size and entangled rules often make its maintenance very cumbersome.