The computer and the access to diverse bodies of information via the Internet have opened a tremendous space of possibilities as a mechanism from which to develop knowledge and innovation.1 One of the challenges associated with the use of key word searches is identifying meaning: what might be of special interest as it relates to the searcher's query. This is more problematic when the searcher has limited understanding or command of other domains whose key knowledge is required to find associations with what they are searching for, as is the case in many cross-disciplinary applications of these domains of knowledge.
1 Ref: U.S. Patent Application Publication No. 2009/0043797 A1, published Feb. 12, 2009.
An example of the problem this creates is when a researcher in a cross-disciplinary field searches the U.S. Patent database using key word searches: thousands of records may turn up, many of which are not relevant to the area of interest. While other mechanisms exist to help narrow the search, such as the use of categories, these can themselves be quite limiting when exploring innovation, because searchers naturally tend to remain constrained to what is already intuitively obvious to them. This is a well-understood phenomenon called Functional Fixedness.2
2 German, Tim P. and H. Clark Barrett. “Functional Fixedness in a Technologically Sparse Culture.” Psychological Science, Volume 16, Number 1.
Another problem with key word based search is that it constrains the search to exactly what was entered as key words rather than to an interpretation of what the searcher intended. For instance, a searcher looking for ways to remove oil from clothes in cold water might use the key words “remove oil from clothes”, but these key words would never reveal possible answers in the realm of biomimicry, where, for instance, the Antarctic icefish digests oils in temperatures below −5 °C. Key word search mirrors the brain's natural tendency to work within categories, and that tendency is counterproductive to innovation.
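To make this limitation concrete, the following minimal Python sketch implements a literal key word match over a tiny, hypothetical corpus (the document ids and texts are invented for illustration). A document is returned only if it contains every query term as an exact token, so the icefish finding is never surfaced for “remove oil from clothes”.

```python
# Hypothetical toy corpus: document id -> text (invented for illustration).
CORPUS = {
    "detergent-patent": "Detergent composition to remove oil from clothes in warm water",
    "icefish-study": "Antarctic icefish digests oils at sub-zero temperatures",
    "carpet-guide": "Carpet cleaning guide for grease and oil stains",
}


def keyword_search(query: str, corpus: dict[str, str]) -> list[str]:
    """Return ids of documents that contain every query term as an exact token."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in corpus.items():
        tokens = set(text.lower().split())
        if terms <= tokens:  # subset test: all query terms must appear literally
            hits.append(doc_id)
    return hits


print(keyword_search("remove oil from clothes", CORPUS))  # ['detergent-patent']
```

The icefish document fails the match because it says “digests” and “oils”, not “remove” and “oil”, even though it describes the same underlying action on the same kind of substance.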
Existing semantic-based searches do not necessarily resolve these issues either. For instance, a semantic-based search engine called Hakia claims to rank relevance not by popularity but by “meaning match”. However, when it is asked to find information related to removing oil from clothes, it has limited ability to garner context, and it does not allow the user to specify to what extent they expect answers from a different domain, outside the context of the query.
Another semantic query engine, SenseBot, takes a more powerful approach by presenting the user with different possible meanings or contexts, enabling the user to ‘lead’ the interpretation. For instance, the query “remove oil from clothes” returns an array of possible alternative queries by presenting words such as “carpet cleaning clothing grease oil stains washing . . . etc.”
However, these search mechanisms do not provide: (a) a natural language input query; (b) a MetaLanguage based on identification of a Fundamental Nature of the Search Query and a Fundamental Nature of the target response; and (c) a semantic-based understanding of Fundamental Actions. The present inventive concept fills these gaps in the current state of the art through a MetaLanguage approach in which verbs predominate.
The resolution to this dilemma is first a philosophical one. For instance, what is a ‘pen’? The casual answer is that a pen is a writing instrument. In the world of innovation, however, a pen is defined by the intent of the user. When I intend to use it as a writing instrument, it is a pen. When I intend to harm someone with it, it is a weapon. When I use it to keep a door open, it is a door stop.
This philosophical approach is not new. Plato explored the idea that language is not an objective reality in and of itself in the dialogue Euthyphro (collected in Five Dialogues). In this dialogue between Euthyphro and Socrates, Euthyphro arrives at this conclusion, which shows how long the idea that language is not an objective reality has been around:
Socrates: . . . I'm afraid, Euthyphro, that when you were asked what piety is, you did not wish to make its nature clear to me, but you told me an affect or quality of it, that the pious has the quality of being loved by all the gods, but you have not yet told me what the pious is. Now, if you will, do not hide things from me but tell me again from the beginning what piety is, whether loved by the gods or having some other quality (we shall not quarrel about that), but be keen to tell me what the pious and the impious are.
Euthyphro: But Socrates, I have no way of telling you what I have in mind, for whatever proposition we put forward goes around and refuses to stay put where we establish it.3
3 Plato, Five Dialogues: Euthyphro. Translated by G. M. A. Grube.
Fundamentally, the philosophical problem is that key word searches are oriented around nouns as fixed, objective realities. Similarly, when an invention is developed, it is categorized in a domain related to similar nouns with objective realities. This becomes even more complex when, for instance, searching for a given compound requires extensive domain-specific background, such as in chemistry.
Business Application
The implications of the inventive concept from a business standpoint are enormous. For instance, one significant application of the inventive concept is sublicensing innovation into non-intuitive domains. One example is the joystick, which was used in the domain of computer controls and is now being used to drive cars. Another example is a gel compound used for absorbency in diapers that is also used as a fire retardant to fight fires. These developments appear to be almost ‘accidental’ rather than the result of intentionally observing how an innovation in one domain can be applied to a different domain.
From 1980 to 1999, U.S. patent licensing revenues grew from $3B to $100B, a testament to the growing importance of applying intellectual property.5 The ability to leverage sublicensing of intellectual property, particularly into non-intuitive domains, is often accidental. Part of the reason is that our brains are biologically inclined to categorize. “And yet, imagination stems from the ability to break this categorization, to see things not for what one thinks they are, but for what they might be.” (Berns, Gregory. “Iconoclast”. Harvard Business Press. p. 37)
“Perception, however, is constrained by the categories that an individual brings to the table. Although categories may not be absolute, they are learned from past experience, and because of this relationship, experience shapes both perception and imagination. In order to think creatively, and imagine possibilities that only iconoclasts do, one must break out of the cycle of experience-dependent categorization . . . ” (ibid., p. 54)
“ . . . the brain operates under the efficiency principle, which means that it will do its job in a way that takes the least amount of energy. It is lazy. The efficiency principle dictates that the brain will take shortcuts based on what it already knows. These shortcuts, although they save energy, lead to perception being shaped by past experience. How you categorize objects determines what you see. And because imagination comes from perception, these same categories hobble imagination and make it difficult to think differently.” (ibid., p. 57)
5 Ref: Global Intellectual Property Asset Management Report, July 2005, Volume 7, Number 7. “Intellectual Property Metrics Today: It Can Be Done, Part II.” By Russell Barron and Linda Hansen (Foley & Lardner), Richard F. Bero (Corporate Financial Advisors, LLC), Patrick Thomas (1790 Analytics LLC), Dr. Jan M. K. Jaferian (Lucent Technologies Intellectual Property Business), and Michelle Girts (CH2M Hill).
People with expertise in specific domains continue to innovate within those domains. Furthermore, these people tend to work in silos within their own social networks, with little interaction with those outside them. This makes it challenging to innovate across different domains, to speed the development of associations, and to recognize the potential for applying an innovation in a different domain.
The drive to sublicense innovation into non-intuitive domains is also strong in the life sciences. In the pharmaceutical industry, a common practice known as drug ‘repurposing’ seeks to identify secondary or tertiary indications, that is, therapeutic domains for which a drug was not originally intended but in which it can help solve a problem.
Biomimicry is another domain, one in which scientists look to nature to solve problems. To return to the example mentioned earlier, studying how the Antarctic icefish digests oils in extreme cold may lead scientists to more powerful mechanisms for cold-water stain-fighting detergents.6
6 Heath, Dan and Chip Heath. “Stop Solving Your Problems.” Fast Company, November 2009, pp. 82-83.
A significant body of work exists in the area of semantic search. It will be appreciated that the below-listed prior-art items, as well as any other prior art patents, articles or other items discussed above, are hereby incorporated herein by reference in their entireties, and that various embodiments of the instant invention may utilize in combination the apparatus and/or methods disclosed in such items in whole or in part.
Prior Art—Ontologies
Ontologies are generally noun-oriented rather than verb-oriented in their classifications and properties, which are useful but not powerful enough for complex searching. Examples of open-source ontology search systems include:
Web Service Modeling Ontology
Watson Semantic Web Gateway
Ontologies
Resource Description Framework (RDF)
The Resource Description Framework (RDF) is a framework for representing information in the Web.
OWL Web Ontology Language
OWL is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. This representation of terms and their interrelationships is called an ontology.
Wordnet
WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.
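RDF expresses information as subject-predicate-object triples. The sketch below is a plain-Python approximation (not an RDF library, and using bare strings instead of IRIs) over hypothetical triples. It illustrates the noun-oriented limitation described above: a classification such as “pen is_a writing_instrument” never offers the pen as a door stop, while a pattern that leaves the predicate open can surface facts about the same object from different domains.

```python
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical (subject, predicate, object) statements.
TRIPLES: list[Triple] = [
    ("pen", "is_a", "writing_instrument"),
    ("wedge", "is_a", "door_stop"),
    ("icefish", "digests", "oil"),
    ("detergent", "removes", "oil"),
]


def match(pattern: Pattern, triples: list[Triple]) -> list[Triple]:
    """Return the triples that agree with the pattern; None acts as a wildcard."""
    return [t for t in triples if all(p is None or p == v for p, v in zip(pattern, t))]


# Category lookup: the pen is never offered as a door stop.
print(match((None, "is_a", "door_stop"), TRIPLES))  # [('wedge', 'is_a', 'door_stop')]
# Predicate left open: anything that acts on oil, from any domain.
print(match((None, None, "oil"), TRIPLES))
# [('icefish', 'digests', 'oil'), ('detergent', 'removes', 'oil')]
```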
Prior Art Semantic Search Patents
The field of invention of U.S. Pat. No. 7,908,438 is knowledge management systems, and more specifically associative memory systems, methods and computer program products. Associative memories are widely used in the fields of pattern matching and identification, expert systems and artificial intelligence. This patent addresses the breakdown in scaling that occurs with associative memory representations. Associative matrices are capable of counting associations among pairs of attributes. The patent addresses issues related to performance and may be utilized in combination with features of one or more embodiments of the instant invention, as is discussed below in more detail.
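As an illustration of counting associations among pairs of attributes, the sketch below keeps a sparse co-occurrence table in Python. It is a toy built on stated assumptions, not the method of U.S. Pat. No. 7,908,438: the attribute sets are hypothetical, and a `Counter` keyed by sorted attribute pairs stands in for an associative matrix.

```python
from collections import Counter
from itertools import combinations


def build_association_counts(records: list[set[str]]) -> Counter:
    """Count how often each unordered pair of attributes co-occurs in a record."""
    counts: Counter = Counter()
    for attrs in records:
        # Sorting makes ("cold", "oil") and ("oil", "cold") the same key.
        for a, b in combinations(sorted(attrs), 2):
            counts[(a, b)] += 1
    return counts


def association(counts: Counter, a: str, b: str) -> int:
    """Look up a pair's co-occurrence count in either order (0 if never seen)."""
    return counts[tuple(sorted((a, b)))]


# Hypothetical attribute sets extracted from three documents.
records = [
    {"oil", "cold", "digest"},
    {"oil", "cold", "remove"},
    {"oil", "warm", "remove"},
]
counts = build_association_counts(records)
print(association(counts, "cold", "oil"))    # 2
print(association(counts, "remove", "oil"))  # 2
print(association(counts, "warm", "cold"))   # 0
```

Storing only observed pairs keeps memory proportional to the associations actually seen rather than to the square of the vocabulary, which is one simple response to the scaling concern mentioned above.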
The application of associative memories to search engines is not new. Publications such as “AMASS Core: Associative Memory Array for Semantic Search” by P. Rujan, F. Vuillod, J. Schwenninger and A. Mages (Learning Computers Int. GmbH) and C. Layer and H-J. Pfleiderer (University of Ulm) describe such applications. That paper in particular discusses using associative memories to implement a general-purpose associative dynamic memory in order to reduce the very high cost of indexing. Its authors propose doing so by first extracting statistically significant features from the text. Once these features are identified, semantic similarity is established by ‘forcing synonyms’ into the proposed form. Notwithstanding the techniques described in that paper, two issues remain: selecting what counts as a ‘statistically significant feature’, and effectively mining and querying the data stored in an associative memory in order to conduct searches. While U.S. Pat. No. 7,774,291 and U.S. Pat. No. 7,478,090 provide solutions to this problem, they still suffer from high search costs and are less able to identify similarities and/or analogies than the instant invention.
Of particular importance in U.S. Pat. No. 7,774,291, however, is the use of a relevance score, obtained by querying the feedback memory, to compute the strength of association between a given entity and a task, using personal feedback knowledge to capture positive (relevant) and negative (irrelevant) feedback for an entity, document or association for a current task as seen by the user. What this patent does not provide, however, is an understanding of various categories of users and/or their “profiles” to help improve the relevance of searches for other users with similar “profiles”. User profiles in the instant invention, as is discussed below in more detail, are based predominantly on work-related behavioral traits as provided by psychological behavioral profiles, and on educational and/or experiential backgrounds (e.g., mechanical, electrical, or chemical engineering) as provided or indicated by users, based on the domain in which they are operating as compared with the domain they are investigating (which can be very different). This capability of the instant invention extends beyond what this patent discusses in terms of facilitating workflow through an interactive knowledge repository, and includes using associative memories to capture the relationship of profiles to relevance scores.
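The relationship between user profiles and relevance scores described above can be sketched as a feedback store keyed by (profile, document). Everything here is hypothetical: the class name, the profile labels, and the scoring rule (positive minus negative feedback, divided by total feedback) are assumptions chosen for clarity, not the instant invention's specified implementation.

```python
from collections import defaultdict


class ProfileFeedbackMemory:
    """Hypothetical store of relevance feedback keyed by (user profile, document id)."""

    def __init__(self) -> None:
        # (profile, doc_id) -> [positive_count, negative_count]
        self._votes: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def record(self, profile: str, doc_id: str, relevant: bool) -> None:
        """Store one positive (relevant) or negative (irrelevant) judgement."""
        self._votes[(profile, doc_id)][0 if relevant else 1] += 1

    def relevance(self, profile: str, doc_id: str) -> float:
        """Return (positive - negative) / total in [-1, 1], or 0.0 with no feedback."""
        pos, neg = self._votes.get((profile, doc_id), (0, 0))
        total = pos + neg
        return 0.0 if total == 0 else (pos - neg) / total


memory = ProfileFeedbackMemory()
profile = "chemical-engineering/analytical"  # hypothetical discipline/behavior label
memory.record(profile, "icefish-study", relevant=True)
memory.record(profile, "icefish-study", relevant=True)
memory.record(profile, "icefish-study", relevant=False)

# A new user with the same profile benefits from the aggregated feedback.
print(memory.relevance(profile, "icefish-study"))  # about 0.333
print(memory.relevance("mechanical-engineering/driver", "icefish-study"))  # 0.0
```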
U.S. Pat. No. 7,805,455 and U.S. Pat. No. 7,251,781 address the situation in which a user lacks adequate domain knowledge and must conduct independent research, using whatever means are available to find useful information, including books, public internet search engines, private data subscription services, internal enterprise portals, or other sources of relevant technical information. The proposed solution, however, relies on common database practices to identify cause-effect relationships, together with the queries required to identify them, an approach that will eventually become unscalable.
In knowledge representation, particularly when one formulates a query or problem statement and looks for a solution, issues of abstraction and knowledge representation, especially in very complex domains, can again severely limit the practical use of any such invention. U.S. Pat. No. 7,536,368 puts forward a problem analysis tool that automatically reformulates a problem statement into a natural language or Boolean query, which is automatically submitted via a knowledge search tool to a database; responses to this query from the database are then automatically provided. Extracting what might be deemed the ‘key elements’ of the problem is not trivial. There are natural limitations in the user's knowledge, limited representation within the knowledge database (and therefore limited extraction of meaning from it), and differences in context, both for the person conducting the query and for the original context of the solution. The instant invention, as discussed below in more detail, addresses all of the aforementioned challenges. ‘Wikis’ developed for the database enable people to ask questions and receive answers from the original ‘owner’ of a document, book, etc., and comments from other users provide an enriched context from which to query. Furthermore, the use of associative memories speeds up query results, improves the scalability of the overall solution and leverages human intelligence as part of the solution. Combining the MetaLanguage, an abstraction layer, with associative memories improves the overall solution further.
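To show why extracting ‘key elements’ from a problem statement is not trivial, here is a deliberately naive Python sketch that turns a problem statement into an AND-ed Boolean query by dropping stop words. The stop-word list and the function name are hypothetical, and this is not the method of U.S. Pat. No. 7,536,368.

```python
import re

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "the", "how", "can", "i", "to", "from", "in", "of", "my"}


def to_boolean_query(problem: str) -> str:
    """Naively reduce a problem statement to an AND-ed query of its content words."""
    words = re.findall(r"[a-z]+", problem.lower())
    terms = [w for w in words if w not in STOP_WORDS]
    # dict.fromkeys drops duplicate terms while preserving first-seen order.
    return " AND ".join(dict.fromkeys(terms))


print(to_boolean_query("How can I remove oil from my clothes in cold water?"))
# remove AND oil AND clothes AND cold AND water
```

The resulting query still matches only literal terms, so a document describing how the icefish digests oils would not satisfy it.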
U.S. Pat. No. 7,120,574 describes a computer search that expands a user query with two synonym dictionaries (actions and objects) and then validates the expanded queries against entries in a Subject-Action-Object Knowledge Database (SAO KB). This database is prepared from natural language texts and contains fields with subjects, actions, objects and ‘main parts of objects’ extracted from the object. The patent specifically lists verb-noun expressions that are synonymous with other verbs, and relates to computer-based search systems, in particular to narrowing searches for the user's convenience. The instant invention instead uses a set of thesauri that are specific to given disciplines and maps verbs to “Fundamental Actions”, a set of abstracted verbs that form a MetaLanguage across all disciplines. The philosophy of the instant invention is also very different: while verb-object relationships exist, the instant invention emphasizes the verb by matching “Fundamental Actions” and de-emphasizes the noun by categorizing nouns into domains of “Fundamental Natures”, in order to facilitate cross-industry applications. This has the additional benefit, especially in combination with associative memories, of speeding up the query. The instant invention also ties Attributes to Fundamental Natures rather than to objects. This generalization or abstraction of attributes to Fundamental Natures (categories of objects) is fundamental rather than specific, and as such is a significant departure from what was put forward in U.S. Pat. No. 7,120,574.
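A minimal sketch of the verb-centered matching described above follows. It assumes hypothetical domain thesauri and category labels: the labels `SEPARATE`, `IMMERSE`, `LIPID` and `TEXTILE`, and the 2:1 weighting of actions over natures, are illustrative assumptions, not the invention's actual MetaLanguage. Domain-specific verbs are abstracted to Fundamental Actions, nouns to Fundamental Natures, and agreement on the action counts more than agreement on the nature.

```python
from typing import Optional

# Hypothetical domain-specific thesauri: verb -> Fundamental Action label.
DOMAIN_THESAURI: dict[str, dict[str, str]] = {
    "laundry": {"remove": "SEPARATE", "wash": "SEPARATE", "soak": "IMMERSE"},
    "biology": {"digest": "SEPARATE", "absorb": "IMMERSE"},
}

# Hypothetical noun -> Fundamental Nature (category of objects).
FUNDAMENTAL_NATURES: dict[str, str] = {
    "oil": "LIPID", "oils": "LIPID", "grease": "LIPID", "clothes": "TEXTILE",
}

Expression = tuple[str, str]  # (Fundamental Action, Fundamental Nature)


def to_metalanguage(verb: str, noun: str, domain: str) -> Optional[Expression]:
    """Abstract a domain-specific verb-noun pair; None if either term is unknown."""
    action = DOMAIN_THESAURI.get(domain, {}).get(verb)
    nature = FUNDAMENTAL_NATURES.get(noun)
    if action is None or nature is None:
        return None
    return (action, nature)


def score(query: Expression, candidate: Expression) -> int:
    """Verb (action) agreement counts 2 points, noun (nature) agreement 1 point."""
    return 2 * (query[0] == candidate[0]) + (query[1] == candidate[1])


q = to_metalanguage("remove", "oil", "laundry")   # ('SEPARATE', 'LIPID')
c = to_metalanguage("digest", "oils", "biology")  # ('SEPARATE', 'LIPID')
assert q is not None and c is not None
print(score(q, c))  # 3
```

Because “remove” (laundry) and “digest” (biology) abstract to the same Fundamental Action in these assumed thesauri, a laundry query reaches the icefish finding even though the two texts share no verb.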
U.S. Pat. No. 6,167,370 is directed to document semantic analysis/selection with knowledge creativity capability utilizing subject-action-object (SAO) structures. The system performs substantially the same semantic analysis on each candidate document as it performs on the user's input search request. That is, the system generates SAO structures for each sentence of each candidate document and forwards them to the comparative Unit, where the request SAO structures are compared with the candidate document SAO structures. The few candidate documents whose SAO structures substantially match the request SAO structure profile are placed into a retrieved document Unit, where they are ranked in order of relevance. The system then summarizes the essence of each retrieved document by synthesizing those SAO structures of the document that match the request SAO structures, and stores this summary for user display or printout. Users can later read the summary and decide whether to display, print out or delete the entire retrieved document and its SAOs. The instant invention is a significant departure from this approach. First, the pattern recognition capabilities of associative memories provide a rich context for addressing issues of relevance; context dependency is critical for eliminating irrelevant results. Furthermore, the instant invention puts forward a MetaLanguage that compares and contrasts Fundamental Natures, attributes and Fundamental Actions, in some embodiments in the context of associative memories, providing improved performance and relevance.
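The SAO comparison and ranking steps described above can be sketched as follows, assuming the SAO triples have already been extracted from each sentence (extraction is the hard part and is not shown). The triples and document ids are hypothetical, and counting exact matches is a simplification of “substantially match”.

```python
SAO = tuple[str, str, str]  # (subject, action, object)


def rank_documents(request: set[SAO], candidates: dict[str, set[SAO]]) -> list[tuple[str, int]]:
    """Rank documents by how many request SAO structures they contain exactly."""
    scored = [(doc_id, len(request & saos)) for doc_id, saos in candidates.items()]
    # Keep documents with at least one match, best first (ties keep input order).
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


# Hypothetical pre-extracted SAO structures.
request = {("detergent", "remove", "oil"), ("water", "dissolve", "stain")}
candidates = {
    "doc-a": {("detergent", "remove", "oil"), ("enzyme", "break", "protein")},
    "doc-b": {("detergent", "remove", "oil"), ("water", "dissolve", "stain")},
    "doc-c": {("icefish", "digest", "oil")},
}
print(rank_documents(request, candidates))  # [('doc-b', 2), ('doc-a', 1)]
```

The icefish document (`doc-c`) is dropped because its SAO structure does not literally match the request, which is the kind of gap the MetaLanguage approach targets.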
Other patents of general relevance in semantic searches include:
U.S. Pat. No. 6,453,315—Meaning-based information organization and retrieval. Abstract: The present invention relies on the idea of a meaning-based search, allowing users to locate information that is close in meaning to the concepts they are searching. A semantic space is created by a lexicon of concepts and relations between concepts. A query is mapped to a first meaning differentiator, representing the location of the query in the semantic space. Similarly, each data element in the target data set being searched is mapped to a second meaning differentiator, representing the location of the data element in the semantic space. Searching is accomplished by determining a semantic distance between the first and second meaning differentiator, wherein this distance represents their closeness in meaning. Search results on the input query are presented where the target data elements that are closest in meaning, based on their determined semantic distance, are ranked higher.
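The semantic-distance ranking in this abstract can be illustrated by treating each meaning differentiator as a point in a small vector space and ranking by cosine distance. Both the three-axis space and the coordinates are hypothetical, and cosine distance is a swapped-in choice; the abstract does not specify the distance measure.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the same direction (closest in meaning)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


# Hypothetical 3-axis semantic space (axes loosely: cleaning, biology, temperature).
query = [0.9, 0.1, 0.4]
elements = {
    "cold-water detergent": [0.8, 0.0, 0.5],
    "icefish lipid digestion": [0.4, 0.8, 0.6],
    "carpet shampoo": [0.9, 0.0, 0.0],
}

# Smaller distance ranks higher, mirroring "closest in meaning ... ranked higher".
ranked = sorted(elements, key=lambda name: cosine_distance(query, elements[name]))
print(ranked)  # ['cold-water detergent', 'carpet shampoo', 'icefish lipid digestion']
```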
U.S. Pat. No. 7,689,410—Lexical semantic structure. Abstract: A lexical semantic structure for modeling semantics of a natural language input on a computer is described. A set of lexical semantic categories is selected to model content of the natural language input. A methodology associates content of the natural language input to one or more categories of the set of lexical semantic categories.
U.S. Pat. No. 7,558,778—Semantic exploration and discovery. Abstract: A semantic discovery and exploration system is disclosed where an environment enabling a developer or user to uncover, navigate, and organize semantic patterns and structures in a document collection with or without the aid of structured knowledge. The semantic discovery and exploration system provides techniques for searching document collections, categorizing documents, inducing lists of related concepts, and identifying clusters of related terms and documents. This system operates both without and with infusions of structured knowledge such as gazetteers, thesauruses, taxonomies and ontologies. System performance improves when structured knowledge is incorporated. The semantic discovery and exploration system may be used as a first step in developing an information extraction system such as to categorize or cluster documents in a particular domain or to develop gazetteers and as a part of a deployed run-time information extraction system. It may also be used as standalone utility for searching, navigating, and organizing document collections and structured knowledge bases such as dictionaries or domain-specific reference works.
U.S. Pat. No. 7,120,574—Synonym extension of search queries with validation. Abstract: A computer search involves expanding a user query with two synonym dictionaries—actions and object—and then validating the expanded queries by comparison with entries in a Subject-Action-Object Knowledge Database (SAO KB) in a discipline corresponding to the query. The latter is prepared from natural language texts and contains fields with subjects, actions, objects, and “main parts of objects” extracted from the object.
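A small sketch of the expand-then-validate idea in this abstract: the query's action and object are each expanded through a synonym dictionary, and only (action, object) pairs that occur in the SAO knowledge base are kept. The dictionaries and knowledge-base entries are hypothetical, and the selection of a knowledge base for the query's discipline is omitted.

```python
from itertools import product

# Hypothetical synonym dictionaries for actions (verbs) and objects (nouns).
ACTION_SYNONYMS = {"remove": {"remove", "eliminate", "extract"}}
OBJECT_SYNONYMS = {"oil": {"oil", "grease", "lipid"}}

# Hypothetical SAO knowledge base of (subject, action, object) triples.
SAO_KB = {
    ("surfactant", "remove", "grease"),
    ("solvent", "extract", "lipid"),
    ("filter", "remove", "dust"),
}


def expand_and_validate(action: str, obj: str) -> set[tuple[str, str]]:
    """Return expanded (action, object) pairs that actually occur in the SAO KB."""
    candidates = product(ACTION_SYNONYMS.get(action, {action}),
                         OBJECT_SYNONYMS.get(obj, {obj}))
    known = {(a, o) for _, a, o in SAO_KB}
    return {pair for pair in candidates if pair in known}


print(sorted(expand_and_validate("remove", "oil")))
# [('extract', 'lipid'), ('remove', 'grease')]
```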
U.S. Pat. No. 6,246,977—Information retrieval utilizing semantic representation of text and based on constrained expansion of query words. Abstract: The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an “is a” relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.
U.S. Pat. No. 6,161,084—Information retrieval utilizing semantic representation of text by identifying hypernyms and indexing multiple tokenized semantic structures to a same passage of text. Abstract: The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an “is a” relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.
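The hypernym substitution shared by these two abstracts can be sketched in Python with a simplified logical form, a (subject, verb, object) tuple. The hypernym table is hypothetical (a real system might draw on a lexical resource such as WordNet), and replacing one word at a time is a simplifying assumption.

```python
# Hypothetical "is a" table: word -> hypernyms.
HYPERNYMS = {
    "icefish": ["fish"],
    "digest": ["break_down"],
    "oil": ["lipid"],
}

LogicalForm = tuple[str, str, str]  # simplified (subject, verb, object) logical form


def alternative_forms(primary: LogicalForm) -> list[LogicalForm]:
    """Build alternatives by replacing one word at a time with each of its hypernyms."""
    forms = []
    for i, word in enumerate(primary):
        for hyper in HYPERNYMS.get(word, []):
            alt = list(primary)
            alt[i] = hyper
            forms.append(tuple(alt))
    return forms


def tokens(primary: LogicalForm) -> list[str]:
    """Emit one index token for the primary form and one per alternative form."""
    return ["|".join(form) for form in [primary, *alternative_forms(primary)]]


print(tokens(("icefish", "digest", "oil")))
# ['icefish|digest|oil', 'fish|digest|oil', 'icefish|break_down|oil', 'icefish|digest|lipid']
```

Indexing documents and processing queries with the same function lets a query about “fish” digesting “lipid” meet a document that only mentions the icefish and oil.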
U.S. Pat. No. 6,101,492—Methods and apparatus for information indexing and retrieval as well as query expansion using morpho-syntactic analysis. Abstract: An index generator and query expander for use in information retrieval in a corpus. A corpus is provided as an input to an inflectional analyzer, which produces a lemmatized corpus having base forms and associated inflections for each word in the original corpus. The lemmatized corpus is provided as an input to a disambiguator, which performs part of speech tagging and morpho-syntactic disambiguation to produce a disambiguated corpus. The disambiguated corpus is provided as an input to a derivational generator, which produces an expanded corpus having all possible valid derivatives of each word of the disambiguated corpus. The disambiguated corpus is provided as an input to a transformational analyzer, using a grammar and a metagrammar for analyzing syntactic and morphosyntactic variations to conflate and generate variants, producing an index to the corpus having a minimum of variants. Alternatively, a query expander is provided utilizing similar techniques.
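The inflectional and derivational stages of this pipeline can be approximated with lookup tables, as in the sketch below. The tables are hypothetical stand-ins for a real morphological lexicon, and the disambiguation and transformational stages are omitted.

```python
# Hypothetical tables standing in for an inflectional analyzer and a derivational generator.
INFLECTIONS = {"removes": "remove", "removed": "remove", "removing": "remove", "oils": "oil"}
DERIVATIVES = {"remove": {"removal", "removable", "remover"}, "oil": {"oily", "oiled"}}


def lemmatize(word: str) -> str:
    """Map an inflected form to its base form (unchanged if not in the table)."""
    return INFLECTIONS.get(word, word)


def expand_query(query: str) -> dict[str, set[str]]:
    """Expand each query word into its base form plus its derivational variants."""
    expanded: dict[str, set[str]] = {}
    for word in query.lower().split():
        base = lemmatize(word)
        expanded[base] = {base} | DERIVATIVES.get(base, set())
    return expanded


print(expand_query("removing oils"))
```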
Therefore, an unaddressed need exists to accelerate the association between distinct bodies of research, patents, and documents—in a way that breaks through categorization and involves the original researchers to provide clarification, understanding, and simplification of the underlying mechanics, principles and/or laws discussed within the documentation.