LMai is an algorithm that explains how machines can automatically identify the relationships between words and act as a guidance system for humans. To realize this, LMai introduces certain novel techniques by virtue of which a machine can make good judgments about word relationships. LMai therefore describes a novel term decomposition technique, which removes words of low importance in order to extract useful or informative words, or Keywords, from a given document; a process by which the Topic of a document is automatically extracted; and a process by which the relationship between the Topics and the Keywords is established to identify the words that are related to each other. Further, the benefits of implementing LMai in various applications are discussed; one such application, the use of LMai in Search Engines, is explained.
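The decomposition and Topic-extraction steps can be pictured with a minimal sketch. This is not the actual LMai technique, whose mathematics is not given in this section; the stop-word list, the function names and the frequency-based Topic heuristic are all illustrative assumptions.

```python
# Illustrative sketch only (not the actual LMai decomposition): extract
# candidate Keywords by removing low-importance words from a document,
# and take the most frequent Keyword as a stand-in for the Topic.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "for", "by", "on"}

def extract_keywords(document):
    """Decompose a document into candidate Keywords by dropping stop words."""
    words = [w.strip(".,;:!?").lower() for w in document.split()]
    return [w for w in words if w and w not in STOP_WORDS]

def extract_topic(document):
    """Naive Topic extraction: the most frequent Keyword (a hypothetical
    stand-in for the automatic Topic-extraction process)."""
    keywords = extract_keywords(document)
    return max(set(keywords), key=keywords.count) if keywords else None
```

In a real system the stop-word list and the Topic heuristic would be far richer; the sketch only shows the shape of the decomposition step.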
Usually, Search Engines retrieve data based on relevance, page rank and other related criteria. If LMai is plugged into such Search Engines, it enhances their capability to a great extent, in that context-based results are presented to the end user.
A point to be noted is that Search Engines that retrieve data based on relevance and page rank seldom present context-based results.
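One way such a plugin could surface context-based results is by boosting results that contain terms related to the query. The following is a hypothetical sketch; the scoring scheme and the `rerank` interface are assumptions, not the actual LMai plugin.

```python
# Hypothetical sketch of how an LMai-style plugin might re-rank search
# results: results containing terms LMai found related to the query are
# boosted above results that merely matched on relevance or page rank.
def rerank(results, related_terms):
    """results: list of (title, base_score); related_terms: terms deemed
    related to the query. Returns results sorted by boosted score."""
    def boosted(item):
        title, score = item
        text = title.lower()
        # Assumed scoring: +1.0 per related term found in the result.
        bonus = sum(1.0 for t in related_terms if t.lower() in text)
        return score + bonus
    return sorted(results, key=boosted, reverse=True)
```

The base Search Engine's own scores are preserved; the plugin only adds a context bonus on top.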
LMai is not a search engine built from scratch; it is an algorithm that is capable of automatically identifying related words from a set of documents. For any given domain, the medical domain for instance, it normally takes an expert such as a doctor to map related terms together. If “Heart Surgery” were the Keyword, a doctor would identify “Open Heart Surgery”, “Minimal Invasive Heart”, “Heart Attack”, “Heart Bypass Surgery”, “Vascular Surgery”, “Angioplasty”, “Cardiac Catheterization” etc. as related to “Heart Surgery”. LMai defines this kind of relationship automatically, without any set of training data. This is indeed a powerful feature, wherein a machine tries to behave like an expert; although not with 100% accuracy, it produces very convincing results.
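A very rough intuition for unsupervised term relatedness is document co-occurrence: terms that repeatedly appear in the same documents as the Keyword tend to be related to it. The sketch below shows only that intuition; it is not the LMai mathematics, and the function name and ranking scheme are assumptions.

```python
# Illustrative sketch of unsupervised relatedness (not the actual LMai
# approach): rank terms by how often they co-occur with a given Keyword
# across a set of natural documents, with no training data involved.
from collections import defaultdict

def related_terms(documents, keyword, top_n=3):
    """Return the top_n terms that most often share a document with
    `keyword`. documents: list of plain-text strings."""
    cooccur = defaultdict(int)
    for doc in documents:
        words = set(doc.lower().split())
        if keyword in words:
            for w in words - {keyword}:
                cooccur[w] += 1
    ranked = sorted(cooccur.items(), key=lambda kv: -kv[1])
    return [w for w, _ in ranked[:top_n]]
```

All that is fed in is a set of natural documents, mirroring the claim that no training data is needed; a real system would of course use a far stronger relatedness measure.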
Some of the prior art related to our field of technology is given below for ready reference. The distinction between the present technology and the prior art is also explained at the end of the prior-art discussion.
Document D1: U.S. Pat. No. 5,465,320—Method of automated learning, an apparatus therefor, and a system incorporating such an apparatus.
This invention discloses automated learning techniques by using neural networks. In order to speed up and simplify automated learning of rules by a neural network making use of fuzzy logic, data from a system is analyzed by a teaching data creation means which groups the data into clusters and then selects a representative data item from each group for subsequent analysis. The selected data items are passed to a rule extraction means which investigates relationships between the data items, to derive rules, but eliminates rules which have only an insignificant effect on the system. The results are candidate rules which are stored in a first rule base. The candidate rules are then compared with rules in a second rule base to check for duplication and/or contradiction. Only those rules which are not duplicated and not contradictory are stored in the second rule base. Hence, when fuzzy inference is used to control the system on the basis of rules in the second rule base, only valid rules which provide a significant effect on the system are used.
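The pipeline described for D1 (group data into clusters, select a representative per cluster, derive candidate rules, eliminate insignificant rules) can be loosely sketched as below. This is not the patented neural-network/fuzzy-logic method; the trivial 1-D clustering, the threshold test and all names are illustrative assumptions.

```python
# A loose sketch of the general idea in D1 (not the patented method):
# cluster the data, take one representative per cluster, then keep only
# candidate rules whose effect on the system exceeds a threshold.
def representatives(data, n_clusters):
    """Trivial 1-D clustering: sort the data, split it into roughly equal
    bins, and take the median of each bin as its representative."""
    data = sorted(data)
    size = max(1, len(data) // n_clusters)
    bins = [data[i:i + size] for i in range(0, len(data), size)]
    return [b[len(b) // 2] for b in bins if b]

def significant_rules(rules, threshold):
    """rules: list of (name, effect). Eliminate rules whose effect on the
    system is insignificant, keeping only candidate rules worth storing."""
    return [(name, effect) for name, effect in rules if effect >= threshold]
```

In D1 the surviving candidate rules would additionally be checked against a second rule base for duplication and contradiction before being stored.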
Document D2: United States Patent Application 20060217818 (Semantically Specified Syntactic Analysis)—Learning/thinking machine and learning/thinking method based on structured knowledge, computer system, and information generation method.
The Document D2 provides a learning machine capable of expressing/accumulating concept and semantic relation by understanding semantic relation of information as a relation between concepts after characteristics, semantic relation and structure of information and the like have been analyzed. The Document D2 is intended to realize a thinking machine in which information is inclusively collected and stored as knowledge structured based on the semantic relation of the collected information, information is generated by predetermined inference so as to have a new semantic relation in response to an inquiry or request and which can decide an optimal solution to an inquiry and the like by evaluating/deciding new generated information.
The object of D2 is achieved through a learning/thinking method based on a structured knowledge comprising: a knowledge input step for inputting inclusively collected data, information and knowledge; a knowledge structuring step in which a semantic relation is extracted from said inputted data, information and knowledge in accordance with a plurality of rules, meaning of information is analyzed based on said extracted semantic relation, a link indicates a semantic relation between a node and a node for indicating a meaning, said node and said link have structures so as to exchange their roles and structured knowledge expressed by said node and said link is stored; an information generating step for generating new information by predetermined inference such that a knowledge structured by said node and said link based on said semantic relation has new semantic content and semantic relation; a value judging step for evaluating and judging a new knowledge of generated information by verifying said information generated result with said knowledge base; a knowledge increasing step for accumulating said evaluated/judged result and new information generated knowledge in said knowledge base to increase knowledge; and an optimal solution deciding step for deciding and outputting an optimal solution in response to an inquiry or request from the outside, wherein said information generating step comprises: a relating node retrieving step for retrieving only a unit in which a relating node is stored; a relating link retrieving step for retrieving only a unit in which a relating link is stored; and a step for executing inference by using at least any one of analogical reasoning, inductive inference, abduction and association based on a retrieved result of said relating node retrieving step or said relating link retrieving step.
Document D3: U.S. Pat. No. 6,944,612 Structured Contextual Clustering Method and System in a Federated Search Engine
This document discloses a federated search engine which groups search results from information sources using attributes of the search results. In grouping the search results, a first set and a second set of attributes are extracted from content in each set of search results received using information source wrappers. The first set of attributes defines a main clustering strategy, and the second set of attributes defines a sub-clustering strategy. A main grouping of the sets of search results received from the information sources is generated using the first set of attributes. The main grouping of search results includes a plurality of labeled groups with a plurality of search results in each group. A sub-grouping of search results is generated for each labeled group of search results in the main grouping of search results using the second set of attributes.
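D3's two-level organization (a main grouping by one set of attributes, then a sub-grouping of each labeled group by a second set) can be pictured with a small sketch. This is only the general idea, not the patented federated-search method; the dictionary-based interface and attribute names are assumptions.

```python
# Hypothetical sketch of D3's two-level grouping: search results are first
# grouped by a main attribute, then each labeled group is sub-grouped by a
# second attribute.
from collections import defaultdict

def group_results(results, main_attr, sub_attr):
    """results: list of dicts, one per search result.
    Returns {main_label: {sub_label: [results]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for r in results:
        grouped[r[main_attr]][r[sub_attr]].append(r)
    return {m: dict(subs) for m, subs in grouped.items()}
```

In D3 the attributes themselves are extracted from the result content via information-source wrappers; here they are simply assumed to be present on each result.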
Document D4: U.S. Pat. No. 7,107,254 Probabilistic Models and Methods for Combining Multiple Content Classifiers
The invention applies a probabilistic approach to combining evidence regarding the correct classification of items. Training data and machine learning techniques are used to construct probabilistic dependency models that effectively utilize evidence. The evidence includes the outputs of one or more classifiers and optionally one or more reliability indicators. The reliability indicators are, in a broad sense, attributes of the items being classified. These attributes can include characteristics of an item, source of an item, and meta-level outputs of classifiers applied to the item. The resulting models include meta-classifiers, which combine evidence from two or more classifiers, and tuned classifiers, which use reliability indicators to inform the interpretation of classical classifier outputs. The invention also provides systems and methods for identifying new reliability indicators.
The probabilistic dependency models generated and trained by probabilistic dependency model builder are models that make classification predictions using a probabilistic approach to combining evidence. Examples of probabilistic dependency models include decision trees, neural networks, and Bayesian belief networks. Where the evidence considered includes the outputs of two or more classifiers, probabilistic dependency model builder can be described as a classifier combiner and is said to produce meta-classifiers. Where the evidence considered includes the outputs of only one classifier, probabilistic dependency model builder can be described as a classifier tuner. Probabilistic dependency model outputs can be of the same type as classifier outputs and often include a confidence level or utility associated with a classification decision.
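The meta-classifier idea in D4, combining the outputs of several base classifiers while weighting each by a reliability indicator, can be illustrated with a deliberately simple sketch. D4's actual models are decision trees, neural networks and Bayesian belief networks; the weighted average below is only an assumed stand-in for that evidence combination.

```python
# Illustrative sketch in the spirit of D4 (not the patented models):
# combine the positive-class confidences of several base classifiers,
# weighting each by a reliability indicator for the item at hand.
def meta_classify(outputs, reliabilities):
    """outputs: per-classifier probabilities that the item belongs to the
    positive class; reliabilities: matching weights in [0, 1].
    Returns the combined positive-class probability."""
    total = sum(reliabilities)
    if total == 0:
        return 0.5  # no reliable evidence: remain undecided
    return sum(p * w for p, w in zip(outputs, reliabilities)) / total
```

The output has the same form as a base classifier's output (a confidence level attached to a classification decision), which is what lets such meta-classifiers be stacked or tuned further.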
Document D5: KR20050013025 Method for Knowledge Information Search Service Based on Search Engine for Classification System of Part-of-Speech on Internet
A method for a knowledge information search service based on a search engine for a classification system of a part-of-speech on the Internet is provided to classify emotion as well as any terminology or field by using the classification system using the part-of-speech and offer an expansion typed knowledge information search service using the classification system. CONSTITUTION: An information search classification system uses the part-of-speech classification system. A query sentence made by including a word classified in the part-of-speech classification system is one knowledge information unit. One query sentence, which is the knowledge information unit, is classified into the search classification system by classifying each word in the query sentence into each detail part-of-speech.
Though the documents D1 to D5 all relate to machine learning systems and methods, they differ in the methodology used to achieve unsupervised machine learning.
Further, none of the documents D1 to D5 disclose machine learning by the LMai (Latent Metonymical Analysis and Indexing) based algorithm. The crux of our invention lies in a novel mathematical approach to identifying the relationship between the words in a set of given documents (unstructured data). This technique, or algorithm, does not necessarily need training data to make decisions on matching related words together; it actually has the ability to do the classification by itself. All that is needed is to give the algorithm a set of natural documents.
Document D1 is based on automated learning techniques achieved by using neural networks making use of fuzzy logic, whereas our technology is not based on neural networks.
Document D2 explains a learning machine capable of expressing/accumulating concept and semantic relation by understanding semantic relation of information as a relation between concepts after characteristics, semantic relation and structure of information and the like have been analyzed.
By way of comparing our technology with Document D2, we can clearly make out the difference: the cited technology is completely based on Semantically Specified Syntactic Analysis, which is not the case with our technology. Our technology is not a mere semantic relation algorithm. It does not necessarily need training data to make decisions on matching related words together; it actually has the ability to do the classification by itself.
Neither of the documents D1 and D2 talks about unsupervised machine learning and the classification of clusters by the machine itself.
Document D3 discloses a federated search engine which groups search results from information sources using attributes of the search results. Document D3 fails to explain automated learning and the classification of data into different clusters.
Document D3 may prima facie appear similar to our technology, but the methodology used in creating clusters and classifying the contents of those clusters is distinct from our invention. It does not talk about automated learning and classification.
A preliminary set of clusters is created using a suffix comparison method, and attribute-specific normalization techniques are adapted to operate with clustering methods that group search results by detecting common suffixes between attribute content. Initially, the content of each specific attribute that is to be normalized is converted into a set of words using the attribute-specific normalization techniques, so that the clustering method is able to differentiate between standard ‘text’ attributes and specific attributes.
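The suffix comparison step described above can be sketched as follows. This is a hypothetical reconstruction of the general idea, not D3's patented method; the word-level suffix measure and the greedy clustering loop are assumptions.

```python
# Hypothetical sketch of suffix-based preliminary clustering: attribute
# values that share a common word-level suffix are placed in one cluster.
def common_suffix(a, b):
    """Length of the longest common word-level suffix of two phrases."""
    wa, wb = a.lower().split(), b.lower().split()
    n = 0
    while n < min(len(wa), len(wb)) and wa[-1 - n] == wb[-1 - n]:
        n += 1
    return n

def suffix_clusters(values, min_overlap=1):
    """Greedily assign each value to the first cluster whose seed shares
    at least `min_overlap` suffix words with it, else start a new cluster."""
    clusters = []
    for v in values:
        for c in clusters:
            if common_suffix(v, c[0]) >= min_overlap:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters
```

For example, "open heart surgery" and "bypass surgery" share the suffix "surgery" and would land in one preliminary cluster, while "heart attack" starts a new one.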
Document D4 talks about Probabilistic models and methods for combining multiple content classifiers.
The probabilistic dependency models generated and trained by probabilistic dependency model builder are models that make classification predictions using a probabilistic approach to combining evidence. Examples of probabilistic dependency models include decision trees, neural networks, and Bayesian belief networks.
This technology is also mainly based on neural networks and Bayesian belief networks. It does not give any idea of the machine learning data automatically and classifying the data into different clusters automatically.
Document D5 is a method for a knowledge information search service based on a search engine for a part-of-speech classification system on the Internet, provided to classify emotion as well as any terminology or field using the part-of-speech classification system.
This technology is developed particularly for searching documents. The method used to develop the classification is distinct from our technology, and it does not explain unsupervised machine learning techniques. The application of this technology is limited to search engines only.
LMai, as described earlier, is a novel concept for Advanced Machine Learning or Unsupervised Machine Learning Techniques, which depicts a methodology to extract the relationship between words automatically, without any guidance given to the machine.
LMai could be boxed as a plugin to amalgamate with applications that need it. In the context of this paper, LMai is boxed as a plugin that sits on top of an already existing Search Engine.
Related Algorithms:
                1. PLSA (Probabilistic Latent Semantic Analysis)
                2. PLSI (Probabilistic Latent Semantic Indexing)
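For reference, PLSA and its indexing variant PLSI model the joint probability of a document d and a word w through latent topics z, P(d, w) = Σ_z P(z) P(d|z) P(w|z), fitted by expectation-maximization on word counts. The following is a minimal, pure-Python EM sketch of that standard model, shown only to situate the related algorithms; it is not part of LMai.

```python
# Minimal PLSA sketch: fit P(z), P(d|z), P(w|z) by EM on a document-word
# count matrix. Standard model, shown for reference only.
import random

def plsa(counts, n_topics, n_iter=30, seed=0):
    """counts: 2-D list, counts[d][w] = frequency of word w in document d.
    Returns (P(z), P(d|z), P(w|z)) as nested lists of floats."""
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])

    def normalize(v):
        s = sum(v)
        return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)

    pz = [1.0 / n_topics] * n_topics
    pdz = [normalize([rng.random() for _ in range(D)]) for _ in range(n_topics)]
    pwz = [normalize([rng.random() for _ in range(W)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        nz = [0.0] * n_topics
        ndz = [[0.0] * D for _ in range(n_topics)]
        nwz = [[0.0] * W for _ in range(n_topics)]
        for d in range(D):
            for w in range(W):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z | d, w) under the current parameters
                post = normalize([pz[z] * pdz[z][d] * pwz[z][w]
                                  for z in range(n_topics)])
                # M-step accumulation, weighted by the observed count
                for z in range(n_topics):
                    c = counts[d][w] * post[z]
                    nz[z] += c
                    ndz[z][d] += c
                    nwz[z][w] += c
        pz = normalize(nz)
        pdz = [normalize(row) for row in ndz]
        pwz = [normalize(row) for row in nwz]
    return pz, pdz, pwz
```

Unlike LMai as characterized above, PLSA requires the number of topics to be fixed in advance and fits a generative probabilistic model rather than directly mapping related words together.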