This disclosure relates generally to machine learning. More particularly, it relates to teaching a machine learning system to cluster a set of natural language queries based on significant events.
It is known for a computer to receive a natural language query (NLQ) and perform a search on a database or corpus of documents based on keywords identified in the query. Typically, a parser identifies some of the words in the query as more meaningful than others. For example, common words such as the articles “the”, “a” and “an” are rarely accorded the importance of a keyword, while nouns are often selected by the system as keywords. The system accepts the NLQ as an input, extracts keywords of interest from it and identifies relevant articles, documents or events in the database or corpus based on those keywords. Keywords are sometimes grouped as “entities”, also referred to as named entities, which are effectively clusters of keywords that share certain properties. Some common examples of entities are places, persons and organizations.
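By way of illustration only, the conventional keyword-based search described above might be sketched as follows. The stop-word list, the function names extract_keywords and search, and the overlap-count ranking are assumptions made for this example and are not drawn from any particular prior art system; a practical parser would typically also use part-of-speech tagging to favor nouns.

```python
import re

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "for", "to",
              "is", "are", "what", "which", "and", "about"}

def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def extract_keywords(query):
    """Return the query tokens that are not stop words, in original order."""
    return [t for t in tokenize(query) if t not in STOP_WORDS]

def search(query, corpus):
    """Rank document ids by the number of distinct query keywords they contain.

    corpus maps a document id to its text. Documents sharing no keyword
    with the query are omitted; ties are broken by document id.
    """
    keywords = set(extract_keywords(query))
    scores = {}
    for doc_id, text in corpus.items():
        overlap = len(keywords & set(tokenize(text)))
        if overlap:
            scores[doc_id] = overlap
    return sorted(scores, key=lambda d: (-scores[d], d))
```

In this sketch, only words that literally appear in the query can match a document, which is the limitation of keyword-based search discussed in this background.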
In the prior art, the keywords derived from a natural language query are largely limited to those that occur within the query itself, perhaps augmented by synonyms of the identified keywords drawn from a dictionary or thesaurus.
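The synonym augmentation mentioned above can be illustrated with a minimal sketch. The THESAURUS table and the function name expand_keywords are hypothetical and stand in for whatever dictionary or thesaurus a given system might consult.

```python
# Hypothetical thesaurus mapping a keyword to its synonyms.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "buy": ["purchase"],
}

def expand_keywords(keywords, thesaurus=THESAURUS):
    """Return the keywords followed by their synonyms, without duplicates.

    Order is preserved: each keyword is immediately followed by any
    synonyms listed for it in the thesaurus.
    """
    expanded = []
    for keyword in keywords:
        for term in [keyword] + thesaurus.get(keyword, []):
            if term not in expanded:
                expanded.append(term)
    return expanded
```

Even with such expansion, the search terms remain tied to the words of the query and their listed synonyms.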
Further improvements in computer-aided search mechanisms are needed.