Data today is entered by many users for many different purposes. Once the data is entered, other users want to search and sort it in order to interpret it and to find relevant results quickly. However, because the data is entered by different users in different formats, locations, and languages, there may be inconsistencies in how it is entered. Thus, for example, a search for a specific word or topic may miss relevant information because of an inconsistent entry.
An example of this is user-entered recipes that are posted on a web site. The structured portion of a recipe can include fields for the name of the recipe, ingredients, cuisine, and event. Once users start to enter information into the fields, however, there can be a large variance in how the information is actually presented. Misspellings can be common in any structured field. Synonyms and alternate names for recipes and ingredients are also typical. For example, some users may enter a recipe as “potato soup” or “cold potato soup,” while other users may enter the name “vichyssoise.” A user searching for “vichyssoise,” however, would likely want to see all available cold potato soup recipes.
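The recipe example can be made concrete with a minimal sketch of a literal text search over the name field. The records, the misspelled entry, and the function name `literal_search` are hypothetical and serve only to illustrate the problem; they do not represent any particular implementation.

```python
# Hypothetical user-entered recipe names, illustrating the inconsistencies
# described above: synonyms, alternate names, and a misspelling.
recipes = [
    {"name": "potato soup"},
    {"name": "cold potato soup"},
    {"name": "vichyssoise"},
    {"name": "Vichysoise"},  # hypothetical misspelled entry
]


def literal_search(records, query):
    """Return names that contain the query text, ignoring letter case."""
    q = query.strip().lower()
    return [r["name"] for r in records if q in r["name"].lower()]


# Only the literal match is found; the synonym and misspelled entries are missed.
matches = literal_search(recipes, "vichyssoise")
missed = [r["name"] for r in recipes if r["name"] not in matches]
print(matches)
print(missed)
```

In this sketch a search for “vichyssoise” finds a single record, while the remaining entries, which may describe the same dish, are not returned.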
The same is true of other types of user-inputted data. Much of this data may be semi-structured or may lack any structure whatsoever. The present invention can extract useful knowledge from such unstructured or semi-structured data.