This invention generally relates to converting structured data into unstructured text to enable natural language searching.
Often, when people have a topic to research, they search the Internet, typically by performing a natural language or keyword search. A natural language search is a search in which the searcher uses a regular, spoken language, such as English, to enter a search query. For example, when searching through the catalog of a home improvement retailer, a searcher may identify a particular product and enter “what are the dimensions” in a search box.
Natural language searching does not necessarily use the language as it is usually spoken (i.e., in sentences), and one or more words or terms in a search query may be used in a way that does not form a standard sentence. For instance, a person searching through a home improvement catalog who wants to know what colors of refrigerators are available may enter “Refrigerator Colors.”
Searching with natural language is an extremely user-friendly method of searching. Most enterprises' systems of record, however, hold data in a structured format, and this data is not readily available for natural language search.
The distinction between unstructured data and structured data depends on whether the data is associated with a logical schema. Structured data is data that is associated with a logical schema, while unstructured data is data unassociated with a logical schema. Thus, unlike unstructured data, structured data is associated with a specification as to how the data may be found or located in an unambiguous manner. For example, a specification for a relational database table of ordered names, street addresses, towns, states, and zip codes might state that zip codes are found in column five (whereas names, street addresses, towns, and states are found in columns one, two, three, and four, respectively).
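By way of illustration only, the address-table example above may be sketched in Python using the standard sqlite3 module. The table name `addresses`, the column names, the helper `column_position`, and the sample row are hypothetical and are not drawn from any particular system of record; the sketch merely shows how a logical schema allows the zip code to be located unambiguously, whether by column name or by column position.

```python
import sqlite3

# Hypothetical schema: the column order mirrors the example in the text
# (name, street address, town, state, zip code).
SCHEMA = """
CREATE TABLE addresses (
    name   TEXT,
    street TEXT,
    town   TEXT,
    state  TEXT,
    zip    TEXT
)
"""


def build_demo_db():
    """Create an in-memory database holding one made-up record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO addresses VALUES (?, ?, ?, ?, ?)",
        ("Pat Example", "1 Sample Street", "Springfield", "IL", "62701"),
    )
    return conn


def column_position(conn, table, column):
    """Return the 1-based position of `column` as recorded in the schema."""
    # PRAGMA table_info yields one row per column; the first two fields
    # are the 0-based column id and the column name.
    for cid, name, *_ in conn.execute(f"PRAGMA table_info({table})"):
        if name == column:
            return cid + 1
    raise KeyError(column)


conn = build_demo_db()

# The schema itself states where zip codes are found.
zip_position = column_position(conn, "addresses", "zip")

# Lookup by column name, resolved through the schema.
zip_by_name = conn.execute("SELECT zip FROM addresses").fetchone()[0]

# Lookup by column position within the full record.
row = conn.execute("SELECT * FROM addresses").fetchone()
zip_by_position = row[zip_position - 1]
```

In this sketch both lookups return the same value because the schema fixes the location of each field. Free-form text carries no comparable specification, so a zip code appearing in such text would have to be inferred from its content rather than read from a known location.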
Examples of structured data include, but are not limited to, relational databases (which use the Data Definition Language [DDL] for writing logical schema), XML databases (which use an XML schema to describe the structure of XML files and the types of the data contained therein) and spreadsheets (which provide a manner in which to accurately identify data stored within fixed fields within a record or file). Examples of unstructured data include, but are not limited to, email messages, word processing documents, documents in .pdf format, web pages, and other types of data comprising free-form text. Thus, as mentioned above, the difference between structured data and unstructured data is that structured data is associated with a specification as to how data may be found or located in an unambiguous manner.
Unfortunately, natural language search engines are ineffective at providing search results from structured data. This problem is made more acute by the fact that people are becoming more and more accustomed to searching for information using natural language searches.