The functionality of conventional search engines is mostly limited to matching keywords in the query with keywords in the content being searched. Conventional search engines cannot effectively handle natural-language queries, especially when such queries contain fuzzy expressions.
For example, if a user conducting a product search enters a query string such as “find computers that are not too heavy” or “large-screen smart phones”, conventional search engines cannot understand expressions that convey fuzzy meanings, such as “not too heavy” or “large”. More advanced technologies for building more intelligent search engines are much needed.
Furthermore, a different type of problem in the field of textual data processing, and in natural language processing in general, concerns the form of the data itself. Unlike structured data, such as the data stored in the various types of databases maintained by organizations, most of the data accumulated so far in this information and Big Data age are still unstructured: they are mainly in free-text format, such as documents, emails, text messages, and social-media user-generated content such as product or service reviews, comments, and customer feedback.
Extracting useful information from large amounts of unstructured data, and especially gaining insights that support better decision-making and greater customer satisfaction, has long been a challenge for many companies and organizations. Currently available data analysis tools, including the Structured Query Language (SQL) and related statistical analysis tools, usually do not perform well on unstructured data precisely because such data are “unstructured”. For example, a company or organization of a certain size usually holds a huge amount of unstructured data created either by its employees or by its customers. Handling such data effectively to improve the overall business has been a difficult task for many companies, and even for governments.
However, although considered “unstructured”, text data of various types do have internal structures, such as linguistic and pragmatic structures and, more importantly, informational structures. The challenge is that such structures are usually difficult to discern and analyze by automated means, owing to the limitations of current technology and the complexity of human language.
More advanced technologies are therefore much needed for handling unstructured data by identifying the relevant or critical information hidden in large amounts of it.