In contrast to structured data, such as the data stored in the databases of various organizations, the majority of the data accumulated so far in this information and Big Data age are still unstructured. Unstructured data are mainly in free-text format and include documents, emails, text messages, and user-generated social media content such as product or service reviews, comments, and customer feedback.
Extracting useful information from a large amount of unstructured data, and especially gaining insights that support better decision-making and better customer satisfaction, has long been a challenge for many companies and organizations. Currently available data analysis tools, including the Structured Query Language (SQL) and related statistical analysis tools, usually do not perform well with unstructured data, because such data lack the predefined fields and formats that these tools rely on. For example, a company or organization of a certain size usually has a huge amount of unstructured data created by its employees or its customers. Handling such data effectively to improve the overall business has been a difficult task for many companies, and even for governments.
However, although considered “unstructured”, text data of various types do have internal structures, such as linguistic structures, pragmatic structures, and, more importantly, informational structures. The challenge is that such structures are usually difficult to discern and analyze by automated means, owing to the limitations of current technology and the complexity of human language.
More advanced technologies for handling unstructured data, by identifying the relevant or critical information hidden in a large amount of such data, are much needed.