The amount of electronically recorded data is steadily growing. The data may comprise structured data (such as database records or XML data) or unstructured data (such as documents, books and media files). The data needs to be processed, transformed and abstracted into information so that enterprises can apply the knowledge gained from the data to their benefit. The process of extracting knowledge from data is called data mining.
Today, data mining is applied in almost every domain, including banking, retail, healthcare, telecommunications and genetics. Data analysts use a variety of tools and techniques to extract, transform and comprehend gigabytes or terabytes of raw data from multiple sources. They transform the raw data into summarized information, such as reports, in the expectation of discovering previously unknown trends and facts.
Generally, every transaction recorded in raw data represents a fact, but at a much finer level of granularity than a decision maker needs in order to comprehend the facts and act on them. It is not practical for a person to stay focused while consuming a large amount of information in a timely manner, manually picking out the “prominent trends” and leaving out less important facts. Hence, there is a need for a convenient mechanism or tool that can abstract terabytes of data into a short summary report of a few kilobytes.
Today's data analysts use numerous statistical and non-statistical methods and tools to explore and correlate the elements within data. Analysis of unstructured data is more complex than analysis of structured data, because structured data can be easily indexed. Recent advances in natural language processing, statistical techniques and related fields have improved results in part-of-speech tagging, keyword extraction from documents, and face or object recognition in media files. These advances set the stage for designing further practical methods for processing and summarizing data, which can make knowledge discovery from unstructured data easier. For example, computational semantics extracts meaning representations from different types of text.
Thus, a set of key elements can be mined from an artifact; such a set is normally used to classify the artifact or to create an abstract at the artifact level. ‘Key elements’ are the elements associated with an artifact, such as keywords or key phrases in text, audio or video files, faces or objects recognized in media files, and the like. However, the key elements extracted from unstructured data do not include other parameters, such as, but not limited to, the relationships between the key elements, the frequency of the key elements and the frequency of the relationships between the key elements. Without these parameters, users have limited options for modifying the data network and for performing efficient analysis based on the importance of the key elements.
Therefore, there is a need for efficient knowledge discovery from both unstructured and structured data, and for a data network that can be modified based on parameters such as the relationships between the key elements, the frequency of the key elements and the frequency of the relationships between the key elements, thereby giving the user a clear picture of the facts and their importance.
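As an illustration only, the parameters named above can be pictured as a weighted graph: each key element is a node weighted by its frequency, and each relationship is an edge weighted by its own frequency. The sketch below assumes that key elements have already been extracted per artifact, that a "relationship" means two key elements occurring in the same artifact, and that frequency means the number of artifacts; the functions `build_network` and `prune_network` are hypothetical names, not part of any described system.

```python
from collections import Counter
from itertools import combinations


def build_network(artifacts):
    """Build a key-element network from per-artifact key-element lists.

    Assumption (hypothetical): a relationship is two key elements that
    co-occur in the same artifact. Node weight is the number of artifacts
    containing the element; edge weight is the number of artifacts
    containing both elements.
    """
    element_freq = Counter()
    relation_freq = Counter()
    for elements in artifacts:
        unique = sorted(set(elements))  # count each element once per artifact
        element_freq.update(unique)
        # Sorted input makes every pair a canonical (a, b) key with a < b.
        relation_freq.update(combinations(unique, 2))
    return element_freq, relation_freq


def prune_network(element_freq, relation_freq, min_element=1, min_relation=1):
    """Return a modified network that keeps only the elements and
    relationships whose frequencies meet the given thresholds."""
    nodes = {e: f for e, f in element_freq.items() if f >= min_element}
    edges = {
        (a, b): f
        for (a, b), f in relation_freq.items()
        # Drop an edge if either endpoint was pruned away.
        if f >= min_relation and a in nodes and b in nodes
    }
    return nodes, edges


if __name__ == "__main__":
    # Hypothetical key elements extracted from three documents.
    docs = [
        ["loan", "interest", "credit"],
        ["loan", "credit", "fraud"],
        ["fraud", "credit"],
    ]
    elements, relations = build_network(docs)
    nodes, edges = prune_network(elements, relations, min_element=2, min_relation=2)
    print(nodes)
    print(edges)
```

Raising `min_element` or `min_relation` shrinks the network to its most frequent elements and relationships, which is one possible way a user might modify the network according to the importance of the key elements. A different notion of relationship (for example, co-occurrence within one sentence) would change only how the pairs are generated.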
Improved systems and methods for analyzing data and creating a modifiable data network are therefore desired.