Roughly 85% of corporate information and 95% of global information is unstructured. This information is commonly stored in text documents, emails, spreadsheets, internet web pages, and similar sources. Further, this information is stored in a large variety of formats such as plain text, PDF, bitmap, ASCII, and others.
Only a limited number of tools, each with limited capabilities, are available to analyze and evaluate unstructured information. These tools can be categorized into four distinct groups: (1) entity, concept, and relationship tagging and extraction tools, (2) enterprise content management and knowledge management tools, (3) enterprise search and categorization tools, and (4) document management systems.
Entity extraction tools search unstructured text for specific types of entities (people, places, organizations). These tools identify the documents in which the entities were found. Some of these tools can also extract relationships between the entities. Entity extraction tools are typically used to answer questions such as “what people are mentioned in a specific document?”, “what organizations are mentioned in a specific document?”, and “how are the mentioned people related to the mentioned organizations?”
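By way of illustration only, the following minimal sketch shows the kind of lookup such a tool performs. It assumes a simple dictionary (gazetteer) match, which is a stand-in technique and not a description of any particular commercial tool. The names GAZETTEER, extract_entities, and the sample documents are hypothetical.

```python
import re

# Hypothetical gazetteer: known entity names grouped by entity type.
GAZETTEER = {
    "person": ["Jane Doe", "John Smith"],
    "organization": ["Acme Corp"],
    "place": ["Springfield"],
}

def extract_entities(documents):
    """Map each document id to the (entity_type, name) pairs found in its text."""
    found = {}
    for doc_id, text in documents.items():
        hits = []
        for entity_type, names in GAZETTEER.items():
            for name in names:
                # Whole-word match so that a name is not found inside a longer word.
                if re.search(r"\b" + re.escape(name) + r"\b", text):
                    hits.append((entity_type, name))
        found[doc_id] = hits
    return found

# Hypothetical unstructured documents.
documents = {
    "memo1": "Jane Doe met representatives of Acme Corp in Springfield.",
    "memo2": "John Smith filed the quarterly report.",
}
entities = extract_entities(documents)
```

Here entities["memo1"] answers the question of which people, organizations, and places are mentioned in that document. Extracting relationships between the entities would require analysis beyond this lookup.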
Enterprise content/knowledge management tools are used to organize documents into folders and to share information. They also provide a single, one-stop access point to look for information. Enterprise tools can be used to answer questions such as “what documents do I have in a folder on a particular terrorist group?” and “who in my organization is responsible for tracking information relating to a particular terrorist group?”
Enterprise search and categorization tools allow keyword searching, relevancy ranking, categorization by taxonomy, and guided navigation. These tools are typically used to find links to sources of information. Example questions such tools can answer include “show me links to documents containing the name of a particular terrorist” and “show me links to recent news stories about Islamic extremism.”
Document management tools are used to organize documents, to control versioning and permissions, and to control workflow. These tools typically have basic search capabilities. Document management tools can be used to answer questions such as “where are my documents from a particular analysis group?” and “which documents have been put in a particular folder?”
In contrast to unstructured or freeform information, structured data is organized with well-defined relationships between the various data elements. These relationships can be exploited by structured data analysis tools to provide valuable insights into the operation of a company or organization and to guide management into making more intelligent decisions. Structured data analysis tools include (1) business intelligence tools, (2) statistical analysis tools, (3) visualization tools, and (4) data mining tools.
Business intelligence tools provide dashboards, report generation, ad hoc analysis, drill-down, and slice-and-dice capabilities. These tools are typically used to analyze how data is changing over time. They can also show how products or other items are related to each other. For example, a store manager can select an item and query what other items are frequently purchased with that item.
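The store manager query can be sketched, under simplifying assumptions, as a co-occurrence count over transaction records. The function name frequently_purchased_with and the sample transactions are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

def frequently_purchased_with(transactions, item, top_n=3):
    """Return the items most often found in the same transaction as `item`."""
    counts = Counter()
    for basket in transactions:
        items = set(basket)  # ignore duplicate line items within one transaction
        if item in items:
            counts.update(items - {item})
    return counts.most_common(top_n)

# Hypothetical structured sales records, one list of items per transaction.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "eggs"],
]
result = frequently_purchased_with(transactions, "bread")
```

The query is simple only because each transaction is already structured as a list of discrete items. The same question cannot be posed directly against freeform text.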
Statistical analysis tools can be used for fraud detection, quality control, fit-to-pattern analysis, and optimization analysis. Typical questions these tools are used to answer include “what is the average daily network traffic and standard deviation?”, “what combination of factors typically indicates fraud?”, “how can I minimize risk of a financial portfolio?”, and “which of my customers are the most valuable?”
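As a minimal sketch of the first question, the average daily network traffic and its standard deviation can be computed with the Python standard library. The traffic figures and variable names below are hypothetical, and the sample (rather than population) standard deviation is assumed.

```python
import statistics

# Hypothetical daily network traffic measurements, in gigabytes.
daily_traffic_gb = [120.0, 135.0, 128.0, 142.0, 125.0]

mean_traffic = statistics.mean(daily_traffic_gb)   # arithmetic mean
std_traffic = statistics.stdev(daily_traffic_gb)   # sample standard deviation

print(f"mean = {mean_traffic:.1f} GB, standard deviation = {std_traffic:.2f} GB")
```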
Visualization tools are designed to display data graphically, especially in conjunction with maps. With these tools a user can visually surf and/or navigate through the data, overlay and evaluate data on maps with a geographic information system (GIS), and perform link and relationship analysis. These tools can be used, for example, to show trends and visually highlight anomalies, show a map color-coded by crime rate and zip code, or answer the question “who is connected by fewer than 3 links to a suspicious group?”
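The link analysis question can be illustrated, as a sketch only, by a breadth-first search over a relationship graph that stops after a fixed number of links. The graph contents and the function name within_links are hypothetical, and the relationships are assumed to be stored as an adjacency list.

```python
from collections import deque

def within_links(graph, source, max_links):
    """Return {node: link_count} for nodes at most max_links links from source."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distances[node] == max_links:
            continue  # do not expand past the link limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[source]  # the source is not "connected to" itself
    return distances

# Hypothetical relationship data stored as an adjacency list.
graph = {
    "group_x": ["alice"],
    "alice": ["group_x", "bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}

# "Fewer than 3 links" means at most 2 links.
connected = within_links(graph, "group_x", max_links=2)
```

In this hypothetical data, carol is three links from the group and is therefore excluded from the result.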
Data mining tools are typically used for pattern detection, anomaly detection, and data prediction. Example questions that can be addressed with these tools are “what unusual patterns are present in my data?”, “which transactions may be fraudulent?”, and “which customers are likely to become high-value in the next 12 months?”
Tools for analyzing structured data are far more flexible and powerful than the tools currently used to analyze unstructured data. However, the overwhelming majority of all data is unstructured. Therefore, it would be advantageous to have a middleware system and method that allows structured data analysis tools to operate on unstructured data.