Most businesses, governments, entities and individuals rely heavily on computers for tasks such as word processing, e-mail, and various data-driven applications. With the ever-increasing accumulation of electronic data files stored on computer systems everywhere, individuals and entities are faced with the daunting task of locating, extracting and analyzing vast amounts of electronic data for various important and, in some cases, critical tasks.
While various systems and tools exist today for searching computer-based data files, there are certain limitations in the existing products and procedures. For example, many existing tools are directed towards visualizing semantic network relationships between the concepts found in unstructured documents such as e-mails, Word documents, spreadsheets, PowerPoint presentations, text in CAD drawings, and the like. These tools are generally known as “text-mining” tools and can be used to analyze various unstructured documents by extracting common concepts and terms from those documents.
These tools are typically used to acquire and analyze electronic documents by preparing an extensive database of the captured documents and the various indices that track the terms, concepts and metadata associated with these documents (From, To, CC, BCC, date created, subject, title, author, etc.). Then, other tools can be used to visualize the relationships, if any, that exist between those documents by providing an overview of the relationships based on the semantic content of the documents. Additionally, there are other tools that provide methods for investigating and analyzing the details of the relationships between the documents as well as the associated content. Similarly, other tools in widespread use today are capable of various traditional data-mining activities and can be used to analyze structured data such as databases, spreadsheets and the like.
While these various tools have been useful for certain limited data analysis purposes, there are certain circumstances where these tools are not sufficient. Even though it may be desirable to analyze the relationships between documents that reside in both unstructured and structured data stores, this task can be difficult if not impossible to achieve. For example, even though e-mail messages and various financial transactions (e.g., checks, wire transfers between banks, A/P or A/R entries) can all be generalized as “documents,” there is presently no convenient or efficient way to correlate and/or analyze these disparate documents. The limitations of the present technology include, but are not limited to, the lack of a standardized central message store and the lack of a process or procedure for identifying a given individual or entity across the many different names, aliases, e-mail accounts, bank and brokerage accounts, etc. that exist for each represented individual or entity.
Furthermore, even if the disparate documents can be related in some fashion, much of the information associated with a given message or group of documents, primarily the content, is unstructured or semi-structured, and there are no convenient tools available to perform any meaningful analysis using this information. Additionally, even when access to the unstructured information is provided, retrieval of the targeted information is usually limited to the use of Boolean logic queries against the structured information and the associated simple terms. Finally, when using the analytical tools presently available, the typical data visualization technology is generally limited to the presentation of “lists of lists of lists,” typically presented in some type of tabular format. This is hardly a convenient, efficient, or effective way of analyzing complex concepts, particularly regarding the structure of the relationships between dozens, hundreds, thousands, or even millions of documents. Accordingly, even with a significant amount of time and effort expended, it is not always easy or even possible to locate, extract, correlate and/or analyze the desired data, even if it does exist.
As can be seen from the discussion presented above, there are many limitations inherent in the present systems and tools for searching and analyzing the electronic data files stored in various computer systems. Accordingly, without the development of new and useful methods and tools to perform additional document analytics and visualizations on both structured and unstructured information, the ability of users to extract the desired data for effective and efficient decision-making purposes will continue to be suboptimal.