This disclosure relates generally to information systems. More particularly, the disclosure relates to techniques for generating a semantic space for electronically stored information.
Collaboration using electronic messaging, such as email and instant messaging, is becoming increasingly ubiquitous. Many users and organizations have transitioned to “paperless” offices, where information and documents are communicated almost exclusively using electronic messaging. Also, “paper” based documents can be scanned and converted to electronic files using optical character recognition (OCR). As a result, users and organizations are also now expending time and money to sort and archive increasing volumes of digital documents and data.
At the same time, state and federal regulators such as the Federal Energy Regulatory Commission (FERC), the Securities and Exchange Commission (SEC), and the Food and Drug Administration (FDA) have become increasingly aggressive in enforcing regulations requiring storage, analysis, and reporting of information based on electronic messages. Additionally, criminal cases and civil litigation frequently employ electronic discovery techniques, in addition to traditional discovery methods, to discover information from electronic documents and messages.
One problem with electronically storing information is that complying with disclosure requirements or reporting requirements is difficult because of the large amounts of data that may accumulate. As broadband connections to the Internet are common in most homes and businesses, emails frequently include one or more multi-megabyte attachments. Moreover, these emails and attachments are increasingly in diverse and proprietary formats, making later access to the data difficult without the required software.
Another problem is that disclosure requirements or reporting requirements do not simply require that the electronic message be preserved and then disclosed. Often, these requirements focus on disclosing or reporting information about the electronic message, such as who had access to sensitive data referred to in the contents of a particular electronic message. Some companies have teams of employees spending days and weeks reviewing emails in order to respond to regulatory audits and investigations. For these reasons, the inventors believe that users and organizations need electronic message analysis solutions to help lower the costs of disclosing and/or reporting information related to electronic messaging and other electronically stored information.
In electronic discovery, whether for early case assessment or for improving the speed and accuracy of review, it is critically important to identify as many responsive documents as possible. Unlike typical web search engine technologies, which focus on identifying only a handful of the most relevant documents, electronic discovery is invariably about minimizing the risk of overlooking relevant documents while minimizing expenses. This shifts the technical challenge from optimizing precision (finding only relevant documents) to increasing recall (finding most of the relevant documents).
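By way of illustration only, the distinction between precision and recall can be sketched as follows. The sketch is not part of the disclosed techniques; the function name `precision_recall`, the document identifiers, and the two result sets are hypothetical and chosen solely to contrast a narrow, web-search style result with a broader, discovery style result.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: documents d1..d10 are the responsive ones.
relevant = {f"d{i}" for i in range(1, 11)}

# Web-search style result: a handful of documents, all of them relevant.
narrow = {"d1", "d2", "d3"}

# Discovery style result: a wider net that also pulls in three
# non-responsive documents (x1..x3) but misses only d10.
broad = {f"d{i}" for i in range(1, 10)} | {"x1", "x2", "x3"}

narrow_p, narrow_r = precision_recall(narrow, relevant)
broad_p, broad_r = precision_recall(broad, relevant)

print(f"narrow: precision={narrow_p:.2f} recall={narrow_r:.2f}")
print(f"broad:  precision={broad_p:.2f} recall={broad_r:.2f}")
```

In this hypothetical example, the narrow result has perfect precision but recovers only three of the ten responsive documents, whereas the broad result accepts some non-responsive documents in exchange for recovering nine of the ten.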
Accordingly, what is desired is to solve problems relating to generating semantic spaces for electronically stored information, some of which may be discussed herein. Additionally, what is desired is to reduce drawbacks related to semantic analysis and automatic concept searches, some of which may be discussed herein.