The present invention relates in general to analysis of information items and in particular to systems and methods for determining communication chains between parties from electronic records.
With the proliferation of computing devices and communication networks such as the Internet, an ever-increasing amount of information is exchanged in electronic forms such as e-mails (electronic mail messages), instant messages (IMs), electronic document memos, etc. Electronic communication forms generally provide simple and easy-to-use, yet powerful, mechanisms for communication of information. To take just one example, the use of e-mail provides a number of advantages over traditional communication techniques such as phone and fax-based communications, including lower cost, reduced delivery time, ability to handle multiple document formats, and archival capabilities.
E-mails are usually stored in databases, including mail-specific databases such as Microsoft Exchange or Lotus Notes. Given the critical nature of corporate data, e-mails are usually backed up regularly onto backup media. The fact that e-mails can be and are archived, coupled with the fact that users tend to be more direct and forthright about information in e-mails, makes them excellent candidates for analysis for legal purposes.
While the potential dangers of discovery of “harmful” e-mails have caused some companies to introduce policies to destroy old e-mails and backups, most companies still maintain backups of old e-mails for at least some period of time (usually for a few years). Further, government regulations (e.g., Sarbanes-Oxley at the federal level) require many companies to maintain such e-mail backups for an extended period of time (e.g., several years).
At present, other types of electronic communications are less likely to be archived; however, companies are beginning to archive more types of communications, particularly for regulatory compliance purposes. For example, many companies have begun to archive IMs and/or voice mail messages. Companies are also adopting electronic forms of more traditional types of communications, such as internal memoranda and letters, which are often circulated in file formats such as PDF, Microsoft Word, or the like, and documents in these formats often are archived. With the growth of electronic calendar systems, teleconferencing, video conferencing, and Web conferencing, electronic records of when meetings occurred and who was in attendance may also become increasingly available in the future.
When a company is involved in a lawsuit, counsel on both sides typically search the company's records for evidence of activity that might prove liability or exculpate the company. Even in the absence of a lawsuit, corporate counsel might want to examine communication records for evidence of crimes or other activities for which the company might be held liable or to satisfy reporting requirements as to the lack of such activity. Typically in such situations, e-mail archives and document archives are searched for particular keywords, senders and receivers, and the search results are manually reviewed by a human. For example, the lawyers involved in a lawsuit might look for critical documents and/or e-mails, then try to trace the path of the documents and/or e-mails through the system to establish when critical pieces of information became known to certain people within the company. Commonly asked questions related to communications include: Who within the company knew about a certain piece of information? When did the person know it? Who conveyed the information to the person? Through what channel? Did others receive this communication? To whom did the person convey or forward the information?
To help answer such questions, a number of existing search systems allow a user to extract and search e-mails. For example, some e-mail systems provide an administrator console that allows an authorized user to search a database of stored or archived e-mails by date, sender, receiver, and keywords. Some systems of this type do not have the capability to search attachments to the e-mails, where important information is often to be found. Other systems improve on the administrator console by extracting the e-mails and attachments to another repository and indexing the content there, enabling an authorized user to search both e-mail and attachments at the same time. Using systems of this kind, a user can identify all e-mails having particular keywords but must then manually review the e-mails in order to determine how information propagated through the organization, that is, to identify communication chains. Further complicating the problem, communication chains may be direct links (e.g., an e-mail sent by user A to user B) or indirect chains involving one or more intermediaries (e.g., an e-mail sent by user A to user C, who then forwards it to user B). A direct link can be established from a single message, but finding indirect chains generally requires correlating multiple messages.
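By way of illustration only, the distinction between direct links and indirect chains can be sketched as a search over message records. The record layout (`Message` with `sender`, `recipient`, and `sent` fields), the function name `find_chains`, and the `max_hops` limit are hypothetical and are not drawn from any particular e-mail system. The sketch correlates messages solely by sender, recipient, and send time; it does not establish that the same information was actually conveyed at each hop.

```python
from collections import namedtuple

# Hypothetical record: who sent a message to whom, and when.
Message = namedtuple("Message", "sender recipient sent")


def find_chains(messages, source, target, max_hops=3):
    """Return candidate chains (lists of messages) leading from source to target.

    A one-message chain is a direct link; longer chains pass through
    intermediaries. Each message in a chain must be sent later than
    the message before it.
    """
    chains = []

    def extend(chain, holder, last_sent):
        if holder == target:
            chains.append(chain)
            return
        if len(chain) == max_hops:
            return
        # Do not revisit anyone already on this chain.
        seen = {source} | {m.recipient for m in chain}
        for m in messages:
            later = last_sent is None or m.sent > last_sent
            if m.sender == holder and m.recipient not in seen and later:
                extend(chain + [m], m.recipient, m.sent)

    extend([], source, None)
    return chains
```

Even in this simplified form, the indirect case requires examining several messages together, which is the correlation step that a keyword search alone leaves to the human reviewer.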
To facilitate detection of indirect communication chains, some e-mail search systems also allow e-mails to be grouped into threads of presumptively related messages. These systems typically group messages into threads based on the subject headers and/or related-message headers that are included in most e-mail messages. For instance, “Re:” and “Fw:” or similar prefixes are commonly added to subject headers to identify e-mails that reply to or forward a previous e-mail. An e-mail with a particular subject line and other e-mails whose subject lines differ only by the addition of “Re:” or “Fw:” can be grouped into a thread and organized, e.g., by time sent or time received. Related-message headers use message identifiers (e.g., serial numbers or other codes) assigned to each message, or in some systems to threads of replies and/or forwards, to identify one or more messages to which they relate. Changing the subject line when forwarding or replying to a message may defeat thread detection based on subject lines but generally does not defeat thread detection based on related-message headers.
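Subject-based thread grouping of the kind just described can be sketched roughly as follows. The dictionary keys (`subject`, `sent`), the function names, and the exact set of prefixes recognized are assumptions made for illustration; actual systems may recognize other prefixes and may also consult related-message headers.

```python
import re
from collections import defaultdict

# Assumed prefix set: one or more leading "Re:", "Fw:", or "Fwd:" markers.
_PREFIXES = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes so related subject lines compare equal."""
    return _PREFIXES.sub("", subject).strip().lower()


def group_by_subject(messages):
    """Group messages into threads keyed by normalized subject.

    Each message is a dict with hypothetical keys "subject" and "sent".
    Messages within a thread are ordered by time sent.
    """
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg["subject"])].append(msg)
    for thread in threads.values():
        thread.sort(key=lambda m: m["sent"])
    return dict(threads)
```

Under this sketch, a message whose subject line has been rewritten falls into a different thread from the message it answers, which is the weakness of subject-line grouping noted above.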
Either of these thread-detection approaches, however, can be defeated (intentionally or unintentionally) if an e-mail recipient conveys the information further by some mechanism other than replying to or forwarding the received message. For instance, an e-mail recipient might compose a new e-mail message with a new subject line or pass on the information through a different channel, such as IM or voicemail. The new message will not be related to the old message in any way that a thread-based e-mail grouping system can detect. Consequently, a user who wants to reconstruct a communication chain will need to do so manually. Since message recipients often propagate received information in diverse ways, the ability of existing thread-based systems to identify communication chains is significantly compromised.
It would therefore be desirable to provide systems and methods for determining communication chains in a wider range of situations than existing systems support.