Discovery is a process by which two parties in a legal proceeding exchange information, exhibits, and documents according to specific rules of procedure. In a typical legal proceeding, a party (“requesting party”) may, pursuant to procedural rules, send a document request to another party (“responding party”) to compel the responding party to produce documents that fall within many categories of subject matter. The responding party reviews potential documents, identifies documents containing any of the enumerated categories of subject matter, and produces them for the requesting party. Historically, the responding party reviewed paper documents, copied responsive documents, and produced them for the requesting party. Information technologies have caused companies to use electronic documents, and thus it is necessary to use an Internet review platform for the review of documents. In a typical document review, the representing law firm or the client retains a data company to provide data hosting services and retains contract attorneys (“the reviewers”) from an employment agency to review documents on client computers. The reviewers can access the server of the review platform and download documents one by one for review.
The need for document review may arise from all kinds of causes such as civil actions, securities litigation, patent infringement, product liability claims, administrative actions, merger and acquisition approvals, governmental investigations of statutory violations (for example, violations of the Foreign Corrupt Practices Act), criminal actions, compliance reviews, and internal due diligence reviews. Different legal procedures and substantive laws require the responding party to produce different types of documents. As a result, there is no universal procedure for processing documents. Each review project requires unique tasks for the project manager and the reviewers. Each type of case may require a unique discovery process.
The documents sought by the requesting party depend upon the nature of the claim and thus vary considerably. When a corporation acquires another corporation, the acquisition transaction may be subject to approval by the Department of Justice. This type of review is unique in that the government looks only for possible antitrust violations. In nearly all cases, the government focuses on three types of relevancy: relevant products, relevant market, and relevant time. The reviewers must pay attention to any documents that could raise antitrust concerns. In class actions, discovery is the most contentious. Disputed issues may revolve around looting, fraud, and failure to disclose important information. In patent infringement cases, issues may be patent validity, patent misuse, and inequitable conduct. Document review for this kind of case requires the reviewers to identify infringing products and services.
In a case arising from a government investigation, the government may issue a subpoena to compel a corporation to produce certain documents. The kinds of document requests vary from case to case, although the documents sought in the same type of case often include certain similar documents. Some of the cases may arise under the laws regulating communications, stockbrokers, and investment advisers. Some investigations may be focused on specific issues, so the document requests revolve around those issues. Other cases may require broader investigations. For example, if an investigation is focused on the accuracy of a submitted declaration, the focus of discovery will be on the declaration. If an investigation is directed at a specific kind of advertisement, such as advertising by fax, web mail, or bulk email, discovery would focus on those issues. Discovery tasks may include a search for documents that concern advertisement methods. Some investigation cases arise under the Foreign Corrupt Practices Act, which prohibits corporations from giving anything of value to the officials of foreign governments. When a company is under investigation for violating this federal statute, review is focused on how money or gifts are used to improve business opportunities.
Internal due diligence review may be conducted to find internal misconduct such as looting, embezzlement, and stealing. For example, when a bank discovers that someone may have stolen or embezzled money, the bank may conduct an internal investigation. While such discovery does not always work, it is a proper step for finding useful leads. Due diligence review is also conducted for various other purposes. When a company is to acquire a business or a substantial amount of its assets, the acquiring company may have to investigate the company to be acquired so that it can make an informed decision. The investigation is conducted to ascertain potential liabilities, outstanding debts, assets, revenues, cash flow, and intellectual property.
Objectives of document production vary, depending upon the nature of the case and other factors. Regardless of the complexity of the legal issues, the final objective of each document production project is to produce just enough documents to meet the requirements of the document request or subpoena and to identify the documents that support the claims or defenses. However, due to the dynamics of litigation, the parties must consider additional objectives, which include producing a document database that is capable of scaling up and down and that will be useful in a later stage of litigation. Another common objective is to produce documents at the lowest possible cost.
Client companies make different products and sell different services, so their documents contain completely different substance. Despite their differences, their documents contain (1) information on a large number of projects, services, and processes, (2) unfamiliar codes or casual names of products, services, and materials, (3) a large number of players such as employees, customers, attorneys, consultants, and other parties, (4) technical subjects of varying complexity, (5) jargon, abbreviations, and acronyms, (6) assumptions understood only by the sender and intended readers, (7) incomplete person names, place names, and discussion topics that can be understood only by the people involved, (8) protected compressed and zipped files, (9) trade secrets protected by passwords, and (10) substance in one or more foreign languages. For any and all of those reasons, document review is not an easy task.
Corporate documents contain a large number of duplicates. Duplicate documents arise from document distribution practices, archiving, file backup, drive backup, media backup, and server backup. A document may be distributed to several, tens, or hundreds of employees. Some documents may be amended from time to time and sent to a large number of employees as updates. Each of the documents may be backed up in many ways, including file backup, drive backup, media backup, and routine server backup. Certain documents may have thousands of copies while other documents may have only tens to hundreds of copies. The large number of documents is primarily responsible for the high review cost.
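By way of illustration only, exact copies of the kind created by distribution and backup could be grouped by a content hash so that each distinct document is counted once. The following is a minimal sketch and not a method attributed to any review platform discussed here; the function name `group_duplicates` and the sample pool are hypothetical, and the sketch assumes that duplicates are byte-identical, so an amended update is treated as a different document.

```python
import hashlib
from collections import defaultdict


def group_duplicates(documents):
    """Group document IDs whose content bytes are identical.

    `documents` maps a document ID to its raw content (bytes).
    Returns a list of groups; each group lists the IDs sharing one digest.
    """
    groups = defaultdict(list)
    for doc_id, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(doc_id)
    return list(groups.values())


# Hypothetical pool: one memo, a backup copy of it, and an amended update.
pool = {
    "DOC-0001": b"Q3 pricing memo",
    "DOC-0002": b"Q3 pricing memo",     # backup copy of DOC-0001
    "DOC-0003": b"Q3 pricing memo v2",  # amended update, not an exact duplicate
}

groups = group_duplicates(pool)
# Keep the first ID of each group as the representative copy.
unique_to_review = [group[0] for group in groups]
print(unique_to_review)  # ['DOC-0001', 'DOC-0003']
```

Near-duplicates, such as the amended update above, would require a different comparison and are outside the scope of this sketch.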
Due to the large number of computer applications for creating documents and complex file histories, some documents cannot be properly processed for review. Documents cannot be opened due to (1) lack of supporting applications, (2) association with a wrong application, (3) missing necessary file components, (4) being linked to an unavailable file, (5) incorrect encoding of text in foreign languages, (6) corruption in the file structure, (7) infection by a virus, and (8) partial loss of information or a damaged file structure. It is not always easy for reviewers to ascertain whether a document has a real technical problem. When a great number of documents cannot be opened, it is a disaster. The only possible solution is to find the original documents. Documents incorrectly marked as having technical problems may be routed back to reviewers for another round of review.
Many large corporations do business worldwide. As a result, corporate documents are written in different languages, depending upon the geographic region where the documents are created or upon the authors and intended readers. Some documents are written in foreign languages, others contain foreign languages between lines, and yet others contain English translations. Some documents may be written in more than one language. It is very difficult to have those documents reviewed. They go through several rounds of review. If such documents are important, they are translated into English.
Password protection of documents adds further complications. Password-protected documents often appear in the document pools of software companies and technology companies. This class of documents can significantly reduce review speed. It is often difficult or even impossible to find the right passwords. In many cases, the reviewers treat such documents as trash or technical documents. The parties in civil litigation may reach an agreement on how to treat those documents. Companies now use zip files to send corporate documents by email. A zip file may contain tens to hundreds of files. Some zip files may contain database files or spreadsheets.
Document production is further complicated by the unpredictable changes inherent in litigation. Litigation needs frequently require law firms to change every possible instruction, including review standards, request definitions (specification definitions), coding rules, and ways of handling documents. The large number of documents in a review pool makes this matter even worse. Any fixes, adjustments, and corrective reviews require a great deal of review time. The whole production process is full of changes, adjustments, fixes, quality checks, corrective reviews, and special reviews. The current electronic document review model, an extension of the conventional discovery model, lacks flexibility for handling dynamic changes. On top of so many complicating factors is the diversity of people involved. For any review, the parties involved include the client, litigation attorneys, project managers, document processors, the staffing agency, and document reviewers. A single miscommunication between any of them may result in an error that requires a massive corrective review.
The massive amount of case information, the large number of file types, commingled foreign languages, and prevalent technical problems are directly responsible for poor performance and unmanageable discovery costs. Additional factors contributing to poor performance include a poor review plan, reviewers' inability, confusing review instructions, applications missing from client computers, poorly worded definitions in the coding pane, badly structured coding trees, and unavailable passwords.
Because of the nature of documents, document review is a slow learning process. Meaningful review is not possible in the early stage of review even by experienced reviewers. When a reviewer learns more and more about document substances, the reviewer can substantially improve review quality and increase review speed.
Other problems such as password protection can waste much more time. An operation running from file selection and downloading to unzipping the file can waste as much as 10 minutes per document. Moreover, whenever a reviewer is unable to open a document, the reviewer waits for help or repeatedly tries the same operations. The time wasted in this way is very difficult to assess. Documents routed to a wrong destination are routed back and forth without final resolution.
In a classic document review model, documents are collected to form a review pool, and they are reviewed to identify those documents that contain substance falling within one or more categories of the request. The definitions of the categories are provided in the document request. One of the document requests in a patent infringement case may be “any and all documents that discuss, mention, and relate to the patent in suit.” The document request may contain several to nearly a hundred specific requests. The reviewers review all potential documents and find relevant documents. Those responsive documents are then further reviewed to determine whether they are privileged and thus withheld from production. The review platform has a review tag database table for storing coding decisions such as responsive or non-responsive and privileged or not privileged. For a document that is responsive, the reviewer checks the responsive tag and all other applicable tags for the document. In addition, the reviewer may determine whether a document is hot (hot documents are those that are very important to the case) and code it accordingly. Responsive and non-privileged documents are produced, optionally with a production log identifying each of the produced documents. The production log may contain only limited information for identifying each produced document.
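The role of the review tag table in this model can be sketched as follows. This is an illustrative sketch under stated assumptions, not the schema of any actual review platform: the table name `review_tags`, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the server database.

```python
import sqlite3

# In-memory database standing in for the review platform's server database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE review_tags (
        doc_id     TEXT PRIMARY KEY,
        responsive INTEGER NOT NULL,  -- 1 = responsive, 0 = non-responsive
        privileged INTEGER NOT NULL,  -- 1 = privileged, 0 = not privileged
        hot        INTEGER NOT NULL   -- 1 = hot document
    )
    """
)

# Hypothetical coding decisions saved by reviewers.
coding_decisions = [
    ("DOC-0001", 1, 0, 0),
    ("DOC-0002", 1, 1, 0),
    ("DOC-0003", 0, 0, 0),
    ("DOC-0004", 1, 0, 1),
]
conn.executemany("INSERT INTO review_tags VALUES (?, ?, ?, ?)", coding_decisions)

# Responsive and non-privileged documents are produced.
production_set = [
    row[0]
    for row in conn.execute(
        "SELECT doc_id FROM review_tags "
        "WHERE responsive = 1 AND privileged = 0 ORDER BY doc_id"
    )
]

# Responsive but privileged documents are withheld from production.
withheld = [
    row[0]
    for row in conn.execute(
        "SELECT doc_id FROM review_tags "
        "WHERE responsive = 1 AND privileged = 1 ORDER BY doc_id"
    )
]

print(production_set)  # ['DOC-0001', 'DOC-0004']
print(withheld)        # ['DOC-0002']
```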
Information technologies have caused companies and businesses to produce extremely large document pools, which can comprise more than a million documents. Thus, reviewing and producing documents by the conventional manual method is no longer practicable. E-discovery has become a big industry in which a large number of companies are involved. The main areas of service include data collection, data processing, document hosting, software development, employee staffing, training and consulting, and document review.
Since the deployment of Concordance, more than two dozen review systems have entered the market. Each platform consists of a server, a server application, and plural terminal computers connected to the server. Well-known review platforms include Concordance, Applied Discovery, Iconect, Stratify, Ringtail, Introspect, Attenex, Summation, and Case Central. Each review platform comprises a server for loading and processing data and for sending documents through the Internet to a plurality of client computers where the documents are reviewed one by one. As shown in FIG. 1, each of the review platforms interacts with document reviewers through a review user interface which comprises a document coding pane 100, a document list pane 110, a document view pane 120, and document advancing buttons 130. Regardless of the review platform, the basic concept is the same. First, documents from one or more custodians of the responding party are collected and stored on a server. Hard copies of documents are scanned and saved as suitable image files. Electronic documents are converted into image files such as TIFF, PDF, and PNG. Certain electronic documents may be converted into text files by optical character recognition software, while the files in their native formats and text formats are also available for download during review. All documents are loaded onto the server. The platforms deliver electronic documents to review terminals as text, HTML, TIFF, PDF, or native files.
The files are indexed according to a certain scheme, mainly for the convenience of assigning review tasks to plural reviewers and tracking the processing status of documents. Documents may be retrieved using specific search keys or by other specific processing methods. On some review systems, documents may be displayed as files in one parent folder in the review browser of the client computer. Documents can be assigned to different reviewers by virtual folders or number ranges. On other platforms, documents may be assigned to plural reviewers by start and end Bates numbers. They may be presented to reviewers in an order consistent with their consecutive Bates numbers.
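Assignment by number ranges can be illustrated with a short sketch. The function `assign_ranges` and the reviewer names are hypothetical, and the sketch assumes that documents carry consecutive integer numbers and that there are at least as many documents as reviewers; it does not reflect how any particular platform divides work.

```python
def assign_ranges(first_number, last_number, reviewers):
    """Split an inclusive range of document numbers into contiguous blocks.

    Returns a dict mapping each reviewer to a (start, end) tuple. Blocks differ
    in size by at most one document. Assumes the range holds at least as many
    documents as there are reviewers.
    """
    total = last_number - first_number + 1
    base, extra = divmod(total, len(reviewers))
    assignments = {}
    start = first_number
    for index, reviewer in enumerate(reviewers):
        # The first `extra` reviewers each take one additional document.
        size = base + (1 if index < extra else 0)
        assignments[reviewer] = (start, start + size - 1)
        start += size
    return assignments


# Hypothetical example: ten documents divided among three reviewers.
ranges = assign_ranges(1, 10, ["reviewer_a", "reviewer_b", "reviewer_c"])
print(ranges)
# {'reviewer_a': (1, 4), 'reviewer_b': (5, 7), 'reviewer_c': (8, 10)}
```

Each reviewer would then begin at the start number of the assigned range and proceed in consecutive order.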
Plural reviewers review documents from client computers that are connected to the server. Usually, each of the reviewers can log into a personal review account and open an assigned folder or document range to review documents. If the platform allows plural reviewers to review documents by ranges, each of the reviewers must go to the start document number of his or her assigned document range. Each of the review platforms has at least two panes: one for viewing the document and one for marking the document (often known as “tagging tree”). They also have a pane for showing all documents in a list. In reviewing documents, the reviewer opens a document in the document pane, reads the document, and conducts the required analysis. Upon finishing reading the document, the reviewer clicks all applicable check boxes on the tagging pane according to the review instructions. Each of the check boxes, also known as “tags,” is associated with one of the categories or definitions. For example, the tagging tree on the tagging pane may contain the following check boxes and definitions: [X] Non-responsive, [ ] Responsive, [ ] Hot document, and [ ] Privileged document. Some of the tags may have many sub-classes of tags associated with specific definitions. The number and nature of the definitions used in each case are unique and may be completely different from those used in other cases. Thus, the server must allow a project administrator to set up and modify the tagging tree for each project. The reviewer may write a note for a document in an annotation field associated with the document. After the reviewer finishes the first document, the reviewer clicks a submission button. This causes the server to write the values for the selected tags into the database for the document and to load the next document. The reviewer repeats the same process in reviewing the next document.
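A project-specific tagging tree and the submission step might be modeled as in the sketch below. It is illustrative only: the `Tag` class, the `flatten` and `submit` functions, the sub-tags "Request 1" and "Request 2", and the dictionary standing in for the server database are all hypothetical and are not taken from any of the platforms named above.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """One check box in the tagging tree, with optional sub-class tags."""
    name: str
    definition: str
    children: list = field(default_factory=list)


def flatten(tags):
    """Return the names of all tags in the tree, depth-first."""
    names = []
    for tag in tags:
        names.append(tag.name)
        names.extend(flatten(tag.children))
    return names


# Hypothetical tagging tree set up by a project administrator.
tagging_tree = [
    Tag("Non-responsive", "Falls within no request"),
    Tag("Responsive", "Falls within at least one request", children=[
        Tag("Request 1", "Documents relating to the patent in suit"),
        Tag("Request 2", "Documents relating to accused products"),
    ]),
    Tag("Hot document", "Very important to the case"),
    Tag("Privileged document", "Subject to a privilege claim"),
]


def submit(database, doc_id, checked, tree, note=""):
    """Validate the checked tags against the tree and save them for a document."""
    unknown = set(checked) - set(flatten(tree))
    if unknown:
        raise ValueError(f"tags not in the tagging tree: {sorted(unknown)}")
    database[doc_id] = {"tags": sorted(checked), "note": note}


# A plain dict stands in for the server database in this sketch.
database = {}
submit(database, "DOC-0001", ["Responsive", "Request 1"], tagging_tree,
       note="Mentions the patent number")
print(database["DOC-0001"]["tags"])  # ['Request 1', 'Responsive']
```

Keeping the tree as data rather than as fixed fields reflects the requirement that the administrator be able to set up and modify it for each project.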
Responsiveness review may be conducted a second time as a quality control. Documents that have been marked as non-responsive are not always reviewed again. However, a second review of responsive documents is common. Privileged documents are subject to further review by a privilege team for final determination of the privilege status. When a document is determined to be privileged, it is removed from the responsive pool and placed in the privileged pool. A log is produced showing document identities such as creator, addressee, other recipients, date of creation, privilege basis claimed, and a brief description of the subject matter. Privilege review may be conducted twice or more. In addition, responsive documents are also reviewed for significance (hot document review). A separate review for hot documents may be used in highly contentious cases.
A typical production project may comprise two responsiveness reviews, one or two privilege reviews, one optional hot document review, creation of a privilege log, and creation of a hot document log. The total number of reviews can be more than five. The reviewers may conduct corrective review for documents that contain detected errors and inconsistencies or contain potentially useful or harmful substance. Other tasks include proofreading the document log, proofreading the privilege log, removing documents from a privilege log, reviewing documents produced by adverse parties, searching for specific information in the documents produced by adverse parties, tabulating information from the documents produced by an adverse party, searching public records, constructing database data for events, acts, and conduct, constructing a table of attorney names for privilege review, and analyzing the substance of found documents. This list is not exhaustive, and the nature of the tasks can be defined only by the needs of litigation.
In addition to the broad spectrum of potential tasks, the unpredictable nature of litigation makes a discovery project even more difficult. A change in the document request, a negotiated settlement on discovery scope, a change in the client's objective, the filing of new claims and new defenses, the entry or exit of parties in the case, a ruling on a motion, and the settlement of claims can totally change the discovery plan, the scope of review, the number of custodians, the coding tree structure, the coding rules, and the handling of specific documents. Therefore, the costs of a contentious case cannot be predicted.
Review of corporate documents is a difficult task because the subject matter of corporate documents may be about anything under the sun. They may be written at any technical level. Documents may contain a large number of special acronyms, terms and expressions, unfamiliar product numbers, short product names and requests, people's names, unfamiliar transactions, incomplete names of places and locations, and unstated or implied assumptions. Accordingly, documents are not readily understandable to anyone who is outside of the discussion circle. Reviewers constantly struggle to understand unfamiliar terms, transactions, events, locations, and persons. If the task of e-discovery is to review old documents for a corporation whose staff has changed completely, the current staff can do little to help reviewers understand what was written in the old documents.
Document production cost is a major part of litigation cost due to the large volume of documents to be processed. The cost of processing documents is anywhere from $1 to $15 per document. If a client has one million documents to be reviewed and processed, the total production cost would be from $1 million to $15 million. For a large review project involving a hundred reviewers who work 10 hours a day at the billing rate of $150 per hour, the total fee would be $150,000 a day. If each of the one million documents needs 2 minutes on average, billed at $150 per hour, the total cost for this component alone could be $5 million. A document review for a merger may cost several million dollars, and a due diligence investigation can cost tens of millions of dollars. Certain time-intensive tasks could cost considerably more. Those tasks include writing summaries for documents, translating foreign language documents, creating a detailed production log, and producing a privilege log and a hot document log. A considerable amount of time is consumed in discovering review problems, conducting corrective reviews, and conducting additional reviews required by litigation needs.
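The figures in this paragraph can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
HOURLY_RATE = 150          # dollars per reviewer hour
DOCUMENTS = 1_000_000      # documents in the review pool

# Daily fee for 100 reviewers working 10 hours a day.
reviewers = 100
hours_per_day = 10
daily_fee = reviewers * hours_per_day * HOURLY_RATE

# Review fee at 2 minutes per document on average.
minutes_per_document = 2
review_fee = DOCUMENTS * minutes_per_document * HOURLY_RATE / 60

# Overall processing cost at $1 to $15 per document.
low_total, high_total = DOCUMENTS * 1, DOCUMENTS * 15

print(daily_fee)              # 150000
print(review_fee)             # 5000000.0
print(low_total, high_total)  # 1000000 15000000
```

At these rates, the 2-minute review alone occupies roughly 33 days of the hundred-reviewer team (5,000,000 / 150,000).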
The total cost of a review project is the sum of the costs of reviewing each document. The cost of reviewing each document depends directly upon the time used for the document. The time for reviewing a document comprises (1) the time for loading the document, (2) the time for reading the document, (3) the time for analyzing the document, and (4) the time for coding and saving the document. If the time for loading a document is 1 second per document on average, the loading cost for one million documents would be 150*(1*1,000,000)/3600=$41,700, assuming that reviewers are billed at the rate of $150 per hour. Thus, when loading takes 1 minute per document, the loading cost alone would be 150*(60*1,000,000)/3600=$2.5 million. This first time component depends upon the design features of the review system, the maturity of the operating software, the availability of supporting applications, and the sustained bandwidth for each client computer. A review platform that feeds a massive number of illegible documents can by itself double or triple review costs. The second time component has a lot to do with the experience of the reviewers and their familiarity with the case. A reviewer who has considerable experience in the field and knows the language context needs less time to read a document. In contrast, an inexperienced or new reviewer may need more time to read the document. The third time component depends upon reviewer experience, the amount of case information, the nature of the legal matter, and the complexity of the legal issues. The last time component depends upon the design of the tagging pane, the coding logic, the client computer, and network speed. Impossible, confusing, and conflicting coding logic will cause reviewers to struggle. This component largely depends upon the design features of the review platform. Other factors that can make this problem worse include slow network speed, limited bandwidth, and the layout and design of the review user interface.
Anything that affects an individual's review time will affect the total cost.
Documents may be reviewed in one to as many as ten rounds. The total cost is approximately proportional to the number of rounds of review. Anything that affects an individual's review time or the number of reviews will affect the total cost. A great number of parameters can affect the total cost of a document production project, and a problem with any of them can substantially increase production costs. For example, a bad review platform may lack tools for performing tasks productively; inexperienced reviewers need more time to review documents; under poor network conditions it takes longer to download documents; a bad review plan may use more review passes to perform the same tasks; bad management may result in more errors that call for corrective reviews; and sudden changes in litigation needs may require a corrective review.
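The relationship between per-document time, number of rounds, and total cost can be expressed as a simple model. The sketch below is illustrative: the function `review_cost` is hypothetical, the component times other than the 1-second loading example are assumed values, and the model assumes that every round takes the same time per document, which is only an approximation.

```python
def review_cost(documents, seconds_per_document, hourly_rate, rounds=1):
    """Estimate total review cost in dollars.

    `seconds_per_document` maps each time component (loading, reading,
    analyzing, coding and saving) to its average duration in seconds.
    Every round is assumed to take the same time per document.
    """
    seconds_each = sum(seconds_per_document.values())
    return documents * seconds_each * rounds * hourly_rate / 3600


# Loading alone at 1 second per document, one million documents, $150 per hour.
loading_only = review_cost(1_000_000, {"load": 1}, 150)
print(round(loading_only))  # 41667, i.e. about $41,700

# Assumed component times (in seconds) for a single document.
components = {"load": 5, "read": 60, "analyze": 40, "code_and_save": 15}
one_round = review_cost(1_000_000, components, 150)
three_rounds = review_cost(1_000_000, components, 150, rounds=3)
print(one_round, three_rounds)  # 5000000.0 15000000.0
```

Because every term multiplies the others, a delay of a few seconds in any one component, or one additional round, scales across the whole document pool.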
Another reason for high costs is the time needed for conducting corrective reviews and fixes. Many large production projects have more than a million documents. While most review platforms allow project managers to track the review status of documents in some way, it is not always easy to track them at all times. Documents are reviewed and processed, their production log, hot log, and privilege logs are constructed, and further reviews are conducted to meet changing definitions. A quality control check at any stage or on any work product may reveal a mistake, but it is not easy to correct the mistake. Mistakes and inaccuracies may find their way into the document pools, the production log, the privilege pool, the privilege log, and the hot document log. Certain mistakes, such as omitted documents, can be fixed without the need to check the whole process. Other mistakes, such as using incorrect definitions, using wrong tagging conventions, omitting required tasks, and using a wrong analysis method, are more difficult to correct. After a project has run for weeks or months, correcting such mistakes is by no means easy in practice. The task can be as tedious as picking a few grains of sand out of a bowl of cooked rice. The cost can be very high if the only remedy is to conduct a corrective review of all suspected documents. Moreover, quality review and correction may have to be conducted for all affected work products such as the privilege log, the hot log, and other special files. Entries may be modified in a log, added to a log, or deleted from a log. Document production is an extremely time-consuming, extremely difficult, and extremely expensive task. Any small mistake is equivalent to a waste of hundreds of thousands of dollars.
Great effort has been made to reduce the total discovery cost. Cost and review accuracy are intertwined. The highest accuracy can be achieved by spending unlimited time to review, study, and examine each document. However, accuracy must be achieved at reasonable cost and within a reasonable review time. One way to reduce the number of documents in the review pool in some cases is to conduct effective searches and to retrieve only certain documents to form the review pool. Each of the retrieved documents is then subject to several rounds of review by the reviewers. Some computer search methods can reduce the number of documents by as much as 80%. The reduced size of the document pool directly reduces the cost of production. Inability to remove junk documents is one of the reasons for high production cost.
To further reduce costs, some companies have developed computer algorithms for automatically coding documents. Some sample documents are reviewed to identify keys and a key matrix, which are then used to search documents. Based upon whether certain keys and key combinations are found in a document, the server codes the document accordingly. Such computer algorithms may greatly reduce costs but cannot be used to code documents in contentious cases. Other algorithms may imitate the coding done by humans for similar or related documents.
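Key-based automatic coding of the kind described above can be sketched as follows. This is a simplified illustration and not the algorithm of any particular vendor: the function `auto_code`, the key rules, and the fallback to "Non-responsive" are hypothetical, and matching is by case-insensitive substring, which is far cruder than a production search engine.

```python
def auto_code(text, key_rules):
    """Return the tags whose keys all appear in the text.

    `key_rules` maps a tag name to the combination of keys that must all be
    present (case-insensitive substring match). If no rule matches, the
    document is coded as non-responsive.
    """
    lowered = text.lower()
    tags = [
        tag
        for tag, keys in key_rules.items()
        if all(key.lower() in lowered for key in keys)
    ]
    return tags or ["Non-responsive"]


# Hypothetical key combinations derived from reviewed sample documents.
key_rules = {
    "Responsive": ["patent", "license"],
    "Hot document": ["infring"],
}

print(auto_code("The patent license covers model X.", key_rules))
# ['Responsive']
print(auto_code("Possible INFRINGEMENT of the patent; license talks stalled.", key_rules))
# ['Responsive', 'Hot document']
print(auto_code("Lunch menu for Friday.", key_rules))
# ['Non-responsive']
```

The sketch also shows why such coding is unsuitable for contentious cases: a document that discusses the same subject in different words, or uses a key in an unrelated sense, is coded incorrectly.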
The whole review process is a process of learning massive case information. There is an overwhelming number of new elementary facts and unknown or unfamiliar terms. After a reviewer spends a great deal of time learning an elementary fact, such as the infringing nature of a product denoted by a model number, each of the other reviewers has to go through the same process to learn the same fact. A great number of such case facts must be learned by each of the reviewers on the whole team. In addition, when a reviewer does not have an opportunity to learn the elementary fact, the reviewer will most probably make a coding error on all documents containing the elementary fact. The document review industry has not recognized the need for, or the importance of, sharing elementary work products in real time among all reviewers, and there has been no way of doing so.
The review site management understands the need to train reviewers in various methods to improve review quality. At document review sites, such information may be posted on a blackboard or clipboard for sharing. This effort is intended to identify coding problems and correct potential errors. Discussion meetings may be conducted on a daily or weekly basis. This method is, however, ineffective and inconvenient. Such oral communication is ineffective for discussing coding issues, and it is not intended for sharing elementary facts discovered by reviewers. Moreover, discussion by verbal dialog may increase communication errors. Some review sites have provided a questions-and-answers forum where the reviewers post questions and the project managers provide answers one day or several days later. Sharing information on a Windows shared drive has been practiced since the birth of the Windows operating system itself. However, this method presents several problems. First, such an arrangement does not allow plural reviewers to write information to the same source at the same time, and the operating system may lock the file when one reviewer opens it. To avoid this problem, the reviewers must be allocated certain time windows to enter questions, which wastes a great deal of administrative time. Second, such a method cannot be standardized to implement many functions. Different cases may require totally different ways of organizing and sharing case information. Thus, the table can be implemented only as questions and answers. Finally, there is no suitable way to ensure that all information posted is accurate and reliable. Posting a piece of wrong information for sharing may cause other reviewers to make a wrong coding decision. As a result, only project managers and litigation attorneys can answer the questions. The method cannot be used to share elementary facts that may control coding decisions in many related documents.
Additionally, it may be assumed that questions and answers could be distributed by email, email attachments, web pages, or web page attachments. This method has been available since the first day of web site technology. However, it is seldom used, for similar reasons: it cannot be used to share elementary facts in real time, and there is no proper way to ensure data accuracy if all reviewers can update the file or attachment.