Reviewing and categorizing large corpuses of electronic documents can be a time-consuming endeavor. For example, users may be assigned a subset of a corpus to review and categorize manually. Corpuses, however, may include on the order of millions of electronic documents that may need to be reviewed and categorized in a very short time period, and manual review of the corpus may not be efficient enough to accommodate such narrow time periods.
Some automated techniques for reviewing and categorizing corpuses of electronic documents exist that may provide improved efficiency over manual review. The available automated techniques, however, are not without their own flaws. For example, some automated techniques may produce highly inaccurate categorizations of electronic documents and may not provide a robust mechanism for improving their performance. As a result, the existing automated techniques may result in relevant documents being missed and/or highly confidential electronic documents being inadvertently provided to a third party.