The following terms are herewith defined, at least some of which are referred to within the following description of the present disclosure.

BPS      Biased Probabilistic Sampler
CAL      Continuous Active Learning
DS       Diversity Sampler
IR       Information Retrieval
LDA      Latent Dirichlet Allocation
LSA      Latent Semantic Analysis
OCR      Optical Character Recognition
ROC      Receiver Operating Characteristic
SAL      Simple Active Learning
SPL      Simple Passive Learning
SVM      Support Vector Machines
TAR      Technology-Assisted Review
TF-IDF   Term Frequency-Inverse Document Frequency
In recent years, technology-assisted review (TAR) has become an increasingly important component of the document review process in litigation discovery. This trend is fueled largely by the dramatic growth in data volumes that may be associated with many matters and investigations. Potential review populations frequently exceed several hundred thousand documents, and document counts in the millions are not uncommon. Budgetary and/or time constraints often make the once-traditional linear review of these populations impractical, if not impossible, which has made “predictive coding” the most discussed TAR approach in recent years.

A key challenge in any predictive coding approach is striking the appropriate balance in training the system. The goal is to minimize the time that the subject matter expert(s) spend training the system while ensuring that they perform enough training to achieve acceptable classification performance over the entire review population. Recent research demonstrates that Support Vector Machines (SVM) perform very well in finding a compact, yet effective, training dataset in an iterative fashion using batch-mode active learning. However, this research is limited. Additionally, these research efforts have not led to a principled approach for determining when the active learning process has stabilized. These needs and other needs are addressed by the present disclosure.