Testing to determine competence in a particular field has long been of interest. Testing is used in virtually all areas of commercial enterprise and government to evaluate candidates seeking certification in professional capacities such as real estate broker, attorney, or medical doctor. In student or classroom testing, the goal is typically to give the test taker an absolute score on a continuum. By contrast, the goal of a licensing test is often to place a candidate above or below a cut point, or pass/fail level, of latent ability.
Licensure tests attempt to maintain consistency over a period of time, while reflecting changes in laws, rules, or advances in the field that affect their subject matter. Additionally, administering such standardized tests multiple times, potentially over an extended period, creates security risks. These risks include hacking of test host computer systems and test item harvesting with subsequent answer distribution. To address some of these issues, multiple versions of a test may be generated for a given round of testing, where each test instance, or form, is assembled from a pool of test items.
Linear on-the-fly (“LOFT”) and computer adaptive tests (“CAT”) have become popular alternatives for optimizing educational and psychological measurement for specific purposes while minimizing certain risks, such as test item harvesting. Linear on-the-fly exams construct test forms from a pool of items or item sets either just prior to or while the test taker responds to the test items. LOFT forms are usually constructed to optimize measurement precision in certain regions of the score scale for all test takers, as in the case of certification or licensure exams where precision is maximized near the pass/fail or “cut” point. Adaptive tests present test items selected from a pool of test items by using responses to previous test items to estimate the test taker's latent ability. Adaptive tests seek to either minimize test time or maximize score precision for each individual test taker by selecting items that are most appropriate for the individual based on his or her apparent competence. In addition to maximizing measurement precision, both of these methods seek to present a unique exam form to each individual, thus minimizing the opportunity for one individual to share items with another. However, both of these methods have well-known liabilities. Neither method offers subject matter experts the opportunity to review the test as presented before delivery, because the test is assembled during the actual test session. Without the benefit of such review, LOFT or CAT forms may be subject to previously unidentified interactions, such as test item enemies. Further, while most LOFT and CAT algorithms seek to minimize departures from targeted psychometric and content constraints for the forms they assemble, they often cannot guarantee constraint compliance for individual test takers. In the case of both LOFT and CAT, the aggregate psychometric and content properties of all the forms produced cannot be known ahead of time, only simulated.
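The two selection strategies described above can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) item response model, which the text does not specify, and it uses hypothetical item records with a discrimination parameter "a" and a difficulty parameter "b". The function names and the sample pool are illustrative only and are not drawn from any particular LOFT or CAT product.

```python
import math

def prob_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information an item contributes at ability level theta (2PL)."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_for_cut_point(pool, cut, n):
    """LOFT-style selection: the n items most informative at the cut point."""
    ranked = sorted(
        pool,
        key=lambda item: item_information(cut, item["a"], item["b"]),
        reverse=True,
    )
    return [item["id"] for item in ranked[:n]]

def next_adaptive_item(pool, theta_hat, used):
    """CAT-style selection: the unused item most informative at the
    current ability estimate theta_hat."""
    candidates = [item for item in pool if item["id"] not in used]
    best = max(
        candidates,
        key=lambda item: item_information(theta_hat, item["a"], item["b"]),
    )
    return best["id"]

# Hypothetical four-item pool with equal discrimination.
POOL = [
    {"id": "i1", "a": 1.0, "b": -1.0},
    {"id": "i2", "a": 1.0, "b": 0.0},
    {"id": "i3", "a": 1.0, "b": 1.0},
    {"id": "i4", "a": 1.0, "b": 2.0},
]

loft_form = select_for_cut_point(POOL, cut=0.4, n=2)
cat_next = next_adaptive_item(POOL, theta_hat=1.8, used={"i4"})
```

In this sketch the LOFT-style call favors the items whose difficulty lies closest to the cut point, while the CAT-style call favors the unused item closest to the test taker's current estimate. Real assembly engines also enforce content constraints, which are omitted here.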
Existing LOFT and CAT algorithms assemble each form individually and in isolation and do not track aggregate item exposure in real time for the items in their pools. As a result, they usually cannot guarantee that certain items will not be overexposed or underused. Finally, both LOFT and CAT testing can require a significant amount of processing power at a testing facility, which increases the cost and overhead of administering the test and may limit the number of facilities capable of delivering a specific test.
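The exposure concern can be made concrete with a short sketch that tallies, after the fact, how often each pool item appeared across a set of forms. The exposure thresholds, function names, and sample forms are assumptions chosen for illustration. The sketch shows only the bookkeeping that isolated, per-form assembly does not perform in real time.

```python
from collections import Counter

def exposure_rates(forms, pool_ids):
    """Fraction of forms in which each pool item appears."""
    if not forms:
        raise ValueError("at least one form is required")
    # Count each item at most once per form.
    counts = Counter(item for form in forms for item in set(form))
    total = len(forms)
    return {item: counts[item] / total for item in pool_ids}

def flag_items(rates, max_rate, min_rate):
    """Return (overexposed, underused) item ids for the assumed thresholds."""
    over = sorted(i for i, r in rates.items() if r > max_rate)
    under = sorted(i for i, r in rates.items() if r < min_rate)
    return over, under

# Hypothetical forms, each assembled without knowledge of the others.
FORMS = [["i1", "i2"], ["i1", "i3"], ["i1", "i2"], ["i1", "i3"]]
POOL_IDS = ["i1", "i2", "i3", "i4"]

rates = exposure_rates(FORMS, POOL_IDS)
overexposed, underused = flag_items(rates, max_rate=0.75, min_rate=0.25)
```

Here item "i1" appears on every form and item "i4" on none, which is the kind of imbalance that can go unnoticed when no aggregate count is maintained during assembly.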