A standardized test is a test that is administered and scored in a consistent, or “standard,” manner. As discussed in U.S. Pat. No. 6,234,806 (Trenholm et al.), which is incorporated by reference herein, standardized tests are administered to examinees (also referred to herein as “test-takers,” “respondents,” or “users”) for educational testing and for evaluating particular skills. Academic skills tests include SATs, LSATs and GMATs.
The educational needs of modern society continue to evolve, and the academic skills that employers require change with them. Academic skills tests thus must continue to evolve to properly gauge an examinee's abilities with respect to new skills. To meet this need, extensive research is conducted to identify the new skills that need to be tested, and to create standardized test questions that can accurately measure the examinees' proficiency with respect to the new skills. Professionals, such as psychometricians who work in the educational measurement field, translate these identified skills into new test question formats and actual test questions. There are many feedback loops in the test development process. For example, even after suitable test question formats are identified and actual test questions are generated, it is still necessary to conduct extensive field tests over large populations of examinees to evaluate whether test responses accurately measure the examinees' proficiency with respect to the identified skills. Psychometricians use a plurality of well-known metrics to measure whether test responses from a pool of examinees meet this standard. These metrics include item analysis, response latency analysis, form analysis, equating analysis and differential item functioning (DIF) analysis.
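By way of illustration only, the following minimal sketch shows two statistics commonly computed in classical item analysis: item difficulty (the proportion of examinees answering an item correctly) and the point-biserial correlation between an item score and the total test score. The function names and the sample data are hypothetical and are not taken from the referenced patent; the other metrics named above are not shown.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item score and the total score.

    Uses r = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the mean
    total scores of examinees who answered correctly and incorrectly, s is the
    population standard deviation of total scores, and p is the item difficulty.
    """
    p = mean(item_scores)
    s = pstdev(total_scores)
    # The statistic is undefined when every examinee scores the same way;
    # this sketch returns 0.0 in that case (an assumption, not a standard).
    if s == 0 or p in (0, 1):
        return 0.0
    pairs = list(zip(item_scores, total_scores))
    m1 = mean(t for i, t in pairs if i == 1)
    m0 = mean(t for i, t in pairs if i == 0)
    return (m1 - m0) / s * (p * (1 - p)) ** 0.5


# Hypothetical field-test data: four examinees, one item.
item = [1, 1, 0, 0]    # 1 = correct, 0 = incorrect
total = [4, 3, 2, 1]   # total test scores of the same examinees
difficulty = item_difficulty(item)
discrimination = point_biserial(item, total)
```

In this sketch, a high positive point-biserial value would suggest that examinees with higher total scores tend to answer the item correctly, which is one indication that the item discriminates in the intended direction.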
One important aspect of developing test formats is the scoring process. There are many ways to score a test. The manner in which test responses are presented and scored can greatly influence whether the test accurately measures the skills of the examinee that the test-giver wishes to gauge. Scoring factors include whether full or partial credit is awarded and which score scale is used. For example, if a test provides multiple response opportunities for an item, the very same test might be an inaccurate gauge of the skill to be measured if the scoring factors are not properly selected. Again, psychometricians must perform extensive analysis of field test results to confirm that a proposed scoring process should be adopted.
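By way of illustration only, the following minimal sketch contrasts a full-credit rule with one possible partial-credit rule for an item that allows multiple response opportunities. Both rules, the function names, and the `penalty` parameter are hypothetical assumptions chosen for the example; no particular scoring rule is specified above.

```python
def full_credit(attempts, key):
    """Award 1.0 only if the first response matches the key, else 0.0."""
    return 1.0 if attempts and attempts[0] == key else 0.0


def partial_credit(attempts, key, penalty=0.5):
    """Award credit that shrinks with each incorrect response.

    The credit is penalty ** n, where n is the number of incorrect responses
    given before the first correct one. No correct response earns 0.0.
    """
    for n, response in enumerate(attempts):
        if response == key:
            return penalty ** n
    return 0.0


# Hypothetical examinee who answers "B" first and then the correct "C".
responses = ["B", "C"]
score_full = full_credit(responses, "C")        # 0.0 under the full-credit rule
score_partial = partial_credit(responses, "C")  # 0.5 under the partial-credit rule
```

The same responses thus yield different scores under the two rules, which is why the choice of scoring factors would need to be validated against field test results.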
Furthermore, the field tests should be performed on a general population of examinees who are taking standardized tests in the ordinary course of their educational advancement, such as when applying to higher education programs. Tests performed in laboratory settings do not provide a sufficiently realistic environment to ensure that the test results will accurately reflect examinees' performance on a real-world version of the test.