Testing services administer a variety of standardized tests. For example, the Graduate Management Admission Test® (GMAT®) evaluates graduate business school applicants by measuring general verbal, mathematical, and analytical writing skills. The Graduate Record Examinations® (GRE®) assists graduate schools and departments in graduate admissions activities. Tests offered include the General Test, which measures developed verbal, quantitative, and analytical abilities, and the Subject Tests, which measure achievement in 14 different fields of study. The Scholastic Assessment Test® (SAT®) Program includes the SAT I: Reasoning Test and SAT II: Subject Tests. The SAT I is a three-hour test, primarily multiple-choice, that measures verbal and mathematical reasoning abilities. The SAT II: Subject Tests are one-hour, mostly multiple-choice, tests in specific subjects. These tests measure knowledge of particular subjects and the ability to apply that knowledge. Colleges and universities typically use the SAT® Program as a factor in determining admission or placement of prospective students. Individual states also administer tests to determine whether and to what extent students meet state standards for educational achievement.
Many tests, such as those mentioned above, are offered multiple times during a year, administered over multiple years, or both. For tests that are offered multiple times during a year, it is important that the different administrations of each test be approximately equal in difficulty so that examinees from different testing dates can be properly rated against one another. For tests that are administered over multiple years, it is important that each test be of a known difficulty level to accurately assess an examinee's performance and progress. Moreover, it may be important to evaluate other psychometric specifications and statistical properties for a given test prior to its administration.
Some current methods for constructing tests, including those using a computer interface, permit a test developer to view and select test items. Other methods can display a match between content specifications and the content properties of the selected test items. For example, such test construction systems typically keep track of metrics such as the number of questions that test a particular subject. On the SAT I, for example, questions are divided into mathematics and verbal questions. Additionally, the test construction system could keep track of the number of questions that are devoted to a sub-topic (such as geometry or algebra) or that are presented in a certain format (such as an analogy completion, sentence completion or word problem). By identifying the number of questions of each type included in the developed test, such a system can alert the test developer if the test includes an incorrect total number of questions or an incorrect number of questions of a particular type.
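The content-tracking behavior described above can be illustrated with a minimal sketch. It is not taken from any particular test construction system. The `Item` record, the `content_mismatches` function, the layout of the required counts, and the "vocabulary" sub-topic are all hypothetical names chosen for illustration. The sketch counts the selected items by subject, sub-topic, and format, and reports every count that differs from its required value.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical record for one test item and its content properties."""
    item_id: str
    subject: str      # e.g. "mathematics" or "verbal"
    sub_topic: str    # e.g. "geometry" or "algebra"
    item_format: str  # e.g. "word problem" or "sentence completion"


def content_mismatches(items, required_counts):
    """Return {(attribute, value): (actual, required)} for each unmet count.

    required_counts maps (attribute, value) pairs to the exact number of
    items required; this layout is an assumption made for the sketch.
    """
    actual = Counter()
    for item in items:
        actual[("subject", item.subject)] += 1
        actual[("sub_topic", item.sub_topic)] += 1
        actual[("item_format", item.item_format)] += 1
    # A Counter returns 0 for keys it has never seen, so a required
    # category with no selected items is reported as (0, required).
    return {
        key: (actual[key], required)
        for key, required in required_counts.items()
        if actual[key] != required
    }


# Illustrative data only: three selected items and three content requirements.
selected = [
    Item("q1", "mathematics", "geometry", "word problem"),
    Item("q2", "mathematics", "algebra", "word problem"),
    Item("q3", "verbal", "vocabulary", "sentence completion"),
]
required = {
    ("subject", "mathematics"): 2,
    ("subject", "verbal"): 2,
    ("sub_topic", "geometry"): 1,
}

print(content_mismatches(selected, required))
# {('subject', 'verbal'): (1, 2)}
```

In this example the selected set contains one verbal question where two are required, which is the kind of condition about which a test developer would be alerted. A check of this kind covers only content properties and says nothing about psychometric specifications.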
However, systems implementing these methods do not combine all of the features listed above. No such system both permits the test developer to develop tests more quickly and determines whether the selected test items meet the psychometric specifications for a test. Nor do such systems permit a test developer to examine content or psychometric specifications during the test development process, so that the test developer can add, remove or replace test items to correct deficiencies with respect to the test specifications while the test items are being selected.
Thus, a need exists for an evaluation tool that determines whether defined content and psychometric specifications for a test are met by a particular question set.
A further need exists for a tool that provides psychometric and statistical information to a test creator during the test creation process, so that the selected test items can be evaluated and adjusted while the test is being created.