For many years standardized tests have been administered to examinees for various reasons such as for educational testing or for evaluating particular skills. For instance, academic skills tests (e.g. SATs, LSATs, GMATs, etc.) are typically administered to a large number of students. Results of these tests are used by colleges, universities and other educational institutions as a factor in determining whether an examinee should be admitted to study at that educational institution. Other standardized testing is carried out to determine whether or not an individual has attained a specified level of knowledge, or mastery, of a given subject. Such testing is referred to as mastery testing (e.g. achievement tests offered to students in a variety of subjects and the results being used for college entrance decisions).
FIG. 1 depicts a sample question and sample direction which might be given on a standardized test. The stem 4, the stimulus 5, responses 6 and directions 7 for responding to the stem 4 are collectively referred to as an item. The stem 4 refers to a test question or statement to which an examinee is to respond, e.g., question 13. The stimulus 5 is the text and/or graphical information (e.g., a map, scale, graph, or reading passage) to which a stem may refer. Often the same stimulus is used with more than one stem. Some items do not have a stimulus. Items having a common stimulus are defined as a set. In FIG. 1, questions 13 and 14 refer to stimulus 5 and therefore form a set. Items sharing common directions are defined as a group. Thus, questions 8-27 form a group. Only questions 8-14, however, are shown in FIG. 1.
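The terminology above (item, stem, stimulus, set, group) can be illustrated as a small data model. This is only a sketch: the class name `Item`, the helper functions, the identifier strings, and the five placeholder answer choices are assumptions made for illustration and do not come from FIG. 1.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Item:
    """One item: a stem with its responses, directions, and optional stimulus."""
    number: int                        # question number as printed on the test
    stem: str                          # question or statement to respond to
    responses: Tuple[str, ...]         # answer choices (placeholders here)
    directions_id: str                 # hypothetical key for shared directions
    stimulus_id: Optional[str] = None  # hypothetical key; None if no stimulus


def by_key(items, key):
    """Collect item numbers under each non-None value of key(item)."""
    buckets = defaultdict(list)
    for item in items:
        k = key(item)
        if k is not None:
            buckets[k].append(item.number)
    return dict(buckets)


def sets_of(items):
    """Items having a common stimulus form a set."""
    return by_key(items, lambda it: it.stimulus_id)


def groups_of(items):
    """Items sharing common directions form a group."""
    return by_key(items, lambda it: it.directions_id)


# Questions 8-27 share directions; only questions 13 and 14 refer to stimulus 5.
items = [
    Item(n, f"stem of question {n}", ("A", "B", "C", "D", "E"), "directions-7",
         "stimulus-5" if n in (13, 14) else None)
    for n in range(8, 28)
]
```

With this data, `sets_of(items)` places questions 13 and 14 in one set, and `groups_of(items)` places questions 8-27 in one group.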
A typical standardized answer sheet for a multiple choice exam is shown in FIG. 2. The examinee is required to select one of the responses according to the directions provided with each item and fill in the appropriate circle on the answer sheet. For instance, the correct answer to the question stated by stem 4 is choice (B) of the responses 6. Thus, the circle designated 8 in FIG. 2, which corresponds to choice (B) for this item, i.e., question 13, should be filled in by the examinee as shown.
Generally, examinees register to take a particular test by filling out a registration form and sending it to a test processing center, such as Educational Testing Service, Princeton, N.J., by a specified registration date. A registration form usually requires that an examinee provide information such as the examinee's name and address, the test to be taken, and some related biographical information. After all of the registration forms have been received by the test processing center, the examinee information, such as name, address, and responses to background questions, is processed. Each examinee is scheduled to take the test by assigning to that examinee a place and time at which the test can be administered. Typically, a number of examinees are scheduled to take the test at the same time and place to reduce administrative costs. One or more test administrators will be present at the locations where the test is scheduled to be taken.
Test administrators are generally responsible for distributing the test material, providing instructions to the examinees, monitoring any timing constraints required by the particular test, and collecting the test material when the testing time has ended or when the examinee has finished taking the test. After collecting the examinees' responses and other test material, the administrator sends them, either directly or indirectly, back to the test processing facility for scoring and evaluation.
After all of the examinees' tests are graded, statistical and other processing may be performed for various reasons. For instance, to assess one examinee's score, it is necessary to compare his or her score to those of other examinees taking the same test. Another important reason to evaluate the test results for statistical purposes is to create and update an information bank containing the performance statistics of each item used or created for previous tests. This information may then be used for the creation of future tests.
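As a rough illustration of the two kinds of statistical processing mentioned above, the sketch below compares one score against the others and computes one performance statistic for an item. The text does not name specific statistics; percentile rank and proportion correct are assumed here only as common examples, and the function names are hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that fall strictly below score."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def proportion_correct(responses, key):
    """Fraction of examinees whose response to one item matches the key."""
    return sum(1 for r in responses if r == key) / len(responses)
```

For example, `percentile_rank(30, [10, 20, 30, 40])` counts the two scores below 30 out of four, and `proportion_correct(["B", "A", "B", "B"], "B")` counts three keyed responses out of four. A value such as the latter could be stored per item in an information bank.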
A goal of standardized testing is to construct a test efficiently for the purpose of measuring a skill, ability, etc. Therefore, each test is constructed to conform to a test specification which defines the rules and/or constraints for selecting the items. In constructing a test, test developers select items from a pool of items so that the combination of selected items satisfies the test specification.
A test is typically divided into sections of questions. The test specification generally defines the number of items to be presented in the test, the number of test sections, the number of questions in each section, the time for taking the test, and the allotted time for responding to all the items in each test section. The test specification also specifies criteria for item selection. These are based on at least four item characteristics, which include: (1) item content, e.g., mathematical questions relating to arithmetic, algebra, or geometry; (2) cross-information among items, e.g., more than one item testing the same point; (3) number of items/set, i.e., an identification of a subset of items of a larger set; and (4) statistical properties of items derived from pretesting, e.g., difficulty of the selected items.
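Checking a selection of items against a test specification can be sketched as follows. This is a minimal illustration under stated assumptions: it covers only item count, item content, and mean difficulty, and the names `PoolItem`, `Specification`, and `violations`, as well as the 0-to-1 difficulty scale, are hypothetical rather than taken from any actual test specification.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PoolItem:
    item_id: str
    content: str       # e.g. "arithmetic", "algebra", "geometry"
    difficulty: float  # pretest statistic; the 0-1 scale is an assumption


@dataclass(frozen=True)
class Specification:
    total_items: int                      # number of items in the test
    content_counts: Dict[str, int]        # required items per content area
    mean_difficulty: Tuple[float, float]  # allowed (low, high) mean difficulty


def violations(selected: List[PoolItem], spec: Specification) -> List[str]:
    """Return one message per constraint that the selection fails to satisfy."""
    problems = []
    if len(selected) != spec.total_items:
        problems.append(f"expected {spec.total_items} items, got {len(selected)}")
    counts = Counter(item.content for item in selected)
    for area, needed in spec.content_counts.items():
        if counts[area] != needed:
            problems.append(f"{area}: expected {needed}, got {counts[area]}")
    if selected:
        mean = sum(item.difficulty for item in selected) / len(selected)
        low, high = spec.mean_difficulty
        if not low <= mean <= high:
            problems.append(f"mean difficulty {mean:.2f} outside [{low}, {high}]")
    return problems


spec = Specification(
    total_items=4,
    content_counts={"arithmetic": 2, "algebra": 1, "geometry": 1},
    mean_difficulty=(0.4, 0.6),
)
selection = [
    PoolItem("a1", "arithmetic", 0.4),
    PoolItem("a2", "arithmetic", 0.6),
    PoolItem("b1", "algebra", 0.5),
    PoolItem("g1", "geometry", 0.7),
]
```

Here `violations(selection, spec)` returns an empty list, while dropping the geometry item produces two messages, one for the item count and one for the missing content area. Constraints on cross-information and on items per set would be additional checks of the same shape.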
In recent years, these methods for creating, delivering, administering, and scoring tests have been determined to be inadequate. Because of the number of examinees taking standardized tests, both the demand for new and more diverse tests and the need for more flexible test scheduling, without increasing administration costs or sacrificing security, have increased.
One solution to these demands would be to automate the entire testing process. However, up until now only a few attempts have been made, and these automate only portions of the testing process. Furthermore, these attempts are limited in their ability to generate a variety of item types. They are not modular in design, so software or hardware cannot be replaced independently, nor do they provide the security and integrity features required for a standardized testing environment.
There have been attempts to develop computerized tools for instructional purposes. These products, although primarily geared to delivering instructional systems, often contain testing components as well. Some examples of instructional programs are available from Computer Curriculum Corp., Computer Networking Specialists Inc., Computer Systems Research, DEGEM, Ideal Learning, Josten's Learning Corp., New Century Education, Plato Educational Services--TRO Inc., Unisys--ICOPN System, Wasatch Education System, and Wicat Systems. Wasatch Courseware, for instance, provides on-line tools, such as a notebook, a pop-up calculator, word processor, graphics tool, glossary, and a database embedded into the lessons. Josten's Learning Corp. provides some flexibility in the hardware and software available for executing lessons such as networked or non-networked systems, the use of third party software, and the ability to operate its instructional system from a remote site. Ideal Learning has a management system which is also capable of accommodating third party software, and its test scoring system can score tests which are generated by a number of test developers including standardized tests. The DEGEM System is a networked system which is capable of providing statistical data on student or class progress. Therefore, although some of these instructional programs incorporate some features which could be utilized in an automated standardized computer-based testing system, none of them provides a flexible and integrated system for developing, generating, delivering, administering and processing computerized standardized tests.
There are also a number of systems for computerizing parts of the test construction process. (See, e.g., a review by Hsu and Sadock (1985).) Perhaps the most comprehensive of these testing programs is the MicroCAT System developed by Assessment Systems Corporation. The MicroCAT System comprises four primary subsystems, one each for development, examination, assessment, and conventional testing.
Although MicroCAT has been noted for its comprehensiveness, it has been criticized for a number of limitations. For instance, development of a test having a specification which does not match one of its predefined templates requires a detailed understanding of MicroCAT's programming language. Its graphics tools are very limited, and other commercial drawing packages such as PC Paint cannot be substituted for MicroCAT's graphics. Furthermore, there is no on-line help available from either the test development system or the examination system. Without an on-line help facility, a system such as MicroCAT could not practically be used to deliver and administer standardized tests to thousands of examinees each year. To use the MicroCAT assessment system, the test data must come from tests generated according to MicroCAT's own specifications. Furthermore, MicroCAT does not provide security for examinee performance files, nor does it provide integrity features to guard against power interruptions and the like.
To accommodate standardized tests in computer-based testing, there is a need for a comprehensive computer-based testing system which provides flexible test development and production, test administration and test delivery, as well as preprocessing and postprocessing of item statistics and examinee performance. Such a system should incorporate data integrity features, including system failure recovery, and data security features. The design should be modular and extensible so that substantially every hardware and software component can be modified or replaced without affecting the functioning of the remainder of the system.