1. Field of the Invention
This invention relates to the general field of optical scanning to collect data and digital images from documents, and within that field to enhanced methods and apparatus for locating, identifying and interpreting such data and images. Although the methods and apparatus described herein may be useful in other related tasks, the most common usage is likely to be in the processing and scoring of standardized assessment tests.
2. Description of the Prior Art
Optical and electronic technology has provided methods of effectively and efficiently collecting data in digital electronic forms. To avoid the costs in time and money of manual data entry, various methodologies have been developed to capture data directly “on-line”. Nonetheless, much quantitative data are still collected by being written on paper documents. In the field of education, students are routinely asked to answer test questions on paper. Commercial institutions routinely use paper forms such as application forms, claim forms, change of address forms, assignment forms, and many others. Businesses survey customer satisfaction and employee attitudes by having them respond on paper documents. Publicly held companies send stockholders paper forms for voting. Other elections and lotteries use paper documents.
In order to avoid costly and sometimes inaccurate key-entry of data from the paper forms to electronic format, automated systems have been designed to convert the marks on the form directly into electronic data. Some applications have been highly successful, notably in the processing of lottery tickets and the processing of student test responses, especially in the area of standardized tests. For both of these applications, however, specialized machines and specialized paper must be used. Machines for reading lottery tickets are highly specialized and have little potential for applications outside their intended area. In the area of standardized tests, specialized “OMR” (Optical Mark Recognition) systems scan data from the student response forms. These “OMR scanners” are systems with hardware enabled resolution of OMR marks, and typically require that the response documents be produced with extremely tight tolerances for the paper, the inks, and precise location of the printing on the forms.
For broader applications, two optical reading approaches have been taken. The first is based on using OMR scanners and forms printed specifically for OMR systems. Customer satisfaction surveys and election ballots have been processed using such OMR technology. Because of the high precision of the forms and the machines, these applications provide accurate conversion of respondent marks into data. Although early OMR systems required that the respondent use a “#2 pencil”, this requirement was largely overcome by visual spectrum read mechanisms, which can recognize marks made with blue or black pen, ballpoint, marker, and/or pencil as intended marks. Two drawbacks remain, however: the forms are expensive to produce and they often “look like a test”.
The second approach, used as an alternative to OMR systems, is based on scanning systems designed for document storage and retrieval. In such systems, a digitized image of the document is captured through the use of an “image scanner”. Although some image scanners can capture shades of gray ranging from pure white to black (gray-scale scanning) and others can capture shades of color (color scanning), data collecting systems typically convert the scanned image of the document into a bi-tonal (black and white) image in which each location or “pixel” within the image is represented as on or off. Data elements are then extracted from the bi-tonal images.
Such bi-tonal imaging systems generally have some means to locate specific areas of a form, and then extract data from those areas. The extraction may include conversion of printed barcodes into text or numeric data (barcode conversion). It may include conversion of machine printed characters into text (Optical Character Recognition, or OCR). It may include writing the entire image or part of the image to one or more computer files including data files which are part of a database. It may include converting areas containing handprinted text characters into text (Handprint Recognition). It may include converting areas containing discrete response areas into data (a form of OMR).
The term “scanning” has many different technological meanings. In this context, “scanning” refers to using optical scanning methods to extract data from sheets of paper. Scanning can be accomplished with specialized hardware systems that capture data from sheets using OMR. In such hardware-based OMR systems, a specialized “read head” is used to recognize and record the presence of marks in predetermined locations as the paper progresses through the machine and passes the read head. Scanning data from sheets of paper can also be accomplished through the use of image scanners.
The following sections identify problems associated with locating marks in the scanned data, with discriminating between intended and unintended or spurious marks, and with the requirement for precise and expensive forms.
Locating a Target Response Area:
Using OMR as an example, an “OMR response target” is a location on a form indicating where to make a response mark for OMR processing. The target is normally, but not necessarily, indicated by a preprinted circle, box or other shape on the form. A major task for any scanning system that processes OMR marks is to locate each response position in the image. While the printed target shows the respondent exactly where to make a mark to have it registered as an intended response, there must be a means for the scanning system to locate in the image the area corresponding to the target in order to examine that area for the presence of an intended response.
Locating the Target with Traditional OMR:
In hardware-based OMR scanners, all response positions are located in a fixed location along a vector at right angles to the movement of the paper. With standard 8½ by 11 inch documents, the paper moves along the 11 inch dimension, so that the fixed positions are across the 8½ inch dimension. One edge of the paper is designated as the “reference” edge. As the sheet of paper is transported through the machine, the reference edge abuts a guide or rail. The response positions are located in absolute dimensions from that guide or rail to define a row or line of response positions with the centers of the response positions aligned at right angles to the reference edge. Positions within the row may be referred to as the “columns” of the response positions.
Successive rows of response positions are arrayed along the reference edge. In some early machines, these rows are located by absolute dimensions from the top of the sheet of paper. More recently, however, the rows are located by a printed mark on the paper. A series of marks aligned with the reference edge of the paper locates the series of rows, resulting in a matrix of multiple rows and columns of OMR positions. Typically, the series of these marks are said to be located in the “timing track”, and each mark within the series is referred to as a “timing mark”. Instructions on how to prepare a form for traditional OMR are disclosed, if further reference is wanted, in Pluta, et al. (U.S. Pat. No. 5,664,076). A constraint on the creation of these forms is that the printed information on the forms must precisely match the specifications for the form.
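By way of illustration only, the matrix of rows and columns described above can be sketched in Python as follows. The function name `response_grid`, the choice of inches as the unit, and the sample dimensions are assumptions made for this sketch and are not taken from any of the cited references.

```python
def response_grid(timing_mark_positions, column_offsets):
    """Build the matrix of OMR response-position centers.

    timing_mark_positions: distance of each timing mark along the
        reference edge, one entry per row (inches, assumed unit).
    column_offsets: fixed distance of each column from the guide or
        rail (inches, assumed unit).
    Returns a list of rows; each row is a list of (across, along) centers.
    """
    return [[(offset, mark) for offset in column_offsets]
            for mark in timing_mark_positions]


# Hypothetical form with three timing marks and four columns.
grid = response_grid([1.0, 1.25, 1.5], [2.0, 2.2, 2.4, 2.6])
```

In this sketch the columns are fixed by the hardware while the rows follow the printed timing marks, which is why a misprinted timing track displaces an entire row.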
In traditional OMR systems, a special ink is selected to print the targets. This ink absorbs little or no light in the spectrum processed by the optical system so that, if there is no other mark in an area printed in this ink, the system interprets the area as not having any mark. Some OMR systems use the infrared spectrum for processing since pencil lead absorbs much of the infrared light, while others use visual spectrum light.
For forms with timing tracks, a second, typically black, ink is used. This ink absorbs light in the appropriate spectrum and therefore can be recognized by the optical devices and processed by the system. This ink is used to print the timing tracks and other marks which may identify the type of form and/or the particular document.
A problem with traditional OMR is that the mark made by the respondent may not be recognized as an intended mark when the mark is not at the precise location being processed by the system. This mis-alignment of the mark to the process area can be caused by many factors. One possibility is that the respondent made the mark outside the target position, but mis-alignment can also be caused by discrepancies in the forms themselves, in the equipment processing the forms, or in the manner in which the forms move physically through the equipment.
The potential for mis-alignment errors is a systemic problem associated with the physical characteristics of the scanning system, as explained, for example, in Koch, et al. (U.S. Pat. No. 4,857,715).
Traditional OMR scanners have a low tolerance for offset, misregistration, and poor print or paper quality in the forms. In particular, the timing tracks on scannable forms for OMR scanners must be printed to high standards of print quality and print alignment to ensure that an acceptably high percentage of completed forms can later be properly scanned and scored.
The systemic problems of the OMR scanners may result in the target positions printed on the form being shifted left or right of the locations or columns of locations being processed by the hardware. As a consequence, the OMR system does not “look” where the respondent was shown to make the marks, but looks instead to the left or the right, increasing the potential that an intended mark will be missed. This problem may be caused by the form pulling away from the guide or rail, or by the OMR device being misadjusted so that the reading mechanism shifts the location at which marks are processed closer to or further away from the guide or rail, or by printing the response positions on the form too close to or too far from the reference edge. All of these cause the system to process OMR locations that are shifted left or right from the locations marked by the respondent.
A more insidious systemic problem occurs when the spacing between columns on the form is larger or smaller than the spacing processed by the hardware system. If this happens, some targets may be exactly aligned with the area that the hardware processes while others are shifted left or right, with the deviations increasing from one edge of the form to the other. This can be caused by improperly printed documents, but more often is due to documents expanding or shrinking as a function of changing humidity and temperature. On individual forms, this can also be caused by stress, folding, or other manipulation of the form before it is scanned.
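The cumulative nature of this deviation can be illustrated with a short Python sketch. The function name `column_drift`, the assumption that the first column is perfectly aligned, and the pitch values are hypothetical and serve only to show how the error grows across the form.

```python
def column_drift(n_columns, hardware_pitch, printed_pitch):
    """Offset of each printed column from the column the hardware reads.

    Column 0 is assumed to be perfectly aligned; every later column is
    off by one more increment of the pitch difference.
    """
    return [i * (printed_pitch - hardware_pitch) for i in range(n_columns)]


# Hypothetical values: 0.200 inch hardware pitch, form stretched by 0.5%.
drift = column_drift(40, 0.200, 0.201)
```

Under these assumed values, a per-column error of one thousandth of an inch accumulates to roughly 0.039 inch at the fortieth column.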
OMR positions printed on the form can also be systematically higher or lower than the locations processed by the scanning hardware. Since many OMR forms utilize track marks, this can occur when the track marks are not in exact alignment with the targets printed on the form.
Perhaps the most common and severe cause of mis-aligned response positions is when forms are “skewed” relative to the reading mechanism. This generally occurs when the reference edge of the paper is not perfectly parallel to the guide or rail. Even small deviations from parallel can move the response positions away from their correct position. The horizontal alignment of the response positions on the form corresponds closely to the locations processed by the system only at the top of the form. As the reference edge of the form is further and further from the guide or rail, the horizontal locations of the targets and the locations processed by the machine are increasingly divergent. In a similar manner, the vertical locations of response positions adjacent to the timing tracks correspond closely to the vertical locations processed by the system, while response positions at the opposite edge of the form are increasingly displaced. The effects of the horizontal and vertical misalignment are cumulative, so that the misalignment is greatest for response positions at the edge of the form farthest from the timing tracks and at the end of the form farthest from the guide or rail.
Because of the extent to which skewed forms jeopardize the accuracy of the OMR process, most systems are designed to reject forms that are not almost exactly parallel with the guide or rail. Typically, forms that are not parallel to the guide or rail cause the scanner to stop, or at least cause the form to be rejected, so that the form must be read again.
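The effect of skew can be approximated, for illustration, as a rigid rotation of the form about the point where the top of the reference edge meets the guide or rail. The following Python sketch makes that simplifying assumption; the function name `skew_displacement`, the pivot location, and the sample positions and angle are hypothetical.

```python
import math


def skew_displacement(across, along, skew_degrees):
    """Distance a response position moves when the form rotates about (0, 0).

    across: distance of the position from the guide or rail.
    along: distance of the position from the top of the form.
    """
    t = math.radians(skew_degrees)
    moved_across = across * math.cos(t) - along * math.sin(t)
    moved_along = across * math.sin(t) + along * math.cos(t)
    return math.hypot(moved_across - across, moved_along - along)


# Hypothetical half-degree skew on an 8.5 by 11 inch sheet.
near = skew_displacement(0.5, 0.5, 0.5)    # position close to the pivot
far = skew_displacement(8.0, 10.5, 0.5)    # position at the opposite corner
```

In this model the displacement is proportional to the distance from the pivot, so the position at the opposite corner moves many times farther than the position near the pivot.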
Similar to skewed forms, if the optical read head is not exactly perpendicular to the guide or rail, the vertical location of response positions adjacent to the timing tracks will correspond closely to the vertical locations processed by the system, while response positions at the opposite edge of the form will be in locations increasingly different from those processed by the system.
Solutions to the above problems have been proposed by Kamada, et al. (U.S. Pat. No. 5,099,340) and by Kits (U.S. Pat. No. 5,291,592). Forms are printed with timing tracks on both edges of the paper so that the vertical locations of all marks are defined by the lines connecting corresponding marks on opposite edges. In a similar manner, timing tracks are printed at the top and bottom of each sheet so that the horizontal locations of all marks are defined by the lines connecting corresponding marks at the top and bottom.
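One way to picture this arrangement is that each response position lies where a line joining two side timing marks crosses a line joining a top and a bottom timing mark. The Python sketch below computes such a crossing with the standard two-line intersection formula. It is offered only as an illustration of the geometry and is not asserted to be the procedure of either cited patent; the function name `intersect` and the coordinates are hypothetical.

```python
def intersect(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if d == 0:
        raise ValueError("lines are parallel")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / d
    py = (a * (y3 - y4) - (y1 - y2) * b) / d
    return px, py


# Hypothetical unskewed form: side marks at height 2, top and bottom marks at x = 3.
position = intersect((0, 2), (10, 2), (3, 0), (3, 10))
```

Because the position is derived from marks detected on the sheet itself, it follows the sheet when the sheet is skewed or stretched.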
Locating the Target with Bi-tonal Imaging Systems:
Some systems, such as Keogh et al. (U.S. Pat. No. 5,134,669), utilize the timing tracks on traditional OMR forms to locate intended response areas in the scanned image. These systems are based on the general assumption that the digital image is a perfect representation of the paper form, such that locations in the form and image conform exactly. Such systems suffer from the potential problems with traditional OMR systems and cannot overcome skew, stretch, or other distortions of the document prior to or during the image capture process.
Other methods have been used to locate OMR positions within a bitmapped, bi-tonal image. Systems such as that exemplified in Shepard (U.S. Pat. No. 5,140,139) start with a template derived from an answer form so that the target boxes locating the response positions appear in the template in an ink that is read by the scanner. The template is scanned and the boxes located, and the coordinates of the boxes are stored in memory. The forms are then scanned and software is used to align the respondent's form with the template. Once the respondent's form is properly aligned, the OMR response positions are established by their known locations on the mask. This process suffers from the problem that, since the target is printed in an ink processed by the scanner, any marks made by the respondent on the target itself cannot be used in determining whether there is an intended mark in the response area.
Still other processes, such as that described in Reid-Green, et al. (U.S. Pat. No. 5,001,769), use a series of “reference marks” and then transform the image of the respondent's form to correspond to the expected image. The computer performs a bilinear transformation on each page. As a result, the stored data corresponding to all of the field coordinates is transformed, so that the computer knows where those coordinates are to be found in the data representing the scanned form.
Such a procedure effectively overcomes skew, stretch, and other systematic distortions of the form or the image of the form, but may require excessive computer resources to properly register the entire sheet.
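A bilinear mapping of field coordinates can be sketched in Python as follows. The sketch assumes four reference marks at the corners of the form and template coordinates expressed as fractions (u, v) between 0 and 1. These conventions, the function names `lerp` and `bilinear_map`, and the pixel values are assumptions made for illustration and are not asserted to match the procedure of the cited patent.

```python
def lerp(p, q, t):
    """Point a fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def bilinear_map(u, v, top_left, top_right, bottom_left, bottom_right):
    """Map template coordinates (u, v) in [0, 1] to scanned-image pixels."""
    top = lerp(top_left, top_right, u)
    bottom = lerp(bottom_left, bottom_right, u)
    return lerp(top, bottom, v)


# Hypothetical reference marks found in a slightly skewed scan (pixels).
marks = dict(top_left=(12.0, 9.0), top_right=(812.0, 17.0),
             bottom_left=(4.0, 1009.0), bottom_right=(804.0, 1017.0))
center = bilinear_map(0.5, 0.5, **marks)
```

With these hypothetical marks, the center of the template maps to pixel (408.0, 513.0) in the scanned image.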
Problems in “Reading” the Mark at the Response Position:
Another major task in processing OMR types of response positions is to determine whether to record or read what may or may not be a mark at the response position. This task is typically done through two processes: normalization and mark determination.
The first process, normalization, ensures that marks are read uniformly throughout the form. On some systems, such as shown by Apperson, et al. (U.S. Pat. No. 6,079,624), normalization is achieved by calibrating the scanner. One or more standard or reference sheets are read through the scanner, and manual or automated review is used to ensure uniformity and accuracy in the values read. This process has been used for both traditional OMR and for image scanning systems, such as shown in Sakano (U.S. Pat. No. 4,760,464).
On some systems, special hardware or software is used to examine each sheet and ensure uniformity over the entire sheet. This type of normalization procedure has the advantage of adjusting for the paper being read, i.e. if one sheet of paper is darker than another, such as might occur from lots manufactured at different times, the same mark might read differently on the two sheets if the normalization is only based on an initial calibration. It is also possible to use a normalization routine which utilizes an initial calibration followed by a refinement for each sheet.
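Per-sheet normalization of this kind can be illustrated with the following Python sketch, which rescales a raw reading against the level measured for the bare paper of the same sheet. The function name `normalize`, the 0 to 255 reflectance scale (higher meaning lighter), and the sample readings are assumptions made for this sketch.

```python
def normalize(reading, paper_level, black_level=0):
    """Map a raw reflectance reading to 0.0 (bare paper) .. 1.0 (solid black).

    paper_level: reading measured on unmarked paper of this sheet.
    black_level: reading of a solid black reference (assumed 0 by default).
    """
    span = paper_level - black_level
    if span <= 0:
        raise ValueError("paper must read lighter than the black reference")
    value = (paper_level - reading) / span
    return min(1.0, max(0.0, value))


# Hypothetical example: the same pencil mark reflects half the light of the
# surrounding paper on a lighter sheet and on a darker sheet.
on_light_paper = normalize(120, paper_level=240)
on_dark_paper = normalize(100, paper_level=200)
```

Both calls yield 0.5 in this example, whereas the raw readings of 120 and 100 would differ if only a single initial calibration were applied.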
Mark determination is the process of deciding whether a mark at the response position is an intended response. Mark determination is best accomplished using the normalized readings.
Mark Determination with Traditional OMR:
Traditional OMR systems are set to recognize marks based on the extent to which the entire response position is covered with a complete, dark mark. The traditional OMR scanner determines the “darkness” of a mark at each defined position. Typically, the darkness is rated on a numeric scale from 0 to 15, with no mark at all stored as a 0 and a solid black mark stored as a 15.
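The 0 to 15 scale can be illustrated by quantizing a normalized darkness value, as in the Python sketch below. The function name `darkness_level`, the normalized input between 0.0 and 1.0, and the rounding rule are assumptions made for illustration; actual scanners may derive the value differently.

```python
def darkness_level(normalized_darkness):
    """Quantize a darkness of 0.0 (no mark) .. 1.0 (solid black) to 0..15."""
    clamped = min(1.0, max(0.0, normalized_darkness))
    return int(clamped * 15 + 0.5)  # round half up to the nearest level


levels = [darkness_level(d) for d in (0.0, 0.2, 0.5, 1.0)]
```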
Calculation of the “darkness” of a mark will generally be made for a response area with some pre-determined height and width. Often the width of the OMR area is determined by the light sensing hardware. The height of the OMR area can be determined by the width of the timing marks, by hardware or software settings, and/or by the characteristics of the light sensing hardware.
Mark Determination with Bi-tonal Imaging Systems:
For bi-tonal systems, detecting the presence of a mark in an OMR area is generally based on counting the number or percentage of black pixels within the area of a response position.
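A minimal Python sketch of this pixel-counting approach follows. The image representation (a list of rows in which 1 denotes a black pixel), the function name `black_fraction`, and the 50% threshold are assumptions made for illustration.

```python
def black_fraction(image, top, left, height, width):
    """Fraction of black pixels (value 1) inside a rectangular response area."""
    rows = image[top:top + height]
    black = sum(sum(row[left:left + width]) for row in rows)
    return black / (height * width)


# Hypothetical bi-tonal image; the response area is the 3 by 3 block
# starting at row 1, column 1.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
fraction = black_fraction(image, top=1, left=1, height=3, width=3)
is_marked = fraction >= 0.5  # hypothetical decision threshold
```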
One problem with such systems concerns the creation of the bi-tonal image. Smudges and apparent erasures are generally converted to white, while solid marks are generally converted to black. However, there may be no difference between a light pencil mark and a smudge or erasure, so that the conversion may be in error. Since there is no discrimination between a light mark and a dark mark once the image has been converted to a bi-tonal image, stray marks through a response position can look the same as intended marks.
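The loss of information can be illustrated with a simple fixed-threshold conversion in Python. The gray values, the threshold of 128, and the function name `to_bitonal` are hypothetical; real systems may use other conversion methods.

```python
def to_bitonal(gray_values, threshold=128):
    """Convert gray values (0 = black, 255 = white) to 1 (black) or 0 (white)."""
    return [1 if g < threshold else 0 for g in gray_values]


# Hypothetical gray readings for different kinds of marks.
samples = {
    "clean paper": 245,
    "smudge": 170,
    "light pencil mark": 165,
    "stray line": 60,
    "solid mark": 20,
}
bitonal = dict(zip(samples, to_bitonal(samples.values())))
```

In this example the light pencil mark and the smudge both become white, and the stray line and the solid mark both become black, so the bi-tonal image can no longer distinguish the members of each pair.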
When the bi-tonal image is created, in many cases the printed target will be converted to black, and will increase the apparent number of black pixels within the response area if not otherwise compensated. One method of treating this situation is to “mask out”, or change to white (i.e. no mark), pixels corresponding to the preprinted target. This process, however, decreases the number of pixels being utilized in determining the presence of an intended mark and removes entire sub areas within the area of the response position from consideration.
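The cost of masking can be seen in the following Python sketch, which excludes the pixels of a preprinted target outline before counting. The 4 by 4 response area, the border-shaped mask, and the function name `masked_black_fraction` are assumptions made for illustration.

```python
def masked_black_fraction(area, target_mask):
    """Black fraction over pixels not covered by the preprinted target.

    area: bi-tonal pixels of the response area (1 = black).
    target_mask: same shape; 1 where the preprinted target lies.
    Returns (fraction, number of pixels actually considered).
    """
    usable = black = 0
    for pixel_row, mask_row in zip(area, target_mask):
        for pixel, masked in zip(pixel_row, mask_row):
            if masked:
                continue  # pixel belongs to the printed target; ignore it
            usable += 1
            black += pixel
    if usable == 0:
        raise ValueError("mask covers the entire response area")
    return black / usable, usable


# Hypothetical target printed as a one-pixel border around the area.
target_mask = [[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
area = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
fraction, usable = masked_black_fraction(area, target_mask)
```

In this example only 4 of the 16 pixels remain available for the decision, and any respondent mark drawn over the border is disregarded.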