1. General Background
Traditional Optical Mark Reader (OMR) systems require mark-sense forms that are specially designed with forms-creation software and that provide accurate printed registration marks to locate the response bubbles containing the data. These forms must be printed using high-end printers on high-quality paper that will not deform. Special ink colors that are transparent to the scanning process may have to be used. As a result, forms can cost the user from $0.25 to $1.00 per page.
The marking of forms must be done with a specified marking device, such as a #2 pencil. Marking areas, called response bubbles, must be filled in accurately. Finally, the forms must be read by highly accurate, specialized OMR scanners.
Current plain-paper OMR technology is not bound by these restrictions. It uses pattern recognition technology to automatically find response bubbles without the use of registration marks and to determine whether or not the response bubbles are marked.
There are two types of commercially available plain-paper technologies in use today. One prints registration marks on the forms. The other technology does not require the use of any preprinted registration marks but instead depends upon the location of the response bubbles for registration. Using pattern-recognition technology, both types automatically register the form, find response bubbles, and determine whether or not the response bubbles are marked. This technology allows mark-sense forms to be designed using standard, commercially available word-processing or graphics packages and to be printed by any quality printer on most papers. Any pencil or pen may be used to mark the forms, and the forms can be read using any reasonable quality off-the-shelf image scanner.
However, current plain-paper technology requires that all response bubbles on the form, whether filled in or not, be located. Response bubbles may not be recognized if they are damaged, missing, erased, or mismarked. If a current plain-paper OMR system cannot find a response bubble, the field containing that response bubble is treated as an exception. This creates an error that requires manual intervention to correct. What is needed is a plain-paper OMR system that can recognize and process forms even if response bubbles on the form are damaged, missing, erased, or mismarked. The present invention addresses this need via inferential self-registration.
Using enhanced pattern recognition techniques, inferential self-registration can infer the locations of response bubbles that are not well formed because of erasure, whiteout, or other causes. It treats these response bubbles as unmarked and does not invalidate the zone. As a result, forms analysis is much more complete and accurate. Inferential self-registration retains all of the advantages of current plain-paper OMR systems, such as the use of standard paper, off-the-shelf printers, standard commercially available image scanners, and common marking instruments. In addition, it solves the problem of response bubbles that are not well formed. Test or survey takers can erase or white out response bubbles. Using inferential self-registration, response bubbles that are damaged by paper handling or printer faults (response bubble faults) become transparent to the recognition process, and the form can be completely and correctly analyzed without costly manual intervention.
2. Additional Background on OMR Technology
Optical mark reading (OMR) is used extensively in education, market research, government, and other areas for testing, surveying, and many additional uses. OMR forms, specially prepared for each use, contain specific areas (such as circles), sometimes referred to as response bubbles, that can be filled in by respondents in response to questions. FIG. 1 shows a typical prior art OMR form. Optical mark recognition (OMR) equipment is used to read the respondents' answers that are marked on these forms in order to grade tests, to summarize surveys, or to fulfill whatever other purpose the form was designed for.
As discussed above, traditional OMR technology requires these forms to be highly accurate and to be preprinted with registration marks in order to be properly read. The forms must typically be printed in “drop-out” inks so that response bubble outlines and form text are not visible to the scanning equipment. As a consequence, these forms typically cost the user from $0.25 to $1.00 per page. Furthermore, the forms must often be filled out with a specified writing implement, such as a #2 pencil; and they must be read and be processed by specialized OMR scanners.
Currently available plain-paper OMR systems provide an OMR capability that is not bound by these restrictions. Using pattern recognition techniques, this technology allows ordinary word-processing and printing products to be used to prepare these forms; and common pencils or pens may be used to mark the response bubbles. The forms can then be read by any image scanner. As a consequence, users can achieve significant savings when using plain-paper mark sensing for testing, for the taking of surveys, and for any other use to which they may put OMR technology.
However, as discussed above, current plain-paper OMR technology suffers from a major problem: it cannot handle response bubbles that are unrecognizable because of erasures, whiteouts, printing faults, mismarkings, or other causes. If a response bubble cannot be found, the field containing that response bubble is reported as an exception and must be corrected manually, which is a costly and time-consuming process.
In summary, there are two known forms of OMR systems:
    1. Traditional OMR: Traditional OMR technology uses specialized, expensive OMR scanners to scan and recognize mark-sense forms specially designed for use with OMR scanners. The forms require exact offset printing, timing marks and registration marks, and drop-out inks; they must be completed using #2 pencils or inks of certain colors; and they usually cost between $0.25 and $1.00 each.
    2. Plain-Paper OMR: Plain-paper OMR technology uses pattern-recognition software to process and recognize images of mark-sense forms scanned by any common image scanner. Plain-paper forms can be designed anywhere and printed inexpensively on common printers and copy paper. Some plain-paper OMR software requires the use of form registration marks, and some does not. When form registration marks are not used, the response bubbles themselves are used to register the page.
Before inferential self-registration technology is described, current mark-sense technology is reviewed.
Mark-sensing is a technique for automatically processing forms filled in by respondents. Mark-sense forms are quite flexible in their design. As a general rule, they contain specific areas, or response bubbles, in which a user can make a mark in order to answer a multiple choice question (FIG. 2, box (a)), to enter a numeric or alphabetic value (FIG. 2, box (b)), or to respond to any query which can be answered by one or more marks in predetermined areas.
In addition, a mark-sense form may contain bar codes and areas for written responses. These sources of information are not considered in this description of optical mark reading, which deals only with the reading of the marks within response bubbles.
In some cases, questions are integrated with the mark-sense areas on the same form (FIG. 2, box (a)). In other cases, numbered questions may be contained on a separate form, and the mark-sense form contains only the areas in which to mark the answers to the question (FIG. 2, box (c)).
Traditional OMR technology is quite restrictive. Because of the precision with which forms must be printed, they generally must be purchased from a forms manufacturer at a cost of $0.25 to $1.00 each. Forms must be marked with prescribed marking implements, such as a #2 pencil, and forms that have been marked by respondents must be read via specialized OMR scanners, such as the Scantron® ScanMark series, available from Scantron Corporation, Irvine, Calif., which attempt to recognize the marks made by each respondent. The marked response bubbles for each form are recorded, and this information is acted upon by an appropriate processing program to grade tests, to summarize surveys, or to provide whatever other processing function is required by the application.
To facilitate accurate scanning of mark-sense forms, traditional systems require the use of precise registration marks on forms in order to identify the rows and/or columns, as shown in FIG. 1. The registration marks are used to determine the positions of the mark-sense response bubbles. By using these registration marks, the OMR scanner can accurately determine the locations of mark-sense response bubbles and can determine whether or not those response bubbles have been marked.
Limitations of traditional OMR technology include the following:
    1. Forms must be created by using special OMR forms creation software (typically performed as a service at a high cost).
    2. The form registration marks must be located in a predefined location (along the left border of the page), or the form cannot be recognized. The OMR scanner hardware expects standardized registration marks.
    3. The forms must be printed to exacting tolerances. Small variations in printing can render the form useless since the OMR scanner won't be able to locate the timing marks on the page or the individual mark locations.
    4. Forms must be filled out with #2 pencils or specific pen colors (if an ink read head is used).
    5. Forms must be scanned with the timing marks on the left. If the forms are rotated, they cannot be recognized.
    6. Forms cannot be skewed or offset when scanned. Traditional OMR scanners cannot recognize forms that are skewed or offset during the scanning process.
    7. Traditional OMR scanners usually cannot recognize forms that have writing or extraneous marks in the timing mark area.
    8. Form compression or expansion due to humidity or other environmental factors can render traditional OMR forms unreadable due to the OMR scanner's need for precise positioning.
    9. Error correction is difficult since there is no image of the form with which to compare the recognized result.
    10. Both the OMR scanner and the forms that it requires are very expensive.
Plain-Paper OMR Technology: Advances in OMR technology have led to the capability of using ordinary printers, plain paper, and standard inks, thus reducing significantly the cost of OMR forms. Many of the OMR products today that use plain-paper forms require page registration marks that can be used to logically align the page, as shown in FIG. 3. In addition to determining the locations of response bubbles, these registration marks are useful to eliminate skew and to compensate for document compression or expansion.
Furthermore, expensive, specialized OMR scanners are not needed. These plain-paper forms can be read by standard off-the-shelf image scanners that provide graphical data that can be processed to identify the marked response bubbles.
However, as is the case with row and column registration marks, page registration marks as shown in FIG. 3 take up critical space on an OMR form (consider the density of response bubbles in FIG. 1). Furthermore, users often do not like the aesthetics of forms with registration marks. Such forms appear antiseptic and not user-friendly. Later advances led to scanning techniques that do not require the printing of any registration marks on the OMR form (FIG. 4). Rather, pattern recognition techniques are used to identify response bubbles and to determine whether a response bubble is marked or not. In effect, the expected locations of the response bubbles are used as implicit registration marks. An example of such a product is Remark Office OMR®, Version 6, available from Gravic, Inc., Malvern, Pa.
These advances in OMR technology have led to many advantages over traditional OMR products. Mark-sense forms may be created with any software package, such as Microsoft® Word, may be printed on plain paper by any quality printer, and may be read by any image scanner (FIG. 5). Form creation is more flexible, and the cost of forms is dramatically reduced. The printers and scanners required are much less expensive. Furthermore, the range of marking instruments that may be used is greatly extended.
It is imperative that the image scanner can accurately locate the mark-sense response bubbles so that it can correctly determine whether or not each response bubble is marked. This is complicated by many factors:
    1. One set of forms may be printed by many printers and may therefore differ slightly from one another.
    2. The forms may be read by different scanners. The dimensions determined by one scanner may be different from those determined by another scanner.
    3. The form may be skewed by the printing process or by the scanning process so that rows are not exactly horizontal or that columns are not exactly vertical.
    4. The skew may not be linear. It may be curved slightly, caused, for instance, by the scanning of the document from a single point that sweeps the document.
    5. The form may be slightly compressed or expanded by the printing and/or scanning processes or by paper deformation due, for instance, to humidity.
    6. There may be partial or invalid response bubbles.
The pattern-recognition techniques used in current plain-paper OMR generally solve these problems except for the last one. A major problem with current plain-paper OMR products is that there may be partial or invalid response bubbles. Response bubbles may not be well formed because of erasures, whiteouts, printer faults, and so on. Current plain-paper OMR technology requires that all response bubbles, whether filled in or not, be located and recognized. This is needed to ensure proper registration and accurate reading of the OMR form. If any one response bubble on the form cannot be located, an error is generated and must be corrected manually.
What is needed is a method to automatically detect and compensate for the above problem of not well-formed response bubbles. The present invention, hereinafter called “inferential self-registration,” or ISR, provides this capability. As a result, marked forms with missing or damaged response bubbles can be read and analyzed without manual intervention, thus saving significant analysis cost and time.
Inferential self-registration uses pattern-recognition techniques more advanced than those used for current plain-paper OMR to deduce the locations of mark-sense response bubbles more easily and quickly without the need for registration marks or special drop-out inks. In addition, it is not necessary to be able to locate or identify every response bubble on the OMR form. Only the identification of enough response bubbles to guarantee accurate registration is required.
3. Additional Background on Plain-Paper OMR
Before describing inferential self-registration technology, it is important to understand contemporary plain-paper OMR technology in more detail. A more detailed description follows.
A. Forms
Using plain-paper OMR technology, forms can be designed with any word processor or equivalent facility and can be printed on any printer of reasonable quality. Any font may be used, and the form may be printed with normal black ink. The requirements for a mark-sense form usable by plain-paper OMR include the following:
A1. Response Bubbles
The response bubbles are the areas in which marks can be made. Any enclosed shape that can indicate the area in which to place a mark can be used for a response bubble (FIG. 6). Response bubbles may be constructed graphically, or they may be one or more standard textual characters entered via a word-processing program or other package used to create the form. For instance, an Arial capital O would form a perfectly adequate response bubble. Response bubbles can contain a letter or a number within the confines of the response bubble to aid the respondent in choosing the proper response bubble to mark. Such a character should be small enough and light enough so that it is not mistaken as a mark. Such response bubbles can be constructed graphically, or they may be provided via a specialized font.
A2. Grids
Response bubbles on a mark-sense form are arranged as one or more fully configured grids. There can be any number of rows and any number of columns in a grid. The rows and columns do not have to be evenly spaced. However, there must be the same number of response bubbles in each row and the same number of response bubbles in each column, and the response bubbles must be arranged in a rectangular grid (FIG. 7, part (a)). If it is desired to have a partially formed “grid” which is not a fully formed rectangular grid, it can be defined as multiple grids (FIG. 7, part (b)).
A3. Zones
A zone is an area on the mark-sense form that contains a grid of mark-sense response bubbles. Any number of zones may be on the form (FIG. 8). Different zones may be configured differently with grids of different sizes and spacings. In plain-paper OMR systems, there cannot be any text within a zone unless it is printed with a drop-out color (a color that is invisible to the image scanner), as that would confuse the OMR software when it tries to find marks. However, there may be text or images in other areas of the form so long as they do not overlap a zone. Good form design for plain-paper systems requires that there be an adequate margin between a zone and any neighboring text or images so that skew or other form distortion does not cause the OMR software to read text or images as being in the zone. For instance, if it is determined that skew can cause the form to be misregistered by up to ⅜ of an inch, there should be a margin of at least ⅜ of an inch between each zone and any neighboring printing.
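The margin rule above can be illustrated with a minimal sketch. It assumes that zones and neighboring printing are described as axis-aligned rectangles measured in inches; the rectangle layout and the names gap_between and has_adequate_margin are hypothetical and are not part of any OMR product.

```python
import math

# Assumption: a rectangle is (left, top, right, bottom), measured in inches.

def gap_between(a, b):
    """Shortest distance between two axis-aligned rectangles (0 if they touch or overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)  # horizontal separation
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)  # vertical separation
    return math.hypot(dx, dy)

def has_adequate_margin(zone, neighbor, max_misregistration=3 / 8):
    """True if the neighbor is at least max_misregistration inches away from the zone."""
    return gap_between(zone, neighbor) >= max_misregistration

zone = (1.0, 1.0, 4.0, 3.0)
print(has_adequate_margin(zone, (4.5, 1.0, 6.0, 3.0)))   # text block 0.5 inch to the right
print(has_adequate_margin(zone, (4.25, 1.0, 6.0, 3.0)))  # text block 0.25 inch to the right
```

With the ⅜ inch figure used in the example above, the first text block leaves an adequate margin and the second does not.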
B. Mark-Sense Response Bubble Identification
Given a scanned image of a mark-sense form, mark-sense response bubbles are identified using pattern recognition technology. The image scanner has reduced the mark-sense form to an array of closely spaced pixels (typically, 200 per inch). Each pixel has a binary value. Its value is “zero” if it is white or “one” if it is dark (due to printing or marking). Response bubbles are typically 10 to 14 points in size. “Point” is a typesetting measure. There are 72 points to the inch. Therefore, at a scanning resolution of 200 pixels per inch, each response bubble will be represented by an array of approximately 30×30 to 40×40 pixels.
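The size estimate above follows from the conversion of points to inches to pixels. The following short sketch shows the arithmetic; the function name bubble_size_in_pixels is illustrative only.

```python
POINTS_PER_INCH = 72  # typesetting definition used in the text

def bubble_size_in_pixels(size_points, pixels_per_inch=200):
    """Approximate width (and height) in pixels of a response bubble of the given point size."""
    return round(size_points / POINTS_PER_INCH * pixels_per_inch)

for size in (10, 14):
    side = bubble_size_in_pixels(size)
    print(f"{size} pt -> about {side} x {side} pixels")
```

At 200 pixels per inch, a 10-point bubble spans about 28 pixels and a 14-point bubble about 39 pixels, which is consistent with the approximate range of 30×30 to 40×40 pixels given above.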
Given the zone area that should contain the response bubble grid, that area is searched to determine if there is a shape within it that could be a response bubble. For instance, the area can be searched with a rectangular scan, looking for pixels in each row of the pixel array. A valid response bubble will contain a set of marked pixels in the shape expected (FIG. 9, image (a)). The marked pixels will all be connected. If a set of pixels does not conform to the expected response bubble shape, it will be ignored (FIG. 9, image (b)). If a valid response bubble is detected by this or other pattern-recognition means, its approximate center is determined. The center (noted by the X in images (a)-(c) of FIG. 9) is used as the position measurement for the response bubble. If a response bubble is marked (such as by filling it in with a pencil), the inner pixels of the response bubble will be noted as marked (binary values of “one”). If more than a certain percentage of the pixels within a response bubble are marked, the response bubble is determined to be marked (FIG. 9, image (c)). This percentage is a parameter that can be set by the user.
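The center measurement and the marked/unmarked decision described above might be sketched as follows. This is a simplified illustration and not the method of any particular product: it assumes that a single response bubble has already been isolated as a small binary pixel array, it approximates the center by the midpoint of the bounding box of the dark pixels, and it treats the pixels within an assumed inner_radius of the center as the inner pixels. All function names and parameters are hypothetical.

```python
def bubble_center(image):
    """Approximate center (row, col): midpoint of the bounding box of the dark pixels."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for row in image for c, px in enumerate(row) if px]
    if not rows:
        raise ValueError("no dark pixels in image")
    return ((min(rows) + max(rows)) / 2, (min(cols) + max(cols)) / 2)

def is_marked(image, center, inner_radius, threshold=0.5):
    """True if more than `threshold` of the pixels within inner_radius of the center are dark.

    `threshold` stands in for the user-settable percentage mentioned in the text.
    """
    cr, cc = center
    total = dark = 0
    for r, row in enumerate(image):
        for c, px in enumerate(row):
            if (r - cr) ** 2 + (c - cc) ** 2 <= inner_radius ** 2:
                total += 1
                dark += px
    return total > 0 and dark / total > threshold

def make_square_bubble(filled, n=7):
    """Test image: an n x n square outline, optionally filled in (1 = dark, 0 = white)."""
    return [[1 if filled or r in (0, n - 1) or c in (0, n - 1) else 0
             for c in range(n)] for r in range(n)]

empty = make_square_bubble(filled=False)
marked = make_square_bubble(filled=True)
center = bubble_center(empty)
print(center, is_marked(empty, center, 1.5), is_marked(marked, center, 1.5))
```

A production system would also verify that the outline matches the expected shape and that its pixels are connected; the sketch omits those steps.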
C. Template Editor
Before processing marked forms, the system must be “taught” the layout of the form. This is done by presenting a template editor with a scanned image of an unmarked mark-sense form (the template) to be read. The template editor is a software package that allows the user to describe the format of the mark-sense form.
C1. Zone Identification
One way to define the mark-sense form with the template editor is to present the scanned image of the template to the user via a display. The user selects a zone on the page by dragging a rectangle around it (a technique used by contemporary graphic packages to select a group of objects). The user then enters the number of rows and the number of columns of mark-sense response bubbles in the zone.
C2. Zone Validation and Measurement
Following zone identification, the template editor must verify that the grid within the zone is correctly dimensioned according to the user-entered parameters. It must determine the exact position of the grid on the template, and it must determine the inter-row and inter-column spacings of the grid within the zone. Using the response bubble identification logic described above, the template editor searches the zone for response bubbles by scanning vertically and horizontally within the zone. It verifies that the number of response bubbles that it has found matches that entered by the user and then records the grid position and the distances between the centers of the response bubbles in a row and the centers of the response bubbles in a column.
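The validation and measurement step above can be sketched as follows, under simplifying assumptions: the response bubble centers have already been detected as (x, y) pixel coordinates, and skew is small enough that sorting by y separates the rows. The function measure_grid and the returned dictionary keys are hypothetical names chosen for illustration.

```python
from statistics import mean

def measure_grid(centers, n_rows, n_cols):
    """Check the bubble count against the user-entered grid size and measure the spacings."""
    if len(centers) != n_rows * n_cols:
        raise ValueError(
            f"expected {n_rows * n_cols} response bubbles, found {len(centers)}")
    # Assumption: negligible skew, so sorting by y groups the centers into rows.
    by_y = sorted(centers, key=lambda p: p[1])
    rows = [sorted(by_y[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows)]
    row_y = [mean(p[1] for p in row) for row in rows]
    col_x = [mean(row[j][0] for row in rows) for j in range(n_cols)]
    return {
        "origin": (col_x[0], row_y[0]),                              # top-left bubble center
        "col_spacing": [b - a for a, b in zip(col_x, col_x[1:])],    # inter-column distances
        "row_spacing": [b - a for a, b in zip(row_y, row_y[1:])],    # inter-row distances
    }

# A 2 x 3 grid with unevenly spaced columns, listed in arbitrary order.
centers = [(40, 50), (10, 20), (80, 20), (10, 50), (40, 20), (80, 50)]
print(measure_grid(centers, n_rows=2, n_cols=3))
```

Spacings are recorded per gap rather than as a single value because, as noted earlier, the rows and columns of a grid do not have to be evenly spaced.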
The user further describes the zone by giving it a name, by optionally giving each column and each row a description, by noting whether the answers to a question in the zone are vertically or horizontally oriented, by specifying whether each column and each row allow multiple choices, and by providing other information common to OMR processing.
At this point, the OMR system has the information that it needs to read marks in this zone. The user then selects the other zones on the template and proceeds as described above to completely define the format of the entire mark-sense form to the system.
D. Scanning
When a set of mark-sense forms is ready for processing, the forms are scanned into the system by any appropriate image scanner or digital camera. The scanner need only have a sufficiently high resolution and may present its results in any standard graphic format, such as PDF, TIFF, JPEG, or GIF. Ideally, the same scanner, or a scanner of the same make and model, as that used for scanning the template should be used. However, this is not a requirement.
E. Response Bubble Marking
With plain-paper OMR technology, there is no need to require that a certain marking device, such as a #2 pencil, be used to fill the response bubbles. Any pencil or pen may be used so long as the mark is dark enough to be imaged by the image scanner.
F. Auto-Alignment
If a different scanner is used to scan mark-sense forms than that used to create the form template, image distortion could affect the recognition of marks on these forms. In this case, the plain-paper OMR technology can be used to create a new, more accurate, template.
To do this, the template form is scanned via the image scanner to be used, and the resulting image is entered into the system. The template that was previously created can now be used as a starting point to locate the response bubbles in each zone. If the positions of these response bubbles differ from those shown in the original template, a new template is created using the updated positions. In this way, a new template matching the characteristics of the current scanner being used can be automatically created without all of the detailed graphic manipulation and data entry required to create the original template.
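One way to picture auto-alignment is as a nearest-neighbor update of the stored bubble positions. The sketch below is an assumption about how such an update could work, not a description of any product's algorithm: each position in the original template is replaced by the closest response bubble center detected in the new scan, provided that it lies within an assumed tolerance max_shift (in pixels). The name realign_template is hypothetical.

```python
import math

def realign_template(template, detected, max_shift=10.0):
    """Return updated bubble positions, using the old template as the starting point."""
    updated = []
    for tx, ty in template:
        nearest = min(detected,
                      key=lambda p: math.hypot(p[0] - tx, p[1] - ty),
                      default=None)
        if nearest is None or math.hypot(nearest[0] - tx, nearest[1] - ty) > max_shift:
            # Consistent with current plain-paper OMR: every response bubble must be found.
            raise ValueError(f"no response bubble found near ({tx}, {ty})")
        updated.append(nearest)
    return updated

old_template = [(10, 20), (40, 20)]
new_scan_centers = [(42, 23), (12, 22)]  # same bubbles as imaged by a different scanner
print(realign_template(old_template, new_scan_centers))
```

Because the old template supplies the expected positions, the user does not have to redraw zones or re-enter row and column counts.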