The proliferation of imaging technology, combined with ever-increasing computational processing power, has led to many advances in the area of automated document analysis. A significant proportion of office documents are generated using structured text/graphics editing applications such as Microsoft™ Word™ and Microsoft™ PowerPoint™, amongst many others. In addition to formatted text editing, these text/graphics editing applications include basic drawing tools and options for drawing graphics shapes and the like. An important class of document analysis applications is referred to as “scan-to-editable” applications. These applications process a scanned bitmap representation of a document to generate an electronic version of the document that can be viewed and edited using such editing applications.
Drawing options in a typical structured text/graphics editing application include freeform line drawing, template shapes and connectors (i.e., dynamic line objects that connect to and/or between template shapes within a document). Some shapes and lines have one or more control points that permit user modification of the shape or line through manipulation of the control point(s) on a graphical user interface. The text/graphics editing applications may also include coloring, filling, layering and grouping options for sets of objects. Many commonly used geometric shapes can be created using template shapes. A user may prefer to use a template shape rather than drawing the shape using freeform lines, as this option can be faster, more accurate in terms of representation of the desired shape, and easier to edit at a later time. The well-known Microsoft™ AutoShapes set includes a number of examples of template shapes which can be manipulated within editing environments such as Microsoft™ Word™ and PowerPoint™. Other template shapes may be found in OpenOffice.org™ editing applications such as the Writer™ and Impress™ applications.
Line detection is a vectorization method used in image processing and, in particular, document scan processing. There are a number of methods used for vectorizing bitmapped images, including thinning methods, distance-based methods, contour matching methods and Sparse Pixel Vectorization (SPV). Most of these methods require direct processing of pixel data during line detection. Line detection typically occurs at an early stage of processing an image when performing shape/line analysis on the image. Unfortunately, line detection methods frequently fail at sharp corners or curves. Such sharp corners and curves are common features in template shape objects used in “scan-to-editable” document analysis applications. As a result, shape matching can often fail.
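The difficulty at sharp corners can be illustrated with a minimal sketch. It is not one of the vectorization methods named above; it assumes a simple total-least-squares straight-line fit over a set of pixel coordinates, and the function name `rms_line_residual` is hypothetical. Pixels lying on a straight stroke fit a single line almost exactly, whereas pixels spanning a right-angle corner leave a large residual, so a detector that expects one line per pixel run may reject or mis-segment the corner.

```python
import math

def rms_line_residual(points):
    """Return the RMS perpendicular distance of points to their best-fit line.

    Illustrative assumption: a total-least-squares fit, where the residual is
    the square root of the smaller eigenvalue of the 2x2 covariance matrix.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the covariance matrix of the pixel coordinates.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Smaller eigenvalue of [[sxx, sxy], [sxy, syy]] in closed form.
    half_trace = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    smallest = max(half_trace - radius, 0.0)  # clamp rounding noise
    return math.sqrt(smallest)

# Pixels along a single straight stroke.
straight = [(i, 2 * i) for i in range(10)]
# Pixels along an L-shaped stroke with a sharp 90-degree corner.
corner = [(i, 0) for i in range(10)] + [(9, j) for j in range(1, 10)]

print("straight residual:", rms_line_residual(straight))
print("corner residual:  ", rms_line_residual(corner))
```

In this sketch the straight stroke yields a residual near zero, while the L-shaped stroke yields a residual of roughly two pixels, which a threshold tuned for thin straight lines would likely reject.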