Field
This disclosure relates to text parsing of complex graphical images.
Description of the Related Art
Optical character recognition (OCR) is a technique used to derive text from non-text-based document formats. These formats include portable document format (PDF) documents and image document formats (such as JPG/JPEG, GIF, PNG, TIFF, and other similar formats). Text-based documents store text data as text, whereas these non-text-based formats may store a series of pixels, or an arrangement of images, that has no direct relationship to the text that those pixels or images represent. OCR can “look” at an image and derive the most likely words shown in that image.
One particular context in which OCR has been little-used is architectural or plan drawings for large-scale building projects. The drawings are much larger (in scale) than a typical 8.5″×11″ document, and text is arranged in atypical places, in various sizes, and often with limited information pertaining to its meaning. Only a person skilled in reading plan drawings or architectural drawings can quickly ascertain the meaning of a particular aspect of the drawing or of the drawing text. Because of this, traditional OCR principles seldom apply.
Unfortunately, throughout the design process, and sometimes into construction, plan documents are updated to reflect changes. In the absence of some real-world-to-computer integration, identifying the updated plan document for a particular location, floor, aspect, property, or the like can be difficult to do using a computer. As a result, an individual typically must sift through each individual plan document to identify which aspect of a design project is shown in the document, or which earlier document is superseded by the new document. This process is tedious, time-consuming, and not particularly valuable to a project, but it is required if those using any computer-based system to review or provide comments on the project are to have any input in a reasonable, timely, and organized fashion.
Throughout this description, elements appearing in figures are assigned three-digit reference designators, where the most significant digit is the figure number and the two least significant digits are specific to the element. An element that is not described in conjunction with a figure may be presumed to have the same characteristics and function as a previously-described element having a reference designator with the same least significant digits.