The invention relates to ordering groups of text in an image.
Paper documents can be scanned and stored as images in a computer. Text recognition techniques, such as optical character recognition (OCR), can then be used to convert text in these images to a computer-editable format, such as ASCII characters. Scanned images can contain text organized in multiple distinct blocks (e.g., columns of text, headlines, captions, footnotes, footers). The text blocks may further be separated by relatively large areas of blank space or by graphical objects (lines, pictures, and so forth). Text can also be surrounded by a frame or contain insets, which further separate the text into blocks. Although a person reading the page may be able to recognize the proper order of the text blocks in the image, it may be difficult for an OCR program to first identify the text (by discarding non-text components such as blank space and graphical objects) and then arrange the text blocks in the proper reading order.