Electronic documents come in many different forms. For example, an electronic document may be produced by a scanning machine or electronic camera which converts an optical image to a computer readable file format. The optical image can be of a paper document, whiteboard, chalkboard, billboard, or even an electronic display screen. An electronic document may also be produced by a tablet having a touch sensitive screen that receives handwritten input from the user's finger or a stylus, with the tablet then storing the input in a computer readable file format. Computer readable file formats include PDF (Portable Document Format), JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), TIFF (Tagged Image File Format), PNG (Portable Network Graphics), and other formats that store bitmap images, among others.
An electronic document in a computer readable file format may be converted to a form that facilitates distribution of the essential content of the original electronic document via email and other electronic means. Conversion may also be to a form that facilitates editing, such as in a basic text editor or word processing program. For example, optical character recognition (OCR) could be used as part of the conversion process to produce machine-encoded text which can later be searched and/or manipulated.
As shown in FIG. 1, image 10 of an electronic document may include bulleted list 12. Bulleted list 12 is a list in which one or more objects are introduced or led by a particular typographic symbol, referred to as a bullet. The bullet may take one of various forms, such as an asterisk (*), hyphen (-), plus sign (+), equal sign (=), and others. Other types of bullets include, without limitation, filled and non-filled circles, triangles, squares, and diamonds. In FIG. 1, each bullet 14 is a filled circle. Each item within bulleted list 12 is referred to as bullet item 16. Each bullet item 16 may include one or more item objects, which can be any one or a combination of text, photographs, pictures, and other graphical representations. A bullet item may contain multiple lines of item objects. For example, a bullet item may contain a sequence of words and/or other objects in multiple linear arrangements (lines) that form a paragraph-like structure. In FIG. 1, bulleted list 12 has ten bullet items 16 with each bullet item 16 containing handwritten text arranged in a single line.
Bulleted lists may have a hierarchical structure having multiple levels defined by indentations. In FIG. 1, bulleted list 12 has two indentation levels. Items 1 through 9 were originally intended by the author of the bulleted list to be in the same indentation level, designated as first indentation level 18. Subitem 1 was originally intended to be in the next indentation level, designated as second indentation level 20.
Image 10 does not encode the hierarchical structure of bulleted list 12. In particular, image 10 does not encode aggregated indentation levels, in that pairs of bullet items 16 having different horizontal positions in image 10 are not identified as being alike in indentation level. Since image 10 does not encode the hierarchical structure, items within bulleted list 12 cannot be easily edited. For example, image 10 may be a bitmap image in which bitmap image data for bullet items 16 are not grouped together according to the hierarchical structure that was originally intended by the bulleted list author and that would be apparent to a person looking at image 10. Conversion of image 10 could be performed if a person (a user) wants to easily delete or add a bullet item, rearrange the order of bullet items, or change the indentation level of a bullet item. However, conversion should accurately encode the hierarchical structure that was originally intended by the author of the bulleted list. Here, “accurately encoding” refers to accurately identifying pairs of bullet items 16 having different horizontal positions in image 10 as being alike in indentation level. If conversion does not accurately encode the hierarchical structure that was originally intended, the user will have to modify the converted bulleted list in order to match what was originally intended.
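By way of illustration only, the following minimal sketch shows one possible form that an encoded hierarchical structure could take after conversion. The names BulletItem and render, the pixel positions, and the output layout are all hypothetical and are not taken from the document; the sketch merely shows two bullet items with different horizontal positions being identified as alike in indentation level.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BulletItem:
    """Hypothetical record for one converted bullet item."""
    text: str     # recognized item text
    x: float      # horizontal position of the bullet in the image (assumed pixels)
    level: int    # indentation level assigned by conversion (1 = first level)


def render(items: List[BulletItem]) -> str:
    """Render items as plain text, indenting two spaces per level below the first."""
    return "\n".join("  " * (item.level - 1) + "- " + item.text for item in items)


# Assumed example data: "Item 1" and "Item 2" sit at different horizontal
# positions (40 vs. 46) yet are encoded as being in the same indentation level.
items = [
    BulletItem("Item 1", x=40.0, level=1),
    BulletItem("Subitem 1", x=95.0, level=2),
    BulletItem("Item 2", x=46.0, level=1),
]

print(render(items))
```

Once indentation levels are stored explicitly in this way, operations such as deleting an item, reordering items, or changing an item's level reduce to simple list and field updates rather than edits to bitmap image data.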
As shown in FIG. 1, it is possible for many bullet items 16 to be misaligned even when they are all originally intended to be in first indentation level 18. Misalignment occurs when the horizontal position of a bullet differs from that of another bullet within the same indentation level. As shown in FIG. 1, the difference in horizontal position can be progressive such that the horizontal distance from the first (top) bullet increases with each successive bullet within the same indentation level. This phenomenon, referred to as “progressive shifting” herein, frequently occurs when the bulleted list is created on a large surface, such as a whiteboard. Progressive shifting can make it difficult to accurately encode the originally intended hierarchical structure.
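The difficulty caused by progressive shifting can be illustrated with a short sketch. The rule below is a deliberately naive one chosen only for illustration, not a method described in this document: it compares each bullet's horizontal position with that of the first bullet using a fixed tolerance. The function name, the 6-pixel drift, and the 20-pixel tolerance are all assumed values.

```python
from typing import List


def levels_by_fixed_tolerance(xs: List[float], tolerance: float) -> List[int]:
    """Naive rule (illustrative only): a bullet is assigned level 1 if its
    horizontal position is within `tolerance` of the first bullet's position,
    and level 2 otherwise."""
    first = xs[0]
    return [1 if abs(x - first) <= tolerance else 2 for x in xs]


# Assumed data: nine bullets all intended to be in the first indentation level,
# each drifting 6 pixels further to the right than the one above it.
xs = [40.0 + 6.0 * i for i in range(9)]

levels = levels_by_fixed_tolerance(xs, tolerance=20.0)
print(levels)  # [1, 1, 1, 1, 2, 2, 2, 2, 2]
```

Although each bullet is only slightly offset from its immediate neighbor, the accumulated offset from the first bullet eventually exceeds the tolerance, so the lower bullets are wrongly placed in a second indentation level.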