1. Field of the Invention
The present invention relates generally to a document editing apparatus and method, and more particularly, to a document editing apparatus and method for recognizing a printed document and storing the printed document in a form similar to the configuration of the printed document.
2. Description of the Related Art
An image character recognition device using a camera performs recognition by capturing an image of a printed document. In this case, a user may want to store the entire document, including its text, rather than only the recognized text. To meet this need, the captured document image is recognized using various character recognition algorithms and converted to text data, and the document recognition result is stored. The text data generated by the document recognition is processed into a document file format preset by the user and stored in a memory. The document file is stored in the form of a text file.
In general, a document is divided into more than one area, and the characters included in a corresponding one of the divided areas are processed first. Accordingly, the sequence of character strings may change depending on the configuration or multi-paragraph type of the document image, and sentences from different paragraphs may be mixed. These changes may be significant enough to prevent the document from being understood from the recognized text. Thus, when a recognition result is stored, recognizing and storing the text and character strings of the entire document, instead of only a small number of characters, is a key consideration. Therefore, it is important to store the paragraphs without distorting their meaning.
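The ordering problem described above can be illustrated with a minimal sketch. It is not taken from any particular recognition device. The `Line` type, the pixel coordinates, the 200-pixel column boundary, and the two area-assignment functions are all hypothetical values chosen for illustration. The sketch shows that the same recognized lines come out in a different sequence depending on how the page is divided into areas.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Line:
    """One recognized text line (hypothetical structure for illustration)."""
    text: str
    x: int  # left edge in pixels (assumed value)
    y: int  # top edge in pixels (assumed value)


def reading_order(lines: List[Line], area_of: Callable[[Line], int]) -> List[str]:
    """Emit lines area by area; inside an area, top-to-bottom then left-to-right."""
    ordered = sorted(lines, key=lambda l: (area_of(l), l.y, l.x))
    return [l.text for l in ordered]


def single_area(line: Line) -> int:
    """Treat the whole page as one area (area division ignores the columns)."""
    return 0


def by_column(line: Line) -> int:
    """Assign an area per column; the 200-pixel boundary is an assumption."""
    return 0 if line.x < 200 else 1


# A two-column page: paragraph A on the left, paragraph B on the right.
PAGE = [
    Line("A1", 0, 0), Line("B1", 300, 0),
    Line("A2", 0, 20), Line("B2", 300, 20),
]

mixed = reading_order(PAGE, single_area)   # lines of A and B interleave
intact = reading_order(PAGE, by_column)    # each paragraph stays contiguous
```

With the single-area division, lines of the two paragraphs alternate in the output. With the column-based division, each paragraph is emitted as a contiguous run.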
Recently, the development of image processing technology and character recognition technology has significantly increased the possibility of character recognition even on a somewhat deteriorated image. However, when a document divided into a picture area and a text area is recognized and stored, characters included in the same area are preferentially stored through an area analysis performed as a recognition pre-processing step. Conventionally, since only text data is stored after a text area is recognized, the capability to store documents containing various media through recognition is limited. Thus, when a picture area, such as a picture, a graph, and/or a table, is included in a document that a user desires to store, the picture area may be misrecognized, and the misrecognized characters may be stored.