Flow format documents and fixed format documents are widely used and serve different purposes. Flow format documents organize a document using complex logical formatting objects such as sections, paragraphs, columns, and tables. As a result, flow format documents offer flexibility and ease of modification, making them suitable for documents that are frequently updated or subject to significant editing. In contrast, fixed format documents organize a document using basic physical layout elements such as text runs, paths, and images to preserve the appearance of the original. Fixed format documents offer a consistent and precise layout, making them suitable for documents that are not frequently or extensively changed, or where uniformity is desired. Examples of such uses include document archival, high-quality reproduction, and source files for commercial publishing and printing. Fixed format documents are often created from flow format source documents. Fixed format documents also include digital reproductions (e.g., scans and photos) of physical (i.e., paper) documents.
In situations where editing of a fixed format document is desired but the flow format source document is not available, the fixed format document may be converted into a flow format document. Conversion involves parsing the fixed format document and transforming the basic physical layout elements from the fixed format document into the more complex logical elements used in a flow format document.
Some East Asian languages may be written horizontally or vertically. For example, Chinese, Japanese, and Korean scripts (sometimes referred to herein as CJK scripts) may be oriented in either a horizontal or a vertical direction. In some cases, vertically written text may include horizontal-in-vertical text, where multiple characters (e.g., the two digits of a number) are displayed side by side in an area reserved for a single vertical character. Currently, when a fixed format document with vertical text is converted to a flow format document, vertically written text, including horizontal-in-vertical text, may not be recognized and thus may not be reconstructed correctly.
Additionally, both horizontally and vertically written East Asian scripts may include a reading aid, referred to herein as ruby text, comprising characters that indicate the pronunciation of a word. In horizontal text, ruby text may be placed above a line of text, while in vertical text, ruby text may be placed to the right of a line of text. Currently, when a fixed format document is converted to a flow format document, ruby text may be treated as part of the regular text flow, and thus may be neither reconstructed correctly nor associated with its corresponding base text.
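The positional relationship between ruby text and base text can be illustrated with a minimal Python sketch. It assumes, hypothetically, that text runs have already been extracted with bounding boxes in page coordinates where y increases downward, and that ruby text is set in a noticeably smaller font than its base text. The names `Run` and `find_ruby_base` and the thresholds `size_ratio` and `max_gap` are illustrative assumptions, not a described method or an existing library API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """A hypothetical extracted text run with a bounding box (y grows downward)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge
    font_size: float


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def find_ruby_base(ruby: Run, candidates: List[Run], vertical: bool,
                   size_ratio: float = 0.6, max_gap: float = 2.0) -> Optional[Run]:
    """Return the candidate run that `ruby` most plausibly annotates, or None.

    Assumed heuristic: ruby must be at most `size_ratio` times the base font
    size, sit within `max_gap` of the base (above it for horizontal text, to
    its right for vertical text), and overlap the base along the line direction.
    """
    best, best_overlap = None, 0.0
    for base in candidates:
        if base is ruby or ruby.font_size > size_ratio * base.font_size:
            continue  # not a plausible base: same run, or ruby not small enough
        if vertical:
            gap = ruby.x0 - base.x1  # ruby column lies to the right of the base
            overlap = _overlap(ruby.y0, ruby.y1, base.y0, base.y1)
        else:
            gap = base.y0 - ruby.y1  # ruby line lies above the base
            overlap = _overlap(ruby.x0, ruby.x1, base.x0, base.x1)
        if -max_gap <= gap <= max_gap and overlap > best_overlap:
            best, best_overlap = base, overlap
    return best


if __name__ == "__main__":
    base = Run("漢字", 0, 10, 20, 20, 10)
    ruby = Run("かんじ", 0, 4, 15, 9, 5)
    print(find_ruby_base(ruby, [base], vertical=False).text)
```

Choosing the candidate with the largest overlap, rather than the first match, keeps a ruby run that straddles two adjacent base runs attached to the one it mostly covers.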
Further, various East Asian languages share a range of Unicode code points (e.g., the CJK Unified Ideographs) whose rendered glyphs may differ depending on the font used. Accordingly, when restructuring a document written in an East Asian language (e.g., Chinese, Japanese, or Korean), the particular language may need to be determined so that an appropriate font can be provided for that language.
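As an illustration only, one simple way to determine the language is to look for characters outside the shared ideograph range: kana suggest Japanese, Hangul suggests Korean, and text containing only Han ideographs defaults to Chinese. This heuristic, the function name `guess_cjk_language`, and the `EXAMPLE_FONTS` mapping are hypothetical choices for this sketch, not the described method; for instance, a short Japanese string written entirely in kanji would be misclassified as Chinese.

```python
from typing import Optional

# Illustrative font choices per language tag (assumed, not prescribed).
EXAMPLE_FONTS = {"ja": "MS Mincho", "ko": "Batang", "zh": "SimSun"}


def guess_cjk_language(text: str) -> Optional[str]:
    """Guess "ja", "ko", or "zh" from Unicode block membership; None if no CJK."""
    kana = hangul = han = 0
    for ch in text:
        cp = ord(ch)
        # Hiragana, Katakana, Katakana Phonetic Extensions, halfwidth Katakana
        if 0x3040 <= cp <= 0x30FF or 0x31F0 <= cp <= 0x31FF or 0xFF66 <= cp <= 0xFF9F:
            kana += 1
        # Hangul Syllables, Hangul Jamo, Hangul Compatibility Jamo
        elif 0xAC00 <= cp <= 0xD7AF or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
            hangul += 1
        # CJK Unified Ideographs and Extension A (shared across the languages)
        elif 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
            han += 1
    if kana or hangul:
        # Script-specific characters disambiguate the shared ideographs.
        return "ja" if kana >= hangul else "ko"
    return "zh" if han else None


if __name__ == "__main__":
    for sample in ("漢字のテスト", "한국어", "汉字"):
        lang = guess_cjk_language(sample)
        print(sample, lang, EXAMPLE_FONTS[lang])
```

Counting characters per block, rather than stopping at the first kana or Hangul character, lets mixed passages resolve toward whichever script-specific characters dominate.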
It is with respect to these and other considerations that the present invention has been made.