A “scanned book” is an electronic book obtained by scanning a paper book using a scanner or the like. Each page of the scanned book corresponds to a scanned image with a high DPI (dots per inch). The amount of data needed to represent a scanned image is usually large, and thus it is challenging to store and transmit the scan data. Moreover, the data of the scanned pages may not be readily usable, for example, for copying text, reorganizing the layout of the document, etc.
To enable text copying, a double-layer page technology has been proposed, in which a transparent layer is overlaid on the scanned image, and transparent words or characters obtained using OCR (Optical Character Recognition) are placed at the corresponding locations on the transparent layer. As a result, the transparent words or characters can be selected and copied without affecting the original page structure of the scanned book.
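For illustration only, the double-layer arrangement described above can be sketched as a simple data structure. This is a minimal sketch under stated assumptions, not the format used by any particular product: the names `TextBox`, `DoubleLayerPage` and `copy_text`, the image file name, and the hand-written word boxes (standing in for real OCR output) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TextBox:
    """One recognized word and its bounding box on the page, in pixels."""
    text: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class DoubleLayerPage:
    """A scanned image plus an invisible text layer aligned to it (hypothetical model)."""
    image_path: str                                    # bottom layer: the scanned image
    text_layer: List[TextBox] = field(default_factory=list)  # top layer: transparent words

    def copy_text(self, region: Tuple[int, int, int, int]) -> str:
        """Return the words whose boxes lie fully inside region (left, top, right, bottom)."""
        left, top, right, bottom = region
        hits = [
            b for b in self.text_layer
            if b.x >= left and b.y >= top
            and b.x + b.width <= right and b.y + b.height <= bottom
        ]
        # Approximate reading order: top to bottom, then left to right.
        hits.sort(key=lambda b: (b.y, b.x))
        return " ".join(b.text for b in hits)


# Hand-written boxes stand in for OCR results (assumed values).
page = DoubleLayerPage(
    image_path="page_001.png",
    text_layer=[
        TextBox("world", 70, 10, 50, 12),
        TextBox("Hello", 10, 10, 50, 12),
        TextBox("next", 10, 40, 40, 12),
    ],
)
selected = page.copy_text((0, 0, 200, 30))  # selects only the first text line
```

In this sketch each word is tied to a fixed position on the scanned image, so text can be selected and copied, but the layer carries no information about paragraphs or reading flow.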
Although this method enables text copying from a scanned book, the words or characters on the transparent layer still cannot be used for more advanced applications such as changing the layout. As a result, the large amount of data in scanned books still cannot be re-arranged. Accordingly, there is a need for a method and an apparatus for processing the data of a scanned book so as to allow re-arrangement of the layout of the scanned book.