1. Technical Field
The present invention generally relates to the field of image analysis and, in particular, to comparing pages of text using image features based on word positions.
2. Background Information
Pages of text can be compared using image features. Image feature extraction is a well-studied problem. Many techniques perform well at matching points across images and at looking up images in a database. However, these techniques do not perform well on repetitive patterns, such as text in document images. In addition, these techniques extract thousands of features per image and match features using nearest-neighbor search, which requires sophisticated indexing mechanisms.
Image features can make use of optical character recognition (OCR). Image features based on text that has been subjected to OCR processing typically work well. However, this technique requires good-quality OCR, which is not always available and can be very costly. This technique is also language-dependent and performs poorly for certain languages (e.g., non-Western languages such as Chinese and Arabic).
Therefore, tasks that require comparison of document pages, such as search and retrieval, are limited by the shortcomings of these techniques.