1. Field of the Invention
The present invention relates to content search and retrieval methods, and particularly to a method for retrieval of Arabic historical manuscripts that uses Latent Semantic Indexing.
2. Description of the Related Art
Typically, a Latent Semantic Indexing (LSI) method uses statistical techniques to model the way in which words are used across an overall collection of documents. In the resulting semantic space, a query can be similar to a document even if the two have no words in common. LSI is thus not dependent on any single word, and may therefore handle recognition errors robustly.
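By way of illustration only, the following minimal sketch shows one common formulation of LSI, in which a term-by-document matrix is reduced by a truncated singular value decomposition and both queries and documents are projected into the resulting space. The function names (lsi_fit, lsi_project, cosine), the toy vocabulary, and the choice of k=2 are assumptions made for this example and are not taken from any particular system described herein.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Return the top-k left singular vectors of a term-by-document matrix."""
    U, _s, _Vt = np.linalg.svd(np.asarray(term_doc, dtype=float),
                               full_matrices=False)
    return U[:, :k]  # singular values are returned in descending order

def lsi_project(vec, U_k):
    """Project a term-space vector (query or document) into the latent space."""
    return U_k.T @ np.asarray(vec, dtype=float)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy collection: rows are terms, columns are documents.
TERMS = ["manuscript", "scroll", "ink", "camel", "desert"]
A = np.array([
    [1.0, 0.0, 0.0],  # manuscript
    [0.0, 1.0, 0.0],  # scroll
    [1.0, 1.0, 0.0],  # ink
    [0.0, 0.0, 1.0],  # camel
    [0.0, 0.0, 1.0],  # desert
])

U_k = lsi_fit(A, k=2)
query = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # the single term "manuscript"

for j in range(A.shape[1]):
    raw = cosine(query, A[:, j])
    latent = cosine(lsi_project(query, U_k), lsi_project(A[:, j], U_k))
    print(f"doc {j}: raw cosine = {raw:.2f}, latent cosine = {latent:.2f}")
```

In this toy collection, the second document (scroll, ink) shares no term with the query, so its raw cosine similarity is zero, yet it scores close to 1 in the latent space because "ink" co-occurs with both "manuscript" and "scroll". The third document, which concerns an unrelated topic, remains dissimilar in both spaces.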
Large archives of historical Arabic manuscripts cannot be manually searched because of the difficulty of manual index construction, and cannot be automatically searched because they are stored in their original image form. Optical character recognition (OCR) techniques are available, but owing to the characteristics of historical Arabic manuscripts and to content features such as figures and drawings, OCR may be impractical and may not yield satisfactory results. An alternative to OCR techniques involves a Contents-Based Image Retrieval (CBIR) system utilizing angular line feature extraction, concentric circle feature extraction, and similarity matching based on a variety of distance measures, as disclosed in S. A. Shahab et al., “Computer Aided Indexing of Historical Manuscripts”, Proceedings of the International Conference on Computer Graphics, Imaging and Visualisation (July 2006), which is hereby incorporated by reference in its entirety. However, there remains room for improvement in such a system.
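For illustration only, similarity matching under several distance measures can be sketched as follows. The feature vectors here are arbitrary placeholders; the sketch does not reproduce the angular line or concentric circle features of the cited reference, and the function names and the particular distance measures chosen are assumptions made for this example.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line (L2) distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

def manhattan(a, b):
    """City-block (L1) distance between two feature vectors."""
    return float(np.abs(a - b).sum())

def rank_by_distance(query, candidates, distance):
    """Return (index, distance) pairs sorted from nearest to farthest."""
    scored = [(i, distance(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical feature vectors for a query image and three stored images.
query_features = np.array([1.0, 0.0])
stored_features = [
    np.array([5.0, 5.0]),
    np.array([1.0, 0.1]),
    np.array([0.0, 1.0]),
]

for name, fn in [("euclidean", euclidean), ("manhattan", manhattan)]:
    order = [i for i, _ in rank_by_distance(query_features, stored_features, fn)]
    print(name, order)
```

Passing the distance measure as a parameter allows the same ranking routine to be evaluated under different measures without changing the feature extraction step.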
Thus, a method for retrieval of Arabic historical manuscripts solving the aforementioned problems is desired.