The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Searching text documents using a search engine is well known. The search engine uses the text within each document to identify relevant search results.
Businesses often receive a large quantity of non-text-based material. Such material may be referred to as image text documents when the image contains a number of words. An image text document is an image of words, but because the text appears only as a picture, the words cannot be searched using a search engine. Image text documents may originate from various sources including faxes, forms, charts, diagrams, pictures, and the like. Often, metadata is stored with the image to help identify it. The metadata may contain various titles, key people, or the like, and may be manually entered for each document.
Audio files are another form in which a business may receive or store material. Audio files may be generated in various ways, including from voicemails and the audio tracks of videos. Typically, businesses have no way to search audio files or the content therein.
When image text documents are searched, only the metadata is searched. Because the metadata carries little identifying information, the search results may have limited accuracy and relevant documents may not be found. Consequently, desirable information may be overlooked. Therefore, it is desirable to improve the results of search engines when faced with image text documents.
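The limitation described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the record type `ImageTextDocument`, the function `metadata_search`, the file names, and the sample metadata are all hypothetical and chosen only for illustration. The sketch assumes a search that matches a query against manually entered metadata values and has no access to the words pictured in the image.

```python
from dataclasses import dataclass, field


@dataclass
class ImageTextDocument:
    """Hypothetical record: an image of words plus manually entered metadata."""
    name: str
    metadata: dict = field(default_factory=dict)
    # Words visible in the image. A metadata-only search never reads this field.
    pictured_words: str = ""


def metadata_search(documents, query):
    """Return the names of documents whose metadata values contain the query."""
    q = query.lower()
    return [
        doc.name
        for doc in documents
        if any(q in str(value).lower() for value in doc.metadata.values())
    ]


# Illustrative data only (assumed, not from the disclosure).
docs = [
    ImageTextDocument(
        "fax_001.tif",
        {"title": "Invoice", "key_person": "J. Smith"},
        "Invoice for antenna replacement at the Denver site",
    ),
    ImageTextDocument(
        "form_017.tif",
        {"title": "Service request"},
        "Customer reports antenna misalignment",
    ),
]

print(metadata_search(docs, "invoice"))  # matches the title of fax_001.tif
print(metadata_search(docs, "antenna"))  # no match: the word appears only in the images
```

In this sketch both documents picture the word "antenna", yet a query for it returns nothing because the word was never entered as metadata, which is the kind of overlooked information the background describes.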