This disclosure relates generally to machine searches of a set of documents. More particularly, it relates to an improved search system and method for holistically indexing text and visual information in a set of documents.
There is a plethora of information available on the Internet, and in private data networks, covering every subject imaginable. It is a challenge to find the desired information within the bounty of available information. To that end, there are many search engines which provide search results to the user on a requested topic. Most search engines are text and keyword based, and retrieve search results from previously indexed information. Text-based search engines index and search only the textual content of a document, ignoring visual elements such as images. Image-based search engines can search images based on the metadata associated with an image or on visual patterns in the images, but during indexing and searching they typically ignore the surrounding page in which the images are found.
There remains a need for improved search and indexing mechanisms to locate information on the Internet and in other document-based databases.