1. Field of the Invention
The present invention is related to data mining and, in particular, to retrieval of information stored or located on remotely connected computers, e.g., over the Internet or the world-wide-web.
2. Background Description
The world-wide-web (web) includes a large number of publicly available images that graphically convey numerical information. These images may include charts, graphs, and diagrams that collectively encompass an enormous amount of information. Typical state-of-the-art search engines (e.g., Alta Vista) build web page indexes and can distinguish embedded images (e.g., files with an extension of .gif or .tif) from text. These search engines may further distinguish between photo images and graphically generated images, but they do not analyze the contents of the images themselves, nor do they index the information contained in these images. Any indexing provided by existing state-of-the-art search engines is text based, relying only on text included in each particular web page and on any associated image file name. So, information embodied in the images is not readily searchable by users.
The original raw numerical data used to create an image such as a chart is not always available. Often, the chart or other type of numerical based image is the only available record of the data contained therein. Even if search engines could search chart images, prior art search engines would still be of no avail for retrieving or otherwise reproducing the raw data for a particular numerical based image.
Further, even if some raw data is available, not all charted data is available through the web in tabular format. Whatever tabular data is available is difficult to identify and to compare with other charted data, i.e., data that is in image format only. In addition, since a particular chart may be described by a few simple numbers (e.g., two points describe a straight line), extracting data from an image and converting the extracted data to tabular format could considerably reduce the file size, which could in turn save storage space and conserve transmission bandwidth for information that might otherwise be available only in an image file.
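By way of illustration only, the following Python sketch compares the storage needed for a straight-line chart held as a raster image with the storage needed for the two points that describe the line. The image dimensions (640 by 480 pixels at 8 bits per pixel), the sample points, and the function names are assumptions chosen for the example and do not come from the description above. The raster figure is for an uncompressed bitmap; an actual .gif or .tif file would typically be smaller, so the resulting ratio should be read as an upper bound rather than a measured result.

```python
import struct


def raster_size_bytes(width, height, bits_per_pixel=8):
    """Size of an uncompressed raster image (hypothetical dimensions)."""
    return width * height * bits_per_pixel // 8


def tabular_size_bytes(points):
    """Size of (x, y) points stored as little-endian 32-bit floats."""
    flat = [coord for point in points for coord in point]
    return len(struct.pack(f"<{len(flat)}f", *flat))


def interpolate(points, x):
    """Recover y at x from the two points that describe a straight line."""
    (x0, y0), (x1, y1) = points
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


# Two points are enough to describe a straight-line chart.
line = [(0.0, 1.0), (10.0, 21.0)]

raster = raster_size_bytes(640, 480)  # assumed image size
table = tabular_size_bytes(line)      # four 32-bit floats

print(f"raster: {raster} bytes, tabular: {table} bytes")
print(f"y at x=5: {interpolate(line, 5.0)}")
```

Under these assumptions the tabular form occupies 16 bytes, and any intermediate value on the line can still be recovered from the two stored points.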
Thus, there is a need for locating available data that has previously been embedded in charts. There is a further need to extract such data from charts and to reformat the extracted data into tabular format for subsequent manipulation and use. Accordingly, there is a clear need for a chart indexing method for quick identification and retrieval, and for a system that responds to users' requests by providing charts that display various relationships or by providing the corresponding raw data extracted from web-based charts. More particularly, there is a need for an image search engine or for an image search capability in web search engines.