A. Field of the Invention
Systems and methods described herein relate generally to information retrieval and, more particularly, to automated techniques for classifying documents.
B. Description of Related Art
The World Wide Web (“web”) contains a vast amount of information. Search engines assist users in locating desired portions of this information by cataloging web pages. Typically, in response to a user's request, the search engine returns references to documents relevant to the request.
One type of search engine is an image search engine. An image search engine, such as a web-based image search engine, catalogs images from the web. Typically, the image search engine associates text, such as text that occurs near a particular image, with the image. The text associated with an image may then be searched using conventional keyword-based search queries to locate images relevant to the search query.
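The association of nearby text with an image, followed by a keyword search over that text, can be illustrated with a minimal sketch. The sketch is not the method of any particular search engine. It assumes a deliberately simple, hypothetical notion of "near": an image's alt text plus any text that follows the image until the next image. The names NearbyTextIndexer, build_index, and search are introduced here for illustration only.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


def _words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NearbyTextIndexer(HTMLParser):
    """Collects, for each <img>, its alt text and the text that follows it.

    Assumption: text counts as "near" an image until the next <img> starts.
    """

    def __init__(self):
        super().__init__()
        self.image_text = defaultdict(list)  # image src -> list of words
        self._current = None                 # src of the most recent image

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            self._current = attr_map.get("src")
            if self._current:
                self.image_text[self._current].extend(
                    _words(attr_map.get("alt") or ""))

    def handle_data(self, data):
        if self._current:
            self.image_text[self._current].extend(_words(data))


def build_index(html):
    """Return a dict mapping each image src to its associated words."""
    parser = NearbyTextIndexer()
    parser.feed(html)
    parser.close()
    return dict(parser.image_text)


def search(index, query):
    """Return the images whose associated text contains every query term."""
    terms = _words(query)
    if not terms:
        return []
    return [src for src, words in index.items()
            if all(term in words for term in terms)]
```

For example, after building an index from a page with two captioned images, search(index, "picnic") would return only the images whose alt text or following text contains the word "picnic".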
Some documents contain images arranged in a format known as an “image gallery.” Image galleries include multiple images arranged in some uniform manner. For example, a web-based hypertext markup language (HTML) document describing a neighborhood picnic may contain nine images of the picnic arranged in a three-by-three table. Each image may be accompanied by a description of the image (e.g., a description of the people in the image) located visually near the image.
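The three-by-three picnic example can be made concrete with a short sketch that generates such a table and counts the images in each row. The sketch only illustrates the uniform layout described above; it is not a gallery-detection technique. The file names (picnic1.jpg and so on), the description text, and the helper names make_gallery_html and images_per_row are hypothetical.

```python
from html.parser import HTMLParser


def make_gallery_html(rows, cols):
    """Build an HTML table with one image and one description per cell."""
    table_rows = []
    n = 0
    for _ in range(rows):
        cells = []
        for _ in range(cols):
            n += 1
            cells.append(
                f'<td><img src="picnic{n}.jpg"><br>'
                f'Description of image {n}</td>')
        table_rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(table_rows) + "</table>"


class TableImageCounter(HTMLParser):
    """Counts the <img> tags that appear inside each <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # number of images found in each table row
        self._in_row = False  # True while the parser is inside a <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append(0)
            self._in_row = True
        elif tag == "img" and self._in_row:
            self.rows[-1] += 1

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False


def images_per_row(html):
    """Return a list holding the image count of each table row."""
    counter = TableImageCounter()
    counter.feed(html)
    counter.close()
    return counter.rows
```

Calling images_per_row(make_gallery_html(3, 3)) yields one count per row, each equal to three, which reflects the uniform arrangement that characterizes the example gallery.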
Image search engines may consider images belonging to image galleries to be of different quality than other images, and may thus treat them differently when returning results to users. Accordingly, it can be important for an image search engine to be able to recognize when an image is part of an image gallery.