Traditional search engines, such as Web search engines, typically work by storing information about web pages retrieved by web crawlers or spiders and analyzing the content of those pages to determine how each page should be indexed. For example, words can be extracted from the titles, content, headings, or meta tags of the pages. Data about the web pages are then stored in an index database for use in later queries. A query can be a single word.
When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text or an image. The index is built from the information stored with the data and the method by which the information is indexed. Some search engines allow users to search by date or to search before, after or during certain periods of time, based on dates found in the web pages or when the web pages were indexed.
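The indexing and lookup steps described above can be illustrated with a minimal sketch. The page records, their field names, the `tokenize`, `build_index`, and `search` helpers, and the 40-character summary length are all assumptions made for illustration; they are not taken from any particular search engine.

```python
from collections import defaultdict
import re

# Hypothetical crawled pages; the field names are assumptions for illustration.
PAGES = [
    {"url": "http://example.com/a", "title": "Starfish facts",
     "body": "The orange starfish lives on coral reefs."},
    {"url": "http://example.com/b", "title": "Reef travel guide",
     "body": "Plan a trip to see coral and tropical fish."},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word found in a title or body to the ids of pages containing it."""
    index = defaultdict(set)
    for page_id, page in enumerate(pages):
        for word in tokenize(page["title"] + " " + page["body"]):
            index[word].add(page_id)
    return index

def search(index, pages, query_word):
    """Return (title, short summary) pairs for pages containing the query word."""
    hits = sorted(index.get(query_word.lower(), set()))
    return [(pages[i]["title"], pages[i]["body"][:40]) for i in hits]

INDEX = build_index(PAGES)
RESULTS = search(INDEX, PAGES, "coral")
```

In this sketch the index is built once, ahead of time, and each single-word query is answered by a lookup rather than by rescanning the pages.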
Most search engines support the use of Boolean operators to further specify the search query. Boolean operators are for literal searches that allow the user to refine and extend the terms of the search; the engine looks for the words or phrases exactly as entered. Some search engines support proximity search, which allows users to define the distance between keywords, and concept-based searching, which involves statistical analysis of web pages containing the searched-for words or phrases. Natural language queries allow the user to type a question in the same form the user might use to ask a human. The search engine then parses the question to identify key words and may construct its own Boolean query based on the key words and their association with one another.
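As a rough illustration of Boolean and proximity searching, the following sketch treats each Boolean operator as a set operation over matching documents and measures proximity as the number of word positions between two keywords. The sample documents and the `docs_with` and `within` helpers are hypothetical, and real engines may define distance differently.

```python
import re

# Hypothetical documents keyed by id, for illustration only.
DOCS = {
    1: "orange starfish on a coral reef",
    2: "orange juice recipe",
    3: "a starfish that is bright orange",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def docs_with(word):
    """Ids of documents that contain the word exactly as entered."""
    return {doc_id for doc_id, text in DOCS.items() if word in tokenize(text)}

# Boolean operators map onto set operations over the matching documents.
both = docs_with("orange") & docs_with("starfish")    # orange AND starfish
either = docs_with("juice") | docs_with("coral")      # juice OR coral
excluded = docs_with("orange") - docs_with("juice")   # orange NOT juice

def within(word_a, word_b, max_distance):
    """Proximity search: documents where the words are at most max_distance positions apart."""
    matches = set()
    for doc_id, text in DOCS.items():
        tokens = tokenize(text)
        pos_a = [i for i, t in enumerate(tokens) if t == word_a]
        pos_b = [i for i, t in enumerate(tokens) if t == word_b]
        if any(abs(a - b) <= max_distance for a in pos_a for b in pos_b):
            matches.add(doc_id)
    return matches
```

Here `within("orange", "starfish", 1)` matches only the document in which the two words are adjacent, while a larger distance also admits the document in which they are four positions apart.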
Since the usefulness of a search engine depends on the relevance of the results it generates, most search engines employ various methods for determining which web pages that match the search terms are more relevant, popular, or authoritative than others, and rank those pages first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another, but two main systems have emerged. One employs predefined and hierarchically ordered keywords that humans have programmed extensively, while the other generates an inverted index by analyzing texts it locates.
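Because ranking methods vary widely, the following is only one possible sketch of how matching pages might be ordered. It assumes a score that adds the number of query-term occurrences (a stand-in for relevance) to a weighted count of inbound links (a stand-in for popularity or authority). The page data, the `inbound_links` field, and the `link_weight` parameter are hypothetical.

```python
import re

# Hypothetical pages; "inbound_links" stands in for a popularity or authority signal.
PAGES = [
    {"title": "Starfish care", "text": "starfish starfish tank care", "inbound_links": 2},
    {"title": "Ocean life", "text": "starfish and other ocean life", "inbound_links": 10},
    {"title": "Tank setup", "text": "how to set up a tank", "inbound_links": 50},
]

def score(page, query_terms, link_weight=0.1):
    """Combine term occurrences with a weighted link count (an assumed formula)."""
    tokens = re.findall(r"[a-z0-9]+", page["text"].lower())
    term_hits = sum(tokens.count(term) for term in query_terms)
    if term_hits == 0:
        return 0.0  # pages that do not match the query are never ranked
    return term_hits + link_weight * page["inbound_links"]

def rank(pages, query):
    """Return titles of matching pages, highest score first."""
    terms = query.lower().split()
    scored = [(score(page, terms), page["title"]) for page in pages]
    matching = [(s, title) for s, title in scored if s > 0]
    ordered = sorted(matching, key=lambda pair: pair[0], reverse=True)
    return [title for _, title in ordered]
```

With this data, a query for "tank" places the heavily linked page ahead of the page that merely mentions the word, which shows how a popularity signal can change the order of pages that match equally well on text alone.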
A text-based search engine cannot, however, search for text that is not there, so it cannot index and search photos and videos unless those photos or videos are also associated with words. A number of color-based search engines have been developed to help improve image searching. For example, Picitup's PicColor product (www.picitup.com) is a visual search engine that allows images to be searched for specific colors selected from a color palette. Picitup's Multicolor Flickr search allows users to search for the combination of a single color and a limited number of shapes, but not words. For example, a user could select the color “orange” and a “star” shape and possibly get an image of an orangish, star-shaped starfish as a result. Shapes are identified through image recognition software. Multicolor by Idee Inc. (http://labs.tineye.com) is another visual search engine that allows a user to select multiple colors from a palette, such as both orange and yellow, but does not combine these color searches with text searching. Etsy, an online shop for handmade products, has a color search option that is combined with words in a limited manner. For example, a user could first select a color of interest, then enter a key word, and be returned numerous items that match the key word and include the selected color.
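The two-step search described last, in which a color is selected first and a key word is entered second, might be sketched as follows. This is not how any of the named products is implemented; the catalog, the palette values, the Euclidean RGB distance, and the `max_distance` threshold are all assumptions chosen for illustration.

```python
# Hypothetical catalog; each item carries associated words and one dominant RGB color.
ITEMS = [
    {"name": "Starfish pendant", "words": {"starfish", "pendant"}, "rgb": (240, 140, 30)},
    {"name": "Starfish print", "words": {"starfish", "print"}, "rgb": (20, 40, 220)},
    {"name": "Pumpkin mug", "words": {"pumpkin", "mug"}, "rgb": (235, 120, 20)},
]

# Assumed palette that the user picks from.
PALETTE = {"orange": (255, 128, 0), "blue": (0, 0, 255)}

def color_distance(rgb_a, rgb_b):
    """Euclidean distance between two RGB triples (one assumed similarity measure)."""
    return sum((a - b) ** 2 for a, b in zip(rgb_a, rgb_b)) ** 0.5

def search(color_name, keyword, max_distance=80.0):
    """Keep items close to the chosen palette color, then filter them by key word."""
    target = PALETTE[color_name]
    near_color = [item for item in ITEMS
                  if color_distance(item["rgb"], target) <= max_distance]
    return [item["name"] for item in near_color if keyword.lower() in item["words"]]
```

Note that the key word still has to match a word associated with the item, so an image with no associated words remains unreachable by the text half of this search.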
Other content search methods, such as the search engines used to select movies, music, books, etc., on various websites, rely on preloaded information about the content, such as action, thriller, non-fiction, jazz, etc. If a user enters the word “action” on a movie website, the user gets a listing of movies considered to be “action” movies. Such content is usually categorized within a narrow range of choices, and users can only search for content that falls within those specific categories. For example, if a user wants to search for something like action, but not exactly action, the user has to pick one of the other category tags, such as comedy, thriller, etc.; there is no option for picking a range around the idea of action because words are too precise for an unspecific search. Likewise, some content, such as movies, may be rated according to whether people liked it, with a well-liked movie getting five stars, or some similar rating, and a disliked movie getting one star or no stars. Users can then search for movies that have more than a certain number of stars, for example excluding movies with fewer than three stars. In such cases, the stars have nothing to do with the content of the movie itself, but rather reflect only people's opinions of that movie.
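The limitation of fixed category tags and star ratings can be seen in a short sketch. The movie records and the `by_category` and `at_least_stars` helpers are hypothetical and stand in for whatever preloaded information a given website might hold.

```python
# Hypothetical movie records with one preloaded category tag and a star rating each.
MOVIES = [
    {"title": "Movie A", "category": "action", "stars": 5},
    {"title": "Movie B", "category": "thriller", "stars": 2},
    {"title": "Movie C", "category": "action", "stars": 1},
    {"title": "Movie D", "category": "comedy", "stars": 4},
]

def by_category(movies, category):
    """Exact tag match: a movie either carries the tag or it does not."""
    return [m["title"] for m in movies if m["category"] == category]

def at_least_stars(movies, minimum):
    """Filter on viewer opinion only; the rating says nothing about the content."""
    return [m["title"] for m in movies if m["stars"] >= minimum]
```

A request such as `by_category(MOVIES, "almost action")` returns nothing, because the exact match leaves no way to express a range around a category, and `at_least_stars(MOVIES, 3)` selects movies across unrelated categories, because the stars reflect opinion rather than content.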