Data presented to consumers of electronic information is often provided in pure “list” form, that is, as a one-dimensional listing returned in response to a query. Although much effort goes into determining the contents of the list, the ordering of the results, and even the visual presentation of individual items, the consumer must still have some knowledge of the subject matter being searched for the results to be meaningful. Attempts to provide general classifications (for example, early implementations of search engines such as Yahoo!) often become outdated, overly burdensome or, even worse, irrelevant.
In an attempt to help consumers direct their searches, many websites (typically those selling electronics, automobiles, books, and the like) categorize their products and associate each product with one or more of these categories. As a result, the data is semi-structured: certain data elements are common to all of the products, and the values of these elements can be used to classify the products and to select subsets of them. One example can be seen on many consumer-electronics websites that sell computers. It is common to classify computers by form factor (notebook or desktop), by price (e.g., under $1,000, between $1,000 and $2,000, and over $2,000), and by screen size, processing power, weight and/or projected use (business, personal, gaming, graphic design). Each of these classification criteria is referred to as a “facet” or “dimension,” and it allows the consumer to narrow a search on known data elements before the results of a search query are presented.
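As a purely illustrative sketch, not the method described in this document, the following Python shows how facet values partition semi-structured product data and how selecting a facet value narrows the result set. The catalog, the `FACETS` mapping, and the helpers `price_band`, `facet_counts` and `narrow` are all hypothetical, and treating exactly $2,000 as part of the middle price band is an assumption.

```python
from collections import Counter

# Hypothetical catalog: every product shares the same data elements
# ("form_factor", "price", "use"), which is what makes the data semi-structured.
PRODUCTS = [
    {"name": "A", "form_factor": "notebook", "price": 850, "use": "personal"},
    {"name": "B", "form_factor": "notebook", "price": 1450, "use": "business"},
    {"name": "C", "form_factor": "desktop", "price": 2300, "use": "gaming"},
    {"name": "D", "form_factor": "desktop", "price": 1100, "use": "graphic design"},
]


def price_band(price):
    """Bucket a price into three bands (assumption: $2,000 falls in the middle band)."""
    if price < 1000:
        return "<$1000"
    if price <= 2000:
        return "$1000-$2000"
    return ">$2000"


# Each hypothetical facet maps a product to exactly one facet value.
FACETS = {
    "form_factor": lambda p: p["form_factor"],
    "price": lambda p: price_band(p["price"]),
    "use": lambda p: p["use"],
}


def facet_counts(products):
    """Count how many products fall under each value of each facet."""
    return {name: Counter(fn(p) for p in products) for name, fn in FACETS.items()}


def narrow(products, **selected):
    """Keep only the products whose facet values match every selection."""
    return [
        p for p in products
        if all(FACETS[facet](p) == value for facet, value in selected.items())
    ]


if __name__ == "__main__":
    notebooks = narrow(PRODUCTS, form_factor="notebook")
    print([p["name"] for p in notebooks])
    print(facet_counts(notebooks)["price"])
```

In this sketch the facets are fixed by hand in `FACETS`, which mirrors the manual, up-front facet definition that the following paragraph identifies as a drawback.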
While facet-based searching provides a significant improvement over conventional query/result methods, it is not without drawbacks. In particular, current techniques for implementing facet-based search require a significant amount of work to define the facets well before a website is deployed. Likewise, it is difficult to change the facets as the underlying data and queries evolve without disrupting or modifying the functionality of the search application that operates on the data.
What is needed, therefore, is a method and supporting systems for analyzing data and automatically determining data facets for use as search categories.