A number of techniques have been previously discussed for searching and analyzing data sets stored in an information processing system. The reader is assumed to have knowledge of SQL and other standard database and analysis packages as well as knowledge of search techniques and methods commonly used on the Internet.
Prior Patents and Publications
The following patents and publications may be related to the invention or provide background information. Their listing here should not be taken to indicate that a formal search has been completed or that any of them constitutes prior art.
U.S. Pat. No. 5,924,090 discusses a method and search apparatus for searching a database of records that organizes the results of the search into a set of most relevant categories. In response to a search instruction from the user, the search apparatus searches the database, which can include Internet records and other records, to generate a search result list corresponding to a selected set of the records. The search apparatus processes the search result list to dynamically create a set of search result categories, with each category associated with a subset of the records in the result list having common characteristics. Categories can be displayed as a plurality of folders. Each record within the database is classified according to various meta-data attributes (e.g., subject, type, source, and language characteristics). Substantially all of the records are automatically classified by a classification system into the proper categories. The classification system automatically determines the various meta-data attributes when such attributes are not available from the source. The technique discussed is directed to using category analysis to further narrow a set of returned documents. The technique is provided for public use at www.northernlight.com. For example, the search for “pamela anderson lee” at that site returns 19,043 total items and indicates 14 category folders. Each of these folders, when selected, displays a set of documents and a set of subfolders. For example, selecting the “Actors & Actresses” folder returns 2,587 documents and indicates 13 additional subfolders for those documents. The subfolder “Lee, Tommy” returns 151 items and indicates 10 further subfolders. The sub-subfolder “rockcool.com” returns three total items. Neither the patent nor the website discusses or provides techniques for expressing search strategies using category analysis, or methods that allow other types of analysis to be performed on returned documents.
The references also do not discuss expressing returned documents from an analysis as anything other than category folders and associated documents.
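By way of illustration only, the general idea of organizing a result list into category folders by a meta-data attribute, and then into subfolders by a second attribute, can be sketched as follows. This is a minimal sketch and is not the method of U.S. Pat. No. 5,924,090; the record fields, the sample data, and the function name group_into_folders are assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical search results; field names and values are illustrative assumptions.
RESULTS = [
    {"title": "Doc A", "subject": "Actors & Actresses", "source": "example.com"},
    {"title": "Doc B", "subject": "Actors & Actresses", "source": "example.org"},
    {"title": "Doc C", "subject": "Music", "source": "example.com"},
]


def group_into_folders(records, attribute):
    """Group records into category folders keyed by one meta-data attribute."""
    folders = defaultdict(list)
    for record in records:
        # Records lacking the attribute are placed in an "Uncategorized" folder.
        folders[record.get(attribute, "Uncategorized")].append(record)
    return dict(folders)


# Top-level folders by subject, then subfolders of one folder by source.
folders = group_into_folders(RESULTS, "subject")
subfolders = group_into_folders(folders["Actors & Actresses"], "source")

for name, docs in folders.items():
    print(f"{name}: {len(docs)} document(s)")
```

Selecting a folder and regrouping its documents by another attribute corresponds to the narrowing behavior described above, in which each selected folder displays its documents and a further set of subfolders.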
At the present time, querying and data mining/data analysis are generally considered two different fields with two different audiences. Querying is widely used by both data handling professionals and general computer users. Many websites (such as AltaVista or eBay) provide query capability to all users, allowing them to specify document subsets using operators such as AND, OR, NOT or similar expressions. However, users increasingly receive an overwhelming list of results with no additional guidance as to how to further understand, prioritize or explore those results.
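By way of illustration only, specifying document subsets with AND, OR and NOT can be modeled as set operations over an index that maps each term to the documents containing it. This is a minimal sketch under stated assumptions and does not describe how any particular website implements its query operators; the sample documents and the names build_index and term are hypothetical.

```python
# Hypothetical document collection; IDs and text are illustrative assumptions.
DOCUMENTS = {
    1: "vintage camera for sale",
    2: "digital camera review",
    3: "vintage guitar for sale",
}


def build_index(documents):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCUMENTS)


def term(word):
    """Return the set of document IDs containing the word (empty if none)."""
    return INDEX.get(word.lower(), set())


and_hits = term("vintage") & term("camera")   # vintage AND camera
or_hits = term("camera") | term("guitar")     # camera OR guitar
not_hits = term("camera") - term("digital")   # camera AND NOT digital

print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```

Each operator returns only a flat set of matching documents, which reflects the limitation noted above: the result carries no indication of how the matches might be prioritized or explored further.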
Data analysis, in contrast, is generally considered the province of data handling professionals. Analysis packages often require specialized training and use commands and syntax that are specific to a particular package. Data mining/data analysis is generally considered a separate and specialized function apart from accessing data using queries.