It is now widely recognized that the main challenge to information storage and retrieval is not availability of information, but filtering that information. Modern search engines that rely on keyword searching have advanced that science to a high art, but searches performed with such search engines often yield an intractably large number of records. For example, a recent search for diamonds on eBay™ yielded more than 6,000 records, and the same search on Yahoo!™ Shopping yielded more than 570,000. The problem is even worse on general search engines that are not specifically focused on marketplace items. A recent search on Google™ for diamonds yielded about 24,300,000 records.
Not only are there too many records in a typical results set, but the records are so inconsistent in content and terminology that it is impossible to filter them correctly. To continue with the diamond example, it is extremely difficult to search for a diamond in the $1,000 to $2,000 price range, because there is no convenient way to match a record that lists a diamond for $1,499.85. The current answer to that problem is to tag the data with metatags, special codes that identify particular items of data. Using XML tags, for example, it is possible for a search engine to identify a number as a price, and then store the price in an indexed field. In that manner the search engine could find records with information that matches a range of prices.
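By way of a hypothetical illustration only, the following Python sketch shows how a tagged price might be matched against a price range. The tag names (`record`, `title`, `price`), the sample records, and the function name `filter_by_price` are assumptions made for this example and are not drawn from any particular search engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged records; the tag names and values are assumptions.
SAMPLE = """
<records>
  <record><title>Round diamond</title><price>1499.85</price></record>
  <record><title>Princess diamond</title><price>2750.00</price></record>
  <record><title>Diamond saw blade</title><price>39.99</price></record>
</records>
"""


def filter_by_price(xml_text, low, high):
    """Return titles of records whose tagged price lies in [low, high]."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for record in root.iter("record"):
        price_text = record.findtext("price")
        if price_text is None:
            continue  # a record without a price tag cannot be matched by range
        if low <= float(price_text) <= high:
            matches.append(record.findtext("title"))
    return matches


print(filter_by_price(SAMPLE, 1000, 2000))  # ['Round diamond']
```

The sketch works only because every sample record uses the same `price` tag; a record that expressed its price in free text would be skipped.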
Tagging works reasonably well for the parameter (also known as a characteristic or attribute) of price, but only because price is common among a great many marketplace items. The system loses much of its effectiveness as soon as one begins to focus on parameters that are specific to different types of goods and services. In the diamond example, a searcher might well be interested in only those diamonds having a size of at least 1 carat, clarity of at least VVS2, color of at least E, and so forth. The only way metatags could be used effectively to filter for only the desired diamonds would be for the data to have been stored using consistent metatags, and that simply is not done. Thus, despite the prevalence and enormous power of modern search engines, they are still of very limited usefulness in conducting searches such as the diamond example above. To search for that limited selection of diamonds on eBay™ one would have to actually view each and every one of the more than 6,000 records.
As a result of the inability of search engines to adequately narrow searches based upon multiple different parameters, there are still many millions of databases used for specific classes of products. For example, there are automobile databases that store item information using fields for one or more of make, model, year, mileage, and price. Similarly, there are boat databases that store boat information in fields for make, model, year, condition, and price, and also length, displacement, number of sails, number and size of engines, number of cabins, and so forth.
Unfortunately, there are still very significant problems with those specialty databases. For one thing, the sheer number of specialty databases means that the data is distributed, forcing a searcher to examine the data from many different databases for even a single type of product. Thus, a searcher conducting a thorough search for a used car is forced to examine hundreds or even thousands of automobile databases. There are consolidator services that collect data from many different databases, but disparity in the underlying data forces them to present the data in formats that cannot be properly filtered, and that are still incredibly time-consuming to utilize.
A second problem with specialty databases is that they exist only for a relatively small number of products and services. One would be hard-pressed to find anything even close to a comprehensive flashlight database, or a comprehensive ball-bearing database. The closest that one finds in such fields are vendor listings that show only the particular products they have to sell.
A third problem with specialty databases is that they tend to parameterize the data using only a very limited number of parameters. For example, the automobile databases typically do not parameterize color or condition. A searcher wanting to view only red automobiles in at least very good condition needs to view the memo text, and sometimes the images, of every single record to find desired automobiles.
What is needed is a universal database that parameterizes data for all different types of goods and services. But the very fact that different types of items require different sets of parameters makes it extremely difficult to store multiple different types of items in a single database. Instead of a table with 5 or 6 columns that might be needed for a single type of item, a simple flat table adequately storing different types of items might well need thousands of columns. Still further, the cells of such a table would be mostly empty, since only a few of the cells in each row would be populated.
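The sparsity problem can be made concrete with a small sketch. The three item types and their parameter lists below are hypothetical, loosely following the automobile, boat, and diamond examples above, and the function name `flat_table_stats` is an assumption for illustration.

```python
# Hypothetical parameter sets for three item types (illustrative only).
ITEM_TYPES = {
    "automobile": ["make", "model", "year", "mileage", "price"],
    "boat": ["make", "model", "year", "condition", "price",
             "length", "displacement"],
    "diamond": ["carat", "clarity", "color", "price"],
}


def flat_table_stats(item_types):
    """Return (column_count, average_fill_fraction) for a single flat table
    that has one column for every distinct parameter of every item type."""
    columns = {p for params in item_types.values() for p in params}
    fills = [len(set(params)) / len(columns) for params in item_types.values()]
    return len(columns), sum(fills) / len(fills)


column_count, average_fill = flat_table_stats(ITEM_TYPES)
print(column_count)             # 11
print(round(average_fill, 2))   # 0.48
```

Even with only three item types, fewer than half of the cells in a typical row would be populated; with thousands of item types the populated fraction would fall much further.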
These problems were addressed in U.S. Pat. Nos. 6,035,294, 6,195,652, and 6,243,699, the disclosures of which, along with all other extrinsic materials cited herein, are incorporated herein by reference. In those three patents the focus was on a database that evolved by virtue of: (a) users being able to add their own parameters for a given type of item; and (b) the list of available parameters being shown to subsequent users in a list that was sorted by frequency of use. Frequently used parameters would eventually float to the top of the list, while infrequently used parameters would sink to the bottom of the list. It was still further contemplated that users could add their own values to a values listing, which would similarly be sorted by frequency of use, so that commonly used values would appear at the top of the list while infrequently used values would sink to the bottom.
By way of example, a user would list an automobile for sale by selecting 10 or 15 parameters from a list of possibly 30 or more automobile related parameters. Since the list of available parameters would be sorted by prior usage frequency, the 10 or 15 parameters that the user would most want to utilize would be those at the top of the list. Most likely, the user would thereby decide to describe his automobile using make, model, year, color, mileage, etc. He could select a parameter called exhaust system, or add such a parameter if it wasn't already in the list, but he would be dissuaded from doing so by a desire to conform to the prior usage patterns of others. With respect to values, the user would likely see that prior values for color are white, black, red, green, blue, etc. He might also see that off-white is a color that had been used by others, but he would likely be dissuaded from using off-white because that color is much farther down the list than white.
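The frequency ordering described above can be sketched as follows. This is a minimal illustration, not the implementation of the cited patents: the function name `rank_by_usage`, the sample color history, and the alphabetical tie-breaking rule are all assumptions.

```python
from collections import Counter


def rank_by_usage(prior_choices):
    """Order distinct entries by how often earlier users chose them,
    most frequent first; ties are broken alphabetically (an assumption)."""
    counts = Counter(prior_choices)
    return sorted(counts, key=lambda name: (-counts[name], name))


# Hypothetical history of color values entered by earlier users.
prior_colors = ["white", "black", "white", "red", "off-white",
                "white", "black", "red", "white"]

print(rank_by_usage(prior_colors))
# ['white', 'black', 'red', 'off-white']
```

Because off-white was chosen only once in this hypothetical history, it lands at the bottom of the list, which is the position that would tend to dissuade a later user from selecting it.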
Since there are many thousands of different types of products and services for which one could store records on the database, it was contemplated that the classification scheme should be hierarchical. The patents cited above contemplated a three level classification tree, including major class, minor class, and item description.
In terms of database structure, U.S. Pat. No. 6,243,699 cited above contemplated separate tables for users, classification, parameters, values, and items, with the items table having columns for classification pointers, parameter pointers, and value pointers. Assuming that each item could be adequately described with a relatively small maximum number of parameter/value pairs, the items table only needs about 2n+c columns, where n is the maximum number of parameter/value pairs, and c is a small number (perhaps 5) to identify classification, date, user pointer, and so forth. Assuming that each row consumed only about 256 bytes, one could store 100 million items in only about 25.6 gigabytes.
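The column count and storage estimate can be checked with a short calculation. In the sketch below, the bookkeeping column names and the choice of 15 parameter/value pairs are assumptions for illustration; only the 2n+c formula, the value of about 5 for c, the 256-byte row, and the 100 million items come from the description above.

```python
# Hypothetical names for the c bookkeeping columns (c = 5 here).
BOOKKEEPING = ("major_class", "minor_class", "item_description",
               "date", "user_pointer")


def items_table_columns(n, bookkeeping=BOOKKEEPING):
    """List the columns of an items table holding up to n parameter/value
    pairs: c bookkeeping columns plus one parameter pointer and one value
    pointer per pair, for 2n + c columns in total."""
    names = list(bookkeeping)
    for i in range(1, n + 1):
        names += [f"param_{i}", f"value_{i}"]
    return names


def storage_bytes(rows, bytes_per_row=256):
    """Total storage for the given number of fixed-size rows."""
    return rows * bytes_per_row


print(len(items_table_columns(15)))        # 35, i.e. 2*15 + 5
print(storage_bytes(100_000_000) / 1e9)    # 25.6 (gigabytes)
```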
It is also now appreciated that users might well want to identify and classify results sets in particular, and groupings of records in general, according to their own preferences. For example, with respect to groupings of records, a given user might search for a new home having a given set of parameter/value pairs (e.g. zip code=9262?, sq ft≧2500, no. bedrooms≧, lot size≧10,000 sq ft), and then use active links to view the actual properties. In doing so he develops a set of properties that he wants to pursue, and after doing all that work he may well want to find some way of storing (and characterizing) that record set for future use. Perhaps even more importantly, a given user might want to see how other users characterized the same or similar groupings of records. It is already known for users to store search strings (queries) for later use or reference. That is already done on Lexis-Nexis™, Micropatent™, and other search facilities. It is also already known for users to store results sets, either directly from the search engine or as modified. Micropatent, for example, stores sets of records in “worksheets.”
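One way to picture the home search example is as a list of parameter/value conditions applied to property records, with the resulting record set then stored under characterizations chosen by the user. The sketch below is hypothetical throughout: the property records, the characterization names (`status`, `priority`), and the functions `run_query` and `save_record_set` are assumptions, and only the square footage and lot size thresholds are taken from the example above.

```python
import operator

# Hypothetical property records (illustrative values only).
PROPERTIES = [
    {"id": 1, "sq ft": 2800, "lot size": 12000},
    {"id": 2, "sq ft": 2100, "lot size": 15000},
    {"id": 3, "sq ft": 3200, "lot size": 9000},
    {"id": 4, "sq ft": 2500, "lot size": 10000},
]

# A query is a list of (parameter, comparison, value) conditions.
QUERY = [("sq ft", operator.ge, 2500), ("lot size", operator.ge, 10000)]


def run_query(records, query):
    """Return the records that satisfy every condition in the query."""
    return [r for r in records
            if all(p in r and op(r[p], v) for p, op, v in query)]


def save_record_set(store, name, record_ids, characterizations):
    """Store a record set under a name, together with arbitrary
    user-chosen parameter/value characterizations."""
    store[name] = {"record_ids": list(record_ids),
                   "characterizations": dict(characterizations)}
    return store[name]


results = run_query(PROPERTIES, QUERY)
saved = {}
save_record_set(saved, "homes to pursue",
                [r["id"] for r in results],
                {"status": "shortlist", "priority": "high"})

print(saved["homes to pursue"]["record_ids"])  # [1, 4]
```

Because the characterizations are held as an open dictionary rather than a fixed set of fields, a user could add further parameter/value pairs without any change to the storage structure.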
It appears, however, that there is no teaching, suggestion, or motivation in the prior art for storing queries and record sets relating to a storage system in which users can add their own parameters. It also appears that the prior art is entirely silent as to enabling ordinary unsophisticated users (as opposed, for example, to database designers or programmers) to store and characterize their queries and/or record sets using multiple, open-ended characterizations. All the prior art systems that the current inventor knows of merely allow a user to associate a fixed set of information with a query or results set, including, for example, name of the query, matter number, run frequency, output format, and so forth. One item in particular that is missing from the prior art is identification of parent, child, sibling, or other relationships between or among different queries or different results sets.
U.S. Pat. No. 6,195,652 mentioned in passing that “some of the classifications may also deal with miscellaneous information, including scientific articles, historical facts, and so forth.” The implication was that informational items should be considered goods and services, and subject to users classifying such items with whatever parameters and values they wished. Unfortunately, there was no teaching, suggestion, or motivation for users to couple the content of the item to other items. Thus, it was contemplated that a user loading data might classify a book by title, author, subject, type of work (novel, fantasy, etc.), quality of work (compelling, silly, thoughtful, etc.), and length (180,000 words, or perhaps 459 pages) on a book-by-book basis. In that manner, a subsequent user could, for example, search for novels of less than 250 pages characterized as compelling. But there was no consideration given to other users classifying books (or other items) having records that were loaded by others, which could be accomplished by subsequent users characterizing their own data sets.
Thus, it is now contemplated that many of the ideas disclosed in those prior patents could be improved upon. For example, allowing users to add their own parameters to the database may be problematic because users could add all manner of stupid and inconsistent parameters. It is also contemplated that sorting parameters and values by frequency of prior occurrence could in many instances be cumbersome. In selecting a value for color, for example, a user might simply prefer to see an alphabetic list of colors that had been used by others. It is also contemplated that even a three level classification scheme is too complicated for many users, and that the data structure previously described, although very efficient for storing data, would be far too slow for retrieving data. It is still further contemplated that users can store their searches and record sets in an open-ended parameter/value manner. Thus, there is a need for further improvements.