It is now widely recognized that the main challenge to information storage and retrieval is not availability of information, but filtering that information. Modern search engines that rely on keyword searching have advanced that science to a high art, but searches performed with such search engines often yield an intractably large number of records. For example, a recent search for diamonds on the E-BAY™ search engine yielded more than 6,000 records, and the same search on the YAHOO!™ Shopping search engine yielded more than 570,000. The problem is even worse on general search engines that are not specifically focused on marketplace items. A recent search on the GOOGLE™ search engine for diamonds yielded about 24,300,000 records.
Not only are there too many records in a typical results set, but the fact that the records are so inconsistent in content and terminology means that it is impossible to filter them correctly. To continue with the diamond example, it is extremely difficult to search for a diamond in the $1,000 to $2,000 price range, because there is no convenient way to match a record that lists a diamond for $1,499.85. The current answer to that problem is to tag the data with metatags, special codes that identify particular items of data. Using XML tags, for example, it is possible for a search engine to identify a number as a price, and then store the price in an indexed field. In that manner the search engine could find records with information that matches a range of prices.
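By way of illustration only, the range matching described above can be sketched in Python. The listing data, the tag names (`item`, `title`, `price`), and the function name are hypothetical assumptions made for this sketch; they are not taken from any particular search engine.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical tagged listings; the tag names are assumptions for this sketch.
LISTINGS_XML = """
<listings>
  <item><title>Round diamond, 1.02 ct</title><price currency="USD">1499.85</price></item>
  <item><title>Princess diamond, 0.50 ct</title><price currency="USD">899.00</price></item>
  <item><title>Oval diamond, 1.50 ct</title><price currency="USD">2450.00</price></item>
</listings>
"""


def titles_in_price_range(xml_text, low, high):
    """Return the titles of items whose tagged price lies in [low, high]."""
    root = ET.fromstring(xml_text)
    matches = []
    for item in root.findall("item"):
        # The <price> tag is what lets the number be recognized as a price.
        price = Decimal(item.findtext("price"))
        if low <= price <= high:
            matches.append(item.findtext("title"))
    return matches


in_range = titles_in_price_range(LISTINGS_XML, Decimal("1000"), Decimal("2000"))
print(in_range)
```

Because the price is read from a tagged field rather than from free text, a record listing $1,499.85 can be matched against the $1,000 to $2,000 range by ordinary numeric comparison.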
Tagging works reasonably well for the parameter (i.e. characteristic) of price, but only because price is common among a great many marketplace items. The system loses much of its effectiveness as soon as one begins to focus on parameters that are specific to different types of goods and services. In the diamond example, a searcher might well be interested in only those diamonds having a size of at least 1 carat, clarity of at least VVS2, color of at least E, and so forth. The only way metatags could be used effectively to filter for only the desired diamonds would be for the data to have been stored using consistent metatags, and that just isn't done. Thus, despite the prevalence and enormous power of modern search engines, they are still of very limited usefulness in conducting searches such as the diamond example above. To search for that limited selection of diamonds on the E-BAY™ search engine, one would have to actually view each and every one of the more than 6,000 records.
As a result of the inability of search engines to adequately narrow searches based upon multiple different parameters, there are still many millions of databases used for specific classes of products. For example, there are automobile databases that store item information using fields for one or more of make, model, year, mileage, and price. Similarly, there are boat databases that store boat information in fields for make, model, year, condition, and price, and also length, displacement, number of sails, number and size of engines, number of cabins, and so forth.
Unfortunately, there are still very significant problems with those specialty databases. For one thing, the sheer number of specialty databases means that the data is distributed, forcing a searcher to examine the data from many different databases for even a single type of product. Thus, a searcher conducting a thorough search for a used car is forced to examine hundreds or even thousands of automobile databases. There are consolidator services that collect data from many different databases, but disparity in the underlying data forces them to present the data in formats that cannot be properly filtered, and are still incredibly time consuming to utilize.
A second problem with specialty databases is that they exist only for a relatively small number of products and services. One would be hard-pressed to find anything even close to a comprehensive flashlight database, or a comprehensive ball-bearing database. The closest that one finds in such fields are vendor listings that show only the particular products they have to sell.
A third problem with specialty databases is that they tend to parametize the data using only a very limited number of parameters. For example, the automobile databases typically do not parametize color or condition. A searcher wanting to view only red automobiles in at least very good condition needs to view the memo text, and sometimes the images, of every single record to find desired automobiles.
What is needed is a universal database that parametizes data for all different types of goods and services. But the very fact that different types of items require different sets of parameters makes it extremely difficult to store multiple different types of items in a single database. Instead of a table with 5 or 6 columns that might be needed for a single type of item, a simple flat table adequately storing different types of items might well need thousands of columns. Still further, the cells of such a table would be mostly empty, since only a few of the cells in each row would be populated.
These problems were addressed in U.S. Pat. Nos. 6,035,294, 6,195,652, and 6,243,699, the disclosures of which are incorporated herein by reference. In those three patents the focus was on a database that evolved by virtue of: (a) users being able to add their own parameters for a given type of item; and (b) the list of available parameters being shown to subsequent users in a list that was sorted by frequency of use. Frequently used parameters would eventually float to the top of the list, while infrequently used parameters would sink to the bottom of the list. It was still further contemplated that users could add their own values to a values listing, which would similarly be sorted by frequency of use, so that commonly used values would appear at the top of the list while infrequently used values would sink to the bottom.
By way of example, a user would list an automobile for sale by selecting 10 or 15 parameters from a list of possibly 30 or more automobile related parameters. Since the list of available parameters would be sorted by prior usage frequency, the 10 or 15 parameters that the user would most want to utilize would be those at the top of the list. Most likely, the user would thereby decide to describe his automobile using make, model, year, color, mileage, etc. He could select a parameter called exhaust system, or add such a parameter if it wasn't already in the list, but he would be dissuaded from doing so by a desire to conform to the prior usage patterns of others. With respect to values, the user would likely see that prior values for color are white, black, red, green, blue, etc. He might also see that off-white is a color that had been used by others, but he would likely be dissuaded from using off-white because that color is much farther down the list than white.
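The frequency-sorted listing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited patents: the sample listings, the function name `rank_by_frequency`, and the alphabetical tie-breaking rule are all hypothetical.

```python
from collections import Counter

# Hypothetical parameters chosen by three earlier users listing automobiles.
prior_parameter_choices = [
    ["make", "model", "year", "color"],
    ["make", "model", "year", "mileage"],
    ["make", "model", "color", "exhaust system"],
]

# Hypothetical values previously entered for the color parameter.
prior_color_values = ["white", "black", "white", "red", "white", "black", "off-white"]


def rank_by_frequency(entries):
    """Return distinct entries, most frequently used first.

    Ties are broken alphabetically (an assumption of this sketch) so that
    the ordering is deterministic.
    """
    counts = Counter(entries)
    return sorted(counts, key=lambda entry: (-counts[entry], entry))


# Flatten the per-listing parameter choices into one usage history.
all_parameters = [p for listing in prior_parameter_choices for p in listing]

ranked_parameters = rank_by_frequency(all_parameters)
ranked_colors = rank_by_frequency(prior_color_values)

print(ranked_parameters)
print(ranked_colors)
```

In this sketch, rarely used entries such as exhaust system and off-white fall toward the bottom of their respective lists, which is the behavior relied upon to steer later users toward common parameters and values.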
Since there are many thousands of different types of products and services for which one could store records on the database, it was contemplated that the classification scheme should be hierarchical. The patents cited above contemplated a three level classification tree, including major class, minor class, and item description.
In terms of database structure, U.S. Pat. No. 6,243,699, cited above, contemplated separate tables for users, classification, parameters, values, and items, with the items table having columns for classification pointers, parameter pointers, and value pointers. Assuming that each item could be adequately described with a relatively small maximum number of parameter/value pairs, the items table needs only about 2n+c columns, where n is the maximum number of parameter/value pairs, and c is a small number (perhaps 5) to identify classification, date, user pointer, and so forth. Assuming that each row consumed only about 256 bytes, one could store 100 million items in only about 25 gigabytes.
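The column count and storage estimate above can be checked with a short sketch. The symbols n and c follow the text. The function names, the choice of n = 15, and the use of `None` to pad unused pointer columns are assumptions made only for illustration.

```python
def items_table_columns(n, c=5):
    """Column count of the items table: two pointers per pair plus c bookkeeping columns."""
    return 2 * n + c


def storage_bytes(num_items, bytes_per_row=256):
    """Total bytes consumed when every row has the same fixed size."""
    return num_items * bytes_per_row


def make_item_row(header, pairs, n):
    """Build one fixed-width row from bookkeeping fields and pointer pairs.

    header: the c bookkeeping fields (classification, date, user pointer, ...).
    pairs:  up to n (parameter_pointer, value_pointer) tuples.
    Unused pair slots are padded with None (an assumption of this sketch).
    """
    if len(pairs) > n:
        raise ValueError("more parameter/value pairs than the table allows")
    row = list(header)
    for parameter_pointer, value_pointer in pairs:
        row.extend([parameter_pointer, value_pointer])
    row.extend([None] * (2 * (n - len(pairs))))
    return row


n = 15  # hypothetical maximum number of parameter/value pairs
header = [101, "2005-01-01", 42, 0, 0]  # hypothetical c = 5 bookkeeping fields
row = make_item_row(header, [(1, 7), (2, 9)], n)

print(items_table_columns(n))              # 2n + c columns
print(len(row))                            # row width matches the column count
print(storage_bytes(100_000_000) / 1e9)    # gigabytes for 100 million rows
```

At 256 bytes per row, 100 million rows come to 25.6 billion bytes, which is consistent with the figure of about 25 gigabytes given above.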
It is now contemplated that many of the ideas disclosed in those prior patents could be improved upon. For example, allowing users to add their own parameters to the database may be problematic because users could add all manner of stupid and inconsistent parameters. It is also contemplated that sorting parameters and values by frequency of prior occurrence could in many instances be cumbersome. In selecting a value for color, for example, a user might simply prefer to see an alphabetic list of colors that had been used by others. It is also contemplated that even a three level classification scheme is too complicated for many users, and that the data structure previously described, although very efficient for storing data, would be far too slow for retrieving data. Thus, there is a need for further improvements.