This invention relates to electronic databases in general, and more specifically to a method and apparatus for analyzing the content of a database for various qualities, such as comprehensibility, completeness and consistency, which bear on the usefulness of the database in comparison to other databases.
Searchable electronic catalogs are commonly used in support of electronic commerce and purchasing functions. These electronic catalogs can be created from printed catalogs, spreadsheets, text documents, databases or lists, and are typically rendered into databases, HTML page collections and other electronic forms. Individual purchaser or marketplace system installations frequently contain several catalogs from several sources. For example, an office supply installation may contain office supply catalogs from several different office supply vendors or manufacturers. Some of the catalogs may describe identical items, such as a blue pen, while each catalog will likely also describe similar but different items, such as different makes of blue pens. These catalogs may vary in their quality and usability, as measured by the ability of users to find and purchase items. An objective measurement of the qualities of each catalog allows one to compare catalogs and to identify catalog deficiencies quickly. With sufficient support, such analyses can also quickly localize the source of a deficiency.
Three critical aspects of catalog usage are purchasing, item identification and validation, and finding. Sufficient information must be present in the catalog to describe an item so that a user or prospective buyer can find it. A catalog supplier strives to present a catalog that maximizes the likelihood that items will be found, identified and then purchased. The information needed for a purchase may be only a part number, or it may include very detailed item descriptions with images and interactive applications. Because catalogs that support a greater amount of specific information generate greater sales, they are scored higher both in evaluating the catalog's usefulness and in evaluating the key attribute of how easily a purchaser can find the item that is sought.
In a preferred embodiment, the present invention provides a method for scoring a database for a quality, for example, completeness, consistency or comprehensibility. The method includes selecting the fields of the database that are to be analyzed, fetching values for each record of the database from those fields, and comparing the fetched values to a standard. Preferably, after the comparison, a score is assigned to each field based on the comparison. The fields are ranked in order of pertinence to the quality that is to be measured, and the score for each field is weighted based on that field's rank. Finally, the weighted scores are combined to obtain a score for the database.
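The scoring steps above can be illustrated with a minimal Python sketch. The record layout (a sequence of dictionaries keyed by field name), the hypothetical `points_for` comparison callback standing in for "the standard", and the linear rank-to-weight rule are all illustrative assumptions; the description does not prescribe a particular weighting. Because points are assigned for deficiencies, a higher combined score indicates more deficiencies under this sketch.

```python
from typing import Callable, Mapping, Sequence

Record = Mapping[str, object]


def score_database(
    records: Sequence[Record],
    ranked_fields: Sequence[str],
    points_for: Callable[[str, object], int],
) -> float:
    """Combine weighted per-field scores into one database score.

    ranked_fields lists the selected fields, most pertinent first.
    points_for(field, value) is a hypothetical comparison against the
    standard that returns the points assigned to one fetched value.
    """
    n = len(ranked_fields)
    total = 0.0
    for rank, field in enumerate(ranked_fields, start=1):
        # Fetch this field's value from every record and compare each
        # value to the standard, summing the points into a field score.
        field_score = sum(points_for(field, rec.get(field)) for rec in records)
        # Assumed linear weighting: rank 1 gets weight n, rank n gets weight 1.
        weight = n - rank + 1
        total += weight * field_score
    return total
```

For example, with two records and a callback that assigns one point per `None` value, a single null in the top-ranked field counts twice as much as a single null in the second-ranked field.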
Where the quality to be analyzed is completeness, the invention includes comparing the fetched values for a field to other fetched values for the same field. Assigning a score comprises assigning points for each null value, so that the score for a field corresponds to the number of null values across all records in that field.
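For completeness, the per-field score reduces to a null count. In the hypothetical sketch below, a value is treated as null when the field is missing from a record, is `None`, or is a blank string; that definition of null is an assumption, not something the description fixes.

```python
from typing import Mapping, Sequence


def is_null(value: object) -> bool:
    # Assumption: missing keys (returned as None), None itself, and
    # strings containing only whitespace all count as null values.
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness_scores(
    records: Sequence[Mapping[str, object]], fields: Sequence[str]
) -> dict[str, int]:
    """Return, for each selected field, the number of null values across all records."""
    scores: dict[str, int] = {}
    for field in fields:
        # One point per record whose value for this field is null.
        scores[field] = sum(1 for rec in records if is_null(rec.get(field)))
    return scores
```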
Where the quality to be analyzed is consistency, the invention includes comparing the fetched values for a field to a dictionary of possible values. Assigning a score comprises assigning points for each fetched value that does not match a dictionary value, so that the score for a field corresponds to the number of non-matching values across all records for that field.
Where the quality to be analyzed is comprehensibility, the present invention likewise includes comparing the fetched values for a field to a dictionary of possible values. Assigning a score comprises assigning points for each fetched value that does not match a dictionary value, so that the score for a field corresponds to the number of non-matching values across all records for that field.
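As described, consistency and comprehensibility share the same dictionary comparison and differ only in the dictionary supplied. A minimal sketch follows; the `dictionaries` mapping from field names to allowed values and the case-insensitive, whitespace-trimmed matching rule are assumptions for illustration. A missing value counts as non-matching here because it matches no dictionary entry.

```python
from typing import Iterable, Mapping, Sequence


def _normalize(value: object) -> object:
    # Assumption: matching ignores letter case and surrounding whitespace.
    return value.strip().casefold() if isinstance(value, str) else value


def dictionary_scores(
    records: Sequence[Mapping[str, object]],
    fields: Sequence[str],
    dictionaries: Mapping[str, Iterable[object]],
) -> dict[str, int]:
    """Return, for each field, the number of fetched values not found in its dictionary."""
    scores: dict[str, int] = {}
    for field in fields:
        allowed = {_normalize(v) for v in dictionaries[field]}
        # One point per record whose value for this field matches no dictionary value.
        scores[field] = sum(
            1 for rec in records if _normalize(rec.get(field)) not in allowed
        )
    return scores
```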