1. Field of the Invention
This invention relates to the field of computer software. More specifically, the invention relates to an improved method and apparatus for structuring, maintaining, and using families of data.
2. Background Art
Many companies use catalogs to convey information about the products they sell. The organization and layout of each published catalog is important because the catalog must quickly convey information to the purchaser about the products the company offers for sale. For instance, when publishing the contents of a catalog, product information should be organized into a more detailed arrangement than that provided by the categories of a typical classification scheme. A detailed arrangement groups items according to the category value and other criteria. For example, products in a certain category, such as paintbrushes, may also be grouped by manufacturer. These groupings are referred to as families. Generally speaking, a family can be defined as a group of records in a table that are related by one or more common fields having the same value. These families may also have additional fields of common information, such as images, logos, paragraphs of descriptive text, bullets of specifications, and other data. Families provide a way of identifying groupings by fixing one or more common fields and/or attribute values. Existing methods use data structures to store and retrieve these families of records. However, these methods present several problems in defining those structures. A brief description of some of the problems with arranging records in families follows.
For illustration purposes a brief example of a family will follow. Initially, the data to be illustrated in a catalog (or any other type of data in a database) is represented in a classification scheme called a taxonomy. The taxonomy provides for the partitioning of a table and its records into multiple categories, with or without a hierarchy, along with the assignment of attributes to each of a number of categories. In Table 1, a taxonomy is used where a table and its records are partitioned into categories, with or without a hierarchy, where each category comprises a set of common attributes. A category's attributes may not be physically part of a record but instead can be considered part of the definition of the record, where the record contains a reference to the category's attributes.
The examples that follow will be based on the taxonomy and data displayed in Tables 1-4:
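TABLE 1
| Category ID | Category | Parent ID | Position |
| --- | --- | --- | --- |
| 1 | Printers | 0 | 0 |
| 2 | Daisy Wheel Printers | 1 | 0 |
| 3 | Dot Matrix Printers | 1 | 1 |
| 4 | Inkjet Printers | 1 | 2 |
| 5 | Laser Printers | 1 | 3 |

TABLE 2
| Attribute ID | Attribute | Type |
| --- | --- | --- |
| 1 | Pages Per Minute (ppm) | Numeric |
| 2 | Color | Text |

TABLE 3
| Attribute ID | Feature ID | Feature |
| --- | --- | --- |
| 2 | 1 | Color |
| 2 | 2 | Black & White |

TABLE 4
| Category ID | Attribute ID |
| --- | --- |
| 1 | 1 |
| 1 | 2 |

The four tables above define the following taxonomy:

Printers (ppm, color)
    Daisy Wheel Printers
    Dot Matrix Printers
    Inkjet Printers
    Laser Printers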
The taxonomy provides an example of a category hierarchy with five categories, a root category (a node that has no parent), identified as “Printers”, and four remaining child (and leaf node) categories associated with the “Printers” category. The “Printers” category may have two attributes “ppm” and “color”.
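TABLE 5
PRINTERS:
| ID | Model | Manufacturer | Category ID | Description | Price |
| --- | --- | --- | --- | --- | --- |
| 1 | ALP1 | Acme | 5 | 8 pages per minute; black & white | $500 |
| 2 | AIJP1 | Acme | 4 | 3 pages per minute ink; black & white | $150 |
| 3 | ALP2 | Acme | 5 | 8 pages per minute; color | $4000 |
| 4 | ADMP1 | Acme | 3 | 3 pages per minute; black & white | $100 |
| 5 | BLP1 | Best | 5 | 20 pages per minute; color | $5000 |
| 6 | BLP2 | Best | 5 | 20 pages per minute; black & white | $1000 |
| 7 | BIJ1 | Best | 4 | 4 pages per minute; color | $250 |
| 8 | BDWP1 | Best | 2 | 2 pages per minute; black & white | $75 |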
The first table (Table 1), or category table, defines categories within the taxonomy. The category table includes a “Parent ID” field that may be used to define a hierarchy and, more particularly, a category's level within a category hierarchy. The “Position” field identifies a position within a hierarchical level for a given category. An attributes table (Table 2) defines attributes that may be included in a category. Table 3, a feature-values table, may be used to define enumerated values of an attribute of the attributes table. In the example, the feature-values table identifies two enumerated values for the “color” attribute. Table 4, a category-attribute table, identifies the attributes that are associated with a record of the category table. Inheritance may be used to allow child categories to inherit attributes that are associated with a parent category. The fifth table (Table 5) shows a list of data entries for printers. Each of the records in a uniform fields table (i.e., Table 5) references a category record in the category table (Table 1) that defines additional data elements (or attributes) of the referencing record. The families, in the examples, will be defined by the combination of manufacturer and category.
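As an illustration only, the taxonomy of Tables 1-4 and the inheritance of attributes from a parent category might be sketched as follows. The dictionary layout and the helper name `inherited_attributes` are assumptions made for this sketch, as is the reading of a Parent ID of 0 as "no parent"; none of them is prescribed by the text.

```python
# Tables 1-4 transcribed as plain Python data (layout is an assumption of this sketch).
# category_id -> (name, parent_id, position); parent_id 0 is read as "no parent".
CATEGORIES = {
    1: ("Printers", 0, 0),
    2: ("Daisy Wheel Printers", 1, 0),
    3: ("Dot Matrix Printers", 1, 1),
    4: ("Inkjet Printers", 1, 2),
    5: ("Laser Printers", 1, 3),
}
# attribute_id -> (name, type)
ATTRIBUTES = {1: ("Pages Per Minute (ppm)", "Numeric"), 2: ("Color", "Text")}
# attribute_id -> {feature_id: feature}
FEATURES = {2: {1: "Color", 2: "Black & White"}}
# (category_id, attribute_id) pairs
CATEGORY_ATTRIBUTES = [(1, 1), (1, 2)]


def inherited_attributes(category_id):
    """Hypothetical helper: attribute names of a category, including those of its ancestors."""
    names = []
    current = category_id
    while current != 0:  # walk up the Parent ID links until the root is passed
        for cat_id, attr_id in CATEGORY_ATTRIBUTES:
            if cat_id == current:
                names.append(ATTRIBUTES[attr_id][0])
        current = CATEGORIES[current][1]
    return names
```

Under these assumptions, a leaf category such as "Laser Printers" (ID 5) has no attributes of its own in Table 4 but picks up both "ppm" and "color" from the "Printers" root.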
Several solutions may be used to partition the data (e.g., in Table 5) into families. A brief description of some of these solutions and the problems associated with them follows.
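Before turning to the individual approaches, the family definition used in the examples (manufacturer plus category) can be made concrete with a short sketch. It only illustrates which records of Table 5 fall into which family; the function name `partition_into_families` and the tuple layout are assumptions of this sketch, not part of any of the approaches described below.

```python
from collections import defaultdict

# Category names from Table 1 (leaf categories only).
CATEGORY_NAMES = {
    2: "Daisy Wheel Printers",
    3: "Dot Matrix Printers",
    4: "Inkjet Printers",
    5: "Laser Printers",
}

# (ID, Model, Manufacturer, Category ID) columns of Table 5.
PRINTERS = [
    (1, "ALP1", "Acme", 5),
    (2, "AIJP1", "Acme", 4),
    (3, "ALP2", "Acme", 5),
    (4, "ADMP1", "Acme", 3),
    (5, "BLP1", "Best", 5),
    (6, "BLP2", "Best", 5),
    (7, "BIJ1", "Best", 4),
    (8, "BDWP1", "Best", 2),
]


def partition_into_families(records):
    """Group record IDs by the (manufacturer, category) pair that defines a family."""
    families = defaultdict(list)
    for rec_id, _model, manufacturer, category_id in records:
        key = (manufacturer, CATEGORY_NAMES[category_id])
        families[key].append(rec_id)
    return dict(families)
```

Applied to `PRINTERS`, this yields six families, for example records 1 and 3 under ("Acme", "Laser Printers").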
The “Table Per Family” Approach
The “table per family” approach partitions the records into families by storing the records of each family in its own table (e.g., Tables 6-11).
TABLE 6
| ID | Model | Manufacturer | Category ID | Description | Price |
| --- | --- | --- | --- | --- | --- |
| 1 | ALP1 | Acme | 5 | 8 pages per minute; black & white | $500 |
| 3 | ALP2 | Acme | 5 | 8 pages per minute; color | $4000 |

TABLE 7
| ID | Model | Manufacturer | Category ID | Description | Price |
| --- | --- | --- | --- | --- | --- |
| 2 | AIJP1 | Acme | 4 | 3 pages per minute ink; black & white | $150 |

TABLE 8
| ID | Model | Manufacturer | Category ID | Description | Price |
| --- | --- | --- | --- | --- | --- |
| 4 | ADMP1 | Acme | 3 | 3 pages per minute; black & white | $100 |

TABLE 9
| ID | Model | Manufacturer | Category ID | Description | Price |
| --- | --- | --- | --- | --- | --- |
| 5 | BLP1 | Best | 5 | 20 pages per minute; color | $5000 |
| 6 | BLP2 | Best | 5 | 20 pages per minute; black & white | $1000 |

TABLE 10
| ID | Model | Manufacturer | Category ID | Description | Price |
| --- | --- | --- | --- | --- | --- |
| 7 | BIJ1 | Best | 4 | 4 pages per minute; color | $250 |

TABLE 11
| ID | Model | Manufacturer | Category ID | Description | Price |
| --- | --- | --- | --- | --- | --- |
| 8 | BDWP1 | Best | 2 | 2 pages per minute; black & white | $75 |

This approach provides for efficient storage of the data. However, as the number of families increases, so does the number of tables. Data management and searching for records then become increasingly complex and time-consuming because additional tables must be accessed. Furthermore, changes to the family definition require complex restructuring of the tables and reorganization of the records contained within them. For example, if families were changed to be defined as the combination of the category and the color attribute, then six new tables (Laser/Color, Laser/B&W, Inkjet/Color, Inkjet/B&W, Dot Matrix/B&W, and Daisy Wheel/B&W) would need to be created and populated, and the old tables would have to be destroyed.

The “Table Lookup” Approach
The “table lookup” approach typically requires three steps. First, a table containing a record for each of the families must be created (e.g., Table 12). Second, a lookup field for the family must be added to the partitioning table. Third, the identifier (ID) of the proper family record, in the family table, must be placed into this field for each record of the partitioning table to create a relationship between each record and its corresponding family (e.g., Table 13).
TABLE 12
| Family ID | Description |
| --- | --- |
| 1 | Acme Laser Printers |
| 2 | Acme Inkjet Printers |
| 3 | Acme Dot Matrix Printers |
| 4 | Best Laser Printers |
| 5 | Best Inkjet Printers |
| 6 | Best Daisy Wheel Printers |
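TABLE 13
| ID | Model | Manufacturer | Category ID | Description | Price | Family ID |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ALP1 | Acme | 5 | 8 pages per minute; black & white | $500 | 1 |
| 2 | AIJP1 | Acme | 4 | 3 pages per minute ink; black & white | $150 | 2 |
| 3 | ALP2 | Acme | 5 | 8 pages per minute; color | $4000 | 1 |
| 4 | ADMP1 | Acme | 3 | 3 pages per minute; black & white | $100 | 3 |
| 5 | BLP1 | Best | 5 | 20 pages per minute; color | $5000 | 4 |
| 6 | BLP2 | Best | 5 | 20 pages per minute; black & white | $1000 | 4 |
| 7 | BIJ1 | Best | 4 | 4 pages per minute; color | $250 | 5 |
| 8 | BDWP1 | Best | 2 | 2 pages per minute; black & white | $75 | 6 |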
This approach has several major drawbacks. First, the manual process of assigning the family identifiers is time-consuming, error-prone and extremely tedious. Second, changes to the record do not result in the product being properly reassigned to the correct family. Third, changes to the families may require that some or all of the records of the family be reassigned.
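The second drawback can be illustrated with a minimal sketch of Tables 12 and 13. The dictionary layout and the helper `stale_records` are hypothetical, and the check assumes that each family description is simply the manufacturer followed by the category name, which happens to hold for Table 12 but is not stated as a rule in the text.

```python
# Table 12: family_id -> description
FAMILIES = {
    1: "Acme Laser Printers",
    2: "Acme Inkjet Printers",
    3: "Acme Dot Matrix Printers",
    4: "Best Laser Printers",
    5: "Best Inkjet Printers",
    6: "Best Daisy Wheel Printers",
}

# Table 13, with Category ID already resolved to its name for brevity.
RECORDS = {
    1: {"model": "ALP1", "manufacturer": "Acme", "category": "Laser Printers", "family_id": 1},
    2: {"model": "AIJP1", "manufacturer": "Acme", "category": "Inkjet Printers", "family_id": 2},
    3: {"model": "ALP2", "manufacturer": "Acme", "category": "Laser Printers", "family_id": 1},
    4: {"model": "ADMP1", "manufacturer": "Acme", "category": "Dot Matrix Printers", "family_id": 3},
    5: {"model": "BLP1", "manufacturer": "Best", "category": "Laser Printers", "family_id": 4},
    6: {"model": "BLP2", "manufacturer": "Best", "category": "Laser Printers", "family_id": 4},
    7: {"model": "BIJ1", "manufacturer": "Best", "category": "Inkjet Printers", "family_id": 5},
    8: {"model": "BDWP1", "manufacturer": "Best", "category": "Daisy Wheel Printers", "family_id": 6},
}


def stale_records(records, families):
    """Hypothetical check: IDs of records whose stored family no longer matches their fields."""
    stale = []
    for rec_id, rec in records.items():
        # Assumed naming pattern: "<manufacturer> <category>", as in Table 12.
        expected = f'{rec["manufacturer"]} {rec["category"]}'
        if families[rec["family_id"]] != expected:
            stale.append(rec_id)
    return stale
```

Nothing in the lookup structure itself keeps the Family ID in step with the record. If record 2 is edited with `RECORDS[2]["category"] = "Laser Printers"`, its Family ID still points at "Acme Inkjet Printers" until someone corrects it by hand, and a check like the one above is needed just to notice.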
The “Stored Query” Approach
Because the related records in a family have the same fixed values for a set of fields, they can be identified by a query specifying these common values. This query can be stored and later referenced to identify and locate the records for the family.
TABLE 19
| Query Name | Query |
| --- | --- |
| Acme Laser Printers | Manufacturer = Acme; Category = Laser Printers |
| Acme Inkjet Printers | Manufacturer = Acme; Category = Inkjet Printers |
| Acme Dot Matrix Printers | Manufacturer = Acme; Category = Dot Matrix Printers |
| Best Laser Printers | Manufacturer = Best; Category = Laser Printers |
| Best Inkjet Printers | Manufacturer = Best; Category = Inkjet Printers |
| Best Daisy Wheel Printers | Manufacturer = Best; Category = Daisy Wheel Printers |
This approach also has several shortcomings. First, there are a variety of problems setting up and maintaining the queries. Setting up the queries is time-consuming and error-prone, because each must be created manually. Each query must be given a name or identifier so that it can be referenced and, with a large number of families, it quickly becomes difficult to organize and manage the set of family queries. There is no way to guarantee that the set of queries will contain the entire set of records, while also ensuring that each record belongs to exactly one query; that is, some queries may inadvertently overlap so that a single record belongs to multiple families, or the queries may not provide adequate coverage, so that some records may not belong to any family. The relationship between the families is not visually obvious from the queries, nor is there any single structure that identifies, illustrates, or maintains these relationships. Finally, while the queries identify which records belong to the family, they fail to provide an efficient way to determine to which family a particular record belongs. Finding the family for a particular record would require examining each of the queries, one at a time, to see if the record matched the criteria for that query.
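The coverage and reverse-lookup problems described above can be sketched as follows. The representation of a stored query as a dictionary of field/value pairs, and the helper names `matches`, `families_of`, and `audit`, are assumptions made for illustration; the text does not specify a query format.

```python
# Table 19: query name -> fixed field values (representation is an assumption).
QUERIES = {
    "Acme Laser Printers": {"manufacturer": "Acme", "category": "Laser Printers"},
    "Acme Inkjet Printers": {"manufacturer": "Acme", "category": "Inkjet Printers"},
    "Acme Dot Matrix Printers": {"manufacturer": "Acme", "category": "Dot Matrix Printers"},
    "Best Laser Printers": {"manufacturer": "Best", "category": "Laser Printers"},
    "Best Inkjet Printers": {"manufacturer": "Best", "category": "Inkjet Printers"},
    "Best Daisy Wheel Printers": {"manufacturer": "Best", "category": "Daisy Wheel Printers"},
}

# Table 5, reduced to the fields the queries use.
RECORDS = [
    {"id": 1, "manufacturer": "Acme", "category": "Laser Printers"},
    {"id": 2, "manufacturer": "Acme", "category": "Inkjet Printers"},
    {"id": 3, "manufacturer": "Acme", "category": "Laser Printers"},
    {"id": 4, "manufacturer": "Acme", "category": "Dot Matrix Printers"},
    {"id": 5, "manufacturer": "Best", "category": "Laser Printers"},
    {"id": 6, "manufacturer": "Best", "category": "Laser Printers"},
    {"id": 7, "manufacturer": "Best", "category": "Inkjet Printers"},
    {"id": 8, "manufacturer": "Best", "category": "Daisy Wheel Printers"},
]


def matches(record, query):
    """True if the record has every field value the query fixes."""
    return all(record.get(field) == value for field, value in query.items())


def families_of(record, queries):
    """Reverse lookup: every stored query must be tested, one at a time."""
    return [name for name, query in queries.items() if matches(record, query)]


def audit(records, queries):
    """Return (gaps, overlaps): IDs matched by no query, and IDs matched by several."""
    gaps, overlaps = [], []
    for record in records:
        hits = len(families_of(record, queries))
        if hits == 0:
            gaps.append(record["id"])
        elif hits > 1:
            overlaps.append(record["id"])
    return gaps, overlaps
```

With the six queries of Table 19 the audit is clean, but nothing in the approach enforces that. Adding a broader query such as a hypothetical "All Laser Printers" (category only) makes every laser printer belong to two families, and a record from a manufacturer with no query belongs to none.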
Storing Common Information For Family
Another common data storage problem concerns the need of a database to store fields of common information that relate to a family of related records rather than just a single record. The challenge is to store information in a way that is efficient, easy to implement for existing data, and easy to maintain as additional records are added to the database.
Single Table Approach
Existing solutions use a “Single Table” approach or a “Multi-Table” approach. In the “Single Table” approach, all of the data values for a main table record, including the common information that applies to an entire family of records, are stored within the record itself in the single main table. As a result, the table structure is very simple but, at the same time, it is both wasteful of storage, because the common data values are duplicated in multiple records, and wasteful of effort, because each of the values must be entered manually and repetitively for each of the multiple records in a family. In addition, a change to any of the common data values is not automatically propagated through the entire family of records; rather, the data value must be updated in each of the multiple records that contain the value, introducing the potential for inconsistency and error.
Multi-Table Approach
The “Multi-Table” approach is consistent with the relational data model and uses multiple tables to store related information. The primary table stores the specific information about each main table record while a lookup table contains a record for each family that stores the fields of common information. Records in the tables are linked by placing an identifier in both tables that links each record in the primary table to the corresponding record in the lookup table. The advantage of this approach is that the common data values are stored only once in a single record in the lookup table, eliminating duplication and saving space; additionally, changes to the single copy of the common information are automatically reflected in all the records of a family. The drawback of this approach is that the link between each record in the primary table and corresponding record in the lookup table still needs to be defined manually; similarly, new records that are added to the database must be manually linked to the common information by the user rather than automatically linked by the system. In addition, if there are many different fields of common information, but only some of them are used for each family, the columns that store the information will be sparse.
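A minimal sketch of the multi-table arrangement follows. The common-information fields (`logo`, `intro`), their values, and the unlinked record 9 are invented for illustration and do not come from the text; the helper `with_common_info` is likewise hypothetical.

```python
# Lookup table: one record per family holding the common information.
# Field names and values are hypothetical.
FAMILY_INFO = {
    1: {"logo": "acme-logo.png", "intro": "Acme laser printers"},
    4: {"logo": "best-logo.png", "intro": "Best laser printers"},
}

# Primary table: record-specific data plus the manually maintained link.
PRIMARY = [
    {"id": 1, "model": "ALP1", "family_id": 1},
    {"id": 3, "model": "ALP2", "family_id": 1},
    {"id": 5, "model": "BLP1", "family_id": 4},
    {"id": 9, "model": "NEW1", "family_id": None},  # hypothetical new record, not yet linked
]


def with_common_info(record, family_info):
    """Join a primary record with its family's common fields, if it is linked to one."""
    common = family_info.get(record["family_id"], {})
    return {**record, **common}
```

Because the logo is stored once, changing `FAMILY_INFO[1]["logo"]` is reflected for both records 1 and 3 the next time they are joined. The new record 9, on the other hand, receives no common information until a user fills in its `family_id`, which is the manual linking step the paragraph above identifies as the drawback.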
Publishing
A third aspect of data storage and retrieval concerns publishing catalogs of product information in paper and electronic media. Historically, publishing in these two media has involved two very different and distinct processes, with a very different level and type of effort involved, and very different standards and expectations for quality. The challenge is to eliminate the distinctions between paper and electronic output and combine the best of both media in a way that brings to electronic catalogs the structure and high standard of quality typical of paper catalogs and, at the same time, dramatically reduces the cost of laying out paper catalogs by flexibly, programmatically, and automatically generating page layouts in real time.
Known solutions present several shortcomings. Paper catalogs are meticulously laid out, with existing page layout programs, a page at a time. Tables are formatted individually by manually populating page layouts with product data, a process that is time-consuming, tedious and very, very expensive. There is also no simple way to experiment with different tabular layout formats and views of the data. Once a page has been laid out, it is difficult to add or remove records from tables without destroying the structure of the page and requiring that it be laid out again (sometimes from scratch), which discourages updates with the result that catalog pages tend to quickly become out-of-date. The upside of this complex process, however, is that manual page layout usually results in high page density, flexible and well-structured tabular layout formats using pivots to eliminate redundant information, and a very high overall standard of quality. Notwithstanding the high level of quality, however, it remains difficult to enforce a uniform look throughout a publication because more than one person is usually involved in the page layout process, and each lays out pages somewhat differently.
By contrast, electronic catalog pages are typically database-driven and generated programmatically in real-time. Since page layouts do not actually exist until the electronic catalog page is displayed, new products can be added and old products removed without disturbing the system or the published output. Unfortunately, the downside of this flexibility is that automatically generated electronic catalog pages are usually no more than wide, ugly, “spreadsheet-style” tables of data with redundant information, very little structure, and none of the sophisticated tabular layout formats that are standard for paper pages. With category-specific attributes and a large number of categories, it is even more impractical to have a customized hand-coded display for each family, so generic unstructured presentations are even more the norm.
Moreover, when publishing to multiple media, none of the effort invested in meticulously laying out paper pages can be leveraged for the electronic catalog, since both the structure of the tabular layout formats as well as the product data are typically trapped within the page layout itself, while the electronic catalog requires that the data be stored and managed in a database to be searchable and generated in real-time. Thus the worlds of the two media are completely distinct and non-overlapping, very difficult to integrate, and require two distinct publishing efforts.