1. Field of the Invention
The present invention relates generally to taxonomies, and in particular, to a method, apparatus, and article of manufacture for mapping and searching across multiple different taxonomies.
2. Description of the Related Art
Taxonomies are used within almost every website today to categorize and help users find data. When browsing a website, a user must learn the structure of a proprietary taxonomy or use a standard taxonomy that may not be familiar to the user. What is needed is a mechanism for searching across a variety of different taxonomies using a schema or taxonomy that the user is familiar with. The problems of the prior art may be better understood with an explanation of taxonomies and prior art search and retrieval methodologies.
As used herein, the term “taxonomy” generally refers to a scheme by which parts may be categorized. The term “category” refers to the category (e.g., a casement window) for a particular part. More specifically, a “taxonomy” is a data format for a particular supplier for a particular part for a particular industry. Different manufacturers will not always use the same data formats. The individual formats are referred to as taxonomies.
Taxonomies may be defined by multiple industries for a broad range of uses. Examples of such taxonomies include the CSI MasterFormat 2004™ taxonomy, the OmniClass 1.0™ taxonomy, and the CSI UniFormat II™ taxonomy.
In addition, taxonomies can be defined more narrowly. For example, a specific manufacturer may maintain an internal taxonomy to organize inventory or a catalog. As an example, a company (e.g., the assignee of the present invention, Autodesk, Inc.™) may maintain a taxonomy for each of its product catalogs. In this regard, Autodesk, Inc.™ utilizes the following taxonomies: Autodesk AutoCAD-MEP Catalog Structure™, Autodesk Revit-Arch Catalog Structure™, Autodesk Revit-MEP Catalog Structure™, and Autodesk AutoCAD-Str Catalog Structure™. Each of these taxonomies has its own vocabulary and unique hierarchy. Further, taxonomies often involve a class and one or more subclasses.
Traditionally, architects have supplied floor plans or other drawings as the initial step in the design process (e.g., the home design process) and the general contractors (GCs) have supplied the individual pieces that fill out the floor plan. This includes everything from windows and doors to faucets and light fixtures. Increasingly, architects are adding these specific pieces to drawing files. Unfortunately, searching for a particular fixture involves using a standard search engine and often takes far longer than it should (up to 60% of the time). Further, searches may result in a huge number of records in many different formats (e.g., taxonomies).
Accordingly, in the prior art, users have had to use traditional searches to find the information that they need (e.g., using web search engines). In this regard, to find certain parts in certain formats, users utilized standard web searches that potentially returned thousands of hits. The user would then refine the search, being more specific, to return fewer hits. Subsequently, the user would select a link and search for the part. Once a part was selected, and if a corresponding file actually existed, the user would download the file for the part. If the file did not exist, the user would continue searching. With such an approach, the user might not find relevant information, and would be forced to refine the search and start the entire process over.
Another problem with the prior art is how to effectively digest a wide range of catalog data (pertaining to specific parts) defined in a myriad of classification schemes/taxonomies. Once (and if) this data has been indexed, the further problem is how to query across it without being limited to data originating from specific classification schemes/taxonomies.
Traditional database aggregation approaches to this problem have involved a strict normalization of all incoming data into a single master schema. Such an approach is inherently non-scalable due to the specific attention that each individual schema requires and the fragility of the system in the face of a changing schema. Furthermore, the database approach suffers from the problem that a query must be matched exactly; otherwise, no results are returned.
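By way of illustration only, the two limitations described above may be sketched as follows. The sketch is not taken from any particular prior art system; the supplier names, field names, field mappings, and helper functions are hypothetical and are assumed solely for purposes of explanation.

```python
# Hypothetical per-supplier mappings into a single master schema (illustrative only).
FIELD_MAPS = {
    "supplier_a": {"Type": "category", "Width": "width_mm"},
    "supplier_b": {"class": "category", "w": "width_mm"},
}

def normalize(source, record):
    """Map one supplier record into the master schema; fail on any unmapped field."""
    mapping = FIELD_MAPS[source]
    out = {}
    for field, value in record.items():
        if field not in mapping:
            raise KeyError(f"{source}: unmapped field {field!r}")
        out[mapping[field]] = value
    return out

def exact_query(rows, **criteria):
    """Return only the rows whose fields equal every criterion exactly."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]

rows = [
    normalize("supplier_a", {"Type": "casement window", "Width": 600}),
    normalize("supplier_b", {"class": "Casement Window", "w": 600}),
]

# Exact matching finds only the row whose category string is identical.
exact_hits = exact_query(rows, category="casement window")

# A slightly different query string matches nothing at all.
near_miss = exact_query(rows, category="casement windows")

# A supplier renaming one field ("w" -> "width") breaks normalization
# until the hand-written mapping is updated.
try:
    normalize("supplier_b", {"class": "Casement Window", "width": 600})
    schema_change_ok = True
except KeyError:
    schema_change_ok = False
```

In this sketch, each new or changed supplier schema requires a manual update to the mapping table, and a query that differs from the stored value in capitalization or by a single character returns no records.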
In view of the above, what is needed is the capability to process/digest a wide range of catalog data defined in a myriad of different classification schemes and the capability to search across all of the different catalog data regardless of the schema used to conduct the search.