1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with a method, system, and computer-readable code for grouping dynamic schema data using Extensible Markup Language notation.
2. Description of the Related Art
The International Business Machines ("IBM") Software Glossary defines a "schema" as "The set of statements, expressed in a data definition language, that completely describe the structure of a database." (This Software Glossary is located on the World Wide Web at www.networking.ibm.com/nsg.) These statements provide a logical view of the database structure, including the layout format of the database records as well as relationship information. The layout information includes which fields appear in each record and the data type for each field (such as whether it is numeric, binary, character, image, etc.). Relationship information specifies how various fields are related within the database. For example, for data that has a hierarchical structure, parent and child relationships will be described in the schema.
A recent advance in the database art, including directory databases and other types of data repositories, is the notion of a dynamic schema. Originally, databases supported records that had a fixed, predefined format. The schema for these databases was therefore static and predetermined as well. If the structure of a group of records in the database changed (such as by adding a new field), then the schema and all the records of that group had to be rebuilt to reflect the change. If a rebuilt record had no value for a newly-added field, that field was still present for the record but remained empty. This resulted in inefficient use of storage space. With a dynamic schema, on the other hand, each addressable record in the data repository can have a different number of fields. Only those records that are individually affected by a schema change need to be rewritten in the data repository to reflect the changed format.
For example, consider a directory database that contains information about a company's employees. ("Directory database", or simply "directory", is a term known in the art that reflects the recent trend of using the information stored in a data repository as an on-line directory of information.) Further suppose that this example directory contains information about the employees, such as their name and home address, employee identification number, social security number, etc., and that the company uses this directory to store information about each employee's computer access privileges. The company may have a number of different types of computer systems. Some employees may have access to all the different systems, while other employees have access to only one or perhaps to none of the systems. For each different computer system, different types of information may be pertinent. The database systems of the past, with their fixed schema, would have required each employee record to contain all the fields for each potential system (making the records, in effect, a union of the possible fields) even though many of the fields would be unused. Or, the information for each employee would have to have been segmented, so that the different types of information were stored in separate homogeneous repositories. Data repositories for which dynamic schema can be used, however, simply store the relevant fields for each record and omit those that are not relevant. In the employee information directory discussed above, if an employee for whom a directory record exists is subsequently granted access to another computer system, the pertinent fields and values are simply added to the employee's record. Thus, one employee's record may be quite different in format from another employee's record.
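The contrast between a fixed schema and a dynamic schema in the employee directory example can be illustrated with a minimal Python sketch. The field names (`unix_login`, `vm_userid`, and so on) and the helper functions are hypothetical and chosen only for illustration; they are not taken from any particular directory product.

```python
# Dynamic schema: each record stores only the fields relevant to it.
# All field names below are hypothetical illustrations.
employees = [
    {"name": "Pat Smith", "employee_id": "12345",
     "unix_login": "psmith", "unix_shell": "/bin/ksh"},
    {"name": "Lee Jones", "employee_id": "67890"},
]


def fixed_schema_view(records):
    """Pad every record to the union of all fields, as a fixed schema would."""
    all_fields = []
    for record in records:
        for field in record:
            if field not in all_fields:
                all_fields.append(field)
    # Fields a record does not use are still present, but empty (None).
    return [{field: record.get(field) for field in all_fields}
            for record in records]


def grant_access(record, **system_fields):
    """Add the fields for a newly granted system to one record only."""
    record.update(system_fields)


padded = fixed_schema_view(employees)
unused_slots = sum(value is None for row in padded for value in row.values())

# Granting access changes only the affected record; the other is untouched.
grant_access(employees[1], vm_userid="LJONES")
```

Under the fixed-schema view, the second employee carries two empty fields; under the dynamic schema, granting that employee access to another system rewrites only that one record.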
One example of database directories that support dynamic schema is what is commonly referred to as an "X.500 directory". This term refers to ITU Recommendation X.500, which specifies a particular approach to implementation of a directory service. This information is also published as an international standard in ISO/IEC 9594-1, "The Directory: Overview of Concepts, Models, and Services" (1995). An "X.500 directory" is a directory service according to these specifications. X.500 directories are widely used in the Internet and World Wide Web (hereinafter, "Web") for providing centralized storage and management of information. The X.500 specification defines a default schema, and provides for extending and customizing the schema according to the requirements of a particular implementation.
While a dynamic schema provides a convenient and efficient means for storing records of different formats in a data repository, the dynamically varying formats on a record-by-record basis can cause problems for other programs that need to process this stored data. Programs are typically written to expect a known data format. If a dynamic schema is used, a program processing the data in the conventional manner has no technique, other than rewriting the code, for processing new fields that have been added or preventing the processing of fields that have been removed. In addition, when record format can vary widely from one record to another (such as in the employee directory example discussed above), existing techniques for processing the data require code that is pre-written to accommodate each potential variation. At best, a user of the program will be presented with an inaccurate depiction of the data content when the format of the information changes. In the worst case, the program will cease to operate properly as the fields it expects are no longer present, and will have to be rewritten. The difficulty of keeping code synchronized with dynamically changing underlying data formats will be apparent.
Accordingly, a need exists for a technique with which data having dynamically variable record formats can be easily and efficiently accommodated, without requiring modification of the code that processes the data each time the underlying data format changes. The present invention provides a novel way to gather data that may have had changes to its format, and create a structured representation of this data that flexibly adapts to format variations. This novel technique enables all added data fields in a record to be made available for processing and removed data fields to be omitted, without requiring advance knowledge of the added and removed fields.
An object of the present invention is to provide a technique whereby data from a dynamic schema, having dynamically variable record formats, can be easily and efficiently accommodated by program code processing that data, without requiring modification of the code that processes the data each time the underlying data format changes.
Another object of the present invention is to provide a technique for gathering data that may have had changes to its format, and creating a structured representation of this data that flexibly adapts to format variations.
Still another object of the present invention is to provide this technique whereby all added data fields in a record are made available for processing and removed data fields are omitted, without requiring advance knowledge of the added and removed fields.
It is another object of the present invention to provide this technique using a DOM tree created from an XML syntax representation of the source data, and creating an output DOM tree in which the destination data gathered from the source is reformatted and stored.
Yet another object of the present invention is to provide this technique whereby the results of an LDAP query from an LDAP directory can be used as the source data.
A further object of the present invention is to provide a technique whereby fields that are required to be present in the source data can be specified as required in the destination data.
Yet another object of the present invention is to provide a technique for specifying that fields from the source data are to be excluded when creating the destination data.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a software-implemented process, a system, and a method for use in a computing environment, for gathering data having dynamically variable record formats such as those created when a dynamic schema is used with a data repository. This technique comprises: providing an input data source comprising one or more records, wherein each of the records has this dynamically variable record format, and wherein the dynamically variable record format of each record comprises a plurality of dynamically variable fields; processing a gather verb specification, wherein the gather verb specification identifies a selected one of the records from the input data source and an output data destination; gathering the dynamically variable fields from the selected one of the records according to the gather verb specification; and transferring the gathered dynamically variable fields to the output data destination according to the gather verb specification, wherein the gathering and the transferring flexibly adapt to a presence or an absence of the dynamically variable fields. The selected record may be formatted as a first Document Object Model (DOM) tree, the gather verb specification may be formatted as a second DOM tree, and the output data destination may be formatted as a third DOM tree. Optionally, the first DOM tree may be created by parsing an Extensible Markup Language (XML) representation of the selected record and the second DOM tree may be created by parsing an XML representation of the gather verb specification. Preferably, the gather verb specification further identifies: a required group of fields, an optional group of fields, an excluded group of fields, and an other group of fields, wherein one or more of the optional group, the excluded group, and the other group may be empty.
The gathering preferably further comprises: locating a field from the dynamically variable fields corresponding to each of the fields in the required group, and generating an error if any of the fields in the required group cannot be located; locating a field from the dynamically variable fields corresponding to each of the fields in the optional group if the corresponding field is present in the selected record; locating a field from the dynamically variable fields corresponding to each of the fields in the excluded group if the corresponding field is present in the selected record; and locating all fields from the dynamically variable fields that did not correspond to any of the fields in the required, optional, or excluded groups. The transferring preferably further comprises: transferring each of the fields located by the first locating process to a corresponding required output field in the output data destination; transferring each of the fields located by the second locating process to a corresponding optional output field in the output data destination; preventing transfer of each of the fields located by the third locating process to the output data destination; and transferring each of the fields located by the fourth locating process to a corresponding other output field in the output data destination. The input data source may represent a result of a query from a directory database, and this directory database may be accessed using a Lightweight Directory Access Protocol (LDAP).
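The gathering and transferring steps summarized above can be sketched in Python using the standard `xml.dom.minidom` module. This is a minimal illustration under stated assumptions, not the implementation of the invention: the XML element names (`gather`, `required`, `optional`, `excluded`, `other`, `field`, `record`, `gathered`), the sample data, and the function names are all hypothetical, and each field name is assumed to occur at most once per record.

```python
from xml.dom.minidom import getDOMImplementation, parseString

# Hypothetical source record and gather verb specification (assumed syntax).
RECORD_XML = """<record>
  <name>Pat Smith</name>
  <employeeId>12345</employeeId>
  <ssn>000-00-0000</ssn>
  <unixLogin>psmith</unixLogin>
</record>"""

SPEC_XML = """<gather>
  <required><field name="name"/><field name="employeeId"/></required>
  <optional><field name="homeAddress"/></optional>
  <excluded><field name="ssn"/></excluded>
  <other/>
</gather>"""


def group_names(spec_root, group):
    """Return the field names listed under one group of the specification."""
    names = []
    for group_node in spec_root.getElementsByTagName(group):
        for field in group_node.getElementsByTagName("field"):
            names.append(field.getAttribute("name"))
    return names


def gather(record_xml, spec_xml):
    record = parseString(record_xml).documentElement  # first DOM tree
    spec = parseString(spec_xml).documentElement      # second DOM tree
    out = getDOMImplementation().createDocument(None, "gathered", None)  # third

    # Index the record's fields by element name, keeping document order.
    fields = {child.tagName: child for child in record.childNodes
              if child.nodeType == child.ELEMENT_NODE}

    required = group_names(spec, "required")
    optional = group_names(spec, "optional")
    excluded = set(group_names(spec, "excluded"))

    # Required fields must all be present; otherwise an error is generated.
    missing = [name for name in required if name not in fields]
    if missing:
        raise ValueError("required fields not found: " + ", ".join(missing))

    def transfer(group, names):
        parent = out.documentElement.appendChild(out.createElement(group))
        for name in names:
            parent.appendChild(out.importNode(fields[name], True))

    transfer("required", required)
    transfer("optional", [n for n in optional if n in fields])
    # Excluded fields are located but deliberately never transferred.
    known = set(required) | set(optional) | excluded
    transfer("other", [n for n in fields if n not in known])
    return out


result = gather(RECORD_XML, SPEC_XML)
```

In this sketch, a field newly added to the record (such as `unixLogin`) appears in the output under the "other" group without any change to the code, while a field removed from the record simply produces no output unless it was listed as required.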