1. Field of the Invention
The present invention relates to specification of data transformation statements, and in particular, to systems, methods and computer program products for interactively defining data transformations.
2. Description of the Related Art
Automated transformation of data from one or more source formats, systems, database environments, or schemas, to one or more targets has become an important enabling technology for managing heterogeneous data environments, for migrating data from legacy systems to newer data environments, for creating so-called "application islands," and for data warehousing or otherwise maintaining multiple instances of the same semantic information. The need for such automated data transformation is driven in part by rapid changes in computing and information systems technology, but also by business changes such as the acquisition and merging of business units, by year 2000 (Y2K) issues, by growth and/or change of business focus, by the use of enterprise planning, front office and back office systems, by increasing demands by customers, suppliers, and business partners for real-time or near-real-time data, and by competitive pressures that demand timely operational data to support management decisions.
Large information technology departments face formidable challenges in transforming data. For example, even when incorporating off-the-shelf packaged applications, the level of effort required to integrate the off-the-shelf application into a pre-existing data environment can be substantial. Indeed, some estimates place the level of effort associated with data conversion alone at 40% of overall effort for integrating enterprise-wide software applications. Companies frequently address the data conversion challenge by building teams of programmers and consultants to hand-code programs to convert and integrate the required data. Unfortunately, converting large amounts of data can be a time-consuming and difficult process, even for skilled programmers. Automated tools are a practical necessity.
The need for automated data conversion tools goes well beyond simple, one-time, transformation of data from a legacy source format to that required by a new software and/or hardware system. Data conversion programs, sometimes called data bridge programs, are often written when an organization does any of the following.
1. Integration of a new application that uses a different data format into a pre-existing computing environment. For example, suppose a company becomes self-insured and buys a flexible benefits software package. To use the package, the company must extract data from a personnel database, a payroll database (perhaps in a flat file form), a claim system (which could be maintained out of house), a general ledger, and other financial data stores. It is very likely that the systems reside on different platforms and use different data formats. Moreover, once the new system has been implemented and integrated, the company will have an ongoing need to access data from these original sources, as well as to apply changes to some of these databases to reflect changes in the flexible benefits system.
2. Modification of a data structure definition (schema) used by a database management system. When a software support organization develops a new application (perhaps to reflect new equipment being deployed in the field), it is often necessary to modify the schema definition of various databases to reflect the new information. This process can require dumping the data content of the original database and reloading the data content into the new record types, along with information about the new equipment or application, which may be stored in a flat file or database.
3. Integration of data from multiple sources or reports for use with decision support tools. Companies typically store related data on a variety of different systems in different data formats. For example, business data are typically stored in an environment dominated by older applications and storage formats, whereas engineering data may be stored in an object-oriented database on a workstation and simulation data may be stored in a flat file on a mainframe. Periodic planning often requires that data from the various systems be collected and integrated.
Although brute force, hand-coding techniques have been, and continue to be, used in support of the above, data-driven techniques are preferable because of their flexibility and because they can allow non-technical personnel to define appropriate data transformations. One such data-driven environment is the ETI·EXTRACT® Tool Suite available from Evolutionary Technologies International, Inc. of Austin, Texas (ETI·EXTRACT is a registered trademark of Evolutionary Technologies International, Inc.). The ETI·EXTRACT Tool Suite is an extensible set of software tools for automating the generation and execution of data conversion programs. Because the tool suite is data-driven, functionality can be added without changing the ETI·EXTRACT code itself. In general, the data that drives the tool suite includes (1) internal databases that store metadata, i.e., data about data, describing the structure and format of source and target data sets and (2) transformation rules. Using the ETI·EXTRACT Tool Suite, transformational rules are developed using grammars that define the universe of legal transformation statements. In this way, transformational rules developed using the ETI·EXTRACT Tool Suite are correct by construction.
In the information systems arts, the use of metadata is emerging as a critical element in effective information resource management. Vendors and users alike recognize the value of metadata; however, the rapid proliferation of data manipulation and management tools has resulted in information technology products that represent and process metadata differently and without consideration for sharing of metadata. To enable full-scale enterprise data management, different information tools must be able to freely and easily access, update, and share metadata. One viable mechanism to enable disparate tools from different vendors to exchange metadata is a common metadata interchange specification. Another is through repositories (e.g., a Microsoft Repository with its Open Information Model and Oracle's Repository with its Common Warehouse Model) with business models and tools to interchange information about these business models with other applications.
Although there is currently no generally deployed standard for metadata exchange, a number of products (e.g., CASE tools and data repositories) do have the ability to exchange metadata with a subset of other products. A proposed Meta Data Interchange Specification (MDIS), created by the Meta Data Coalition, is a file-based exchange format currently supported by a variety of vendors. Unfortunately, metadata exchange is only part of the solution. While having a standard metadata exchange mechanism is critical, it is, with one exception, relatively easy to relate the metamodels used by different tools. The exception is business rules, i.e., the functional logic used to test and transform data values.
In general, users today specify business rules in one of four ways: (1) SQL or some proprietary control language, (2) fragments of code such as COBOL, C++ or BASIC, (3) code fragments with embedded SQL, or (4) merely as documentation strings. These representations are unfortunate for two reasons. First, for products that require code blocks or some Fourth Generation Language (4GL), the user of the product must have some technical training. Second, and more importantly, such encodings make it extremely difficult for products to exchange business rules. Systems and methods are desired that allow interoperability and exchange of business rules or test and transformation logic (e.g., as part of metadata exchange).
Accordingly, a parser-translator technology has been developed to allow a user to specify complex test and/or transformation statements in a high-level user language, to ensure that such test and/or transformation statements are well-formed in accordance with a grammar defining legal statements in the user language, and to translate statements defined by the user into logically and syntactically correct directives for performing the desired data transformations or operations. In some embodiments configured for transforming data represented in one or more databases, code-generation for a particular data access system, access language and/or database installation is performed. In some embodiments, the parser-translator technology is configured as a software component for use in providing user interface functionality. In some embodiments, multiple instances of a parser-translator component, each with a corresponding grammar, provide multiple software applications with a common interface to high-level user language statements, whether stored in a computer readable medium or interactively supplied by a user. In some embodiments, a single parser-translator component provides multiple software applications with an interface to high-level user language statements wherein operation of the single parser-translator component is suitably defined for each software application using a corresponding grammar encoding.
Using the developed parser-translator technology, a user can focus on the semantics of the desired operations and need not be concerned with the proper syntax of a language for a particular system. Instead, grammars (i.e., data) define the behavior of a parser-translator implementation by encoding the universe of statements (e.g., legal test and/or transformation statements) and by encoding translations appropriate to a particular data processing application (e.g., a data conversion program, etc.).
Parser-translator implementations in accordance with certain embodiments of the present invention interface dynamically with other systems and/or repositories to query for information about objects, systems and states represented therein, and/or their respective interfaces. Grammars in accordance with embodiments of the present invention encode context-sensitivity. In this way, context-sensitive prompting and validation of correct specification of statements is provided. A combination of parser technology and dynamic querying of external system state allows users to build complex statements (e.g., using natural languages within a user interface environment) and to translate those complex statements into statements or directives appropriate to a particular data processing application.
In one merely exemplary data processing application, the complex statements define test and/or transformation logic and the data processing application manipulates contents of a data store (or stores) in accordance with the test and/or transformation logic. By sharing the same context-sensitive grammar framework, software tools vendors can define a subset of a natural language to be used in specifying business rules in such a way that enterprise tools from various vendors and targeted at various data access or decision support functions may freely exchange such business rules. By supplying vendor- or tool-specific translation rules and external query handlers, a single context-sensitive grammar framework can be used to define business rules that may be used by a multiplicity of tools employing parser-translator technology in accordance with an embodiment of the present invention.
In one embodiment in accordance with the present invention, an apparatus includes an instance of a parser-translator software component and a software tool operable with the parser-translator instance. The parser-translator instance is executable to read an input stream encoding at least one business rule and to evaluate the input stream in conjunction with a grammar encoding an external query. The software tool is operable with the parser-translator instance to receive an output thereof corresponding to the business rule. By operation of the external query, the output is sensitive to an external context.
In another embodiment in accordance with the present invention, a method of operating a data processing program in accordance with at least one business rule includes: reading an input stream encoding the business rule; evaluating the input stream in accordance with a grammar encoding an external query to at least one externally represented data or metadata state; performing, in the context of the evaluating, the external query; and supplying an operating directive for the data processing program. The operating directive corresponds to the business rule and is sensitive to the externally represented data or metadata state. In one variation, the input stream evaluating includes: recognizing tokens from the input stream, parsing the tokens, and translating a sequence of the tokens to produce the operating directive.
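The recognize, parse, and translate steps of this variation can be illustrated with a minimal Python sketch. It is not the claimed implementation: the one-rule statement language ("copy table.column to table.column"), the in-memory CATALOG standing in for an externally represented metadata state, and every function name below are assumptions made only for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for an external metadata store (table -> columns).
CATALOG: Dict[str, Set[str]] = {
    "emp": {"name", "salary"},
    "staff": {"full_name", "pay"},
}


def column_exists(table: str, column: str) -> bool:
    """External query: ask the metadata store whether table.column is known."""
    return column in CATALOG.get(table, set())


def tokenize(stream: str) -> List[str]:
    """Recognize identifier and '.' tokens; other characters are skipped."""
    return re.findall(r"[A-Za-z_]\w*|\.", stream)


def parse(tokens: List[str]) -> Dict[str, Tuple[str, str]]:
    """Parse against the single rule: 'copy' NAME '.' NAME 'to' NAME '.' NAME."""
    well_formed = (
        len(tokens) == 8
        and tokens[0] == "copy"
        and tokens[2] == "."
        and tokens[4] == "to"
        and tokens[6] == "."
    )
    if not well_formed:
        raise SyntaxError("expected: copy <table>.<column> to <table>.<column>")
    source = (tokens[1], tokens[3])
    target = (tokens[5], tokens[7])
    # The external query is performed in the context of the evaluation.
    for table, column in (source, target):
        if not column_exists(table, column):
            raise LookupError(f"unknown column {table}.{column}")
    return {"source": source, "target": target}


def translate(tree: Dict[str, Tuple[str, str]]) -> str:
    """Translate the parsed statement into an operating directive (here, SQL text)."""
    (src_table, src_col), (dst_table, dst_col) = tree["source"], tree["target"]
    return f"INSERT INTO {dst_table} ({dst_col}) SELECT {src_col} FROM {src_table}"


directive = translate(parse(tokenize("copy emp.name to staff.full_name")))
```

Because the column check consults the catalog at evaluation time, the same input stream may be accepted or rejected depending on the external state, which is the sense in which the directive is sensitive to that state.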
In still another embodiment in accordance with the present invention, a system for providing business rule interoperability in a computing environment includes a computer readable encoding of a grammar and a parser-translator component. The grammar encoding includes (i) rules descriptive of legal statements in a data transformation language for transforming between a first format and a second format and (ii) entries descriptive of data transformation language elements in the legal statements. At least one of the entries specifies an external query to resolve a corresponding data transformation language element at evaluation time. The parser-translator component is executable to read an input stream encoding at least one business rule and to evaluate the input stream in conjunction with the grammar encoding including the external query and, based thereon, to define a data transformation. In a variation, the system further includes a second computer readable encoding of a second grammar. The second grammar encoding is also readable by the parser-translator component and thereby allows the parser-translator component to read the input stream and define a second data transformation based thereon.
In still another embodiment in accordance with the present invention, a data-driven system for providing business rule interoperability amongst multiple data processing programs includes first and second grammar encodings, an input stream encoding at least one business rule, and a single parser-translator component. The first and second grammar encodings are respectively specific to first and second data processing programs, and both the first and second grammar encodings include an external query. Based on the first grammar encoding, the single parser-translator component parses the input stream and translates the input stream into a first data transformation for use in conjunction with the first data processing program. Based on the second grammar encoding, the single parser-translator component parses the input stream and translates the input stream into a second data transformation for use in conjunction with the second data processing program. In some variations, the input stream includes an interactive stream. In some variations, the input stream includes a batch stream.
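As a rough illustration of a single parser-translator component driven by two grammar encodings, consider the following Python sketch. The two-token rule language, the FIELD_TYPES table standing in for external metadata, the grammar layout, and the output templates are all hypothetical and chosen only to keep the example small.

```python
from typing import Any, Dict

# Hypothetical external metadata: field name -> coarse data type.
FIELD_TYPES: Dict[str, str] = {"salary": "number", "name": "text"}


def field_type(field: str) -> str:
    """External query shared by both grammar encodings."""
    return FIELD_TYPES[field]


# Each grammar maps a verb to (required field type, translation template).
SQL_GRAMMAR: Dict[str, Any] = {
    "verbs": {
        "round": ("number", "ROUND({field}, 2)"),
        "trim": ("text", "TRIM({field})"),
    },
    "external_query": field_type,
}

SCRIPT_GRAMMAR: Dict[str, Any] = {
    "verbs": {
        "round": ("number", "round(row['{field}'], 2)"),
        "trim": ("text", "row['{field}'].strip()"),
    },
    "external_query": field_type,
}


def parse_translate(grammar: Dict[str, Any], stream: str) -> str:
    """Single component: its behavior is defined entirely by the grammar passed in."""
    tokens = stream.split()
    if len(tokens) != 2 or tokens[0] not in grammar["verbs"]:
        raise SyntaxError("expected: <verb> <field>")
    verb, field = tokens
    required, template = grammar["verbs"][verb]
    # Resolve the field against external metadata at evaluation time.
    if grammar["external_query"](field) != required:
        raise TypeError(f"'{verb}' requires a {required} field")
    return template.format(field=field)


rule = "round salary"
first_transformation = parse_translate(SQL_GRAMMAR, rule)
second_transformation = parse_translate(SCRIPT_GRAMMAR, rule)
```

In this sketch the business rule text is identical for both tools; only the grammar encoding differs, so supporting a further tool would mean supplying another grammar rather than changing the component.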
In still yet another embodiment in accordance with the present invention, a method of interactively specifying a transformation of data from a first representation to a second representation thereof includes supplying a computer readable encoding of a grammar and based on the grammar encoding, presenting a user with choices. The grammar includes syntax rules and dictionary entries. The syntax rules are descriptive of legal sequences of tokens in a data transformation language. The dictionary entries are descriptive of terminal ones of the tokens. The dictionary entries include at least a first and a second entry respectively descriptive of at least first and second tokens. The first entry specifies at least one data transformation language element corresponding to the first token, and the second entry specifies an action to dynamically-query an external data source to resolve at least one data transformation language element corresponding to the second token. The method includes presenting a user with at least one choice for a first element in a data transformation statement in accordance with the syntax rules and presenting a user with successive at least one choices for successive elements in the data transformation statement in accordance with the syntax rules. At least one of the successive element presentings includes performing a dynamic query in accordance with the second entry of the dictionary.
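The interactive method just described can be sketched as follows, under stated assumptions: a single syntax rule, a dictionary in which one entry lists its language elements statically while the others are resolved by a dynamic query, and an in-memory SCHEMA standing in for the external data source. None of these names come from the specification; the chooser callback stands in for a user interface.

```python
from typing import Callable, Dict, List, Union

# Hypothetical external data source: table name -> column names.
SCHEMA: Dict[str, List[str]] = {
    "employee": ["id", "name"],
    "payroll": ["emp_id", "net"],
}

# One syntax rule: a legal statement is VERB TABLE COLUMN.
SYNTAX_RULE: List[str] = ["VERB", "TABLE", "COLUMN"]

# A dictionary entry is either a static list of elements or an action that
# queries the external source, given the elements chosen so far.
Entry = Union[List[str], Callable[[List[str]], List[str]]]

DICTIONARY: Dict[str, Entry] = {
    "VERB": ["copy", "test"],                           # static entry
    "TABLE": lambda chosen: sorted(SCHEMA),             # dynamic query
    "COLUMN": lambda chosen: list(SCHEMA[chosen[-1]]),  # depends on chosen table
}


def build_statement(choose: Callable[[str, List[str]], str]) -> List[str]:
    """Present choices for each successive element and collect the selections."""
    chosen: List[str] = []
    for category in SYNTAX_RULE:
        entry = DICTIONARY[category]
        options = entry(chosen) if callable(entry) else list(entry)
        pick = choose(category, options)
        if pick not in options:
            raise ValueError(f"{pick!r} is not a legal {category}")
        chosen.append(pick)
    return chosen


# A scripted "user" that always picks the last option offered.
statement = build_statement(lambda category, options: options[-1])
```

Because only options produced by the rule and dictionary can be selected, every statement built this way is well-formed, and the column choices reflect whatever the external source reports when the query runs.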
In still yet another embodiment in accordance with the present invention, a computer program product includes a grammar encoded by or transmitted in at least one computer readable medium. The grammar includes a set of phrase structure rules defining legal orderings of tokens in the data transformation language, a dictionary of entries descriptive of terminal ones of the tokens, and at least one encoding of a query to an external data or metadata store.
In still yet another embodiment in accordance with the present invention, a method for interactively specifying a data transformation statement in accordance with a grammar including (i) rules defining legal orderings of tokens in a data transformation language for transforming between a first data representation in a first data store and second data representation and (ii) dictionary entries descriptive of terminal ones of said tokens includes at least two steps. First, supplying a computer readable encoding of the grammar, wherein a function identifier is associated with at least one of the dictionary entries. The function identifier encodes a run-time query to the first data store or metadata corresponding thereto. Second, based on the grammar, presenting a human user with choices for successive elements in the data transformation statement, wherein at least one of the choices is resolved by the run-time query.
In still yet another embodiment in accordance with the present invention, a data processing system includes first and second data stores and parser-translator means. The first data store encodes a grammar defining legal data transformation statements in accordance with a predetermined data transformation language. The second data store encodes metadata descriptive of a structure of a first data set. The parser-translator means is executable to access the first and second data stores and has a user interface allowing a human user to interactively define a data transformation statement in accordance with the grammar. At least one element of the data transformation statement is selected by the human user from alternatives resulting from a dynamic query of the metadata.