The amount of information being processed and stored is increasing rapidly as technological advances provide an ever-greater ability to generate and store data. Additionally, computer systems are becoming increasingly integrated, so there is a need to integrate data from one system into another system both correctly and efficiently. Ensuring translation correctness can require considerable software development expense, which is in tension with the need to perform this conversion with commercial efficiency.
One common type of data format conversion is converting data from a first textual format to a second textual format. Examples of such conversion are readily apparent in commercial, educational, political, and technical fields alike. For example, an electronic record for a credit card purchase can comprise, in part, several textual fields, including the name of the card holder, an identifying number for the credit card used in the transaction, and merchant information identifying the merchant and the nature of the purchase. Consumers frequently track their credit card purchases through online billpay or online banking software, but the textual format of credit card transaction data within the online billpay environment can differ from the textual format of the same data within the environment of the originating credit card processor. Thus, data format conversion is needed to integrate data formatted for a credit card company's computing environment into the computing environment of a consumer's online billpay application. Fortunately for billpay software providers, the format of credit card transaction data is relatively simple and relatively stable compared to other data conversion environments.
Some data conversion environments have very complex data conversion requirements, and these requirements can be subject to frequent revision. Complex and frequently changing requirements can arise where a parser must process data from numerous independent sources, each of which can format its data in arbitrarily complex forms and can add new formats or change existing formats with arbitrary frequency. As the number of formats a parser must support increases, the complexity of the parser increases, and as the parser's complexity increases, the software development resources required to update and test it can increase dramatically. Thus, increasing parser complexity is in tension with both goals: reliable data translation and commercially efficient data translation.
Existing parsing tools do not perform well in complex parsing environments that change frequently. One traditional approach to designing a text parser is for a software developer to write regular expressions that recognize strings or portions of a string and modify them according to a predefined transformation. One problem with this approach is that, although regular expression transformations can provide efficient and correct solutions for relatively simple data conversions, complex transformations using regular expressions can be very difficult to write, test, modify, and/or interpret. Moreover, in some data conversion environments, the result generated by one regular expression transformation can be the input to another regular expression transformation, which tends to significantly increase the conceptual complexity and practical expense of developing and maintaining a parser based on regular expressions.
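As a rough illustration of the traditional regular-expression approach and of one transformation consuming another's output, the following Python sketch chains several substitutions over a credit-card-style record. The record layout, the patterns, and the names STEPS and apply_pipeline are hypothetical and are not drawn from any particular parser; the sketch only shows how ordering dependencies between regular expression transformations arise.

```python
import re
from typing import Callable, List, Tuple, Union

# A replacement may be a template string or a function of the match object.
Replacement = Union[str, Callable[[re.Match], str]]

# Hypothetical transformation chain, applied in list order. Each step sees
# the text produced by the steps before it, not the original input.
STEPS: List[Tuple[re.Pattern, Replacement]] = [
    # 1. Collapse runs of whitespace into a single space.
    (re.compile(r"\s+"), " "),
    # 2. Rewrite a US-style MM/DD/YYYY date as ISO YYYY-MM-DD.
    (re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b"), r"\3-\1-\2"),
    # 3. Mask all but the last four digits of a 16-digit card number.
    (re.compile(r"\b\d{12}(\d{4})\b"), lambda m: "*" * 12 + m.group(1)),
    # 4. Label the date as a field. This pattern only recognizes the ISO
    #    form, so it silently depends on step 2 having already run.
    (re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"), r"date=\1"),
]


def apply_pipeline(text: str, steps=STEPS) -> str:
    """Apply each (pattern, replacement) step to the output of the previous one."""
    for pattern, replacement in steps:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    record = "  4111111111111111   03/15/2024  ACME   HARDWARE "
    print(apply_pipeline(record))
    # Same steps, but step 4 moved ahead of step 2: the date is no longer labeled.
    reordered = [STEPS[0], STEPS[3], STEPS[1], STEPS[2]]
    print(apply_pipeline(record, reordered))
```

With the steps in their listed order, the sample record becomes a masked card number followed by date=2024-03-15 and the merchant name. Moving the fourth step ahead of the second leaves the date unlabeled without raising any error, because the fourth pattern only recognizes the ISO form that the second step produces. Hidden couplings of this kind multiply as more transformations are added.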