1. Field of the Invention
This invention relates to methods and processes for translating data from one format or programming paradigm to another programming format or paradigm, and especially to methods for converting Extensible Markup Language data structures to COBOL data structures.
2. Background of the Invention
Common Business Oriented Language or “COBOL” is one of the oldest high-level computer programming languages still in widespread use today. Like other programming languages, COBOL is designed to allow development of computer programs that perform specific purposes and tasks. COBOL is especially well suited for business data processing tasks, as opposed to scientific or engineering data processing tasks. Business oriented data is characterized by “transactions”, which are often either reflections of manually-conducted activities (e.g. sales, deposits, withdrawals, etc.) or reflections of automatically-conducted activities (e.g. compounding of interest on a principal amount). A common “output” of business-oriented tasks is a “report”. As such, COBOL is tailored to collect, organize, validate, update, process and report with respect to business-oriented data.
As COBOL was originally targeted to and executed by large “mainframe” business computers, it was one of the original languages employed in the “client-server” topologies of 20 to 30 years ago. As evolution of computing has brought processing paradigms full circle with the advent of “thin clients”, networked servers, wide area networks and the Internet, “client-server” architectures are once again in favor, albeit the “clients” are now commonly web browser computers, and the servers are many thousands of times more capable in processing bandwidth, storage capacity, and communications capabilities.
COBOL, and related products and systems such as International Business Machines' (“IBM's”) Customer Information Control System (“CICS”) and IBM's Information Management System (“IMS”), are well known in the industry, being employed in business and government enterprises ranging from banking, finance, investment, and insurance to manufacturing and service operations.
As newer programming languages such as “C”, and object-oriented languages including “C++” and Sun Microsystems' Java [TM], have found widespread acceptance in the software industry, some may be led to believe that COBOL, IMS, CICS, and similar products and languages are of limited future value. Regardless of whether COBOL, CICS, and IMS/DC are recognized as strategic products, from a business case perspective, there are literally billions of lines of COBOL business application code in use today. While Java has now become the application development language of choice, until recently COBOL was the main application development language, in use since the early 1970's, in both CICS and IMS/DC transaction processing environments. For example, in 1999, the IBM Hursley (United Kingdom) development laboratory estimated that more than 20 billion transactions per day were processed in IBM's customers' CICS installations worldwide. Therefore, COBOL remains an important technology, and problems arising from interfacing and interacting COBOL resources with newer technology resources (e.g. applets, servlets, etc.) must be addressed as inventively as any other “cutting edge” technology problem. For example, an inefficient solution to a COBOL problem which is executed 20 billion times per day accumulates to massive wasted processing bandwidth, memory and storage waste, and communication inefficiencies. By the very nature of COBOL applications (e.g. business transactions), such results manifest themselves as increased costs, latency to complete transactions, and reduced profits.
The more modern concept of “Data Mining” can be summarized as the ability to re-use business application logic from existing application programs to solve business problems in the future. Data Mining implementations exist in varying degrees of sophistication. For example, a simple application refacing solution may use a Web browser connected to a Web Server, which in turn accesses data from a transaction-processing server using an Extensible Markup Language (“XML”) interface. In another example, a tightly coupled business-to-business (“B2B”) application may connect a company to a supplier, where an XML document serves as the common data transport. In this example, it can be seen that XML-based servers will enable the evolution of Web Services to access older “legacy” data such that a business can greatly extend its reach to customers over time while continuously updating, upgrading, and migrating its business application programs to offer enhanced services and products with ever-increasing cost and response efficiencies.
As such, the networked economy is driving businesses from rigidly designed business computing systems to flexible application designs on scalable computing platforms, from static to dynamic interaction between partners, and from technology integration to business integration.
So, two extremes in technology are firmly established: billions of lines of COBOL code representing untold billions of dollars of business investment on one end, and XML data transport technology ensuring the ability of future business applications to access and use legacy data on the other. Neither XML nor COBOL may be used exclusively. However, there are significant technical challenges to interfacing COBOL and XML to each other, especially with respect to multi-dimensional arrays or “tables” of data which are so prevalent in today's business application requirements.
Brief Review of COBOL Field Definitions and Tables
While COBOL is well known in the art, it will be useful to briefly review the implementation of “tables” or “arrays” of data in COBOL in order to fully understand the impact of converting tabular or indexed data structures to and from COBOL and XML.
In COBOL syntax, a “picture” clause is used to define a field for use, such as shown in Table 1.
TABLE 1
Example COBOL Field Definition
    01  Data-Field
        02  Data-Item-1    Pic X(1).
        03  Data-Item-2    Pic X(1).
In this example, two data items, both being alphanumeric fields, are defined with a “precision” of one character. The “X” following the “Pic” indicates the field is alphanumeric, and the “(1)” following the “X” indicates the field is 1 character in length. A “Pic 9(3)” field type would be a numeric field having 3 digits, as would be a “Pic 999” field type. Other field types, such as literals, and numerics with decimal (e.g. fractional components) may be defined, as is well known in the art. Fields may be defined within groups, as indicated by elementary levels within group levels, as is also well known within the art.
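The equivalence between the counted form “Pic 9(3)” and the repeated form “Pic 999” can be illustrated with a minimal Python sketch. The function name picture_length is hypothetical, and the sketch assumes only the two symbols discussed above (X and 9); it is not a full PICTURE clause parser and ignores signs, decimal points, and editing symbols.

```python
import re

def picture_length(picture: str) -> int:
    """Count character positions in a simple PICTURE string such as 'X(30)' or '999'.

    Assumption: only the symbols X and 9 are handled; anything else is ignored.
    """
    length = 0
    # Each token is a symbol (X or 9) optionally followed by a repeat count in parentheses.
    for _symbol, count in re.findall(r"([X9])(?:\((\d+)\))?", picture.upper()):
        length += int(count) if count else 1
    return length

# Both spellings of a three-digit numeric field describe the same width.
print(picture_length("9(3)"), picture_length("999"))
```

Under these assumptions, the two calls report the same field width, which is the point made in the paragraph above.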
So, for example, a customer information record could be defined as shown in Table 2, wherein the customer name may have up to 30 characters, the telephone number 10 digits, and the account number 18 alphanumeric characters.
TABLE 2
Example Customer Information Defined in COBOL
    01  Data-Field
        02  Customer-Name         Pic X(30).
        03  Customer-Telephone    Pic 9(10).
        04  Customer-Acctnum      Pic X(18).
This type of customer data is often organized into arrays or tables of information, such as arrangements employed by relational database application programs.
Implementation and storage layout of an array structure varies by language. The COBOL language table structure, implemented by use of the COBOL “occurs” clause, stores array elements in consecutive memory locations.
For example, a one-dimensional array T beginning at memory location X, containing six elements e, where each element is four characters in length, is defined in a COBOL program as follows:
    01  T.
        05  e OCCURS 6 TIMES PICTURE 9999.
FIG. 1 shows the table represented by this data structure definition. As in other programming languages, COBOL organizes data into arrays according to their definitions with respect to size (e.g. dimension) and field types. Multiple methods of declaring such tables, however, may result in different actual run time implementations of the data structure, especially with respect to the physical organization of the data when stored in memory. For example, Table 3 shows an example single-index (e.g. mono-dimension) array in COBOL, in which an 8-character alphanumeric field is defined in an array of seven fields.
TABLE 3
Example Single Dimension Table Definition in COBOL
    01  DaysOfWeek-Table.
        03  Day-Name    Pic X(8) Occurs 7 Times.
During initialization of a program using such an array or immediately following the definition of such an array as shown in Table 3, the initial values (e.g. strings containing the weekday names) of the fields could be set using a COBOL “move” verb, such as shown in Table 4.
TABLE 4
Example Table Initialization in COBOL
    Move “Monday” To DAY-Name(1)
    Move “Tuesday” To DAY-Name(2)
    Move “Wednesday” To DAY-Name(3)
    ...
    Move “Sunday” To DAY-Name(7)
Once the table is loaded or initialized, individual field values can be quickly and directly accessed using the day number index, and specialized verbs such as the COBOL “search” verb, can be used to inspect or verify the information.
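The direct access by index described above follows from the fixed element width: element i of a single-dimension table starts (i - 1) × width characters from the start of the table. The following Python sketch models this under stated assumptions. The class FixedWidthTable and its method names are hypothetical, the buffer is a stand-in for COBOL working storage, and the move method imitates an alphanumeric move (left-justified, space-padded, truncated on the right).

```python
class FixedWidthTable:
    """A COBOL-style one-dimensional table held in one contiguous buffer."""

    def __init__(self, occurs: int, width: int):
        self.occurs = occurs
        self.width = width
        # All elements live back to back, initially filled with spaces.
        self.storage = bytearray(b" " * (occurs * width))

    def offset(self, index: int) -> int:
        """Byte offset of a 1-based element index from the start of the table."""
        if not 1 <= index <= self.occurs:
            raise IndexError(index)
        return (index - 1) * self.width

    def move(self, value: str, index: int) -> None:
        """Store value left-justified, space-padded, truncated to the field width."""
        start = self.offset(index)
        data = value.encode("ascii")[: self.width].ljust(self.width)
        self.storage[start:start + self.width] = data

    def get(self, index: int) -> str:
        start = self.offset(index)
        return self.storage[start:start + self.width].decode("ascii")


days = FixedWidthTable(occurs=7, width=8)   # mirrors Pic X(8) Occurs 7 Times
days.move("Monday", 1)
days.move("Tuesday", 2)
print(days.offset(2), repr(days.get(2)))
```

Because every element has the same width, no search is needed to locate an element; the offset is computed arithmetically from the index.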
However, most business oriented data cannot be as simply organized as a single-dimension array. For example, customers may be organized by product type, sales volume, geographic location, etc. COBOL allows for multi-dimensional tables to be defined essentially as tables of tables, as shown in Table 5.
TABLE 5
Example Multi-Dimensional Table Definition in COBOL
    01  Sales-Transactions.
        03  Customer-Num Occurs 100 Times Indexed by Cust-Index
            05  Order-Num Pic X(3) Occurs 15 Times Indexed by Order-Index
                07  Order-Items Pic X(45) Occurs 25 Times Indexed by Item-Num
In this example, 100 different customers, each with a customer number, are tracked for 15 orders (each with an order number), with each order having up to 25 items listed or described using up to 45 characters in each item description. In COBOL implementation, this is realized as 15 tables of 25 fields, further organized into an array of 100 tables (e.g. 100×15×25).
Multidimensional COBOL tables are stored in row major order, in which rows are placed one after another in memory, as is described in the text “Data Structures & Their Algorithms” by Harry R. Lewis and Larry Denenberg (HarperCollins, 1991). The row is defined by the first index: for a table having three dimensions and indexed by the 3-tuple (x, y, z), x is the index for the row.
For example, a two-dimensional array T beginning at memory location X, containing two rows and three columns, where each element (x) is two characters in length and each element (y) is four characters in length, is defined in a COBOL program with the three COBOL statements:
    01  T.
        05  x OCCURS 2 TIMES PICTURE XX.
            10  y OCCURS 3 TIMES PICTURE 9999.
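Row major order can be made concrete with a short Python sketch that converts a 1-based index tuple into an offset from the start of the table. The function name row_major_offset is hypothetical, and the sketch assumes every element has the same size, which is a simplification: it ignores any fields that belong to the enclosing group levels. The 100 × 15 × 25 shape and the 45-character element size are borrowed from the sales example purely for illustration.

```python
from typing import Sequence

def row_major_offset(indices: Sequence[int], dims: Sequence[int], elem_size: int) -> int:
    """Offset of a 1-based index tuple in a row-major table of uniform elements.

    The last index varies fastest; the first index (the row) varies slowest.
    """
    if len(indices) != len(dims):
        raise ValueError("one index is required per dimension")
    position = 0
    for index, size in zip(indices, dims):
        if not 1 <= index <= size:
            raise IndexError(f"index {index} outside 1..{size}")
        # Horner-style accumulation: shift by the size of this dimension, then add.
        position = position * size + (index - 1)
    return position * elem_size

dims = (100, 15, 25)  # shape borrowed from the sales example (simplified)
print(row_major_offset((1, 1, 2), dims, 45))   # next item in the same order
print(row_major_offset((1, 2, 1), dims, 45))   # first item of the next order
print(row_major_offset((2, 1, 1), dims, 45))   # first item of the next customer
```

Stepping the last index moves by one element, while stepping the first index skips over an entire block of 15 × 25 elements, which is what placing rows one after another in memory amounts to.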
FIG. 2 depicts how this structure is represented as a table in contiguous storage. This method of storage provides an efficient means for fast access of the table elements, located in main memory, by the application program but can be inefficient for long term storage and retrieval, where storage space is at a premium.
For example, databases may store sparse arrays (partially filled arrays) by using linked lists or a hierarchical table of pointers. During processing the application program may decide to store the array elements in a database.
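The contrast between contiguous storage and a sparse representation can be sketched in Python. This is only an illustration of the hierarchical table of pointers idea, using nested dictionaries as the pointer tables; the class SparseTable and its methods are hypothetical and do not describe how any particular database stores arrays.

```python
from math import prod

class SparseTable:
    """Stores only the populated elements of an n-dimensional table.

    Each level of nested dictionaries plays the role of one table of pointers.
    """

    def __init__(self):
        self.root = {}

    def put(self, indices, value):
        node = self.root
        for i in indices[:-1]:
            node = node.setdefault(i, {})   # create the next level on demand
        node[indices[-1]] = value

    def get(self, indices, default=None):
        node = self.root
        for i in indices[:-1]:
            node = node.get(i)
            if node is None:
                return default              # nothing stored under this prefix
        return node.get(indices[-1], default)

    def count(self, node=None):
        """Number of stored leaf values."""
        node = self.root if node is None else node
        return sum(self.count(v) if isinstance(v, dict) else 1 for v in node.values())


dims = (100, 15, 25)
table = SparseTable()
table.put((1, 1, 1), "first item")
table.put((42, 7, 3), "another item")
print(prod(dims), "cells in the dense layout;", table.count(), "stored sparsely")
```

The dense layout reserves space for every cell whether or not it is used, while the sparse form grows only with the number of populated elements, at the cost of following one pointer per dimension on each access.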
n-Dimensional Array Data Mapping Between Markup Language and COBOL
An XML document request can originate from an external source (e.g. another company, another agency, another enterprise, etc.), or it can be generated by a network server that requires an XML interface to access mainframe applications. Mapping an XML document to COBOL data structures is distinct from the challenge of developing algorithms for the efficient storage and retrieval of XML documents on mass media.
An XML parser and mapper are somewhat straightforward to develop for simple XML documents. The challenges, however, arise when mapping XML documents that may contain multi-dimensional table or array data. This requirement, though, is actually very common.
For example, a system for displaying information about an athletic sport league might use an XML document to display statistics for an individual player, within a team, within a conference, within a division, and within a league. The resulting structure would be a 4-dimensional array.
An example using the National Football League (“NFL”) can be used to illustrate the general conceptual procedure for mapping a data element into the result array. In this example, the NFL divisions are the first dimension (w), the conferences are the second dimension (x), the teams are the third dimension (y), and the players are the fourth dimension (z). If a team name is parsed from the XML document containing the relevant statistics, the following procedure must be followed to map the team name into the result array:
    1. Identify the result array dimension. The parsed XML tag and element might appear as:
            <team_name>Dolphins</team_name>
       The tag name can be used to determine that team_name belongs to the teams dimension (the third dimension).
    2. Navigate to the identified target dimension. The Dolphins are a team in the AFC East conference, which is in the AFC division. If we have previously parsed and mapped elements for the NFC division, and AFC West, AFC Central, and AFC East conferences in the AFC division, then we have progressively navigated through (1, x, y, z), (2, 1, y, z) and (2, 2, y, z). We are currently processing teams in the AFC East conference, so we know that the array index will be (2, 3, y).
    3. Determine the target dimension array index. In this case, the target dimension array index is the value for y. The system, then, searches for the first empty team_name bucket in the target dimension (y). If this is the third team that we have processed in the AFC East, then the third bucket will be empty, and the target dimension array index will be 3.
    4. Move the current data value into the target field in the array (in the general case, the array field may or may not be empty). In this example, we finally move the value “Dolphins” to (2, 3, 3) in the result array.
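The four steps above can be sketched in Python as a naive baseline. This is a minimal illustration of the conceptual procedure only, not the method of the invention. The names TAG_DIMENSION, first_empty_slot, and map_element are hypothetical, the result array is modeled as a dictionary keyed by 1-based index tuples rather than as a COBOL table, the dimension sizes are invented for the example, and the tag is assumed to be spelled team_name.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from tag name to the 1-based dimension it belongs to.
TAG_DIMENSION = {"team_name": 3}

def first_empty_slot(result, prefix, size):
    """Step 3: linear search for the first unused index under the given prefix."""
    for i in range(1, size + 1):
        if prefix + (i,) not in result:
            return i
    raise ValueError("no empty bucket in target dimension")

def map_element(result, current, xml_fragment, dim_sizes):
    """Map one parsed element into the result array and return its index tuple."""
    elem = ET.fromstring(xml_fragment)
    dim = TAG_DIMENSION[elem.tag]                                # step 1: identify dimension
    prefix = tuple(current[:dim - 1])                            # step 2: navigate to it
    index = first_empty_slot(result, prefix, dim_sizes[dim - 1]) # step 3: find target index
    result[prefix + (index,)] = elem.text                        # step 4: move the value
    return prefix + (index,)

# Two teams (placeholder names) were already mapped under division 2, conference 3.
result = {(2, 3, 1): "Team A", (2, 3, 2): "Team B"}
where = map_element(result, [2, 3], "<team_name>Dolphins</team_name>", (2, 3, 5, 50))
print(where, result[where])
```

Note that step 3 here is a linear scan that treats any populated bucket as occupied, so it would behave poorly when the result array has been pre-loaded with data; that limitation is the kind of difficulty discussed in the paragraph that follows.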
However, actually designing a software implementation for this general conceptual procedure presents challenges to developing an efficient method to navigate the result array, and for determining the target dimension array index, especially if the result array has been pre-loaded with state data. Therefore, there is a need in the art for a method and system which efficiently maps multi-dimensional table array data to and from XML and COBOL.