1. Field of the Invention
The invention relates to the field of data processing applications. Specifically, the invention relates to selecting data from data sources and communicating the selected data to processes with predefined input syntax.
2. Description of the Related Art
Data processing frequently involves retrieving stored data, processing the retrieved data, and storing or otherwise using the processed data. These functions may be embodied in a data processing application that is part of a larger data processing system. In many cases, the process is independent from the storage and retrieval system and a predefined syntax is used to communicate data between the two. Some storage and retrieval systems restrict how data is stored and retrieved. For example, a storage and retrieval system may group data into files, records, and fields, and typically provide access to the data only at the field level or higher. Similarly, the process itself may only accept and return field level data. Accordingly, the syntax for communicating the data to the process may also be limited to field level data. The lowest level data set handled by a storage and retrieval system, process, or communication syntax may be referred to as a data segment. In the example above, the data segments are fields.
For additional understanding of the limitations in some data processing systems, a look at how digital data is stored may be instructive. Digital data is frequently stored in one or more data sources, such as a database or data warehouse. The data may be stored according to a variety of data structures corresponding to physical memory locations. The physical data structure may, in turn, correspond to one or more storage/access structures, such as a file structure or hierarchical or relational database structure. For example, the data may be divided into files, which are divided into records, which are divided into fields. Fields may actually include a sequence of bits (1s and 0s). Depending on the data type of the field, the sequence of bits may be translated into another format, such as integers, floating point numbers, strings, logic values, or other formats. Fields may be viewed as being composed of one or more sub-segments based upon their data type. For example, an integer or floating point number may include one or more decimal digits, and a string may include one or more characters. These individual decimal digits or characters are the field's sub-segments. Sub-segments may correspond directly to a fixed number of bits, such as characters corresponding to bytes, packed decimal digits corresponding to four-bit nibbles, or Boolean or binary data corresponding to a single bit. Other data, such as some integers, may be stored in variable bit length sub-segments. Fields may also include one or more bits corresponding to other information, such as a sign or the location of a decimal point. In some cases, this additional information may be coded into a particular sub-segment in the field. Other data processing systems may handle data using data segments other than fields that are similarly composed of sub-segments.
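By way of illustration only, the relationship between a field and its sub-segments may be sketched as follows. The function names are hypothetical and are not part of any particular storage and retrieval system. The sketch assumes a packed decimal convention in which the final nibble carries the sign, with 0xB or 0xD taken to indicate a negative value; other encodings may differ.

```python
def character_sub_segments(field: bytes) -> list[bytes]:
    """Split a character field into one-byte sub-segments."""
    return [field[i:i + 1] for i in range(len(field))]


def packed_decimal_sub_segments(field: bytes) -> list[int]:
    """Split a packed decimal field into four-bit nibbles, high nibble first."""
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)      # upper four bits
        nibbles.append(byte & 0x0F)    # lower four bits
    return nibbles


def decode_packed_decimal(field: bytes) -> int:
    """Decode a packed decimal field whose last nibble is a sign code.

    Assumption: sign nibbles 0xB and 0xD mean negative; any other
    sign nibble is treated as non-negative.
    """
    *digits, sign = packed_decimal_sub_segments(field)
    value = 0
    for digit in digits:
        value = value * 10 + digit
    return -value if sign in (0x0B, 0x0D) else value
```

In this sketch, the two-byte field 0x12 0x3C is composed of four nibble sub-segments, the last of which codes the sign rather than a digit.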
It may be desirable to process only a portion of the data segment, in other words, to process a sub-segment or a group of sub-segments. For example, a user may only want to enact a process on the first three characters (three sub-segments) of a string field (the segment). Many data processing systems are not designed to easily retrieve, communicate, or process sub-segments. Further, many processes are designed to receive and operate on data segments and do not include logic for selectively handling sub-segments. As a result, it may be difficult in some systems to select only a portion of a field's data for processing.
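The three-character example above may be sketched as follows, purely for illustration. The helper name `apply_to_sub_segments` and its parameters are hypothetical; the sketch assumes a string field and a process that accepts and returns a whole string of unchanged length.

```python
from typing import Callable


def apply_to_sub_segments(
    field: str,
    start: int,
    length: int,
    process: Callable[[str], str],
) -> str:
    """Run `process` on `length` characters of `field` beginning at `start`.

    The remaining characters of the field are left untouched.
    """
    if start < 0 or length < 0 or start + length > len(field):
        raise ValueError("selection falls outside the field")
    selected = field[start:start + length]
    processed = process(selected)
    if len(processed) != length:
        # Assumption: the process must preserve the selection's length
        # so the field keeps its original size.
        raise ValueError("process changed the length of the selection")
    return field[:start] + processed + field[start + length:]
```

For instance, `apply_to_sub_segments("abcdef", 0, 3, str.upper)` applies the process to the first three characters only. A process that accepts only whole fields has no equivalent of the `start` and `length` parameters, which is the limitation described above.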
One environment in which processing sub-segments of data segments is sometimes desirable is within a mainframe processor connected to one or more data sources, for example, an IBM™ mainframe connected to a storage area network, wherein a data processing application accesses data stored in one or more storage and retrieval platforms. For example, data in the data sources may be stored according to the record layouts of one or more high level programming languages, such as COBOL or PL/I. The application may allow a user to identify a file, identify a record layout, select a field, select a process to be executed on that field, and specify what is to be done with the output (e.g., update the field, store results elsewhere, pass the results to another process, etc.). The process may include one or more black box functions accepting a predefined set of parameters according to a predefined syntax. Prior applications have not provided a way to select sub-segments as one of the parameters for processing.
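A field-level application of the kind described above may be sketched, under stated assumptions, as follows. The names `FieldSpec` and `process_field`, the fixed-width character layout, and the sample record are hypothetical and serve only to show that the smallest unit passed to the process is an entire field.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class FieldSpec:
    """One entry of a hypothetical fixed-width record layout."""
    name: str
    offset: int
    length: int


def process_field(
    record: str,
    layout: Sequence[FieldSpec],
    field_name: str,
    process: Callable[[str], str],
) -> str:
    """Pass one whole field to `process` and update the record in place."""
    spec = next(f for f in layout if f.name == field_name)
    end = spec.offset + spec.length
    result = process(record[spec.offset:end])
    # Pad or truncate so the updated field still fits the layout.
    result = result.ljust(spec.length)[:spec.length]
    return record[:spec.offset] + result + record[end:]


# Hypothetical layout: a 6-character account number and a 10-character name.
layout = [FieldSpec("ACCOUNT", 0, 6), FieldSpec("NAME", 6, 10)]
record = "123456JOHN DOE  "
updated = process_field(record, layout, "NAME", str.lower)
```

The user in this sketch can choose which field is processed, but cannot direct the process to only part of that field.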
In the past, there have been two common solutions to the lack of sub-segment processing. One solution was to write a specific application to select the sub-segments and process the sub-segments in the desired way. This solution had to be applied to every file format that required sub-segment processing. The need for substantial, repeated application development time and associated costs makes this solution less desirable.
The other solution was to create alternate record layouts, reformat the data to the new format, process the files, and then reformat them back to the original formats. Again, this solution would have to be applied to every file format that required sub-segment processing. It would also require substantial, repeated development time for the new layouts, though probably less development time than the first solution. However, it would also require additional processing time since the files would have to be processed three times, and would require additional storage space for the reformatted files.
A data processing application that allows easy selection and processing of sub-segments from within the data processing system's data segments is desirable. A solution that does not require the development of separate applications or record layouts for each file format would be a vast improvement over the prior solutions.