1. Field of the Invention
This invention relates to sorting data and more particularly relates to sorting character representations of numeric data.
2. Description of the Related Art
Data sets often store data using an encoding scheme other than a character encoding scheme that may be displayed in a form easily recognized by a user. For example, a data set may store values in a predefined numbering system such as a hexadecimal number system or a binary number system. The user often converts data in the predefined numbering system to a character representation for analysis. Data sets may include database records, log records, trace records, and system dump records. Numeric data values of the data set are frequently converted to a character representation by formatting tools to improve readability. Typically, the data set is presented to a user in a series of data lines. Formatting tools generally sort the data lines using raw data (data not converted to a character representation) for one or more data values and add delimiter information including titles, page headers, page footers, and the like.
Unfortunately, if a user requires that the data lines of a data set be sorted in an alternate manner or using an alternate data value as a sort handle, the data lines often do not sort correctly because the character representations of the data values are not ordered the same as the data in the original number system. For example, the character representation of the hexadecimal value ‘F9’x (249) is ‘F9’. To sort this data, the computer sorts on the encoded value of the character representation rather than on the original numeric value. In one example using the Extended Binary Coded Decimal Interchange Code (EBCDIC) character set, the value ‘C6F9’x represents ‘F9’. So, the computer sorts using the value ‘C6F9’x for ‘F9’x (249). Similarly, ‘C6C1’x is used to sort the character representation ‘FA’ (250) of hexadecimal value ‘FA’x. Thus, although ‘F9’ should precede ‘FA’ in an ascending sort, ‘FA’ will in fact precede ‘F9’ because ‘C6C1’x precedes ‘C6F9’x. In addition, titles, page headers, and page footers can also be sorted into arbitrary and useless positions in the data set.
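The ordering mismatch described above can be reproduced in a few lines. The following is a minimal sketch, assuming Python's built-in `cp037` codec as a stand-in for the EBCDIC character set; the function name `ebcdic_key` is hypothetical and is not part of any formatting tool discussed here.

```python
# Assumption: Python's "cp037" codec is used as a representative EBCDIC code page.
def ebcdic_key(text):
    """Return the EBCDIC bytes that a character-based sort would compare."""
    return text.encode("cp037")

values = ["F9", "FA"]

# Sorting on the character encoding compares 'C6C1'x ('FA') with 'C6F9'x ('F9').
by_character = sorted(values, key=ebcdic_key)

# Sorting on the original numeric value compares 249 with 250.
by_number = sorted(values, key=lambda v: int(v, 16))

print(ebcdic_key("F9").hex().upper())  # C6F9
print(ebcdic_key("FA").hex().upper())  # C6C1
print(by_character)                    # ['FA', 'F9']
print(by_number)                       # ['F9', 'FA']
```

The two sorts disagree only because the EBCDIC codes for the letters A-F are lower than the codes for the digits 0-9.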
FIG. 1 illustrates a conventional presorted data set 100. A formatting tool has added a title 120 and one or more page headers 140 to a plurality of data lines 150. Generally, the formatting tool sorts the data lines 150 according to some fixed sort criteria. Each data line 150 includes at least one data value 160 that may be selected as a sort data value 160 and may also contain at least one data value 170 that is not selected as a sort data value.
In a hypothetical situation, the sort data value 160 is a character representation of hexadecimal data. Also, the sort data values 110 wrap around from a large value of a set of sequentially ordered values to resume sequential ordering with sort data values 130 from near the beginning of the set of sequentially ordered values. In other words, the sort data value 160 has a fixed number of digits. Consequently, as the sequence numbers of the sort data value 160 exceed the highest value that can be represented with the fixed number of digits, the sort data value wraps back to all zeros for each digit. This wrapping is easily conceptualized using a conventional odometer as an example. Once every place on the odometer reaches the digit 9, the odometer wraps to all zeros.
The numerically larger sort data values 110 are referred to herein as pre-wrap sort data values 110 and the numerically smaller sort data values 130 subsequent to the wrap are referred to herein as post-wrap sort data values 130. Although the pre-wrap sort data values 110 are numerically larger than the post-wrap sort data values 130, the pre-wrap sort data values 110 represent data values that occur sequentially before the post-wrap sort data values 130. The data values of the pre-wrap sort data values 110 reach a maximum value and wrap to resume sequencing at the lower numeric values of the post-wrap sort data values 130.
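The wrap behavior can be illustrated with a short sketch. It assumes four-digit hexadecimal sequence values and a hypothetical pivot of ‘8000’x that separates pre-wrap from post-wrap values; the pivot, the field width, and the name `wrap_aware_key` are assumptions made for illustration only and are not taken from the description above.

```python
def wrap_aware_key(value, digits=4, pivot=0x8000):
    """Rotate the number line so values at or above the pivot sort first.

    Assumption: every value >= pivot is a pre-wrap value and every value
    below the pivot is a post-wrap value.
    """
    modulus = 16 ** digits
    return (int(value, 16) - pivot) % modulus

values = ["0001", "FFFF", "0000", "FFFE"]

# A plain numeric sort puts the post-wrap values first.
numeric_order = sorted(values, key=lambda v: int(v, 16))

# The rotated key restores the order in which the values were generated.
wrap_order = sorted(values, key=wrap_aware_key)

print(numeric_order)  # ['0000', '0001', 'FFFE', 'FFFF']
print(wrap_order)     # ['FFFE', 'FFFF', '0000', '0001']
```

A fixed pivot only works when the pre-wrap and post-wrap ranges do not overlap it, so the sketch shows the nature of the problem rather than a general solution.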
FIG. 2 illustrates a sorted data set 200. In the depicted sorted data set 200, a user has sorted the presorted data set 100 of FIG. 1 by the sort data value 160. The sort data values 220, 260 containing the character representations of the hexadecimal digits A-F incorrectly sort ahead of the sort data values 240, 280 containing the character representations of hexadecimal digits 0-9. The post-wrap sort data values 130 precede the pre-wrap sort data values 110, although as explained with reference to FIG. 1, the pre-wrap sort data values 110 should sequentially come before the post-wrap sort data values 130. Although the user might prefer to retain the title 120 and page headers 140 in the positions each occupied relative to the beginning of the report and the beginning of each page, the title 120 and the page headers 140 are grouped at the beginning of the data set. Thus, the data lines 150 of the sorted data set 200 are not sorted properly.
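The delimiter problem can be made concrete with a sketch that sorts only the data lines and leaves the title and page headers in their original positions. The report layout, the `DATA_LINE` pattern, and the function name `sort_data_lines_in_place` are all hypothetical; the sketch assumes that a data line begins with a four-digit hexadecimal field and that no delimiter line does.

```python
import re

# Assumption: data lines start with a 4-digit uppercase hexadecimal field.
DATA_LINE = re.compile(r"^[0-9A-F]{4}\b")

def sort_data_lines_in_place(lines, key):
    """Sort the data lines by `key` while keeping delimiter lines in place."""
    ordered = iter(sorted((ln for ln in lines if DATA_LINE.match(ln)), key=key))
    # Refill each data-line slot in sorted order; copy delimiters unchanged.
    return [next(ordered) if DATA_LINE.match(ln) else ln for ln in lines]

report = [
    "TRACE REPORT",
    "PAGE 1",
    "00FA event c",
    "00F9 event a",
    "PAGE 2",
    "0003 event b",
]

# Sort on the numeric value of the leading field, not on its characters.
result = sort_data_lines_in_place(report, key=lambda ln: int(ln[:4], 16))

for line in result:
    print(line)
```

In this sketch the title and both page headers stay at indexes 0, 1, and 4, while the three data lines are reordered around them.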
From the foregoing discussion, it should be apparent that a need exists for an apparatus, system, and method that sort character representations of sort data values in data sets consistent with the original numeric encoding. The apparatus, system, and method should retain the original formatting, including properly placed delimiters, of the presorted data sets. In addition, there exists a need for an apparatus, system, and method that sort pre-wrap and post-wrap data values.