As is well known, the Unicode standard for character representation, in the two-byte (“double byte” in some terminology) form considered here (UCS-2, or UTF-16 within the Basic Multilingual Plane), represents each character with 16 bits, or two bytes, of information. This provides a vastly expanded range of representable characters, including those of languages in which ideographs, rather than individual letters, are used to represent words and ideas. This is in contrast to ASCII or EBCDIC character representations, which use a single byte per character and therefore provide at most 256 distinct codes for characters and control signals.
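One practical consequence for the decimal instructions discussed below is that a decimal digit carries its value in the rightmost four bits of its code in UTF-16, ASCII, and EBCDIC alike. The following Python sketch checks this with the standard `utf-16-be`, `ascii`, and `cp037` (an EBCDIC code page) codecs; the function name `encode_digit` is illustrative only.

```python
def encode_digit(ch: str) -> dict:
    """Return the byte encoding of one character under three encodings."""
    return {
        "utf-16-be": ch.encode("utf-16-be"),  # two bytes per BMP character
        "ascii": ch.encode("ascii"),          # one byte, 7-bit code
        "ebcdic-cp037": ch.encode("cp037"),   # one byte, EBCDIC code page 037
    }


for name, raw in encode_digit("7").items():
    # In every case the digit value sits in the rightmost four bits of the last byte.
    print(name, raw.hex(), "low nibble =", raw[-1] & 0x0F)
```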
It is also known that each byte (eight bits) in a data processing system can represent two decimal digits, since each digit requires only four bits. However, decimal numbers are often provided in a format in which each byte contains the representation of only one decimal digit. It is therefore convenient to be able to PACK decimal numbers (or other data), that is, to convert them from a one-digit-per-byte format to a two-digits-per-byte format. This is typically accomplished with some form of PACK instruction provided as a basic member of a computer's instruction set. Such instructions usually come in PACK/UNPACK pairs.
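Because a decimal digit 0-9 fits in four bits, packing amounts to shifting one digit into the high-order half of a byte and placing the next digit in the low-order half. The minimal Python sketch below shows the generic transformation and its inverse; `pack_digits` and `unpack_digits` are hypothetical helpers, and unlike a real PACK instruction they neither add nor interpret a sign.

```python
def pack_digits(digits: bytes) -> bytes:
    """Pack one-digit-per-byte values (0-9) into two-digits-per-byte form.

    An odd digit count is handled by padding with a leading zero digit.
    No sign is added: this is a generic illustration, not a PACK instruction.
    """
    if any(d > 9 for d in digits):
        raise ValueError("each input byte must hold a single decimal digit 0-9")
    if len(digits) % 2:
        digits = b"\x00" + digits
    # High-order four bits take the first digit, low-order four bits the second.
    return bytes((digits[i] << 4) | digits[i + 1] for i in range(0, len(digits), 2))


def unpack_digits(packed: bytes) -> bytes:
    """Inverse of pack_digits: split each byte into its two 4-bit digits."""
    out = bytearray()
    for b in packed:
        out += bytes((b >> 4, b & 0x0F))
    return bytes(out)


print(pack_digits(bytes([1, 2, 3, 4])).hex())  # 1234
print(pack_digits(bytes([1, 2, 3])).hex())     # 0123
```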
Also relevant to the present discussion are the notions of big-endian and small-endian byte ordering. These concepts relate to where in memory the high-order byte of an integer (or other multibyte data item) is stored. In the big-endian scheme, the most significant byte of the integer is stored at the lowest memory address. In the small-endian scheme, the most significant byte of the integer is stored at the highest memory address. The Intel x86 processors, and chips that duplicate their functionality such as those produced by Advanced Micro Devices, Inc., use the small-endian (more commonly called little-endian) format. zSeries machines and most PowerPC devices use the big-endian format.
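The difference can be seen by serializing the same integer both ways. The sketch below also suggests why byte order matters when emulating a big-endian instruction on a little-endian host: a two-byte Unicode character stored in big-endian order has its digit in the low-order four bits only when it is read back as big-endian. It assumes, for illustration, that the relevant storage is available as a Python `bytes` object.

```python
value = 0x12345678

big = value.to_bytes(4, "big")        # most significant byte at the lowest address
little = value.to_bytes(4, "little")  # most significant byte at the highest address
print(big.hex(), little.hex())        # 12345678 78563412

# The Unicode digit "5" (U+0035) as stored in big-endian memory: bytes 00 35.
stored = "5".encode("utf-16-be")
as_big = int.from_bytes(stored, "big")        # 0x0035: digit in the low nibble
as_little = int.from_bytes(stored, "little")  # 0x3500: digit no longer in the low nibble
print(hex(as_big), hex(as_little))            # 0x35 0x3500
```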
PKU is an instruction in the well-known zSeries computer architecture as found in products manufactured and sold by the assignee of the present invention. Descriptions of this and other instructions are found in the Principles of Operation (PoP) manuals published as accompanying documentation for these data processing products. This particular instruction converts a Unicode string to packed decimal format. The format of the PKU instruction is “PKU TARGET, SOURCE (L2)”, where L2 designates the length of the second operand (at most 64 bytes). The length of the target is always 16 bytes. A sample program included herein as Appendix I describes an approach to providing emulation code for the PKU (Pack Unicode) instruction; Appendix I thus illustrates the block-level algorithm used herein.
The format of the second operand is changed from Unicode to packed, and the result is placed at the first-operand location. The packed format is described in Chapter 8, “Decimal Instructions.”
The two-byte second-operand characters are treated as Unicode Basic Latin characters containing decimal digits, having the binary encoding 0000-1001 for 0-9, in their rightmost four bit positions. The leftmost 12 bit positions of a character are ignored. The second operand is considered to be positive.
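The character-to-digit step can be expressed as a mask of the low-order four bits of each two-byte code unit. The sketch below assumes the operand is read in big-endian (z/Architecture storage) order, and `unicode_char_to_digit` is a hypothetical name. Because the leftmost 12 bits are ignored, a character outside Basic Latin such as U+FF15 (FULLWIDTH DIGIT FIVE), or even a letter, still yields a four-bit value.

```python
def unicode_char_to_digit(char_bytes: bytes) -> int:
    """Return the rightmost four bits of one big-endian two-byte character.

    The leftmost 12 bits are ignored and the value is not validated,
    so codes 10-15 would pass through unchanged.
    """
    if len(char_bytes) != 2:
        raise ValueError("a Unicode character here occupies exactly two bytes")
    code_unit = int.from_bytes(char_bytes, "big")
    return code_unit & 0x000F


print(unicode_char_to_digit("5".encode("utf-16-be")))       # 5 (U+0035)
print(unicode_char_to_digit("\uff15".encode("utf-16-be")))  # 5 (U+FF15)
print(unicode_char_to_digit("A".encode("utf-16-be")))       # 1 (U+0041, not a digit)
```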
The implied positive sign (1100 binary) and the source digits are placed at the first-operand location. The source digits are moved unchanged and are not checked for valid codes. The sign is placed in the rightmost four bit positions of the rightmost byte of the result field, and the digits are placed adjacent to the sign and to each other in the remainder of the result field.
The result is obtained as if the operands were processed right to left. When necessary, the second operand is considered to be extended on the left with zeros.
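The placement rules can be sketched as filling the 32 four-bit positions of the 16-byte result from the right: the sign first, then each digit moving leftward, with any untouched positions left as zero. The helper `pack_into_result` is hypothetical and takes digit values that have already been extracted from the source characters; since processing runs right to left, digits that do not fit in the 31 available positions fall off the left end.

```python
from typing import Sequence

POSITIVE_SIGN = 0xC  # implied positive sign, 1100 binary
RESULT_LENGTH = 16   # first-operand length in bytes


def pack_into_result(digits: Sequence[int]) -> bytes:
    """Place digits and a positive sign into a 16-byte packed field, right to left."""
    nibbles = [0] * (RESULT_LENGTH * 2)  # unfilled positions act as left zero extension
    nibbles[-1] = POSITIVE_SIGN          # sign in the rightmost four bits
    pos = len(nibbles) - 2
    for d in reversed(digits):
        if pos < 0:
            break                        # no room left: leftmost digits are dropped
        nibbles[pos] = d & 0x0F          # digits moved unchanged, not checked
        pos -= 1
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))


print(pack_into_result([1, 2, 3]).hex())  # 28 zeros followed by 123c
```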
The length of the first operand is 16 bytes.
The byte length of the second operand is designated by the contents of the L2 field. The second-operand length must not exceed 32 characters or 64 bytes, and the byte length must be even (L2 must be less than or equal to 63 and must be odd); otherwise, a specification exception is recognized.
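A length check consistent with this paragraph is sketched below. It assumes the customary z/Architecture convention that a length field holds the operand length minus one, so an odd L2 value of at most 63 corresponds to an even byte length of at most 64. `SpecificationException` and `pku_source_length` are hypothetical names standing in for the architected exception and an emulator's decoding step.

```python
class SpecificationException(Exception):
    """Hypothetical stand-in for the architected specification exception."""


def pku_source_length(l2_field: int) -> int:
    """Return the PKU second-operand byte length designated by the L2 field.

    Assumes byte length = L2 + 1, so L2 must be odd and at most 63.
    """
    if not (0 <= l2_field <= 63) or l2_field % 2 == 0:
        raise SpecificationException(f"invalid PKU L2 field: {l2_field}")
    return l2_field + 1


print(pku_source_length(63))  # 64 bytes, i.e. 32 characters
```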
When the length of the second operand is 32 characters (64 bytes), the leftmost character is ignored.
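Taken together, the rules above suggest an emulation routine along the following lines. This is a minimal Python sketch under the same assumptions (big-endian storage, byte length of L2 + 1), not the Appendix I program; all names are illustrative. Note that 16 bytes hold 32 four-bit positions, one of which is the sign, which is why only 31 digits fit and the leftmost of 32 characters is ignored.

```python
POSITIVE_SIGN = 0xC                 # implied positive sign (1100 binary)
RESULT_LENGTH = 16                  # first operand is always 16 bytes
MAX_DIGITS = RESULT_LENGTH * 2 - 1  # 31 digit positions plus 1 sign position


class SpecificationException(Exception):
    """Hypothetical stand-in for the architected specification exception."""


def emulate_pku(source: bytes, l2_field: int) -> bytes:
    """Sketch of Pack Unicode (PKU) semantics.

    source: second-operand bytes as they appear in (big-endian) storage.
    l2_field: contents of the L2 field; byte length assumed to be L2 + 1.
    """
    if not (0 <= l2_field <= 63) or l2_field % 2 == 0:
        raise SpecificationException(f"invalid L2 field: {l2_field}")
    length = l2_field + 1
    if len(source) < length:
        raise ValueError("source buffer shorter than the designated length")

    # Rightmost four bits of each two-byte character; leftmost 12 bits ignored,
    # and the digit codes are not checked.
    digits = [int.from_bytes(source[i:i + 2], "big") & 0x0F for i in range(0, length, 2)]
    # A 32-character operand has one digit too many: drop the leftmost.
    digits = digits[-MAX_DIGITS:]
    # Extend on the left with zeros, then append the positive sign.
    nibbles = [0] * (MAX_DIGITS - len(digits)) + digits + [POSITIVE_SIGN]
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))


result = emulate_pku("12345".encode("utf-16-be"), 9)  # five characters = 10 bytes
print(result.hex())  # 26 zero hex digits followed by 12345c
```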
Paragraphs [0005] to [0011] above are taken from the published description of the Pack Unicode instruction in the z/Architecture Principles of Operation, document number SA22-7832-03, with a “Build Date” of May 4, 2004 12:13:20, a “Build Version” of 1.3.1 of “BUILD/VM Version: UG03935”, and a Drop Date of Thursday Aug. 8, 2003.
PKA is a zSeries instruction that converts an ASCII string to packed decimal format. The format of the PKA instruction is “PKA TARGET, SOURCE (L2)”, where L2 designates the length of the second operand (at most 32 bytes). The length of the target is always 16 bytes.
The format of the second operand is changed from ASCII to packed, and the result is placed at the first-operand location. The packed format is described in Chapter 8, “Decimal Instructions.”
The second-operand bytes are treated as containing decimal digits, having the binary encoding 0000-1001 for 0-9, in their rightmost four bit positions. The leftmost four bit positions of a byte are ignored. The second operand is considered to be positive.
The implied positive sign (1100 binary) and the source digits are placed at the first-operand location. The source digits are moved unchanged and are not checked for valid codes. The sign is placed in the rightmost four bit positions of the rightmost byte of the result field, and the digits are placed adjacent to the sign and to each other in the remainder of the result field.
The result is obtained as if the operands were processed right to left. When necessary, the second operand is considered to be extended on the left with zeros.
The length of the first operand is 16 bytes.
The length of the second operand is designated by the contents of the L2 field. The second-operand length must not exceed 32 bytes (L2 must be less than or equal to 31); otherwise, a specification exception is recognized.
When the length of the second operand is 32 bytes, the leftmost byte is ignored.
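PKA follows the same pattern as PKU, differing only in the character width (one byte rather than two) and in the maximum operand length (32 bytes rather than 64). A correspondingly adapted Python sketch follows, again assuming a byte length of L2 + 1 and using hypothetical names.

```python
POSITIVE_SIGN = 0xC  # implied positive sign (1100 binary)
MAX_DIGITS = 31      # 16-byte result: 31 digit positions plus one sign position


class SpecificationException(Exception):
    """Hypothetical stand-in for the architected specification exception."""


def emulate_pka(source: bytes, l2_field: int) -> bytes:
    """Sketch of Pack ASCII (PKA) semantics; byte length assumed to be L2 + 1."""
    if not (0 <= l2_field <= 31):
        raise SpecificationException(f"invalid L2 field: {l2_field}")
    length = l2_field + 1
    if len(source) < length:
        raise ValueError("source buffer shorter than the designated length")
    # Rightmost four bits of each byte; the leftmost four bits are ignored.
    digits = [b & 0x0F for b in source[:length]]
    digits = digits[-MAX_DIGITS:]  # 32 bytes: the leftmost byte is ignored
    nibbles = [0] * (MAX_DIGITS - len(digits)) + digits + [POSITIVE_SIGN]
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))


print(emulate_pka(b"12345", 4).hex())  # 26 zero hex digits followed by 12345c
```

Because only the rightmost four bits of each byte are used, the sketch also produces the same result for EBCDIC digits (X'F0' through X'F9'), for example `emulate_pka("12345".encode("cp037"), 4)`.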
Paragraphs [0014] to [0020] above are taken from the published description of the Pack ASCII instruction found in the same Principles of Operation manual cited above.