1. Field of the Invention
This invention relates generally to instruction implementation and register utilization within a computer processor, and more particularly to providing a method, article, and system for the effective implementation of an instruction set that performs operations on Unicode and Unicode-transformation-format (UTF) characters, and can be implemented on 24, 31, and 64-bit architectures, while maintaining backward compatibility with existing systems.
2. Description of the Related Art
Software has become a major portion of the cost associated with computer systems because it is very “labor-intensive.” Some of this cost is due to the effort involved in writing and debugging programs; other costs involve maintaining programs after they have been written. Accordingly, considerable effort has been expended to reduce the time and costs involved in writing, debugging, and maintaining moderate and large software programs. Much of this effort has been directed at developing programming languages and programming techniques that allow programmers to build on or “reuse” programs and code segments that have been written by others.
Until very recently, software programming was heavily dominated by an approach referred to as “structured programming.” Common software programming languages used in this approach were, and remain, BASIC, FORTRAN, COBOL, PL/1, and C. These are considered “higher order” languages that are written in human readable code and ultimately translated into machine or computer readable code by a compiler. Typically, structured programs have consisted of a combination of defined variables of specific data types, e.g., integer, real, and character, and a complementary set of functions or routines, which operate on these variables. Often, a program would include sub-routines, which are smaller routines within a program or larger routines that carry out certain operations, e.g., printing data in a given output format. The emphasis of this approach was inputs, functions, and outputs, which designers often represented as flowcharts that logically showed how the program functioned and branched into different functional paths. As an increasing number of programs became large (tens of thousands of lines of code and above), structured programs became increasingly complex and difficult to write, troubleshoot, and maintain.
In response to the unwieldy nature of structured programs and their related flowcharts, new approaches to software engineering called Object-Oriented Design (OOD) and Object-Oriented Programming (OOP) have emerged and gained increasing popularity among software developers. OOP promised greater reuse and maintainability than its structured programming predecessor because of an emphasis on well-defined and self-contained objects, rather than the structured programming emphasis on a proliferation of relatively loosely related data manipulating functions and subroutines.
Object-Oriented Programming techniques involve the definition, creation, use, and destruction of “objects.” These objects are software entities comprising data elements, or attributes, and methods, or functions, which manipulate the data elements. The attributes and related methods are treated by the software as an entity and can be created, used, and destroyed as if they were a single item. Together, the attributes and methods enable objects to model virtually any real-world entity in terms of the entity's characteristics, represented by the data elements, and the entity's behavior, represented by data manipulation functions or methods. In this way, objects can model concrete things like people and computers, and they can also model abstract concepts like numbers or geometrical designs. Object-Oriented Programming languages include C++ and Java, among others.
As previously mentioned, the “higher order” programming languages (structured or object oriented) must ultimately be translated by a compiler into machine or computer readable code so that the resulting instructions can be executed by a computing device and/or processor.
Instruction sets used in computer systems employing so-called Complex Instruction Set Computing (CISC) architecture include both simple instructions (e.g., LOAD or ADD) and complex instructions (e.g., PROGRAM CALL or LOAD ADDRESS SPACE PARAMETERS). Typical complex instruction-set computers have instructions that combine one or two basic operations (such as “add”, “multiply”, or “call subroutine”) with implicit instructions for accessing memory, incrementing registers upon use, or dereferencing locations stored in memory or registers. As an example to which the invention has particular relevance, see “z/Architecture Principles of Operation” (Publication Number SA22-7832-04, available from IBM Corporation, Armonk, N.Y.), which is incorporated herein by reference in its entirety. As these computer systems (e.g., IBM System/390, IBM System z9) have become more powerful, larger percentages of the instruction set have been implemented using hardware execution units to increase system performance. Conventionally, the complex functions are implemented in microcode because building hardware execution units to execute them is expensive and error prone. A microcode, or microprogram, implements a central processing unit (CPU) instruction set. Just as a single high level language statement is compiled to a series of machine instructions (load, store, shift, etc.), each machine instruction is in turn implemented by a series of microinstructions, sometimes called a microprogram.
The Extended-Translation Facility 3 (ETF3) is an instruction set introduced on the 113M series of z/990 processors. The z/990 processors (T-Rex GA3) are designed for use in high performance computer servers for data and transaction serving. The z/990 processors and associated computer servers are designed to support both 32-bit and 64-bit computations, as well as both structured and object oriented programming languages. ETF3 performs operations on Unicode and Unicode-transformation-format (UTF) characters. The facility consists of six instructions, which are documented in “z/Architecture Principles of Operation” (Publication Number SA22-7832-04, available from IBM Corporation, Armonk, N.Y.), which, as previously stated, is incorporated herein by reference in its entirety.
However, certain ETF3 instructions, in particular CONVERT UTF-16 TO UTF-32 (CU24), CONVERT UTF-16 TO UTF-8 (CU21), CONVERT UTF-8 TO UTF-16 (CU12), and CONVERT UTF-8 TO UTF-32 (CU14), were designed using the Unicode 3.0 Standard. For performance reasons, the implementation of these instructions allows irregular code values to be transformed without detecting an illegal character. This behavior was allowed in the Unicode 3.0 Standard (incorporated herein by reference in its entirety), as stated in definition 32 (D32) on page 46: “For a given UTF, an ill-formed code value sequence that is not illegal is called an irregular code value sequence. To make implementations simpler and faster, some transformation formats may allow irregular code value sequences without requiring error handling. For example, UTF-8 allows nonshortest code value sequences to be interpreted: a UTF-conformant process may map the code value sequence C0 80 (11000000₂ 10000000₂) to the Unicode value U+0000, even though a Unicode-conformant process shall never generate that code value sequence; it shall generate 00 (00000000₂) instead. A conformant process shall not use irregular code value sequences to encode out-of-band information.”
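The nonshortest-form example in the quoted definition can be illustrated with a minimal Python sketch. The function names `lenient_decode_2byte` and `shortest_encode` are hypothetical and are introduced here only for illustration; the sketch covers two-byte sequences only and does not represent the ETF3 instruction implementation.

```python
def lenient_decode_2byte(b0: int, b1: int) -> int:
    """Decode a two-byte UTF-8 sequence WITHOUT rejecting nonshortest forms.

    Only the bit patterns 110xxxxx 10xxxxxx are checked; the decoded value
    is not compared against the minimum for a two-byte sequence (U+0080).
    """
    if (b0 & 0xE0) != 0xC0 or (b1 & 0xC0) != 0x80:
        raise ValueError("not a two-byte UTF-8 bit pattern")
    # Five payload bits from the lead byte, six from the continuation byte.
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def shortest_encode(cp: int) -> bytes:
    """Encode a code point below U+0800 using the shortest UTF-8 form."""
    if 0 <= cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    raise ValueError("sketch handles only code points below U+0800")


# The irregular sequence C0 80 is interpreted as U+0000 ...
print(hex(lenient_decode_2byte(0xC0, 0x80)))
# ... although an encoder producing shortest forms emits the single byte 00.
print(shortest_encode(0x0000))
```

Under these assumptions, the lenient decoder accepts C0 80 and yields U+0000, while the encoder never produces that sequence, which is the asymmetry the definition describes.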
The Unicode 4.0 standard (incorporated herein by reference in its entirety) substantially restricts the allowable code value sequences. Definition 32 (cited above) is superseded, and the irregular code sequences described therein are now disallowed by definition 36 (D36), as shown on pages 77-78 of the 4.0 standard. It has been observed that the CONVERT UTF-8 TO UNICODE (CUTFU) instruction fails to set condition code 2 for characters that are invalid as defined in the 4.0 standard. This problem requires changes to be made to the architecture.
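As an illustration of the stricter rules, the following Python sketch checks a byte string against the well-formed UTF-8 byte ranges as they are commonly tabulated for Unicode 4.0 and later. The function name `is_well_formed_utf8` and the table layout are assumptions made for this example; the sketch reports only a true/false result and does not model the condition codes or register usage of any ETF3 instruction.

```python
# Each entry: (lead-byte range, list of allowed ranges for the trailing bytes).
# Lead bytes C0, C1, and F5..FF appear nowhere, so nonshortest two-byte forms
# such as C0 80 are rejected outright.
_WELL_FORMED = [
    ((0x00, 0x7F), []),
    ((0xC2, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),  # excludes nonshortest 3-byte
    ((0xE1, 0xEC), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xED, 0xED), [(0x80, 0x9F), (0x80, 0xBF)]),  # excludes surrogates
    ((0xEE, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),  # <= U+10FFFF
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every sequence in data matches the table above."""
    i = 0
    while i < len(data):
        for (lo, hi), trail in _WELL_FORMED:
            if lo <= data[i] <= hi:
                break
        else:
            return False  # invalid lead byte or stray continuation byte
        tail = data[i + 1 : i + 1 + len(trail)]
        if len(tail) != len(trail):
            return False  # sequence truncated at end of input
        if any(not (tlo <= b <= thi) for b, (tlo, thi) in zip(tail, trail)):
            return False  # trailing byte outside its allowed range
        i += 1 + len(trail)
    return True


print(is_well_formed_utf8(b"\xc0\x80"))  # nonshortest form of U+0000
print(is_well_formed_utf8(b"\x00"))      # shortest form of U+0000
```

With this check, the sequence C0 80 is rejected and the single byte 00 is accepted, in contrast to the lenient interpretation permitted under the earlier definition.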
The present invention is directed to addressing, or at least reducing the effects of, one or more of the problems set forth above, through the introduction of an enhanced version of ETF3.