The present invention relates to a system and method for defining and processing data types and more particularly relates to type systems used by a compiler and/or runtime environment.
Almost from the beginning, computer programming languages have embodied the notion of data types. Data types include such basic concepts as a character, string, integer, float, and so forth. At its lowest level, data stored in a computer is a simple bit pattern stored in a location of a particular size (e.g., a 32-bit memory location). Data types define the notion of how to interpret the bit pattern. For example, a particular bit pattern in a storage location of a particular size might be interpreted one way if the storage location was deemed to hold a "character" and another way if the storage location was deemed to hold an "integer".
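By way of illustration only, the following Java sketch reads one 32-bit pattern under three different data types. The class and method names are hypothetical and form no part of the described system; Java is used here merely as a convenient notation.

```java
// Illustrative sketch: one 32-bit pattern, three interpretations.
// BitPatternDemo and its method names are hypothetical.
final class BitPatternDemo {
    // Interpret the pattern as a 32-bit signed integer.
    static int asInteger(int bits) {
        return bits;
    }

    // Interpret the low 16 bits of the pattern as a character code.
    static char asCharacter(int bits) {
        return (char) bits;
    }

    // Interpret the same 32 bits as an IEEE 754 single-precision float.
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int pattern = 0x00000041;
        System.out.println(asInteger(pattern));     // 65
        System.out.println(asCharacter(pattern));   // A
        System.out.println(asFloat(0x42280000));    // 42.0
    }
}
```

The stored bits never change; only the data type applied to them determines whether the reader sees 65 or the letter A.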
In some computer languages, although the notion of data type exists, few rules are enforced either by the compiler or any associated runtime for mixing of different data types in expressions of a computer program. For example, no compiler error is generated in the C programming language if an integer value is multiplied by a floating point value. In order to minimize various types of errors, many such languages had built-in type rules that allowed for the implicit conversion of certain data types. In other instances, languages included explicit constructs to "coerce" or convert one data type into another data type. Needless to say, although such languages provided great flexibility, certain programming errors could be introduced if care was not taken when mixing data types in various programming expressions.
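The distinction between implicit conversion and explicit coercion can be sketched as follows. The example is written in Java rather than C, and the class and method names are hypothetical; it is offered only as an illustration of the two kinds of conversion described above.

```java
// Illustrative sketch of implicit conversion versus explicit coercion.
// ConversionDemo and its method names are hypothetical.
final class ConversionDemo {
    // Implicit conversion: the int operand is widened to double
    // by the compiler before the multiplication is performed.
    static double implicitMix(int count, double factor) {
        return count * factor;
    }

    // Explicit coercion: the cast discards the fractional part,
    // truncating toward zero. Information can be lost silently.
    static int explicitCoerce(double value) {
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(implicitMix(3, 2.5));   // 7.5
        System.out.println(explicitCoerce(7.9));   // 7
        System.out.println(explicitCoerce(-7.9));  // -7
    }
}
```

The second method shows the kind of error such flexibility invites: the coercion compiles without complaint even though 7.9 and 7 are different values.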
Strongly typed languages tried to reduce the instances of programming errors by enforcing strict typing rules. In strongly typed languages, a compiler error would be generated when data type mismatches were detected. For example, a compiler error would be generated in Pascal if a programmer tried to assign a character value to an integer variable. This had the effect of reducing certain types of programming errors, but the rules seemed to be too restrictive.
With the advent of object oriented programming languages, the concept of data types took on new meaning. In object oriented languages, objects may typically be represented by an object class hierarchy, where some objects are derived from (or inherit) fields (also referred to as properties) and methods from other "base class" objects. Objects in these languages can be a mixture of fields (typically represented by variables of a particular data type) and methods or functions which allow manipulation of the fields or which provide certain functionality. In addition, object oriented languages also typically include a number of built-in data types, such as float, integer, character, string and so forth, which can be used either as basic variables or as fields in an object. Thus, in Java, for example, a programmer can define a variable of type integer and define an object with fields, one of which is of the "integer" data type.
In object oriented programming languages, there can be different treatment for objects and basic data types. For example, an object with a single property of type integer and a variable of type integer would not be considered to be of the same data type in many object oriented languages, although at the bottom, both simply represent an integer. The variable of type integer simply exists as a bit pattern in a particular storage location with no additional information, while the object has a storage location of the same size and additional information (or "metadata") that describes how to interpret the value in the storage location.
To provide some sort of equivalency between an object representation and a basic data type representation, the notion of "boxing" was conceived. The process of adding metadata to a basic data type representation to yield an object representation is termed "boxing". Similarly, removing the metadata from an object representation to yield a basic data type representation is termed "unboxing". However, even with the development of boxing and unboxing, present compilers and/or runtime systems use a fragmented notion of data types with strict separation between the notion of objects and the notion of basic data type representations. Although this separation has many implications, one area where the implications are quite apparent is in how these languages treat user-defined types.
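Boxing and unboxing as defined above can be sketched using Java's standard wrapper class `Integer`. This is an analogy only: the helper class and method names are hypothetical, and Java's wrapper classes are not asserted to be the mechanism of the present invention.

```java
// Illustrative sketch of boxing and unboxing with java.lang.Integer.
// BoxingDemo and its method names are hypothetical.
final class BoxingDemo {
    // Boxing: wrap the raw value in an object that carries type metadata.
    static Object box(int value) {
        return Integer.valueOf(value);
    }

    // Unboxing: strip the object wrapper and recover the raw value.
    static int unbox(Object boxed) {
        return ((Integer) boxed).intValue();
    }

    // The boxed form can report its own type; a raw int cannot.
    static String typeNameOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Object boxed = box(42);
        System.out.println(typeNameOf(boxed)); // Integer
        System.out.println(unbox(boxed));      // 42
    }
}
```

In this sketch the two representations remain separate types, and the conversions are written out by hand, which reflects the fragmented view described in the preceding paragraph.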
Even prior to object oriented programming, many, if not most, programming languages had the notion of user-defined data types. These programming languages allowed a programmer to build up new "data types" from the basic built-in types of the language. For example, a programmer could define a new type "data_point" as consisting of an x coordinate value of type float and a y coordinate value of type float. Certain object oriented programming languages, like Java, however, do not allow extension of the basic built-in types in this manner. In some such implementations, user-defined types are only allowed in the form of objects. Existing solutions have also failed to adequately address the need for a unified data type system that can be applied during runtime.
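As a minimal sketch of the limitation just described, the "data_point" type is shown below in the only form such a language permits, namely as an object. The class name `DataPoint` and the method `distanceFromOrigin` are hypothetical and are introduced purely for illustration.

```java
// Illustrative sketch: a user-defined "data_point" type expressed as an object.
// DataPoint and distanceFromOrigin are hypothetical names.
final class DataPoint {
    final float x; // x coordinate value of type float
    final float y; // y coordinate value of type float

    DataPoint(float x, float y) {
        this.x = x;
        this.y = y;
    }

    // Example method operating on the two fields.
    float distanceFromOrigin() {
        return (float) Math.sqrt(x * x + y * y);
    }

    public static void main(String[] args) {
        DataPoint p = new DataPoint(3.0f, 4.0f);
        System.out.println(p.distanceFromOrigin()); // 5.0
    }
}
```

Every `DataPoint` in this sketch is an object reached through a reference and carrying object metadata, even where the program needs only two adjacent float values.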
The present invention addresses, among other things, a mechanism to avoid the currently fragmented view of data types. The invention also addresses the inefficiencies associated with using basic data types where object types would be more efficient and object types where basic data types would be more efficient.
In accordance with the present invention, the above and other problems are solved by providing a system and method for efficiently processing user-defined data types. The present invention provides for a more unified view of the type system of programming languages, and object oriented programming languages in particular. In the present invention, the type system includes a dual representation for basic data types. One representation is the basic data type representation common to such basic built-in data types. In this application this representation will be referred to as a value type representation, or more simply, a value type. However, unlike other type systems, each of the basic data types also has a boxed representation that exists in the object hierarchy of the type system itself. This dual representation can also be extended to user-defined types, so that user-defined types may exist both as a value type and as an object within the object hierarchy of the type system. This allows the compiler and/or runtime to select the most effective and efficient representation for the data type depending on the particular need at the moment.
In addition to the dual representation of data types, another aspect of the invention allows for the application of rules to determine when to use the boxed representation and when to use the value type (or unboxed) representation of a data type. These rules can be applied, for example, by a compiler and allow, among other things, for implicit conversion between the boxed and unboxed representations of a particular data type.
In another aspect of the invention, the unified view of the type system is reflected in the behavior of virtual methods for objects. One basic feature of objects is that they can inherit methods from "parent" objects. Such methods may include methods that take objects as arguments. The dual representation of value types both as value types and as objects in the hierarchy implies that value types can have methods and can behave as objects in some instances and as value types in other instances. Although the details are discussed more completely below, the practical effect is that when value types are in their boxed representation, they can possess type information like other objects. Furthermore, when value types are in their unboxed representation, they can be valid arguments to methods that would otherwise expect an object type (such as a boxed representation). This approach provides entirely new and powerful programming paradigms to developers. Furthermore, since both boxed and unboxed representations are available, all this functionality can be provided without the developer having to explicitly specify in the source code the value type version (i.e., boxed or unboxed) to use or the conversion from one form to another.
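The practical effect described above, in which an unboxed value is accepted where an object is expected without any conversion appearing in the source code, resembles the autoboxing feature of later versions of Java. The following sketch relies on that feature as an analogy only; the class and method names are hypothetical.

```java
// Illustrative sketch: an unboxed value passed to a method expecting an object.
// ImplicitBoxingDemo and describe are hypothetical names.
final class ImplicitBoxingDemo {
    // The method is declared to take an object, not a primitive value.
    static String describe(Object value) {
        // The boxed form possesses type information like any other object.
        return value.getClass().getSimpleName() + ":" + value;
    }

    public static void main(String[] args) {
        int unboxed = 42;
        // No conversion is written here; the compiler inserts the boxing step.
        System.out.println(describe(unboxed)); // Integer:42
        System.out.println(describe(2.5));     // Double:2.5
    }
}
```

The call sites contain no explicit conversion, so the developer does not state which representation is in use.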
In one implementation of the present invention, a unified type system is provided in a runtime environment. A source code file includes an unboxed value type representation. Metadata is associated with the unboxed value type representation for converting the unboxed value type representation into a boxed value type representation. Output code generated by the compiler converts between the unboxed value type representation and the boxed value type representation in response to detection of differing types in a runtime operation.
In another implementation of the present invention, a method for compiling a source file containing at least one unboxed value type representation is provided. It is determined that the source file includes the unboxed value type representation. Metadata is associated with the unboxed value type representation, responsive to the determining operation. An operation having operands with differing types is specified in the source file. One operand is the unboxed value type representation and another operand is a boxed value type representation. Output code is emitted from the compiler for converting one of the operands to match the type of the other operand.
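An operation whose operands differ in representation can be sketched in Java, where the compiler emits the conversion of the boxed operand so that both operands match. This is an analogy under stated assumptions, not the output code of the described method; the class and method names are hypothetical.

```java
// Illustrative sketch: one boxed operand and one unboxed operand.
// MixedOperandDemo and its method names are hypothetical.
final class MixedOperandDemo {
    // The compiler emits an unboxing conversion for 'boxed' so that the
    // addition is performed on two raw int values.
    // A null 'boxed' argument would throw NullPointerException here.
    static int add(Integer boxed, int unboxed) {
        return boxed + unboxed;
    }

    // With one raw operand, the comparison is numeric: 'boxed' is unboxed
    // first, so values are compared rather than object references.
    static boolean sameValue(Integer boxed, int unboxed) {
        return boxed == unboxed;
    }

    public static void main(String[] args) {
        System.out.println(add(Integer.valueOf(5), 7));             // 12
        System.out.println(sameValue(Integer.valueOf(1000), 1000)); // true
    }
}
```

Neither method contains an explicit conversion in its source; the step that makes the operand types agree is supplied at compile time.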
In other implementations of the present invention, articles of manufacture are provided as computer program products. One embodiment of a computer program product provides a computer program storage medium readable by a computer system and encoding a computer program for compiling a source file containing at least one unboxed value type representation. Another embodiment of a computer program product may be provided in a computer data signal embodied in a carrier wave by a computing system and encoding the computer program for compiling a source file containing at least one unboxed value type representation. The computer program product encodes a computer program for executing on a computer system a computer process for compiling a source file containing at least one unboxed value type representation. It is determined that the source file includes the unboxed value type representation. Metadata is associated with the unboxed value type representation, responsive to the determining operation. An operation having operands with differing types is specified in the source file. One operand is the unboxed value type representation and another operand is a boxed value type representation. Output code is emitted from the compiler for converting one of the operands to match the type of the other operand.
In a further aspect of the invention, this unified notion of data types can be combined with a runtime or execution environment to produce a unique runtime environment that supports value types, object classes, and interfaces.
These and various other features as well as advantages, which characterize the present invention, will be apparent from a reading of the following detailed description and a review of the associated drawings.