Almost from the beginning, computer programming languages have embodied the notion of data types. Data types include such basic concepts as a character, string, integer, float, and so forth. At its lowest level, data stored in a computer is a simple bit pattern stored in a location of a particular size (e.g., a 32-bit memory location). Data types define the notion of how to interpret the bit pattern. For example, a particular bit pattern in a storage location of a particular size might be interpreted one way if the storage location was deemed to hold a “character” and another way if the storage location was deemed to hold an “integer”.
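As a minimal sketch of this point, the following Java fragment reads the same stored bit patterns under different data types. The class name BitPatternDemo, its helper methods, and the chosen bit patterns are hypothetical and serve only as illustration.

```java
// Hypothetical demo class; names and values are illustrative assumptions.
final class BitPatternDemo {

    // Interpret a 32-bit pattern as an IEEE 754 single-precision float.
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    // Interpret the low 16 bits of the pattern as a character (UTF-16 code unit).
    static char asChar(int bits) {
        return (char) bits;
    }

    public static void main(String[] args) {
        int letterBits = 0x00000041;
        System.out.println(letterBits);          // read as an integer: 65
        System.out.println(asChar(letterBits));  // read as a character: A

        int oneBits = 0x3F800000;
        System.out.println(oneBits);             // read as an integer: 1065353216
        System.out.println(asFloat(oneBits));    // read as a float: 1.0
    }
}
```

In each pair the stored bits are identical; only the data type applied to them changes the value that is observed.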
In some computer languages, although the notion of data type exists, few rules are enforced, either by the compiler or by any associated runtime, on the mixing of different data types in expressions of a computer program. For example, no compiler error is generated in the C programming language if an integer value is multiplied by a floating point value. In order to minimize various types of errors, many such languages have built-in type rules that allow for the implicit conversion of certain data types. In other instances, languages include explicit constructs to “coerce” or convert one data type into another data type. Although such languages provide great flexibility, certain programming errors can be introduced if care is not taken when mixing data types in various programming expressions.
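The following sketch illustrates implicit conversion, explicit coercion, and one error that mixing data types can introduce. It is written in Java rather than C only for illustration; Java applies comparable numeric promotion rules to arithmetic expressions. The class MixedTypeDemo and its method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative assumptions.
final class MixedTypeDemo {

    // Implicit conversion: the int operand is widened to double before multiplying.
    static double implicitProduct(int count, double rate) {
        return count * rate;
    }

    // Explicit coercion: the cast discards the fractional part (truncation toward zero).
    static int explicitTruncate(double value) {
        return (int) value;
    }

    // Pitfall: both operands are int, so integer division happens first
    // and only the already-truncated result is widened to double.
    static double integerDivisionPitfall(int a, int b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(implicitProduct(3, 0.5));       // 1.5
        System.out.println(explicitTruncate(2.9));         // 2
        System.out.println(integerDivisionPitfall(1, 2));  // 0.0, not 0.5
    }
}
```

The last call compiles without complaint, yet it silently yields 0.0, which is the kind of error the paragraph above describes.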
Strongly typed languages tried to reduce the instances of programming errors by enforcing strict typing rules. In strongly typed languages, a compiler error would be generated when data type mismatches were detected. For example, a compiler error would be generated in Pascal if a programmer tried to assign a character value to an integer variable. This had the effect of reducing certain types of programming errors, but the rules were often perceived as too restrictive.
With the advent of object oriented programming languages, the concept of data types took on new meaning. In object oriented languages, objects may typically be represented by an object class hierarchy, where some objects are derived from other “base class” objects and inherit their fields (also referred to as properties) and methods. Objects in these languages can be a mixture of fields (typically represented by variables of particular data types) and methods or functions which allow manipulation of the fields or which provide certain functionality. In addition, object oriented languages also typically include a number of built-in data types, such as float, integer, character, string and so forth, which can be used either as basic variables or as fields in an object. Thus, in Java, for example, a programmer can define a variable of type integer and define an object with fields, one of which is of the “integer” data type.
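A minimal Java sketch of such a hierarchy follows. The classes Shape, Circle, and HierarchyDemo are hypothetical names chosen for illustration; they show a built-in integer used both as a basic variable and as a field inherited from a base class.

```java
// Hypothetical base class holding a field of the built-in int type.
class Shape {
    protected final int id;

    Shape(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    // Default behavior that derived classes may override.
    double area() {
        return 0.0;
    }
}

// Hypothetical derived class: inherits the id field and getId() from Shape.
class Circle extends Shape {
    private final double radius;

    Circle(int id, double radius) {
        super(id);
        this.radius = radius;
    }

    @Override
    double area() {
        return Math.PI * radius * radius;
    }
}

final class HierarchyDemo {
    public static void main(String[] args) {
        int counter = 7;                         // built-in type as a basic variable
        Shape shape = new Circle(counter, 2.0);  // same value stored as an object field
        System.out.println(shape.getId());       // inherited method
        System.out.println(shape.area());        // overridden method
    }
}
```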
In object oriented programming languages, there can be different treatment for objects and basic data types. For example, an object with a single property of type integer and a variable of type integer would not be considered to be of the same data type in many object oriented languages, although, fundamentally, both simply represent an integer. The variable of type integer simply exists as a bit pattern in a particular storage location with no additional information, while the object has a storage location of the same size and additional information (or “metadata”) that describes how to interpret the value in the storage location.
To provide some sort of equivalency between an object representation and a basic data type representation, the notion of “boxing” was conceived. The process of adding metadata to a basic data type representation to yield an object representation is termed “boxing”. Similarly, removing the metadata from an object representation to yield a basic data type representation is termed “unboxing”. However, even with the development of boxing and unboxing, present compilers and/or runtime systems use a fragmented notion of data types with strict separation between the notion of objects and the notion of basic data type representations. Although this separation has many implications, one area where the implications are quite apparent is in how these languages treat user-defined types.
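Boxing and unboxing can be sketched in Java using the standard java.lang.Integer wrapper class. The class BoxingDemo and its helper methods are hypothetical and are shown only as one illustration of the concept, not as the mechanism of any particular compiler or runtime.

```java
// Hypothetical demo class; names are illustrative assumptions.
final class BoxingDemo {

    // Boxing: wrap the basic int value in an object that carries type metadata.
    static Object box(int value) {
        return Integer.valueOf(value);
    }

    // Unboxing: recover the basic int value from the object representation.
    static int unbox(Object boxed) {
        return ((Integer) boxed).intValue();
    }

    public static void main(String[] args) {
        int raw = 42;              // basic data type: no metadata attached
        Object boxed = box(raw);   // object representation of the same value

        // The boxed form can report its own type at run time; the raw int cannot.
        System.out.println(boxed.getClass().getName());  // java.lang.Integer

        System.out.println(unbox(boxed) + 1);            // 43
    }
}
```

The two representations hold the same value, yet the language treats them as distinct types, which is the separation described above.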
Even prior to object oriented programming, many, if not most, programming languages had the notion of user-defined data types. These programming languages allowed a programmer to build up new “data types” from the basic built-in types of the language. For example, a programmer could define a new type “data_point” as consisting of an x coordinate value of type float and a y coordinate value of type float. However, certain object oriented programming languages, such as Java, do not allow extension of the basic built-in types in this manner. In some such implementations, user-defined types are only allowed in the form of objects. Existing solutions have also failed to adequately address the need for a unified data type system that can be applied during runtime.
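As an illustration, the “data_point” type from the example above might be expressed in Java as follows. Because the language offers no way to declare it as a new basic type, it is necessarily written as a class. The names DataPoint, distanceFromOrigin, and DataPointDemo are hypothetical.

```java
// Hypothetical user-defined type: in Java it can only be declared as a class,
// so every instance is an object rather than a basic value.
final class DataPoint {
    final float x;
    final float y;

    DataPoint(float x, float y) {
        this.x = x;
        this.y = y;
    }

    // Illustrative method operating on the two float fields.
    float distanceFromOrigin() {
        return (float) Math.sqrt(x * x + y * y);
    }
}

final class DataPointDemo {
    public static void main(String[] args) {
        DataPoint p = new DataPoint(3.0f, 4.0f);
        System.out.println(p.distanceFromOrigin());  // 5.0
    }
}
```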
The present invention addresses, among other things, a mechanism to avoid the currently fragmented view of data types. The invention also addresses the inefficiencies associated with using basic data types where object types would be more efficient and object types where basic data types would be more efficient.