Type inference is the process by which a compiler determines the types of expressions and variables when the program does not supply complete type information. The compiler infers the missing type information from the context in which each expression appears. Arithmetic operators are a common source of difficulty, because the same syntax is used for both integer and floating-point arithmetic. As a result, it is not possible to reconstruct type information unambiguously for a function such as fn n=>n+n: nothing in the function indicates whether the addition is integer or floating-point addition. The expression can be read as an abbreviation of fn n:int=>n+n, with type int→int, or of fn n:real=>n+n, with type real→real.
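The ambiguity can be made concrete with a small brute-force check. This is a minimal sketch in Java (a language the text names alongside C#), not a real type inferencer: it assumes an overloaded + with exactly two signatures, int×int→int and real×real→real, tries every base type for n, and keeps each typing of fn n=>n+n that checks. The class and method names (OverloadAmbiguity, typingsOfDouble, plus) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: brute-force typing of `fn n => n + n`
// when `+` is overloaded on int and real (assumed signatures only).
class OverloadAmbiguity {
    // The two base types the overloaded `+` is assumed to be defined on.
    enum Ty { INT, REAL }

    // Assumed signatures of `+`: int*int -> int and real*real -> real.
    // Returns the result type, or null if no overload accepts (a, b).
    static Ty plus(Ty a, Ty b) {
        return (a == b) ? a : null;
    }

    // Try every candidate type for the parameter n and keep each typing
    // of `fn n => n + n` that type-checks.
    static List<String> typingsOfDouble() {
        List<String> typings = new ArrayList<>();
        for (Ty n : Ty.values()) {
            Ty result = plus(n, n);
            if (result != null) {
                typings.add(name(n) + "->" + name(result));
            }
        }
        return typings;
    }

    static String name(Ty t) {
        return t == Ty.INT ? "int" : "real";
    }

    public static void main(String[] args) {
        List<String> ts = typingsOfDouble();
        System.out.println(ts);                                          // [int->int, real->real]
        System.out.println(ts.size() == 1 ? "unambiguous" : "ambiguous"); // ambiguous
    }
}
```

Because more than one typing survives, an inferencer that needs a single answer must either reject the function or fall back on a default choice.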
In some cases the surrounding context determines which reading is meant. A related source of difficulty is the “sharp” notation for records, in which a field is selected by its label, as in fn r=>#name r. Absent information from the context, the type of such a function cannot be determined: the selector requires only that r be a record with a name field and says nothing about the record's other fields. Such a function is therefore rejected as ambiguous, because there is insufficient information to determine its domain type.
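The record case can be sketched the same way. In this hypothetical Java model, a record type is a map from field labels to type names, and inferDomain asks whether a selector on a given field (such as the SML selector #name) has a determinable domain, given the record type the surrounding context supplies, if any. The names and the map representation are assumptions made for illustration; a real compiler would track a constraint on an as-yet-unknown record type rather than a finished map.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: determining the domain type of a field selector
// such as SML's `fn r => #name r`. The selector only says "r has a field
// `name`"; it says nothing about which other fields r has.
class SelectorDomain {
    // A record type is modelled as a map from field labels to type names.
    // contextType is the record type the surrounding code supplies for r, if any.
    static Optional<Map<String, String>> inferDomain(String field,
                                                     Optional<Map<String, String>> contextType) {
        if (contextType.isEmpty()) {
            // No context: {name: t}, {name: t, age: int}, ... all fit,
            // so the domain cannot be determined and the function is rejected.
            return Optional.empty();
        }
        Map<String, String> record = contextType.get();
        // With context the domain is fixed. If the context type lacks the
        // field, the selector does not apply, which is a type error.
        return record.containsKey(field) ? Optional.of(record) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> person = Map.of("name", "string", "age", "int");
        System.out.println(inferDomain("name", Optional.of(person)).isPresent()); // true
        System.out.println(inferDomain("name", Optional.empty()).isPresent());    // false
    }
}
```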
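The above examples illustrate situations in which ambiguity leads to difficulties; however, it would be wrong to conclude that type inference fails whenever the missing type information cannot be uniquely determined. In many cases there is no unique way to infer omitted type information, but there is nevertheless a best way. For example, the identity function fn x=>x can be given the type int→int, real→real, and infinitely many others, yet all of these are instances of its most general type 'a→'a.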
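The idea of a best, or most general, type can be illustrated with substitution over a minimal type syntax. This Java sketch is hypothetical: the Type, Var, Base, and Arrow names are assumptions, and it needs Java 16 or later for records and pattern matching. It shows only how specific typings of fn x=>x arise as instances of 'a→'a, not how that most general type is inferred, which in ML-family languages is typically done by unification.

```java
import java.util.Map;

// Hypothetical sketch: a most general type and its instances.
class PrincipalType {
    // Minimal type syntax: type variables, base types, and function types.
    interface Type {}

    record Var(String name) implements Type {
        public String toString() { return "'" + name; }
    }

    record Base(String name) implements Type {
        public String toString() { return name; }
    }

    record Arrow(Type from, Type to) implements Type {
        // No parenthesization: adequate only for the flat types used here.
        public String toString() { return from + " -> " + to; }
    }

    // Apply a substitution (type variable name -> type) to a type.
    static Type subst(Type t, Map<String, Type> s) {
        if (t instanceof Var v) return s.getOrDefault(v.name(), v);
        if (t instanceof Arrow a) return new Arrow(subst(a.from(), s), subst(a.to(), s));
        return t; // base types are unaffected by substitution
    }

    public static void main(String[] args) {
        // Most general type of the identity function fn x => x.
        Type identity = new Arrow(new Var("a"), new Var("a"));
        System.out.println(identity);                                       // 'a -> 'a
        System.out.println(subst(identity, Map.of("a", new Base("int"))));  // int -> int
        System.out.println(subst(identity, Map.of("a", new Base("real")))); // real -> real
    }
}
```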
One of the main advantages of XML (eXtensible Markup Language) is that documents can be processed without knowing their exact schema in advance. However, languages such as C# and Java force programmers to use a verbose, interpretive, and computationally inefficient programming model to access such untyped documents, as shown in the following code:
XmlDocument b = new XmlDocument();
b.Load(...);
// The element is located by a string name at run time; InnerText returns its text content.
string s = b.GetElementsByTagName("Title")[0].InnerText;
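For comparison, the same schema-less access pattern in Java uses the standard DOM API (javax.xml.parsers and org.w3c.dom). This is a minimal sketch; the class name UntypedAccess, the helper firstTitle, and the sample document are hypothetical. As in the C# fragment, the element is located by a string name at run time, so nothing about the document's structure is checked at compile time.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch of schema-less (interpretive) access to an XML document in Java.
class UntypedAccess {
    // Returns the text of the first <Title> element, or null if there is none.
    static String firstTitle(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Lookup by string name: a misspelled tag is not a compile-time
            // error, it simply yields no node at run time.
            Node title = doc.getElementsByTagName("Title").item(0);
            return title == null ? null : title.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Book><Title>An Example</Title></Book>"; // hypothetical document
        System.out.println(firstTitle(xml)); // An Example
    }
}
```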
When the schema or type of a value is known, it is desirable to exploit that knowledge for more efficient access to parts of the value, that is, to compile access patterns that assume the type information. Without schema information, values have to be represented in some universal form, and access is necessarily interpretive. When the schema of the document is known at compile time, a set of classes corresponding to the schema can be generated, the document can be deserialized into an object graph of those classes, and the program can manipulate the document in a concise, strongly typed, and computationally efficient manner, as shown in the following code:
// Deserialize returns object, so the result is cast to the generated Book class.
Book b = (Book)new XmlSerializer(typeof(Book)).Deserialize(...);
string s = b.Title;
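A comparable typed approach in Java is sketched here. Java once bundled JAXB for schema-based binding, but it was removed from the JDK in Java 11, so this sketch hand-writes what a schema compiler would generate: a Book class and a deserialize method that maps the document onto it. Book, deserialize, and the single-field schema are hypothetical assumptions, and the mapping still uses DOM internally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch of schema-driven (typed) access. Book stands in for a class a
// schema compiler would generate; deserialize stands in for the generated mapping.
class TypedAccess {
    // Hypothetical generated class for a schema with a single Title element.
    record Book(String title) {
        static Book deserialize(String xml) {
            try {
                Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)))
                        .getDocumentElement();
                // The name lookup happens once, here, not at every access site.
                return new Book(root.getElementsByTagName("Title").item(0).getTextContent());
            } catch (Exception e) {
                throw new IllegalArgumentException("document does not match the Book schema", e);
            }
        }
    }

    public static void main(String[] args) {
        Book b = Book.deserialize("<Book><Title>An Example</Title></Book>");
        // Field access is checked by the compiler: b.titel() would not compile.
        String s = b.title();
        System.out.println(s); // An Example
    }
}
```

After deserialization, code refers to b.title() rather than to a tag-name string, so a misspelled accessor is a compile-time error; the cost is that the Book class must exist before the program is compiled.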
This approach has a shortcoming, however: in many cases no schema is available at compile time, and one is forced back to the interpretive approach. Databases and contemporary statically typed languages such as C# and Java deal very poorly with non-static types. Scripting languages such as Perl, Python, Ruby, PHP, and Groovy deal well with dynamic types, but they are less robust and scale poorly to large software systems. Thus, there is a substantial unmet need in the prior art for a mechanism that provides improved data access across dynamically and statically typed languages.