Programming languages need to evolve continuously to help programmers cope with complicated applications. These evolutionary steps are typically quite modest; most commonly, they consist of providing better or reorganized APIs (Application Program Interfaces). Occasionally, a more radical evolutionary step is taken. One such example is the addition of generic classes to languages such as Java and C#.
The time has come, however, for another large evolutionary step to be taken. Much software is now intended for distributed, web-based scenarios. It is typically structured using a three-tier model consisting of a middle tier containing the business logic that extracts relational data from a data services tier (a database) and processes it to produce semi-structured data (typically XML, the eXtensible Markup Language) to be displayed in the user interface tier. These middle tier applications are most commonly written in an object-oriented language such as Java or C# and have to deal with relational data (essentially Structured Query Language (SQL) tables), object graphs, and semi-structured data (e.g., XML, HTML).
Unfortunately, support for such data access has barely evolved at all. All that exists is naive access via simple APIs. Consider the following fragment of Java that uses JDBC (Java DataBase Connectivity, an API that lets a Java application access a database via SQL) to query a SQL database (the user-supplied country is stored in the variable input).
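// Open a connection and create a statement for sending SQL to the database.
Connection con = DriverManager.getConnection( . . . );
Statement stmt = con.createStatement();
// The query is assembled as a plain string from the user-supplied input.
String query = "SELECT * FROM COFFEES WHERE Country='" + input + "'";
ResultSet rs = stmt.executeQuery(query);
while (rs.next()) {
    // Columns are projected by name, using a type-specific conversion method.
    String s = rs.getString("Cof_Name");
    float n = rs.getFloat("Price");
    System.out.println(s + "-" + n);
}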
Using strings to represent SQL queries is not only clumsy but also removes any possibility of static checking. The impedance mismatch between the language and the relational data is quite striking; e.g., a value is projected out of a row by passing a string denoting the column name and using the appropriate conversion function. Perhaps most seriously, passing queries as strings is often a security risk (the “script code injection” problem; e.g., consider the case when the variable input is the string “' OR 1=1 --”).
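The injection risk can be seen without a database by printing the query text that the concatenation produces. The following is a minimal sketch; the class name InjectionDemo, the helper buildQuery, and the sample value Colombia are hypothetical names introduced only for this illustration.

```java
// Illustrative sketch: shows how concatenating user input into SQL text
// changes the meaning of the query. No database is needed to run it.
class InjectionDemo {

    // Builds the query by concatenation, as in the JDBC fragment above.
    static String buildQuery(String input) {
        return "SELECT * FROM COFFEES WHERE Country='" + input + "'";
    }

    public static void main(String[] args) {
        // A benign value yields the intended filter.
        System.out.println(buildQuery("Colombia"));

        // A hostile value closes the quote, adds a tautology, and comments
        // out the trailing quote, so the WHERE clause matches every row.
        System.out.println(buildQuery("' OR 1=1 --"));
    }
}
```

The second line printed is SELECT * FROM COFFEES WHERE Country='' OR 1=1 --', which selects every row. JDBC's PreparedStatement with ? placeholders avoids this particular problem by passing the value separately from the query text, but the query itself and the column names remain strings that the compiler cannot check.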
The future of e-commerce is largely dependent on the development of what are referred to as Web Services, which are Internet-based APIs that provide valuable functions or services for users. For example, Microsoft Passport® is a Web Service that facilitates user interaction by transferring user profile information to designated websites. The broad idea behind Web Services is to loosely couple heterogeneous computer infrastructures together to facilitate data transmission and computation to provide the user with a simple yet powerful experience.
A significant component in the functionality of Web Services is programmatic interaction with web data. However, the world of web data is presently quite disjunctive. In general, there are three major components that make up the world of web data: relational data (e.g., SQL), semi-structured data (e.g., XML), and a runtime environment. FIG. 1 illustrates a Venn diagram 100 that depicts a conventional web data world. A popular method of implementing a relational data model is by means of SQL, which facilitates accessing data of a relational database system that is typically stored in tables. An accepted standard for semi-structured data is XML. XML is a World Wide Web Consortium (W3C) standard language that describes data via a schema or Document Type Definition (DTD). XML data is stored through the use of tags. A runtime environment is a general-purpose multilanguage execution engine (e.g., Common Language Runtime (CLR)) that allows authors to write programs that use both relational data and self-describing data.
However, in common with the situation with relational data access, there is also an impedance mismatch between looseness of the “document world” from which XML evolved, and a more structured world of object-oriented (OO) programming languages, which dominate the applications world. Bridging these two worlds today is conventionally accomplished by employing specialized objects that model the XML world called “XML Document Object Model,” or by “XML Serialization” technologies, which intelligently map one world into the other at runtime. However, these bridging mechanisms are often cumbersome and/or limited in functionality.
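As an illustration of the Document Object Model style of bridging, the following minimal sketch reads values out of a small XML document with the standard Java DOM API. The document contents, the class name DomAccessDemo, and the helper describe are assumptions made for this example only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomAccessDemo {

    // Hypothetical sample document, invented for this illustration.
    static final String XML =
        "<coffees>"
      + "<coffee><name>Espresso</name><price>7.99</price></coffee>"
      + "<coffee><name>Colombian</name><price>8.99</price></coffee>"
      + "</coffees>";

    // Returns "name-price" for the coffee element at the given index.
    static String describe(int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

            // Elements are located by tag names passed as strings, and the
            // result must be cast from the generic Node type.
            NodeList coffees = doc.getElementsByTagName("coffee");
            Element coffee = (Element) coffees.item(index);

            // Values come back as text and are converted by hand.
            String name = coffee.getElementsByTagName("name").item(0).getTextContent();
            float price = Float.parseFloat(
                coffee.getElementsByTagName("price").item(0).getTextContent());
            return name + "-" + price;
        } catch (Exception e) {
            // Parsing problems surface only at run time.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(0));
    }
}
```

As with the JDBC fragment, element names are strings, results need casts and manual conversion, and a misspelled tag name is discovered only when the program runs.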
Object-oriented languages like C++, Java, and C# provide a way of defining classes and/or structs, and then constructing instances of those types via “constructors” using the “new” operator. The objects being constructed and the arguments being passed to the constructors are all strongly typed. These languages usually also provide convenience mechanisms for initializing simple homogeneous arrays of objects. These constructs are designed to make programs written in these languages run fast.
XML, on the other hand, provides syntax for describing heterogeneous graph(s) of data where typing rules (usually called “schema validation”) are entirely optional and loosely bound to those type instances. Furthermore, the XML schemas associated with those documents can describe more complex structures with sequences, choices, unbounded type collections, and a combination of typed and untyped data using constructs like <xsd:any/> and <xsd:anyAttribute/>. These constructs are designed to allow a loosely coupled architecture that minimizes hard dependencies between different parties that make up a complex distributed system and have proven to be the only way to make distributed systems scale up to a level of complexity required for today's interconnected business systems.
Seamless integration of data access in an OO host language is an extremely tricky problem, and many people have attempted to solve it in the past with varying degrees of success. At the heart of the problem are three different and distinct type systems: the semi-structured XML that is used to describe data elements on web pages and in business-to-business documents; the SQL language, which is used to interrogate and process data in a relational database; and the CLR, which provides OO services and security services that applications can use.
Dealing with the complexity of these disparate models is a major pain for programmers today, since mainstream programming languages like C, C++, VB, C#, or Java simply do not know anything about relational or semi-structured data, yet programmers need to deal with all three data models at once.
Most programming languages do not provide an integrated view of these three worlds, but typically provide a “hands off” API to access one domain from the other. However, data integration via APIs has reached its limits. Alternatively, various methods of so-called data-binding have been explored where concepts from an XML or relational world are mapped onto the OO world. However, without type-system and language extensions, these attempts will only be of limited value because of the size of the impedance mismatch they are attempting to bridge.
Unfortunately, API support in both Java and C# for XML and XPath/XQuery is depressingly similar to that for relational data. XPath has been widely used in the XML community as a query language to navigate and retrieve data from an XML data source. Furthermore, XQuery uses XPath as its query language to retrieve data from an XML data source.
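The similarity to the JDBC case can be sketched with the standard javax.xml.xpath API, where the query is again an unchecked string. The sample document, the class name XPathDemo, and the helper evaluate are hypothetical and serve only as an illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathDemo {

    // Hypothetical sample document, invented for this illustration.
    static final String XML =
        "<coffees>"
      + "<coffee><name>Espresso</name><price>7.99</price></coffee>"
      + "<coffee><name>Colombian</name><price>8.99</price></coffee>"
      + "</coffees>";

    // Evaluates an XPath expression, given as a string, against the sample
    // document and returns the string value of the result.
    static String evaluate(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            // Syntax errors in the expression are reported only at run time.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A well-formed path selects the price of one coffee.
        System.out.println(evaluate("/coffees/coffee[name='Colombian']/price"));

        // A misspelled element name is not an error; it matches nothing and
        // yields an empty string.
        System.out.println("[" + evaluate("/coffees/coffee/prise") + "]");
    }
}
```

The compiler sees only a String in both calls, so neither the structure of the document nor the spelling of the path is checked before the program runs.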
Due to the increasingly complex nature of software systems, programmers have been plagued by undetectable programmatic errors that oftentimes do not manifest until too late. Developers continue to try to expand the power of programming languages by incorporating complex mathematical and philosophical concepts. Additionally, the software market is becoming increasingly platform independent and service oriented. Combining powerful object-oriented programmatic concepts into the new data-centric and service-based world causes programmers problems as they try to piece together the best parts of a multitude of different technologies in an ad hoc fashion.
Type systems are a formal mechanism for ensuring that typed programs perform correctly and in a well-behaved manner. Typed programs or typed systems are generally programs or systems that assign types to variables (e.g., Boolean, integer, real, etc.) or objects. Types are classifications of data that describe how a programmer wants to use the data and how a compiler should interpret such data. However, many functions are only defined to work on particular types (e.g. integer addition or floating point addition). If a given function is defined to work with a certain data type and it receives a different type of data, a type error will be produced. A type system can prevent certain execution errors by utilizing a type-checking algorithm to determine whether a program is well behaved or ill behaved. This process is referred to as type checking. Type checking allows for early detection and therefore correction of errors that may often go undetected by programmers. If such errors are left uncorrected they may lurk in the code, only to become manifestly obvious at a most inopportune time.
In general, there are two varieties of type systems: nominal and structural. A nominal type system is a system in which type names are used to determine whether types are equivalent. In a structural type system, names are not essential, because types are said to be equivalent if they have the same structure, as the name suggests. For example, assume Type A=string of integers (1 . . . 10) and Type B=string of integers (1 . . . 10). Further assume that a is of Type A, b is of Type B, and the expression a=b is written into the same program. Under a nominal type system, this expression would produce an error because a and b are of different types. Under a structural type system, such an assignment would be legal because the types are equivalent.
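The distinction can be sketched in Java, which is itself nominally typed. In the following minimal example, the classes TypeA and TypeB, the modeling of a “string of integers” as an int array, and the helper structurallyEquivalent are all assumptions made for illustration; the structural check is a simplified stand-in based on reflection, not a feature of the language.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TypeEquivalenceDemo {

    // Two hypothetical types with identical structure but different names.
    static class TypeA { int[] values = new int[10]; }
    static class TypeB { int[] values = new int[10]; }

    // Nominal check: Java's own rule, based on declared names and inheritance.
    static boolean nominallyCompatible(Class<?> target, Class<?> source) {
        return target.isAssignableFrom(source);
    }

    // Structural check (illustrative only): the two types are treated as
    // equivalent when they declare the same field names with the same types.
    static boolean structurallyEquivalent(Class<?> x, Class<?> y) {
        return shape(x).equals(shape(y));
    }

    // Collects "name:type" for each declared field, sorted so that the
    // comparison does not depend on reflection order.
    private static List<String> shape(Class<?> c) {
        List<String> fields = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler- or tool-generated fields
            }
            fields.add(f.getName() + ":" + f.getType().getName());
        }
        Collections.sort(fields);
        return fields;
    }

    public static void main(String[] args) {
        // Writing "TypeA a = new TypeB();" would be rejected by the Java
        // compiler; the reflective check below reports the same verdict.
        System.out.println(nominallyCompatible(TypeA.class, TypeB.class));    // false
        System.out.println(structurallyEquivalent(TypeA.class, TypeB.class)); // true
    }
}
```

Under Java's nominal rule, the two types are unrelated despite having the same shape, while the structural comparison treats them as equivalent.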
There is an unmet need for common OO languages to evolve to support data access associated with the rich structure of both relational and semi-structured data.