Like any other programming task, writing queries is error-prone, and it is helpful if the programming language in which queries are expressed gives assistance in identifying errors while queries are being written, and before they are executed on a database. Programming languages often provide “types” for that purpose, indicating for each variable what kind of value it may hold. A programming language for expressing queries is usually called a “query language”. The most popular example of a query language is SQL.
In SQL, and in most other query languages in prior art, types are assigned to each variable separately. As a consequence, very few errors are caught before the query is executed on a database. The only kind of error found is when an operation does not make sense: for instance, a string cannot be subtracted from an integer. In particular one cannot predict accurately (without running the query), whether a query will return any results or not. And yet, a query that returns no results is the most common symptom of a programming error.
In the logic programming community, some attempts have been made to construct type checkers that detect queries where there are no results at all, regardless of the contents of the database being queried. None of these attempts precisely tracks the dependencies between variables, however, and therefore they are suboptimal: there are many queries for which one could prove that no results are returned, and yet the type checkers do not find the errors. It is very confusing for users that some errors of this kind are caught, and others are not.
In the theoretical database community, there has been some work on proving containment between queries, but this is typically restricted to unnatural fragments of the query language. Furthermore these works do not take advantage of the type hierarchies that typically exist on data stored in a database.
What is desired, therefore, but not yet accomplished before the present invention, is a system and method of computing types for queries that accurately approximate the actual set of results, taking the type hierarchy into account.