Query languages are computer languages employed to interrogate data sources. Such languages are typically classified as either database query languages or information retrieval query languages. Examples of query languages include SQL (Structured Query Language), T-SQL (Transact-SQL), eSQL (Entity SQL), XQuery (Extensible Markup Language Query), LINQ (Language Integrated Query), MDX (Multidimensional Expressions), and DMX (Data Mining Extensions), among others.
Testing a query language is a complex and challenging task. From an infinite set of valid queries, a tester must select a representative subset to test. Once that representative set is determined, the tester faces the task of verifying the results returned when the queries are executed.
A query expression is composed of a series of clauses that apply successive operations on data. A SQL query includes, at minimum, a target (“FROM”) clause and a projection (“SELECT”) clause. Additional clauses such as “WHERE” and “ORDER BY” are used to filter and shape the data returned when the query is executed.
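As an illustration of the clause structure described above, the following minimal sketch runs a minimal query and a filtered, ordered query against an in-memory SQLite database. The table layout, column names, and sample rows are assumptions made only for this example.

```python
import sqlite3

# Hypothetical sample data; the schema is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Carol", "Oslo"), (2, "Alice", "Lima"), (3, "Bob", "Oslo")],
)

# Minimal query: a projection (SELECT) clause and a target (FROM) clause.
minimal = conn.execute("SELECT Name FROM Customers").fetchall()

# WHERE filters the rows; ORDER BY shapes the order in which they are returned.
shaped = conn.execute(
    "SELECT Name FROM Customers WHERE City = 'Oslo' ORDER BY Name"
).fetchall()

# Row order of the minimal query is unspecified, so sort before printing.
print(sorted(minimal))
print(shaped)
conn.close()
```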
When testing a query language, testers must not only test each clause, but they must also test any modifiers the clause supports (e.g., SELECT TOP, SELECT DISTINCT . . . ), and they must test the clauses in combination with each other. Add supporting features built into the language such as functions (e.g., SELECT Count(C) FROM . . . ), relational algebraic operators (e.g., FROM Customers JOIN Products) and sub-queries in expressions (e.g., SELECT Name FROM (SELECT Id, Name FROM Customers)), and the test matrix explodes. In fact, the possibility of sub-queries in expressions makes the set of valid queries infinite. These factors combine to make selecting the correct representative set of queries that must be tested a daunting task.
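The growth of the test matrix can be sketched by enumerating clause combinations. The option lists below are hypothetical and deliberately tiny; the query strings are only counted, not executed or validated against any particular SQL dialect.

```python
from itertools import product

# Hypothetical, deliberately small option lists for each part of a query.
select_modifiers = ["", "DISTINCT "]
projections = ["Name", "Count(Id)"]
targets = ["Customers", "(SELECT Id, Name FROM Customers)"]
filters = ["", " WHERE Id > 0"]
orderings = ["", " ORDER BY 1"]


def flat_queries():
    """Yield every combination of the options above as a query string."""
    for mod, proj, target, flt, order in product(
        select_modifiers, projections, targets, filters, orderings
    ):
        yield f"SELECT {mod}{proj} FROM {target}{flt}{order}"


def nest(query, depth):
    """Wrap a query in `depth` levels of sub-query; depth has no upper bound."""
    for _ in range(depth):
        query = f"SELECT Name FROM ({query})"
    return query


# Five binary choices already give 2**5 = 32 distinct queries.
print(len(list(flat_queries())))
# Each extra nesting level produces yet another query, without limit.
print(nest("SELECT Id, Name FROM Customers", 2))
```

Each additional option multiplies the count, and because `nest` accepts any depth, no finite enumeration covers every query of this shape.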
Once the tester has determined the set of queries to test, the next problem the tester faces is results verification. Even for a single query that includes only the minimal, unmodified target and select clauses, verifying the results is very difficult.
The choice of the data to test against is another important concern for the tester. If the data in the test database always returns empty results for the tested queries, only a small aspect of the query has been tested. For maximum coverage, data should be generated with the tested queries in mind.
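The effect of the test data on coverage can be sketched as follows. The query, the schema, and both data sets are assumptions chosen for illustration: the first data set never satisfies the filter, while the second was written with the filter boundary and the ordering in mind.

```python
import sqlite3

# Hypothetical query under test.
QUERY = "SELECT Name FROM Customers WHERE Age > 30 ORDER BY Name"


def run(rows):
    """Load `rows` into a fresh in-memory table and execute QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (Name TEXT, Age INTEGER)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Data chosen without regard to the query: no row passes the filter,
# so neither the filter boundary nor the ordering is exercised.
careless = [("Ann", 20), ("Ben", 25)]

# Data chosen with the query in mind: ages below, on, and above the
# boundary, inserted out of alphabetical order.
targeted = [("Zoe", 45), ("Ann", 20), ("Cy", 30), ("Bo", 31)]

print(run(careless))
print(run(targeted))
```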
Conventionally, testing approaches are custom-built for a particular query language and rely on the expertise of its development team. Many such approaches have been around for decades, including crafting hard-coded query strings, baselining results, and sprinkling query knowledge throughout a test query. These conventional approaches have proven expensive in both upfront and maintenance costs, and limited in coverage.
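One way to picture the hard-coded query string and baselined result approach is the sketch below. The query, baseline, and data are hypothetical; the point is only that the expected result is recorded by hand and must be re-recorded whenever the underlying data changes.

```python
import sqlite3

# Hard-coded query string and a baseline recorded from an earlier run
# (both hypothetical).
QUERY = "SELECT Name FROM Customers WHERE City = 'Oslo' ORDER BY Name"
BASELINE = [("Bob",), ("Carol",)]


def make_db(rows):
    """Build an in-memory database holding the given (Name, City) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (Name TEXT, City TEXT)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?)", rows)
    return conn


def matches_baseline(conn):
    """Return True when the query result equals the stored baseline."""
    return conn.execute(QUERY).fetchall() == BASELINE


original_rows = [("Carol", "Oslo"), ("Alice", "Lima"), ("Bob", "Oslo")]

# Passes against the data the baseline was recorded from.
passes = matches_baseline(make_db(original_rows))

# Adding one row invalidates the baseline even though the query engine
# is behaving correctly, so the baseline has to be maintained by hand.
still_passes = matches_baseline(make_db(original_rows + [("Dan", "Oslo")]))

print(passes, still_passes)
```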