1. Field of the Invention
The present invention relates to the usage pattern, commonly found in many software applications, of defining sets of objects based on object attributes. A specifically designed set definition language for defining sets, abbreviated SDL, is described as well as a software system that implements this language efficiently on top of standard relational database management systems (RDBMS).
2. Description of Related Art
Without a doubt, the most common query language is SQL, which is implemented in most relational database management systems (RDBMS). However, although SQL is a very powerful language, it is too complex for many users because of its versatile nature and because the user must know the underlying data schema. The aim of the invention SDL language is to (i) define a language that is powerful enough to allow users to define sets based on multiple criteria, (ii) define a language that is easy to use, and (iii) define a language that is easier to learn than SQL and that does not require the user to know the underlying data schema. The SDL system was invented in the life-science domain; nevertheless, it has a much wider applicability. Similar attempts have been made before in that domain, such as the health query language (HQL, see www.clinical info.co.uk/miquest.htm), where instead of eliminating the need for schema knowledge, the schema was kept fixed, hence allowing for certain simplifications in the language as compared to SQL. Thus, HQL is considered simple enough that the average medical doctor can use it for epidemiological studies. HQL is, however, a language both for defining patient sets and for calculating statistics on those sets, whereas SDL leaves the latter task to more standard on-line analytical processing (OLAP) systems.
Systems that aim at providing query capabilities for users while hiding the complexity of formal query languages such as SQL are not new. For instance, tools have been devised to facilitate the generation of SQL queries through a graphical user interface (GUI), such as by Shaw et al., “Apparatus and Method for Synthesizing a Query for Accessing a Relational Database”, U.S. Pat. No. 4,506,326. Unlike SDL, this system is purely graphical and generates QBE syntax that is then translated into SQL. Other systems of similar nature, such as Business-Objects (BO), see Cambot and Liautaud, “Relational Database Access System Using Semantically Dynamic Objects”, U.S. Pat. No. 5,555,403, have been made to hide the SQL and the database schemas from the users. Like the system mentioned previously, BO does not have its own language and is thus purely GUI based. However, unlike the former system, it also provides an abstraction on top of the SQL metadata: the so-called business objects, together with rules for building database joins given a list of such objects in a report. Database abstraction of this nature goes further back. In the work of El-Sharkawi et al., “Architecture and Implementation of ENLI: Example-Based Natural Language Assisted Interface”, Proc. of PARBASE-90, Miami, Fla., Mar. 6–9, 1990, pp. 430–432, the authors use an English sentence to describe the meaning of each database attribute in order to build an English-schema. Their English-like query language is then translated into QBE before it is mapped to SQL. Other systems of similar nature are knowledge-based visual query systems that map a knowledge base onto a relational database system. A paper by K. L. Siau et al., “Visual Knowledge Query Language as a Front-End to Relational Systems,” Proceedings of the 15th Annual International Computer Software and Applications Conference, 1991, pp. 373–378, Tokyo: IEEE Computer Society Press, describes a knowledge abstraction based on an enhanced entity-relationship model (EER). The system they present is a GUI-based application that uses an EER-based knowledge schema and a visual knowledge query language (VKQL) that is mapped to SQL for query evaluation.
Although the invention SDL system shares some aspects with the above systems, there are also clear distinctions. First, the SDL system is based on a new language as well as novel definitions of virtual relations and dimension attributes. A new and unique feature, the set output-dimension, enables implicit relational equi-joins and a very sparse and intuitive syntax. Thus in the present invention the SDL dimensions, the virtual relations and the SDL language define the relational algebraic constraints, as compared for instance to BO, where the join-rules for objects are defined in explicit metadata. Also, the BO system is purely GUI based, whereas the invention SDL system is centered on the SDL language. The SDL query tools use English descriptions for dimensions, i.e. metadata on the dimensions. The descriptions are not part of the SDL language; however, they can give the SDL expression a more user-friendly, English-like feel, as well as an easier view of the dimension metadata. Note that the dimensions themselves can also have very descriptive names, as long as there is no naming conflict, although this can result in longer SDL expressions. A major difference between the present invention SDL system and all of the above systems is its focus on sets and its approach to defining them. In the SDL language, a set is defined without any connection to the view that is used to present the elements in the set. These set views can be textual reports or graphical in nature. Finally, the SDL language is very true to its relational origin and can therefore be integrated with the SQL language, as shown in the present invention, thereby generating a new language that Applicants refer to here as SSDL.
Another language or protocol with an aim similar to that of the invention SDL is the lightweight directory access protocol (LDAP), see S. Shi, E. Stokes, D. Byrne, C. Corn, D. Bachmann, and T. Jones, “An enterprise directory solution with DB2”, IBM Systems Journal, 39(2): 360–383, 2000. Common to both is the set definition pattern; however, LDAP uses a syntax very different from that of SDL, one aimed at application programmers, and unlike SDL, LDAP also provides access control methods for its data. Also, the data structure implementation described in that reference, i.e. the approach for storing data with arbitrary attributes in an RDBMS, is very different from the one presented in this invention. Implementations with data structures that are closer to the one presented here for SDL, although for strikingly different languages such as Smalltalk, can be found in work by B. Czejdo et al., “Integration of Database Systems and Smalltalk”, in Proceedings of Symposium on Applied Computing, Kansas City, 1991. Recently there has also been a large effort in defining XML query languages and in mapping them into SQL. See Florescu, D. and D. Kossmann, “A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database”, Technical Report, INRIA, France, May 1999, or F. Tian et al., “The Design and Performance Evaluation of Alternative XML Storage Strategies”, ACM Sigmod Record, vol. 31(1), 2002. Some of the storage strategies described in the above references are similar to the one used in the present invention; however, SDL is significantly different from all these languages.
There are certain features in the SDL language, such as primary and virtual dimensions, that have some correspondence to object-oriented techniques (see R.G.G. Cattell, “Object Data Management: Object-Oriented and Extended Relational Database Systems”, 1994, Addison Wesley Publishing Company, Inc.). In object schemas that utilize composition or aggregation, it is a common technique in object-oriented languages to de-reference references with path expressions (cascaded dot notation). Object aggregation is also referred to as implicit joins, for reasons that become obvious by reading the discussion on virtual relations. However, these implicit joins are not what Applicants refer to as implicit joins in the present invention. In these teachings, implicit join refers to the equi-join constraint that is generated based on the output-dimension in SDL set definitions. Although related, this is different from the implicit join that results from path expressions.
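For orientation, the following minimal sketch (Python with the standard sqlite3 module) shows the kind of explicit equi-join that a user must spell out in SQL to define a set of individuals from two attribute relations. The tables `diagnosis` and `medication`, the shared key column `pid`, and the sample rows are hypothetical and are not taken from the invention. The sketch only illustrates the join constraint on a shared key which, in these teachings, would be generated from the output-dimension rather than written by the user; it does not show SDL syntax.

```python
import sqlite3

# Hypothetical schema: two relations keyed by a patient id (pid).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE diagnosis  (pid INTEGER, code TEXT);
    CREATE TABLE medication (pid INTEGER, drug TEXT);
    INSERT INTO diagnosis  VALUES (1, 'asthma'), (2, 'asthma'), (3, 'diabetes');
    INSERT INTO medication VALUES (1, 'steroid'), (3, 'steroid'), (2, 'insulin');
""")

# In SQL the user must know both tables and state the equi-join
# (d.pid = m.pid) explicitly in order to define the set of patients.
rows = conn.execute("""
    SELECT DISTINCT d.pid
    FROM diagnosis AS d, medication AS m
    WHERE d.pid = m.pid          -- explicit equi-join on the shared key
      AND d.code = 'asthma'
      AND m.drug = 'steroid'
    ORDER BY d.pid
""").fetchall()

explicit_join_set = [pid for (pid,) in rows]
print(explicit_join_set)
```

With the sample rows above, only patient 1 has both an asthma diagnosis and a steroid medication, so the resulting set contains that single identifier.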
A relatively recent comparison of relational query languages (RQL) and object query languages (OQL) by Brown, S. A., “The Semantics of Database Query Languages”, PhD dissertation, University of Sheffield, UK, 1999, points out important semantic differences between OQL and RQL in “grouped” queries. Although SDL has some resemblance to OQL (virtual dimensions), it is strictly connected to the relational model, and mathematical aggregate operators behave identically in SDL and SQL. SDL is thus closer to object-relational extensions of SQL (SQL3). To obtain the equivalent of the output-dimension in SDL, both SQL and OQL require the definition of a cursor; thus those languages do not support implicit join, in the meaning of the term in the present invention, although they support de-referencing and path expressions.
Another technical point that emphasizes the difference between SDL and OQL/SQL is that in SDL multiple virtual dimensions can be combined in a record-operator; thus constraints on multiple attributes are possible in SDL even though “path expressions” are used. In OQL/SQL it is impossible to use path expressions to refer to multiple attributes simultaneously without introducing an intermediate cursor or using succinct notation (see M. Stonebraker, “Object-relational DBMSs: the next great wave”, Morgan Kaufmann Publishers, Inc., 1996).
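The distinction between constraining several attributes of the same record and constraining them independently can be illustrated on the SQL side with a small sketch (Python with the standard sqlite3 module). The table `diagnosis`, its columns, and the sample rows are hypothetical, and the SDL record-operator syntax itself is not shown. The table alias `d` in the first query can be read as the intermediate cursor mentioned above; that reading is an interpretation for illustration only.

```python
import sqlite3

# Hypothetical relation: one row per diagnosis event of a patient.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE diagnosis (pid INTEGER, code TEXT, year INTEGER);
    INSERT INTO diagnosis VALUES
        (1, 'asthma',   1999),
        (2, 'asthma',   1995),
        (2, 'diabetes', 1999);
""")

# Both constraints apply to the SAME record, bound through the alias d.
same_record = [pid for (pid,) in conn.execute("""
    SELECT DISTINCT d.pid
    FROM diagnosis AS d
    WHERE d.code = 'asthma' AND d.year = 1999
    ORDER BY d.pid
""")]

# Each constraint may be satisfied by a DIFFERENT record of the patient.
independent = [pid for (pid,) in conn.execute("""
    SELECT DISTINCT a.pid
    FROM diagnosis AS a, diagnosis AS b
    WHERE a.pid = b.pid
      AND a.code = 'asthma'
      AND b.year = 1999
    ORDER BY a.pid
""")]

print(same_record, independent)
```

Patient 2 appears only in the second result, because the asthma diagnosis and the 1999 entry belong to different records. Constraints that are meant to hold on one record therefore have to be tied to a single cursor in SQL.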
Accordingly, there are other differences between SDL and both OQL and SQL, in terms of both their language structure and their underlying data models.