1. Field
The present application relates to databases, particularly relational databases and RDBMS (Relational Database Management System) implementations. The present application also relates to databases or systems used to provide directory services, such as X.500 service directories and LDAP service directories.
2. Description of the Related Art
It is considered that there is an ongoing need to provide systems which have improved performance over prior art systems, and in particular improved performance in the features of speed, performance and/or accuracy. Certainly, users seem to constantly seek an ever increasing improvement in speed, performance and accuracy.
It is not uncommon to have hundreds of thousands, or even millions, of entries in a database. Users of applications that use database systems that store a large number of entries, such as service directories, complain that the performance (execution) of such applications is often slow. A common complaint is that if a search is performed on such a large database, the search may take seconds, minutes or hours to provide a result. From a user's perspective, this sort of delay is considered irritating. Users prefer to have little, if any, delay in operating a database system.
This perception of lack of performance or lack of speed is especially noticeable when executing a search instruction/service. In essence, the problem is that in executing a search instruction/service, the application has to take a complex directory query, such as a complex X.500 directory service query, and map that query into SQL instructions. The SQL performed is based on a table design, and these tables may use metadata. Even though the step of mapping the query is possible, conventionally the mapping is complex or produces relatively complex SQL expressions to be executed by the database application. In general, the more complex the SQL instructions, the slower the performance. Nonetheless, users want the instruction/service executed in such a way that the system exhibits higher performance.
Part of what aggravates the performance problem is that, in a metadata design, the tables tend to be very large tables. The larger a table grows, the slower the application's perceived performance. Also, if the tables are large and a complex query joins a number of the large tables the perceived performance problem may get even worse.
To illustrate this problem, an example will be described of the mapping of a relatively complex X.500 query into SQL. The problem with the mapping is that X.500 and LDAP basically gives a query language that involves AND's, OR's and NOT'S. SQL also gives a query language which involves AND's, OR's and NOT's. A complex SQL expression may use one or more joins and/or subselects. When there are a number of joins and/or subselects, the query may take a relatively long time to evaluate. There may be a number of ways to write an SQL query and if it only contains joins and not subselects then it is often termed a “flattened” query.
The illustration will be made in respect of a relatively simple example. Looking at FIG. 1, a search for an X.500 query using LDAP notation is expressed in expression 101. The search example uses a filter to look for a common name=Rick, AND surname=Harvey. The figure shows a search table 102 and also an attribute table 103 for reference. Lets suppose that attribute identifier (AID) number 3 is for the common name (cn) and AID number 4 is for the surname (sn). In the search table there are a number of AIDs. The Rick Harvey entry is EID (Entry Identifier) 101 with AID 3, corresponding to a common name ‘RICK’ and also EID 101 has ‘HARVEY’ for the normalised value. There is also another AID 3 in the table for John Smith, John who has EID 200 and Smith who has EID 200. In doing a search for Rick AND Harvey, the search will try to find all objects (EIDs) who have AID=3, norm=RICK, and AID=4, norm=HARVEY.
So, in essence, the search wants to select entries ‘Rick’ and ‘Harvey’.
One way of doing this search is by using a subselect (or nested query). The SQL required for this is:
Select EID from search
Where AID=3 AND NORM=‘RICK’
AND EID IN                (select EID from SEARCH.                    where AID=4 AND NORM=‘HARVEY’)                        
In this nested query, the clause in brackets is a subselect. The subselect is evaluated corresponding to where AID=4 and Norm=HARVEY and the resulting list of EIDs is saved. Then the outer clause is evaluated corresponding to where AID=4 and Norm=HARVEY such that the list of EIDs returned is also in the list of EIDs previously saved.
The problem with the sub-select is that if there are many, many ‘HARVEY’s a huge set will be built and there may not be many ‘RICK’s and thus this query will be lopsided and may take a long time to evaluate.
Another way of doing this search is by using a join (or flattened query). The SQL required for this join is:
Select S1.EID from search S1, search S2
Where S1.AID=3.AND S1.NORM=‘RICK’
AND S2.AID=4 AND S2.NORM=‘HARVEY’
AND S1.EID=S2.EID
The result is that if table S1 has a million entries, and table S2 has a million entries, the search may be conducted over a huge combination of entries when the tables are joined. As can be seen, even for this quite simple search/instruction, performance of an application can be severely diminished in a relatively large database. However, usually a join version of a query will be faster than a subselect.
The sub-select is equivalent to the join, and in fact many prior art database applications may convert a sub-select into a join. However, this may be too difficult if there is more than one level of nesting or the where clause is too complex.
A further example will now be discussed where a search involves a ‘NOT’ instruction. In other words, in this example a filter asks for ‘not equal’. We are looking for ‘RICK’ NOT ‘HARVEY’ that is common name is Rick and surname not equals Harvey.
The nested query, in SQL would be:
Select EID from search
Where AID=3 AND NORM=‘RICK’
AND EID NOT IN                (select EID from search        where AID=4 AND NORM=‘HARVEY’)        
The flatten version of the above query may be accomplished with an outer join. An outer join is considered to be quite complex and relatively slow in execution. An outer join for this search would be something like:                SELECT S1.EID FROM                    (SEARCH S1 LEFT JOIN SEARCH S2                            ON S1.EID=S2.EID                AND S2.AID=4)                                                Where S1.AID=3                    AND S1.NORM=‘RICK’AND (S2.NORM< >‘HARVEY’)                        
Again, the above example relates to a relatively simple search for ‘Rick’ and not ‘Harvey’; in other words, a search of sets involving set ‘A’ and NOT set ‘B’ i.e A.!B.
If we look at a search involving more complex searches, such asA.(B+C.!D),  expression 201the SQL for this query would be (in abstract):
SELECT ‘A’                AND EID IN                    (SELECT ‘B’ OR                            (SELECT ‘C’ AND EID NOT IN                                    (SELECT ‘D’)))                                                                                
Note that the above query is very difficult to flatten into an expression that involves only joins. Also, note that “!D” means “NOT D”