A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
The invention disclosed herein relates to object-oriented programming and more particularly, to techniques for efficiently supporting multi-method dispatching in certain object-oriented programming languages.
The use of object-oriented (OO) computer programming languages has become the norm for software development. Standard languages such as C++ and Smalltalk that embody some of the basic tenets of OO programming are highly popular. In brief, object-oriented programming involves the use of program building blocks called objects or classes, which are modules of data and processing, and for which instances may be created during execution of a program. The processing performed on data by an object is usually represented in one or more methods within the object, the methods being procedures with several arguments. Computation in OO programs proceeds by invoking methods on class instances.
Determining the correct method definition that gets applied on any such method invocation constitutes the so-called dispatching problem. The correct method definition is determined by the dynamic types or arguments of the class instances involved at the moment of invocation.
Methods in currently popular OO languages such as C++ and Smalltalk are mono-methods, that is, each method must have a unique name to avoid ambiguity with methods in other objects. In contrast, some new generation OO languages such as CommonLoops and PolyGlot support multi-methods, that is, two or more methods which have the same name and may otherwise be related such as by type or inheritance of procedures. Multi-methods are generally considered to be more expressive and hence more powerful, more natural, more readable, and less error-prone than mono-methods.
However, multi-methods raise ambiguities when methods are invoked in a program. This problem is sometimes referred to as the multi-method dispatch problem. As a result, the suitability of multi-methods in practice is in question, since supporting multi-methods as opposed to single dispatching is believed to be space- and time-intensive. That is a serious concern because user programs spend a considerable amount of time in dispatching.
As a simplified example of the multi-method dispatching problem, two methods may be defined, both having the same name, account. The methods each have two arguments: name and type. One method is defined as account(name :: <customer>, type :: <market>), and performs certain functions when passed instances of <customer> and <market>. The second method is defined as account(name :: <person>, type :: <mutual fund>), and performs other functions when passed instances of <person> and <mutual fund>, which is a more specialized type than <market>. The second method may even inherit some functionality from the first method.
If the account method is invoked with arguments which are clearly instances of the classes defined in one method, that method is easily selected. However, if arguments are present in a method invocation from different methods, an ambiguity arises which may or may not be resolvable. In this case, an ambiguity could not be resolved without providing a third method having arguments of the general classes name :: <person> and type :: <market>. The third method would be selected in a mixed multi-method invocation.
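The account example can be illustrated with a minimal Python sketch. The class names Person, Customer, Market, and MutualFund, the labels "first" and "second", and the helper applicable are hypothetical stand-ins introduced only for illustration; the sketch merely lists which definitions could accept a given pair of argument classes and is not the dispatching technique of the invention.

```python
# Hypothetical class hierarchy standing in for the types in the example.
class Person: pass
class Customer(Person): pass       # <customer> specializes <person>
class Market: pass
class MutualFund(Market): pass     # <mutual fund> specializes <market>

# Two definitions of "account", keyed by their (name, type) parameter classes.
METHODS = {
    "first": (Customer, Market),
    "second": (Person, MutualFund),
}

def applicable(arg_classes, methods=METHODS):
    """Return the labels of all definitions whose parameter classes are
    equal to, or ancestors of, the corresponding argument classes."""
    return sorted(
        label
        for label, params in methods.items()
        if all(issubclass(a, p) for a, p in zip(arg_classes, params))
    )

print(applicable((Customer, Market)))       # only the first definition applies
print(applicable((Person, MutualFund)))     # only the second definition applies
print(applicable((Customer, MutualFund)))   # both apply, neither is more specific
print(applicable((Person, Market)))         # neither of the two applies
```

Under these assumptions, an invocation that mixes argument classes from the two definitions is either matched by both definitions or by neither, which is why additional definitions are needed to make every invocation resolvable.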
As even this simplified example shows, multi-method dispatching requires substantial support to work effectively. The dispatching problem becomes extremely difficult to resolve as the numbers of methods and arguments increase, as they would in most useful programs.
A more formalized description of the multi-method dispatching problem is presented below to provide a better understanding of the solution provided by the present invention.
Let T be a rooted tree denoting the class hierarchy with classes as nodes. This tree defines a partial order ⪯ on the set of classes, that is, A ⪯ B if and only if class A is a descendant of class B. A set of methods is also defined on T, which are procedures with classes as arguments. Any predeclared subset of the arguments of a method may be involved in the dispatching operation and these are called the dispatchable arguments; in what follows, only the dispatchable arguments are listed and others ignored. For example, s(A1, A2) is a method with name s, dispatchable argument classes A1 and A2, and possibly additional arguments. The set of methods has the property that the same method name (here s) may appear many times but the methods are defined for distinct argument lists. The hierarchy and set of methods may be preprocessed.
Solving multiple dispatching involves resolving all the method invocations in the program, either at run time or at compile time depending on the OO language. Each method invocation is a query of the form s(A1, A2, . . . , Ad) where s is a method and A1, . . . , Ad are class instances of the d dispatchable arguments in that order. For any such query, a method s(B1, B2, . . . , Bd) is applicable if and only if Ai ⪯ Bi for all i = 1, . . . , d (note that method names must coincide). The most specialized method among the applicable ones needs to be retrieved, say s(B′1, B′2, . . . , B′d), such that ∀i, B′i ⪯ Ci for every other applicable method s(C1, C2, . . . , Cd). It is possible that two methods, e.g., s(B′1, B′2, . . . , B′d) and s(B″1, B″2, . . . , B″d), are both applicable and neither is more specific than the other (this is called an ambiguity), that is, B′i ≺ B″i and B″j ≺ B′j for some indices 1 ≤ i, j ≤ d, where ≺ denotes the strict form of ⪯. For each query, the goal is to detect the ambiguity, if any, and to find the most specialized method otherwise. If no applicable method exists, NIL is returned. This is referred to as the d-ary dispatching problem henceforth.
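The definitions above can be restated as a brute-force resolver. The following Python sketch is illustrative only: the class tree PARENT, the helpers leq and dispatch, and the "AMBIGUOUS" marker are hypothetical names, a class is treated as a descendant of itself, and the linear scan over all methods is not the efficient technique of the invention.

```python
# Hypothetical class tree given as child -> parent (None marks the root).
PARENT = {
    "Object": None,
    "Person": "Object",
    "Customer": "Person",
    "Market": "Object",
    "MutualFund": "Market",
}

def leq(a, b):
    """True if class a equals b or is a descendant of b in the tree."""
    while a is not None:
        if a == b:
            return True
        a = PARENT[a]
    return False

def dispatch(name, args, methods):
    """Resolve the query name(args) by scanning every method.

    methods is a list of (name, parameter-class tuple) pairs.
    Returns the most specialized applicable method, None (NIL) if no
    method applies, or "AMBIGUOUS" if no applicable method is at least
    as specific as all the others in every argument position.
    """
    applicable = [
        m for m in methods
        if m[0] == name
        and len(m[1]) == len(args)
        and all(leq(a, b) for a, b in zip(args, m[1]))
    ]
    if not applicable:
        return None
    for m in applicable:
        if all(all(leq(b, c) for b, c in zip(m[1], other[1]))
               for other in applicable):
            return m
    return "AMBIGUOUS"
```

For example, with the two account methods on (Customer, Market) and (Person, MutualFund), the query account(Customer, MutualFund) is reported as ambiguous, while account(Person, Market) returns None because neither method is applicable.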
The case of d=1 is the single dispatching problem faced in currently popular OO languages such as C++ and Smalltalk in which all methods have only one dispatchable argument, even if they might have a number of other arguments. The case d > 1 arises in the next generation of OO languages, as explained above. The relevant parameters are n, the number of nodes in the class hierarchy, M, the number of distinct method names, m, the number of methods in all, and d, the number of arguments involved in the dispatching operation. Note that m ≤ Mn^d and typically it is much smaller, and M ≤ m trivially.
Some results are known and are summarized in Table 1 below. The d=1 case is illustrative. One solution is to tabulate the answers to all possible queries and look up the solution when needed; this is referred to herein as Algorithm A. This uses O(nM) table space and takes O(1) time per query but the space is prohibitively large for existing hierarchies. To cope with this, known practical solutions use the following observation: many s(A) queries return NIL. Thus they employ various heuristics to compress the non-NIL entries of the table so the query time remains small. The best known theoretical bound for the d=1 case is linear space, i.e., O(n+M) and O(log log n) query time. Another algorithm, referred to herein as Algorithm B, is described in S. Muthukrishnan and W. Muller, Time Space Tradeoffs for Method Look-up in Objected Oriented Programs, Proc. 7th ACM Symp. On Discrete Algorithms, 1996, which is hereby incorporated by reference into this application.
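The tabulation idea behind Algorithm A for the d=1 case can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function build_table and its input encoding (a child-to-parent map and a set of (name, class) definitions) are hypothetical, and the sketch makes no attempt at the table compression heuristics mentioned above.

```python
def build_table(parent, defined):
    """Precompute the answer to every single-dispatch query.

    parent  : dict mapping each class to its parent (root maps to None)
    defined : set of (method name, class) pairs that have a definition
    Returns a dict mapping (name, class) to the class whose definition is
    used for that query, or None (NIL) if no definition is inherited.
    The table holds one entry per (name, class) pair, i.e. n * M entries.
    """
    names = {name for name, _ in defined}
    table = {}

    def resolve(name, cls):
        if cls is None:                 # walked past the root: no definition
            return None
        key = (name, cls)
        if key not in table:
            # Use the class's own definition, else inherit the parent's answer.
            table[key] = cls if key in defined else resolve(name, parent[cls])
        return table[key]

    for name in names:
        for cls in parent:
            resolve(name, cls)
    return table
```

Once the table is built, each query is a single dictionary lookup, which corresponds to the O(1) query time noted above; the cost is that an entry is stored even for the many queries whose answer is NIL.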
For the d > 1 case, Algorithm A uses table space O(Mn^d). Thus there is a combinatorial explosion in the space used as d increases. Since even the d=1 case uses a prohibitive amount of space, this solution is clearly useless for d ≥ 2. However, the d=2 case is not well understood and effective heuristics for table compaction schemes or for using a proper automaton to prune the search space are still open.
Best known theoretical results are summarized in the following Table 1.
There is therefore a need for a method for solving the multi-method dispatch problem in an efficient manner.
In accordance with the invention, the multi-method dispatching problem is reduced to geometric problems on multi-dimensional grids and new data structures are designed to solve the resulting geometric problems. The multi-method dispatching problem is mapped to a geometric problem referred to as the point enclosure problem, in which each possible method invocation is mapped to an n-dimensional rectangle, where n here denotes the number of arguments in the methods (the parameter d of the formal description above), and a particular method invocation in a program is mapped to an n-dimensional point. The smallest rectangle which encloses the point is the most specific method to use for the given method invocation.
In particular, methods having the same name are mapped to a set of rectangles based on a pair of numbers associated with each argument. The pair of numbers is an interval or set of first and last numbers identifying the position of the argument in a class hierarchy tree. The interval may be found by computing an Euler Tour of the class hierarchy tree, the Euler Tour being a technique for sequentially assigning numbers to each node in the class tree as the tree is systematically traversed.
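One way such intervals might be computed is sketched below in Python. The function euler_intervals and the input encoding (a parent-to-children map) are hypothetical, and the numbering used here (the preorder number as the first value, the largest number in the subtree as the last value) is an assumption chosen for illustration; the exact numbering produced by the Euler Tour of the invention may differ.

```python
def euler_intervals(children, root):
    """Assign each class an interval (first, last) by a depth-first walk.

    first is the number given to the class when it is first visited;
    last is the largest number given to any class in its subtree.
    With this numbering, A is B or a descendant of B exactly when
    first[B] <= first[A] <= last[B].
    """
    intervals = {}
    counter = 0
    stack = [(root, False)]             # (node, subtree already walked?)
    while stack:
        node, done = stack.pop()
        if done:
            # Leaving the subtree: record the largest number handed out so far.
            intervals[node] = (intervals[node][0], counter)
            continue
        counter += 1
        intervals[node] = (counter, counter)
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return intervals
```

The interval of a class then contains the intervals of all its descendants, so an ancestor test reduces to two integer comparisons.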
For a given method invocation in an OO program, the method invocation is mapped to a point based on one of the numbers in the interval associated with each argument in the invocation. The problem of finding the most specific method for the method invocation is thus transformed into the so-called point enclosure problem in geometry, in which the smallest rectangle is found which encloses a given point.
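The mapping to rectangles and points can be illustrated with a small Python sketch for same-named methods. The interval values in INTERVAL, the helper names, and the "AMBIGUOUS" marker are hypothetical, and the query scans every rectangle in turn; it shows only the geometric reformulation, not the data structures of the invention.

```python
# Hypothetical intervals (first, last) per class, e.g. from an Euler Tour numbering.
INTERVAL = {
    "Object": (1, 5),
    "Person": (2, 3),
    "Customer": (3, 3),
    "Market": (4, 5),
    "MutualFund": (5, 5),
}

def rectangle(params):
    """A method's parameter classes become one interval per dimension."""
    return tuple(INTERVAL[c] for c in params)

def point(args):
    """An invocation becomes a point: the first number of each argument class."""
    return tuple(INTERVAL[c][0] for c in args)

def encloses(rect, pt):
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, pt))

def inside(inner, outer):
    return all(olo <= ilo and ihi <= ohi
               for (ilo, ihi), (olo, ohi) in zip(inner, outer))

def smallest_enclosing(methods, args):
    """Return the method whose rectangle is the smallest one enclosing the
    query point, None if no rectangle encloses it, or "AMBIGUOUS" if no
    enclosing rectangle lies inside all the other enclosing rectangles."""
    pt = point(args)
    hits = [m for m in methods if encloses(rectangle(m), pt)]
    if not hits:
        return None
    for m in hits:
        if all(inside(rectangle(m), rectangle(other)) for other in hits):
            return m
    return "AMBIGUOUS"
```

Because the interval of a class contains the intervals of its descendants, a rectangle encloses the query point exactly when the corresponding method is applicable, and rectangle containment corresponds to one method being more specific than another.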
To help find efficient solutions to the point enclosure problem, the set of rectangles is broken into a number of subsets having certain geometric properties and stored in efficient data structures. Queries are performed on the various data structures to find the smallest or minimal rectangle, if any, in the various subsets. The result is either the identification of the minimal rectangle overall, or of an ambiguity requiring resolution by the programmer.
The geometric results described herein have other applications as well. In particular, they can also be used to improve the best known bounds for the problem of multiple matching with rectangular patterns in combinatorial pattern matching.