1. Field of the Invention
This invention relates generally to interfacing two software modules, one of which is a processing module to send various data structures to the other software module which is an application processing module which utilizes the data structures. The present invention is more specifically related to use of an abstract class having a virtual function and a plurality of first level derived classes of the abstract class defining the plurality of data structures to be sent and a plurality of second level derived classes of the plurality of first level derived classes of the abstract class, each second level derived class defining the virtual function to utilize the data structures. The invention is more specifically related to a method, system, computer program product, and a memory for interfacing two software modules.
2. Discussion of the Background
Object-oriented technology is based upon the manipulation of software objects instantiated from software classes. A software class is considered a user-defined type equivalent to normal types such as the integer type. The software class is typically declared with data items and procedures or software methods that operate on the data items. Many high-level languages, including C++, support the declaration of a class. Software objects instantiated from software classes are called instances of the software classes from which they are instantiated, and have all the features, or the "type," of the software class used for instantiation.
An abstract class is a software class that is not intended to be instantiated. The purpose of an abstract class is to define interfaces shared by derived classes through inheritance. An abstract class is frequently used with virtual functions or software methods which declare the interfaces with or without definitions. When a software class derived from an abstract class defines an inherited virtual function of the abstract class, the virtual function of the derived software class will be executed even when the instantiated object of the derived software class is accessed through a reference type of the base abstract class. If the function referenced is not a virtual function, the base class function or software method will be executed. This technique allows the client or user of the software object to execute the correct function or software method with only the knowledge of the abstract class. Many examples of such techniques are shown in Gamma, E., Helm, R., Johnson, R. and Vlissides, J., Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, Massachusetts, 1995, which is incorporated herein by reference.
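The difference between virtual and non-virtual dispatch described above can be illustrated with a minimal C++ sketch. The class and function names (`Base`, `Derived`, `describe`, `label`, `callDescribe`, `callLabel`) are hypothetical and chosen only for illustration.

```cpp
#include <string>

// Hypothetical abstract class: describe() is virtual, label() is not.
class Base {
public:
    virtual ~Base() = default;
    virtual std::string describe() const = 0;            // pure virtual: interface only
    std::string label() const { return "Base::label"; }  // non-virtual
};

class Derived : public Base {
public:
    // Defines the inherited virtual function.
    std::string describe() const override { return "Derived::describe"; }
    // Hides Base::label() but does not override it.
    std::string label() const { return "Derived::label"; }
};

// The client knows only the abstract class.
std::string callDescribe(const Base& b) { return b.describe(); }
std::string callLabel(const Base& b) { return b.label(); }
```

Called with a `Derived` object, `callDescribe` runs the derived definition, while `callLabel` runs the base class function because `label` is not virtual.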
Object-Oriented Programming ("OOP") is a programming methodology in which a program is viewed as a collection of discrete objects that are self-contained collections of data structures and routines that interact with other objects. As discussed above, a class has data items, structures, and functions or software methods. Data items correspond to variables and literals of prior programming art. Structures are named groupings of related data items and other structures. Software methods correspond to functions and subroutines of prior programming art. An object-oriented framework is a reusable basic design structure, comprising abstract and concrete classes, that assists in building applications.
Pointers used for accessing specific objects, data items, and software methods are data items which include values of system equivalents of absolute addresses in computer memory. Null pointers, or zero pointers, are pointer variables or literals which have been assigned a system value, for example, zero, denoting that a specific pointer is currently pointing to a null or non-existent item. References and reference variables are generally data items which have values of system equivalents of absolute addresses in computer memory. In programming terminology, dereferencing a reference means accessing information at the computer memory address referenced by a pointer or reference.
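As a brief illustration of the terminology above, the following C++ sketch shows a null pointer check, a dereference through a pointer, and access through a reference. The function names `readOrDefault` and `addOne` are hypothetical.

```cpp
// Dereferences p when it points at an item; a null pointer yields the fallback.
int readOrDefault(const int* p, int fallback) {
    if (p == nullptr) {
        return fallback;   // null pointer: there is no item to access
    }
    return *p;             // dereference: read the item at the stored address
}

// A reference parameter gives access to the caller's data item itself.
void addOne(int& r) {
    r += 1;
}
```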
A compiler is a software program that translates programs written in a high-level language, such as C++ or Pascal, into an intermediate language or machine language which is specific to a particular computer system configuration. In general programming terminology, data items, variables, and functions or software methods are declared so that a compiler knows specific names the programmer will use in the high-level language code to be translated. A compiler typically creates a symbol table to keep track of valid data items, variable names, function or software method names, structures, and addresses thereof as space is allocated. This process enables the compiler to assign numeric addresses to references to the data items, variables, functions or software methods, or software structures, or to create executable code to enable referencing of the data items, variables, functions or software methods or software structures during execution of the executable code that is output from the compilation process. For purposes of this invention, a declaration of a data item, variable, function, or software method is a declaration of the name of the data item, variable, function, or software method. A definition of the data item, variable, function, or software method is the defining content for the data item, variable, function, or software method. For example, the declaration of a software method named "draw" includes the name and types of interfaces for the software method, but not the defining code. The definition of the software method named "draw" includes the name of the software method, any needed data type information, information concerning parameters to be passed, and the defining code for the software method. In some programming languages, a definition is also a declaration.
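The distinction between a declaration and a definition can be sketched in C++ as follows. The free function `draw` with a string parameter and the caller `render` are assumptions made for illustration and are not taken from the text.

```cpp
#include <string>

// Declaration: the name, return type, and parameter types, but no defining code.
std::string draw(const std::string& shapeName);

// A caller can be compiled against the declaration alone.
std::string render() {
    return draw("circle");
}

// Definition: the same interface together with the defining code.
// In C++ this definition is also a declaration.
std::string draw(const std::string& shapeName) {
    return "drawing " + shapeName;
}
```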
The three main features of object-oriented programming are inheritance, encapsulation, and polymorphism. Encapsulation and polymorphism have already been described and are already well known in patents relating to object-oriented systems. Inheritance allows a programmer to establish a general software class with features which are desirable for a wide range of software objects. For example, if a programmer designs a software class shape having certain generalized features such as a closed convex shape and a generalized computable property called "draw," it is then possible to construct subclasses derived from the superclass shape such as triangles, squares and circles, all having the shared properties of the parent class shape, with additional properties such as the lengths of sides or a radius value. It is also possible, for example, to have derived subclasses of classes which have additional properties such as a solid circle and a dashed circle.
The class shape is considered a base class, in that instantiation of actual objects is performed in its subclasses. The class shape is also considered an abstract class, in that it makes no sense to instantiate a shape object since object properties are not fully defined for the class shape. An abstract class is a class from which no objects are instantiated, and for which an interface for subclasses is established. The class shape establishes certain properties inherent to all shape subclasses for inheritance purposes. For example, an operation named "draw" of a shape, a commonly requested operation among users of shapes, can be declared as a software method for the class shape, to be inherited in all subclasses of the class shape. A programmer creates new classes derived from the class shape which inherit all desired features of the class shape without rewriting code already written for the class shape. This feature, called reusability, offers tremendous savings of time and resources in system development, maintenance, and support.
In many high-level programming languages, a programmer declares a derived class by providing the name of the class being declared and the names of base classes from which the derived class is to inherit properties. In the shape example discussed previously, the class shape is considered to be at a top level of an inheritance hierarchy, and is abstract since it makes no sense to instantiate shape objects with no definition of an actual shape, for example a square or a circle. Subclasses declared a level below the class shape are the subclasses specifically derived from the class shape, such as triangles, squares and circles. The subclasses triangles, squares and circles are then called children or subclasses of the class shape, and the class shape is called a parent or superclass of the classes triangles, squares and circles. Declarations of the subclasses specifically refer to the class shape for establishing inheritance. Subclasses a level below the class circle are the subclasses specifically derived from the class circle, such as solid circle and dashed circle. The classes solid circle and dashed circle are then called children or subclasses of the class circle, and the class circle is called a parent or superclass of the classes solid circle and dashed circle. Declarations of these subclasses specifically refer to the parent class circle for establishing inheritance. Since the class circle is derived from the class shape, the derived classes solid circle and dashed circle inherit all features of the class shape, and all additional features of the class circle.
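The shape hierarchy described above might be declared in C++ roughly as follows. The data members (a position, a radius, a dash length) are assumptions added so that each level has a feature to inherit. The protected constructor is one way, among others, of preventing direct instantiation of `Shape`.

```cpp
// Top level: not meant to be instantiated directly (protected constructor).
class Shape {
public:
    virtual ~Shape() = default;
    double x() const { return x_; }
    double y() const { return y_; }
protected:
    Shape(double x, double y) : x_(x), y_(y) {}
private:
    double x_;
    double y_;
};

// One level below Shape: names Shape as its parent and adds a radius.
class Circle : public Shape {
public:
    Circle(double x, double y, double radius) : Shape(x, y), radius_(radius) {}
    double radius() const { return radius_; }
private:
    double radius_;
};

// One level below Circle: inherits the features of both Shape and Circle.
class DashedCircle : public Circle {
public:
    DashedCircle(double x, double y, double radius, double dashLength)
        : Circle(x, y, radius), dashLength_(dashLength) {}
    double dashLength() const { return dashLength_; }
private:
    double dashLength_;
};
```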
In object-oriented programming, a pure virtual function is a function or software method declared with no defining code in an abstract class. For example, in declaring the abstract class shape described previously, a programmer declares a pure virtual function named "draw," with no defining code, as a software method for the abstract class shape. Subclasses derived from the abstract class shape inherit the pure virtual function as a virtual function having the same name as the pure virtual function of the parent abstract class. The function name or software method name has executable code defined at some level in subclasses of the parent abstract class.
For the shape example discussed previously, assume the abstract class shape has a declaration for the pure virtual function named "draw." Using formulas from basic algebra and geometry, the actual code executed for drawing a shape differs from one shape to another, so the code for the function named "draw" is defined only in derived base classes used for instantiation of software objects. In C++, the virtual function is declared as a virtual function in all abstract subclasses to be used as superclasses for derived subclasses from which objects are to be instantiated with defining code for the virtual function of the abstract classes. For example, drawing a circle requires plotting points equidistant from a center point. Drawing a square generally requires plotting points to form four straight sides having equal length which are connected at right angles. Therefore, a request to draw a particular shape needs to accommodate the different properties of various desired shapes. Using a pure virtual function named "draw" in the abstract class shape, the code for drawing a circle is included as a software method named "draw" for instantiated circle software objects, and the code for drawing a square is included as a software method named "draw" for instantiated square software objects. A reference to a software object instance of the software method named "draw" causes execution of the code to draw the shape represented by the software object instance. For this example, the shape of a circle is drawn if the code for an instantiated circle object is accessed, and a square is drawn if the code for an instantiated square object is accessed.
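A minimal C++ sketch of the pure virtual function "draw" follows. To keep the example short, each `draw` returns a descriptive string instead of plotting points, and the helper `drawAll` is hypothetical.

```cpp
#include <string>
#include <vector>

class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string draw() const = 0;   // pure virtual: no defining code here
};

class Circle : public Shape {
public:
    // Stands in for plotting points equidistant from a center point.
    std::string draw() const override { return "circle"; }
};

class Square : public Shape {
public:
    // Stands in for plotting four equal sides joined at right angles.
    std::string draw() const override { return "square"; }
};

// The caller holds only pointers to Shape; the code that runs for each
// element depends on the object the pointer refers to.
std::string drawAll(const std::vector<const Shape*>& shapes) {
    std::string out;
    for (const Shape* s : shapes) {
        out += s->draw();
        out += ';';
    }
    return out;
}
```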
In C++, the code for the desired software method named "draw" is accessible by using a format including a reference to the desired circle or square instantiated software object and the name "draw." A comprehensive discussion of the pure virtual function property of abstract classes in C++ is provided in Stroustrup, B., The Design and Evolution of C++, Addison-Wesley, Massachusetts, 1994, and in Meyers, S., Effective C++: 50 Specific Ways to Improve Your Programs and Designs, Addison-Wesley, Massachusetts, 1992, which are incorporated herein by reference.
Some object-oriented programming languages support multiple inheritance, wherein a software class derived from plural existing parent software classes inherits attributes and software methods from all parent software classes included in the desired derivation. As discussed above with regard to inheritance, a child subclass is declared by supplying the name of the class to be declared, and the names of the desired parent base classes for multiple inheritance. Additional properties for the child subclass are then declared and/or defined.
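Multiple inheritance as described above can be sketched in C++ as follows. The parent classes `Drawable` and `Named` and the child class `NamedCircle` are hypothetical names used only for illustration.

```cpp
#include <string>
#include <utility>

// First parent: an abstract class contributing a virtual software method.
class Drawable {
public:
    virtual ~Drawable() = default;
    virtual std::string draw() const = 0;
};

// Second parent: contributes a data item and an accessor.
class Named {
public:
    explicit Named(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// The child is declared with its own name and the names of both parents,
// and inherits attributes and software methods from each.
class NamedCircle : public Drawable, public Named {
public:
    explicit NamedCircle(std::string name) : Named(std::move(name)) {}
    std::string draw() const override { return "circle " + name(); }
};
```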
A comprehensive discussion of OOP is provided in Coad, P. and Yourdon, E., Object-Oriented Analysis, Second Edition, Prentice-Hall, Inc., New Jersey, 1991, and in Booch, G., Object-Oriented Analysis and Design with Applications, Second Edition, Addison Wesley Longman, California, 1994, which are incorporated herein by reference.
Software modules are collections of routines and data structures that perform a particular task or implement a particular abstract data type and manipulate software objects. Different software modules have varying requirements for data types to be used in processing. As data is passed from one software module to the next for various tasks to be performed, communication is necessary to determine what data types of data items are to be accessed and manipulated. For example, if a first software module manipulates data items having an integer data type and requires the data items to be processed by a second software module that manipulates data items having a floating point data type, a conversion from the integer data type to the floating point data type will be necessary before the second software module processes the data items, as integer data type and floating point data type have different internal representations. After the processing, the resulting data items will need to be converted back to integer format before being returned to the first software module.
If two software modules support processing of data items of plural data types, then information concerning the data types of data items sent from one software module to another needs to accompany the data items so that the receiving software module knows how to process the data items received. The receiving module must determine what data type the data items have so that correct code will be executed to accommodate the data type of data items sent. For example, if a first software module performs addition of two data items sent by a second software module, then the first software module must determine whether the two data items sent by the second software module are two integers, two floating point numbers, two complex numbers, two arrays, or some other data type supported by the first software module. Once the data type of the data items is determined, the first software module executes the correct code for adding the two data items together. For example, addition of two integers is a different operation internally from addition of two arrays.
Another example of passing data of different types among software modules is illustrated by a software compiler which generally is a program used to translate a program written in a high level programming language into a program in machine language. For the software compiler program a parser is a software module that breaks up the high level language program into various types of components called tokens and passes the tokens and their types to other software modules to be processed into generated machine code.
In some programming languages such as C, pointer variables are declared and defined to point to a specific data type of data item. A compiler will not generate code for references using a defined pointer variable which does not follow compiler rules concerning compatibility of data type with the pointer declaration and definition. For example, assume a pointer named PTR1 is defined as a pointer to data type float or floating point data type. If PTR1 is referenced to access a data item of data type char or character data type, the compiler recognizes a data type conflict and generally refuses to generate code for the reference in conflict. If the programmer wishes to access the data item having data type char, he/she writes code to convert the data item to type float or he/she writes code to use a pointer to data type float to accommodate the compiler requirements for data type compatibility.
Some programming languages such as C and C++ support type casting of variables and values. Type casting has various formats, for example an integer variable written on the left hand side of an assignment statement having a floating point value on the right hand side is interpreted by the compiler to mean that a conversion is performed by the compiler to store an integer format of the floating point value in the storage allocated for the variable on the left hand side. A statement such as "(float) x" is interpreted to mean that a conversion is to be performed by the compiler to convert the value of the variable x into floating point format, usually for data type compatibility in expressions. The data type and value of the variable x are not affected by the type cast operation.
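Both conversion formats mentioned above can be shown in a short C++ sketch. The function names are hypothetical, and `double` is used in place of `float` so that the results compare exactly.

```cpp
// Assignment conversion: the floating point value is stored in integer
// format, which discards the fractional part.
int storeAsInt(double value) {
    int result = value;
    return result;
}

// Without a cast, both operands are integers and the division truncates.
int integerRatio(int a, int b) {
    return a / b;
}

// The cast converts the value of a for this expression only; the variable a
// keeps its integer type and value.
double ratio(int a, int b) {
    return (double)a / b;
}
```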
An information passing technique in the related art involves passing of information between two software modules by passing both the data and information about the data type of data being passed. A sending module sends data and information concerning the data type of the data as parameters or arguments, and a receiving software module receives the data and the information concerning the data type of the data received. The receiving software module is then responsible for determining the data type of data received so that the receiving software module executes correct instructions to process the data type of data received.
In programming languages, data types are used to determine the internal representation of data to be used in software programs. For example, a data item having an internal representation of sixteen bits in two's complement format is of data type integer in many hardware configurations, and a data item having an internal representation of eight bits in American Standard Code for Information Interchange ("ASCII") format is of data type character in many hardware configurations. If a first software module passes an integer as a first data item and an ASCII character as a second data item to a second software module, with a request to add the two data items together and return their sum, the second software module will need information about the data type of the received data items so that the second software module will know what code to execute in order to perform an addition of the two data items of different data types and different internal representations.
If a user needs to access a software module to perform a function, with the processing results to be used by plural software objects having different structures as data types, the related art scheme of passing data items along with information concerning the data type of the data items sent involves the disadvantage of writing different executable code to handle every case of data type information for every type of software object in the system. If the data types are modified, for example, by adding new software object types, or by modification of existing software object types, the code to handle the various data types must be updated or rewritten for every modification made in the system. Such updating or rewriting of the software may be a time consuming and expensive task.
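The related art scheme and its maintenance cost can be illustrated with a small C++ sketch. The `DataType` tag, the `Message` structure, and the `process` function are hypothetical and stand in for whatever parameters a real pair of modules would exchange.

```cpp
#include <string>

// Type information that accompanies the data (hypothetical tag values).
enum class DataType { Integer, Text };

// The sending module fills in the tag and the matching field.
struct Message {
    DataType type;
    int integerValue;
    std::string textValue;
};

// The receiving module inspects the tag to choose the code to execute.
// Every data type added to the system requires a new case here and in
// every other function that switches on the tag.
std::string process(const Message& m) {
    switch (m.type) {
        case DataType::Integer:
            return "int:" + std::to_string(m.integerValue);
        case DataType::Text:
            return "text:" + m.textValue;
    }
    return "unknown";
}
```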
Accordingly, an object of this invention is to provide a novel method, system, object-oriented system, and computer program product for interfacing two software modules supporting various applications. Exemplary applications include parsing tokens of documents and processing the component tokens passed from the parsing process.
It is a further object of this invention to provide a novel method, system, object-oriented system, and computer program product for interfacing an application software module and a processing software module supporting various applications.
It is a further object of this invention to provide a novel method, system, object-oriented system, and computer program product for interfacing an application software module and a parser software module supporting applications for parsing input files in a first structured format and transforming the input files into output files in a second structured format.
It is a further object of this invention to provide a novel method, system, object-oriented system, and computer program product for interfacing an application software module and a parser software module supporting applications for parsing input documents in Standard Generalized Markup Language ("SGML") format and transforming the input documents into output documents in HyperText Markup Language ("HTML") format.
These and other objects are accomplished by a novel method, system, object-oriented system, and computer program product for interfacing two processes, a data analysis process and an application process. The data analysis process analyzes input data and passes various kinds of data structures to the application process. The application process uses the received data for its own purposes. The main purpose of the invention is to allow the interface between two processes to be independent of the definition and local structure of the application process to support different application processes that use the passed data differently depending upon the desired application.
An abstract class with one virtual function is used to declare a function to be defined by different applications while the derived classes from the abstract class add various data items that are abstracted by a data analysis unit. The data analysis unit therefore passes different kinds of objects to an application process unit. According to the present invention, the application process unit accesses a passed object as an object derived from the abstract classes. The application process unit executes the virtual function without knowing the exact nature of the software object that defines the virtual function.
In addition, the present invention uses multiple inheritance and static data to allow common data to be shared among the virtual functions of different application environments. An alternative design to multiple inheritance is discussed for data sharing.
The data sharing discussed above is accomplished by creating a base application software class and an abstract software class having a declared virtual function which is derived by plural subclasses using multiple inheritance. The virtual function is defined in executable software code at some level for instantiation of plural software objects having different code defined for the virtual function. An application software module accesses the defined code for the inherited virtual function of an instantiated object by using a pointer defined to reference the abstract class and set to have a value of a reference to the desired instantiated object. The pointer is set to reference the desired software object by a software module different from the application software module. Therefore, the application software module accessing the defined code for the virtual function in an instantiated software object only needs the pointer and the name of the virtual function as declared for the abstract class, and does not need information concerning which instantiated object or type of instantiated object is accessed by the call to the software method.
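One possible reading of this data sharing arrangement is sketched below in C++. The sketch is simplified to two levels, and all names (`AppBase`, `processedCount`, `Item`, `TextItem`, `NumberItem`) are hypothetical rather than taken from the invention.

```cpp
// Hypothetical base application class: the static data item is shared by
// every object whose class inherits from AppBase.
class AppBase {
protected:
    static int processedCount;
};
int AppBase::processedCount = 0;

// Abstract class declaring the virtual function.
class Item {
public:
    virtual ~Item() = default;
    virtual int process() = 0;
};

// Each subclass inherits from both classes and defines the virtual function.
class TextItem : public Item, protected AppBase {
public:
    int process() override { return ++processedCount; }
};

class NumberItem : public Item, protected AppBase {
public:
    int process() override { return ++processedCount; }
};
```

A caller holding only an `Item*` can invoke `process()` on either kind of object, and both definitions update the same shared counter.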
An exemplary application of the present invention is parsing and processing documents written in markup languages to transform the documents from a first structured format into a second structured format. A parser parses an input document to break it into component parts called tokens. In order to support processing of different types of tokens, a multi-level object-oriented software inheritance structure is implemented as an interface for software modules involved in processing the documents and tokens.
At the top or zero level of the multi-level interface structure, a zero-level abstract software class having a declared virtual function for processing and transforming tokens from the first structured format to the second structured format is declared. At the next level down from the top or zero level of the multi-level interface structure, plural first-level derived software subclasses of the zero-level abstract class are declared having different data structures added to the zero level abstract class. In order to enable access to common data items for all instantiated objects, a base application software class is declared having defined data items which are generally needed by instantiated objects of the system.
At the next level down from the first level of the multi-level interface structure, plural second-level software subclasses are declared with each second-level software subclass derived from one of the first-level derived software subclasses having the virtual function and the base application data software class using multiple inheritance. The plural second-level software subclasses have a definition for software code for the virtual function, with different code in different subclasses for processing and transforming different token types. Each second-level software subclass is designed to process a particular type of token recognized by the parser.
A driver application software module calls the parser designed as a parser application software module for processing an input document. The parser application software module parses the input document to determine plural tokens which are component parts of the document needing further processing for desired output in a different format. The parser application software module determines the type of token parsed and the instantiated object type needed to process the parsed token. The parser application software module has plural pointers which are defined to reference the derived first-level subclasses of the zero-level abstract class and which are set to have values of the software objects instantiated from the second-level software subclasses. The parser application software module sets a first-level pointer declared to reference the first-level abstract software class to have a value of a reference to the desired instantiated software object by setting the first-level pointer to the value of a first-level pointer defined to reference the derived first-level subclasses from which the selected software instantiated object is derived. The parser application software module then passes the first-level pointer to the driver application software module.
The driver application software module receives the pointer sent by the parser application software module by using a pointer defined to reference the zero-level software abstract class. The driver application software module then requests processing of the token by referencing the defined code of an instantiated software object using only the zero-level pointer and the name of the declared virtual function of the zero-level abstract class. The driver application software module has no need to know which type of token has been recognized by the parser application software module, and no need to know which instantiated software object is accessed to process the recognized token. If additions, deletions or modifications are needed by the system for token types to be processed, no modification is needed for the driver application software module. Therefore, reusability of code is possible.
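The multi-level structure and the parser and driver interaction summarized above can be sketched in C++ as follows. This is a toy illustration under stated assumptions, not the claimed implementation: the class names, the `@` prefix used to mark a tag, and the functions `parse` and `drive` are all hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Zero level: abstract class declaring the single virtual function.
class Token {
public:
    virtual ~Token() = default;
    virtual std::string transform() const = 0;
};

// First level: derived classes add the data structures found by the parser.
class TagToken : public Token {
public:
    explicit TagToken(std::string name) : name_(std::move(name)) {}
protected:
    std::string name_;
};

class TextToken : public Token {
public:
    explicit TextToken(std::string text) : text_(std::move(text)) {}
protected:
    std::string text_;
};

// Base application class holding data common to all instantiated objects.
class HtmlAppBase {
protected:
    static int tokensEmitted;
};
int HtmlAppBase::tokensEmitted = 0;

// Second level: multiple inheritance plus a definition of the virtual function.
class HtmlTag : public TagToken, protected HtmlAppBase {
public:
    explicit HtmlTag(std::string name) : TagToken(std::move(name)) {}
    std::string transform() const override {
        ++tokensEmitted;
        return "<" + name_ + ">";
    }
};

class HtmlText : public TextToken, protected HtmlAppBase {
public:
    explicit HtmlText(std::string text) : TextToken(std::move(text)) {}
    std::string transform() const override {
        ++tokensEmitted;
        return text_;
    }
};

// Toy parser: a word starting with '@' is treated as a tag (an assumption).
// It selects the object type and hands back only zero-level pointers.
std::vector<std::unique_ptr<Token>> parse(const std::vector<std::string>& words) {
    std::vector<std::unique_ptr<Token>> tokens;
    for (const std::string& w : words) {
        if (!w.empty() && w[0] == '@') {
            tokens.push_back(std::unique_ptr<Token>(new HtmlTag(w.substr(1))));
        } else {
            tokens.push_back(std::unique_ptr<Token>(new HtmlText(w)));
        }
    }
    return tokens;
}

// Driver: uses only the zero-level pointer and the name of the virtual function.
std::string drive(const std::vector<std::string>& words) {
    std::string out;
    for (const auto& token : parse(words)) {
        out += token->transform();
    }
    return out;
}
```

Adding a new token type in this sketch means adding classes and a branch in `parse`; `drive` is left unchanged, which mirrors the reusability point made above.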