Software applications are typically created using, and relying upon, an application development environment, such as Microsoft's .NET environment, Sun Microsystems' Java environment, and others. These development environments provide libraries and structures for creating richly customized programs for particular applications. By using existing, general purpose environments and tools, the development of complex and feature-rich software can be streamlined.
Software may be developed for general purpose use, as exemplified by word processing programs, spreadsheet programs, presentation software, and the like. Software may also be developed to model the information flow in a specific business. Software that is specialized for a business is often referred to as “enterprise software”. Enterprise software is distinguishable from general purpose applications such as word processors, spreadsheets, or other such applications, in that enterprise software seeks to specifically model the information generated and managed by a particular type of business, to make the flow of that information more efficient and to reduce the training required to obtain best utilization of human resources.
Applications typically provide customized screen controls and dialogs, to present information of particular relevance to a particular application. These controls access data relevant to a business process, and present that data for review, editing and reporting in a manner that matches the information flow for the business.
Data utilized by software comes in a number of categories, but is all handled in a relatively similar, modular fashion, consistent with the application development environment. For example, software may manage a unique catalog of products or services and their pricing, a unique set of vendors and business partners, unique office locations, unique employee information, and, of course, unique customer information and transactions, accounting and billing records. During application development, each of these data categories is incorporated into an overall data model or database schema, and that data model is then implemented in an underlying database application, as a collection of tables for each type of data, with the table populated by records representing each unique data item. For example, one table might identify partners, and include columns that identify, for each partner, the name, address, telephone and main contact information for the partner. A second table might identify customers and have similar information for each customer. These tables may link to each other in various ways, such as by the use of a common postal (zip) code table associating postal codes with names of associated cities, towns or geographic areas.
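By way of illustration only, the data model described above might be sketched as follows using Python's built-in sqlite3 module. The table names, column names, and sample values are hypothetical and are not taken from any particular application; the sketch merely shows a partner table and a customer table that both link to a common postal code table.

```python
import sqlite3


def build_example_schema(conn):
    """Create three small tables. All table and column names are hypothetical."""
    conn.executescript(
        """
        CREATE TABLE postal_codes (
            code TEXT PRIMARY KEY,
            city TEXT NOT NULL
        );
        CREATE TABLE partners (
            id          INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            address     TEXT,
            telephone   TEXT,
            contact     TEXT,
            postal_code TEXT REFERENCES postal_codes(code)
        );
        CREATE TABLE customers (
            id          INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            address     TEXT,
            telephone   TEXT,
            contact     TEXT,
            postal_code TEXT REFERENCES postal_codes(code)
        );
        """
    )


def partner_cities(conn):
    """Resolve each partner's city through the shared postal code table."""
    rows = conn.execute(
        "SELECT p.name, z.city FROM partners AS p "
        "JOIN postal_codes AS z ON p.postal_code = z.code "
        "ORDER BY p.name"
    )
    return rows.fetchall()


def demo_schema():
    """Populate the tables with one made-up record each and run the join."""
    conn = sqlite3.connect(":memory:")
    build_example_schema(conn)
    conn.execute("INSERT INTO postal_codes VALUES (?, ?)", ("00001", "Exampleville"))
    conn.execute(
        "INSERT INTO partners (name, address, telephone, contact, postal_code) "
        "VALUES (?, ?, ?, ?, ?)",
        ("Acme Supply", "1 Main St", "555-0100", "J. Doe", "00001"),
    )
    return partner_cities(conn)
```

Because both the partner and customer tables reference the same postal code table, a city name is stored once and shared by every record that uses that code.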
It will be understood that the use of tables of data coupled to software functions is common practice in enterprise software where a business typically creates and consumes specialized data as part of its day-to-day operation. However, tables of data are also utilized in general purpose software. For example, a general purpose word processor or spreadsheet program may provide a zip/postal code lookup function that is driven by a table of zip/postal code data.
Typical modern programming practice utilizes an object oriented methodology, in which an executing application is treated as a dynamic set of interacting objects. Objects are typically associated with data and methods that access that data in predefined ways. Most objects permit access to their data members only through their defined methods. This is useful in that it ensures that the data will remain in a well-defined state as defined by the methods. Thus, in a program consistent with the example described above, an object may be created for handling partner information, and include within that object a method for accessing partner name, address and contact information. A second object may be created for handling invoice information, and include methods for accessing an invoice, editing an invoice, changing an invoice status, and the like. The variety of methods provided for particular data in an object may vary based upon the business practice; thus, methods for modifying invoices may be provided because such modifications are done frequently in normal business practice, whereas methods to modify vendor or partner information may not be provided, since such modifications are rare and can be better accomplished directly with a database program by a system administrator.
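The encapsulation described above may be sketched, purely as an assumption-laden illustration, with two small Python classes. The class names, method names, and status values are hypothetical. The invoice object exposes methods for editing and status changes, while the partner object exposes only read access, mirroring the business practice described.

```python
class Partner:
    """Partner data is readable but offers no methods for modification."""

    def __init__(self, name, address, contact):
        self._name = name
        self._address = address
        self._contact = contact

    def get_contact_info(self):
        # Return a copy of the data so callers cannot alter internal state.
        return {"name": self._name, "address": self._address, "contact": self._contact}


class Invoice:
    """Invoice data may only be changed through methods that keep it valid."""

    VALID_STATUSES = ("draft", "sent", "paid")  # hypothetical status values

    def __init__(self, number, amount):
        self._number = number
        self._amount = amount
        self._status = "draft"

    def get_amount(self):
        return self._amount

    def get_status(self):
        return self._status

    def edit_amount(self, new_amount):
        # Reject edits that would leave the invoice in an ill-defined state.
        if new_amount < 0:
            raise ValueError("amount must not be negative")
        if self._status == "paid":
            raise ValueError("a paid invoice cannot be edited")
        self._amount = new_amount

    def change_status(self, new_status):
        if new_status not in self.VALID_STATUSES:
            raise ValueError("unknown status: " + new_status)
        self._status = new_status
```

In this sketch an attempt to edit a paid invoice raises an error, which is one way the defined methods can keep the data in a well-defined state.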
As an example, referring to FIG. 1, to construct a screen control 10 on a display screen 12, such as a control that presents partner information to a user (e.g., so that the user may associate a partner with a particular transaction), program execution passes from a main executable 14 to an object 16 that creates the control, which then passes a query request message 18 to an object 20 managing the partner information. The partner information object 20 then performs a query by passing a query message 22 to a database server 24 which performs the required query upon a database 26. The result of the query executed upon the database is, e.g., an array 28 of partner information including the name, address and contact information for all partners in the database. Array 28 is returned via object 20 to object 16 which utilizes the array to populate the control 10.
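The flow of FIG. 1 may be approximated, as a minimal sketch only, by the following Python code. An in-memory SQLite database stands in for database server 24 and database 26, and the class and function names are hypothetical. The comments indicate which element of FIG. 1 each piece loosely corresponds to.

```python
import sqlite3


class PartnerInfo:
    """Loosely corresponds to object 20, which manages partner information."""

    def __init__(self, conn):
        self._conn = conn  # stands in for database server 24 and database 26

    def query_all(self):
        # Corresponds to query message 22; the returned list plays the role of array 28.
        cur = self._conn.execute(
            "SELECT name, address, contact FROM partners ORDER BY name"
        )
        return [dict(zip(("name", "address", "contact"), row)) for row in cur]


class PartnerControlBuilder:
    """Loosely corresponds to object 16, which creates screen control 10."""

    def __init__(self, partner_info):
        self._partner_info = partner_info

    def build_control(self):
        # Corresponds to query request message 18 passed to object 20.
        partners = self._partner_info.query_all()
        # The "control" here is simply a list of display strings.
        return [p["name"] + " (" + p["contact"] + ")" for p in partners]


def run_main_executable():
    """Loosely corresponds to main executable 14."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE partners (name TEXT, address TEXT, contact TEXT)")
    conn.executemany(
        "INSERT INTO partners VALUES (?, ?, ?)",
        [
            ("Baker Freight", "2 Dock Rd", "R. Roe"),
            ("Acme Supply", "1 Main St", "J. Doe"),
        ],
    )
    return PartnerControlBuilder(PartnerInfo(conn)).build_control()
```

Each request for the control passes through two objects and one database query before any data reaches the screen.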
The modularity apparent in the above example affords advantages. Specifically, routines that are not always used by a program, such as code for objects 16 and 20 in the above example, may be moved to libraries, which are collections of “helper” code and data that provide services to independent programs. Libraries were originally popularized as a way for an operating system (OS) such as Microsoft's Windows operating system to provide system services, with the initial purpose of saving both disk space and memory required for applications. Any code that many applications share could be separated into a library that exists only as a single disk file and a single instance in memory. Extensive use of libraries allowed early versions of Windows to work under tight memory conditions.
Libraries are also useful in that they permit code and data to be shared and changed in a modular fashion. To explain why, it is helpful to describe how libraries are used.
Executables and libraries make references known as links to each other through the process known as linking, which is typically done by a linker. These links are typically dynamic, which means that a library is not copied into an executable program, or another library, at compile time, but instead the library remains in a separate file on disk. Only a minimal amount of work is done at compile time by the linker—it only records what libraries the executable needs and the index names or numbers. The majority of the work of linking is done at the time the application is loaded (loadtime) or during the execution of the process (runtime). The necessary linking code, called a loader, is part of the underlying operating system. At the appropriate time (when a library object is needed) the loader finds the relevant libraries on disk and adds the relevant data from the libraries to the process's memory space.
A library that is loaded only when needed is called a dynamically linked library. This term is sometimes shortened to “dynamic link library” or DLL in Microsoft Windows environments, although when used in this application the term “library” refers to any dynamically loaded component of a program, and includes, among others, Windows DLL files and Java JAR files. A DLL is a collection of routines that can be called by applications and by other DLLs. DLLs contain shareable code or resources, and they provide the ability for multiple applications to share a single copy of a routine they have in common. A DLL contains a table of all the objects and supported methods within it, known as entry points. Calls into the library “jump through” this table, looking up the location of the code in memory, then calling it.
Some DLLs are referenced and linked into a program at compile time. The executable thus has all the information it needs to load the DLL. This speeds up loading, but the DLL cannot change after compilation or the application will not run. In contrast, there are DLLs that are not linked at compile time, but are defined and loaded at runtime. These DLLs can change content and still be loaded.
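Runtime loading of the kind described above can be illustrated by analogy in Python, whose importlib module loads a module by name only when requested. The following is a sketch under stated assumptions and is not a description of the Windows loader: the LazyLibrary class is hypothetical, a Python module stands in for a DLL, and a lookup by attribute name stands in for a jump through the entry point table.

```python
import importlib


class LazyLibrary:
    """Hypothetical wrapper that defers loading a module until first use."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None  # nothing is loaded when the wrapper is created

    @property
    def loaded(self):
        return self._module is not None

    def call(self, entry_point, *args, **kwargs):
        # Load the library on first use, analogous to runtime dynamic linking.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        # Look up the routine by name, analogous to consulting an entry point table.
        routine = getattr(self._module, entry_point)
        return routine(*args, **kwargs)
```

For instance, `LazyLibrary("json")` loads nothing at construction; the standard json module is resolved only when `call("dumps", [1, 2])` is first invoked. Because the routine is located by name at call time, the module's contents may change between runs without any change to the calling code, provided the named routine is still present.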
Because dynamically linked libraries are linked only when used, they allow changes to be made to code within the self-contained library, with the resulting change seamlessly shared by potentially several applications, without any change to the applications themselves. This basic form of modularity allows for relatively compact patches and service packs for large applications, such as Microsoft Office, Microsoft Visual Studio, and even Microsoft Windows itself.
However, libraries, and Microsoft-type DLLs in particular, also have drawbacks: DLL conflicts may occur when several applications use the same DLL but conflict as to which version is to be used. In older versions of Microsoft Windows, programs and DLLs needed to be registered and were usually stored in a common location. As a consequence, problems arose in older versions of Microsoft Windows when one copy of a DLL was overwritten by a different file with the same name, and registry conflicts also occurred.
Recently, Microsoft's .NET programming environment introduced DLLs which do not need to be registered, and can be copied directly into the program folder or the Global Assembly Cache (GAC), where they are uniquely identified, and multiple copies can coexist. Microsoft's .NET framework uniquely names DLLs and associates them with the associated executable, with the intention of allowing side-by-side coexistence of different versions of what may be the same shared library. This trades disk space for operability, which is reasonable given the easing of disk space restrictions in modern computer systems.
In the context of database-using applications, libraries are often used to separate code for infrequently used objects and methods from code that is used in nearly every execution of the application. Thus, for example, if the lookup of a partner name and contact information is relatively rare, good programming practice might be to separate the code for that lookup, i.e., the code of objects 16 and 20, into libraries, which are dynamically loaded when the user requests such a lookup. This approach has the advantage of saving memory space for other uses until needed, and furthermore allows modular updating. If, for example, the format of the database storing particular information changes so that the code accessing it must be updated, the upgrade may be handled by merely installing a replacement library (e.g., a DLL file) rather than reinstalling a larger executable program.
Although the modularity provided by libraries and object oriented programming offers these numerous advantages in operability and upgradability, there are drawbacks in performance. Primarily, the repeated indirection involved in obtaining data can be time consuming and disturbing to a user. That is, referencing the example above, to build screen control 10 including partner contact information, the system must first locate the library files for objects 16 and 20, link and load them, and then use the embedded methods in those objects to execute a query onto a database server 24, after which the resulting data is packaged as a returned result and delivered to the object generating the screen control.
The delays in this process come in several parts, but are greatest at those points in the process that involve access to a direct access storage device (DASD) such as a hard disk drive, which will occur whenever data must be obtained from outside the processor's available memory, such as when locating and loading a library, and when performing the subsequent database query.
Databases, due to their size, are typically stored outside of local memory of the processor that executes the software application. Indeed, in a multi-user environment, to implement sharing of data, that data is often not local to the computer executing the application, but rather is remotely accessed, such as from a DASD at a central location such as a local or remote database server. As a result, database query access often involves the remote access of a large data file on a DASD over a network or the Internet. If the server is accessed through the Internet, the Internet connection speed is a slowing factor, as well as the fact that, typically, the data will have to go through one or more encrypting and decrypting processes. Also, if a remote server is being used, the application can only run when the remote server is accessible. Thus, applications using remote data face challenges and far greater potential delay in building a screen control of an application.
The delays inherent in a large database cannot be readily avoided in many situations, for the reason that some of the data stored in an application's database may be large, such as detailed customer purchase records, invoicing, and the like, which provide necessary and highly specific details about the operations of a business. However, some aspects of the database may be far smaller in size (but no less critical to the operation of the enterprise program), such as a list of vendors or partners, or a zip/postal code lookup table, but those are a small fraction of the information typically held in a database.
It is possible to optimize the indexing and structure of a database to increase the speed with which data is accessed, albeit often by compromising in favor of one type of access pattern over others. A great deal of time and energy has been devoted to the optimization of database applications to improve their performance. However, while these approaches have offered efficiencies, there remain substantial gaps in the performance of applications, particularly in those portions which utilize unusual database queries for which database performance is not optimized.
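As one hedged illustration of such optimization, the sketch below uses Python's built-in sqlite3 module to compare the query plan SQLite reports before and after an index is created. The table, column, and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the sketch returns the plan text rather than assuming its content.

```python
import sqlite3


def query_plan(conn, customer_id):
    """Return SQLite's description of how it would execute the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM invoices WHERE customer_id = ?",
        (customer_id,),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


def demo_index():
    """Show the plan for the same query without and with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO invoices (customer_id, total) VALUES (?, ?)",
        [(i % 50, float(i)) for i in range(1000)],
    )
    before = query_plan(conn, 7)
    # The index favors lookups by customer_id, at the cost of extra storage
    # and additional work on every insert or update.
    conn.execute("CREATE INDEX idx_invoices_customer ON invoices (customer_id)")
    after = query_plan(conn, 7)
    return before, after
```

The index speeds up the one access pattern it was built for; a query that filters on a different column, such as the invoice total, gains nothing from it, which is the compromise noted above.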
More recently, in some applications data has been stored in an XML (eXtensible Markup Language) file, a method that differs in a number of ways from the use of databases and objects to access them. Specifically, XML files are flat data, marked with predefined tags according to open and standard rules. Functionality (methods) for data lookup needs to be coded separately from the XML file and designed to access the tags and then utilize the tags to access the corresponding data. (The presence of the tags may increase the size and loading time of the XML as compared to an equivalent database file.) XML files are often unencrypted and thus easily accessed outside of the application, and would need to be encrypted to prevent such outside access, whereas protection from such access is normally provided by the security of a database and the inherent encryption of code for accessing the database.
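The separation between XML data and the lookup code that reads it may be sketched as follows using Python's standard xml.etree.ElementTree module. The tag names, sample records, and the find_partner_phone function are hypothetical and serve only to show that the lookup logic lives outside the data file and works by navigating the tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat data, marked up with tags; readable by any tool in plain text.
PARTNERS_XML = """<partners>
  <partner>
    <name>Acme Supply</name>
    <telephone>555-0100</telephone>
  </partner>
  <partner>
    <name>Baker Freight</name>
    <telephone>555-0199</telephone>
  </partner>
</partners>"""


def find_partner_phone(xml_text, partner_name):
    """Lookup code written separately from the data; it walks the tags to find a value."""
    root = ET.fromstring(xml_text)
    for partner in root.findall("partner"):
        if partner.findtext("name") == partner_name:
            return partner.findtext("telephone")
    return None  # no partner with that name
```

Note that the data above is stored as readable text, so anything with access to the file can inspect it unless the file is separately encrypted.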