1. Field of the Invention
This invention relates to the field of computer software and more specifically to a method and apparatus for automatically optimizing execution of a computer program.
Portions of the disclosure of this patent document may contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever. Sun, Sun Microsystems, the Sun logo, Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
2. Background Art
The use of networks for the distribution of computer programs has grown dramatically, particularly in the case of the Internet and the World Wide Web. A problem with distributing large computer programs over a network is that a large amount of time is required to download and initialize the program. This problem can be understood by reviewing the current methods for distributing computer programs and documents over computer networks.
Internet
Computer programs and documents can both be distributed over the Internet. The Internet is a worldwide network of interconnected computers. A client, or Internet client, typically refers to the computer that a person uses to access the Internet. An Internet client accesses other computers on the Internet via an Internet provider. An Internet provider is an organization that provides a client with access to the Internet (via an analog telephone line or Integrated Services Digital Network (ISDN) line, for example).
Distributing Documents Over a Network
A client can, for example, download a document from or send an electronic mail message to another computer/client using the Internet. One type of file a client can download is known as a web page. A web page contains text, pictures, or other types of files that are displayed on the client using a web browser. A web browser is a software program that executes on the client and uses the Hypertext Transfer Protocol (HTTP) to request web pages from servers throughout the Internet. HTTP is a set of rules for exchanging files (text files, image files, sound files, video files, and other data files). Netscape Navigator and Internet Explorer are both examples of widely used web browsers that use the HTTP protocol. (Note that HTTP is only one protocol available to exchange files. Any of several other protocols may also be used.)
Displaying Documents (or files) Using a Web Browser
The Hypertext Markup Language (HTML) is used to describe how to display a web page. An HTML description is typically comprised of a set of "markup" symbols. An HTML document or file that contains the "markup" symbols for a web page is transmitted to the client computer. The web browser executing at the client parses the "markup" symbols in the HTML document and produces and displays a web page based on the information in the HTML document. Consequently, the HTML document defines the web page that is rendered at runtime on the browser. For example, the following set of "markup" symbols directs the browser to place a title, a heading, and an image called "image.jpg" on a web page:
where (active domain name address) comprises a URL (or Uniform Resource Locator) or an IP (or Internet Protocol) address. The "markup" symbols typically surround an HTML command. The "<" symbol indicates the start of an HTML command and the "</" symbol indicates the end of an HTML command. Each start or end command has a corresponding ">" to indicate the close of that particular command. The information the HTML command is issued on is typically contained between the ">" symbol of the start command and the "</" symbol of the end command. An HTML command describes to the web browser what to do with the block of information located between the two commands. In the above example, "<HTML>", "</HTML>", "<TITLE>" and "</TITLE>" are examples of HTML commands surrounded by "markup" symbols.
When a web browser receives an HTML document, it displays the information contained between each set of "markup" symbols in a manner that coincides with the HTML commands issued. For example, the text "<TITLE>This is the document title</TITLE>" directs the web browser to place the text "This is the document title" in the title bar of the web browser.
Some HTML commands have attributes. The HTML command "<IMG>", for example, has a "SRC=" attribute. The <IMG> command tells the web browser to display an image and the "SRC=" attribute identifies the location and name of the image to be displayed. For example, the statement "<IMG SRC="http://(active domain name address)/image.jpg">" tells the web browser to display an image named image.jpg that can be obtained from the web server located at the address "http://java.sun.com."
Distributing Computer Programs Over a Network
In addition to retrieving documents containing "markup" symbols, a client can also use the Internet to obtain other files such as a computer program from another computer located on the same network. Various mechanisms exist for retrieving computer programs using the Internet or other types of computer networks. To understand the problems and disadvantages associated with current approaches to distributing computer programs over a network, a review of programming environments is helpful.
The Programming Environment Utilized by the Java Programming Language
Java is an object-oriented programming language and thus programs written in the Java programming language are comprised of a number of different classes and interfaces. Unlike many programming languages, in which a program is compiled into machine-dependent, executable program code, programs written in the Java programming language are compiled into machine-independent bytecode classfiles. Each classfile contains code and data in a platform-independent format called the classfile format. The computer system acting as the execution vehicle contains a program called a virtual machine, which is responsible for executing the bytecode. The virtual machine provides a level of abstraction between the machine-independent bytecode classes and the machine-dependent instruction set of the underlying computer hardware. Virtual machines exist for a variety of different operating systems. A "class loader" within the virtual machine is responsible for loading the bytecode classfiles as needed, and either an interpreter executes the bytecodes directly, or a "just-in-time" (JIT) compiler transforms the bytecodes into machine code, so that they can be executed by the processor.
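By way of illustration only, the on-demand behavior of a class loader can be observed with a small sketch. The class name RecordingClassLoader and its helper methods are hypothetical and do not appear in the disclosure; the sketch only notes each class name it is asked for and then delegates the actual loading to its parent loader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a class loader that notes every class name it is
// asked to load, then delegates the actual loading to its parent.
final class RecordingClassLoader extends ClassLoader {
    private final List<String> requested = new ArrayList<>();

    RecordingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (requested) {
            requested.add(name); // record the request before delegating
        }
        return super.loadClass(name, resolve);
    }

    // Convenience wrapper that turns the checked exception into an unchecked one.
    Class<?> load(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("class not found: " + name, e);
        }
    }

    // Snapshot of the names requested so far, in request order.
    List<String> requestedNames() {
        synchronized (requested) {
            return new ArrayList<>(requested);
        }
    }

    public static void main(String[] args) {
        RecordingClassLoader loader =
                new RecordingClassLoader(ClassLoader.getSystemClassLoader());
        Class<?> loaded = loader.load("java.util.ArrayList");
        System.out.println(loaded.getName() + " loaded; requests seen: "
                + loader.requestedNames());
    }
}
```

Because the parent loader defines the class, classes that the loaded class itself depends on are resolved by the parent and are not seen by this sketch; it records only the requests made to it directly.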
FIG. 1A provides an example of the programming environment utilized by the Java programming language. When this programming environment is installed on a client computer 111-113 (FIG. 1B), that computer can run programs. Programs are run in a Java Runtime Environment 103 (JRE). The JRE 103 is comprised of a class loader 104, a bytecode verifier 105, and one or more classes 106. The JRE utilizes a Java Virtual Machine (JVM) 107 comprised of a bytecode interpreter 108 and a peer interface 109. The JVM 107 interprets the bytecode using a bytecode interpreter 108. (Note that in some architectures (e.g. JavaStation), the bytecodes can be executed directly, requiring neither interpreter nor JIT.)
The programming environment utilized by the Java programming language can optionally include additional components. For example, a Just In Time (JIT) compiler 110 can also be added to the programming environment if desired. The JIT is a compiler that converts bytecode into machine code as a program executes.
Referring now to FIG. 1B, each computer contains its own JVM 107. The JVM 107 interfaces with different types of native operating systems 114-116. Each computer 111-113 may contain a different operating system 114-116. Using JVM 107, computers 111-113 are able to execute bytecode in a manner that coincides with the operating system 114-116 each computer contains. For example, computer 111 and computer 112 can both execute the same program even if each computer contains a different operating system. Computer 111 can execute a program using the UNIX operating system 114 and computer 112 can execute a program using the Java operating system 115.
Description of an Applet
Sending an applet across a computer network is one way to distribute a computer program. An applet is a computer program that may appear embedded in a web page or may be separate from a web page. Applets are sent to a client when a web page containing the applet is requested by the client. For example, if a person views a web page that contains the following code, an applet called "animator" is sent to the browser for display:
<HTML>
<APPLET CODE="animator.class" WIDTH=640 HEIGHT=480>
</HTML>
An applet written in the Java programming language that runs with the aid of a web browser or another program, such as AppletViewer, is called a Java applet. A program written in Java that executes without the aid of a web browser is called a Java application. A Java applet is a program written in the Java programming language that is distributed as part of a web page. In the example above, a classfile named "animator.class" is downloaded to the client and executed with the aid of a web browser.
Although applets are described as being transferred via HTTP to web browsers, the present application is not limited to such transfers. The invention contemplates transfers using any suitable protocol, with or without the use of a web browser. The transfer may be via FTP, via modem communication, via network distribution, client/server distribution, via carrier wave, via transfer on portable media, or any other method of transfer of data.
Further, the present application is not limited to Java programming language based applets and applications, but has equal application to any file or programming language application, or where non-application code of any sort is being distributed.
Uncompressed Java Programs
When a program is created using the Java programming language it is typically comprised of a number of individual files. Sending a copy of these files from one computer to the next through a computer network is one way to distribute a program. For example, referring now to FIGS. 2A and 2B, a single Java applet 200 may require a number of different classfiles 201-210, image files 211-215, sound files 216-220, and other types of data files 221-223 during its execution. Java applet 200 resides on a server 259 and is sent across network 250 when a request 253 is made by the browser 252 on client computer 251. Client computer 251 initiates a request 253 by issuing an HTTP request to the server computer 259 upon which the Java applet 200 resides. In response to request 253 each file that comprises the Java applet 200 is sent across the network 250.
These files may be sent across the network 250 in separate HTTP request/response pairs. Classfile 201, for example, is sent across network 250 in a separate request/response pair from classfile 203. In FIG. 2, every file is sent using a separate HTTP request/response pair. The lines connecting files 201-223 with network 250 represent a series of HTTP request/response pairs 260-282. Network 250 utilizes each HTTP request/response pair 260-282 to contact client computer 251 and send files 201-223 to client computer 251. The Java applet will begin executing when the first classfile is received. As additional files are required, execution will be suspended until the additional files can be retrieved, and then execution will resume. Files 201-223 are not compressed before they are sent across the network. A problem with distributing computer programs in this manner is that long time delays exist between the time when the initial request for the Java applet is made and the time of the response. (Here, a request is a message from the client to the server asking for something, such as a classfile or jar file. A response is the server's message back to the client to fulfill a request.) As a result, program startup is delayed. This results in a perceptible delay for the user who wishes to display a web page that contains an applet.
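The cost of one request/response pair per file can be made concrete with a rough model. The sketch below is an assumption-laden estimate, not a measurement: the class TransferTimeEstimate, the round-trip time, the total size, and the line speed are all hypothetical, and the model assumes the pairs are issued strictly one after another with no compression. Only the file count (files 201-223, i.e. 23 files) comes from the example above.

```java
// Hypothetical back-of-the-envelope model; every number in main is assumed.
final class TransferTimeEstimate {
    // Time when each file needs its own request/response pair, issued serially:
    // one round trip per file plus the time to move all bytes.
    static double separateRequestsMillis(int fileCount, long totalBytes,
                                         double roundTripMillis, double bytesPerMilli) {
        return fileCount * roundTripMillis + totalBytes / bytesPerMilli;
    }

    // Time when the same bytes travel in a single request/response pair.
    static double singleRequestMillis(long totalBytes,
                                      double roundTripMillis, double bytesPerMilli) {
        return separateRequestsMillis(1, totalBytes, roundTripMillis, bytesPerMilli);
    }

    public static void main(String[] args) {
        int fileCount = 23;             // files 201-223 in the example
        long totalBytes = 460_000L;     // assumed total size of all files
        double roundTripMillis = 200.0; // assumed latency per request/response pair
        double bytesPerMilli = 5.0;     // assumed line speed (5,000 bytes per second)

        double separate = separateRequestsMillis(fileCount, totalBytes,
                roundTripMillis, bytesPerMilli);
        double single = singleRequestMillis(totalBytes, roundTripMillis, bytesPerMilli);
        System.out.println("separate pairs: " + separate + " ms");
        System.out.println("single pair:    " + single + " ms");
        System.out.println("latency saved:  " + (separate - single) + " ms");
    }
}
```

Under these assumptions the saving is exactly the 22 round trips that are no longer needed; the transfer time for the bytes themselves is unchanged unless the files are also compressed.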
Packaging and Compression/Decompression Techniques
One way of reducing the time it takes to send a program across a network is to package several files into one file, or to compress each file or the packaged files before sending across the network. There are several different schemes for packaging and/or compressing files.
ZIP Files
A ZIP file is used to package and distribute files. ZIP files contain one or more individual files. To create a new ZIP file the user manually decides what files to place inside the ZIP file. Once the user selects one or more files, the ZIP file is created when a create command is issued. The files that comprise a Java program (e.g. a Java applet), for example, can be grouped together and placed into a ZIP file for distribution.
Placing a group of files together into a ZIP file provides a way to transport that group across the network faster than if the files were sent individually. That is, a single request/response pair is used to send the packaged data. When a client computer, for example, requests a ZIP file, the entire file is transported across the network in a single HTTP request/response pair. A disadvantage of ZIP files is that the user must manually select what files to place inside the ZIP file. Classfiles can be loaded directly from a ZIP file. It should be noted that the packaged files can be compressed or the files comprising the package can be individually compressed if desired.
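As a minimal sketch of such packaging, the standard java.util.zip classes can combine several files into one ZIP archive held in memory. The class name ZipPackager and the sample file names are hypothetical; the caller still chooses which files go in, which is the manual step noted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: packages caller-selected files into one ZIP archive.
final class ZipPackager {
    // Writes each (name, content) pair as one entry; entries are deflated by default.
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Lists the entry names of an archive in the order they were stored.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("animator.class", new byte[] {1, 2, 3}); // placeholder contents
        files.put("image.jpg", new byte[] {4, 5});
        byte[] archive = pack(files);
        System.out.println(entryNames(archive) + " in " + archive.length + " bytes");
    }
}
```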
Java Archive (JAR) Files
A Java Archive (JAR) file is a file that contains one or more other files. A JAR file, for example, enables the user to place a collection of compressed files into a single archive file. A JAR file typically contains the classfiles and other data files associated with an applet or application. One way to create a JAR file is to use a JAR packager. The JAR packager is a Java application that combines multiple files into a single JAR file. The JAR packager can be used, for example, to create a single file containing all the compressed files needed to execute a Java applet or Java Application. The basic format for creating a JAR file using the JAR packager is as follows:
jar cf jar-file input-file(s).
This command generates a compressed JAR file. The jar command also generates a default manifest file called MANIFEST.MF and places it in a directory called META-INF. The manifest file is a file that contains information about the files packaged in the JAR file. The c option indicates that the user wants to create a JAR file. The f option indicates that the output is sent to a file rather than to an output stream. The c and f options can appear in any order, but should not be separated by a space. The jar-file argument is the name of the JAR file the user wants to create. The input-file(s) argument is a space-delimited list of one or more files the user wants to place in the JAR file. The input-file(s) argument can contain a wildcard (*) symbol. If any of the input-file(s) are directories, the contents of these directories are added to the JAR file recursively.
Alternative ways to manually create a JAR file also exist. Any file that organizes information using the JAR format is considered a JAR file. The JAR format is a set of conventions for associating digital signatures and other information with the files held in the JAR file. Thus, one way to make a JAR file is to create a file that utilizes these conventions. The following steps, for example, enable the user to create a JAR file: 1) manually identify the files to be placed in the JAR file; 2) create a manifest file; 3) if the files in the JAR archive are going to be digitally signed, place the appropriate digital signature files in a subdirectory called META-INF; 4) combine the META-INF files with the other files and create a single ZIP file.
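The steps above can be sketched with the standard java.util.zip and java.util.jar classes: a manifest is written as the first entry of an ordinary ZIP file under META-INF/MANIFEST.MF, and the result is then read back through JarInputStream to confirm that it is recognized as a JAR file. The class name ManualJarSketch and the sample file names are hypothetical, and digital signing (step 3) is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration: building a JAR by hand as "ZIP plus META-INF".
final class ManualJarSketch {
    static byte[] pack(Map<String, byte[]> files) {
        // Step 2: create a manifest; a version attribute is required for it to be written.
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            // Step 4: the META-INF entry goes first, followed by the selected files.
            zip.putNextEntry(new ZipEntry("META-INF/MANIFEST.MF"));
            manifest.write(zip);
            zip.closeEntry();
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the archive as a JAR and returns its Manifest-Version, or null if no manifest is found.
    static String manifestVersion(byte[] archive) {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(archive))) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MANIFEST_VERSION);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Step 1: the caller identifies the files (placeholder contents here).
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("animator.class", new byte[] {1, 2, 3});
        System.out.println("Manifest-Version read back: " + manifestVersion(pack(files)));
    }
}
```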
Once the components of an applet or application (classfiles, image files, sound files, data files, etc.) are combined into a single JAR file it is possible to distribute a group of files using a single HTTP request/response pair rather than requiring a new request/response pair for each file. Combining more than one file into a single file and then compressing those files improves download times. A disadvantage with JAR files is that the user must specify what files to place in the archive prior to creating the JAR file. An additional disadvantage is that the entire JAR file must be downloaded before a program can begin to execute.
FIG. 3 shows an example of a computer program comprised of multiple files. Java applet 200, for example, is comprised of numerous classfiles 201-210, image files 211-215, sound files 216-220, and data files 221-223. Using the JAR packager, for example, an archive file 300 can be built that contains Java applet 200. Archive file 300 can also be created using other suitable file formats such as the ZIP file format. When browser 252 encounters a web page containing HTML "markup" symbols enclosing an <APPLET> command, an HTTP request 301 is made for Java applet 200 contained in archive file 300. For example, if Java applet 200 is contained in an archive file 300 named jarFile.jar, HTML code that names jarFile.jar in the archive attribute of the <APPLET> command results in a request for the file named jarFile.jar. When a request is made, the file jarFile.jar containing applet 200 is transferred from server 259 to client 251. The archive attribute can list one or more jar files, all of which will be downloaded before the applet begins execution.
In response, server 259 delivers the file jarFile.jar to browser 252 using a single HTTP response 302. This provides a way to transfer all of the files that comprise Java applet 200 from server 259 to the browser 252 that resides on client 251. A disadvantage with this approach is that browser 252 cannot begin executing Java applet 200 until the entire archive file 300 is received. Another disadvantage with JAR files is that the user must manually determine what files to place in the JAR file.
As the above example indicates, a problem with distributing large computer programs over a network is that a large amount of time is required to download and initialize the program. In many cases this amount of time is unacceptable to the user. In an effort to reduce the amount of time a user must wait before a program loads, archive files may be utilized. Many of the prior art solutions do not provide a way to minimize the amount of time required for a program to reach a desired state. The ZIP format improves on this by placing the files of a program into a single archive file. However, to place files in a ZIP archive the user must manually determine what files need to be placed in the archive file. To utilize the JAR format the user must also manually determine what files to place in the archive. To create an archive file that is capable of bringing a computer program to a certain point of execution, the user must manually determine what files accomplish that and then manually create an archive file consisting of those files. This process is cumbersome and does not provide a way to automatically optimize the execution of a computer program. Alternatively, the prior art contemplates manually determining only those files needed for startup and placing those files into a packaged file.
With advancements in network technology, the use of networks for facilitating the distribution of computer programs has grown dramatically, particularly in the case of the Internet and the World Wide Web. These computer programs have increasingly grown in size. A problem with distributing such computer programs over a network is that a large amount of time is required to download and initialize the program.
Embodiments of the invention comprise a method and apparatus for automatically optimizing loading of a computer program. The files needed to execute the computer program until it reaches a desired state are identified and placed in an optimized file. The optimized file can be executed on the computer on which it resides or it can be transmitted from one computer to another (e.g. from a server to a client computer). The optimized file enables a computer to execute a computer program until a previously determined desired state is attained. Other files may be provided which enable the computer to execute the computer program beyond the desired state. The desired state of the computer program may be, for example, the point where user input can be accepted by the computer program.
In one embodiment of the invention, the optimized file is a collection of data files packaged together into a single file. A Java Archive (JAR) file, for example, is one kind of optimized file. The present invention can also utilize other types of file packaging formats such as the ZIP file format. Any kind of file comprised of one or more other files is suitable for creating an optimized file. The computer programs addressed by the present invention are comprised of more than one file. A computer program implemented as a Java applet, for example, is comprised of classfiles, image files, sound files, data files, etc. Of all the files that make up the computer program, only a percentage of these files is required to actually start the program. In an embodiment of the invention, the files needed to start the program are automatically determined and placed into an optimized file.
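One possible reading of this packaging step is sketched below, purely for illustration. It assumes that the names of the files used before the desired state was reached have already been recorded by some means (the recording mechanism is not described in this passage and is not shown). The class OptimizedFileSketch, its method names, and the sample file names are hypothetical and are not taken from the disclosure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical sketch: package only the files recorded as needed to reach the desired state.
final class OptimizedFileSketch {
    // Builds a JAR holding the recorded startup files, in the order they were recorded.
    static byte[] buildOptimizedJar(Map<String, byte[]> programFiles, List<String> startupFiles) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer)) {
            for (String name : new LinkedHashSet<>(startupFiles)) { // ignore repeated names
                byte[] content = programFiles.get(name);
                if (content == null) {
                    continue; // recorded name is not one of this program's files
                }
                jar.putNextEntry(new JarEntry(name));
                jar.write(content);
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Names of the program files left out of the optimized file (needed only beyond the desired state).
    static List<String> remainingFiles(Map<String, byte[]> programFiles, List<String> startupFiles) {
        List<String> rest = new ArrayList<>(programFiles.keySet());
        rest.removeAll(startupFiles);
        return rest;
    }

    // Lists the entry names of a JAR in stored order.
    static List<String> entryNames(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            for (JarEntry entry = jar.getNextJarEntry(); entry != null; entry = jar.getNextJarEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, byte[]> programFiles = new LinkedHashMap<>();
        programFiles.put("A.class", new byte[] {1}); // placeholder contents
        programFiles.put("B.class", new byte[] {2});
        programFiles.put("image.jpg", new byte[] {3});
        List<String> startupFiles = Arrays.asList("B.class", "A.class"); // assumed recording
        System.out.println("optimized: " + entryNames(buildOptimizedJar(programFiles, startupFiles)));
        System.out.println("remaining: " + remainingFiles(programFiles, startupFiles));
    }
}
```

Keeping the entries in recorded order is a design choice of this sketch, not a requirement stated in the text; the remaining files could be delivered separately once the desired state has been reached.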