1. Field of the Invention
The present invention relates generally to an extensible and dynamic software operating environment supporting applications which process structured information, and particularly to an environment supporting XML processors.
2. Description of the Prior Art
A computer system comprises hardware and software arranged to store and process data. The hardware of a typical computer system includes a central processing unit (CPU), memory, external storage, data input devices, data output devices and data communication devices.
The CPU can manipulate data residing in memory, move data between memory and external storage, and control the other devices. Instructions, which also reside in memory, direct the CPU to perform these actions. The CPU fetches instructions from memory and executes them one by one. A program is a sequence of instructions to accomplish some task, and the term software denotes programs in general.
In a computer system data is encoded in a form the computer can manipulate efficiently, usually as groups of binary digits. In external storage, data is organized in larger units known as files. Programs which are not being executed may reside in memory or in external storage, and can be treated as data.
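The following short Python sketch makes "groups of binary digits" concrete. It shows how the characters 'EI' might be stored, assuming an ASCII encoding with one 8-bit group per character. The function name to_bits is illustrative only.

```python
def to_bits(text: str) -> str:
    """Return the ASCII encoding of text as space-separated 8-bit groups."""
    # Each character becomes one byte; format(..., "08b") shows it as 8 binary digits.
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))

print(to_bits("EI"))  # 01000101 01001001
```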
In the early days of computing, each program had to manage the computer as well as do its own work. Programs had to contain detailed instructions to control devices, handle errors, communicate with the operator, and clean up afterwards.
These housekeeping chores made programs inflexible and fragile. The programmer had to know which devices were attached to the computer, where the data was stored, how much memory would be available and how to send messages to the operator. The program could not run on a different computer, and if the hardware changed the program would stop working. The solution was to divide the work between two kinds of software: system programs and application programs.
System programs are written by programmers who understand the computer hardware. An operating system is a collection of system programs, which manage a computer. When a new device is added to the computer a new system program is added to the operating system.
Application programs (‘applications’) are written by programmers who understand the job the computer system is to accomplish. The operating system provides a complete environment for the application. When the application needs to perform some hardware operation such as reading or writing data or communicating with the operator, it calls the operating system to help.
When one program calls another program there is an interface between them. In many cases the details of the interface don't matter, but if different programmers are involved, or if many programs will call each other in similar ways, the interface is important and must be specified precisely. In this case it is called an Application Programming Interface (API). An operating system has an API.
It is often necessary to transport data from one computer to another. A simple method uses a portable storage medium such as a diskette. The sending computer writes data to the diskette using an output device, the diskette is physically transported to the receiving computer, and the receiving computer reads the data from the diskette using an input device.
A more convenient method is to connect the two computers by cable. The sending computer uses a data communication device to write data to the cable, and the receiving computer uses a similar device to read data from the cable.
Such an arrangement works in both directions. Each computer can send and receive, but they must ensure that when one is sending the other is receiving. They do this by agreeing to follow a protocol.
Communication between two computers is useful, but communication between several computers is even better. Three or more computers can share one cable, and when one computer sends data all the other computers receive it. This is called broadcasting, and a group of computers which are connected like this is called a broadcast network.
A more complicated protocol is needed for broadcast communication. In particular, it is necessary to assign a unique address to each computer. Whenever one computer sends data to another computer it includes its own address and the receiver's address. All the computers hear the transmission, but they all ignore it except the receiver.
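The following few lines of Python simulate the addressing rule. Every computer on the network hears every transmission, but each one keeps only the data whose receiver address matches its own. All class names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    source: str        # sender's address
    destination: str   # receiver's address
    payload: str

@dataclass
class Computer:
    address: str
    inbox: list = field(default_factory=list)

    def hear(self, frame: Frame) -> None:
        # Every computer hears every frame but keeps only those addressed to it.
        if frame.destination == self.address:
            self.inbox.append(frame)

class BroadcastNetwork:
    def __init__(self, computers):
        self.computers = list(computers)

    def send(self, frame: Frame) -> None:
        # Broadcasting: the frame reaches every other computer sharing the cable.
        for computer in self.computers:
            if computer.address != frame.source:
                computer.hear(frame)

a, b, c = Computer("A"), Computer("B"), Computer("C")
network = BroadcastNetwork([a, b, c])
network.send(Frame(source="A", destination="C", payload="hello"))
print(len(b.inbox), len(c.inbox))  # 0 1
```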
Broadcast communication works well if the number of parties is limited. When two broadcast networks must be connected to each other, it is better to let one computer in each network handle external communication.
A computer which performs this role on behalf of a network is called a router. The routers are connected to each other directly. When a sender broadcasts data for a different network the router sends it to the other router. The other router sends it to the receiver.
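The following minimal Python sketch traces the hop-by-hop path described above. It models two routers, and each router knows which addresses belong to its own broadcast network. The Router class and the single-letter addresses are hypothetical. A real router would consult a routing table rather than a single fixed peer.

```python
class Router:
    """Hypothetical router for one broadcast network, linked to one peer router."""

    def __init__(self, name, local_addresses):
        self.name = name
        self.local = set(local_addresses)
        self.peer = None

    def route(self, destination, from_peer=False):
        """Return the hops taken to reach destination, or raise LookupError."""
        if destination in self.local:
            # Destination is on this router's own network: send it to the receiver.
            return [self.name, destination]
        if not from_peer and self.peer is not None:
            # Destination is on the other network: pass the data to the peer router.
            return [self.name] + self.peer.route(destination, from_peer=True)
        # Neither network owns the address; refuse rather than loop forever.
        raise LookupError(f"no route to {destination}")

r1 = Router("R1", {"A", "B"})
r2 = Router("R2", {"X", "Y"})
r1.peer, r2.peer = r2, r1
print(r1.route("Y"))  # ['R1', 'R2', 'Y']
```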
The two broadcast networks and the link between them form a routed network, called an ‘internet’. This internet may be connected to other internets to form a bigger network, and so on. The Internet is an example of a large public internet.
Routed networks need more complicated transport protocols and addresses than broadcast networks. The most widely adopted protocol and addressing scheme is the Internet Protocol (IP).
IP helps routers to move data around networks, and provides a foundation for a family of protocols for specialized communication. Transmission Control Protocol (TCP) guarantees a reliable channel between two applications on different computers. Simple Mail Transfer Protocol (SMTP) uses TCP to move electronic mail from one computer to another. Hypertext Transfer Protocol (HTTP), also based on TCP, forms the basis of the World Wide Web and is widely supported. File Transfer Protocol (FTP) uses two or more TCP connections to move files between computers.
Computer systems can only work with data, but people are interested in the information which that data represents.
Here is some data, represented as a sequence of characters:
DUB200003220620030000EI123HTWONTIME08001
A human observer might detect patterns in the data, and obtain some information by inference and guesswork. Sophisticated computer systems have been designed to do the same, though not so well. For ordinary purposes, however, computers and humans need some clues about the structure and context. Here is the same data, structured as a sequence of elements:
DUB,200003220620030000,EI123,HTW,ONTIME,0800,1
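The following minimal Python sketch shows what the commas add. The structured form can be split mechanically into seven elements, and joining them again reproduces the unstructured data exactly. The helper name split_elements is illustrative.

```python
RAW = "DUB200003220620030000EI123HTWONTIME08001"
STRUCTURED = "DUB,200003220620030000,EI123,HTW,ONTIME,0800,1"

def split_elements(record: str) -> list[str]:
    """Split a comma-separated record into its elements."""
    return record.split(",")

elements = split_elements(STRUCTURED)
print(len(elements), elements[2])  # 7 EI123

# Removing the separators recovers the original, unstructured data.
assert "".join(elements) == RAW
```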
Structuring the data in this way helps somewhat. It can be seen that the last element is the number 1, the second element might be a date, and ‘123’ belongs with ‘EI’ in the third element.
The context in which this data should be interpreted is “Airline flight status”. Now the three-character airport codes are clear, as is the flight number EI123. This still does not explain the meaning of the last two fields.
The information this data actually represents is:
‘Aer Lingus flight EI123 from Dublin to Heathrow at 06:20:03 GMT on 22nd Mar. 2000 landed on time at 08:00, and is assigned to terminal 1.’
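The context supplies what the structure alone cannot. The Python sketch below shows how a program given the context "Airline flight status" could turn the seven elements back into the sentence above. It relies on several assumptions:

- The field names are hypothetical.
- The lookup tables contain only the codes used in this example.
- The second element is a YYYYMMDDHHMMSS timestamp followed by four digits, which are read as a GMT offset of 0000 and not decoded further.

```python
from datetime import datetime

# Hypothetical field names for the "Airline flight status" context.
FIELDS = ("origin", "departure", "flight", "destination", "status", "arrival", "terminal")
# Lookup tables holding only the codes that appear in the example.
AIRPORTS = {"DUB": "Dublin", "HTW": "Heathrow"}
AIRLINES = {"EI": "Aer Lingus"}
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 22 -> '22nd'."""
    suffix = "th" if 11 <= n % 100 <= 13 else {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def interpret(record: str) -> dict:
    """Attach field names to the elements and decode the departure timestamp."""
    info = dict(zip(FIELDS, record.split(",")))
    # Assumed layout: YYYYMMDDHHMMSS, then four digits taken to be a 0000 (GMT) offset.
    info["departure"] = datetime.strptime(info["departure"][:14], "%Y%m%d%H%M%S")
    return info

def describe(info: dict) -> str:
    """Render the interpreted record as a free-text sentence."""
    dep, arr = info["departure"], info["arrival"]
    status = "on time" if info["status"] == "ONTIME" else info["status"].lower()
    return (f"{AIRLINES[info['flight'][:2]]} flight {info['flight']} "
            f"from {AIRPORTS[info['origin']]} to {AIRPORTS[info['destination']]} "
            f"at {dep:%H:%M:%S} GMT on {ordinal(dep.day)} {MONTHS[dep.month - 1]}. {dep.year} "
            f"landed {status} at {arr[:2]}:{arr[2:]}, "
            f"and is assigned to terminal {info['terminal']}.")

print(describe(interpret("DUB,200003220620030000,EI123,HTW,ONTIME,0800,1")))
```

Run as shown, it prints the free-text sentence quoted above. Month abbreviations come from a fixed tuple rather than strftime's %b, so the output does not depend on the locale.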
This free text representation makes the information very clear to humans, at least to those who understand a little about air travel and the English language. It is not easy for computers to process, however, because the structure has been lost.
Markup languages offer a powerful compromise between the information content of free text and the fixed structures that computer systems need. Here is the same information expressed using Extensible Markup Language (XML).
An XML document comprises elements, and each element is introduced by a start tag containing a name and followed by an end tag containing the same name prefixed by a forward slash. Tags are delimited by angle brackets. Elements may contain text or other elements. In the example above, element STATUS contains the text ONTIME, and element FLIGHT-EVENT contains seven other elements.
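The Python sketch below builds a document matching that description and checks it with Python's standard xml.etree.ElementTree module. The document is a FLIGHT-EVENT element containing seven elements, one of which is STATUS with the text ONTIME. Apart from FLIGHT-EVENT and STATUS, the element names are hypothetical and chosen to mirror the seven comma-separated fields.

```python
import xml.etree.ElementTree as ET

# Illustrative only: except FLIGHT-EVENT and STATUS, these element names are hypothetical.
DOCUMENT = """<FLIGHT-EVENT>
  <ORIGIN>DUB</ORIGIN>
  <DEPARTURE>200003220620030000</DEPARTURE>
  <FLIGHT>EI123</FLIGHT>
  <DESTINATION>HTW</DESTINATION>
  <STATUS>ONTIME</STATUS>
  <ARRIVAL>0800</ARRIVAL>
  <TERMINAL>1</TERMINAL>
</FLIGHT-EVENT>"""

root = ET.fromstring(DOCUMENT)
# len(root) counts the child elements of FLIGHT-EVENT.
print(root.tag, len(root), root.find("STATUS").text)  # FLIGHT-EVENT 7 ONTIME
```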
An application which processes XML must be prepared to undertake many tasks. It will have to obtain the document as a stream of characters from the network or from external storage, scan the document looking for special characters such as the angle brackets which delimit tags, extract element names and the text contents, ensure there is an end tag for each start tag and ensure the elements are properly nested. To write an XML document an application must assemble elements, properly nested and in the correct order, format the tags and write characters to external storage or the network.
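The scanning and nesting checks described above can be sketched with a regular expression and a stack. This is a deliberately minimal Python illustration, not a conforming XML parser. It assumes tags carry no attributes, and it ignores comments, entity references, processing instructions and empty-element tags. The function name scan is illustrative.

```python
import re

# Matches simple start tags <NAME> and end tags </NAME>; attributes are not supported.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def scan(document: str):
    """Yield (kind, value) tokens, checking that tags match and nest properly."""
    stack = []  # names of elements opened but not yet closed
    pos = 0
    for match in TAG.finditer(document):
        text = document[pos:match.start()].strip()
        if text:
            yield ("text", text)
        closing, name = match.groups()
        if closing:
            # An end tag must close the most recently opened element.
            if not stack or stack.pop() != name:
                raise ValueError(f"unexpected end tag </{name}>")
            yield ("end", name)
        else:
            stack.append(name)
            yield ("start", name)
        pos = match.end()
    if stack:
        raise ValueError(f"missing end tag for <{stack[-1]}>")
```

For example, list(scan("<A><B>x</B></A>")) yields start, start, text, end and end tokens. By contrast, "<A><B></A></B>" raises ValueError because the end tags are out of order.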
If the application's job is simply to identify flights from Dublin, or to send a message whenever the status of a flight changes, the programmer will want to avoid as much of this housekeeping as possible. One way to do this is to divide the application into several programs and reuse programs which have already been written.
A program which reads a stream of characters and identifies tags and elements is called a parser. It is common to use a parser with other programs, and standard interfaces have been designed for XML parsers.
The parser may read the whole document into memory and then supply parts of the document in response to calls from another program. This is the Document Object Model interface (DOM).
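The DOM style can be illustrated with Python's standard xml.dom.minidom module. In the sketch below, parseString builds the whole tree in memory, and the program then asks for the parts it needs. The FLIGHT element name and the helper text_of are hypothetical.

```python
from xml.dom import minidom

DOCUMENT = "<FLIGHT-EVENT><FLIGHT>EI123</FLIGHT><STATUS>ONTIME</STATUS></FLIGHT-EVENT>"

# The DOM parser reads the whole document into an in-memory tree...
dom = minidom.parseString(DOCUMENT)

def text_of(tag: str) -> str:
    """...and the calling program then asks for the parts it needs."""
    element = dom.getElementsByTagName(tag)[0]
    return element.firstChild.data  # the element's text node

print(text_of("STATUS"))  # ONTIME
```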
Alternatively, the parser may call another program as soon as it recognises something interesting in the character stream. This interface is known as the Simple API for XML (SAX). Each call is known as an event, a sequence of calls is an event stream, and the program which the parser calls is an event processor. An event processor for a small application can be written quickly, and a programmer can build a complex application by assembling a chain of simple event processors.
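The SAX style and the idea of chaining can be sketched with Python's standard xml.sax module, whose ContentHandler receives startElement, characters and endElement events. The sketch assumes the following, all for illustration only:

- The class names and the two-stage chain are hypothetical.
- The second flight record (XX000) is made-up sample data.
- Whitespace inside element text is not significant.

The chain picks out flights from Dublin, one of the simple jobs mentioned earlier.

```python
import xml.sax

class Forwarder(xml.sax.ContentHandler):
    """A simple event processor that passes each event on to the next in the chain."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def characters(self, content):
        self.downstream.characters(content)

class DropWhitespace(Forwarder):
    """Drops whitespace-only text events (e.g. indentation) and forwards the rest."""

    def characters(self, content):
        if content.strip():
            self.downstream.characters(content)

class DublinFlights(xml.sax.ContentHandler):
    """Final processor: records FLIGHT for each FLIGHT-EVENT whose ORIGIN is DUB."""

    def __init__(self):
        super().__init__()
        self.found, self.record, self.current = [], None, None

    def startElement(self, name, attrs):
        if name == "FLIGHT-EVENT":
            self.record = {}           # start collecting a new flight
        elif self.record is not None:
            self.current = name        # a field inside the current flight
            self.record[name] = ""

    def characters(self, content):
        if self.current:
            self.record[self.current] += content  # text may arrive in pieces

    def endElement(self, name):
        if name == "FLIGHT-EVENT":
            if self.record.get("ORIGIN") == "DUB":
                self.found.append(self.record.get("FLIGHT"))
            self.record = None
        self.current = None

DOCUMENT = b"""<FLIGHTS>
  <FLIGHT-EVENT><ORIGIN>DUB</ORIGIN><FLIGHT>EI123</FLIGHT></FLIGHT-EVENT>
  <FLIGHT-EVENT><ORIGIN>HTW</ORIGIN><FLIGHT>XX000</FLIGHT></FLIGHT-EVENT>
</FLIGHTS>"""

sink = DublinFlights()
xml.sax.parseString(DOCUMENT, DropWhitespace(sink))  # parser -> filter -> sink
print(sink.found)  # ['EI123']
```

Each stage handles one concern, so another processor (for instance, one that reports status changes) could be added to the chain without changing the existing stages.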
This approach to application development has potential, but is costly, time-consuming and error-prone at present because of the lack of a supporting software environment. The programmer must still provide the communication, management and housekeeping facilities such software needs, and the resulting application is not as portable or flexible as it could be. What is missing is an operating system for structured information processors.
Briefly, an embodiment of the present invention includes an extensible and dynamic software operating environment supporting applications which process structured information, in particular an environment supporting XML processors.
The foregoing and other objects, features and advantages of the present invention will be apparent from the following detailed description of the preferred embodiments which make reference to several figures of the drawing.