This invention relates in general to data processing in networked computers and, more specifically, to an object-oriented approach for handling digital information in large distributed networks such as an Intranet or the Internet.
The evolution of computer systems and computer networks is one of accelerating growth. Individual computer systems are equipped with vast amounts of storage and tremendous processing power. Even as these computing resources have increased, the physical size of the systems has decreased, along with the cost of such systems, so that the number of computer systems has grown dramatically.
Not only has the number of computer systems increased at an astounding rate, but improvements in telecommunications have contributed to making massive worldwide networks such as the Internet a reality. However, these advances in computer systems and computer network technology make an overwhelming amount of information available, so much so that it is difficult for a user to extract desired information from the network or to use the network efficiently to accomplish many tasks. Although the Internet has made it very easy both to publish information and to access it, through content servers and browsers, respectively, this opportunity has given rise to thousands of information publishers and millions of information consumers.
This phenomenal growth is making it increasingly difficult for an information consumer to hook up with the publishers that are of interest. A second problem is that the continuous exchange of information in the form of data or algorithms between the myriad types of computer systems, or platforms, which are foreign to one another and mutually incompatible, means that a user of the Internet needs to be aware of compatibility issues such as which types of applications run on which platform, what data formats can be used with which applications, etc. A third problem with the Internet is that, although computer systems have acquired resources to be able to run very large, resource-intensive application programs, this type of “fat” resident application, or client, on a user's machine does not fit today's Internet paradigm where restricted bandwidth makes it impractical to transfer large amounts of data or programs. Also, the sheer volume of information and number of users means that storage space on the various server computers making up the Internet, and used to perform routing of information on the Internet, is at a premium.
Brute-force “keyword” search engines already prove incapable of effectively solving the problems of the Internet. A plethora of “push” technologies is emerging that attempts to provide solutions to this problem in certain areas. A few “publish-subscribe” solutions exist, but these demand a fair amount of infrastructure at both the publisher and consumer ends. The shortcomings of each of these approaches are discussed in turn.
Keyword Search
The inefficient data search and retrieval of the popular keyword search engines is illustrated by the following example of performing a simple search to locate job candidates.
Assume, as was done in an actual test case, that the user of a computer system on the Internet wants to locate and hire computer programmers. Using traditional Internet search technology, the user might go to a website such as AltaVista, Yahoo!, HotBot, etc., and enter a search query for “programmer available.” This search on AltaVista in February 1998 produced 166 documents matching the query. However, the vast majority of these documents were useless in accomplishing the goal of finding a job candidate. For example, many of the documents were outdated. Others merely used the phrase “programmer available” in ways other than to identify an actual job candidate. Some of the documents were from “dead” links which no longer existed and were inaccessible. Many of the documents were duplicates resulting from peculiarities in the way the database is compiled and maintained.
Many of the documents in the search results would not be useful even if they identified an available programmer candidate. This is because the candidates are from different places in the world and many of the documents are old, meaning the programmers are probably not available anymore or have moved. Of course, the search can be refined by adding additional keywords, such as the specific type of programming language skill desired, region, restricting the documents to a specific timeframe, etc. However, since the only tool available to the user to refine the search is to add keywords, or to place relational conditions on the keywords, a second full-text search of the entirety of documents on the Internet would yield many of the same problems as in the previous search, along with new problems introduced by unpredictable uses of the additional or modified text phrases in the free-form format of documents on the Internet.
Another limitation with the full-text search engines available on the Internet today is that much of the information on the Internet exists in “dynamic” web pages which are created in response to specific one-time requests or actions by human users or automated signals. Even the so-called “static” web pages are updated frequently, or are left on the Internet long after they cease to be supported or cease to be valid or relevant. Since the search engines compile a database based on “robots” or “spiders” visiting sites on the Internet at repeated time intervals, many of their results are unrepeatable or outdated and invalid. Also, the spiders are not able to discover all possible web pages, such as pages that might be included in a resume database that is not published in the form of one web page per resume. Still further problems exist with keyword search engines in that use of the text language is not fully standardized. An example of this is that many people use the spelling “programers” instead of “programmers” with two ‘m’s.
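The weakness of exact keyword matching described above can be illustrated with a minimal sketch. The documents, the query and the function name `keyword_search` are hypothetical and chosen only for illustration; no actual search engine is asserted to work this way.

```python
def keyword_search(documents, query):
    """Return documents that contain every query term as an exact word.

    Matching is case-insensitive but otherwise literal, as in a
    simple full-text keyword search.
    """
    terms = query.lower().split()
    hits = []
    for doc in documents:
        # Strip basic punctuation and split the text into words.
        words = set(doc.lower().replace(".", " ").replace(",", " ").split())
        if all(term in words for term in terms):
            hits.append(doc)
    return hits


# Hypothetical documents, invented for this illustration.
DOCUMENTS = [
    "Experienced programmers available for contract work.",
    "Two C++ programers available immediately.",  # relevant, but misspelled
    "No programmers available at this time.",     # matches, but not relevant
]

results = keyword_search(DOCUMENTS, "programmers available")
for doc in results:
    print(doc)
```

In this sketch the misspelled but relevant document is missed, while a document that merely contains both words is returned, which mirrors the two failure modes discussed above.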
The second problem with the Internet, that of compatibility issues between platforms, programs and data types, is encountered by a user of today's Internet whenever a user tries to obtain software, and sometimes data, from the Internet. Although the Internet has provided a wealth of commercial (and free) software, utilities, tools, etc., much of this software requires a great deal of effort on the part of the user to get it running correctly, or is of little or no value because of incompatibility problems that must be solved at the user's time and expense.
For example, when a user downloads a piece of software, they must know about their computer, operating system, compression/decompression utility required, etc. in order to determine whether the software being downloaded is going to be usable in the first place. Keeping track of proper versions of the software and utilities further complicates matters. This is especially true when the software obtained is designed to work with data of a certain type, such as where the software is used to access multimedia files of a certain provider's format, is a new driver for hardware from a specific manufacturer, etc. This makes it difficult for would-be manufacturers of third party “value-added” utilities to produce software that can be used with other software, data or hardware made by another manufacturer. Thus, although today's Internet is successful in making available a plethora of software, utilities, tools, drivers and other useful programs, and can usually adequately deliver the software to a user, it fails to provide a uniform and seamless environment that eliminates significant compatibility problems, which is essential to allowing a user to easily obtain added functionality.
The third shortcoming of the Internet is the relatively poor ability of the Internet to download large amounts of digital information which make up the data and programs of interest to a user. Today, because of improvements in storage capacity and processing power, a typical user runs applications that are resource-intensive and thus require large amounts of data and large programs to manipulate the data. For example, it is not unusual for a user to download a demonstration program on the order of 10 to 20 megabytes. Such a download through a 28.8 kbit/sec modem might take 3-6 hours depending on the speed of the user's overall connection to the Internet, server overload, number of server “hops” to connect with the download server, etc. Thus, although the trend in computer systems has been toward larger and larger application programs which manipulate huge data structures, this trend is incompatible with a network such as the Internet which is rather limited in the speed with which it can handle the demands of the many millions of users trying to utilize it.
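The arithmetic behind such download times can be sketched as follows. The function name `download_minutes` and the `efficiency` parameter are assumptions introduced for illustration, and a megabyte is taken as 1,000,000 bytes. At the full line rate a 10 to 20 megabyte transfer takes roughly 46 to 93 minutes; the multi-hour figures above correspond to an effective throughput well below the line rate, for example about one quarter of it, as would result from the connection and server factors mentioned.

```python
def download_minutes(size_megabytes, line_rate_bits_per_sec, efficiency=1.0):
    """Estimate transfer time in minutes.

    efficiency is an assumed fraction of the line rate actually achieved
    (1.0 means the full line rate with no overhead or congestion).
    """
    bits = size_megabytes * 1_000_000 * 8  # decimal megabytes to bits
    effective_rate = line_rate_bits_per_sec * efficiency
    return bits / effective_rate / 60


MODEM_BPS = 28_800  # 28.8 kbit/sec modem

for size_mb in (10, 20):
    ideal = download_minutes(size_mb, MODEM_BPS)
    # 0.25 is an assumed value, not a measured one.
    degraded = download_minutes(size_mb, MODEM_BPS, efficiency=0.25)
    print(f"{size_mb} MB: {ideal:.0f} min at line rate, "
          f"{degraded / 60:.1f} h at 25% efficiency")
```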
“Push”
The approach of finding out what information a user desires and “pushing” this information to the user by sending it over the network to the user's computer from time-to-time is epitomized by the application PointCast. The application program requires the user to specify areas of interest such as categories of news (e.g., politics, business, movies, etc.), specific sports teams, stocks, horoscope, etc. The user's interests are then recorded at a PointCast server site. Periodically the user is sent, or “pushed,” the specific information from PointCast's server site to the user's computer. The information is compiled and maintained by PointCast although other companies may be involved.
Although “push” technology such as PointCast has the advantage that the user can be updated automatically about specific categories of information, this approach is not very flexible and does not provide much improvement in obtaining information other than providing a tailored version of the daily newspaper. Drawbacks with “push” technology include the inability of the user to specify arbitrary information (the user must pick from a list), the lack of any mechanism for the user to obtain information from outside of the “push” provider's server site, and the inability of the user to upload the user's own information for distribution.
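The push model and its fixed category list can be pictured with a toy sketch. The class `PushServer`, its methods and the category names are hypothetical and are not taken from PointCast or any other product; the sketch only assumes a provider that records interests from a fixed list and sends back content it has compiled itself.

```python
class PushServer:
    """Toy model of a push provider with a fixed list of categories."""

    # Categories offered by this hypothetical provider.
    CATEGORIES = {"politics", "business", "movies", "sports", "stocks"}

    def __init__(self):
        self.profiles = {}  # user -> set of chosen categories
        self.content = {}   # category -> items compiled by the provider

    def register(self, user, interests):
        """Record a user's interests; only listed categories are accepted."""
        unknown = set(interests) - self.CATEGORIES
        if unknown:
            raise ValueError(f"not offered by this provider: {sorted(unknown)}")
        self.profiles[user] = set(interests)

    def compile_item(self, category, item):
        """Add provider-compiled content to a category."""
        self.content.setdefault(category, []).append(item)

    def push(self, user):
        """Return what would be sent to the user in the next periodic update."""
        return {c: list(self.content.get(c, []))
                for c in sorted(self.profiles[user])}


server = PushServer()
server.register("alice", ["business", "stocks"])
server.compile_item("business", "Quarterly earnings summary")
update = server.push("alice")
print(update)
```

A request for an interest outside the list, such as `server.register("bob", ["programmer resumes"])`, raises an error in this sketch, and there is no method by which a user could contribute content, which corresponds to the drawbacks listed above.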
“Push” technology provides uniformity across platforms and data types but it does so only by limiting the user to a single application front end and to information controlled by a single main entity. In this sense, “push” technology thwarts the usefulness of a universal interactive network like the Internet and transforms it into a non-interactive traditional information model, such as radio or television.
Because the pushed information comes from a server or servers controlled by a single entity, push technology fails to create a standardized market for information objects, information processing products and information services. Instead, the push approach pits different push providers against each other for user share. The push approach, unlike the Internet paradigm, is not an open approach and, in fact, is contrary to what many view as the exciting and valuable qualities of the Internet.
Publish-Subscribe
The Publish-Subscribe approach provides a more powerful information exchange system than the “push” approach. The Publish-Subscribe approach allows a user, in a similar manner to the “push” approach, to specify the type of information to which the user wishes to subscribe. However, the user is not strictly limited to categories presented by an application program front-end. Rather, a typical Publish-Subscribe approach allows a user to specify more general types of information such as by using a plain-text subject description.
In a Publish-Subscribe network, publishers provide information that is freely available for all users. Publishers can make any generalized type of information available and identify such information by a mechanism such as the “subject” line. With publishers publishing information and identifying the information by subject, and subscribers subscribing to information identified by subject, processes within the Publish-Subscribe network perform the task of matching up the subscription requests with the available published information and setting up resources in the form of, for example, “channels,” so that the transfer of information can take place. However, Publish-Subscribe has been successful only in relatively small, proprietary networks where the publishers and subscribers are aware of the types of information in the network and agree on how to identify the information. Since Publish-Subscribe is limited in how the types of information are specified (for example, by a plain-text subject header), ensuring that a proper match takes place introduces problems similar to those discussed above with the keyword search engines. So far, the Publish-Subscribe approach has failed to be scaled up to be suitable for larger, more generalized networks such as a large Intranet or the Internet because the Publish-Subscribe model fails to provide efficient mechanisms allowing simple and versatile unrestricted data subscriptions and efficient, organized and robust distribution of information.
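Subject-based matching of the kind described above can be sketched in a few lines. The class `SubjectBroker` and the subject strings are hypothetical; the sketch assumes the simplest case, in which a subscription matches a publication only when the plain-text subjects are identical.

```python
from collections import defaultdict


class SubjectBroker:
    """Toy broker that matches publications to subscriptions by exact subject."""

    def __init__(self):
        # subject -> list of delivery callbacks (standing in for a "channel")
        self.subscribers = defaultdict(list)

    def subscribe(self, subject, callback):
        self.subscribers[subject].append(callback)

    def publish(self, subject, message):
        """Deliver message to every subscriber of subject; return the count."""
        delivered = 0
        for callback in self.subscribers.get(subject, []):
            callback(message)
            delivered += 1
        return delivered


broker = SubjectBroker()
inbox = []
broker.subscribe("programmers available", inbox.append)

matched = broker.publish("programmers available", "resume A")
missed = broker.publish("programers available", "resume B")  # misspelled subject
print(matched, missed, inbox)
```

Unless publisher and subscriber agree on the exact subject text, as they can in a small proprietary network, the second publication in this sketch reaches no one.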
Further, Publish-Subscribe technology relies on custom front-ends that have specific features designed by a single manufacturer. For this reason, Publish-Subscribe technology limits the user in the types of information, and distribution of information, that are possible. Similar to “push” technology, Publish-Subscribe does not provide an “open” information architecture allowing unrelated commercial entities to provide information items, information processing products and information services in a compatible and mutually beneficial manner.
Prior Art Information Processing Models
FIG. 2A shows the prior art models for information processing in networked computer systems.
In FIG. 2A, conventional distributed client/server applications and their manner of processing data are illustrated in diagram form. The execution of a “singleton” application is shown at 160. This represents an application program executing on the user's local computer, such as a desktop computer where almost all of the data and executed instructions reside on the user's computer. The application program has a large amount of data associated with it and typically performs all of its processing on the local data, which may be copies of remote data. The computer is hooked up to a network represented by lines 162. The network, in the singleton application case, is used only to access a common database 162 that might be shared among several users as, for example, in a workgroup. An example of a singleton application is a database program such as would be common in the late 1980s. The common database 162 can be modified by various database application programs executing at the various user computers. Such updates or modifications are typically made through a limited set of commands such as Get/Set illustrated at 164. Data can also be routed through routing hardware 178 to remote data store servers such as 182 via additional network connections such as 180.
Later models of information processing make more use of the network so that more data can be present at remote locations. As shown at 170, the evolved model has a user operating a client at a local computer, or workstation, as before. However, much of the data 173 now resides in a remote location as, for example, at the user's server. Communication with the server (not shown) is via a local network such as an Ethernet, indicated by line 172. Naturally, various copies of data will exist and some copies of the data will necessarily exist at the user's desktop or workstation such as in the user's random access memory (RAM), cache memory or disk, in local files that the user may have designated or that may be automatically created by the client application. However, the model from the user's and the client's point of view is that remote data is being accessed.
Get/Set operations 174 can be performed on the data 173. The data is often obtained from, and synchronized with, other remote databases through one or more networks and associated routing hardware, indicated by networks 176 and 180, routing hardware 178 and remote data store server 182. Additional networks and servers can be used to transfer data to and from the user's local server database as indicated by additional network 184 and data store server 186.
Note that a property of the approaches shown in FIG. 2A is that the processing entity, namely the clients 190 and 192, resides in the user's local computer system. Also, as is typical with traditional information processing models, each client is specific and dedicated to processing data of certain types and to performing specific limited tasks. In other words, the dozens of processing applications created by different software manufacturers are incompatible with each other in that they cannot, without considerable effort, be made to process a data structure created by a foreign application program.
From the above discussion, it is apparent that a system that provides for data searching and manipulation on the Internet in an efficient, effective and intuitive manner would be most welcome. The system should provide an environment and architecture that is adaptable for different applications and customizable by information providers and consumers to suit the many types of commercial, recreational, educational and other uses that the Internet fulfills. The system should also operate uniformly across the various computer platforms connected to the Internet and so provide a basis for uniform development of algorithms, or computer programs, to make feasible a third-party value-added market for the Internet. Finally, the system should operate efficiently within the boundaries of the Internet's limited bandwidth so as to make today's Internet a truly valuable resource.