The rapid increase in popularity of the World Wide Web (WWW or web) has led to a corresponding increase in the amount of data available to users. Because of this massive amount of information and the sheer number of choices, users find it increasingly difficult to locate the information they want. Further, it has become increasingly difficult to choose documents having groups of objects (compound documents) that are appropriate to particular audiences (e.g., elementary school children) who are not yet ready to view or read certain types of material (e.g., violence). This is especially true for collections of electronic data, such as compound documents, whose contents frequently change, not just quantitatively, but qualitatively as well.
One example of such a compound document is an NNTP newsgroup. The content of such a compound document, namely its articles, changes daily because new articles are added and older ones are deleted. Although the articles for a given newsgroup should share a particular theme (e.g., alt.rec.guitar should provide discussions related to guitars, songs and guitar playing), a given topic or thread (such as a flame war) that is inappropriate or irrelevant to a given user might appear one day, last for a few days, and then disappear. Therefore, a user cannot depend upon the titles of NNTP newsgroups during a search for information, since the titles may not reflect the actual contents of the newsgroups.
Another example of uncertainty of information within compound documents is a web site that provides a list of HTTP links to very new resources relevant to the web site (known as a "cool links" list; e.g., a list of links to neat new Java applets from the java.sun.com page). As with the example above, a user cannot depend on the name or description of the web site hosting the cool links list when searching for data, since neither necessarily reflects the contents of the cool links collection.
Many users employ search tools (e.g., http://www.altavista.com) which return links to resources matching a user's query. For example, the query:
Query: Find text containing: "Java" AND "applications" AND "business"
returns an answer that is a list of data objects whose contents match the query. A problem with this approach is that only the immediate contents of each data object are checked during the query; the contents of any children of the data objects (whether immediate children or more distant descendants) are not taken into account. Thus, such search tools do not allow users to search for collections of data efficiently or accurately; they can search only for atomic data objects (i.e., objects without children).
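The limitation described above can be illustrated with a minimal sketch. It is not the implementation of any actual search tool; the `DataObject` class, the function names, and the sample "cool links" data are all hypothetical and are introduced here only for illustration. The sketch contrasts a query that checks only an object's own text with one that also takes the text of its descendants into account.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataObject:
    """Hypothetical data object: own text plus optional child objects."""
    name: str
    text: str = ""
    children: List["DataObject"] = field(default_factory=list)


def matches_flat(obj: DataObject, terms: List[str]) -> bool:
    """AND query evaluated against the object's immediate contents only."""
    text = obj.text.lower()
    return all(term.lower() in text for term in terms)


def collect_text(obj: DataObject) -> str:
    """Concatenate the object's text with the text of all its descendants."""
    parts = [obj.text]
    for child in obj.children:
        parts.append(collect_text(child))
    return " ".join(parts)


def matches_recursive(obj: DataObject, terms: List[str]) -> bool:
    """AND query evaluated against the object and all of its descendants."""
    text = collect_text(obj).lower()
    return all(term.lower() in text for term in terms)


# Hypothetical compound document: a links page whose own text says little,
# while its children carry the content a user is actually looking for.
links = DataObject(
    name="cool links",
    text="A list of new resources",
    children=[
        DataObject("applet-1", "Java applications for business reporting"),
        DataObject("applet-2", "A Java game"),
    ],
)

terms = ["Java", "applications", "business"]
print(matches_flat(links, terms))       # False: the page's own text lacks the terms
print(matches_recursive(links, terms))  # True: the children supply the terms
```

In this sketch the collection is found only when its children are considered, which is the behavior the search tools described above do not provide.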
Another method of information retrieval is that provided by Yahoo, which is well known in the art; a description may be found, for example, at http://www.yahoo.com. With this method, an Internet service maps the compound documents of its information-providing customers into a concept taxonomy. Information seekers can then navigate through the taxonomy to try to find the information they seek. Even though this method provides a useful device by which to navigate, it makes no provision for compound documents whose contents change. A given compound document's location or locations in the taxonomy are determined when the compound document is added and can be changed only through manual intervention. Thus, this system is also unresponsive to compound documents whose contents change dynamically.
Thus, there is a need for a method of dynamically providing labels to objects within compound documents (called dynamic META-tagging of compound documents) that does not assume the contents of those documents are static and that provides accurate META-tags at all levels of granularity, from the individual data object to the collection.