The present invention relates generally to a data processing system and computer program product for sorting information, and more specifically, to searching hypertext documents and selectively transmitting and/or displaying a document and/or its component parts.
Metadata is data about data; more specifically, it is data concerning the content of data. Metadata can include the source of the data, the type of the data, and dates related to the data. For example, a granted patent has at least two items of associated metadata. First, the patent has a filing date, which identifies when it was submitted as a patent application to the U.S. Patent and Trademark Office. Second, the patent has a grant date, which is the date on which the patent office granted the patent, permitting the owner to enforce rights to the material described therein.
Metadata plays a key role on the Internet. Specifically, in markup languages such as hypertext markup language (HTML), metadata can be placed in a document so that it is not ordinarily visible to a user of a modern browser. Such metadata is enclosed in one or more HTML tags. The first HTML pages were typically authored by hand, with each keystroke entered by a person. However, as new models for blending articles with advertising and navigation evolved, pages began to be assembled from fragments, or elements, that were authored independently of one another.
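To illustrate how metadata can sit inside HTML tags without being displayed, the following is a minimal sketch in Python that reads `<meta>` tags using the standard library's `html.parser` module. The sample page and the `author` and `description` names are illustrative assumptions, not drawn from any particular document, and the class name `MetaTagCollector` is hypothetical.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags, which browsers do not render."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this hook.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        # Skip <meta> tags (e.g. charset declarations) that lack a name/content pair.
        if name is not None and content is not None:
            self.meta[name] = content


# Hypothetical page: the meta names and values below are illustrative only.
PAGE = """<html><head>
<title>Example article</title>
<meta name="author" content="Jane Doe">
<meta name="description" content="An example page">
</head><body><p>Only this paragraph is displayed.</p></body></html>"""

collector = MetaTagCollector()
collector.feed(PAGE)
print(collector.meta)  # → {'author': 'Jane Doe', 'description': 'An example page'}
```

A browser rendering `PAGE` would show only the title and the paragraph; the name/content pairs are available only to programs that parse the markup.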
Search engines combine databases that are fed with information collected by spiders. Spiders are automated programs that collect HTML by methodically traversing the links in each page. A spider can record metadata such as the date on which it visited an HTML page, the page being identified by its uniform resource locator (URL). Such information can later be used to restrict the results of a search engine query to a specified range of dates, where those dates are the dates on which the spider collected the data.
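The crawl-date approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a hypothetical three-page link graph (`LINKS`) stands in for the web, a breadth-first traversal (`crawl`) stands in for a spider, and a supplied list of dates stands in for the clock. The function names are hypothetical. The only date stored for each URL is the visit date, so the date-range filter (`filter_by_crawl_date`) can select pages only by when they were crawled.

```python
from collections import deque
from datetime import date

# Hypothetical in-memory "web": each URL maps to the URLs its page links to.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": [],
}


def crawl(start: str, visit_dates: list[date]) -> dict[str, date]:
    """Breadth-first traversal that stamps each page with the date it was visited.

    visit_dates stands in for a real clock: the i-th newly visited page
    receives visit_dates[i], so the list must cover every reachable page.
    """
    crawled: dict[str, date] = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in crawled:
            continue  # already visited via another link
        crawled[url] = visit_dates[len(crawled)]  # the only date the index learns
        for link in LINKS.get(url, []):
            if link not in crawled:
                queue.append(link)
    return crawled


def filter_by_crawl_date(index: dict[str, date], start: date, end: date) -> list[str]:
    """Keep URLs whose crawl date (not creation date) falls within [start, end]."""
    return sorted(url for url, seen in index.items() if start <= seen <= end)


index = crawl(
    "http://example.com/",
    [date(2000, 1, 3), date(2000, 1, 5), date(2000, 2, 1)],
)
print(filter_by_crawl_date(index, date(2000, 1, 1), date(2000, 1, 31)))
# → ['http://example.com/', 'http://example.com/a']
```

Nothing in `index` records when a page, or any fragment of it, was created or when it expires; the filter can only reason about visit dates.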
One limitation of this method of creating ‘date’ metadata is that the search engine identifies only the date on which the spider visited the web page; nothing indicates when the page was created. Nor do the HTML tags of conventional web pages identify an expiration date, or any other metadata indicating when the web page, or any of its component elements, becomes obsolete or otherwise invalid. In addition, the component parts themselves may have distinct creation dates, which the prior art fails to identify in HTML tags.
Accordingly, a remedy for these limitations is sought.