Internet portals and search engines, such as MSN®, serve as information gateways to Internet users by accumulating and categorizing information, and providing a wide array of services. Two methods are generally utilized to accumulate information and content to populate a portal's site. The first method is crawling the Web for information by utilizing so-called “spider” programs that examine Web pages looking for a variety of components. The spider scores each page for relevancy using the proprietary algorithm of the portal's search engine. A limitation of this technique is that relevant information is often missed or ignored by the spider. Further, information that is contained within databases accessed via websites, i.e., information that must be queried to be retrieved, is not retrieved by spiders. Thus, searches conducted by visitors of portals that acquire information in this manner often do not yield satisfactory results and information. These dissatisfied visitors often leave the portal for another source of information.
The second method of acquiring data involves receiving content directly from affiliated data providers and importing the data into the portal's database management system. In the second method, the portals may regularly receive data from hundreds of sources. A limitation of this method is that data providers must conform to the portal's particular data format. Another limitation is that the data must be checked for accuracy, as errors in the importation are common. Yet another limitation is that if the data provider changes its own format, the data aggregator must conform to these changes. This is burdensome on the data providers and the portal operator, and makes it difficult for the portal to add new providers of data and content.
Related to the technical field of data exchange and interoperability, and the second method above, XML is quickly becoming a universal format for structured documents and data on the Web and in software programs. Structured data includes spreadsheets, address books, configuration parameters, financial transactions, and technical drawings. As is known in the art, the Extensible Markup Language (XML) is a set of rules for designing text formats that allows computers to generate and read data, and ensures that the data structure is unambiguous. The XML Specification is defined in “Extensible Markup Language (XML) 1.0 (Second Edition),” W3C Recommendation, 6 Oct. 2000, which is incorporated herein by reference in its entirety.
In XML, tags are used to delimit the data within an XML data file (“instance document”) and XML Schemas allow developers to precisely define the structures of their own XML-based formats. The data in the files may be manipulated via several modules and services. Such services include XPointer, which is a syntax for pointing to parts of an XML document in a fashion similar to a Uniform Resource Locator (URL). Another service is XSL, which is the advanced language for expressing style sheets in XML. XSL is based on XSLT, which is the transformation language used for rearranging, adding and deleting tags and attributes.
Another service is XPath, which provides a common syntax and semantics for functionality shared between XSLT and XPointer. XPath gets its name from its use of a path notation (as in URLs) for navigating through the hierarchical structure of an XML document. The primary purpose of XPath is to address parts of an XML document, and it also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a compact, non-XML syntax to facilitate use of XPath within Uniform Resource Identifiers (URIs) and XML attribute values. URIs are strings that identify resources on the Web such as documents, images, downloadable files, services, electronic mailboxes, and other resources. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax.
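By way of illustration only, the path notation described above may be sketched in Python using the standard library module xml.etree.ElementTree, which implements a limited subset of XPath (location paths and simple predicates, but not the XPath string or number functions). The catalog document, its element names, and the variable names are hypothetical and are not drawn from any particular data provider.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document, as a data provider might supply it.
FEED = """\
<catalog>
  <item id="a1"><title>Road Bike</title><price>450.00</price></item>
  <item id="b2"><title>Helmet</title><price>35.50</price></item>
</catalog>"""

root = ET.fromstring(FEED)

# A location path walks the element hierarchy, much like a URL path.
titles = [t.text for t in root.findall("./item/title")]

# A predicate narrows the selection to the item whose id attribute is "b2".
helmet_price = root.find("./item[@id='b2']/price").text

print(titles)
print(helmet_price)
```

The expressions address nodes by their position in the logical tree, so they would be unaffected by changes to the whitespace or attribute quoting of the underlying text.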
In addition to its use for addressing, XPath can be used for matching, i.e., testing whether a node matches a pattern. XPath models an XML document as a tree of different types of nodes, e.g., element nodes, attribute nodes and text nodes. XPath fully supports XML Namespaces, where developers can qualify element names and relationships to make names recognizable and to avoid name collisions.
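As a hedged sketch of how namespace qualification avoids name collisions, the following Python fragment (again using xml.etree.ElementTree and its XPath subset) selects two elements that share the local name "title" but belong to different namespaces. The namespace URIs, the prefixes, and the document content are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing elements from two providers; both use "title".
DOC = """\
<feed xmlns:a="http://example.com/provider-a"
      xmlns:b="http://example.com/provider-b">
  <a:title>Listing from provider A</a:title>
  <b:title>Listing from provider B</b:title>
</feed>"""

# Prefixes used in queries are local to this mapping; only the URIs must
# agree with those declared in the document.
NS = {
    "pa": "http://example.com/provider-a",
    "pb": "http://example.com/provider-b",
}

root = ET.fromstring(DOC)

# The same local name resolves to different element nodes per namespace.
a_titles = [e.text for e in root.findall("pa:title", NS)]
b_titles = [e.text for e in root.findall("pb:title", NS)]

# ElementTree stores the qualified name as "{namespace-uri}local-name".
first_tag = root[0].tag

print(a_titles, b_titles, first_tag)
```

Because the query prefixes are bound to URIs rather than to the prefixes written in the document, a provider may rename its prefixes without affecting the selection.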
With all of these advantages, it is desirable to apply XML to the problem of receiving and processing data from external data providers. Thus, in view of the foregoing, there is a need for systems and methods that overcome the limitations and drawbacks of the prior art. In particular, there is a need for a system by which portals and other data aggregators may utilize XML as a means of simplifying the transfer and validation of data and content.