This invention relates generally to information systems and more particularly to a computing system having an extensible architecture that is capable of storing information for millions of users.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1998, Microsoft Corporation, All Rights Reserved.
The Internet is a worldwide collection of networks that spans over 100 countries and connects millions of computers. In 1997, traffic on the Internet doubled every 100 days. At the end of 1997, more than 100 million people were using the Internet. Reports indicate that the Internet is growing faster than all preceding information technologies, including radio and television. The World Wide Web (WWW) is one of the fastest growing facets of the Internet and represents the computers that support the hypertext transfer protocol (HTTP), which is a common protocol for exchanging information.
Because there is no central authority controlling the WWW, finding useful information within the WWW can be a daunting task. In an effort to ease this burden, specialized web sites, known as “portals”, seek to provide a single access point for users. Many of these portals implement software, referred to as robots or crawlers, that traverses the WWW in order to collect information and generate a searchable catalog. Thus, a key element of these systems is a massive storage system that holds the voluminous catalog. In addition, recent portals allow each user to customize the information, thereby further burdening the storage system with personalization data for millions of users. For these reasons, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for a scalable storage system that is capable of efficiently and reliably handling millions of accesses per day.
A massively scalable architecture has been developed for providing an extensible storage system that is capable of handling hundreds of millions of users and tens of billions of files. The storage system includes a plurality of storage clusters, each storage cluster having one or more storage servers. Each storage element has a corresponding schema object that is used to parse that element into its data and attributes. Applications executing on the web servers are able to dynamically define a new type of element for storage within the storage system. In one embodiment, the schemas are defined in Extensible Markup Language (XML).
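By way of illustration only, the relationship between a storage element and its schema object might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the XML vocabulary (`schema`, `attribute`, `data`), the `bookmark` element type, and the names `SchemaObject` and `SchemaRegistry` are all hypothetical and do not appear in the specification. The sketch shows an application defining a new element type at run time and the corresponding schema object splitting an element into data and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema for a new element type; the vocabulary is illustrative only.
SCHEMA_XML = """
<schema type="bookmark">
  <attribute name="owner"/>
  <attribute name="created"/>
  <data name="url"/>
</schema>
""".strip()

# Hypothetical stored element conforming to the schema above.
ELEMENT_XML = """
<bookmark owner="user42" created="1998-11-05">
  <url>http://example.com/</url>
</bookmark>
""".strip()


class SchemaObject:
    """Describes one element type and parses elements of that type."""

    def __init__(self, schema_xml):
        root = ET.fromstring(schema_xml)
        self.element_type = root.get("type")
        self.attribute_names = [a.get("name") for a in root.findall("attribute")]
        self.data_names = [d.get("name") for d in root.findall("data")]

    def parse(self, element_xml):
        """Split an element into (data, attributes) dictionaries."""
        elem = ET.fromstring(element_xml)
        if elem.tag != self.element_type:
            raise ValueError(f"expected <{self.element_type}>, got <{elem.tag}>")
        attributes = {name: elem.get(name) for name in self.attribute_names}
        data = {name: elem.findtext(name) for name in self.data_names}
        return data, attributes


class SchemaRegistry:
    """Lets an application define new element types dynamically (assumed design)."""

    def __init__(self):
        self._schemas = {}

    def define(self, schema_xml):
        schema = SchemaObject(schema_xml)
        self._schemas[schema.element_type] = schema
        return schema

    def parse(self, element_xml):
        # Look up the schema object by the element's tag, then delegate to it.
        tag = ET.fromstring(element_xml).tag
        return self._schemas[tag].parse(element_xml)


if __name__ == "__main__":
    registry = SchemaRegistry()
    registry.define(SCHEMA_XML)
    data, attributes = registry.parse(ELEMENT_XML)
    print(data)
    print(attributes)
```

In this sketch the registry is keyed by element type, so adding a new type requires only a new schema document and no change to the parsing code, which is one plausible reading of the extensibility described above.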