This invention relates generally to information systems and more particularly to a storage system having a scalable architecture that is capable of storing information for millions of users.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1998, Microsoft Corporation, All Rights Reserved.
The Internet is a worldwide collection of networks that span over 100 countries and connect millions of computers. In 1997 traffic on the Internet doubled every 100 days. At the end of 1997, more than 100 million people were using the Internet. Reports indicate that the Internet is growing faster than all preceding information technologies including radio and television. The World Wide Web (WWW) is one of the fastest growing facets of the Internet and represents the computers that support the hypertext transfer protocol (HTTP) which is a common protocol for exchanging information.
Because there is no central authority controlling the WWW, finding useful information within the WWW can be a daunting task. In an effort to ease this burden, specialized web sites, known as "portals", seek to provide a single access point for users. Many of these portals implement software, referred to as robots or crawlers, that traverses the WWW in order to collect information and generate a searchable catalog. Thus, a key element of these systems is a massive storage system that holds the voluminous catalog. In addition, recent portals allow each user to customize the information, thereby further burdening the storage system with personalization data for millions of users. For these reasons, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for a scalable storage system that is capable of efficiently and reliably handling millions of accesses per day.
A massively scalable architecture has been developed for providing a highly reliable storage system that is capable of handling hundreds of millions of users and tens of billions of files. The storage system includes a plurality of storage clusters, each storage cluster having one or more storage servers. Each client, such as a user, application, user group, community, etc., is assigned to a unique partition within one of the storage clusters. Within each cluster, however, the data stored in each partition is replicated across multiple storage servers. Thus, the storage system can be easily scaled as the number of reads increases by adding individual storage servers to each storage cluster. In addition, the storage system easily scales to handle an increase in the number of writes, or as the number of files per cluster exceeds a predefined limit, by adding new storage clusters to the storage system. In this manner, the storage system provides redundancy for reads and writes, thereby achieving virtually no downtime when individual servers fail.
In one embodiment, the storage clusters include a write master, a cluster backup and one or more storage servers. One storage cluster, such as a storage cluster zero, further includes a partition master that maps individual clients into a unique partition for storing data elements received from the clients. One beneficial aspect of this technique is that the partition map isolates the clients from knowing where the data is located. Directory paths are generated directly from a partition ID and an element ID, thereby eliminating time-consuming path lookups. In addition, partition IDs and element IDs are assigned so as to balance the directory structure.
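As an illustration only, the following Python sketch shows how a directory path might be computed from a partition ID and an element ID without consulting any lookup table. The function name `element_path`, the 32-bit ID width, the hexadecimal encoding and the particular directory split are all assumptions made for this example; the specification does not state the actual layout.

```python
from pathlib import PurePosixPath


def element_path(root: str, partition_id: int, element_id: int) -> PurePosixPath:
    """Derive a storage path purely from the two IDs (hypothetical layout)."""
    if not (0 <= partition_id < 2**32 and 0 <= element_id < 2**32):
        raise ValueError("this sketch assumes IDs that fit in 32 bits")
    p = f"{partition_id:08x}"  # fixed-width hex keeps path depth constant
    e = f"{element_id:08x}"
    # Assumed split: two levels of fan-out for the partition, one for the
    # element, so that no single directory has to hold every entry.
    return PurePosixPath(root, p[:2], p[2:4], p[4:], e[:2], e[2:])


if __name__ == "__main__":
    print(element_path("/store", 0x0A0B0C0D, 0x01020304))
```

Because the path is a pure function of the two IDs, any server that knows the IDs can locate the element without a directory query, which is the property the paragraph above describes.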
Clients access the storage system through a plurality of web servers. In one embodiment, each web server executes Internet Information Server (IIS) on the Windows® NT operating system. Each web server has an application interface layer, such as the Internet Server API (ISAPI), that retrieves the client-specific information from the storage servers.
In one embodiment, the storage system includes a storage manager for configuring and controlling the storage system. In another embodiment, the storage system includes a storage monitor that performs various checks on the partition master, the write master, the cluster backup and on each storage server. The storage monitor informs the storage manager when a failure is detected. In response to the failure message, the storage manager promotes one of the storage servers to perform the lost functionality. In this manner, the storage system self-corrects most failures without requiring administrator interaction.
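The promotion step can be pictured with a minimal Python sketch. The `Cluster` class, the `handle_failure` function, the role names as dictionary keys, and the rule of promoting the first available storage server are hypothetical choices for illustration; the specification does not say how the storage manager selects the server to promote, nor whether the promoted server leaves the pool of ordinary storage servers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cluster:
    roles: Dict[str, str]        # role name -> server currently holding it
    storage_servers: List[str]   # ordinary storage servers


def handle_failure(cluster: Cluster, failed: str) -> Optional[str]:
    """React to a failure report; return the promoted server, if any."""
    if failed in cluster.storage_servers:
        # An ordinary storage server failed: no special role was lost.
        cluster.storage_servers.remove(failed)
        return None
    lost = [role for role, server in cluster.roles.items() if server == failed]
    if not lost:
        return None
    if not cluster.storage_servers:
        raise RuntimeError("no storage server left to promote")
    # Assumed policy: promote the first storage server in the list.
    promoted = cluster.storage_servers.pop(0)
    for role in lost:
        cluster.roles[role] = promoted
    return promoted
```

For example, if the server holding the write master role fails, `handle_failure` reassigns that role to a storage server and returns its name, with no administrator involved.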
According to one aspect, the storage system facilitates the addition of new storage servers, and the fast recovery of failed storage servers, by logging system transactions in multiple journals of different lengths. When a storage server fails, the cluster backup determines the time of failure and attempts to replay one of the journals in order to bring the failed storage server up to date.
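One way to read "journals of different lengths" is that each journal retains transactions over a different time window. Under that assumption, the Python sketch below picks the shortest journal that still reaches back to the time of failure, and returns nothing when no journal reaches back far enough. The `Journal` class, the `pick_journal` function and the selection rule are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Journal:
    name: str
    oldest_entry: float  # timestamp of the oldest transaction still retained


def pick_journal(journals: List[Journal], failure_time: float) -> Optional[Journal]:
    """Pick the shortest journal that still covers the failure time."""
    usable = [j for j in journals if j.oldest_entry <= failure_time]
    # Among usable journals, the one whose oldest entry is most recent is
    # the shortest, so replaying it repeats the least work.
    return max(usable, key=lambda j: j.oldest_entry, default=None)
```

When `pick_journal` returns `None`, the failed server has been down longer than any journal covers, and some other recovery path would be needed; the sketch does not model it.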
According to another aspect, the storage system facilitates an extensible file store in that each storage element has a corresponding schema object that is used to parse the element into the encapsulated data and attributes. In this manner, applications executing on the web servers are able to dynamically define a new type of element for storage within the storage system. In one embodiment, the schemas are defined in Extensible Markup Language (XML).
According to yet another aspect, the storage system includes a cluster of database servers that resolve complex queries for the storage system. The storage system maintains RAM-based indexes for replying to a majority of the read requests; however, the database clusters resolve complex queries based on the attributes of the stored elements.