OpenStack
OpenStack is a free and open-source cloud computing software platform. Users primarily deploy it as an infrastructure as a service (IaaS) solution. The technology consists of a series of interrelated projects that control pools of processing, storage, and networking resources throughout a data center. Users manage these resources through a Web-based dashboard, command-line tools, or a RESTful API.
Representational State Transfer (REST) is a software architecture style consisting of guidelines and best practices for creating scalable Web services. REST is a coordinated set of constraints applied to the design of components in a distributed hypermedia system that can lead to a more performant and maintainable architecture.
OpenStack Object Storage (Swift) is a scalable redundant storage system. Objects and files are written to multiple disk drives spread throughout servers in the data center, with the OpenStack Swift software responsible for ensuring data replication and integrity across the cluster. Storage clusters scale horizontally simply by adding new servers. Should a server or hard drive fail, OpenStack Swift replicates its content from other active nodes to new locations in the cluster. Because OpenStack Swift uses software logic to ensure data replication and distribution across different devices, inexpensive commodity hard drives and servers can be used.
Object Storage Cluster
A collection of servers, called nodes, running all the services and processes needed to behave as a distributed object storage system can be referred to as a cluster. These object storage processes can include proxy, account, container, and object server processes. Proxy server processes handle external communication with clients using a RESTful HTTP API. In this architecture, the account, container, and object server processes each handle their own kind of data. See FIG. 1.
A cluster of nodes can be grouped by region, which is often defined by geography, and then zone. See FIG. 2.
Data Access and Placement
Once a cluster's processes are running and its nodes are correctly grouped, it is ready to store objects. In Swift, clients (people or programs) store objects by addressing each one by its storage location.
In Swift, a client sends a request to the storage cluster's API endpoint (http://example.com) and appends the storage location of the object (/account/container/object). Swift provides a user with an account having containers into which objects are put. Accounts are the root storage locations for data in a cluster (/account) and the account server process maintains account information in a database in the cluster. Containers are user-defined segments of the account that provide a way to group objects together (/account/container) and the container server process maintains container information in a database in the cluster. Each object has a unique storage location based on its name and the account and container in which it is stored (/account/container/object). The object server process is the storage service that can store, retrieve, and maintain objects on the drives of the nodes.
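The account/container/object hierarchy described above can be illustrated with a short Python sketch that splits a request URL into its three parts. The function name parse_storage_location is hypothetical, and the sketch follows the simplified URL form used in this description (endpoint followed directly by /account/container/object); it is not Swift's own request parser.

```python
from urllib.parse import urlsplit


def parse_storage_location(url):
    """Split a request URL into (account, container, object).

    Parts that are absent from the path are returned as None, so the
    same helper handles account, container, and object locations.
    """
    path = urlsplit(url).path
    # Split at most twice so that slashes inside an object name survive.
    parts = path.lstrip("/").split("/", 2)
    parts = [p if p else None for p in parts]
    parts += [None] * (3 - len(parts))
    account, container, obj = parts
    return account, container, obj
```

Under these assumptions, a call with http://example.com/account/container/object yields the three path components, while a call with only /account yields the account and two None values.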
While users and applications find an object by its storage location (/account/container/object), the object is actually stored in more than one place in the cluster. The default behavior of the object storage system is to store whole copies of the data on multiple drives. The industry standard is to store three copies of the data, placed as far from one another as possible in the cluster so that one hardware failure does not cause data loss or unavailability of data.
Data placement is determined with a variation of consistent hashing ring methodology. Consistent hashing is based on mapping each object to a point on the edge of a circle or, equivalently, mapping each object to a real angle. The system maps each available machine or other storage bucket to many pseudo-randomly distributed points on the edge of the same circle.
To find where an object should be placed, the system finds the location of that object's key on the edge of the circle, then walks around the circle until it falls into the first bucket it encounters or, equivalently, the first available bucket with a higher angle. The result is that each bucket contains all of the resources located between its point and the previous bucket point.
If a bucket becomes unavailable, for example because the computer it resides on is not reachable, then the angles it maps to are removed. Requests for resources that would have been mapped to each of those points now map to the next highest point. Because each bucket is associated with many pseudo-randomly distributed points, the resources that were held by that bucket now map to many different buckets. The items that mapped to the lost bucket must be redistributed among the remaining ones, but values mapping to other buckets still do so and do not need to be moved.
A similar process occurs when a bucket is added. By adding a bucket point, any resources between that and the next smaller angle map to the new bucket. These resources are no longer associated with the previous bucket, and any value previously stored there is not found by the selection method described above.
The portion of the keys associated with each bucket can be altered by altering the number of angles to which that bucket maps.
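The consistent hashing scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Swift's ring implementation: the class name HashRing, the use of MD5 to place points on the circle, the 32-bit integer standing in for an angle, and the default of 100 points per bucket are all choices made for the example.

```python
import bisect
import hashlib


def _angle(key):
    """Map a string to a point on the circle, as an integer in [0, 2**32)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Illustrative consistent hashing ring (hypothetical, for explanation only)."""

    def __init__(self, points_per_bucket=100):
        self.points_per_bucket = points_per_bucket
        self._points = []  # sorted points on the circle
        self._owner = {}   # point -> bucket name

    def add_bucket(self, name, points=None):
        """Place a bucket at many pseudo-random points; more points means more keys."""
        count = self.points_per_bucket if points is None else points
        for i in range(count):
            point = _angle(f"{name}#{i}")
            if point in self._owner:
                continue  # rare collision: keep the existing owner
            self._owner[point] = name
            bisect.insort(self._points, point)

    def remove_bucket(self, name):
        """Drop every point that belongs to the named bucket."""
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: b for p, b in self._owner.items() if b != name}

    def lookup(self, key):
        """Walk from the key's point to the first bucket point at or after it."""
        if not self._points:
            raise LookupError("ring has no buckets")
        i = bisect.bisect_left(self._points, _angle(key))
        if i == len(self._points):
            i = 0  # wrap around the circle
        return self._owner[self._points[i]]
```

In this sketch, removing a bucket changes the result of lookup only for keys that previously mapped to that bucket, and passing a larger points value to add_bucket gives that bucket a proportionally larger share of the keys.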
In OpenStack Swift, the storage location of an account (/account), container (/account/container) or object (/account/container/object) is hashed and the result is used in a data structure, called a ring, to look up the physical locations where data is placed in the cluster. Each cluster has a set of rings, e.g. one account ring, one container ring, and one or more storage policy object rings, which are copied to each node. During the creation of a ring, an algorithm is used to determine how to keep the copies as far apart as possible, while accounting for several factors including storage policies.
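One way to picture such a ring lookup is sketched below in Python. It is a simplified model under assumptions, not Swift's actual ring code: the partition count, the six-device layout, the table-building rule, and the names PART_POWER, DEVICES, TABLE, partition_for, and devices_for are all hypothetical. The hash of the storage location selects a partition, and a precomputed table maps each replica of that partition to a device in a different zone.

```python
import hashlib

PART_POWER = 4                 # assumed: 2**4 = 16 partitions
PART_COUNT = 2 ** PART_POWER
REPLICAS = 3                   # assumed: three copies of each item

# Hypothetical devices: six drives spread over three zones.
DEVICES = [{"id": i, "zone": i % 3, "node": f"node{i}"} for i in range(6)]

# TABLE[replica][partition] -> device id. Offsetting each replica by one
# device keeps the three copies of a partition in three different zones.
TABLE = [
    [(part + replica) % len(DEVICES) for part in range(PART_COUNT)]
    for replica in range(REPLICAS)
]


def partition_for(path):
    """Hash a storage location and keep the top PART_POWER bits."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)


def devices_for(path):
    """Return the devices that hold each copy of the given storage location."""
    part = partition_for(path)
    return [DEVICES[row[part]] for row in TABLE]
```

Because the table is built once and copied to every node, any node in this model can compute the same locations for a given storage location without consulting a central directory.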
Storage policies are a way of defining space within a cluster that can be customized for various factors to meet the specific data storage needs of a user, e.g. hardware tiering, increased data durability, and geographic constraints. These policies are defined in a configuration file and, for each defined policy, a corresponding storage policy object ring is created. A policy can then be applied to a container during its creation and any object stored in the container has the storage policy applied to it.
Because there can be multiple object storage policies, when the system is handling an object, it first checks the object's container to determine which policy is used. The system can then use the correct storage policy object ring to find the locations of the object in the cluster.
For example, a storage policy, e.g. policy-2, is created to store four copies of data instead of the standard three. A new container has this policy applied to it. A user puts an object in that container. The system determines that the container policy is policy-2 and then goes to the corresponding storage policy object ring, e.g. the object-2.ring, and uses the hash of the object storage location to determine the four locations where the object is stored. See FIG. 3.
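The policy-2 example above can be traced with a short Python sketch. Everything beyond the names policy-2 and object-2.ring is an assumption made for illustration: the default policy name policy-0, the ring name object-0.ring, the eight-drive list, and the simple "hash then take consecutive drives" placement rule stand in for the real ring structure.

```python
import hashlib

# Hypothetical drive inventory for the illustration.
DRIVES = [f"drive{i}" for i in range(8)]

# Hypothetical policy table: policy name -> (object ring name, number of copies).
POLICIES = {
    "policy-0": ("object-0.ring", 3),  # assumed default: three copies
    "policy-2": ("object-2.ring", 4),  # from the example: four copies
}

# Container records: "/account/container" -> policy name.
CONTAINER_POLICY = {}


def create_container(account, container, policy="policy-0"):
    """Record the storage policy applied to a container at creation time."""
    CONTAINER_POLICY[f"/{account}/{container}"] = policy


def locate_object(account, container, obj):
    """Check the container's policy, then use that policy's ring to place the object."""
    policy = CONTAINER_POLICY[f"/{account}/{container}"]
    ring_name, copies = POLICIES[policy]
    path = f"/{account}/{container}/{obj}"
    digest = hashlib.md5(path.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(DRIVES)
    # Simplified placement: consecutive drives starting at the hashed position.
    drives = [DRIVES[(start + i) % len(DRIVES)] for i in range(copies)]
    return ring_name, drives
```

In this sketch, an object put into a container created with policy-2 resolves to object-2.ring and four distinct drives, while an object in a container created with the assumed default resolves to three.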
Characteristics of Object Storage
Discussions of storage systems frequently invoke Brewer's CAP theorem, which holds that a distributed system can have only two of three characteristics:
Consistency: Updates are applied to all relevant nodes at the same logical time;
Availability: Every non-failing node executes queries and returns a response; and
Partition tolerance: The system can still operate and provide consistency and availability if packets are lost or arrive late, or if part of the system is unreachable.
In reality, nearly all systems start by choosing partition tolerance, because a system that grinds to a halt whenever the network is not perfect cannot survive in real-world conditions. Accordingly, most storage systems trade off consistency against availability. In this embodiment, consistency is traded off in favor of availability. This is generally referred to as having an eventually consistent storage system.
Eventual consistency makes sense for a distributed object system because there is a fairly good chance that one or more nodes may be unreachable to the rest of the system for some period of time. In such a case, the nodes on either side of the connection failure could very well be able to continue to operate, including storing data, creating containers, etc.
Once the connection is reestablished, there is a possibility of conflicts, including a situation where two containers with the same name but different storage policies have been created. In such a split-brain case, a solution must be created to help the system heal. It would be advantageous to maintain eventual consistency and resolve conflicts between multiple storage policies that are erroneously associated with a hierarchical construct, such as a container or bucket.