This invention relates generally to microservices systems and methods, and more particularly to an architecture for metadata microservices for deduplication and storage of object data in large storage systems.
Microservices refers to an architectural style that structures an application as a collection of loosely coupled, collaborating services, each running in its own container and together implementing a business capability. These services are built around the business capability and are independently deployable, and the architecture is characterized by a minimum of centralized management. The microservice architecture enables the continuous delivery and deployment of large, complex applications. It also enables an organization to evolve its technology readily as its needs change. Microservices is an architecture that is particularly suitable for deployment in a distributed environment. One example of a business capability for which microservices are especially useful is the distributed deduplication of data, which maximizes the efficient utilization of storage resources in large data storage systems.
Deduplication involves determining whether data to be stored is unique or is a duplicate of previously stored data. A deduplication process segments an incoming data stream, creates metadata, such as a fingerprint, that uniquely identifies each data segment, and then compares each fingerprint to the fingerprints of previously stored data segments. If an incoming segment is unique, it is stored on disk. If, however, it is a duplicate of a segment that has already been stored, a reference to the existing segment may be created and the segment is not stored again. Comparing fingerprints avoids the need to compare the much larger data segments themselves. Deduplication therefore conserves storage resources, reduces costs, and increases performance, and the creation, storage, and processing of metadata are central to it.
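The segment, fingerprint, compare, and store-or-reference steps described above can be illustrated with a minimal sketch. It assumes fixed-size segments (practical systems often use variable-size, content-defined segmentation instead), uses SHA-256 as the fingerprint, and keeps the fingerprint index in an in-memory dictionary; the names `DedupStore`, `write`, and `read` are hypothetical and do not describe the claimed architecture.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Hypothetical in-memory deduplicating store, for illustration only."""

    # Assumed fixed segment size in bytes (an illustrative choice).
    segment_size: int = 4096
    # Fingerprint index: fingerprint -> the single stored copy of that segment.
    segments: dict = field(default_factory=dict)
    # Counters showing how many incoming segments were unique vs. duplicate.
    unique_count: int = 0
    duplicate_count: int = 0

    def write(self, data: bytes) -> list:
        """Segment `data`, store only unique segments, and return the list of
        fingerprints (references) from which the data can be rebuilt."""
        references = []
        for offset in range(0, len(data), self.segment_size):
            segment = data[offset:offset + self.segment_size]
            # SHA-256 digest serves as the assumed fingerprint of the segment.
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint in self.segments:
                # Duplicate: keep only a reference, do not store again.
                self.duplicate_count += 1
            else:
                # Unique: store the segment under its fingerprint.
                self.segments[fingerprint] = segment
                self.unique_count += 1
            references.append(fingerprint)
        return references

    def read(self, references: list) -> bytes:
        """Reassemble data by following each fingerprint reference."""
        return b"".join(self.segments[fp] for fp in references)


if __name__ == "__main__":
    store = DedupStore(segment_size=4)
    refs = store.write(b"abcdabcdwxyz")
    print(len(store.segments), store.unique_count, store.duplicate_count)
    print(store.read(refs))
```

With a 4-byte segment size, the input `b"abcdabcdwxyz"` yields three segments, of which the second repeats the first, so only two segments are stored while all three references are kept for reconstruction. Only the short fingerprints are compared on each write, which is the saving the paragraph above attributes to fingerprinting.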
Key needs for the deduplicated storage system of a large organization are that the system be able to scale resources based upon demand, respond quickly to queries and metadata updates, maintain persistent state across node crashes, and have low cost. Scalability is important because the load in large distributed systems varies and the cost of resources is significant. Crash resiliency requires that state be maintained persistently so that it can be recovered. Optimum use of resources and high performance require an architecture in which resources are coordinated and orchestrated. While some existing databases can scale, they do so based upon a specified capacity rather than upon demand, and are therefore not flexible in adapting to changing conditions. Moreover, few, if any, known systems have an architecture in which resources are orchestrated, or are able to satisfy the combined requirements of fast response, persistence across node crashes, and low cost.
There is a need for systems and methods that address the foregoing and other problems with known deduplication systems and methods, and it is to these ends that the present invention is directed.