Service-Oriented Architecture (SOA) is an approach for creating loosely coupled, highly composable services and APIs in organizations with large IT centers. SOA became very popular in the early part of the 21st century as large enterprises searched for a better approach to delivering applications quickly and efficiently. SOA was also seen as an appropriate strategy to address the growing need for organizations to share information across their own boundaries, with business partners, government agencies, customers, and the public at large. However, over time, the term SOA became associated with overreaching and ill-fated attempts to re-invent IT in the enterprise. Many of these initiatives ultimately failed, and the term rapidly fell into pejorative use.
Nevertheless, the fundamental concepts of SOA—building loosely coupled interfaces for well-described functions, leveraging highly successful and ubiquitous Web transports as the substrate of distributed communications—were sound and very much needed by any organization attempting to publish functionality to customers, partners, and its own staff. There has been a significant shift from the complex protocols most often associated with SOA (XML messaging, SOAP envelopes, multiple transport bindings) to much simpler and more lightweight approaches based on the principles of the RESTful architectural style, which advocates that distributed computing should follow the fundamental architecture of the World-Wide Web. This trend toward simplicity and ease of development touches content models (JSON replacing XML), identity (OAuth and API keys replacing both sophisticated security tokens such as SAML and simple username/password combinations), and transport (now exclusively HTTP). Even the word “service” is gradually being replaced by API, with the latter implying the same principles but generally being associated exclusively with HTTP transport, RESTful invocations, and JSON content. In this document we will use the two terms together—that is, service/API—as at their core they represent the same basic concept describing the best practices to publish componentized functionality.
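To make the contrast concrete, the following Python sketch places a SOAP-style XML invocation beside its RESTful/JSON equivalent. The service name, operation, and payload fields are purely illustrative, not drawn from any real system.

```python
import json
import xml.etree.ElementTree as ET

# A SOAP-style request wraps the operation in an XML envelope; the same
# envelope could be carried over HTTP, JMS, or other transport bindings.
soap_request = """<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetAccount xmlns="http://example.com/banking">
      <AccountId>12345</AccountId>
    </GetAccount>
  </soap:Body>
</soap:Envelope>"""

# The RESTful equivalent: the resource identity moves into the URL, the
# operation into the HTTP verb, and the content into plain JSON.
rest_method = "GET"
rest_url = "https://api.example.com/accounts/12345"
rest_response = json.dumps({"accountId": "12345", "balance": 99.50})

# Both forms carry the same information; REST carries far less framing.
envelope = ET.fromstring(soap_request)
print(envelope.tag)
print(rest_method, rest_url, "->", rest_response)
```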
Regardless of the architectural flavor, the basic challenge an organization faces when publishing a service/API is securing it. Access control is one important aspect of security. Making a service/API available to the outside world involves deploying a publicly accessible endpoint, which may be challenging because of the perimeter security models employed in most organizations (this is illustrated in FIG. 1). Restricting access based on identity—the process of authentication and authorization—is a part of this. Related to access control is audit, and in particular capturing a permanent record of who-is-accessing-what. Many organizations now want to monetize their services/APIs, so accurate capture of the caller's identity is essential for billing purposes. Detection of Internet-based threats, such as SQL injection or Cross-site Scripting (XSS), is another important security function that must be applied to all services and APIs.
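As a minimal sketch of the authentication, authorization, and audit chain described above, the following illustrates identity-based access control for a service/API. The key table, role map, and function names are hypothetical stand-ins for a real credential store and policy database.

```python
import hmac

# Hypothetical credential store: API key -> caller identity.
API_KEYS = {"key-acme-001": "partner-acme"}
# Hypothetical policy: identity -> resources it may access.
AUTHORIZED = {"partner-acme": {"/accounts"}}

def authenticate(api_key):
    """Authentication: map a presented API key to a caller identity, or None."""
    for known, identity in API_KEYS.items():
        # Constant-time comparison avoids leaking key bytes via timing.
        if hmac.compare_digest(known, api_key):
            return identity
    return None

def authorize(identity, resource):
    """Authorization: may this identity access this resource?"""
    return resource in AUTHORIZED.get(identity, set())

def access_check(api_key, resource, audit_log):
    identity = authenticate(api_key)
    allowed = identity is not None and authorize(identity, resource)
    # Audit: a permanent record of who-is-accessing-what, also usable
    # for per-caller billing when the service/API is monetized.
    audit_log.append((identity, resource, allowed))
    return allowed

log = []
print(access_check("key-acme-001", "/accounts", log))  # True
print(access_check("bad-key", "/accounts", log))       # False
```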
There are other aspects of service/API management that, while not related to security, are nonetheless essential in a robust distributed computing environment. Monitoring of transactions is important for operational continuity, and access to historical transaction data is critical for troubleshooting and formal forensic investigation. Message transformation is useful for versioning or for adapting mismatched clients and servers. Rate limiting is important to protect systems from bursts of traffic that might adversely affect overall performance.
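Rate limiting is commonly implemented with a token bucket, which bounds both burst size and sustained throughput. The following is a generic sketch of that technique; the parameters and clock injection are illustrative, not from any particular product.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: `capacity` bounds the burst size,
    `refill_rate` bounds sustained throughput (requests per second)."""

    def __init__(self, capacity, refill_rate, now=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self.tokens = float(capacity)
        self.now = now            # injectable clock, eases testing
        self.last = now()

    def allow(self):
        t = self.now()
        # Credit tokens accrued since the last request, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.refill_rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# With capacity 2 and no time elapsing, a burst of 3 sees the third rejected.
clock = iter([0.0, 0.0, 0.0, 0.0]).__next__
bucket = TokenBucket(capacity=2, refill_rate=1, now=clock)
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```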
Any system that is directly exposed to Internet transactions must also be hardened to withstand potential attack. Typically this involves complex operating system (OS) configuration and continuous attention to patches and emerging threat vectors. Hardening systems is a difficult task requiring very specialized skills and knowledge. Many organizations lack these skills, and so many systems are compromised not through exploitation of a service/API they host, but through exploitation of the underlying OS that contains unaddressed vulnerabilities.
It is certainly possible to apply all of the above functions to each individual application and server in an organization's network. However, this approach does not scale well, especially in a diverse environment with different architectures and operating systems. Consistency is a fundamental concept in good security, and it is extremely difficult to achieve simultaneously across multiple systems. Furthermore, embedding access control, transformation, audit, etc., into a service/API implementation is an extremely inflexible solution. Any change to the policy these actions embody may demand a new compile cycle, with associated testing and formal migration through development, test, and production environments. A seemingly minor policy change in an organization, such as the update of a trusted certificate, can have a ripple effect on production systems, each of which must be individually updated.
One approach currently used to address this issue is to deploy an intermediate Policy Enforcement Point (PEP) as a security gateway between the client and the service/API. This is most commonly deployed in the DMZ, where it acts as a reverse proxy brokering communications to internal systems hosting services and APIs (see FIG. 2).
The PEP is a security-hardened platform that is designed to be a buffer between the Internet and internal systems that are not deployed on platforms that are DMZ-ready (meaning sufficiently hardened against sophisticated Internet-originating attacks). The PEP takes on responsibility for security and monitoring of applications. This includes authentication, authorization, audit, message transformation, routing, rate limiting, transport mediation, orchestration between services/APIs, etc. These are all elements of policy that act on a connection request.
Delegating the above functions to the PEP means that they are decoupled from code and local implementation. This has a number of benefits. It promotes reuse, and thus consistency. It is declarative, so policy is easily modified to accommodate changing requirements or address emerging threats. Most importantly, it places security responsibility in the hands of dedicated security professionals, rather than distributing it among developers who may not understand the scope of the problem they must address.
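The declarative character of PEP policy can be sketched as follows: policy is data, applied by a reverse proxy before a request is brokered to an internal host, so changing policy means editing the table rather than recompiling the application. All paths, addresses, identities, and field names here are hypothetical.

```python
# Declarative policy table, the kind of artifact a security professional
# would own and edit; the backend addresses are internal-only.
POLICY = {
    "/accounts": {
        "allowed_identities": {"partner-acme"},
        "backend": "http://10.0.1.15:8080/accounts",  # never exposed to clients
        "rate_limit_per_min": 60,
    }
}

def enforce(request, authenticate, counters, audit):
    """Return (status, backend_url). `authenticate` is supplied by the
    deployment; `counters` and `audit` stand in for shared state stores."""
    rule = POLICY.get(request["path"])
    if rule is None:
        return (404, None)
    identity = authenticate(request["api_key"])
    if identity not in rule["allowed_identities"]:
        audit.append((identity, request["path"], "denied"))
        return (403, None)
    counters[identity] = counters.get(identity, 0) + 1
    if counters[identity] > rule["rate_limit_per_min"]:
        return (429, None)
    audit.append((identity, request["path"], "allowed"))
    # The request is brokered to the internal host; the client only
    # ever sees the PEP's address.
    return (200, rule["backend"])

auth = lambda k: "partner-acme" if k == "key-acme-001" else None
status, backend = enforce({"path": "/accounts", "api_key": "key-acme-001"},
                          auth, {}, [])
print(status, backend)  # 200 http://10.0.1.15:8080/accounts
```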
This best practice is known as the perimeter security model. The PEP is effectively a border guard stationed at the perimeter, using policy to grant access to clients. It allows internal servers to rely completely on the PEP for security and monitoring, greatly simplifying their implementations. In particular, it allows internal systems to be deployed on general purpose operating systems that use no specialized security hardening. Overall this greatly simplifies security management and monitoring of services/APIs in an organization. However, it also means that internal systems are completely reliant on the PEP-guarded perimeter security model, and thus are not easily moved to a different environment with a higher risk profile. More specifically, it becomes very difficult to migrate perimeter-secured, internal applications hosting services/APIs outside to cloud environments.
Cloud computing promises to make the deployment and operation of applications more agile, offering customers an opportunity to rapidly scale their application instances up or down in response to changing resource requirements, under a pay-for-use model that promises to revolutionize the delivery of IT services. To accomplish this, cloud computing systems leverage technologies such as virtualization. Cloud computing service providers deploy virtualization infrastructure en masse over farms of commodity servers, providing a multi-tenancy operating model to their customers. The use of this commoditized infrastructure and centralized management allows vast economies of scale to be achieved, thus driving down costs for IT.
Cloud allows an organization to shift its budget focus from CAPEX-dominated budgets (equipment acquisition, long lead times) to smaller OPEX-focused budgets (pay-for-use, instant access and scaling). This model is very attractive to CIOs and CEOs, who see IT as critical to their business, but not something their organization is necessarily effective at. Outsourcing to cloud eliminates capital expense and leverages outside expertise in day-to-day system operations.
Cloud is generally characterized according to definitions put forward by NIST, which outline three major cloud instantiations: Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS). SaaS essentially describes fee-for-service web sites that satisfy important business functions, such as Customer Relationship Management (CRM) or office applications. Salesforce.com and Google Docs are two examples of SaaS cloud computing. SaaS places specific limits on what customers can do: they can customize screen flow and work on data, but they cannot load arbitrary applications into a SaaS cloud.
PaaS provides the building blocks of applications, such as databases, application servers, directories, queuing software, etc. PaaS is intended as a curated stack of critical production development infrastructure that is available to developers so they don't need to worry about installing and operating them—they are just available and maintained by the PaaS provider. Developers deploy code—such as Java, PHP, Ruby, .NET framework languages, Python, etc—into the PaaS containers. Of the three instantiations of cloud, PaaS is probably the least defined at this time, but arguably has the greatest future potential. Microsoft's Azure, and VMware/Salesforce.com's VMForce are PaaS initiatives.
IaaS is basically mass virtualization infrastructure for hire. Customers of IaaS providers take standardized virtual images of their applications (which include both the OS and their application) and run this in the cloud provider's infrastructure, as illustrated in FIG. 3. The provider maintains a large data center of commodity servers (CPUs/memory), distributed storage space (usually SAN disks), and network elements. All of the infrastructure is shared in an attempt to get very high utilization rates and keep costs down. The uniformity of the environment means that vast economies of scale can be achieved, in hardware, in process, and in people running the service.
IaaS cloud computing is the focus of the innovation that is to be described here. Subsequent references to cloud computing should be taken to imply IaaS clouds.
Cloud computing in general, however, introduces a number of new risks and challenges around data and applications. Many of these are a by-product of the shift from a private computing environment to a multi-tenancy environment, the transfer of control of infrastructure from local IT staff to a cloud service provider, and the loss of the perimeter security model. These issues conspire to erode confidence in the emerging cloud computing trend among IT managers, and slow the uptake of cloud technology.
In the IaaS cloud, no perimeter buffer exists. Every application stands on its own, completely accessible from the outside world. These applications are thus subject to all of the risks of an application deployed in a corporate de-militarized zone (DMZ). In addition, this threat is elevated by virtue of residing in a cloud provider, such as Amazon's EC2. As cloud providers gather a large number of applications into a readily accessible space, they are obvious targets for system crackers. Thus, every application in the cloud must be hardened to be resilient to external attack.
This radically different risk profile from the enterprise data center makes it difficult to migrate existing enterprise applications to the cloud. These applications are generally built under the assumption that a security perimeter is in place, and that a PEP may be in place to delegate security and monitoring processing. The application is deployed on an un-hardened operating system, and the application may have no capabilities for sophisticated access control models, threat detection, rate limiting, etc. Thus, the application hosting the service/API is highly vulnerable to compromise if moved to the cloud unchanged. This is illustrated in FIG. 4.
One solution might be to put a PEP in the cloud, insulating the application from direct outside access. This may work in some controlled situations, but few cloud environments support isolation models that can guarantee that there is no way for an outside party to end-run around the PEP and contact the application directly. This is a side-effect of the surrender of control organizations subject themselves to when they move to the cloud. In a multi-tenant environment, it is generally not possible to impose traditional restrictions on routing to accommodate the needs of a single customer.
There is a further internal threat which must also be considered. Cloud providers attempt to achieve very high utilization of CPU, storage, and network resources so they may maximize profit potential. An effective cloud provider will have hardware utilization rates that far exceed a typical organization's because they run multiple virtual images simultaneously on the hardware. This means, however, that an organization may not have exclusive access to the hardware their applications run on. A competitor—or a system cracker—may be running on the same machine, in a separate virtual image space. Modern hypervisors are very effective at isolating access between images (meaning no visibility into the memory space of another image, no ability to put network drivers into promiscuous mode to see traffic not intended for this image, etc.); however, normal network access is generally permitted (such as a socket connection from one application to another). Therefore, every application in a multi-tenant environment must consider the potential threat of connections from a hostile party also running internal to the cloud provider (see FIG. 5). This is another reason that perimeter-deployed PEPs are not effective in cloud environments. Even if they could restrict outside traffic from connecting to an internal application, they cannot easily mitigate internal threats.
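Because a co-resident tenant can open ordinary socket connections to an image, each cloud-hosted application must authenticate every peer itself rather than trusting the network. One common technique is mutual TLS, sketched below with Python's standard `ssl` module; the certificate file names are placeholders for a real deployment's credentials.

```python
import ssl

def make_mtls_server_context(certfile, keyfile, client_ca):
    """Build a server-side TLS context that demands a client certificate.

    `certfile`/`keyfile` are this application's own credentials, and
    `client_ca` is the CA trusted to sign caller certificates; all three
    paths are hypothetical placeholders.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    ctx.load_verify_locations(cafile=client_ca)
    # CERT_REQUIRED rejects any connection that cannot present a
    # certificate signed by the trusted CA -- including one from a
    # hostile co-tenant inside the same cloud provider.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx

# Usage (with real credential files in place):
#   ctx = make_mtls_server_context("server.pem", "server.key", "clients-ca.pem")
#   secure_sock = ctx.wrap_socket(plain_listening_socket, server_side=True)
```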
The foregoing issues all result from the fundamental differences in deployment environment between the private organization and the cloud, and the surrender of network control that is necessary when an organization moves to a cloud provider. The private organization has complete control over their network topology. They can configure switches, routers, and firewalls to build an effective DMZ-based perimeter security model that supports deployment of PEPs. This allows internal systems to completely delegate security and monitoring responsibility to the PEP; thus, these systems can be quite insecure themselves.
In the cloud, customers surrender all control of network topology and operate in a multi-tenant environment where all running images are accessible from the Internet and, moreover, from other internal hosts. Thus, every application moved into the cloud must run on a security-hardened platform, and by extension must independently implement all aspects of security and monitoring that would otherwise be the responsibility of a dedicated PEP. If an organization has a large number of applications, this represents tremendous effort and a huge potential ongoing maintenance burden.
At present, there is no integrated solution that offers a complete answer to this challenge. There are some partial solutions, but each falls considerably short of providing a simple way to secure and manage applications in the cloud.
Rightscale (http://www.rightscale.com/) offers a solution for automation and management of Infrastructure-as-a-Service (IaaS) cloud-based solutions. They do not address application transaction security, nor do they monitor application transaction activity. Their solution is aimed much more at managing the elastic scaling of images.
Symplified (http://www.symplified.com/) offers cloud single sign-on and access control. Their method uses a centralized, cloud-based authentication broker. This is their PEP equivalent, and it has only limited policy enforcement capabilities (mainly focused on simple HyperText Transfer Protocol (HTTP) and Representational State Transfer (REST) based communications). Their solution is cloud-based, so it forces application developers to harden their images and provide simple access control from the Symplified servers. (It should be noted that cloud-based here does not imply any isolation or benefit from possibly executing in the same cloud as the target service. It is still possible to end-run around these solutions and get direct access to the running application, either from the Internet or from within a cloud provider.)
Mashery (http://www.mashery.com/) provides a cloud-based solution for simple access control, rate limiting, and activity monitoring. However, developers still must harden their own systems and provide basic access control to ensure that only connections from the remote Mashery server are accepted.
Similarly, APIgee (http://www.apigee.com/) and 3Scale (http://www.3scale.com/) offer simple cloud-based authentication and activity monitoring systems. APIgee has both a cloud-based solution and a regular enterprise on-premise PEP.
3Scale does provide agents for integration into an application. The agent communicates with a cloud-based Policy Decision Point (PDP) to render access control decisions. It provides some rudimentary access control protection, as well as a scheme for simple rate limiting; however, it does nothing to provide real isolation of the application in the cloud, assist in hardening the baseline images, or supply other features of a comprehensive PEP such as routing, orchestration, and message transformation.
Amazon Web Services (AWS, at http://aws.amazon.com/) has an isolation offering called Virtual Private Cloud (VPC), which creates a network-based isolation layer around all of an organization's running images. It ties this isolated sub-cloud into an enterprise's on-premise network with a VPN. Thus, the VPC effectively becomes a data annex for the enterprise. All communication to or from cloud-resident images must go back to the enterprise and proceed past the regular corporate perimeter security model.
VPC is an effective solution for carving out part of the cloud and using it as a data annex; however, there are drawbacks. If a cloud-based application is compromised (an outcome that is much more likely for a cloud-resident application than a local one), it has a direct path back into the enterprise as a trusted host. Organizations must trust completely the isolation model in the cloud, the details of which are not shared by the provider. Under these circumstances, it is difficult to be certain that VPC is not subject to compromise. Nevertheless, up to now, VPC is probably the best available solution for organizations who want to carve out part of the cloud as a private data center annex.
CloudSwitch (http://www.cloudswitch.com/) provides a solution similar to Amazon's VPC, with a particular emphasis on dynamically switching work loads out into the Amazon cloud in a secure manner. It is subject to the same connection restrictions of Amazon's VPC.
VMware (http://www.vmware.com/) has been working to provide levels of network isolation in the cloud using virtual networking. Their vShield Edge product allows cloud providers that use Virtual Cloud Director (vCD) to create layer 2 and 3 network isolation zones that contain multiple running virtual machine images. Thus, virtual DMZs can be created, and virtual PEPs can be deployed in them to protect applications hosting services/APIs in a virtualized secure zone.
The drawback to this approach is that it only works in clouds built on the proprietary VMware vCD infrastructure. vShield is an integral part of the offering and cannot be easily moved to a non-VMware environment, such as a Xen-based environment like Amazon Web Services.
There are various approaches to provisioning using assemblies. 3Terra (http://www.3terra.com/) has a product that supports composition and deployment of multi-tier applications in the cloud. It supports provisioning, scaling up and down, and monitoring and visibility based on instrumentation of virtual images. However, it does not monitor transactions at the application protocol level, nor does it offer a security solution.
The Open Virtualization Format (OVF, see http://www.dmtf.org) has a packaging structure for multi-image applications. This addresses portability and packaging; however it does not offer scalability, monitoring, or security.
The lack of a simple solution to these basic security and management problems is keeping organizations from undertaking wide scale deployment of applications into the cloud. It is an object of the present invention to obviate or mitigate the above disadvantages.