It has become commonplace to use computers, and networks of computers, to facilitate a wide variety of activities including work and recreation. Modern computer networks incorporate layers of virtualization so that physically remote computers and computer components can be allocated to a particular task and then reallocated when the task is done. Users sometimes speak in terms of computing “clouds” because of the way groups of computers and computing components can form and split responsive to user demand, and because users often never see the computing hardware that ultimately provides the computing services. More recently, different types of computing clouds and cloud services have begun emerging.
Cloud service platforms vary in the types of services they provide and the types of applications they are intended to support. At one end of the spectrum are “low level” services, such as platforms that provide access to the operating system, one or more development frameworks, databases and other like facilities. A primary goal for these platforms is to reduce hardware and IT costs without otherwise restricting the application developer's choice of technical solution or application space. At the other end of the spectrum are platforms that provide facilities to create applications in the context of a preexisting application with a well-defined purpose. Such “high level” cloud services typically focus on one or more well-defined end user applications such as business applications. A goal of these platforms is to enable the creation of extensions to a core application. The services provided in this case are typically skewed toward the context of the embedding application and away from low-level services and choice of technical solution. Some high level cloud services provide an ability to customize and/or extend one or more of the end user applications they provide; however, high level cloud services typically do not provide direct access to low level computing functions. This can be problematic with respect to fault tolerance (for example, maintaining data and/or behavioral integrity after an unexpected or interrupting event such as a power or communications network failure), since conventional approaches typically use low level computing functions to implement fault tolerance.
FIG. 1 depicts aspects of an example computing environment 100 that may benefit from at least one embodiment of the invention. A variety of client applications (not shown) incorporating and/or incorporated into a variety of computing devices 104 may communicate with a multi-tenant distributed computing service 108 through one or more networks 112. Examples of suitable computing devices 104 include personal computers, server computers, desktop computers, laptop computers, notebook computers, personal digital assistants (PDAs), smart phones, cell phones, and consumer electronics incorporating one or more computing device components such as one or more processors. Examples of suitable networks include networks including wired and wireless communication technologies, networks operating in accordance with any suitable networking and/or communication protocol, private intranets and/or the Internet.
The multi-tenant distributed computing service 108 may include multiple processing tiers including a user interface tier 116, an application tier 120 and a data storage tier 124. The user interface tier 116 may maintain multiple user interfaces 128 including graphical user interfaces and/or web-based interfaces. The user interfaces 128 may include a default user interface for the service, as well as one or more user interfaces customized by one or more tenants of the service. The default user interface may include components enabling tenants to maintain custom user interfaces and otherwise administer their participation in the service. Each tier may be implemented by a distributed set of computers and/or computer components including computer servers. The data storage tier 124 may include a core service data store 132 as well as a data store (or data stores) 136 for storing tenant data.
The application tier 120 of the multi-tenant distributed computing service 108 may provide application servers 140 for executing customizable and/or extendible end user applications. For example, the application tier may enable customization with a programmatic language such as a scripting language. Custom program code may be executed in a controlled execution environment 144 instantiated by the application servers 140. For example, custom scripts may be executed by a scripting language interpreter.
Conventional attempts to enable customization of high level cloud services, such as the multi-tenant distributed computing service shown in FIG. 1, while addressing fault tolerance issues are inefficient, ineffective and/or have undesirable side effects or other drawbacks.
For example, programs running under a conventional computer operating system can typically use low-level mechanisms provided by a database to ensure that data consistency is maintained in the presence of unexpected interruptions. For particularly long running processes, other mechanisms may be employed to track the progress of a program in order to support recovery from unexpected events such as power loss, network interruptions or other system failures. For example, a program performing a repeated operation on a homogeneous list of data objects can encapsulate each identical operation in a database transaction and include information to indicate completion of each unit of work. If the process is interrupted, then when the system restarts, the program can query for the unprocessed objects and resume without sacrificing consistency in the data.
A process running on a cloud-based platform may not, however, have access to the same low-level facilities available to one written directly on the operating system. On such a system, data consistency may only be guaranteed within a scope of one system-level data access operation, such as the read or write of a business object. Even in a case where a long-running process is built from multiple identical computational units, these units may include more than a single data access operation. This can leave the process vulnerable to data inconsistencies if an unexpected interruption occurs, not between computational units, but in the midst of a single computational unit.
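The vulnerability described above can be sketched as follows. The `RecordStore` class is a hypothetical in-memory stand-in for a platform's business-object API, not any real platform interface; it assumes that each individual `write` is atomic but that nothing spans two writes. The `transfer_unit` function and the `interrupt_on_attempt` parameter are likewise assumptions made only for this illustration.

```python
class Interrupted(Exception):
    """Stands in for an unexpected platform interruption (illustrative only)."""


class RecordStore:
    """Hypothetical high level record API: each write is atomic on its own,
    but no transaction is available to group several writes together."""

    def __init__(self, records, interrupt_on_attempt=None):
        self._records = dict(records)
        self._attempts = 0
        self.interrupt_on_attempt = interrupt_on_attempt

    def read(self, key):
        return self._records[key]

    def write(self, key, value):
        self._attempts += 1
        if self._attempts == self.interrupt_on_attempt:
            # The interruption arrives before this write takes effect.
            raise Interrupted(f"interrupted before write attempt {self._attempts}")
        self._records[key] = value


def transfer_unit(store, src, dst, amount):
    """One computational unit made of two separate data access operations."""
    store.write(src, store.read(src) - amount)
    store.write(dst, store.read(dst) + amount)


if __name__ == "__main__":
    store = RecordStore({"a": 100, "b": 0}, interrupt_on_attempt=2)
    try:
        transfer_unit(store, "a", "b", 10)
    except Interrupted as exc:
        print("interrupted:", exc)
    # The first write was applied and the second was not.
    print("a =", store.read("a"), "b =", store.read("b"))
```

In this sketch each record remains individually valid, yet the relationship between the two records is broken, and the process has no record of how far the unit progressed.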
Users of low-level platforms are in a position to manage consistency issues because they typically have access to the facilities available in conventional (non-cloud, on-premises, etc.) development environments. For example, the transactional nature of a relational database may be combined with an architecture that minimizes the number of operations performed in a single transaction as the basis for ensuring a consistent, durable data state.
On high level platforms, however, a program may not have direct access to a database transaction. Instead, the underlying database transactions may be used to ensure consistency of data access to the higher level business objects that these platforms interact with. In order to create an atomic process that spans multiple high-level data access operations, these platforms may provide a restricted form of transaction management that spans a small number of accesses and/or impose further limits on the types of platform services that may be utilized during the transaction.
A high level platform, such as the multi-tenant distributed computing service shown in FIG. 1, may be susceptible to intermittent interruptions arising from system failure, a resource governance mechanism, a planned system restart or a monitoring-initiated restart. Resource governance may include resource utilization tracking on a per tenant basis by the platform and resource utilization limitation in accordance with an agreement between the tenant and the platform service provider. To the service provider's advantage, however, the platform need not guarantee that an interruption provides a signal allowing a client process to shut down safely. Such interruptions can be classified as (i) those occurring during a data write operation and (ii) those occurring outside a data write operation. Interruptions of the former type are not expected to leave a business object in an inconsistent state because the write is protected by an underlying database transaction. With interruptions of the latter type, however, data consistency between business objects may be lost.
Embodiments of the invention are directed toward solving these and other problems individually and collectively.