Cloud computing is a computing infrastructure for enabling ubiquitous access to shared pools of servers, storage, computer networks, applications and other data resources, which can be rapidly provisioned, often over a network, such as the Internet.
A “data resource” as used herein may include any item of data or code (e.g., a data object) that can be used by one or more computer programs. In example embodiments, data resources are stored in one or more network databases and are capable of being accessed by applications hosted by servers that share common access to the network database. A data resource may for example be a data analysis application, a data transformation application, a report generating application, a machine learning process, a spreadsheet or a database, or part of a spreadsheet or part of a database, e.g. records or datasets.
Some companies provide cloud computing services that allow registered customers, for example manufacturing, scientific and technology companies, to create, store, manage and execute their own resources via a network. This enables customers to offload data storage, data transformation, data analysis and similar functions to a cloud-based platform having appropriate resources and computing power for providing secure access to the data resources, potentially for many registered users of the customer.
The platform may also provide applications, services and microservices for performing additional functions, such as specific transformations or analysis on the data, thereby removing the need for the customer to install such applications, services and microservices on their own servers and to provide the associated support and maintenance.
Customers may wish to use their own applications or code, or languages in which they are fluent, on their own datasets, for example to perform a transformation task forming part of a data processing pipeline comprising multiple such tasks and/or datasets. For example, the customer may own a dedicated application that is not provided by the platform and cannot be uploaded to the platform, e.g. due to its size, or due to confidentiality or licensing restrictions. As another example, the customer may require the use of one or more data resources, such as a complex model, which contains a large set of data, possibly confidential data, and/or requires specialist hardware to run. This may mean that it is not feasible to provide the data resource to the integrated platform. Nonetheless, the customer may need to use one or more datasets stored on the integrated platform as input to the data resource, and may need to provide the output data back to the integrated platform for storage or so that one or more further tasks of a pipeline can be carried out.