Data science typically refers to the science that incorporates various disciplines including, but not limited to, operations research, mathematics, statistics, computer science, and domain-specific expertise. A data scientist is thus one who practices some or all aspects of data science in attempting to solve complex data problems. Such complex data problems may arise, for example, in big data and cloud computing contexts.
A data science project typically runs through a data analytic lifecycle, which includes creation of hypotheses, collection of data, exploration of the data in an analytic “sandbox,” and execution of analytic models across that data. A so-called “sandbox” is the computing resource environment associated with tasks such as data exploration. Typically, there are multiple stakeholder (actor) types involved with a data science project, e.g.: data scientist, data engineer, database administrator, project sponsor, project manager, business intelligence analyst, and business user. One or more of these actors are typically involved in the various stages of the data analytic lifecycle.
Conventional data analytics solutions are becoming increasingly limited as the data sets to which they are applied grow in size and variety. Such limitations include the inability to adequately calculate the cost of the data analytics solution, including costs associated with computing resources and time consumption, particularly in a cloud computing environment. Still further, manual reconfiguration of cloud computing resources after their initial provisioning can drastically alter the cost and/or time required to conduct a data science experiment, and can also jeopardize the accuracy of the analytic results.