Large-scale data analytics and intelligence are increasingly important to the successful operation of an organization. To facilitate decision making, large volumes of data may be collected, stored, and analyzed. Implementing and managing big data infrastructure may be challenging for organizations of all types and sizes, given that such infrastructure requires large-scale compute, network, and storage resources that are costly to procure, assemble, and maintain. Thus, on-premises big data operations and in-house management can be cumbersome and expensive. Given such difficulties, big data in the cloud is becoming more commonly used. This cloud-based approach, e.g., subscribing to compute services provided by a third-party vendor over the internet, is also described as a public cloud infrastructure. Cloud-based big data services are gaining popularity due to ease of use, flexibility, cost savings, higher utilization, and, in some instances, higher performance. Cloud vendors may be able to deliver lower-cost services than on-premises options by focusing on high resource utilization, automating operations, and sharing support among many clients. However, the increasing popularity of cloud-based big data services creates challenges for scalability and performance.