Estimating Resource Costs of Executing Data-Intensive Workloads in Public Clouds
The promise of “infinite” resources given by the cloud computing paradigm has led to recent interest in exploiting clouds for large-scale data-intensive computing. In this paper, we present an analytical model to estimate the resource costs for executing data-intensive workloads in a public cloud. The cost model quantifies the cost-effectiveness of a resource configuration for a given workload with consumer performance requirements expressed as Service Level Agreements (SLAs), and is a key component of a larger framework for resource provisioning in clouds. We instantiate the cost model for the Amazon cloud, and experimentally evaluate the impact of key factors on the accuracy of the model.
We formulate the constructs for modeling the cost of workload execution in a public cloud. We present a cost model for workload execution and evaluate it in the Amazon cloud. Our cost model is workload-aware and provides cost at the granularity of an hour. More importantly, we explore methods for building and instantiating a cost model for workload execution in IaaS-based clouds. These methods are relevant for other IaaS providers, such as GoGrid or Rackspace. We believe that our cost model provides a basis for modeling the dollar cost of executing any workload type in the Amazon cloud. We anticipate that users considering clouds for executing their applications will find the cost model useful.
Our cost model provides an hourly cost of workload execution, and assumes that the data already exists in the cloud. The experimental evaluation shows that our cost model is a suitable tool for estimating the cost of workload execution under the pay-as-you-go scheme in the Amazon cloud.
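To make the idea of an hourly configuration cost concrete, the following is a minimal sketch and not the model's actual formulation. It assumes, purely for illustration, that the hourly cost is the sum of the hourly VM prices plus a flat penalty for each SLA whose threshold was exceeded during that hour. The names `Sla` and `hourly_cost`, and all prices and thresholds, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sla:
    """Hypothetical SLA: a performance threshold and a dollar penalty."""
    threshold: float  # e.g. maximum acceptable average response time (seconds)
    penalty: float    # dollars charged for the hour if the threshold is exceeded


def hourly_cost(vm_prices: Sequence[float],
                observed: Sequence[float],
                slas: Sequence[Sla]) -> float:
    """Illustrative hourly cost: VM rental plus penalties for violated SLAs.

    Assumption: one observed metric per SLA, and a flat penalty per violation.
    Data is assumed to already reside in the cloud, so no upload cost is added.
    """
    if len(observed) != len(slas):
        raise ValueError("need exactly one observed metric per SLA")
    resource_cost = sum(vm_prices)
    penalty_cost = sum(s.penalty for s, m in zip(slas, observed) if m > s.threshold)
    return resource_cost + penalty_cost


# Two VMs with made-up hourly prices; the first SLA is violated, the second is met.
cost = hourly_cost([0.10, 0.40], [1.5, 0.2], [Sla(1.0, 0.05), Sla(0.5, 0.10)])
print(f"estimated hourly cost: ${cost:.2f}")
```

Separating the resource term from the penalty term makes it easy to compare configurations: a cheaper VM may still cost more overall if it incurs more penalties.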
We vary the use cases along the user-controllable variables: (a) workloads, (b) VM types, and (c) SLO specifications. Our evaluation workloads consist of analytical, transactional, and mixed types. We consider different workload combinations on different VM types. We also specify SLOs on the transactions and the queries belonging to different tenants. The SLOs vary in their threshold and penalty values. The average absolute error in estimating configuration costs across all experiments is 6.28%, which is about $0.02 of the total measured cost of the configurations on average. Given the scarcity of training samples, we are unable to verify the distribution of results and resort to using the average as the aggregation method. Therefore, these results must be taken with caution.
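As an illustration of how an average absolute error of this kind can be computed, the sketch below assumes the metric is the mean of per-experiment absolute errors expressed as a percentage of the measured cost. That reading is an assumption, as is the function name, and the cost values are invented for the example rather than taken from the experiments.

```python
from typing import Sequence


def mean_absolute_percentage_error(estimated: Sequence[float],
                                   measured: Sequence[float]) -> float:
    """Mean of |estimated - measured| / measured over all experiments, in percent."""
    if len(estimated) != len(measured) or len(estimated) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    relative_errors = [abs(e - m) / m for e, m in zip(estimated, measured)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Invented hourly costs (dollars) for two hypothetical configurations.
estimated_costs = [0.33, 0.50]
measured_costs = [0.30, 0.50]
error = mean_absolute_percentage_error(estimated_costs, measured_costs)
print(f"average absolute error: {error:.2f}%")
```

Because a plain mean hides skew and outliers, a median or a confidence interval would be a natural complement once more samples are available.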
The current cost model, while adequate for workload execution in a single zone, needs to be extended to account for inter-zone and inter-region communication costs. We also do not address the cost of maintaining consistency between replicas, and leave it as future work.