TR-2019-07

Managing the Value of Volatile Cloud Resources: Information Disclosure and Guarantee-Preserving Management

Chaojie Zhang. 31 January, 2019.
Communicated by Andrew Chien.

Abstract

Cloud providers sell unreliable or volatile resources that are unused by foreground (reserved/high-priority) workloads to increase revenue while still meeting foreground requests. Volatile resource properties are therefore products of the interaction between foreground demand and the volatile resource management algorithm, so different algorithms create distinct statistical properties, which in turn affect user value. Moreover, because current cloud providers do not provide any statistical information or guarantees, it is difficult for users to exploit volatile resources efficiently. As a consequence, cloud providers must consider two key factors to maximize user value: (i) the volatile resource management algorithm, and (ii) the information provided to users about the resources in the form of statistical guarantees. We describe and evaluate four volatile resource management approaches (Random, FIFO, LIFO, LIFO-pools) using commercial cloud resource traces from 608 Amazon EC2 instance pools in four AWS US regions, covering a three-month period from 5/2017 to 8/2017. We also consider the value of several information models (MTTR, limited statistics, Full distribution, and Oracle) that statistically characterize the resources.
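As an illustration of how the reclamation order alone can shape interval durations, the following is a minimal sketch, not the report's implementation. It assumes that FIFO reclaims the longest-running volatile instance and LIFO the most recently started one; the abstract does not define the policies, so these readings, the function name `pick_victim`, and the `(instance_id, start_time)` record layout are all hypothetical. LIFO-pools is omitted because the abstract does not say how pools are formed.

```python
import random

def pick_victim(running, policy, rng=None):
    """Choose which volatile instance to reclaim when foreground demand rises.

    running: list of (instance_id, start_time) tuples (hypothetical layout).
    policy:  "Random", "FIFO", or "LIFO" (interpretations are assumptions).
    """
    if not running:
        raise ValueError("no volatile instances to reclaim")
    if policy == "FIFO":
        # Assumed meaning: reclaim the instance that started earliest.
        return min(running, key=lambda rec: rec[1])[0]
    if policy == "LIFO":
        # Assumed meaning: reclaim the instance that started most recently.
        return max(running, key=lambda rec: rec[1])[0]
    if policy == "Random":
        chooser = rng if rng is not None else random
        return chooser.choice(running)[0]
    raise ValueError("unknown policy: " + policy)

running = [("a", 0), ("b", 5), ("c", 9)]
fifo_victim = pick_victim(running, "FIFO")
lifo_victim = pick_victim(running, "LIFO")
random_victim = pick_victim(running, "Random", random.Random(0))
```

Under these assumptions, LIFO tends to leave long-running instances untouched while newly started ones are revoked quickly, whereas FIFO caps how long any instance survives; this is one way the same foreground load could yield different duration distributions.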

Our results show that volatile resource management algorithms (VRMs) can increase user value by 30 to 45% in four instance exemplars. Information models also provide a 30% increase in numerous cases and more than a 10-fold increase in extreme cases. These results suggest that cloud providers should pay significant attention to the statistical information they provide to users. Simple statistics can increase achievable value by 10% to as much as 5x. However, a skewed distribution can make such statistics misleading and thus sharply reduce derived value. Our results on relative user value per resource-hour for the four exemplars show that the 90pctile info model is best in all cases and achieves close to the maximum possible value. These results also broadly characterize the vast majority (475 of 608) of instance pools: the ordering of VRMs and information models is the same as in the exemplars, and the relations that underlie the key conclusions for the exemplars carry over to the vast majority of instance pools. Furthermore, we provide a detailed drill-down showing how the volatile resource management algorithms affect resource interval durations, and thus potential user value. We further show how the information model shapes user targeting, success rate, and user value.

Cloud providers may be concerned that providing statistical guarantees constrains resource management flexibility to meet future foreground load. To address these concerns, we explore algorithms that preserve resource management flexibility while maintaining statistical guarantees. We study two variations of foreground load changes and their impact on statistical guarantees. We then propose three algorithms that attempt to maintain a simple info model. Our offline algorithm results show that statistical guarantees can be fully preserved under foreground load changes with trivial resource waste, increasing user value by up to 134%. Moreover, two online algorithms, an AIMD algorithm and a Distribution Targeting algorithm, can dynamically preserve guarantees with online knowledge most of the time, and in doing so increase user value by up to 82%. This suggests that further research exploring online algorithms is promising.
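For readers unfamiliar with AIMD (additive increase, multiplicative decrease), the generic control rule can be sketched as below. This shows only the standard AIMD step. The abstract does not state which quantity the report's AIMD algorithm adjusts or how a guarantee violation is detected, so the controlled value (here a hypothetical "offered volatile capacity"), the `guarantee_met` signal, and all parameter defaults are assumptions made purely for illustration.

```python
def aimd_step(offered, guarantee_met, add=1.0, mult=0.5, lower=0.0, upper=100.0):
    """One generic AIMD update of a hypothetical control value.

    offered:       current value of the controlled quantity (assumed to be
                   the amount of volatile capacity offered to users).
    guarantee_met: True if the advertised statistic still held in the last
                   observation window (how this is measured is an assumption).
    """
    if guarantee_met:
        candidate = offered + add      # additive increase
    else:
        candidate = offered * mult     # multiplicative decrease
    # Clamp to the allowed range.
    return max(lower, min(upper, candidate))

# Hypothetical trace: three good windows, one violation, one good window.
offered = 10.0
history = []
for ok in [True, True, True, False, True]:
    offered = aimd_step(offered, ok)
    history.append(offered)
```

The asymmetry is the point of the design: the controller probes upward slowly while the guarantee holds and backs off sharply when it is violated.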

Original Document

The original document is available in PDF (uploaded 31 January, 2019 by Andrew Chien).