TR-2019-11

Information Models: Creating and Preserving Value in Volatile Cloud Resources

Chaojie Zhang; Varun Gupta; Andrew A. Chien. 7 May, 2019.
Communicated by Andrew Chien.

Abstract

Volatile resources are surplus cloud resources not consumed by high-priority foreground (reserved/on-demand) load. These resources are exploited by a growing number of users. Today, cloud operators provide no statistical characterization of volatile resources. We consider how releasing such statistics could improve user value by studying Amazon's 608 EC2 Spot Instance types. Results show that as few as two parameters, such as (average, 90th percentile), can increase user value by 30%. These results are robust over four-fifths (475 of 608) of instance types.
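As a rough illustration of what a two-parameter characterization might look like, the following minimal sketch computes an (average, 90th percentile) pair from a sample of observations. The abstract does not say which quantity the statistics describe; the sketch assumes, purely for illustration, that they summarize how long a volatile resource stays available (in hours). The function name `two_parameter_summary` and the sample values are hypothetical and not taken from the report.

```python
from statistics import mean


def two_parameter_summary(durations_hours):
    """Return (average, 90th percentile) of a non-empty sample.

    Assumption: the sample holds availability durations in hours.
    The percentile uses the nearest-rank method, i.e. the smallest
    observation with at least 90% of the sample at or below it.
    """
    if not durations_hours:
        raise ValueError("need at least one observation")
    ordered = sorted(durations_hours)
    n = len(ordered)
    # Integer ceiling of 0.9 * n, avoiding floating-point rounding.
    rank = (9 * n + 9) // 10
    return mean(ordered), ordered[rank - 1]


# Hypothetical sample of ten availability durations (hours).
sample = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.5, 8.0, 12.0, 24.0]
avg, p90 = two_parameter_summary(sample)
print(f"average = {avg:.1f} h, 90th percentile = {p90:.1f} h")
```

A user who knows only these two numbers could, for example, size jobs against the average while treating the 90th percentile as an upper bound on what to expect; how the report itself turns the statistics into user value is not described in the abstract.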

Beyond competitive concerns, cloud operators are reluctant to share volatile resource statistics because the statistics might be construed as a service-level agreement (SLA) and thus constrain the operators' ability to serve foreground load. We show that clever resource management can allay such concerns. We study two plausible classes of foreground load changes, showing one class where such a concern is indeed valid and another where it is not. We design two online resource management algorithms that detect foreground load variation and adapt to maintain a statistical SLA. The algorithms not only improve the ability to maintain guarantees and user value but also improve user experience, reducing job failures by 50%. These results apply to the Stable and Transition classes of resource pools, which account for nearly all of the instance types (577 of 608).

Original Document

The original document is available in PDF (uploaded 7 May, 2019 by Andrew Chien).