TR-2019-17

Evaluating Achievable Latency and Cost of SSD Latency Predictors (MittOS Model Inference)

Olivia Weng; Andrew Chien. 2 October, 2019.
Communicated by Andrew Chien.

Abstract

Cutting millisecond-level tail latencies in internet services, which is key to good response times in data-parallel applications, is possible by integrating MittOS, an OS/data-center interface. MittOS typically analyzes white-box information about the internals of devices such as SSDs and decides whether a given server can "fast reject" a service request. Commercial SSDs, however, are black-box designs, so MittOS researchers have developed machine learning models to determine whether requests to commercial SSDs can be rejected. When run on CPUs, however, these models cannot produce a prediction in less time than the SSD takes to fully process a request, defeating MittOS's fast-reject capability. We demonstrate that ASICs such as the Efficient Inference Engine (EIE) accelerate the prediction times of these MittOS models to well within the time an SSD takes to complete a request, at minimal cost, cutting SSD tail latencies. EIE achieves 2.01 μs inference latency while incurring minimal area (20.4 mm²) and power (0.29 W) costs. We show that integrating machine learning into the critical path of the operating system becomes cost-efficient and practical.
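To illustrate the fast-reject idea the abstract describes, the following is a minimal sketch, not the authors' implementation: a learned model predicts a request's latency, and the OS rejects the request up front if the prediction exceeds the caller's deadline, letting the caller fail over to a replica. The model, features, weights, and function names below are all illustrative assumptions.

```python
def predict_latency_us(features):
    # Stand-in for the MittOS ML predictor (which the paper proposes
    # running on an EIE ASIC). Here: a toy linear model over the
    # device queue depth and the request size. Weights are made up.
    w_queue, w_size, bias = 95.0, 0.02, 40.0
    return w_queue * features["queue_depth"] + w_size * features["size_bytes"] + bias

def submit_io(features, deadline_us):
    """Fast-reject check: return ("rejected", p) or ("submitted", p),
    where p is the predicted latency in microseconds."""
    predicted = predict_latency_us(features)
    if predicted > deadline_us:
        # Reject immediately instead of queueing; the caller can
        # retry on another replica and avoid the tail latency.
        return ("rejected", predicted)
    return ("submitted", predicted)

# Example: a request to an idle device is submitted; the same request
# to a deeply queued device is fast-rejected.
print(submit_io({"queue_depth": 0, "size_bytes": 4096}, deadline_us=1000)[0])
print(submit_io({"queue_depth": 12, "size_bytes": 4096}, deadline_us=1000)[0])
```

For such a check to be useful, the prediction itself must finish far faster than the SSD would serve the request, which is the gap the EIE measurements address.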

Original Document

The original document is available in PDF (uploaded 2 October, 2019 by Andrew Chien).

Additional Document Formats

The document is also available in PDF (uploaded 23 November, 2019 by Andrew Chien).

NOTE: The author warrants that these additional documents are identical to the original to the extent permitted by the translation between the various formats. However, the webmaster has made no effort to verify this claim. If the authenticity of the document is an issue, please always refer to the "Original document." If you find significant alterations, please report to webmaster@cs.uchicago.edu.