RushMon: Real-time isolation anomalies monitoring

Zechao Shang; Jeffrey Xu Yu; Aaron J. Elmore. 4 April, 2018.
Communicated by Aaron Elmore.


Motivated by the applicability of HogWild!-style algorithms, researchers and practitioners have turned their attention to system architectures that provide ultra-high-throughput random access with very limited or no isolation guarantees, and to building inconsistency-tolerant applications (e.g., large-scale optimization algorithms) on top of them. Although some optimization algorithms have theoretical convergence guarantees, these systems sometimes fail to compute correct results when the assumptions behind those guarantees do not hold. Moreover, there is no practical way to tell whether a given result is accurate (short of cross-validation) or to tune the isolation strength on the fly. To resolve these problems, such systems need an indicator that reports the number of "bad events" caused by "out-of-order" executions. In this paper, we tackle this problem. Drawing on transaction processing theory, we show that the number of cycles in the dependency graph is a good indicator. Building on this observation, we propose the first real-time isolation anomaly monitor. Our monitor is at least 1000x faster than naive implementations and reports accurate isolation anomaly levels with less than 1% extra overhead. Monitoring anomalies efficiently and in real time protects these systems from excessive isolation anomalies, which could lead to incorrect results. We verify the performance and effectiveness of our monitor via extensive experimental studies.
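
To make the notion of a cycle in the dependency graph concrete, the following is a minimal sketch based on the textbook conflict-graph construction from transaction processing theory, not the monitor described in the paper. The schedule format and the names `build_dependency_graph` and `has_cycle` are assumptions made for illustration. The sketch only detects whether any cycle exists in a small, fully recorded schedule; it does not count cycles or run in real time.

```python
from collections import defaultdict


def build_dependency_graph(schedule):
    """Build a conflict (dependency) graph from a recorded schedule.

    `schedule` is a list of (transaction, op, item) tuples in execution
    order, with op being "r" (read) or "w" (write). This format is a
    hypothetical one chosen for illustration.
    An edge a -> b is added when an operation of `a` precedes a
    conflicting operation of `b` on the same item, i.e. at least one of
    the two operations is a write.
    """
    graph = defaultdict(set)
    for i, (t1, op1, x1) in enumerate(schedule):
        graph[t1]  # make sure every transaction appears as a node
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                graph[t1].add(t2)
    return dict(graph)


def has_cycle(graph):
    """Return True if the directed graph contains a cycle (DFS colouring)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for succ in graph[node]:
            if color[succ] == GREY:
                return True  # back edge: a cycle exists
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# T1 reads x before T2 writes it, and T2 reads y before T1 writes it:
# the two dependencies point in opposite directions and form a cycle.
interleaved = [("T1", "r", "x"), ("T2", "w", "x"),
               ("T2", "r", "y"), ("T1", "w", "y")]

# The same operations executed serially (all of T1, then all of T2).
serial = [("T1", "r", "x"), ("T1", "w", "y"),
          ("T2", "w", "x"), ("T2", "r", "y")]

print(has_cycle(build_dependency_graph(interleaved)))
print(has_cycle(build_dependency_graph(serial)))
```

In this toy setting, the interleaved schedule yields a cycle between T1 and T2, while the serial schedule yields only edges from T1 to T2. The quadratic pairwise comparison is chosen for readability and would not be suitable for monitoring a high-throughput system.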

Original Document

The original document is available in PDF (uploaded 4 April, 2018 by Aaron Elmore).