Notification and Prediction of Heap Management Pauses in Managed Languages for Latency Stable Systems

Daniar H. Kurniawan; Cesar A. Stuardo; Ray Andrew O. S.; Haryadi S. Gunawi. 8 December, 2020.
Communicated by Haryadi Gunawi.


An increasing number of high-performance distributed systems run on managed runtime environments such as the Java Runtime Environment (JRE); examples include Cassandra [1], Hadoop [2], Spark [3], Hazelcast [4], Alluxio [5], Hive [6], and RethinkDB [7]. Developers are attracted to managed languages like Java because they offer services such as garbage collection (GC), run-time type checking, reference checking, and cross-platform execution. However, many applications running on the JVM (e.g., big data frameworks such as Hadoop and data stores such as Cassandra) suffer from long GC times. The long Stop-The-World (STW) pauses during GC not only degrade application throughput and cause long latencies, but also hurt overall system efficiency and scalability. To address these problems, we implement MITMEM, which provides JVM support for cutting millisecond-level tail latencies induced by GC. MITMEM is a drop-in JVM replacement that requires no configuration and can run off-the-shelf Java applications with minimal modification (e.g., adding 120 LOC to integrate with Cassandra). Our experiments indicate that MITMEM-powered Cassandra reduces tail latency by up to 99%.
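
To make the GC overhead described above concrete, the following minimal sketch reads the accumulated collection counts and times that a stock JVM exposes through the standard java.lang.management API. It is not part of MITMEM and does not use any MITMEM interface; the class name GcPauseProbe and its helper methods are hypothetical names chosen for illustration. Note that getCollectionTime() reports approximate accumulated collection time, which for concurrent collectors is not the same as STW pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical illustration class; not part of MITMEM.
public class GcPauseProbe {

    /** Sum of accumulated collection time (ms) over all collectors the JVM reports. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Sum of collection counts over all collectors the JVM reports. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    /** Allocates short-lived 1 MiB arrays to create garbage; returns the number of rounds touched. */
    static long churn(int rounds) {
        long touched = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] block = new byte[1 << 20];
            block[i % block.length] = 1;      // write so the allocation is actually used
            touched += block[i % block.length];
        }
        return touched;
    }

    public static void main(String[] args) {
        long countBefore = totalGcCount();
        long millisBefore = totalGcMillis();

        long touched = churn(2_000);

        System.out.println("allocation rounds: " + touched);
        System.out.println("collections during run: " + (totalGcCount() - countBefore));
        System.out.println("GC time during run (ms): " + (totalGcMillis() - millisBefore));
    }
}
```

The reported numbers depend on the collector, heap size, and JVM version, so no particular output should be expected. Running the same probe with a small heap (for example with the standard -Xmx option) typically makes collections more frequent and easier to observe.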

Original Document

The original document is available in PDF (uploaded 8 December, 2020 by Haryadi Gunawi).