Log-Structured Global Array for Efficient Multi-Version Snapshots

Hajime Fujita; Nan Dun; Zachary Rubenstein; Andrew A. Chien. 16 May, 2014.
Communicated by Andrew Chien.


In exascale systems, increasing error rates, particularly from silent data corruption, are a major concern. The Global View Resilience (GVR) system builds a new model of application resilience on versioned arrays. These arrays can be exploited for flexible, application-specific error checking and recovery. We explore a fundamental challenge to the GVR model: the cost of versioning. We propose a novel log-structured implementation that appends new data to an update log, simultaneously tracking modified regions and versioning incrementally. We compare the performance of log-structured arrays to that of traditional flat arrays in two environments, message passing and RMA (remote memory access), using micro-benchmarks and several full applications. We show that versioning can be 10x faster and that memory usage can be significantly reduced. Further, in future systems with NVRAM, a log-structured approach is more tolerant of NVRAM limitations such as limited write bandwidth and wear-out.
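The abstract describes the log-structured design only at a high level. The following is a minimal Python sketch, not the GVR implementation, that illustrates the idea under stated assumptions. The array is split into fixed-size blocks, and an in-memory list stands in for the update log. The class and method names (`LogStructuredArray`, `write_block`, `commit_version`, `read_version`) are hypothetical. Because writes never overwrite data in place, committing a version only has to record the log offsets of blocks modified since the previous version. A flat-array snapshot, by contrast, would copy the whole array.

```python
from typing import Dict, List, Set


class LogStructuredArray:
    """Hypothetical sketch of a versioned array backed by an append-only log.

    Assumptions (not taken from GVR): data is split into fixed-size blocks,
    and each version records only the blocks that changed since the last one.
    Garbage collection of superseded log entries is omitted.
    """

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        # Append-only log of block payloads; the initial zeroed blocks go first.
        self.log: List[List[float]] = [[0.0] * block_size for _ in range(num_blocks)]
        # Current mapping from block index to position in the log.
        self.current: List[int] = list(range(num_blocks))
        # Mapping that holds before any version has been committed.
        self.base: List[int] = list(self.current)
        # Blocks modified since the last commit (the tracked modified regions).
        self.dirty: Set[int] = set()
        # One delta per version: block index -> log offset, for dirty blocks only.
        self.deltas: List[Dict[int, int]] = []

    def write_block(self, block: int, data: List[float]) -> None:
        if len(data) != self.block_size:
            raise ValueError("data length must equal block_size")
        # Never overwrite in place: append the new data and redirect the index.
        self.log.append(list(data))
        self.current[block] = len(self.log) - 1
        self.dirty.add(block)

    def read_block(self, block: int) -> List[float]:
        # Return a copy so callers cannot mutate logged (versioned) data.
        return list(self.log[self.current[block]])

    def commit_version(self) -> int:
        # Incremental versioning: cost is proportional to the dirty blocks,
        # not to the size of the whole array.
        self.deltas.append({b: self.current[b] for b in self.dirty})
        self.dirty.clear()
        return len(self.deltas) - 1

    def read_version(self, version: int, block: int) -> List[float]:
        # Walk back from the requested version to the newest entry for block.
        for v in range(version, -1, -1):
            if block in self.deltas[v]:
                return list(self.log[self.deltas[v][block]])
        return list(self.log[self.base[block]])
```

For example, writing block 1, committing version 0, rewriting block 1, and committing version 1 leaves both payloads in the log. `read_version(0, 1)` returns the older data and `read_version(1, 1)` returns the newer data. Each commit stored only a one-entry delta.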

Original Document

The original document is available in PDF (uploaded 16 May, 2014 by Andrew Chien).