TR-2014-05

Global View Resilience, API Documentation R0.8.1-rc0

Andrew A. Chien; The GVR Team. 28 April, 2014.
Communicated by Andrew Chien.

Abstract

This document describes the application programming interface for the Global View Resilience (GVR) system.

Global View Resilience (GVR) is a new approach that exploits a global view data model (global naming of data, consistency, and distributed layout), adding reliability to globally visible distributed arrays. Key novel features in GVR include: 1) multi-version arrays, with the versioning rate of each array controlled separately by the application (multi-stream), 2) flexible multi-version recovery, and 3) unified error signalling and handling for flexible cross-layer error recovery. With a global versioned array as a portable abstraction, GVR enables application programmers to manage reliability (and its overhead) in a flexible, portable fashion, tapping their deep scientific and application code insights. We will research algorithms and a runtime that map and adapt the application/system's reliability deployment based on application-specified reliability priorities. The unified error handling framework enables application error detection (checking) and recovery routines that handle diverse classes of errors with a single application recovery. This architecture enables applications and systems to work in concert, exploiting semantics (algorithmic or even scientific domain) and key capabilities (e.g., fast error detection in hardware) to dramatically increase the range of errors that can be detected and corrected.
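
To make the multi-version and multi-stream ideas concrete, the following is a minimal single-process sketch in Python. It is not the GVR API: the class VersionedArray, its methods (put, get, snapshot, restore, version_count), and the function run_demo are hypothetical names invented for illustration, and the sketch ignores distribution, consistency, and error signalling entirely. It only shows two arrays being versioned at different, application-chosen rates, and one of them being rolled back after a simulated error.

```python
from typing import List, Tuple


class VersionedArray:
    """Toy in-memory stand-in for a multi-version array (hypothetical, not GVR)."""

    def __init__(self, size: int) -> None:
        self._current: List[float] = [0.0] * size
        self._snapshots: List[List[float]] = []  # one preserved copy per version

    def put(self, index: int, value: float) -> None:
        self._current[index] = value

    def get(self, index: int) -> float:
        return self._current[index]

    def snapshot(self) -> int:
        """Preserve the current contents and return the new version number."""
        self._snapshots.append(list(self._current))
        return len(self._snapshots) - 1

    def restore(self, version: int) -> None:
        """Replace the current contents with a previously preserved version."""
        self._current = list(self._snapshots[version])

    def version_count(self) -> int:
        return len(self._snapshots)


def run_demo() -> Tuple[int, int, float]:
    grid = VersionedArray(4)      # large data: versioned every third step
    progress = VersionedArray(1)  # small, critical data: versioned every step

    for step in range(1, 7):
        grid.put(step % 4, float(step))
        progress.put(0, float(step))
        progress.snapshot()
        if step % 3 == 0:
            grid.snapshot()

    # Simulate a detected error in the grid, then recover from its newest version.
    grid.put(0, float("nan"))
    grid.restore(grid.version_count() - 1)

    return grid.version_count(), progress.version_count(), grid.get(0)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the application, not the runtime, decides how often each array is versioned, which is the trade-off between reliability and overhead that the abstract describes. Because older versions are kept, calling restore with an earlier version number would roll back further, a simple analogue of flexible multi-version recovery.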

Original Document

The original document is available in PDF (uploaded 28 April, 2014 by Andrew Chien).