TR-2019-15
A Simple Cache Coherence Scheme for Integrated CPU-GPU Systems
Ardhi W. B. Yudha; Reza Pulungan; Henry Hoffmann. 30 July, 2019.
Communicated by Henry Hoffmann.
Abstract
This paper presents a novel approach to accelerate applications
running on integrated CPU-GPU systems. Many integrated
CPU-GPU systems use cache-coherent shared memory
to communicate efficiently and easily. In this pull-based approach,
the CPU produces data for the GPU to consume; the data resides
in a shared cache until the GPU accesses it, resulting in long
load latency on the GPU's first access to a cache line.
In this work, we propose a new, push-based coherence mechanism
that explicitly exploits the CPU and GPU's producer-consumer
relationship by automatically moving data from the CPU to the
GPU's last-level cache. The proposed mechanism generally results
in a dramatic reduction of the GPU's L2 cache miss rate and a
consequent increase in overall performance.
Our experiments show that the proposed scheme can increase
performance by up to 37%, with typical improvements in the
5–7% range. We find that even if an application does not
benefit from the proposed approach, performance is never
worse under this model. While we demonstrate how the proposed
scheme can co-exist with traditional cache-coherence
mechanisms, we also argue that it could serve as a simpler
replacement for existing protocols.
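To make the contrast concrete, below is a minimal, hypothetical sketch (not taken from the paper) of the baseline pull-based flow versus a push-based flow for a single cache line. The types and names (CacheLine, GpuL2, SharedLLC, cpu_write) are illustrative assumptions, not the authors' implementation.

```cpp
// Minimal sketch: in the pull-based baseline, CPU-produced data sits in the
// shared last-level cache until the GPU misses and fetches it; in the
// push-based scheme, the line is forwarded to the GPU's L2 at production time.
#include <cstdint>
#include <unordered_map>
#include <vector>
#include <iostream>

struct CacheLine {
    uint64_t addr = 0;
    std::vector<uint8_t> data;
};

struct GpuL2 {
    std::unordered_map<uint64_t, CacheLine> lines;
    bool lookup(uint64_t addr) const { return lines.count(addr) != 0; }
    void fill(const CacheLine& l) { lines[l.addr] = l; }
};

struct SharedLLC {
    std::unordered_map<uint64_t, CacheLine> lines;
    GpuL2* gpu_l2 = nullptr;  // wired up when the push-based scheme is enabled

    void cpu_write(const CacheLine& l, bool push_enabled) {
        lines[l.addr] = l;  // baseline: data waits here for the GPU to pull it
        if (push_enabled && gpu_l2) {
            gpu_l2->fill(l);  // push: forward produced data to the GPU's L2
        }
    }
};

int main() {
    GpuL2 l2;
    SharedLLC llc;
    llc.gpu_l2 = &l2;

    CacheLine line{0x1000, std::vector<uint8_t>(64, 0xAB)};
    llc.cpu_write(line, /*push_enabled=*/true);

    // With the push enabled, the GPU's first access to 0x1000 hits in its L2
    // instead of paying the long-latency miss described in the abstract.
    std::cout << (l2.lookup(0x1000) ? "GPU L2 hit" : "GPU L2 miss") << "\n";
    return 0;
}
```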
Original Document
The original document is available in PDF (uploaded 30 July, 2019 by
Henry Hoffmann).