Fan Yang. 4 October, 2015.
Communicated by Andrew Chien.
Related To: TR-2015-03 (updated 10/04/15)


Graph processing is widely recognized as important for a growing range of applications, including social network analysis, machine learning, data mining, and web search. Recently, many performance assessments and comparative studies of graph computation have been published, all of which employ highly varied ensembles of algorithms and graphs. To explore the robustness of these studies, we characterize how behavior varies across a variety of graph algorithms (graph analytics, clustering, collaborative filtering, etc.) on a diverse collection of graphs (varying in size and degree distribution). Our results show that graph computation behaviors form a very broad space, with up to 1000-fold variation, and that inefficient exploration of this space may lead to a narrow understanding of graph processing performance or, worse, to misleading conclusions.

Hence, we consider how to construct a high-quality benchmark set that employs as few experiments as possible while exhibiting a wide range of graph computation behavior. We study different ensembles of graph-algorithm pairs and define two metrics, spread and coverage, to quantify how efficiently and completely an ensemble explores the space. Our results show that: (1) an ensemble limited to a single algorithm or a single graph may unfairly characterize a graph-processing system, (2) an ensemble exploring both algorithm diversity and graph diversity improves benchmark quality significantly (30% better coverage and 200% better spread), but must be carefully chosen, (3) some specific algorithms are more useful than others for exploring the space, and (4) it is possible to reduce benchmarking complexity (i.e., the number of algorithms, graphs, etc.) while preserving benchmarking quality.
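To illustrate the kind of metrics described above, the sketch below computes a coverage and a spread value for an ensemble of graph-algorithm pairs, each summarized as a point in a normalized behavior space. This is only a hedged illustration: the report's precise definitions of spread and coverage are not given in this abstract, so the grid-occupancy and mean-pairwise-distance formulations here are assumptions, as are all names and numbers.

```python
import itertools
import math

# Assumption: each graph-algorithm pair is summarized as a point in a
# normalized behavior space (all coordinates in [0, 1]). The metric
# definitions below are illustrative, not the report's exact ones.

def coverage(points, bins=4):
    """Fraction of grid cells in the behavior space occupied by the ensemble."""
    dims = len(points[0])
    cells = {tuple(min(int(x * bins), bins - 1) for x in p) for p in points}
    return len(cells) / bins ** dims

def spread(points):
    """Mean pairwise Euclidean distance between the ensemble's behavior points."""
    pairs = list(itertools.combinations(points, 2))
    if not pairs:
        return 0.0
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical ensemble: four graph-algorithm pairs, behavior summarized in 2-D
ensemble = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.9), (0.8, 0.1)]
print(coverage(ensemble))  # 0.25 (4 of 16 cells occupied)
print(spread(ensemble))
```

Under these assumed definitions, adding diverse pairs raises both metrics, while duplicating similar pairs raises neither, which matches the intuition that diversity, not sheer experiment count, determines benchmark quality.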

Original Document

The original document is available in PDF (uploaded 4 October, 2015 by Andrew Chien).