Systematic Evaluation of Workload Clustering for Designing 10x10 Architectures

Apala Guha; Andrew Chien. 29 June, 2012.
Communicated by Andrew Chien.


Chip power consumption has reached its limits, leading to the flattening of microprocessor frequency and single-threaded performance. We propose 10x10, a federated heterogeneous architecture, to continue performance scaling by specializing accelerator cores for different workload groups to achieve dramatically higher energy efficiency. The selection and design of these accelerators depend on effective clustering of computation structure; we develop a set of clustering methods and evaluation metrics in a systematic framework that enables disciplined study.

Using the clustering methods, we study a broad general-purpose workload that includes 34 codes from 6 benchmark suites. We identify the computationally important functions and cluster them based on two sets of instruction usage features (high-resolution and low-resolution), targeting a range of cluster counts (8, 16, 32, 64, and 128). The workload clusters are evaluated abstractly with five metrics (coverage, distance, standard deviation, customization benefit, and weighted customization benefit). The latter two use instruction set usage as a proxy for customization opportunity, together with four benefit models for customization.
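As a rough illustration of how a clustering of functions might be scored, the sketch below computes two of the named metrics under assumed definitions. This abstract does not define the metrics, so everything here is hypothetical: "coverage" is taken to be the share of execution weight captured by the heaviest clusters that fit a given budget, and "distance" is taken to be the mean Euclidean distance from each function's instruction-usage vector to its cluster centroid. The function names, the feature vectors, and the `budget` parameter are illustrative only and are not taken from the report.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def evaluate(features, weights, labels, budget):
    """Score a clustering under ASSUMED metric definitions.

    features: per-function instruction-usage vectors (hypothetical data)
    weights:  per-function share of execution time
    labels:   cluster id assigned to each function
    budget:   number of clusters that can be implemented (assumption)
    """
    members = defaultdict(list)
    for i, lab in enumerate(labels):
        members[lab].append(i)

    stats = {}
    for lab, idx in members.items():
        c = centroid([features[i] for i in idx])
        dists = [math.dist(features[i], c) for i in idx]
        stats[lab] = {
            "weight": sum(weights[i] for i in idx),
            "mean_distance": sum(dists) / len(dists),
        }

    # Assumed coverage: weight captured by the `budget` heaviest clusters.
    top = sorted(stats, key=lambda lab: stats[lab]["weight"], reverse=True)[:budget]
    coverage = sum(stats[lab]["weight"] for lab in top) / sum(weights)
    return coverage, stats


# Four hypothetical functions described by a two-feature instruction mix.
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
weights = [0.4, 0.3, 0.2, 0.1]
labels = [0, 0, 1, 1]
coverage, stats = evaluate(features, weights, labels, budget=1)
```

Varying `budget` in such a sketch shows how the preferred clustering can change with the number of clusters that can be built, which is one way to read the dependence on available silicon resources.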

These studies produced novel clusterings of computation structure created by aggressive separation (to 100s of clusters) that exposes new opportunities for heterogeneous customization and correspondingly higher potential benefits. Further, the studies show that no single clustering method is best in all scenarios. For example, the best clustering may vary with available silicon resources. Our experience validates the need for systematic clustering and disciplined use of metrics.

Original Document

The original document is available in Color PDF (uploaded 29 June, 2012 by Andrew Chien).