IBL for Replica Selection in Data-Intensive Grid Applications

Yu Hu; Jennifer M. Schopf. 7 April, 2004.
Communicated by Ian Foster.


In many scientific applications, Grid technologies and infrastructures facilitate distributed resource sharing and coordination in dynamic, heterogeneous, multi-institutional environments. Replication of data can help enable high-throughput file transfer and scalable storage in scientific Grid applications that involve large data transfers. The choice of replica can, however, significantly influence the efficiency of a replication scheme. Many current approaches assume that substantial supporting data is available, such as network status information, log files of historical GridFTP file transfers, and CPU status and predictions. We propose a lightweight instance-based learning (IBL) algorithm that allows efficient replica selection while requiring much less data. We implement the approach and evaluate it in a Grid environment. Our evaluation demonstrates that the IBL approach can be an efficient tool for replica selection when only limited data sources are available.
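To make the idea of instance-based replica selection concrete, the following is a minimal sketch of one possible nearest-neighbor scheme; it is not the algorithm evaluated in this work. It assumes that the only data available is a local record of past transfers (site, file size, elapsed time). The names `Transfer`, `predict_seconds`, `select_replica`, and the parameter `k` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Transfer:
    """One stored instance: a past transfer observed by the client (assumed schema)."""
    site: str        # replica site the file was fetched from
    size_mb: float   # file size in megabytes
    seconds: float   # observed transfer time in seconds


def predict_seconds(history, site, size_mb, k=3):
    """Predict transfer time as the mean of the k past transfers from
    `site` whose file sizes are closest to `size_mb`."""
    past = [t for t in history if t.site == site]
    if not past:
        return inf  # no instances for this site, so treat it as least preferred
    nearest = sorted(past, key=lambda t: abs(t.size_mb - size_mb))[:k]
    return sum(t.seconds for t in nearest) / len(nearest)


def select_replica(history, sites, size_mb, k=3):
    """Return the candidate site with the lowest predicted transfer time."""
    return min(sites, key=lambda s: predict_seconds(history, s, size_mb, k))


history = [
    Transfer("a", 100, 10), Transfer("a", 110, 12), Transfer("a", 1000, 100),
    Transfer("b", 100, 30), Transfer("b", 105, 28),
]
best = select_replica(history, ["a", "b"], size_mb=100, k=2)
```

In this sketch, a site with no recorded transfers is never preferred over a site with history; a practical scheme would likely need some way to probe unseen sites, which is omitted here.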

