IBL for Replica Selection in Data-Intensive Grid Applications

Yu Hu; Jennifer M. Schopf. 7 April, 2004.
Communicated by Ian Foster.


In many scientific applications, Grid technologies and infrastructures facilitate distributed resource sharing and coordination in dynamic, heterogeneous multi-institutional environments. Replication of data can help enable high-throughput file transfer and scalable resource storage in scientific Grid applications that involve large data transfers. The choice of replica can, however, significantly influence the efficiency of a replication scheme. Many current approaches assume that a large amount of supporting data is available, such as network status information, log files of historical GridFTP file transfers, and CPU status and predictions. We propose a lightweight instance-based learning (IBL) algorithm that allows efficient replica selection while requiring much less data. We implement the approach and evaluate it in a Grid environment. Our evaluation demonstrates that the IBL approach can be an efficient tool for replica selection when only limited data sources are available.
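As a rough illustration of the general idea of instance-based replica selection, the following Python sketch predicts a transfer time for each candidate site from the most similar past transfers and picks the site with the lowest prediction. It is not the authors' algorithm: the abstract does not specify the features, distance measure, or prediction rule, so the `Transfer` record layout, the use of file size as the only similarity feature, the averaging over `k` nearest instances, and all function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One previously observed transfer (hypothetical record layout)."""
    site: str        # replica site the file was fetched from
    size_mb: float   # file size in megabytes
    seconds: float   # observed transfer time in seconds


def predict_seconds(history, site, size_mb, k=3):
    """Predict transfer time from `site` as the mean time of the k past
    transfers from that site whose file sizes are closest to `size_mb`.
    Returns None when there is no history for the site."""
    same_site = [t for t in history if t.site == site]
    if not same_site:
        return None
    # Assumed similarity measure: absolute difference in file size.
    nearest = sorted(same_site, key=lambda t: abs(t.size_mb - size_mb))[:k]
    return sum(t.seconds for t in nearest) / len(nearest)


def select_replica(history, sites, size_mb, k=3):
    """Return the candidate site with the lowest predicted transfer time.
    Sites without history are skipped; returns None if none can be predicted."""
    predictions = {}
    for site in sites:
        predicted = predict_seconds(history, site, size_mb, k)
        if predicted is not None:
            predictions[site] = predicted
    if not predictions:
        return None
    return min(predictions, key=predictions.get)


# Made-up example data, for illustration only.
history = [
    Transfer("site_a", 100.0, 10.0),
    Transfer("site_a", 110.0, 12.0),
    Transfer("site_a", 1000.0, 100.0),
    Transfer("site_b", 100.0, 20.0),
    Transfer("site_b", 90.0, 22.0),
]
best = select_replica(history, ["site_a", "site_b", "site_c"], 100.0, k=2)
```

The sketch needs only a short list of past observations per site, which matches the abstract's emphasis on working with limited data sources; a real deployment would have to decide how to treat sites with no history and how to age out stale observations.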

Original Document

The original document is available in PDF (uploaded 7 April, 2004 by Ian Foster).

Additional Document Formats

The document is also available in Postscript (uploaded 7 April, 2004 by Ian Foster).

NOTE: The author warrants that these additional documents are identical with the original to the extent permitted by the translation between the various formats. However, the webmaster has made no effort to verify this claim. If the authenticity of the document is an issue, please always refer to the "Original document." If you find significant alterations, please report to