Parallel Matrix Multiplication On a Cluster of PCs

Phairoj Samutrak*
Jureeporn Boonniyom
Jeeraporn Srisawat

Abstract

This paper presents an implementation of coarse-grained parallel matrix multiplication (C = A × B) with two data-partitioning schemes on a cluster of PCs. Most existing studies have proposed medium-grained parallel matrix multiplications on hypercube-connected or mesh-connected parallel computers. We study and implement a practical parallel matrix multiplication based on the MPMD model on a cluster of PCs using the MPI (Message Passing Interface) standard. In particular, two data-partitioning schemes for decomposing matrix A and matrix B with a balanced workload are presented: 1) row-block partitioning and 2) checkerboard-block partitioning. Moreover, we introduce a modified parallel matrix multiplication that covers an approach to parallel all-pairs shortest paths. Finally, the performance of sequential and parallel matrix multiplication is compared and evaluated in terms of response time, speedup, and efficiency. In our experiments, the performance of the matrix multiplication improved by up to 50% each time the number of processors (p) was increased by one.
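To make the row-block scheme concrete, the following sketch shows a minimal MPI matrix multiplication in C: the root process scatters row blocks of A, broadcasts all of B, every process multiplies its block locally, and the row blocks of C are gathered back and timed with MPI_Wtime. This is an illustrative sketch only; the matrix order N, the divisibility assumption N mod p = 0, the build/run commands, and the simplified SPMD structure are assumptions made here, not details of the authors' MPMD implementation or test configuration.

/* Minimal sketch of row-block parallel matrix multiplication (C = A x B)
 * with MPI in C. Illustrative only; N and the assumption that N is
 * divisible by the number of processes p are chosen to keep it short.
 * Build: mpicc rowblock.c -o rowblock    Run: mpirun -np 4 ./rowblock
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 512                      /* matrix order (assumed divisible by p) */

int main(int argc, char **argv)
{
    int rank, p;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int rows = N / p;              /* rows of A (and C) owned by each process */
    double *A = NULL, *C = NULL;
    double *B     = malloc((size_t)N * N * sizeof(double));
    double *A_loc = malloc((size_t)rows * N * sizeof(double));
    double *C_loc = malloc((size_t)rows * N * sizeof(double));

    if (rank == 0) {               /* root holds the full matrices */
        A = malloc((size_t)N * N * sizeof(double));
        C = malloc((size_t)N * N * sizeof(double));
        for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }

    double t0 = MPI_Wtime();

    /* Row-block partitioning: each process receives a contiguous block of
       N/p rows of A and a full copy of B. */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, A_loc, rows * N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Local computation: each process produces its own row block of C. */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A_loc[i * N + k] * B[k * N + j];
            C_loc[i * N + j] = sum;
        }

    /* Collect the row blocks of C back on the root process. */
    MPI_Gather(C_loc, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    double t1 = MPI_Wtime();
    if (rank == 0) {
        /* Response time for p processes; speedup S(p) = T(1)/T(p) and
           efficiency E(p) = S(p)/p follow from these measurements. */
        printf("p = %d  response time = %.3f s\n", p, t1 - t0);
    }

    free(B); free(A_loc); free(C_loc);
    if (rank == 0) { free(A); free(C); }
    MPI_Finalize();
    return 0;
}

The all-pairs shortest-paths variant mentioned in the abstract corresponds, in the standard formulation, to replacing the multiply-and-add inner step with add-and-minimum over a distance matrix (the min-plus product); whether the paper follows exactly this formulation is not stated in the abstract.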


Keywords: parallel matrix multiplication, data-block partitioning, parallel all-pairs shortest paths, cluster of PCs, MPMD (Multiple Program Multiple Data), MPI (Message Passing Interface)


*Corresponding author. E-mail: s6063611@kmitl.ac.th

Article Details

Section
Original Research Articles
