Parallel Global and Local Convergent Algorithms for Solving the Inverse Additive Singular Value Problem


4 Parallelizing MI and EP algorithms


We have implemented parallel versions of the MI and EP algorithms for solving the IASVP using ScaLAPACK routines, together with other routines implemented directly by the authors on top of PBLAS and BLACS. We assume a distributed memory machine consisting of Prc processors, each with its own local memory, connected through an interconnection network characterized by two parameters: the word-sending time and the latency of the network. No assumption is made about the physical topology of the interconnection network.
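To illustrate this machine model, the following C fragment evaluates the usual linear cost of sending a message of n words; the names tau (word-sending time) and beta (latency) and the numerical values are placeholders introduced for the example, not parameters taken from our experiments.

#include <stdio.h>

/* Linear communication model: sending n words costs latency + n * word-sending time.
 * The parameter values below are arbitrary placeholders, not measurements.          */
static double message_time(double n_words, double tau, double beta)
{
    return beta + n_words * tau;
}

int main(void)
{
    double tau  = 1.0e-8;   /* hypothetical word-sending time (s) */
    double beta = 5.0e-6;   /* hypothetical network latency (s)   */
    /* e.g., sending a vector of length 1000 */
    printf("estimated message time: %e s\n", message_time(1000.0, tau, beta));
    return 0;
}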

The ScaLAPACK library partitions matrices into blocks and uses a 2-D block-cyclic data distribution, mapping the blocks onto a logical mesh of processors. We work with Prc = Pr × Pc processors arranged in a logical mesh with Pr processor rows and Pc processor columns, and we assume that the matrices and some of the vectors of MI and EP are partitioned into blocks and distributed cyclically among the processors of the mesh, following the ScaLAPACK scheme in order to obtain a good load balance.
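To make the 2-D block-cyclic distribution concrete, the short C sketch below computes, for a single global matrix entry, which processor of the Pr × Pc mesh owns it and at which local position it is stored (the same computation performed by the ScaLAPACK tool routines INDXG2P and INDXG2L); the block size and mesh dimensions are arbitrary example values.

#include <stdio.h>

/* Owning process coordinate for a 0-based global index, block size nb and
 * p processes in that dimension (first block assigned to process 0).      */
static int owner(int iglobal, int nb, int p)
{
    return (iglobal / nb) % p;
}

/* 0-based local index of that entry inside the owning process. */
static int local_index(int iglobal, int nb, int p)
{
    return ((iglobal / nb) / p) * nb + (iglobal % nb);
}

int main(void)
{
    int nb = 2, pr = 2, pc = 3;   /* block size and Pr x Pc mesh (example values) */
    int i = 7, j = 10;            /* a global entry (i, j), 0-based               */

    printf("entry (%d,%d) lives on process (%d,%d), at local position (%d,%d)\n",
           i, j, owner(i, nb, pr), owner(j, nb, pc),
           local_index(i, nb, pr), local_index(j, nb, pc));
    return 0;
}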

We summarize the data distribution on the parallel machine for MI and EP (a short sketch of how these local sizes and the corresponding ScaLAPACK descriptors could be set up follows the list). Data common to both methods:

Global m×n matrices A0, A1, ..., Al and P(k) are stored locally in blocks of size m/Pr × n/Pc (note that only the first n columns of P(k) are needed);

Global n×n matrix Q(k) is stored locally in blocks of size n/Pr × n/Pc;

Global vector c(k), of size l, is stored replicated on every processor;

Global vectors S(c(k)) and S*, of size n each, are stored replicated on every processor.

For MI:

Global n×l matrix J(k) is stored locally in blocks of size n/Pr × l/Pc;

Global vector b(k), of size n, is stored locally in blocks of size n/Pr.

For EP:

Global l×l matrix T is stored locally in blocks of size l/Pr × l/Pc;

Global vector d(k), of size l, is stored locally in blocks of size l/Pr.
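As a minimal sketch of how these local sizes could be obtained in practice, the fragment below uses the BLACS grid routines together with numroc_ and descinit_ of ScaLAPACK to compute the local block of the n×n matrix Q(k) and to build its array descriptor; the grid shape, problem size and block size are assumptions chosen only for the example, not those of our implementation.

#include <stdio.h>

extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int ictxt, int what, int *val);
extern void Cblacs_gridinit(int *ictxt, char *layout, int nprow, int npcol);
extern void Cblacs_gridinfo(int ictxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int ictxt);
extern void Cblacs_exit(int status);
extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb,
                      int *irsrc, int *icsrc, int *ictxt, int *lld, int *info);

int main(void)
{
    int mypnum, nprocs, ictxt, myrow, mycol, info;
    int nprow = 2, npcol = 2;      /* Pr = Pc = 2, example values     */
    int n = 1000, nb = 64;         /* problem and block size, assumed */
    int izero = 0;
    int descQ[9];

    Cblacs_pinfo(&mypnum, &nprocs);
    Cblacs_get(-1, 0, &ictxt);
    Cblacs_gridinit(&ictxt, "Row-major", nprow, npcol);
    Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);

    /* Local number of rows/columns of the n x n matrix Q(k) on this processor */
    int locr = numroc_(&n, &nb, &myrow, &izero, &nprow);
    int locc = numroc_(&n, &nb, &mycol, &izero, &npcol);
    int lld  = (locr > 1) ? locr : 1;

    /* ScaLAPACK array descriptor for Q(k) */
    descinit_(descQ, &n, &n, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);

    printf("process (%d,%d): local block of Q is %d x %d\n", myrow, mycol, locr, locc);

    Cblacs_gridexit(ictxt);
    Cblacs_exit(0);
    return 0;
}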

4.1 Parallel MI algorithm

Each processor executes the following parallel MI algorithm on the portion of the data it holds; a minimal C illustration of the broadcast in step 4.4 is sketched after the listing.


Parallel MI algorithm for IASVP
1. A(0) = A0 + c1(0)A1 + ... + cl(0)Al

(* using daxpy of BLAS *)



2. [P(0),S(0),Q(0)] = svd(A(0))

(* using pdgesvd of ScaLAPACK *)

3. error = ||S(0) - S*||2

(* using daxpy, dnrm2 of BLAS *)

4. For k = 0,1,..., While error > tol

4.1 Compute J(k)

(* using ddot of BLAS, pdgemm of PBLAS, dgsum2d, dgesd2d, dgerv2d of BLACS *)

4.2 Compute b(k)

(* using ddot of BLAS, pdgemm of PBLAS, dgsum2d, dgesd2d, dgerv2d of BLACS *)

4.3 Compute c(k+1) solving J(k)c(k+1) = b(k)

(* using pdgetrf,pdgetrs of ScaLAPACK *)

4.4 Broadcast c(k+1) to all processors

(* using dgebs2d, dgebr2d of BLACS *)

4.5 A(k+1) = A0 + c1(k+1)A1 + ... + cl(k+1)Al

(* using daxpy of BLAS *)

4.6 [P(k+1),S(k+1),Q(k+1)] = svd(A(k+1))

(* using pdgesvd of ScaLAPACK *)

4.7 error = ||S(k+1) - S*||2

(* using daxpy, dnrm2 of BLAS *)
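As a minimal, self-contained illustration of step 4.4 only, the C fragment below broadcasts a vector c(k+1) from the processor in position (0,0) of the mesh to all the others using dgebs2d/dgebr2d (through their C wrappers Cdgebs2d/Cdgebr2d); the mesh shape, the vector length and the assumption that (0,0) already holds the solution are simplifications made for the example.

#include <stdio.h>

extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int ictxt, int what, int *val);
extern void Cblacs_gridinit(int *ictxt, char *layout, int nprow, int npcol);
extern void Cblacs_gridinfo(int ictxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cdgebs2d(int ictxt, char *scope, char *top, int m, int n, double *A, int lda);
extern void Cdgebr2d(int ictxt, char *scope, char *top, int m, int n, double *A, int lda,
                     int rsrc, int csrc);
extern void Cblacs_gridexit(int ictxt);
extern void Cblacs_exit(int status);

#define LEN 8   /* length l of c(k+1); illustrative value */

int main(void)
{
    int mypnum, nprocs, ictxt, myrow, mycol, i;
    int nprow = 2, npcol = 2;   /* Pr = Pc = 2, example values */
    double c[LEN];

    Cblacs_pinfo(&mypnum, &nprocs);
    Cblacs_get(-1, 0, &ictxt);
    Cblacs_gridinit(&ictxt, "Row-major", nprow, npcol);
    Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);

    if (myrow == 0 && mycol == 0) {
        for (i = 0; i < LEN; i++) c[i] = (double)(i + 1);   /* dummy c(k+1)      */
        Cdgebs2d(ictxt, "All", " ", LEN, 1, c, LEN);        /* broadcast send    */
    } else {
        Cdgebr2d(ictxt, "All", " ", LEN, 1, c, LEN, 0, 0);  /* broadcast receive */
    }

    printf("process (%d,%d) now holds a replicated copy of c(k+1), c[0] = %g\n",
           myrow, mycol, c[0]);

    Cblacs_gridexit(ictxt);
    Cblacs_exit(0);
    return 0;
}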
The theoretical computational cost of this parallel algorithm can be approximated as TPAR = TA + TC, where TA represents the arithmetic time, which can be expressed as

TA = O() + k O() flops,

and TC represents the communication time, which can be expressed as

TC = O(17m + ) + k O(m²).

In these expressions it has been assumed that m = n = l.
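As a small numerical illustration of how the model TPAR = TA + TC would be used, the fragment below evaluates the parallel time and the corresponding speedup and efficiency; all timing values and the processor count are hypothetical placeholders, not measurements reported in this work.

#include <stdio.h>

int main(void)
{
    /* Hypothetical example values, in seconds (not measured results). */
    double t_seq = 120.0;   /* sequential execution time      */
    double t_a   = 14.0;    /* parallel arithmetic time TA    */
    double t_c   = 4.0;     /* parallel communication time TC */
    int    prc   = 16;      /* number of processors Prc       */

    double t_par      = t_a + t_c;   /* TPAR = TA + TC */
    double speedup    = t_seq / t_par;
    double efficiency = speedup / prc;

    printf("TPAR = %.2f s, speedup = %.2f, efficiency = %.2f\n",
           t_par, speedup, efficiency);
    return 0;
}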


