4.2 Parallel LP algorithm
Each processor executes the following parallel LP algorithm on the portion of the data it holds.
Parallel LP algorithm for the IASVP
1. T = [ trace(A_i^t A_r) ], r, i = 1, ..., l        (* ddot of BLAS, MPI_REDUCE of MPI *)
2. [L, U] = lu(T)                                    (* pdgetrf, pdgetrs of ScaLAPACK *)
3. Compute T^-1                                      (* pdgetri of ScaLAPACK *)
4. stop = 0
5. For k = 0, 1, ..., while stop = 0
   5.1 A(k) = A0 + c1(k)A1 + ... + cl(k)Al           (* daxpy of BLAS *)
   5.2 [P(k), S(k), Q(k)] = svd(A(k))                (* pdgesvd of ScaLAPACK *)
   5.3 X(k) = P(k) S* Q(k)^t                         (* dscal of BLAS, pdgemm of PBLAS *)
   5.4 d(k) = [ trace(A_r^t (X(k) - A0)) ], r = 1, ..., l   (* ddot of BLAS, MPI_REDUCE of MPI *)
   5.5 Compute c(k+1) by solving T c(k+1) = d(k)     (* pdgemv of PBLAS *)
   5.6 Broadcast c(k+1) to all processors            (* dgebs2d, dgebr2d of BLACS *)
   5.7 error = || c(k+1) - c(k) ||_2                 (* daxpy, dnrm2 of BLAS *)
   5.8 If error < tol then stop = 1
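As a serial illustration of the steps above, the LP iteration can be sketched in NumPy, replacing the ScaLAPACK/PBLAS kernels with their serial equivalents. This is our sketch, not the paper's FORTRAN90 implementation; it assumes m = n so that P diag(S*) Q^t is well formed, and the function name and defaults are ours:

```python
import numpy as np

def lp_iteration(A0, As, s_star, c0=None, tol=1e-8, max_it=5000):
    """Serial sketch of the LP iteration for the IASVP.

    A0, As : base matrix and list of l basis matrices A1..Al (all m x m)
    s_star : target singular values S*
    Returns the final coefficient vector c and the iteration count.
    """
    l = len(As)
    # Step 1: T[r, i] = trace(A_i^t A_r)  (ddot + MPI_REDUCE in the parallel code)
    T = np.array([[np.sum(Ai * Ar) for Ai in As] for Ar in As])
    c = np.zeros(l) if c0 is None else np.asarray(c0, dtype=float)
    for k in range(max_it):
        # Step 5.1: A(k) = A0 + c1 A1 + ... + cl Al  (daxpy)
        Ak = A0 + sum(ci * Ai for ci, Ai in zip(c, As))
        # Step 5.2: SVD of A(k)  (pdgesvd)
        P, _, Qt = np.linalg.svd(Ak)
        # Step 5.3: X(k) = P diag(S*) Q^t  (dscal + pdgemm)
        Xk = P @ np.diag(s_star) @ Qt
        # Step 5.4: d[r] = trace(A_r^t (X(k) - A0))  (ddot + MPI_REDUCE)
        d = np.array([np.sum(Ar * (Xk - A0)) for Ar in As])
        # Step 5.5: solve T c(k+1) = d(k)
        c_new = np.linalg.solve(T, d)
        # Steps 5.7-5.8: stop when ||c(k+1) - c(k)||_2 < tol
        if np.linalg.norm(c_new - c) < tol:
            return c_new, k + 1
        c = c_new
    return c, max_it
```

Note that a solution c* of the IASVP is a fixed point of this map: if svd(A(c*)) already has singular values S*, then X(k) = A(c*) and the linear solve returns c* again.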
The theoretical cost splits into a setup term and a per-iteration term: an arithmetic cost T_A = O(.) + k O(.) flops and a communication cost T_C = O(.) + k O(.), where k is the number of iterations and both costs depend on the problem size m and the number of processors Prc.
4.3 Experimental results
We present some results from the numerical experiments carried out with the two methods proposed in this work for solving the IASVP. The algorithms have been implemented in FORTRAN90 and have been tested and validated on a cluster of six PCs, each with a Pentium Xeon processor at 2 GHz and 1 GB of RAM, connected through an SCI network. For the sake of simplicity we only show a few results; a more detailed analysis and figures can be found in [13].
First, we analyse the convergence of both algorithms. In Table 1 the convergence of MI and LP is tested with different initial points c(0), with m = n = l = 10. MI converges quickly when c(0) is close to c*, and gives good approximations to S*; LP always converges to S*, although it needs a large number of iterations and reaches less precision.
Table 1. Convergence of MI and LP algorithms, with different initial points c(0).

  MI:
  c(0) = c* +    K       ||S(k) - S*||2    Convergence
  1e-3           2       2e-9              yes
  1e-2           3       4e-9              yes
  1e-1           14000   5e2               no
  1e0            14000   3e1               no
  1e1            14000   2e3               no
  1e3            14000   5e0               no

  LP:
  c(0) = c* +    K       ||S(k) - S*||2    Convergence
  1e-3           117     2e-5              yes
  1e-2           395     3e-5              yes
  1e-1           757     8e-5              yes
  1e0            6872    1e-4              yes
  1e1            2234    4e-5              yes
  1e3            2183    4e-5              yes
We can take advantage of the global convergence properties of the LP algorithm to turn MI into a globally convergent algorithm. To this end, we have used as initial guess for the MI algorithm a value c(k) obtained from the LP algorithm at low accuracy. The idea was first proposed by Chen and Chu in [10]; the objective is to find a c(0) for MI close enough to c* to guarantee the quadratic convergence of Newton's method.
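The MI refinement stage of this hybrid strategy can be sketched as a Newton iteration on the residual in the singular values, using the standard first-order perturbation formula dsigma_j/dc_i = u_j^t A_i v_j for the Jacobian. This is our serial sketch, assuming m = n = l and distinct singular values; the function name and stopping rules are illustrative, not the paper's FORTRAN90 code:

```python
import numpy as np

def newton_iasvp(A0, As, s_star, c0, tol=1e-12, max_it=50):
    """Newton-type (MI-style) refinement for the IASVP, started from c0
    (e.g. a low-accuracy LP iterate)."""
    c = np.asarray(c0, dtype=float)
    for k in range(max_it):
        Ak = A0 + sum(ci * Ai for ci, Ai in zip(c, As))
        U, S, Vt = np.linalg.svd(Ak)
        F = S - s_star                 # residual in the singular values
        if np.linalg.norm(F) < tol:
            return c, k
        # First-order perturbation of singular values: J[j, i] = u_j^t A_i v_j
        J = np.array([[U[:, j] @ Ai @ Vt[j, :] for Ai in As]
                      for j in range(len(S))])
        c = c - np.linalg.solve(J, F)  # Newton step
    return c, max_it
```

Started close enough to c*, the iteration exhibits the quadratic convergence that motivates seeding it from an LP iterate.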
In Table 2 we present a typical test showing the convergence of the MI algorithm when c(0) is taken from an LP iterate. The test was carried out on a problem of size m = n = l = 10; similar results have been obtained for other sizes.
Table 2. Convergence of MI taking c(0) = c(k), where c(k) is the LP result.

  LP:
  c(0)           K     ||S(k) - S*||2
  c* + 1e-1      23    1e-2
  c* + 1e1       950   4e-3
  c* + 1e3       899   4e-3

  MI:
  c(0)           K     ||S(k) - S*||2
  c(23) of LP    3     1e-10
  c(950) of LP   3     6e-10
  c(899) of LP   3     6e-10
Table 3. Runtime and speedup of MI.

  Initial part (steps 1, 2, 3):
         Runtime (seconds)                 Speedup
  Prc    m=512  m=768  m=1024  m=1408     m=512  m=768  m=1024  m=1408
  1      14     46     107     312        1      1      1       1
  2      8      26     58      177        1.8    1.8    1.8     1.8
  4      4      187    32      58         3.5    0.2    3.3     5.4
  6      3      7      21      37         4.7    6.6    5.1     8.4

  One step of loop 4 (k = 0, 1, 2, ...):
         Runtime (seconds)                 Speedup
  Prc    m=512  m=768  m=1024  m=1408     m=512  m=768  m=1024  m=1408
  1      355    1741   5458    19075      1      1      1       1
  2      184    893    2784    9656       1.9    1.9    2.0     2.0
  4      90     471    1434    4994       3.9    3.7    3.8     3.8
  6      60     323    1010    3471       5.9    5.4    5.4     5.5
|
One step of loop 4. For k=0,1,2,...
|
Prc
|
Runtime (seconds)
|
Speedup
|
1
|
355
|
1741
|
5458
|
19075
|
1
|
1
|
1
|
1
|
2
|
184
|
893
|
2784
|
9656
|
1.9
|
1.9
|
2.0
|
2.0
|
4
|
90
|
471
|
1434
|
4994
|
3.9
|
3.7
|
3.8
|
3.8
|
6
|
60
|
323
|
1010
|
3471
|
5.9
|
5.4
|
5.4
|
5.5
|
m
|
512
|
768
|
1024
|
1408
|
512
|
768
|
1024
|
1408
|
Now we analyse the performance of the parallel algorithms, using parallel runtime and speedup [8] to assess MI and LP. In both methods the initial part and the iterative part show different parallel behaviour; moreover, the total time depends on the number of iterations, which in turn depends on how far the initial point is from the solution. Thus, Table 3 shows the runtime and speedup of MI separately for both parts, and Table 4 shows the same for LP. Results are reported for different problem sizes, with m = n = l, and different numbers of processors.
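The metrics used here are the usual ones: speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p [8]. A quick illustration, reproducing one entry of Table 4 (initial part of LP, m = 1408, six processors):

```python
def speedup(t_serial, t_parallel):
    """Speedup S_p = T_1 / T_p; dividing by p gives the efficiency E_p."""
    return t_serial / t_parallel

# Initial part of LP, m = 1408 (runtimes taken from Table 4):
s6 = speedup(13672.0, 2376.0)   # about 5.8 on 6 processors
```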
Table 4. Runtime and speedup of LP.

  Initial part (steps 1, 2, 3, 4):
         Runtime (seconds)                 Speedup
  Prc    m=512  m=768  m=1024  m=1408     m=512  m=768  m=1024  m=1408
  1      260    1259   4068    13672      1      1      1       1
  2      128    630    1967    6843       2.0    2.0    2.1     2.0
  4      69     339    1053    3531       3.8    3.7    3.9     3.9
  6      52     225    706     2376       5.0    5.6    5.8     5.8

  One step of loop 5 (k = 0, 1, 2, ...):
         Runtime (seconds)                 Speedup
  Prc    m=512  m=768  m=1024  m=1408     m=512  m=768  m=1024  m=1408
  1      23     79     182     552        1      1      1       1
  2      17     54     126     359        1.4    1.5    1.4     1.5
  4      7      24     65      130        3.3    3.3    2.8     4.2
  6      5      12     45      64         4.6    6.6    4.0     8.6
As can be seen, the parallel performance of the MI and LP algorithms is good, and improves as the problem size increases, alleviating the high computational cost of the sequential algorithms.
5 Conclusions
We have explored two parallel algorithms to solve the IASVP, obtained by adapting some well-known methods for solving the IAEP. The convergence of both algorithms has been studied and contrasted with numerical experiments.
The LP algorithm is a good alternative for overcoming the local convergence of the MI algorithm when it is used to estimate the initial value c(0) for MI.
The algorithms developed have been parallelized with good performance on a cluster of PCs, using standard public-domain software (MPI, BLAS, LAPACK, ScaLAPACK), thus guaranteeing the portability of the code. The performance of both parallel algorithms is fairly good; the speedups reached are close to optimal, especially for large problem sizes.
Acknowledgement
This work has been supported by the Spanish CICYT Grant TIC2003-08238-C02-02.
References:
[1] Anderson, E. et al., LAPACK Users' Guide, SIAM, 1995.
[2] Blackford, L. et al., ScaLAPACK Users' Guide, SIAM, 1997.
[3] Choi, J. et al., A Proposal for a Set of Parallel Basic Linear Algebra Subprograms, Technical Report UT-CS-95-292, Department of Computer Science, University of Tennessee, 1995.
[4] Gropp, W. et al., Using MPI: Portable Parallel Programming with the Message Passing Interface, MIT Press, 1994.
[5] Dennis, J. and Schnabel, R., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall, 1983.
[6] Dongarra, J. and Van de Geijn, R., Two Dimensional Basic Linear Algebra Communication Subprograms, Technical Report UT-CS-91-138, Department of Computer Science, University of Tennessee, 1991.
[7] Hammarling, S. et al., An Extended Set of FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Mathematical Software, 1988.
[8] Kumar, V. et al., Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, 1994.
[9] Bai, Z. et al., The Intercept Method for Inverse Singular Value Problems, 2003.
[10] Chen, X. and Chu, M., On the Least Squares Solution of Inverse Eigenvalue Problems, SIAM Journal on Numerical Analysis, Vol. 33, No. 6, 1996, pp. 2417-2430.
[11] Chu, M., Numerical Methods for Inverse Singular Value Problems, SIAM Journal on Numerical Analysis, Vol. 29, 1992, pp. 885-903.
[12] Friedland, S. et al., The Formulation and Analysis of Numerical Methods for Inverse Eigenvalue Problems, SIAM Journal on Numerical Analysis, Vol. 24, No. 3, 1987, pp. 634-667.
[13] Flores, G., Un conjunto de métodos para la resolución del PIAVS (A Set of Methods for Solving the IASVP), Technical Report DSIC/04, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 2004.
[14] Engl, H. and Kügler, P., Nonlinear Inverse Problems: Theoretical Aspects and Some Industrial Applications, Institute for Pure & Applied Mathematics, University of California, Los Angeles, http://www.ipam.ucla.edu/programs/invtut/, 2003.