4.2 Parallel LP algorithm
Each processor executes the following parallel LP algorithm on the portion of the data it holds.
Parallel LP algorithm for the IASVP
1. T = [ trace(A_i^t A_r) ], r, i = 1, ..., l        (* ddot of BLAS, MPI_REDUCE of MPI *)
2. [L, U] = lu(T)                                    (* pdgetrf, pdgetrs of ScaLAPACK *)
3. Compute T^-1                                      (* pdgetri of ScaLAPACK *)
4. stop = 0
5. For k = 0, 1, ..., while stop = 0
   5.1 A(k) = A0 + c1(k)A1 + ... + cl(k)Al           (* daxpy of BLAS *)
   5.2 [P(k), S(k), Q(k)] = svd(A(k))                (* pdgesvd of ScaLAPACK *)
   5.3 X(k) = P(k) S* Q(k)^t                         (* dscal of BLAS, pdgemm of PBLAS *)
   5.4 d(k) = [ trace(A_r^t (X(k) - A0)) ], r = 1, ..., l   (* ddot of BLAS, MPI_REDUCE of MPI *)
   5.5 Compute c(k+1) by solving T c(k+1) = d(k)     (* pdgemv of PBLAS *)
   5.6 Broadcast c(k+1) to all processors            (* dgebs2d, dgebr2d of BLACS *)
   5.7 error = || c(k+1) - c(k) ||_2                 (* daxpy, dnrm2 of BLAS *)
   5.8 If error < tol then stop = 1
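As a serial illustration of the steps above, the LP iteration can be sketched in NumPy, replacing the ScaLAPACK/PBLAS kernels with their serial equivalents. This is our sketch, not the paper's FORTRAN90 implementation; it assumes m = n so that P diag(S*) Q^t is well formed, and the function name and defaults are ours:

```python
import numpy as np

def lp_iteration(A0, As, s_star, c0=None, tol=1e-8, max_it=5000):
    """Serial sketch of the LP iteration for the IASVP.

    A0, As : base matrix and list of l basis matrices A1..Al (all m x m)
    s_star : target singular values S*
    Returns the final coefficient vector c and the iteration count.
    """
    l = len(As)
    # Step 1: T[r, i] = trace(A_i^t A_r)  (ddot + MPI_REDUCE in the parallel code)
    T = np.array([[np.sum(Ai * Ar) for Ai in As] for Ar in As])
    c = np.zeros(l) if c0 is None else np.asarray(c0, dtype=float)
    for k in range(max_it):
        # Step 5.1: A(k) = A0 + c1 A1 + ... + cl Al  (daxpy)
        Ak = A0 + sum(ci * Ai for ci, Ai in zip(c, As))
        # Step 5.2: SVD of A(k)  (pdgesvd)
        P, _, Qt = np.linalg.svd(Ak)
        # Step 5.3: X(k) = P diag(S*) Q^t  (dscal + pdgemm)
        Xk = P @ np.diag(s_star) @ Qt
        # Step 5.4: d[r] = trace(A_r^t (X(k) - A0))  (ddot + MPI_REDUCE)
        d = np.array([np.sum(Ar * (Xk - A0)) for Ar in As])
        # Step 5.5: solve T c(k+1) = d(k)
        c_new = np.linalg.solve(T, d)
        # Steps 5.7-5.8: stop when ||c(k+1) - c(k)||_2 < tol
        if np.linalg.norm(c_new - c) < tol:
            return c_new, k + 1
        c = c_new
    return c, max_it
```

Note that a solution c* of the IASVP is a fixed point of this map: if svd(A(c*)) already has singular values S*, then X(k) = A(c*) and the linear solve returns c* again.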
The theoretical cost splits into a setup term and a per-iteration term: an arithmetic cost T_A = O(.) + k O(.) flops and a communication cost T_C = O(.) + k O(.), where k is the number of iterations and both costs depend on the problem size m and the number of processors Prc.
4.3 Experimental results
We present some results from the numerical experiments carried out with the two methods proposed in this work for solving the IASVP. The algorithms have been implemented in FORTRAN90 and have been tested and validated on a cluster of six PCs, each with a Pentium Xeon processor at 2 GHz and 1 GB of RAM, connected through an SCI network. For the sake of simplicity we only show a few results; a more detailed analysis and figures can be found in [13].
First, we analyse the convergence of both algorithms. In Table 1 the convergence of MI and LP is tested with different initial points c(0), with m = n = l = 10. MI converges quickly when c(0) is close to c*, and gives good approximations to S*; LP always converges to S*, although it needs a large number of iterations and reaches less precision.
Table 1. Convergence of MI and LP algorithms, with different initial points c(0).

  MI:
  c(0) = c* +    K       ||S(k) - S*||2    Convergence
  1e-3           2       2e-9              yes
  1e-2           3       4e-9              yes
  1e-1           14000   5e2               no
  1e0            14000   3e1               no
  1e1            14000   2e3               no
  1e3            14000   5e0               no

  LP:
  c(0) = c* +    K       ||S(k) - S*||2    Convergence
  1e-3           117     2e-5              yes
  1e-2           395     3e-5              yes
  1e-1           757     8e-5              yes
  1e0            6872    1e-4              yes
  1e1            2234    4e-5              yes
  1e3            2183    4e-5              yes
We can take advantage of the global convergence properties of the LP algorithm to turn MI into a globally convergent algorithm. To this end, we have used as initial guess for the MI algorithm a value c(k) obtained from the LP algorithm at low accuracy. The idea was first proposed by Chen and Chu in [10]; the objective is to find a c(0) for MI close enough to c* to guarantee the quadratic convergence of Newton's method.
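The MI refinement stage of this hybrid strategy can be sketched as a Newton iteration on the residual in the singular values, using the standard first-order perturbation formula dsigma_j/dc_i = u_j^t A_i v_j for the Jacobian. This is our serial sketch, assuming m = n = l and distinct singular values; the function name and stopping rules are illustrative, not the paper's FORTRAN90 code:

```python
import numpy as np

def newton_iasvp(A0, As, s_star, c0, tol=1e-12, max_it=50):
    """Newton-type (MI-style) refinement for the IASVP, started from c0
    (e.g. a low-accuracy LP iterate)."""
    c = np.asarray(c0, dtype=float)
    for k in range(max_it):
        Ak = A0 + sum(ci * Ai for ci, Ai in zip(c, As))
        U, S, Vt = np.linalg.svd(Ak)
        F = S - s_star                 # residual in the singular values
        if np.linalg.norm(F) < tol:
            return c, k
        # First-order perturbation of singular values: J[j, i] = u_j^t A_i v_j
        J = np.array([[U[:, j] @ Ai @ Vt[j, :] for Ai in As]
                      for j in range(len(S))])
        c = c - np.linalg.solve(J, F)  # Newton step
    return c, max_it
```

Started close enough to c*, the iteration exhibits the quadratic convergence that motivates seeding it from an LP iterate.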
In Table 2 we present a typical test showing the convergence of the MI algorithm when c(0) is taken from an LP iterate. The test was carried out on a problem of size m = n = l = 10; similar results have been obtained for other sizes.
Table 2. Convergence of MI taking c(0) = c(k), where c(k) is the LP result.

  LP:
  c(0)           K     ||S(k) - S*||2
  c* + 1e-1      23    1e-2
  c* + 1e1       950   4e-3
  c* + 1e3       899   4e-3

  MI:
  c(0)           K     ||S(k) - S*||2
  c(23) of LP    3     1e-10
  c(950) of LP   3     6e-10
  c(899) of LP   3     6e-10
Table 3. Runtime and speedup of MI.

  Initial part (steps 1, 2, 3):
         Runtime (seconds)                 Speedup
  Prc    m=512  m=768  m=1024  m=1408     m=512  m=768  m=1024  m=1408
  1      14     46     107     312        1      1      1       1
  2      8      26     58      177        1.8    1.8    1.8     1.8
  4      4      187    32      58         3.5    0.2    3.3     5.4
  6      3      7      21      37         4.7    6.6    5.1     8.4

  One step of loop 4 (k = 0, 1, 2, ...):
         Runtime (seconds)                 Speedup
  Prc    m=512  m=768  m=1024  m=1408     m=512  m=768  m=1024  m=1408
  1      355    1741   5458    19075      1      1      1       1
  2      184    893    2784    9656       1.9    1.9    2.0     2.0
  4      90     471    1434    4994       3.9    3.7    3.8     3.8
  6      60     323    1010    3471       5.9    5.4    5.4     5.5
|
One step of loop 4. For k=0,1,2,...
|
Prc
|
Runtime (seconds)
|
Speedup
|
1
|
355
|
1741
|
5458
|
19075
|
1
|
1
|
1
|
1
|
2
|
184
|
893
|
2784
|
9656
|
1.9
|
1.9
|
2.0
|
2.0
|
4
|
90
|
471
|
1434
|
4994
|
3.9
|
3.7
|
3.8
|
3.8
|
6
|
60
|
323
|
1010
|
3471
|
5.9
|
5.4
|
5.4
|
5.5
|
m
|
512
|
768
|
1024
|
1408
|
512
|
768
|
1024
|
1408
|
Now we analyse the performance of the parallel algorithms, using parallel runtime and speedup [8] to assess MI and LP. In both methods the initial part and the iterative part show different parallel behaviour; moreover, the total time depends on the number of iterations, which in turn depends on how far the initial point is from the solution. Thus, Table 3 shows the runtime and speedup of MI separately for both parts, and Table 4 shows the same for LP. Results are reported for different problem sizes, with m = n = l, and different numbers of processors.
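The metrics used here are the usual ones: speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p [8]. A quick illustration, reproducing one entry of Table 4 (initial part of LP, m = 1408, six processors):

```python
def speedup(t_serial, t_parallel):
    """Speedup S_p = T_1 / T_p; dividing by p gives the efficiency E_p."""
    return t_serial / t_parallel

# Initial part of LP, m = 1408 (runtimes taken from Table 4):
s6 = speedup(13672.0, 2376.0)   # about 5.8 on 6 processors
```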
Table 4. Runtime and speedup of LP.

  Initial part (steps 1, 2, 3, 4):
         Runtime (seconds)                 Speedup
  Prc    m=512  m=768  m=1024  m=1408     m=512  m=768  m=1024  m=1408
  1      260    1259   4068    13672      1      1      1       1
  2      128    630    1967    6843       2.0    2.0    2.1     2.0
  4      69     339    1053    3531       3.8    3.7    3.9     3.9
  6      52     225    706     2376       5.0    5.6    5.8     5.8

  One step of loop 5 (k = 0, 1, 2, ...):
         Runtime (seconds)                 Speedup
  Prc    m=512  m=768  m=1024  m=1408     m=512  m=768  m=1024  m=1408
  1      23     79     182     552        1      1      1       1
  2      17     54     126     359        1.4    1.5    1.4     1.5
  4      7      24     65      130        3.3    3.3    2.8     4.2
  6      5      12     45      64         4.6    6.6    4.0     8.6
As can be seen, the parallel performance of the MI and LP algorithms is good, and improves as the problem size increases, alleviating the high computational cost of the sequential algorithms.
5 Conclusions
We have explored two parallel algorithms to solve the IASVP, obtained by adapting some well-known methods for solving the IAEP. The convergence of both algorithms has been studied and contrasted with numerical experiments.
The LP algorithm is a good alternative for overcoming the local convergence of the MI algorithm when it is used to estimate the initial value c(0) for MI.
The algorithms developed have been parallelized with good performance on a cluster of PCs, using standard public-domain software (MPI, BLAS, LAPACK, ScaLAPACK), thus guaranteeing the portability of the code. The performance of both parallel algorithms is fairly good; the speedups reached are close to optimal, especially for large problem sizes.
Acknowledgement
This work has been supported by the Spanish CICYT Grant TIC2003-08238-C02-02.
References:
[1] Anderson, E. et al., LAPACK Users' Guide, SIAM, 1995.
[2] Blackford, L. et al., ScaLAPACK Users' Guide, SIAM, 1997.
[3] Choi, J. et al., A Proposal for a Set of Parallel Basic Linear Algebra Subprograms, Technical Report UT-CS-95-292, Department of Computer Science, University of Tennessee, 1995.
[4] Gropp, W. et al., Using MPI: Portable Parallel Programming with the Message Passing Interface, MIT Press, 1994.
[5] Dennis, J. and Schnabel, R., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall, 1983.
[6] Dongarra, J. and Van de Geijn, R., Two Dimensional Basic Linear Algebra Communication Subprograms, Technical Report UT-CS-91-138, Department of Computer Science, University of Tennessee, 1991.
[7] Hammarling, S. et al., An Extended Set of FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Mathematical Software, 1988.
[8] Kumar, V. et al., Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, 1994.
[9] Bai, Z. et al., The Intercept Method for Inverse Singular Value Problems, 2003.
[10] Chen, X. and Chu, M., On the Least Squares Solution of Inverse Eigenvalue Problems, SIAM Journal on Numerical Analysis, Vol. 33, No. 6, 1996, pp. 2417-2430.
[11] Chu, M., Numerical Methods for Inverse Singular Value Problems, SIAM Journal on Numerical Analysis, Vol. 29, 1992, pp. 885-903.
[12] Friedland, S. et al., The Formulation and Analysis of Numerical Methods for Inverse Eigenvalue Problems, SIAM Journal on Numerical Analysis, Vol. 24, No. 3, 1987, pp. 634-667.
[13] Flores, G., Un conjunto de métodos para la resolución del PIAVS (A Set of Methods for Solving the IASVP), Technical Report DSIC/04, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 2004.
[14] Engl, H. and Kügler, P., Nonlinear Inverse Problems: Theoretical Aspects and Some Industrial Applications, Institute for Pure & Applied Mathematics, University of California, Los Angeles, http://www.ipam.ucla.edu/programs/invtut/, 2003.