Instituto Tecnológico de Puebla.
Av. Tecnológico 420, Colonia Maravillas, Puebla
1 Introduction
Inverse problems arise in a wide variety of applications in science and engineering [10,12,14]: computerized tomography, inverse scattering, inverse heat conduction, geophysical inverse problems, parameter identification, etc. In particular, the Inverse Eigenvalue Problem (IEP) and the Inverse Singular Value Problem (ISVP) are parameter identification problems; both consist of reconstructing a matrix with prescribed eigenvalues or singular values, respectively, while maintaining some specific structure. Among this kind of problems, our work focuses on the Inverse Additive Singular Value Problem (IASVP). The IASVP can be stated as follows [11]:
Given real m×n matrices A_{0}, A_{1}, ..., A_{l} (m ≥ n ≥ l) and a set of real numbers S* = {S*_{1}, S*_{2}, ..., S*_{n}}, with S*_{1} > S*_{2} > ... > S*_{n}, find a real vector c = [c_{1}, c_{2}, ..., c_{l}]^{t} of length l, such that the singular values S_{1} > S_{2} > ... > S_{n} of the matrix
(1) A(c) = A_{0} + c_{1}A_{1} + ... + c_{l}A_{l}
satisfy S_{i}(c) = S*_{i}, for i = 1, 2, ..., n.
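As a small illustration of this setup, the matrix A(c) in (1) and its singular values can be formed in a few lines; this is a minimal sketch assuming NumPy, with sizes and random matrices chosen for illustration only (they are not taken from the paper):

```python
import numpy as np

def A_of_c(A, c):
    """Build A(c) = A_0 + c_1*A_1 + ... + c_l*A_l from a list A = [A_0, ..., A_l]."""
    return A[0] + sum(ci * Ai for ci, Ai in zip(c, A[1:]))

rng = np.random.default_rng(0)
m, n, l = 5, 3, 3                      # illustrative sizes
A = [rng.standard_normal((m, n)) for _ in range(l + 1)]
c = rng.standard_normal(l)
S = np.linalg.svd(A_of_c(A, c), compute_uv=False)   # singular values of A(c)
```

The IASVP is the inverse task: given the target values S*, recover a vector c such that the singular values of A(c) are exactly S*.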
A good way to tackle the IASVP consists of expressing it as a nonlinear system and applying a Newton-type method. This idea was previously used by Friedland et al. in [12] to solve the Inverse Additive Eigenvalue Problem (IAEP). They proposed up to four methods, known as Methods I, II, III and IV, which have been of great importance in the specialized literature. Thus, Method III was adapted by Chu [11], deriving a discrete method for solving the IASVP, which was later modified by Bai et al. to construct the Intercept method [9], also for solving the IASVP.
Another approach was proposed by Chen and Chu. In [10] they developed a method, denoted Lift&Project, to solve the IAEP. It is constructed as a modification of the conventional alternating projection method, and it presents global, although linear, convergence.
Both of the above-cited methods, the Newton approach and Lift&Project, can be easily adapted to solve the IASVP, but they both have a high computational complexity and, in the case of the Lift&Project method, a slow speed of convergence. Thus, the development of parallel algorithms is of great importance in order to solve this kind of problems in a reasonable time.
In this work we propose two parallel algorithms for solving the IASVP. The first one, denoted MI, is an adaptation of Method I [12] to the IASVP formulated as a nonlinear problem; Newton's method is used directly to solve the nonlinear system, and consequently the algorithm converges locally. The second one, denoted LP, is a globally convergent algorithm developed as an adaptation of Lift&Project [10] to the IASVP formulated as a least squares problem.
All our algorithms were implemented using portable standard packages. For the serial algorithms we used the LAPACK library [1] and BLAS computational kernels [7], and to implement the parallel codes we used the ScaLAPACK [2], PBLAS [3] and BLACS [6] libraries on top of the MPI [4] communication library. Experimental results have been obtained on a cluster of PCs.
The rest of the paper is organized as follows. In section 2 we consider the IASVP as a nonlinear problem and derive the MI algorithm to solve it; we also present the serial algorithm. In section 3 we develop the LP algorithm to solve the IASVP as a least squares problem; the serial algorithm is presented too. Section 4 is devoted to the parallel algorithms and experimental results. We outline our conclusions in section 5.
2 A locally convergent Newton-type algorithm
In this section we use the idea of Method I [12] to develop a Newton-type algorithm (MI) for solving the IASVP. Let us consider the singular value decomposition of the matrix A(c) defined in (1): A(c) = P(c)S(c)Q(c)^{t}. To solve the IASVP we require S(c) = S*. Hence, we can construct the vector function F(c) = [F_{i}(c)]_{i=1,n} = [S(c)_{i} − S*_{i}]_{i=1,n}, where S(c)_{i} is the ith singular value of A(c) and S*_{i} is the ith singular value to be assigned. Thus we can transform the IASVP into the problem of finding the solution of the nonlinear system in c
(2) F(c) = [S(c)_{i} − S*_{i}]_{i=1,n} = 0.
If Newton's method is applied to solve this nonlinear system, a sequence of vectors must be computed in order to approximate the solution of (2). Then, if c^{(k)} is the kth element of this sequence, the (k+1)th element is given by the expression [5]
(3) c^{(k+1)} = c^{(k)} − J(c^{(k)})^{−1} F(c^{(k)}),
where J^{(k)} = J(c^{(k)}) is the Jacobian matrix of F(c) evaluated at c^{(k)}. By using that A(c^{(k)}) = P(c^{(k)})S(c^{(k)})Q(c^{(k)})^{t} is the singular value decomposition of A(c^{(k)}), and hence the columns of P(c^{(k)}) and of Q(c^{(k)}) are orthonormal, p(c^{(k)})_{i}^{t} p(c^{(k)})_{i} = q(c^{(k)})_{i}^{t} q(c^{(k)})_{i} = 1, it can be proved that the Jacobian matrix of F at c^{(k)} is
(4) J^{(k)} = [p(c^{(k)})_{i}^{t} A_{j} q(c^{(k)})_{i}]_{i,j=1,n}.
Hence, having in mind the definition of A(c^{(k)}) in (1), F^{(k)} = F(c^{(k)}) can be expressed as
(5) F^{(k)} = [p(c^{(k)})_{i}^{t} A_{0} q(c^{(k)})_{i} − S*_{i}]_{i=1,n} + J^{(k)} c^{(k)}.
Thus, by using (4) and (5), expression (3) can be transformed into
J^{(k)} c^{(k+1)} − J^{(k)} c^{(k)} = −[p(c^{(k)})_{i}^{t} A_{0} q(c^{(k)})_{i} − S*_{i}]_{i=1,n} − J^{(k)} c^{(k)}.
Then, c^{(k+1)} can be found by solving the linear system J^{(k)} c^{(k+1)} = b^{(k)}, where
(6) b^{(k)} = S* − [p(c^{(k)})_{i}^{t} A_{0} q(c^{(k)})_{i}]_{i=1,n}.
So, a Newton iteration has been derived, and if the initial guess c^{(0)} is close enough to the solution, the iteration converges quadratically to a point c* [5] such that the singular values of A(c*) coincide with S*. The corresponding algorithm is given below.

Sequential Algorithm
MI algorithm for IASVP
1. Compute A^{(0)} (* in accordance with (1) *)
2. [P^{(0)}, S^{(0)}, Q^{(0)}] = svd(A^{(0)})
3. error = ||S^{(0)} − S*||_{2}
4. For k = 0, 1, ..., While error > tol
4.1 Compute J^{(k)} (* in accordance with (4) *)
4.2 Compute b^{(k)} (* in accordance with (6) *)
4.3 Compute c^{(k+1)} solving J^{(k)} c^{(k+1)} = b^{(k)}
4.4 Compute A^{(k+1)} (* in accordance with (1) *)
4.5 [P^{(k+1)}, S^{(k+1)}, Q^{(k+1)}] = svd(A^{(k+1)})
4.6 error = ||S^{(k+1)} − S*||_{2}
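The MI algorithm can be sketched compactly; the following is an illustrative implementation assuming NumPy in place of the LAPACK/BLAS kernels (the function name mi_iteration and the synthetic test problem are ours, not the authors' code):

```python
import numpy as np

def mi_iteration(A, S_star, c0, tol=1e-10, maxit=50):
    """Newton iteration MI: solve J^(k) c^(k+1) = b^(k) until ||S^(k) - S*||_2 < tol."""
    n = len(S_star)
    c = np.asarray(c0, dtype=float)
    for _ in range(maxit):
        Ac = A[0] + sum(ci * Ai for ci, Ai in zip(c, A[1:]))   # A(c), as in (1)
        P, S, Qt = np.linalg.svd(Ac, full_matrices=False)      # steps 2 / 4.5
        if np.linalg.norm(S - S_star) < tol:                   # steps 3 / 4.6
            break
        J = np.array([[P[:, i] @ A[j + 1] @ Qt[i, :]           # Jacobian (4)
                       for j in range(n)] for i in range(n)])
        b = S_star - np.array([P[:, i] @ A[0] @ Qt[i, :]       # right-hand side (6)
                               for i in range(n)])
        c = np.linalg.solve(J, b)                              # step 4.3
    return c

# Synthetic problem with l = n: pick c_true, take S* from A(c_true), and
# start Newton close to the solution (the method is only locally convergent).
rng = np.random.default_rng(1)
m, n = 4, 3
A = [rng.standard_normal((m, n)) for _ in range(n + 1)]
c_true = rng.standard_normal(n)
S_star = np.linalg.svd(A[0] + sum(ci * Ai for ci, Ai in zip(c_true, A[1:])),
                       compute_uv=False)
c_sol = mi_iteration(A, S_star, c_true + 0.01)
```

Note that, as in the derivation above, the new iterate is obtained by solving J^{(k)} c^{(k+1)} = b^{(k)} directly rather than by adding a Newton correction.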
To implement some steps of this algorithm we used efficient routines of LAPACK: dgesvd to compute the singular value decomposition of a matrix (steps 2 and 4.5), and dgetrf and dgetrs to compute the LU factorization of a matrix and solve a linear system of equations, respectively (step 4.3). The other steps have been implemented by the authors, using routines of BLAS: daxpy to add two vectors (steps 1, 3, 4.4 and 4.6), dnrm2 to compute the 2-norm of a vector (steps 3 and 4.6), ddot to compute the dot product (steps 4.1 and 4.2), dscal to scale a vector by a constant (step 4.2), and dgemm to compute matrix-matrix products (steps 4.1 and 4.2).
The cost of this algorithm, when m = n = l, is:
T_{seqMI} = O((44/3)m^{3}) + k O(2m^{4}) flops,
where k represents the number of iterations performed to reach convergence. Note the high computational cost, which we try to alleviate by using parallel computing techniques.
3 A globally convergent method
The Lift&Project method was developed for the IAEP in [10]. In this section we derive the LP method to solve the IASVP in a similar way.
Let us define G(S*) as the set which contains all the matrices whose singular values are S*, and L(c) as the set whose elements can be expressed in the form (1):
G(S*) = {P S* Q^{t} : P ∈ R^{m×m}, Q ∈ R^{n×n} orthogonal},
L(c) = {A(c) : c ∈ R^{l}}.
We want to find the intersection of the sets G(S*) and L(c) by minimizing the distance between them.
The method can be organized in an iterative way with two stages: the lift stage and the project stage. Thus, in each iteration k, the first stage is the lift stage: given c^{(k)}, and consequently A^{(k)} = A(c^{(k)}) ∈ L(c), find a matrix X^{(k)} ∈ G(S*) so that the distance between A^{(k)} and X^{(k)} is equal to the distance between A^{(k)} and G(S*). This is achieved by computing the SVD A^{(k)} = P^{(k)} S^{(k)} Q^{(k)t} and then setting X^{(k)} = P^{(k)} S* Q^{(k)t}, which is the element of G(S*) closest to A^{(k)}. In effect, if Y^{(k)} = U S* V^{t} were another element of G(S*) closest to A^{(k)}, then
||P^{(k)} S^{(k)} Q^{(k)t} − U S* V^{t}||_{F} =
= ||P^{(k)} S^{(k)} Q^{(k)t} − P^{(k)} S* Q^{(k)t} + P^{(k)} S* Q^{(k)t} − U S* V^{t}||_{F} =
= ||P^{(k)} (S^{(k)} − S*) Q^{(k)t} + P^{(k)} S* Q^{(k)t} − U S* V^{t}||_{F} =
= ||S^{(k)} − S* + S* − P^{(k)t} U S* V^{t} Q^{(k)}||_{F}
≤ ||S^{(k)} − S*||_{F} + ||S* − P^{(k)t} U S* V^{t} Q^{(k)}||_{F},
thus the minimum value of ||P^{(k)} S^{(k)} Q^{(k)t} − U S* V^{t}||_{F} is obtained precisely when U = P^{(k)} and V = Q^{(k)}. Note that orthogonal transformations preserve the Frobenius norm.
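The lift stage therefore amounts to one SVD and one reconstruction; a minimal sketch assuming NumPy (the helper name lift and the data are illustrative):

```python
import numpy as np

def lift(Ak, S_star):
    """Return X^(k) = P^(k) S* Q^(k)t, the element of G(S*) closest to A^(k)."""
    P, S, Qt = np.linalg.svd(Ak, full_matrices=False)   # A^(k) = P S Q^t
    return P @ np.diag(S_star) @ Qt                     # replace S by S*

rng = np.random.default_rng(2)
Ak = rng.standard_normal((5, 3))
S_star = np.array([3.0, 2.0, 1.0])   # prescribed singular values, decreasing
Xk = lift(Ak, S_star)
```

By construction, the distance ||A^{(k)} − X^{(k)}||_{F} equals ||S^{(k)} − S*||, the distance from A^{(k)} to G(S*).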
The projection stage can be expressed as: given X^{(k)} ∈ G(S*), find c^{(k+1)}, and hence A^{(k+1)} ∈ L(c), such that the distance between X^{(k)} and A^{(k+1)} equals the distance between X^{(k)} and L(c). This is achieved by finding the c^{(k+1)} which minimizes the distance between X^{(k)} and A^{(k+1)}, that is: find the c^{(k+1)} which solves the problem
(7) min_{c ∈ R^{l}} ||X^{(k)} − A(c)||_{F}^{2}.
Let us denote F^{(k+1)} = ||X^{(k)} − A(c)||_{F}^{2}. The minimization problem (7) can be solved by equating the gradient of F^{(k+1)} to zero and solving the resulting linear system
(8) ∇F^{(k+1)}(c) = 0.
By differentiating F^{(k+1)}, it is easy to transform [13] the linear system (8) into T c^{(k+1)} = d^{(k)}, where
(9) T = [trace(A_{i}^{t} A_{r})]_{r,i=1,l} and
(10) d^{(k)} = [trace(A_{r}^{t} (X^{(k)} − A_{0}))]_{r=1,l}.
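Equations (9) and (10) can be formed directly from traces; the following is a minimal sketch assuming NumPy (the helper name projection_step is ours). When X^{(k)} happens to lie in L(c), as in the illustrative data below, solving T c = d^{(k)} recovers the generating coefficients exactly:

```python
import numpy as np

def projection_step(A, Xk):
    """Solve T c = d^(k), with T as in (9) and d^(k) as in (10)."""
    l = len(A) - 1
    T = np.array([[np.trace(A[i + 1].T @ A[r + 1]) for i in range(l)]
                  for r in range(l)])
    d = np.array([np.trace(A[r + 1].T @ (Xk - A[0])) for r in range(l)])
    return np.linalg.solve(T, d)

rng = np.random.default_rng(3)
m, n, l = 5, 3, 3
A = [rng.standard_normal((m, n)) for _ in range(l + 1)]
c_true = rng.standard_normal(l)
Xk = A[0] + sum(ci * Ai for ci, Ai in zip(c_true, A[1:]))   # an element of L(c)
c_next = projection_step(A, Xk)
```

T is the Gram matrix of the matrices A_{1}, ..., A_{l} under the trace inner product, so it is nonsingular whenever these matrices are linearly independent.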
It is worth noting that the matrix T does not depend on the value of c^{(k+1)} in the different iterations, thus it can be computed once at the beginning. Now it is easy to construct an iterative algorithm (the LP algorithm) to find a solution of the IASVP. The LP algorithm converges to a stationary point of the stated minimization problem in the sense that
||A^{(k+1)} − X^{(k+1)}||_{F} ≤ ||A^{(k)} − X^{(k)}||_{F}.
Thus, in the kth projection stage, c^{(k+1)} is computed to minimize ||A^{(k+1)} − X^{(k)}||_{F}, then
||A^{(k+1)} − X^{(k)}||_{F} ≤ ||A^{(k)} − X^{(k)}||_{F}.
On the other hand, in the (k+1)th lift stage the matrices that minimize ||A^{(k+1)} − X^{(k+1)}||_{F} are P^{(k+1)} and Q^{(k+1)}, then
||A^{(k+1)} − P^{(k+1)} S* Q^{(k+1)t}||_{F} ≤ ||A^{(k+1)} − P^{(k)} S* Q^{(k)t}||_{F},
hence
||A^{(k+1)} − X^{(k+1)}||_{F} ≤ ||A^{(k+1)} − X^{(k)}||_{F}.

Sequential Algorithm
LP algorithm for IASVP
1. Compute T (* in accordance with (9) *)
2. Compute the LU factorization of T
3. Compute the inverse of T
4. stop = 0
5. For k = 0, 1, ..., While stop = 0
5.1 Compute A^{(k)} (* in accordance with (1) *)
5.2 [P^{(k)}, S^{(k)}, Q^{(k)}] = svd(A^{(k)})
5.3 X^{(k)} = P^{(k)} S* Q^{(k)t}
5.4 Compute d^{(k)} (* in accordance with (10) *)
5.5 Compute c^{(k+1)} solving T c^{(k+1)} = d^{(k)}
5.6 error = ||c^{(k+1)} − c^{(k)}||_{2}
5.7 If error < tol Then stop = 1
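Putting both stages together, the LP iteration can be sketched as follows, assuming NumPy (the function name lp_iteration and the synthetic problem are illustrative, not the authors' implementation; for simplicity the sketch solves T c = d directly rather than applying the precomputed inverse of T):

```python
import numpy as np

def lp_iteration(A, S_star, c0, tol=1e-10, maxit=500):
    """Alternate lift and projection stages until c^(k+1) stops changing."""
    l = len(A) - 1
    # Step 1: T does not change across iterations, so build it once
    T = np.array([[np.trace(A[i + 1].T @ A[r + 1]) for i in range(l)]
                  for r in range(l)])
    c = np.asarray(c0, dtype=float)
    for _ in range(maxit):
        Ac = A[0] + sum(ci * Ai for ci, Ai in zip(c, A[1:]))   # step 5.1
        P, S, Qt = np.linalg.svd(Ac, full_matrices=False)      # step 5.2
        Xk = P @ np.diag(S_star) @ Qt                          # lift stage (5.3)
        d = np.array([np.trace(A[r + 1].T @ (Xk - A[0]))       # step 5.4
                      for r in range(l)])
        c_new = np.linalg.solve(T, d)                          # projection (5.5)
        if np.linalg.norm(c_new - c) < tol:                    # steps 5.6-5.7
            c = c_new
            break
        c = c_new
    return c

rng = np.random.default_rng(4)
m, n, l = 5, 3, 3
A = [rng.standard_normal((m, n)) for _ in range(l + 1)]
c_true = rng.standard_normal(l)
S_star = np.linalg.svd(A[0] + sum(ci * Ai for ci, Ai in zip(c_true, A[1:])),
                       compute_uv=False)
c_sol = lp_iteration(A, S_star, np.zeros(l))
```

By the monotonicity argument above, the residual ||S^{(k)} − S*||, which equals ||A^{(k)} − X^{(k)}||_{F}, does not increase from one iteration to the next, although convergence is only linear.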
To implement this algorithm we have used the following routines of BLAS in steps 1, 5.1, 5.3, 5.4, 5.5 and 5.6: dscal, ddot, dgemv (to compute matrix-vector products), dgemm (to compute matrix-matrix products), and daxpy. The following routines of LAPACK have also been used: dgesvd at step 5.2, dgetrf at step 2, and dgetri (to compute the inverse of a matrix) at step 3.
The computational cost of this algorithm, when m = n = l, can be approximated as:
T_{seqLP} = O(m^{4}) + k O((56/3)m^{3}) flops,
where k is the number of iterations needed to reach convergence. Like MI, LP also has a high computational cost. In the next section we present the parallel algorithms corresponding to both the MI and LP algorithms.