AI.2.4. Cost of Computation
The Danielson-Lanczos Lemma, as give 0 in Eq. (A1.2.9), can be used to calculate
the cost of the computation. Let C (N) be the cost for evaluating the tzansform of N
points. Combining the transforms of N points in Eq. (A1.2.9) requires effort proportional
to N because of the multiplication of the terms by W n and the subsequent addition. If c is
a constant reflecting the cost of such operations, then we have the following result for
C(N).
C(N) = 2C()+cN (A1.2.11)
This yields a recurrence relation that is known to result into an O(NlogN) process.
Viewed another way, since there are 1og2N levels to the recursion and cost 0 (N) at each
level, the total cost is 0 (N log2 N).
274
AL3. COOLEY-TUKEY ALGORITHM
The Danielson-Lanczos Lemma presented a recursive solution to computing the
Fourier Transform. The role of the recursion is to subdivide the original input into
smaller lists that are eventually combined according to the lemma. The starting point of
the computation thus begins with the adjacent pairing of 1-point DFTs. In the preceding
discussion, their order was determined by the recurslye subdivision. An alternate method
is available to determine their order directly, without the need for the recursive algorithm
given above. This result is known as the Cooley-Tukey, or decimation-in-time algorithm.
To describe the method, we define the following notation. Let F ee be the list of
even-indexed terms taken from F e. Similarly, F e is the list of odd-indexed terms taken
fromF e. In general, the suing of symbols in the superscript specifies the path traversed
in the tree representing the recursive subdivision of the input data (Fig. A 1.2). Note that
the height of the tree is log2 N and that all leaves denote 1 -point DFTs that are actually
elements from the input numbers. Thus, for every pattern of e's and o's, numbering
log2 N in all,
Feeee'"ee = fn for some n (A1.3.1)
The problem is now to directly find which value of n corresponds to which pattern
ofe's and o's in Eq. (A1.3.1). The solution is surprisingly simple: reverse the pattern of
e's and o's, then let e = 0 and o = 1, and the resulting binary string denotes the value of
n. This works because the successive subdivisions of the data into even and odd are tests
of successive low-order (least significant) bits of n. Examining Fig. A1.2, we observe
that traversing successive levels of the tree along the e and o branches corresponds to
successively scanning the binary value of index n from the least significant to the most
significant bit. The strings appearing under the bottom row designates the traversed path.
The procedure for N = 8 is summarized in Table AI.1. There we see the binary
indices listed next to the corresponding array elements. The first subdivision of the data
into even- and odd-indexed elements amounts to testing the let(st significant (rightmost)
bit. If that bit is 0, an even index is implied; a 1 bit designates an odd index. Subsequent
subdivisions apply the same bit tests to successive index bits of higher significance.
Observe that in Fig. A1.2, even-indexed lists move down the left branches of the tree.
Therefore, the order in which the leaves appear from left to fight indicate the sequence of
ls and Os seen in the index while scanning in reverse order, from least to most significant
bits.
275
Original Index
ooo
OOl
OlO
Oll
lOO
lol
11o
111
Original Array
f0
fl
f2
f3
f4
f5
f6
f7
Bit-reversed Index
000
100
010
110
001
101
011
ill
Reordered Array
Table AI.I: Bit-reversal and array reordering for input into FFT algorithm.
The distinction between the Cooley-Tukey algorithm and the Danielson-Lanczos
Lemma is subtle. In the latter, a recursire procedure is introduced in which to compute
the DFT. This procedure is responsible for decimating the input signal into a sequence
that is then combined, during the traversal back up the tree, to yield the 'ansform output.
In the Cooley-Tukey algorithm, though, the recursion is unnecessary since a clever bit-
reversal trick is introduced to achieve the same disordered input. Furthermore, directly
reordering the input in this way simplifies the bookkeeping necessary in recombining
terms. Source code for the Cooley-Tukey FFT algorithm, written in C, is provided in
Section A1.5.
A1.3.1. Computational Cost
The computation effort for evaluating the FFT is easily determined frm this formu-
lation. First, we observe that there are log2 N levels of recursion necessary in computing
Fn. At each level, there are N/2 butterflies to compute the F, and Fn terms (see Fig.
A1.3). Since each butterfly requires one complex multiplication and two complex addi-
tions, the total number of multiplications and additions is (N/2) log2 N and N log2 N,
respectively. This 0 (N log2 N) process represents a considerable savings in computation
over the 0 (N 2) approach of direct evaluation. For example if N > 512, the number of
multiplications is reduced to a fraction of 1 percent of that required by direct evaluation.
276
A1.4. COOLEY-SANDE ALGORITHM
In the Cooley-Tukey algorithm, the given data sequence is reordered according to a
bit-reversal scheme before it is recombined to yield the transform output. The reordering
is a consequence of the Danielson-Lanczos Lemma that calls for a recursive subdivision
into a sequence of even- and odd-indexed elements.
The Cooley-Sande FFT algorithm, also known as the decimation-in-frequency algo-
rithm, calls for recursively splitting the given sequence about its midpoint, N/2.
Fn = fk e-i2v'nklN (A1.4.1)
k=0
(N/2)-I N-1
fke -i2v'nk/N + fke -i2nnk/N
k=0 k=NI2
(N/2)-t (N/2)-i
fke-i2nnk/N+ fk+Ni2 e-i2nn(k+N/2)lN
k=0 k=0
(N/2)-I
= k__O [fk + fk+N'2e-in] e-i2't'/
Noticing that the e -nin factor reduces to +1 and -1 for even and odd values of n, respec-
tively, we isolate the even and odd terms by changing n to 2n and 2n+l.
(N/2)-, [
F2n = fk + fk+N! e-i2u(2n)klN 0 --< n < NI2 (A1.4.2)
k=O
(N/2)-I c
k=O
(N/2)-i r
F2+l = Z [fk -f,t+V/2J e -12n(2n+l)'/v 0 < n < N/2 (Ai.4.3)
k=0
(N/2)- 1 r 1
Thus, the even- and odd-indexed values of F are given by the DFTs of f and fff where
f[ = f + fl*+N2 (A1.4.4)
f,= [f,--f/+N/2] W (A1.4.5)
The same procedure can now be applied to f[ and fl. This sequence is depicted in Fig.
A1.4. The top row represents input list fcontaining 8 elements. Again, note that lists am
delimited by bold lines. Regarding the butterfly notation, the lower left branches denote
Eq. (A1.4.4) and the lower right branches denote Eq. (A1.4.5).
277
Since all the even-indexed values of F need f,, a new list is created for that pur-
pose. This is shown as the left list of the second row. Similarly, the ff list is generated,
appearing as the second list on that row. Of course, the list sizes diminish by a factor of
two with each level since generating them makes use of f, and fk+v/2 to yield one ele-
ment in the new list. This process of computing Eels. (A1.4.4) and (A1.4.5) to generate
new lists terminates when N = i, leaving us F, the transform output, in the last row.
In contrast to the decimation-in-time algorithm, in which the input is disor-
dered but the output is ordered, the opposite is true of the decimation-in-frequency FFT
algorithm. However, reordering can be easily accomplished by reversing the binary
representation of the location index at the end of computation. The advantage of this
algorithm is that the values of f are entered in the input array sequentially.
f0 fl f2 f3 f4 f5 f6 f7
0 I I 0 I I 0 I I i I
Figure A1.4: Decimation-in-frequency FFT algorithm.
278
A1.5. SOURCE CODE
This section provides sottree code for the recursive FFT procedure given in Section
A1.2, as well as code for the Cooley-Tukey algorithm described in Section A1.3. The
programs e written in C and make use of some library routines described below.
The data is passed to the functions in quads. A quad is an image contool block, con-
raiding information about the image. Such data includes the image dimensions (height
and width), pointers to the uninterleaved image channels (buf [0] ... buf [15]), and other
necessary information. Since the complex numbers have real and imaginary com-
ponents, they occupy 2 channels in the input and output quads (channels 0 and 1). A
brief description of the library routines included in the listing is given below.
1) cpqd (q 1,q 2) simply copies quad q 1 into q 2.
2) cpqdinfo (q 1,q 2) copies the header information of q 1 into q 2.
3) NEWQD allocates a quad header. The image memory is allocated later when the
dimensions are known.
4) getqd(h,w, type) returns a quad containing sufficient memory for an image with
dimensions h xw and channel datatypes type. Note that FFT_TYPE is defined as 2
channels of type float.
5) freeqd (q) frees quad q, leaving it available for any subsequent getqd call.
6) divconst (q 1,num, q 2) divides the data in q 1 by num and puts the result in q 2. Note
that hum is an array of numbers used to divide the corresponding channels in q 1.
7) Finally, PI2 is defined to be 2, where = 3.141592653589793.
AI.5.1. Recursive FFT Algorithm
fftl D(ql ,dir,q2) /* Fast Fourlet Transform (1 D) */
int dir; /* dir=0: forward; dir=l:inverse */
qdP ql, q2; /* ql =input; q2=output */
(
int i, N, N2;
float *rl, *il, *r2, *i2, *ra, *ia, *ca, *lb;
double FCTR, fctr, a, b, c, s, num[2];
qdP qa, qb;
cpqdinfo(ql, q2);
N = ql->width;
rl = (float *) ql->buf[0];
il = (float *) ql->buf[1];
r2 = (float *) q2->buf[0];
12 = (float *) q2->buf[1];
if(N == 2) {
a = rl[0] + r111];
b: i1101 + i111];
r211] = rl[0]- r111];
i211] = i110]- i111];
r210] = a;
i210] = b;
} else {
N2=N/2;
qa = getqd(1, N2, FFT_TYPE);
qb = getqd(1, N2, FFT_TYPE);
ra = (float *) qa->buf[0]; ia = (float *) qa->buf[1];
ca = (float *) qb->buf[0]; ib = (float *) qb->buf[1];
/* split list into 2 halves: even and odd */
for(l=0; i
ra[i]= *rl++; ia[i] = *i1++;
Ca[i] = *rl++; ib[i] = *i1++;
}
/* compute fit on both lists */
ftl D(qa, dir, qa);
fftlD(qb, dir, qb);
/* build up coefficients */
if(!dir) /* forward */
FCTR = -PI2 / N;
else FCTR = PI2/N;
for(fctr=i=0; I
c: cos(fctr);
s = sin(fctr);
a = c*rb[i] - s*ib[i];
/* F(0)=f(0)+f(1); F(1)=f(0)-f(1) */
/* a,b needed when rl=r2 */
279
280
r2[il = ra[il + a;
r2[i+N21: ra[il - a;
a = s*rb[i] + c*ib[i];
i2[i] = ia[i] + a;
i2[i+N2] = ia[i] - a;
}
freeqd(qa);
freeqd(qb);
if(dir) { /* inverse: divide by log N */
num[0] = num[1] = 2;
divconst(q2, num, q2);
A1.5.2. Cooley-Tukey FFT Algorithm
fit1D(q 1, dir, q2) /* Fast Fourier Transform (1 D) */
int dir; /* dir=l: forward; dir= -1: inverse */
qdP ql, q2; /* Uses bit reversalto avoid recursion */
{ /* and trig recurrence for sin and cos */
int i, j, IogN, N, N1, NN, NN2, itr, offst;
unsigned int a, b, msb;
float *rl, *r2, *il, *i2;
double wr, wi, wpr, wpi, wtemp, theta, tempr, tempi, num[2];
qdP qsrc;
if(q1 == q2) {
qsm = NEWQD;
cpqd(ql, qsrc);
} else qsm =ql;
cpqdinfo(ql, q2);
rl = (float *) qsrc->buf[0];
il = (float *) qsrc->but[1];
r2 = (float *) q2->buf[0];
i2 = (float *) q2->buf[1];
N = ql->width;
N1 =N-l;
for(IogN=0,i=N/2; i; IogN++,i/=2); /* # of bits sig digits in N */
msb = LSB << (IogN-1);
for(i=1; i
a= 1;
b=O;
for(l=O; a && j
if(a & LSB) b I= (msb>>j);
a>>= 1;
}
/* swap complex numbers: [i] <--> [b] */
r2[i] = rl[b]; i2[i] = il[b];
r2[b] = rl[i]; i2[b] = il[i];
}
/* copy elements 0 and N1 since they don't swap */
r210] = rl[O]; i210] = i110];
r2[N1] = rl[N1]; i2[N1] = il[Nl];
/* NN denotes the number of points in the transform.
It grows by a power of 2 with each iteration.
NN2 denotes NN/2 which is used to trivially generate
NN points from NN2 complex numbers.
Computation of the sines and cosines of multiple
angles is made through recurrence relations.
wr is the cosine for the real terms; wi is sine for
281
282
the Imaginary termS.
'/
NN=I;
for(itr=0; ttr
NN2 = NN;
NN <<=1; /*NN*=2*/
theta = -PI2 / NN * dir;
wtemp = sin(.5*theta);
wpr = -2 * wtemp * wtemp;
wpi = sin(theta);
wi =04
for(offst=0; offstfor(i=offst; i
j=i+NN2;
tempr = wr*r2[j] - wi*i2[j];
tempi = wi*r2[I] + wr*i2[];
r2[j] = r2[i] - tempr;
r2[i] = r2111 +ternpr;
i2[j] = i2[i] - ternpi;
i2[i] = i2[i] + tempi;
/*trigonometric recurrence */
wr = (wtemp=wr)*wpr -wi'wpi + wr;
wi = wi*wpr + wtemp*wpi + wi;
}
)
if(dir == -1) { /* inverse transform: divide by N '/
' num[0] = num[1] = N;
divconst(q2, hum, q2);
If(qsrc I= ql) lreeqd(qsrc);
Appendix 2
INTERPOLATING CUBIC SPLINES
The purpose of this appendix is to review the fundamentals of interpolating cubic
splines. We begin by defining a cubic spline in Section A2.1. Since we are dealing with
interpolating splines, constraints are imposed to guarantee that the spline actually passes
through the given data points. These constraints are described in Section A2.2. They
establish a relationship between the known data points and the unknown coefficients used
to completely specify the spline. Due to extra degrees of freedom, the coefficients may
be solved in terms of the first or second derivatives. Both derivations are given in Sec-
tion A2.3. Once the coefficients are expressed in terms of either the first or second
derivatives, these unknown derivatives must be determined. Their solution, using one of
several end conditions, is given in Section A2.4. Finally source code, written in C, is
provided in Section A2.5 to implement cubic spline interpolation for uniformly and
nonuniformly spaced data points.
A2.1. DEFINITION
A cubic spline f (x) interpolating on the partition x0 < x < .. ß < xn-1 is a func-
tion for which f (xt,)=y,¾ It is a piecewise polynomial function that consists of n-i
cubic polynomials ft* defined on the ranges [xk,xk+l]. Furthermore, ft, are joined at
x (k=i,...,n-2) such that f and f[' are continuous. An example of a cubic spline pass-
ing through n data points is illustrated in Fig. A2.1.
The k t polynomial piece, f, is defined over the fixed interval [x,xk+ ] and has the
cubic form
f(x) = A3(x -x/) 3 +A2(x -x,0 2 +A l(X -x/) +A0 (A2.1.1)
283
284
f (x) f5
fo ft f
X0 X 1 X2 X3 X4 X5 X6
Figure A2.1: Cubic spline.
A2.2. CONSTRAINTS
Given only the data points (x,,y,t), we must determine the polynomial coefficients,
A, for each partition such that the resulting polynomials pass through the data points and
are continuous in their first and second derivatives. These conditions require ft, to satisfy
the following constraints
y, = f,t(x,) = A 0 (A2.2.1)
Yk+l = fJt(x,+O = A3A x + A2A x 2 + A A x,t + Ao
y,[ = f(x&) = A i
Y+I = J(xt+l) = 3A3zx& 2 + 2A2Ax. + A 1
(A2.2.2)
y' = f'(xt,) = 2,42 (A2.2.3)
Y$/+t = '+t (xt,) = 6A3ax,+2A2
Note that these conditions apply at the data points (xk,Yk). If the xk's are defined on a
regular grid, they are equally spaced and Axn =xe+i -x = 1. This eliminates all of the
Ax,t terms in the above equations. Consequently, Eqs. (A2.2.1) through (A2.2.3) reduce
to
285
yk = A o (A2.2.4)
Yk+l = A3 +A2 +AI +A0
y = A (A2.2.5)
y+l = 3A3+2A2+A1
y' = 2A 2 (A2.2.6)
Y+i = 6A3 + 2A2
In the remainder of this appendix, we will refrain from making any simplifying assump-
tions about the spacing of the data points in order to treat the more general case.
A2.3. SOLVING FOR THE SPLINE COEFFICIENTS
The conditions given above are used to find A 3, A2, A 1, and A0 which are needed
to define the cubic polynomial piece ft. Isolating the coefficients, we get
A0 = y (A2.3.1)
At = y
1 [3AYt-2y-y+,]
A2 = AXk
i f 2AY ,+ , ]
A3 = AX'- [-- x +Yk Y+iJ
In the expressions for A 2 and A 3, k = 0,..., n-2 and Ay,t = y+l -Y,t.
Y,t+l-A3Ax-yAx,t--y ' (A2.3.2a)
A 2 = AXk 2
A2.3.L Derivation of A2
From (A2.2.1),
From (A2.2.2),
Y+I - 3A 3Ax - y
2A 2 - (A2.3.2b)
Finally, A 2 is derived from (A2.3.2a) and (A2.3.2b)
286
A2.3.2. Derivation of A3
From (A2.2.1),
From (A2.2.2),
y+ - A A X 2 -- yA x -- ye
A3 = AX (A2.3.2c)
y+i - 2A 2/X x: - y
3A 3 = Xk2 (A2.3.2d)
Finally, A 3 is derived from (A2.3.2c) and (A2.3.2d)
Axk j
The equations in (A2.3.1) express the coefficients of f/ in terms of xk, Yk, x/+,
y+, (known) and y, y+ (unknown). Since the expressions in Eqs. (A2.2.1) through
(A2.2.3) present six equations for the four A i coefficients, the A terms could alternately
be expressed in terms of second derivatives, instead of the first derivatives given in Eq.
(A2.3.1). This yields
A0 = y (A2.3.3)
a l A y Ax, [Y,+i + 2y')
Ax 6
y'
A2=-
A3 = 6'-x
A2.3.3. Derivation ofA 1 and A 3
From (A2.2.1),
y+ - A 3Ax 3 - -Ax 2 - y
A1 =
Ax/c
From (A2.2.3),
y+ -y'
A3
6Ax/
Plugging Eq. (A2.3.4b) into (A2.3.4a),
a = Ay AXiy+ _y,,]-½Ax,
Ax 6
(A2.3.4a)
6Axk
(A2.3.4b)
AYk s I y+l + 2y,J (A2.3.4c)
Ax
287
A2.4, EVALUATING THE UNKNOWN DERIVATIVES
Having expressed the cubic polynomial coefficients in terms of data points and
derivatives, the unknown derivatives still remain to be determined. They are typically
not given explicitly. Instead, we may evaluate them from the given constcaims.
Although the spline coefficients require either the first derivatives or the second deriva-
tives, we shall derive both forms for the sake of completenessß
A2.4.1. First Derivatives
We begin by deriving the expressions for the first derivatives using Eqs. (A2.2.1)
through (A2.2.3). Recall that the A coefficients expressed in terms of y' made use of
Eqs. (A2.2.1) and (A2.2.2). We therefore use the remaining constraint, given in Eq.
(A2.2.3), to express the desired relation. Constcalnt Eq. (A2.2.3) defines the second
derivative off at the endpoints of its interval. By establishing that f'[_ (x) = f'(x,0, we
enfome the continuity of the second derivative across the intervals and give rise to a rela-
tion for the first derivatives.
6A3-'Ax,_i +2A - = 2A2 (A2.4.1)
Note that the superscripts refer to the interval of the coefficient. Plugging Eq. (A2.3.1)
into Eq. (A2.4.1) yields
A_i -12 Ayk-1 1 6 Ayk-1 --2y
AXk_i Ak- 1 AXk_ 1
J
4
AXk_ 1 -- AXk_-- AXk
After collecting the y' terms on one side, we have Eq. (A2.4.2):
1 +y,[2[ "'1 + L'I ] +y+! [Ax_l AxkJ
= 3/
Y- -1 L L x_l xj
Equation (A2.4.2) yields a matrix of n-2 equations in n unknowns. We can reduee the
need for division operations by multiplying both sides by AXe4 AXe. This gives us the
following system of equations, with 1
h=Ax and rk=Ay/Ax.
hi 2(ho+h ) ho
h2 2(hi+h2) hl
hn-2 2(hn-3+hn-2) hn-3
y
Y [ 3(roh + rtho)
Y ] 3(rh2+r2h)
_ [ 3 (r n -3 hn -2 - rn -2 hn -3 )
288
When the two end tangent vectors y6 and y_ are specified, then the system of
equations becomes determinable. One of several boundary conditions described later
may be selected to yield the remaining two equations in the matrix.
A2.4.2. Second Derivatives
An alternate, but equivalent, course of action is to determine the spline coefficients
by solving for the unknown second derivatives. This procedure is virtually identical to
the approach given above. Note that while there is no particular benefit in using second
derivatives rather than first derivatives, it is presented here for generality.
As before, we note that the A coefficients expressed in terms of y" made use of Eqs.
(A2.2.1) and (A2.2.3). We therefore use the remaining constraint, given in Eq. (A2.2.2),
to express the desired relation. Constraint Eq. (A2.2.2) defines the first derivative offk at
the endpoints of its interval. By establishing that -1 (xk) =f(xk) we enforce the con-
tinuity of the first derivative across the intervals and give rise to a relation for the second
derivatives.
3A-iAx2-! +2A-iAx,t-I +A -t = A1 (A2.4.3)
Again, the superscripts refer to the interval of the coefficient. Plugging Eq. (A2.3.3) into
Eq. (A2.4.3) yields
AX_i 6 ' [ y'[' + 2y_ =
AX 6 Y+i + 2y
After collecting the y" terms on one side, we have
Ax ax_l j
Equation (A2.4.4) yields the following matrix of n-2 equations in n unknowns. Again,
for notational convenience we let h,t =Ax,t and rk =Aye/Axe.
Share with your friends: |