An fpga implementation of the Smooth Particle Mesh Ewald Reciprocal Sum Compute Engine (rsce)


Figure 3 - 3-D View of the Charge Grid



Download 1.53 Mb.
Page24/25
Date09.08.2017
Size1.53 Mb.
1   ...   17   18   19   20   21   22   23   24   25

Figure 3 - 3-D View of the Charge Grid



Figure 4 - Interpolating a Charge @ (15.369, 8.850, 39.127) in a 2-D Mesh
Table 4 - Interpolating the Charge @ (15.369, 8.850, 39.127) in 3-D (O(P*P*P) Steps)

q[

(

i

+

(

j

+

k

*64)*64*2)+1]

=

θ1[

ith1

+n*4] * θ2[

ith2

+n*4] x θ3[

ith3

+n*4]*q

q[

(

13

+

(

6

+

37

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

1

+n*4] x θ3[

1

+n*4]*q

q[

(

14

+

(

6

+

37

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

1

+n*4] x θ3[

1

+n*4]*q

q[

(

15

+

(

6

+

37

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

1

+n*4] x θ3[

1

+n*4]*q

q[

(

16

+

(

6

+

37

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

1

+n*4] x θ3[

1

+n*4]*q

q[

(

13

+

(

7

+

37

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

2

+n*4] x θ3[

1

+n*4]*q

q[

(

14

+

(

7

+

37

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

2

+n*4] x θ3[

1

+n*4]*q

q[

(

15

+

(

7

+

37

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

2

+n*4] x θ3[

1

+n*4]*q

q[

(

16

+

(

7

+

37

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

2

+n*4] x θ3[

1

+n*4]*q

q[

(

13

+

(

8

+

37

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

3

+n*4] x θ3[

1

+n*4]*q

q[

(

14

+

(

8

+

37

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

3

+n*4] x θ3[

1

+n*4]*q

q[

(

15

+

(

8

+

37

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

3

+n*4] x θ3[

1

+n*4]*q

q[

(

16

+

(

8

+

37

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

3

+n*4] x θ3[

1

+n*4]*q

q[

(

13

+

(

9

+

37

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

4

+n*4] x θ3[

1

+n*4]*q

q[

(

14

+

(

9

+

37

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

4

+n*4] x θ3[

1

+n*4]*q

q[

(

15

+

(

9

+

37

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

4

+n*4] x θ3[

1

+n*4]*q

q[

(

16

+

(

9

+

37

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

4

+n*4] x θ3[

1

+n*4]*q

q[

(

13

+

(

6

+

38

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

1

+n*4] x θ3[

2

+n*4]*q

q[

(

14

+

(

6

+

38

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

1

+n*4] x θ3[

2

+n*4]*q

q[

(

15

+

(

6

+

38

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

1

+n*4] x θ3[

2

+n*4]*q

q[

(

16

+

(

6

+

38

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

1

+n*4] x θ3[

2

+n*4]*q

q[

(

13

+

(

7

+

38

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

2

+n*4] x θ3[

2

+n*4]*q

q[

(

14

+

(

7

+

38

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

2

+n*4] x θ3[

2

+n*4]*q

q[

(

15

+

(

7

+

38

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

2

+n*4] x θ3[

2

+n*4]*q

q[

(

16

+

(

7

+

38

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

2

+n*4] x θ3[

2

+n*4]*q

q[

(

13

+

(

8

+

38

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

3

+n*4] x θ3[

2

+n*4]*q

q[

(

14

+

(

8

+

38

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

3

+n*4] x θ3[

2

+n*4]*q

q[

(

15

+

(

8

+

38

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

3

+n*4] x θ3[

2

+n*4]*q

q[

(

16

+

(

8

+

38

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

3

+n*4] x θ3[

2

+n*4]*q

q[

(

13

+

(

9

+

38

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

4

+n*4] x θ3[

2

+n*4]*q

q[

(

14

+

(

9

+

38

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

4

+n*4] x θ3[

2

+n*4]*q

q[

(

15

+

(

9

+

38

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

4

+n*4] x θ3[

2

+n*4]*q

q[

(

16

+

(

9

+

38

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

4

+n*4] x θ3[

2

+n*4]*q

q[

(

13

+

(

6

+

39

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

1

+n*4] x θ3[

3

+n*4]*q

q[

(

14

+

(

6

+

39

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

1

+n*4] x θ3[

3

+n*4]*q

q[

(

15

+

(

6

+

39

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

1

+n*4] x θ3[

3

+n*4]*q

q[

(

16

+

(

6

+

39

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

1

+n*4] x θ3[

3

+n*4]*q

q[

(

13

+

(

7

+

39

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

2

+n*4] x θ3[

3

+n*4]*q

q[

(

14

+

(

7

+

39

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

2

+n*4] x θ3[

3

+n*4]*q

q[

(

15

+

(

7

+

39

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

2

+n*4] x θ3[

3

+n*4]*q

q[

(

16

+

(

7

+

39

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

2

+n*4] x θ3[

3

+n*4]*q

q[

(

13

+

(

8

+

39

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

3

+n*4] x θ3[

3

+n*4]*q

q[

(

14

+

(

8

+

39

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

3

+n*4] x θ3[

3

+n*4]*q

q[

(

15

+

(

8

+

39

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

3

+n*4] x θ3[

3

+n*4]*q

q[

(

16

+

(

8

+

39

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

3

+n*4] x θ3[

3

+n*4]*q

q[

(

13

+

(

9

+

39

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

4

+n*4] x θ3[

3

+n*4]*q

q[

(

14

+

(

9

+

39

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

4

+n*4] x θ3[

3

+n*4]*q

q[

(

15

+

(

9

+

39

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

4

+n*4] x θ3[

3

+n*4]*q

q[

(

16

+

(

9

+

39

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

4

+n*4] x θ3[

3

+n*4]*q

q[

(

13

+

(

6

+

40

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

1

+n*4] x θ3[

4

+n*4]*q

q[

(

14

+

(

6

+

40

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

1

+n*4] x θ3[

4

+n*4]*q

q[

(

15

+

(

6

+

40

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

1

+n*4] x θ3[

4

+n*4]*q

q[

(

16

+

(

6

+

40

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

1

+n*4] x θ3[

4

+n*4]*q

q[

(

13

+

(

7

+

40

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

2

+n*4] x θ3[

4

+n*4]*q

q[

(

14

+

(

7

+

40

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

2

+n*4] x θ3[

4

+n*4]*q

q[

(

15

+

(

7

+

40

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

2

+n*4] x θ3[

4

+n*4]*q

q[

(

16

+

(

7

+

40

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

2

+n*4] x θ3[

4

+n*4]*q

q[

(

13

+

(

8

+

40

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

3

+n*4] x θ3[

4

+n*4]*q

q[

(

14

+

(

8

+

40

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

3

+n*4] x θ3[

4

+n*4]*q

q[

(

15

+

(

8

+

40

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

3

+n*4] x θ3[

4

+n*4]*q

q[

(

16

+

(

8

+

40

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

3

+n*4] x θ3[

4

+n*4]*q

q[

(

13

+

(

9

+

40

*64)*64*2)+1]

=

θ1[

1

+n*4] * θ2[

4

+n*4] x θ3[

4

+n*4]*q

q[

(

14

+

(

9

+

40

*64)*64*2)+1]

=

θ1[

2

+n*4] * θ2[

4

+n*4] x θ3[

4

+n*4]*q

q[

(

15

+

(

9

+

40

*64)*64*2)+1]

=

θ1[

3

+n*4] * θ2[

4

+n*4] x θ3[

4

+n*4]*q

q[

(

16

+

(

9

+

40

*64)*64*2)+1]

=

θ1[

4

+n*4] * θ2[

4

+n*4] x θ3[

4

+n*4]*q

Program output for charge grid composition:

Inside fill_charge_grid() function in charge_grid.c

N=20739, order=4, nfft1=64, nfft2=64, nfft3=64

nfftdim1=65, nfftdim2=65, nfftdim3=65

theta1_dim1=4, theta2_dim1=4, theta3_dim1=4

q_dim2=65, q_dim3=65

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][32][31] = -0.004407

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][32][31] = -0.017630

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][32][31] = -0.004407

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][32][31] = 0.000000

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][33][31] = -0.046805

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][33][31] = -0.187218

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][33][31] = -0.046805

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][33][31] = 0.000000

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][34][31] = -0.028217

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][34][31] = -0.112868

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][34][31] = -0.028217

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][34][31] = 0.000000

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][35][31] = -0.000389

:

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][34][32] = 0.000000



n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][35][32] = -0.006465

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][35][32] = -0.025858

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][35][32] = -0.006465

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][35][32] = 0.000000

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][32][33] = -0.060405

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][32][33] = -0.241621

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][32][33] = -0.060405

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][32][33] = 0.000000

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][33][33] = -0.641467

:

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][35][33] = 0.000000



n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][32][34] = -0.001803

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][32][34] = -0.007214

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][32][34] = -0.001803

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][32][34] = 0.000000

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][33][34] = -0.019152

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][33][34] = -0.076608

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][33][34] = -0.019152

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][33][34] = 0.000000

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][34][34] = -0.011546

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][34][34] = -0.046185

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][34][34] = -0.011546

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][34][34] = 0.000000

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[30][35][34] = -0.000159

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[31][35][34] = -0.000636

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[32][35][34] = -0.000159

n=1, fr1=32.000000, fr2=34.308041, fr3=33.426081, q[33][35][34] = 0.000000

:

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][32][30] = 0.000005

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][32][30] = 0.000022

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][32][30] = 0.000005

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][32][30] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][33][30] = 0.001768

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][33][30] = 0.007071

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][33][30] = 0.001768

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][33][30] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][34][30] = 0.004306

:

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][35][31] = 0.174262



n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][35][31] = 0.043565

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][35][31] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][32][32] = 0.000592

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][32][32] = 0.002368

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][32][32] = 0.000592

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][32][32] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][33][32] = 0.193902

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][33][32] = 0.775608

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][33][32] = 0.193902

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][33][32] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][34][32] = 0.472327

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][34][32] = 1.889307

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][34][32] = 0.472327

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][34][32] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][35][32] = 0.070553

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][35][32] = 0.282213

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][35][32] = 0.070553

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][35][32] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][32][33] = 0.000054

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][32][33] = 0.000216

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][32][33] = 0.000054

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][32][33] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][33][33] = 0.017691

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][33][33] = 0.070766

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][33][33] = 0.017691

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][33][33] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][34][33] = 0.043095

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][34][33] = 0.172378

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][34][33] = 0.043095

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][34][33] = 0.000000

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[62][35][33] = 0.006437

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[63][35][33] = 0.025749

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[64][35][33] = 0.006437

n=3, fr1=0.000000, fr2=34.831113, fr3=32.683046, q[1][35][33] = 0.000000


3.2.8 Computation of F-1(Q) using 3D-IFFT & Q Array Update

The 3D IFFT is done in fft_back() function in the fftcall.c. It invokes the dynamic library libpubfft.a to perform the inverse 3D-FFT operation. The transformed elements are stored in the original charge array; hence, this is called in-place FFT operation.



There may be a confusion on why the program is calculating the inverse FFT when the equation 4.7 shows that the forward FFT F(Q)(m1, m2, m3) is needed. The reason is that the F(Q)(-m1, -m2, -m3) is mathematically equivalent to F-1(Q) except for the scaling factor. When one thinks about this, the multiplication of F(Q)(m1, m2, m3) and F(Q)(-m1, -m2, -m3) should result in something like:

(X*e) * (X*e-iθ)

=> (Xcosθ + iXsinθ) * (Xcosθ - iXsinθ)

=> X2cos2θ + X2sin2θ
On the other hand, if you calculate the F-1(Q), take out the real and imaginary component and then square them individually, you will get something the equivalent result as if you were calculating the F(Q)(m) * F(Q)(-m):

(X*e-iθ)

=> Real = Xcosθ Imaginary = Xsinθ

=> Real2 + Imaginary2

=> X2cos2θ + X2sin2θ
In the software implementation, the product of F(Q)(m1, m2, m3)*F(Q)(-m1, -m2, -m3) is calculated as “struc2 = d_1 * d_1 + d_2 * d_2”; therefore, either forward or inverse FFT would yield the same product (providing that the FFT function does not perform the 1/NumFFTpt scaling). The reason to implement the inverse FFT instead of forward one is that the reciprocal force calculation needs the F-1(Q).
In do_pmesh_kspace() function in ffcall.c...

fft_back(&q[1], &fftable[1], &ffwork[1], nfft1, nfft2, nfft3, &nfftdim1,

&nfftdim2, &nfftdim3, &nfftable, &nffwork);
In the fft_back() function in fftcalls.c…



Download 1.53 Mb.

Share with your friends:
1   ...   17   18   19   20   21   22   23   24   25




The database is protected by copyright ©ininet.org 2020
send message

    Main page