An FPGA Implementation of the Smooth Particle Mesh Ewald Reciprocal Sum Compute Engine (RSCE)


Characteristic of the RSCE Speedup



Date: 09.08.2017

6.4. Characteristic of the RSCE Speedup


Sections 4.3.1 and 4.3.2 estimate that the single-QMM RSCE provides a speedup of 3x to 14x, and that using multiple QMMs provides an additional speedup with an upper bound of NQ, the number of QMMs the RSCE uses. In both the single-QMM and the multi-QMM cases, the magnitude of the RSCE speedup varies with the values of K, P, and N. Depending on the simulation setting (K, P, and N), the total speedup of the multi-QMM RSCE can therefore range from a worst case of 3x to a best case of slightly less than NQ×14x. The following table shows how the simulation setting affects the single-QMM and the multi-QMM speedup.

Table 18 - Variation of Speedup with different N, P and K.




Parameter | Single-QMM Speedup | Multi-QMM Speedup | MD Sim. Accuracy
Number of Particles: N ↑ | ↓ | ↑ | N/A
Interpolation Order: P ↑ | ↓ | ↑ | ↑
Grid Size: K ↑ | ↑ | ↓ | ↑






As indicated in the table, the single-QMM RSCE provides the best speedup when N is small, P is small, and K is large, whereas the multi-QMM RSCE provides the best speedup when N is large, P is large, and K is small. In a typical MD simulation, the user chooses the grid size K and the interpolation order P to control the simulation accuracy: a higher grid density and a higher interpolation order P lead to a more accurate MD simulation. To obtain a more accurate result, the values of K and P should therefore be increased accordingly. Furthermore, as stated by Darden [2], in an MD simulation using the PME algorithm, for any desired accuracy the total number of grid points (KX×KY×KZ) should be of the same order as the number of particles N. Hence, the values of K, P, and N should be related to one another.


When N is of the same order as KX×KY×KZ, using NQ QMMs provides a consistent speedup of close to NQ. As shown in Figure 52, when KX×KY×KZ is of the same order as N, the speedup obtained with NQ=4 QMMs is affected only by the interpolation order P. For example, with an interpolation order P of 4, four QMMs provide a speedup of 3.58. In Figure 52, N is simply calculated as K×K×K.


Figure 52 - Effect of the Interpolation Order P on Multi-QMM RSCE Speedup (When K×K×K is of the Same Order as N)
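
One way to read "a speedup of close to NQ" is as a parallel efficiency, that is, the fraction of the ideal NQ-fold speedup actually achieved. The following is a minimal sketch of that arithmetic using the single data point quoted above (a speedup of 3.58 with four QMMs at P = 4); the function name is illustrative and does not come from the original work.

```python
def parallel_efficiency(multi_qmm_speedup: float, num_qmm: int) -> float:
    """Fraction of the ideal NQ-fold speedup that is actually achieved."""
    if num_qmm < 1:
        raise ValueError("num_qmm must be at least 1")
    return multi_qmm_speedup / num_qmm


# Value quoted in the text: four QMMs at interpolation order P = 4.
efficiency = parallel_efficiency(3.58, 4)
print(f"Four-QMM efficiency at P = 4: {efficiency:.3f}")
```

Under this reading, the four-QMM configuration retains roughly nine tenths of the ideal speedup at P = 4.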
Based on the assumption that the number of grid points is of the same order as the number of particles, and on the simulation results presented in Table 16 and Figures 49 to 52, the worst-case speedup of a single-QMM RSCE can be estimated as 3-fold, while that of the multi-QMM RSCE can be approximated as (NQ-1)×3. The factor (NQ-1) is obtained by locating the worst-case speedup in Figure 49 for the cases where K×K×K is of the same order as N. For example, when K=32 (K×K×K=32768), the range of N searched for the worst-case speedup is 10000 to 99999.
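
The worst-case estimate and the search range for N can be sketched in a few lines. This is an illustrative sketch, not the author's procedure: it assumes that "of the same order" means "having the same number of decimal digits", which is inferred from the K=32 example (32768 grid points, N searched from 10000 to 99999). The function names are hypothetical.

```python
def same_order_particle_range(grid_size_k: int) -> tuple[int, int]:
    """Range of N with the same number of decimal digits as K*K*K.

    Assumption: "same order" is read as "same number of decimal digits".
    """
    if grid_size_k < 1:
        raise ValueError("grid_size_k must be at least 1")
    grid_points = grid_size_k ** 3
    digits = len(str(grid_points))
    return 10 ** (digits - 1), 10 ** digits - 1


def worst_case_speedup(num_qmm: int, single_qmm_worst: float = 3.0) -> float:
    """Worst-case estimate from the text: 3x for one QMM, (NQ - 1) * 3 otherwise."""
    if num_qmm < 1:
        raise ValueError("num_qmm must be at least 1")
    if num_qmm == 1:
        return single_qmm_worst
    return (num_qmm - 1) * single_qmm_worst


low, high = same_order_particle_range(32)
print(f"K = 32: search N from {low} to {high}")
print(f"Worst-case estimate for NQ = 4: {worst_case_speedup(4)}x")
```

For K=32 the helper reproduces the range quoted above, and for NQ=4 the (NQ-1)×3 rule gives a worst-case estimate of 9x.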
To show the overall speedup of the multi-QMM RSCE over the software implementation running on a 2.4 GHz Intel P4 computer, the single-QMM RSCE speedup data from Table 16 and the data used to plot Figures 49 and 50 are used to estimate the overall speedup. The 4-QMM RSCE speedup results are shown in Table 19. As seen in the table, the 4-QMM RSCE provides a 14x to 20x speedup over the software implementation for the simulated cases. Note that when no relationship is imposed among the values of K, P, and N, the multi-QMM RSCE raises the lower bound of the speedup from 3x to 14x but does not raise the maximum speedup significantly (from 14x to 20x). This is because the single-QMM RSCE speedup and the multi-QMM RSCE speedup vary differently with the three simulation parameters (K, P, and N), as shown in Table 18.

Table 19 - Speedup Estimation (Four-QMM RSCE vs. P4 SPME)




N | P | K | Single-QMM Speedup against Software Running @ Intel P4 | Four-QMM Speedup against Single-QMM | Four-QMM Speedup against Software
2000 | 4 | 32 | 8.32 | 1.99 | 16x
2000 | 4 | 64 | 12.65 | 1.44 | 18x
2000 | 4 | 128 | 14.65 | 1.37 | 20x
2000 | 8 | 32 | 4.60 | 3.26 | 15x
2000 | 8 | 64 | 8.92 | 1.99 | 17x
2000 | 8 | 128 | 12.40 | 1.44 | 17x
20000 | 4 | 32 | 5.44 | 3.37 | 18x
20000 | 4 | 64 | 6.97 | 2.10 | 14x
20000 | 4 | 128 | 10.70 | 1.46 | 15x
20000 | 8 | 32 | 3.72 | 3.90 | 14x
20000 | 8 | 64 | 5.17 | 3.37 | 17x
20000 | 8 | 128 | 7.94 | 2.10 | 16x
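
The last column of Table 19 appears to be the product of the two preceding columns (single-QMM speedup over software, multiplied by four-QMM speedup over single-QMM). The sketch below recomputes that product for every row and compares it with the reported value. The function names and the tolerance of 1.0 are assumptions made for illustration; the tolerance allows for the reported values being given as whole numbers.

```python
# Rows copied from Table 19:
# (N, P, K, single-QMM vs. software, four-QMM vs. single-QMM, reported four-QMM vs. software)
TABLE_19 = [
    (2000, 4, 32, 8.32, 1.99, 16),
    (2000, 4, 64, 12.65, 1.44, 18),
    (2000, 4, 128, 14.65, 1.37, 20),
    (2000, 8, 32, 4.60, 3.26, 15),
    (2000, 8, 64, 8.92, 1.99, 17),
    (2000, 8, 128, 12.40, 1.44, 17),
    (20000, 4, 32, 5.44, 3.37, 18),
    (20000, 4, 64, 6.97, 2.10, 14),
    (20000, 4, 128, 10.70, 1.46, 15),
    (20000, 8, 32, 3.72, 3.90, 14),
    (20000, 8, 64, 5.17, 3.37, 17),
    (20000, 8, 128, 7.94, 2.10, 16),
]


def overall_speedup(single_vs_software: float, four_vs_single: float) -> float:
    """Combine the two relative speedups into a speedup over software."""
    return single_vs_software * four_vs_single


def find_mismatches(rows, tolerance: float = 1.0):
    """Return rows whose recomputed product is not within `tolerance` of the reported value."""
    mismatches = []
    for n, p, k, single, four, reported in rows:
        product = overall_speedup(single, four)
        if abs(product - reported) >= tolerance:
            mismatches.append((n, p, k, product, reported))
    return mismatches


for n, p, k, single, four, reported in TABLE_19:
    product = overall_speedup(single, four)
    print(f"N={n:>5} P={p} K={k:>3}: {single} x {four} = {product:.2f} (reported {reported}x)")

print("Rows outside tolerance:", find_mismatches(TABLE_19))
```

The reported overall speedups span 14x to 20x, which matches the range stated in the text.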




