6.4. Characteristic of the RSCE Speedup
In Section 4.3.1 and Section 4.3.2, the singleQMM RSCE is estimated to provide a speedup ranging from 3x to 14x, while a multiQMM RSCE provides an additional speedup with an upper bound of N_{Q}, the number of QMMs the RSCE uses. In both the singleQMM and the multiQMM cases, the magnitude of the RSCE speedup varies with the values of K, P, and N. Therefore, depending on the simulation setting (K, P, and N), the total speedup of the multiQMM RSCE can range from a worst case of 3x to a best case of slightly less than N_{Q}×14x. The following table shows how the simulation setting affects the singleQMM and the multiQMM speedup.
Table 18  Variation of Speedup with Different N, P and K

Parameter Increased            SingleQMM Speedup    MultiQMM Speedup    MD Sim. Accuracy
Number of Particles: N ↑       ↓                    ↑                   N/A
Interpolation Order: P ↑       ↓                    ↑                   ↑
Grid Size: K ↑                 ↑                    ↓                   ↑

As indicated in the table, the singleQMM RSCE provides the best speedup when N is small, P is small, and K is large. On the other hand, the multiQMM RSCE provides the best speedup when N is large, P is large, and K is small. In a typical MD simulation, the user chooses the grid size K and the interpolation order P to control the simulation accuracy. A higher grid density and a higher interpolation order P lead to a more accurate MD simulation. Therefore, to obtain a more accurate simulation result, the values of K and P should be increased correspondingly. Furthermore, as stated by Darden [2], in an MD simulation using the PME algorithm, for any desired accuracy, the total number of grid points (K_{X}×K_{Y}×K_{Z}) should be of the same order as the number of particles N. Hence, the values of K, P, and N should be related to one another.
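The best-case settings described above can be read mechanically from Table 18. The following is a minimal sketch, not part of the original work: the dictionary `TABLE_18`, its keys, and the helper `best_setting` are hypothetical names introduced only for illustration, and the entries are transcribed from the table (+1 for ↑, -1 for ↓, None for N/A).

```python
# Trend of each quantity when the named parameter increases (from Table 18).
# +1 = increases, -1 = decreases, None = not applicable.
# All names here are illustrative assumptions, not from the original text.
TABLE_18 = {
    "N": {"single_qmm": -1, "multi_qmm": +1, "accuracy": None},
    "P": {"single_qmm": -1, "multi_qmm": +1, "accuracy": +1},
    "K": {"single_qmm": +1, "multi_qmm": -1, "accuracy": +1},
}


def best_setting(column):
    """For each parameter, report whether it should be 'large' or 'small'
    to maximize the given column of Table 18."""
    result = {}
    for param, trends in TABLE_18.items():
        trend = trends[column]
        if trend is None:
            continue  # the parameter has no stated effect on this column
        result[param] = "large" if trend > 0 else "small"
    return result


def opposing_parameters():
    """Parameters whose increase moves the two speedups in opposite directions."""
    return [p for p, t in TABLE_18.items()
            if t["single_qmm"] * t["multi_qmm"] < 0]


if __name__ == "__main__":
    print("singleQMM best:", best_setting("single_qmm"))
    print("multiQMM best: ", best_setting("multi_qmm"))
    print("opposing trends:", opposing_parameters())
```

In this encoding every parameter moves the singleQMM and the multiQMM speedup in opposite directions, which is the tension the remainder of this section discusses.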
When N is of the same order as K_{X}×K_{Y}×K_{Z}, the usage of N_{Q} QMMs provides a consistent speedup of close to N_{Q}. As shown in Figure 52, when K_{X}×K_{Y}×K_{Z} is of the same order as N, the speedup with N_{Q}=4 QMMs is affected only by the interpolation order P. For example, with an interpolation order P of 4, the usage of four QMMs provides a speedup of 3.58. In Figure 52, N is simply calculated as K×K×K.
Figure 52  Effect of the Interpolation Order P on MultiQMM RSCE Speedup
(When K×K×K is of the Same Order as N)
Based on the assumption that the number of grid points is of the same order as the number of particles, and on the simulation results presented in Table 16 and Figures 49 to 52, the worst-case speedup of a singleQMM RSCE can be estimated to be 3-fold, while that of the multiQMM RSCE can be approximated to be around (N_{Q}-1)×3. The factor (N_{Q}-1) is approximated by locating the worst-case speedup in Figure 49 when K×K×K is of the same order as N. For example, when K=32 (K×K×K=32768), the range of N used to search for the worst-case speedup is from 10000 to 99999.
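The search range in the K=32 example can be reproduced with a short sketch. It assumes that "of the same order" means "having the same number of decimal digits", which is consistent with the 10000 to 99999 range quoted for K×K×K=32768 but is an interpretation rather than something the text states explicitly. The function name `same_order_range` is hypothetical.

```python
def same_order_range(k):
    """Return (grid_points, n_low, n_high) for a cubic K x K x K grid.

    Assumption: N is "of the same order" as the grid-point count when it
    has the same number of decimal digits.
    """
    grid_points = k ** 3
    digits = len(str(grid_points))   # number of decimal digits of K^3
    n_low = 10 ** (digits - 1)       # smallest integer with that many digits
    n_high = 10 ** digits - 1        # largest integer with that many digits
    return grid_points, n_low, n_high


if __name__ == "__main__":
    for k in (32, 64, 128):
        grid_points, n_low, n_high = same_order_range(k)
        print(f"K={k}: K^3={grid_points}, search N in [{n_low}, {n_high}]")
```

Under this reading, K=64 and K=128 would correspond to search ranges of 100000 to 999999 and 1000000 to 9999999 respectively; the text only confirms the K=32 case.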
To show the overall speedup of the multiQMM RSCE against the software implementation running on a 2.4 GHz Intel P4 computer, the singleQMM RSCE speedup data from Table 16 and the data used to plot Figures 49 and 50 are used to estimate the overall speedup. The 4QMM RSCE speedup results are shown in Table 19. As seen in the table, the 4QMM RSCE can provide a 14x to 20x speedup over the software implementation (based on the simulated cases). It is worth noting that when there is no relationship among the values of K, P, and N, the multiQMM RSCE raises the speedup lower bound from 3x to 14x but does not increase the maximum speedup significantly (from 14x to 20x). This is because the singleQMM RSCE speedup and the multiQMM RSCE speedup vary differently with the three simulation parameters (K, P, and N). The difference is shown in Table 18.
Table 19  Speedup Estimation (FourQMM RSCE vs. P4 SPME)

N        P    K      SingleQMM Speedup     FourQMM Speedup      FourQMM Speedup
                     against Software      against SingleQMM    against Software
                     Running @ Intel P4
2000     4    32     8.32                  1.99                 16x
2000     4    64     12.65                 1.44                 18x
2000     4    128    14.65                 1.37                 20x
2000     8    32     4.60                  3.26                 15x
2000     8    64     8.92                  1.99                 17x
2000     8    128    12.40                 1.44                 17x

20000    4    32     5.44                  3.37                 18x
20000    4    64     6.97                  2.10                 14x
20000    4    128    10.70                 1.46                 15x
20000    8    32     3.72                  3.90                 14x
20000    8    64     5.17                  3.37                 17x
20000    8    128    7.94                  2.10                 16x
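The last column of Table 19 appears to be the product of the two preceding columns. The sketch below checks that reading; it is an illustration rather than the author's procedure, and the names `ROWS` and `overall_speedup` are hypothetical. Because the tabulated inputs are already rounded, the product is only expected to match the reported whole-number speedup to within 1x.

```python
# (N, P, K, single_vs_software, four_vs_single, reported_four_vs_software)
# Values transcribed from Table 19.
ROWS = [
    (2000, 4, 32, 8.32, 1.99, 16),
    (2000, 4, 64, 12.65, 1.44, 18),
    (2000, 4, 128, 14.65, 1.37, 20),
    (2000, 8, 32, 4.60, 3.26, 15),
    (2000, 8, 64, 8.92, 1.99, 17),
    (2000, 8, 128, 12.40, 1.44, 17),
    (20000, 4, 32, 5.44, 3.37, 18),
    (20000, 4, 64, 6.97, 2.10, 14),
    (20000, 4, 128, 10.70, 1.46, 15),
    (20000, 8, 32, 3.72, 3.90, 14),
    (20000, 8, 64, 5.17, 3.37, 17),
    (20000, 8, 128, 7.94, 2.10, 16),
]


def overall_speedup(single_vs_software, four_vs_single):
    """Assumed composition: overall speedup = singleQMM speedup x multiQMM gain."""
    return single_vs_software * four_vs_single


if __name__ == "__main__":
    for n, p, k, single, four, reported in ROWS:
        estimate = overall_speedup(single, four)
        print(f"N={n:<6} P={p} K={k:<4} product={estimate:6.2f}  reported={reported}x")
    reported_values = [row[5] for row in ROWS]
    print("reported range:", min(reported_values), "to", max(reported_values))
```

The rows also illustrate the trade-off noted above: the settings with the highest singleQMM speedup (large K, small N and P) have the lowest FourQMM gain, which keeps the overall range narrow.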
