An FPGA Implementation of the Smooth Particle Mesh Ewald Reciprocal Sum Compute Engine (RSCE)

6.4. Characteristic of the RSCE Speedup

In Section 4.3.1 and Section 4.3.2, it is estimated that the single-QMM RSCE can provide a speedup ranging from 3x to 14x, while using multiple QMMs can provide an additional speedup with an upper bound of N_Q, the number of QMMs the RSCE uses. In both the single-QMM and the multi-QMM cases, the magnitude of the RSCE speedup varies with the values of K, P, and N. Therefore, depending on the simulation setting (K, P, and N), the total speedup of the multi-QMM RSCE can range from a worst case of 3x to a best case of slightly less than N_Q×14x. The following table shows how the simulation setting affects the single-QMM and the multi-QMM speedup.

Table 18 - Variation of Speedup with Different N, P and K

Parameter Increased	Single-QMM Speedup	Multi-QMM Speedup	MD Sim. Accuracy
Number of Particles: N ↑	↓	↑	N/A
Interpolation Order: P ↑	↓	↑	↑
Grid Size: K ↑	↑	↓	↑
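
The range quoted before Table 18 follows from two numbers given in the text: the single-QMM speedup range (3x to 14x) and the upper bound N_Q on the additional multi-QMM speedup. The following is a minimal sketch of that arithmetic; the function name and its default arguments are illustrative choices, not part of the RSCE design.

```python
def total_speedup_bounds(n_q, single_min=3.0, single_max=14.0):
    """Return (worst_case, best_case_limit) for an RSCE with n_q QMMs.

    worst_case: the single-QMM lower bound, assuming the extra QMMs
    contribute no additional speedup.
    best_case_limit: n_q * single_max; the text states that the real best
    case is slightly below this value.
    """
    if n_q < 1:
        raise ValueError("n_q must be at least 1")
    return single_min, n_q * single_max


if __name__ == "__main__":
    for n_q in (1, 2, 4):
        worst, limit = total_speedup_bounds(n_q)
        print(f"N_Q={n_q}: {worst:g}x up to just under {limit:g}x")
```

For N_Q = 4 this gives a theoretical window of 3x to just under 56x. The simulated cases reported later in this section fall well inside that window.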

As indicated in the table, the single-QMM RSCE provides the best speedup when N is small, P is small, and K is large. On the other hand, the multi-QMM RSCE provides the best speedup when N is large, P is large, and K is small. In a typical MD simulation, the user chooses the grid size K and the interpolation order P to control the simulation accuracy. A higher grid density and a higher interpolation order P lead to a more accurate MD simulation. Therefore, to obtain an accurate simulation result, the values of K and P should both be increased. Furthermore, as stated by Darden [2], in an MD simulation using the PME algorithm, for any desired accuracy, the total number of grid points (K_X×K_Y×K_Z) should be of the same order as the number of particles N. Hence, the values of K, P, and N should be related to one another.

When N is of the same order as K_X×K_Y×K_Z, using N_Q QMMs provides a consistent speedup of close to N_Q. As shown in Figure 52, when K_X×K_Y×K_Z is of the same order as N, the speedup with N_Q=4 QMMs is affected only by the interpolation order P. For example, with an interpolation order P of 4, using four QMMs provides a speedup of 3.58. In Figure 52, N is simply calculated as K×K×K.

Figure 52 - Effect of the Interpolation Order P on Multi-QMM RSCE Speedup (When K×K×K is of the Same Order as N)

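
One way to read "a consistent speedup of close to N_Q" is as parallel efficiency, that is, the measured multi-QMM speedup divided by the number of QMMs. This metric is not used in the source; the sketch below applies it only to the one data point quoted in the text (a speedup of 3.58 with four QMMs at P = 4).

```python
def parallel_efficiency(multi_qmm_speedup, n_q):
    """Fraction of the ideal n_q-fold speedup that is actually achieved."""
    if n_q < 1:
        raise ValueError("n_q must be at least 1")
    return multi_qmm_speedup / n_q


if __name__ == "__main__":
    # Data point quoted in the text for Figure 52: P = 4, N_Q = 4.
    eff = parallel_efficiency(3.58, 4)
    print(f"efficiency at P=4 with four QMMs: {eff:.1%}")
```

An efficiency of roughly 90% is consistent with the statement that the four-QMM speedup stays close to N_Q when N tracks K×K×K.
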
Based on the assumption that the number of grid points is of the same order as the number of particles, and on the simulation results presented in Table 16 and Figures 49 to 52, the worst-case speedup of a single-QMM RSCE can be estimated to be 3-fold, while that of the multi-QMM RSCE can be approximated as (N_Q-1)×3. The factor (N_Q-1) is obtained by locating the worst-case speedup in Figure 49 over the range where K×K×K is of the same order as N. For example, when K=32 (K×K×K=32768), the range of N searched for the worst-case speedup is from 10000 to 99999.
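
The search range and the worst-case estimate described above can be sketched as follows. The sketch assumes that "of the same order" means "having the same number of decimal digits", which reproduces the K = 32 example (10000 to 99999) but is an interpretation rather than a definition given in the text. Both function names are illustrative.

```python
def same_order_range(k):
    """Return (grid_points, n_low, n_high) for a cubic K x K x K grid.

    Assumption: N is "of the same order" as the grid-point count when it
    has the same number of decimal digits.
    """
    grid_points = k ** 3
    digits = len(str(grid_points))
    return grid_points, 10 ** (digits - 1), 10 ** digits - 1


def worst_case_multi_qmm_speedup(n_q, single_worst=3.0):
    """Worst-case estimate from the text: (N_Q - 1) x 3."""
    if n_q < 1:
        raise ValueError("n_q must be at least 1")
    return (n_q - 1) * single_worst


if __name__ == "__main__":
    for k in (32, 64, 128):
        points, low, high = same_order_range(k)
        print(f"K={k}: {points} grid points, search N in [{low}, {high}]")
    print("worst case for N_Q=4:", worst_case_multi_qmm_speedup(4))
```
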
To estimate the overall speedup of the multi-QMM RSCE against the software implementation running on the 2.4 GHz Intel P4 computer, the single-QMM RSCE speedup data from Table 16 and the data used to plot Figures 49 and 50 are used. The 4-QMM RSCE speedup results are shown in Table 19. As seen in the table, the 4-QMM RSCE can provide a 14x to 20x speedup over the software implementation (based on the simulated cases). It is worth noting that when there is no relationship among the values of K, P, and N, the multi-QMM RSCE raises the speedup lower bound from 3x to 14x but does not raise the maximum speedup significantly (from 14x to only 20x). This is because the single-QMM RSCE speedup and the multi-QMM RSCE speedup respond in opposite directions to the three simulation parameters (K, P, and N), as shown in Table 18.

Table 19 - Speedup Estimation (Four-QMM RSCE vs. P4 SPME)

N	P	K	Single-QMM Speedup against Software Running @ Intel P4	Four-QMM Speedup against Single-QMM	Four-QMM Speedup against Software
2000	4	32	8.32	1.99	16x
2000	4	64	12.65	1.44	18x
2000	4	128	14.65	1.37	20x
2000	8	32	4.60	3.26	15x
2000	8	64	8.92	1.99	17x
2000	8	128	12.40	1.44	17x
20000	4	32	5.44	3.37	18x
20000	4	64	6.97	2.10	14x
20000	4	128	10.70	1.46	15x
20000	8	32	3.72	3.90	14x
20000	8	64	5.17	3.37	17x
20000	8	128	7.94	2.10	16x
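
The last column of Table 19 appears to be the product of the two preceding columns, reported as a whole number. The source does not state this explicitly, so the sketch below treats it as an assumption and only checks that each product agrees with the reported value to within 1x; small differences are expected because the inputs in the table are themselves rounded.

```python
# Rows of Table 19: (N, P, K, single-QMM speedup vs. software,
#                    four-QMM speedup vs. single-QMM, reported overall speedup)
ROWS = [
    (2000, 4, 32, 8.32, 1.99, 16),
    (2000, 4, 64, 12.65, 1.44, 18),
    (2000, 4, 128, 14.65, 1.37, 20),
    (2000, 8, 32, 4.60, 3.26, 15),
    (2000, 8, 64, 8.92, 1.99, 17),
    (2000, 8, 128, 12.40, 1.44, 17),
    (20000, 4, 32, 5.44, 3.37, 18),
    (20000, 4, 64, 6.97, 2.10, 14),
    (20000, 4, 128, 10.70, 1.46, 15),
    (20000, 8, 32, 3.72, 3.90, 14),
    (20000, 8, 64, 5.17, 3.37, 17),
    (20000, 8, 128, 7.94, 2.10, 16),
]


def overall_speedup(single_vs_software, four_vs_single):
    """Assumed relation: the overall speedup is the product of both factors."""
    return single_vs_software * four_vs_single


def max_deviation(rows):
    """Largest gap between the computed product and the reported value."""
    return max(abs(overall_speedup(s, m) - reported)
               for (_n, _p, _k, s, m, reported) in rows)


if __name__ == "__main__":
    for n, p, k, s, m, reported in ROWS:
        product = overall_speedup(s, m)
        print(f"N={n:<6} P={p} K={k:<4} product={product:6.2f} reported={reported}x")
    print(f"largest deviation: {max_deviation(ROWS):.2f}")
```

The reported values span 14x to 20x, matching the range quoted in the text, even though the single-QMM factor alone varies from 3.72 to 14.65 across the same rows.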
