This section describes the steps used to calculate the SPME reciprocal sum. These steps closely follow those of the software SPME program written by A. Toukmaji [8]. For a detailed discussion of the software implementation, please refer to Appendix B: Software Implementation of the SPME Reciprocal Sum Calculation.
Table 2 describes the steps that the RSCE takes to calculate the reciprocal energy and forces. For easy reference, each step number is also marked with a dashed circle in the architectural diagram of the RSCE (Figure 12). In Table 2, K_{1}, K_{2}, and K_{3} are the mesh dimensions, N is the number of particles, and P is the interpolation order. The steps assume a single RSCE working with a single host computer. The strategy is to let the host computer perform the complicated calculations that need to be performed only once at startup, as well as those whose complexity is O(N); the remaining steps are performed on the FPGA.
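As an illustration of the O(N) preprocessing left to the host, the following Python sketch mimics step 5 under several assumptions of our own: an orthorhombic box (so each fractional coordinate is simply x/L), "shifted" read as wrapping the scaled coordinate periodically into [0, K), and the fractional part truncated to Precision_coord bits so that it can serve as a lookup address. The function name `fractional_coordinates` and its argument layout are hypothetical and do not describe the actual PIM memory format.

```python
def fractional_coordinates(positions, box, grid, frac_bits):
    """Hypothetical model of step 5 for an orthorhombic box (an assumption).

    positions: list of (x, y, z) Cartesian coordinates
    box:       (Lx, Ly, Lz) box edge lengths
    grid:      (K1, K2, K3) mesh size
    frac_bits: bits kept for the fractional part (plays the role of Precision_coord)

    Returns one ((i1, i2, i3), (f1, f2, f3)) pair per particle: the integer grid
    index and the fractional part truncated to frac_bits bits.
    """
    scale = 1 << frac_bits
    result = []
    for r in positions:
        idx, frac = [], []
        for x, length, k in zip(r, box, grid):
            # Scale into grid units and wrap periodically into [0, k);
            # float rounding can occasionally produce exactly k.
            u = (k * (x / length)) % k
            whole = int(u)                          # u >= 0, so int() acts as floor
            idx.append(whole % k)                   # maps the u == k corner case to 0
            frac.append(int((u - whole) * scale))   # truncate to frac_bits bits
        result.append((tuple(idx), tuple(frac)))
    return result
```

For example, `fractional_coordinates([(2.5, -1.25, 3.125)], (10.0, 10.0, 10.0), (8, 8, 8), 4)` returns `[((2, 7, 2), (0, 0, 8))]`: the negative y coordinate wraps to grid index 7, and the z coordinate, 2.5 grid units, splits into index 2 and a fractional part of 8/16.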
By analyzing the complexity order listed for each step in Table 2, it can be concluded that most of the computation time is spent in steps 7 (mesh composition), 8 (3D-IFFT), 11 (3D-FFT), and 12 (force computation). The computational complexity of steps 8 and 11 depends on the mesh size (K_{1}, K_{2}, and K_{3}), while that of steps 7 and 12 depends mainly on the number of particles N and the interpolation order P. Both the number of grid points (mesh size) and the interpolation order affect the accuracy of the energy and force calculations: more grid points and a higher interpolation order lead to a more accurate result. Furthermore, the total number of grid points K_{1}×K_{2}×K_{3} should be kept directly proportional to the number of particles N, so that the grid spacing, and hence the accuracy, stays roughly constant as the system grows.
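The dominance of these four steps can be checked with a rough cost model. The sketch below simply evaluates the complexity orders of the repeated FPGA steps in Table 2, ignoring constant factors. The base-2 logarithm for the FFT steps and the N×P×P×P cost assumed for the force computation (by analogy with mesh composition) are our assumptions, and the example values of N, the mesh size, and P are illustrative only.

```python
import math


def step_costs(n: int, grid: tuple[int, int, int], p: int) -> dict[str, float]:
    """Rough operation counts for the repeated FPGA steps (constant factors ignored)."""
    k_total = grid[0] * grid[1] * grid[2]
    fft_cost = k_total * math.log2(k_total)  # K1*K2*K3*log(K1*K2*K3), base 2 assumed
    return {
        "6 BCC": 3 * n * p,
        "7 MC": n * p ** 3,
        "8 3D-IFFT": fft_cost,
        "9 EC": k_total,
        "10 BCC (with derivatives)": 2 * 3 * n * p,
        "11 3D-FFT": fft_cost,
        "12 FC": n * p ** 3,  # assumption: same scaling as mesh composition
    }


if __name__ == "__main__":
    # Illustrative values only: 20000 particles, a 64x64x64 mesh, P = 4.
    costs = step_costs(20000, (64, 64, 64), 4)
    for step, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{step:28s} {cost:>12.0f}")
```

With these example values the two FFT steps rank first and the two P×P×P steps next, in line with the conclusion above.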
Table 2 - Steps for SPME Reciprocal Sum Calculation

| Step | Frequency | Where | Module | Operation | Order |
|---|---|---|---|---|---|
| 2 | startup | Host | | Computes the B-Spline coefficients and their derivatives for all possible lookup fractional coordinate values and stores them in the BLM memory. | 2^{Precision_coord} |
| 3 | startup | Host | | Computes the energy terms etm(m_{1}, m_{2}, m_{3}) for all grid points needed in the energy calculation and stores them in the ETM memory. | K_{1}×K_{2}×K_{3} |
| 4 | repeat | Host | | Loads or updates the x, y, and z Cartesian coordinates of all particles. | 3×N |
| 5 | repeat | Host | | Computes scaled and shifted fractional coordinates for all particles and loads them into the upper half of the PIM memory. Also zeros all entries in the QMMR and QMMI memories. | 3×N |
| 6 | repeat | FPGA | BCC | Performs lookup and computes the B-Spline coefficients for all particles in the x, y, and z directions. The values of the B-Spline coefficients depend on the fractional part of the particle coordinates. | 3×N×P |
| 7 | repeat | FPGA | MC | Composes the grid charge array using the computed coefficients. The grid point location is derived from the integer part of the coordinate. Calculated values are stored in the QMMR memory. | N×P×P×P |
| 8 | repeat | FPGA | 3D-FFT | Computes F^{-1}(Q) by performing the inverse FFT on each row in each direction. The transformed values are stored in the QMMR and QMMI memories. | K_{1}×K_{2}×K_{3}×Log(K_{1}×K_{2}×K_{3}) |
| 9 | repeat | FPGA | EC | Goes through each grid point to compute the reciprocal energy and update the QMM memories. It uses the grid index to look up the values of the energy terms. | K_{1}×K_{2}×K_{3} |
| 10 | repeat | FPGA | BCC | Performs lookup and computes the B-Spline coefficients and their derivatives for all particles in the x, y, and z directions. | 2×3×N×P |
| 11 | repeat | FPGA | 3D-FFT | Computes the forward transform F(Q) and loads the values into the grid charge array QMMR. In this step, QMMI should contain all zeros. | K_{1}×K_{2}×K_{3}×Log(K_{1}×K_{2}×K_{3}) |
| 12 | repeat | FPGA | FC | Goes through all particles, identifies their interpolated grid points, and computes the reciprocal forces in the x, y, and z directions. The forces are stored in the lower half of the PIM memory. | |
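Steps 2, 6, and 7 can be modelled in software as follows. This is a Python sketch under our own assumptions, not a description of the BCC or MC hardware: the BLM contents are modelled as a list indexed by the quantized fractional coordinate (the hypothetical `frac_bits` parameter plays the role of Precision_coord), each particle is given as an (integer grid index, quantized fraction) pair, the coefficients come from the standard cardinal B-spline recursion used in SPME, and the table is used by plain lookup, without modelling any further arithmetic the BCC may perform. All function names are hypothetical. The derivatives stored alongside the coefficients correspond to step 10 and are not used by the mesh composition.

```python
def cardinal_bspline(u: float, order: int) -> float:
    """Cardinal B-spline M_order(u), supported on [0, order]; order >= 2."""
    if order == 2:
        return 1.0 - abs(u - 1.0) if 0.0 <= u <= 2.0 else 0.0
    return (u * cardinal_bspline(u, order - 1)
            + (order - u) * cardinal_bspline(u - 1.0, order - 1)) / (order - 1)


def build_blm_table(order: int, frac_bits: int):
    """Step 2 model: coefficients and derivatives for every quantized fraction.

    Entry f holds (theta, dtheta) for w = f / 2**frac_bits, with
    theta[j] = M_order(w + j) and dtheta[j] = dM_order/du evaluated at w + j.
    Requires order >= 3 so that the derivative recursion is defined.
    """
    steps = 1 << frac_bits
    table = []
    for f in range(steps):
        w = f / steps
        theta = [cardinal_bspline(w + j, order) for j in range(order)]
        # dM_n(u)/du = M_{n-1}(u) - M_{n-1}(u - 1)
        dtheta = [cardinal_bspline(w + j, order - 1)
                  - cardinal_bspline(w + j - 1.0, order - 1) for j in range(order)]
        table.append((theta, dtheta))
    return table


def compose_mesh(particles, charges, grid, order, table):
    """Steps 6-7 model: look up coefficients and spread each charge onto order**3 points.

    particles: list of ((i1, i2, i3), (f1, f2, f3)) integer-index / quantized-fraction pairs.
    Grid point i - j (wrapped periodically) receives weight theta[j] in each direction.
    """
    k1_size, k2_size, k3_size = grid
    q_mesh = [[[0.0] * k3_size for _ in range(k2_size)] for _ in range(k1_size)]
    for ((i1, i2, i3), (f1, f2, f3)), q in zip(particles, charges):
        t1, t2, t3 = table[f1][0], table[f2][0], table[f3][0]  # step 6: table lookup
        for a in range(order):
            k1 = (i1 - a) % k1_size
            for b in range(order):
                k2 = (i2 - b) % k2_size
                w12 = q * t1[a] * t2[b]
                for c in range(order):
                    q_mesh[k1][k2][(i3 - c) % k3_size] += w12 * t3[c]
    return q_mesh


if __name__ == "__main__":
    table = build_blm_table(order=4, frac_bits=5)
    particles = [((0, 0, 0), (0, 0, 0)), ((3, 5, 7), (10, 20, 31))]
    mesh = compose_mesh(particles, [1.0, -0.5], (8, 8, 8), 4, table)
    total = sum(v for plane in mesh for row in plane for v in row)
    print(f"total charge on mesh: {total:.12f}")
```

Because the P coefficients in each table row sum to one, the total charge on the mesh equals the total charge of the particles, which makes a convenient sanity check for a mesh-composition implementation.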