The calculations in the RSCE are performed using fixed-point arithmetic. The advantage is that fixed-point arithmetic, when compared with floating-point arithmetic, simplifies the logic design, which in turn allows the FPGA to operate at a higher frequency. This high frequency is crucial to maximize speedup against the SPME software implementation. However, the accuracy and the dynamic range of fixed-point arithmetic is less than that of floating-point arithmetic. Therefore, to obtain accurate results and to avoid undesired calculation overflow or underflow, the precision used in each step of the calculations must be well chosen. In this section, an estimate on the required precision for each calculation stage is provided. This section first states the error bound requirement in MD simulations. Then, it discusses the precision settings of the variables used in the input stage, intermediate computation stage, and the output stages of the reciprocal sum calculation.

4.6.1.MD Simulation Error Bound

To derive the precision requirement for various variables used in the reciprocal sum calculation, the maximum relative error allowed for reciprocal energy and force calculations must be set. As stated by Narumi [28], it should be appropriate and safe to set the relative error bound goal for the reciprocal energy and force computations to be 10^{-4.5}. However, to further ensure the accuracy of the computations in the RSCE is adequate to carry out an energy conserved MD simulation, the error bound goal for the RSCE is set to be 10^{-5}. This goal can be viewed as the optimum goal of the RSCE design. However, due to the resource limitations of the XCV2000 FPGA and the precision limitation of the Xilinx FFT LogiCore, the current implementation of the RSCE is not able to achieve this precision goal. In Chapter 5, the precision settings to achieve this goal are studied with the SystemC RSCE model. Furthermore, to observe the effect of the RSCE limited calculation precision on the MD simulation result, several MD simulations are carried out and the stability of these simulations is monitored. The stability of an MD simulation can be evaluated as the magnitude of the relative root mean square (RMS) fluctuation in the total system energy at every timestep within the simulation span [25]. Hence, all the precision requirements listed in this section aim to provide a starting point for implementing the RSCE and are limited by the hardware resource availability.

The precision settings of the input variables are shown in Table 3. Their precisions are set to get an absolute representation error of approximately 10^{-7}. In Table 3, the precision is represented by a {A.B} notation in which A is the number of binary digits in front of the binary point; while B is the number of binary digits behind the binary point. Furthermore, when the variable is a signed fixed-point (SFXP) number, the most significant bit (MSB) is used as a sign bit.

Table 3 - Precision Requirement of Input Variables

Symbol

Description

Prec.

Loc.

#

Comment

Int_x

Integer part of the scaled and shift fractional coordinates.

{8.0}

FXP

PIM ZBT

N

Maximum simulation box size supported is 128 x 128 x 128.

Frac_x

Fractional part of the scaled and shift fractional coordinates.

{0.21}

FXP

PIM ZBT

N

Absolute representation error is ~5x10^{-7}.

q_{1..N}

Charge of the particles.

{5.21}

SFXP

PIM ZBT

N

Absolute representation error is ~5x10^{-7}.

K_{1,2,3}

Number of mesh points.

{8.0}

FXP

Reg.

1

Maximum grid size can be 128 x 128 x 128.

P

Interpolation Order.

{4.0}

FXP

Reg.

1

Maximum represent-able even order is 14.

N

Number of Particles

{15.0}

FXP

Reg.

1

The maximum number of particles is 32768.

f(θ)

Function values of the B-Spline coefficients at the predefined lookup coordinate points, which are 2^{-D1} apart. D1 is the width of the lookup key.

{1.31}

SFXP

BLM ZBT

2^{D1}

Absolute representation error is 4.66x10^{-10}.

m(θ)

= dθ

Slope values of the B-Spline coefficients at the predefined lookup coordinate points.

{1.31}

SFXP

BLM ZBT

2^{D1}

Absolute representation error is 4.66x10^{-10}.

m(dθ)

Slope values of the derivative of the B-Spline coefficients at the lookup coordinate points.

{2.30}

SFXP

BLM ZBT

2^{D1}

Absolute representation error is 9.31x10^{-10}.

eterm

The energy term as defined in section 3.4.2.

{0.32}

FXP

ETM ZBT

K_{1}×

K_{2}×

K_{3}

Absolute representation error is 2.33x10^{-10}.

D1

The lookup fractional coordinate interval for the B-Spline coefficient. That is, the width of the lookup key.

{D1.0}

FXP

N/A

N/A

The value of D2 is listed in the BCC block functional description in section 3.8.

D2

The residue part of the fractional coordinate that does not constitute the lookup key. That is, D2 = (Frac_x - D1)

{D2.0}

FXP

N/A

N/A

The value of D1 is listed in the BCC block functional description in section 3.8.

4.6.3.Precision of Intermediate Variables

Table 4 shows the precision requirement of the variables used during the energy and force calculations. The width of the multipliers used in different stages of computations is given in the functional blocks description section (Section 3.8). Their width is set such that the result of the multiplications meets the required precision given in Table 4. In what follows, the reason for the precision settings is explained. The operation steps indicated in the explanation is with respect to the steps listed in Table 2.

Table 4 - Precision Requirement of Intermediate Variables

Symbol

Description

Precision

Loc.

Steps

θ_{X}_{i}, θ_{Y}_{i}, θ_{Z}_{i}

The B-Spline coefficients of the particles.

{1.21} SFXP

BCC

6, 7, 12

dθ_{X}_{i}, dθ_{Y}_{i}, dθ_{Z}_{i}

The derivatives of the B-Spline coefficients.

{1.21} SFXP

BCC

12

FFT_Q

The precision used during the 3D-FFT calculation. This is limited by the precision of the Xilinx FFT LogiCore which only supports 24-bit signed precision.

{14.10} SFXP

3DFFT

8, 11

Q[K_{1}][K_{2}][K_{3}]

Q charge grid. Precision of the charge grid is limited by FFT_Q.

{14.18} SFXP

QMM

7-12

At step 6 and step 10, the B-Spline coefficients and their corresponding derivatives for the x, y, and z directions for each particle are evaluated using a lookup table. Since the B-Spline coefficients for each particle are positive fractions that sum to one, no integer part and no sign bit are needed to represent the coefficients. However, to simplify the hardware design, a signed fixed-point number is used to represent the coefficients so that the same circuitry can be used to calculate the possibly negative derivatives as well. Since the input lookup coordinate has a 21-bit fractional part, it is sufficient to have the same precision in the B-Spline calculations. More discussion on the B-Spline coefficient lookup implementation can be found in section 3.8.

At step 7, the charge grid Q is composed by going through each particle and adding each particle’s grid points contribution (a charge is interpolated to P×P×P grid points) to a running sum that is stored at the Q[k_{1}][k_{2}][k_{3}] array entry. Since the simulation system is neutral (a prerequisite for the Ewald Summation) and the charges should not be concentrating on a particular grid point, an 8-bit magnitude (0-255) should be adequate to represent the integer part of the charge accumulation. However, to avoid overflow in the 3D-FFT step, 13 bits are assigned to the integer part of the grid value representation and 18 bits are left to the represent the fractional part. The reason for allocating 13 bits to the integer part is explained in the next paragraph.

At step 8, an inverse 3D-FFT is performed on the charge grid Q. The 3D-FFT block instantiates a Xilinx FFT LogiCore with 24-bit signed precision to perform the FFT. The precision of the LogiCore limits the precision of the charge grid transformation. Furthermore, since the SPME needs a three dimensional FFT, the dynamic range expansion, which happens in each butterfly stage of the FFT computation, would be significant [38]. Therefore, enough bits must be assigned to the integer part of the charge grid Q to avoid overflow during the FFT calculation. Based on a single-timestep MD simulation of a molecular system with 6913 water molecules, it is assumed that a 13-bit integer part is sufficient to avoid overflow in the 3D-FFT stage. Should any overflow happen during the FFT calculation, an overflow interrupt will be asserted in the RSCE status register. The water molecular system is described by a PDB file provided with the software SPME package [8]. Since 13 bits are allocated to the integer part and one bit is used to represent the sign, only 10 bits will be allocated to the fractional part of the charge grid value during the 3D-FFT operation.

At step 9, the reciprocal energy is computed by summing the energy contribution of all grid points. The energy term for all grid points are pre-computed and stored in the ETM memory. The eterm[k_{1}][k_{2}][k_{3}] is a fractional number and 32 bits are used to represent it. Although the charge grid only has a precision of 10 bits after the 3D-FFT operation, the 32-bit representation of the eterm provides easy portability when a higher precision FFT core becomes available.

At step 11, a forward 3D-FFT is performed. Again, dynamic range expansion is expected and 13 bits are allocated to represent the integer part of the charge grid to avoid overflow.

At step 12, the forces F_{xi}, F_{yi}, F_{zi} for each charge are computed by going through all grid points that the charge has been interpolated to. In this step, the derivatives of the B-Spline coefficients are used. Similar to the representation of the B-Spline coefficients, 21 bits are used to represent the fractional part of the derivatives.

4.6.4.Precision of Output Variables

Table 5 summaries the preliminary precision settings of the output variables. Their precisions are set to minimize the chance of overflow in typical MD simulations and to carry all the precision of the calculations. The calculated reciprocal energy is stored in a register that the host can read from; while the forces are stored in the lower half the PIM memory.

Table 5 - Precision Requirement of Output Variables