While a particle-based DSL will facilitate rapid development by programmers, platform-specific backends to OpenMM will be critical for maintaining rapid execution. We propose to develop an OpenCL platform for OpenMM. OpenCL [5] is a standard for programming heterogeneous parallel systems that has been widely adopted by all major hardware vendors. OpenCL offers the promise of a single codebase that can run efficiently on multi-core CPUs, GPUs, Larrabee (for HPC), and exotic new architectures yet to be released. In practice, however, OpenCL implementations need to be optimized for particular platforms. We propose to implement just-in-time (JIT) platform-specific optimizations that will recode key inner loops to be optimal for the platform chosen at run time. These JIT optimizations can be done in a combinatorial scheme, in which many possible choices are tested and the best is selected empirically, allowing for optimization on platforms not yet announced.
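The empirical selection step described above can be illustrated with a minimal sketch. It is not OpenMM or OpenCL code: the function names (`autotune`, `loop_plain`, `loop_builtin`) are hypothetical, and two plain Python functions stand in for alternative compilations of the same inner loop. The idea shown is only the selection logic: time each candidate on the target machine and keep the fastest.

```python
import time

def autotune(variants, args, repeats=3):
    """Time every candidate and return (name of fastest, all timings).

    `variants` maps a label to a callable; all callables are assumed to
    compute the same result, differing only in how they are implemented.
    """
    timings = {}
    for name, fn in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(*args)
            best = min(best, time.perf_counter() - start)
        timings[name] = best  # best-of-N reduces timing noise
    fastest = min(timings, key=timings.get)
    return fastest, timings

# Two hypothetical implementations of the same inner loop (sum of squares).
def loop_plain(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def loop_builtin(xs):
    return sum(x * x for x in xs)

variants = {"plain": loop_plain, "builtin": loop_builtin}
data = list(range(10000))

# Select the kernel empirically on whatever machine this runs on.
chosen, timings = autotune(variants, (data,))
kernel = variants[chosen]
```

Because the choice is made by measurement rather than by a table of known hardware, the same selection logic would apply unchanged to a platform that did not exist when the candidates were written.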
High performance on a single multicore computer or single GPU computer is important. However, modern computational science often requires much higher performance, with scalability over thousands to millions of computers. We propose a hybrid solution. First, we propose fine-grained solutions as described in Section 1.1 to scale maximally within a given computer. However, to scale to many computers (e.g. tens of thousands), current fine-grained parallelization methods have significant limitations: they typically require extremely expensive networks and in some cases cannot scale beyond a few nodes.
Moreover, even if scalability is possible on high-speed networks, such networks are often not present in typical academic lab computer clusters or in commercial environments such as pharmaceutical companies. Thus, second, we propose data-parallel schemes to use multiple nodes: each node works on its own data in parallel, and the results are assembled using a normative theory for combining them. Our primary method for data-parallel parallelization is the application of Markov State Model (MSM) theory. MSMs have allowed for scalability to hundreds of thousands of nodes [6] (as demonstrated in the Folding@home distributed computing project [7]), and their theoretical peak scalability has been estimated to be in the millions to billions of nodes (see Figure 1.1).
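As a rough illustration of why this scheme needs no communication between nodes, the sketch below pools transition counts from independent trajectories and row-normalizes them into a transition matrix. It is a simplified example, not the estimator used in the cited work: the helper names, the two-state toy trajectories, and the plain row normalization are all assumptions made for illustration.

```python
def count_transitions(traj, n_states, lag=1):
    """Count observed i -> j transitions at the given lag in one trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(traj[:-lag], traj[lag:]):
        counts[a][b] += 1
    return counts

def pool(count_matrices):
    """Sum per-node count matrices element-wise."""
    n = len(count_matrices[0])
    total = [[0] * n for _ in range(n)]
    for c in count_matrices:
        for i in range(n):
            for j in range(n):
                total[i][j] += c[i][j]
    return total

def row_normalize(counts):
    """Turn counts into transition probabilities (rows sum to 1)."""
    rows = []
    for row in counts:
        s = sum(row)
        rows.append([c / s if s else 0.0 for c in row])
    return rows

# Hypothetical state-index trajectories, one per node, run independently.
node_trajs = [
    [0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
]

per_node = [count_transitions(t, n_states=2) for t in node_trajs]
pooled = pool(per_node)      # the only step that touches more than one node's data
T = row_normalize(pooled)
```

Each node only needs to report a small count matrix, so the combining step is cheap regardless of how many nodes contribute.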
We will facilitate the use of MSMs and ease their creation by developing our open source software package MSMBuilder. MSMBuilder creates MSMs using new geometric and kinetic clustering algorithms. The general flow of MSMBuilder is: (1) cluster conformations into very small states called microstates, assuming that the high degree of structural similarity within a state implies kinetic similarity; (2) validate that this state decomposition is Markovian; and, optionally, (3) lump the microstates into some number of macrostates based on kinetic criteria and ensure that this macrostate model is also Markovian. We will create tools for analyzing and visualizing the model at both the microstate and macrostate levels.
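The lumping step (3) can be sketched as follows. This is not MSMBuilder's API or algorithm: the function names are hypothetical, the microstate count matrix is made-up toy data, and the microstate-to-macrostate assignment is simply given rather than derived from kinetic criteria. The sketch only shows how a macrostate model follows from a microstate model once an assignment is chosen.

```python
def lump_counts(micro_counts, micro_to_macro, n_macro):
    """Aggregate a microstate count matrix into a macrostate count matrix."""
    macro = [[0] * n_macro for _ in range(n_macro)]
    for i, row in enumerate(micro_counts):
        for j, c in enumerate(row):
            macro[micro_to_macro[i]][micro_to_macro[j]] += c
    return macro

def transition_matrix(counts):
    """Row-normalize counts into transition probabilities."""
    result = []
    for row in counts:
        s = sum(row)
        result.append([c / s if s else 0.0 for c in row])
    return result

# Toy data: four microstates in which 0,1 interconvert quickly, as do 2,3,
# while transitions between the two pairs are rare.
micro_counts = [
    [90, 10,  0,  0],
    [10, 85,  5,  0],
    [ 0,  5, 85, 10],
    [ 0,  0, 10, 90],
]

# Assumed assignment: microstates 0,1 -> macrostate 0; microstates 2,3 -> macrostate 1.
micro_to_macro = [0, 0, 1, 1]

macro_counts = lump_counts(micro_counts, micro_to_macro, n_macro=2)
macro_T = transition_matrix(macro_counts)
```

In this toy case the macrostate model has a self-transition probability of 0.975 per step, reflecting the slow exchange between the two groups; a real workflow would still need to check that the lumped model remains Markovian at the chosen lag time.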
Figure 1.2. Computed binding free energies using POP-FEP versus experimentally measured binding free energies for eight FKBP ligands. The lines y=x and y=x±1.5 kcal/mol are drawn as guides. The most outlying point is associated with L12.
1.2.4 Enhance tools for Free energy calculation
Accurate free energy calculations are key for the prediction of protein-ligand binding. We will use and further improve Yank, a new code for fast and accurate prediction of free energies for protein-ligand binding. Yank was conceived by Dr. John Chodera (UC Berkeley) and Dr. Kim Branson (Vertex) while in the Pande lab, and subsequently developed in collaboration with the Pande lab and OpenMM developers. Building upon the OpenMM DSL core, we will re-engineer Yank to leverage MSMBuilder to dramatically enhance conformational sampling of protein-ligand binding, extending the POP-FEP methods [8]. We will also add support for explicit solvent. The speed advantages from GPU acceleration will provide significant value for the molecular-scale drug target dynamics DBP, where turn-around time is critical with millions of potential drug screens.
1.2.5 Support Coarse-Grain and Mesoscale simulation
As discussed below, the Mesoscale DBP on cell shape dynamics will require a combination of continuum modeling and particle-based modeling. Particle-based simulations are critical for accurate coarse-graining of the influence of molecular factors such as membrane proteins, growth machinery, and cell-wall and cell-envelope architecture. For virus-host cell interactions, a physically realistic simulation of the infection cycle requires local rearrangement of the outer membrane and associated proteins, pore formation in the cell wall, and eventual initiation of cell-wall lysis, in addition to the structure and dynamics of the viral particle itself. The use of the OpenMM DSL by the Mesoscale DBP will enable the rapid introduction and testing of new classes of particle interactions (e.g. turgor pressure-mediated tension or membrane-planar limited potentials) that will lead to extensions of the DSL, in precisely the way that the DBPs should drive Core 1 activities.