We propose three major thrusts for the Simbios computational science research portfolio. First, we will perform the research necessary to design algorithms for physics-based simulation in the areas defined by our driving biological problems (DBPs), each of which presents specific challenges that must be overcome to achieve its aims.
At the molecular scale, the primary challenges are sampling fast enough to reach experimentally relevant timescales with sufficiently accurate, atomistically detailed models, and gaining scientific insight from the resulting simulations. Reaching these timescales will require combining several techniques: new algorithms for massively parallel simulation on thousands to billions of cores, algorithmic advances that speed individual calculations (such as multibody approaches), and fast implementations of key methods on novel hardware paradigms.
Simulation at the mesoscale remains a new frontier for biocomputation, with untapped potential for biological discovery. Studying cellular growth and function requires novel methods and models that integrate vital processes across multiple length and time scales, from molecular models of growth machinery, to polymer models of cytoskeletal structures, to continuum models of membrane and envelope mechanics. Although cellular-scale systems are often far too complex for a complete model, mesoscale modeling that accounts for the biologically and physically relevant degrees of freedom can often be used to infer material parameters and to interpret and predict phenomenological behavior.
For the macroscale DBP, the primary challenges are fast dynamics, large-scale optimization, and novel analysis schemes. For example, very fast musculoskeletal dynamics would enable a monkey to control a dynamic model of a prosthetic limb (or of its own limb) through a direct brain interface. Very fast large-scale optimization would make the solution of dynamic optimal control problems feasible and would facilitate markerless motion tracking. Predicting the motions that will arise from neural recordings is another key challenge for this DBP, and will require new methods to analyze and learn from simulation results.
While these goals have been motivated by our proposed DBPs, we stress that these challenges affect many areas in biocomputation, and success in these areas would have broad impact. Thus, to serve these goals, we have identified four key areas of computational research that are required for success in the next grant period:
Algorithms for particle-based simulation (driven by the molecular and mesoscale DBPs)
Algorithms for multibody-dynamics simulation (driven by the molecular and macroscale DBPs)
Algorithms for continuum dynamics simulation (driven by the mesoscale DBP)
Algorithms for trajectory analysis (driven by all DBPs)
Second, we will ensure that algorithms written for these areas run on next-generation multicore architectures by collaborating with the computer scientists designing these architectures and the programming paradigms that will be used with them. As we argue in Section 1.1, our investment in these architectures is critical because they represent the next major generation of computational platforms for high performance computing. Among all areas of biomedical computation, physics-based simulation is especially dependent on high performance computing, and optimizing its performance on multicore architectures will enable new classes of problems to be solved.
Third, we will employ professional software engineering practices to ensure that the applications and software libraries that comprise the SimTK toolkit are useful, efficient and maintainable.
In the following sections, we outline our plans for biomedical computation research by (1) introducing the idea of Domain Specific Languages (DSLs) for multicore computation, (2-5) detailing the four areas of algorithmic innovation where we will focus, and (6) describing our software engineering practices for delivering high quality software to the biomedical research community.
1.1 Anticipating the future of multicore computing
One of the major contributions of Simbios in the previous grant period, as described in the Progress Report, was the introduction of very fast molecular dynamics computations on GPUs. GPUs represent a major departure in multicore architecture. For more than three decades, the microprocessor industry saw exponentially decreasing feature sizes and increasing circuit density, following Moore’s law. Recently, however, power constraints have stagnated clock rates, and new micro-architecture designs yield limited improvements in instructions per cycle. Instead, the industry is using its continued increases in transistor counts to populate chips with a growing number of cores – multiple independent processors. This change has profound implications for the future of compute-intensive biomedical research. In the past, each generation of hardware brought increased performance on existing applications, with no code rewrite, and enabled new, performance-hungry applications. This is now true only for applications written to run in parallel and to scale to an increasing number of cores. Because Simbios is the National Center for Biomedical Computing devoted to physics-based simulation, and because simulation is exquisitely dependent on computational power, Simbios must take the lead in ensuring that key algorithms run efficiently on next-generation computing platforms.
We have therefore teamed with the Stanford Pervasive Parallelism Lab (PPL) to identify standard programming metaphors and implement them in parallel, scalable codes whose high performance tracks hardware evolution, while providing natural programming models that shield application programmers from the details of complex parallel code on heterogeneous hardware architectures. The PPL pools the efforts of many leading computer scientists, mathematicians, and engineers, with support from Sun Microsystems, NVIDIA, IBM, Advanced Micro Devices, Intel, and Hewlett Packard under a completely open industrial affiliates program. They are eager to work with us to define Domain Specific Languages (DSLs) for the computations that are mission-critical to physical simulation. Our experience attaining 100-1000x performance gains with GPU implementations was not a one-off; it foreshadowed the future of our enterprise. We have engaged world-class collaborators to ensure that Simbios applications will continue to leverage next-generation computing platforms.
A DSL is a high-level, domain-specific programming language and programming environment. Such environments capture parallelism implicitly, optimizing and mapping it to heterogeneous hardware “under the hood.” For example, Matlab is a DSL for mathematics, SQL for table-based lookup, and OpenGL for graphics. By defining the important primitives in the DSL, programmers can create robust, fast codes that are general purpose within the DSL’s space of applications. Moreover, separating the domain science from the multicore plumbing yields two major advantages: (1) it is dramatically easier to develop applications within a DSL, and (2) such applications are “future proofed,” in that the hardware-specific backend can easily be updated for unanticipated changes in hardware. Figure ES.3 (in the Executive Summary) shows the DSL architecture that we anticipate for Simbios.
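To make this separation of concerns concrete, consider the following minimal Python sketch (a hypothetical illustration, not Simbios code; the class and method names are our own): the domain code knows only about particles and time steps, while the choice of execution backend – serial today, multicore or GPU tomorrow – can be swapped without touching the domain logic.

```python
# Hypothetical sketch of the DSL idea: domain logic written against a small
# set of primitives, with the parallel "plumbing" hidden behind a backend.
from concurrent.futures import ThreadPoolExecutor

class SerialBackend:
    """Reference backend: executes the primitive sequentially."""
    def map(self, fn, xs):
        return [fn(x) for x in xs]

class ThreadedBackend:
    """Parallel backend with the same interface; results are identical."""
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
    def map(self, fn, xs):
        return list(self.pool.map(fn, xs))

class Simulation:
    """Domain code: knows about particles and time steps, not about threads."""
    def __init__(self, backend):
        self.backend = backend
    def step(self, positions, dt, velocity):
        # Advance every particle one time step; how the per-particle updates
        # are scheduled is the backend's problem, not the domain programmer's.
        return self.backend.map(lambda p: p + dt * velocity, positions)

sim = Simulation(SerialBackend())
print(sim.step([0.0, 1.0, 2.0], dt=0.5, velocity=2.0))  # [1.0, 2.0, 3.0]
```

Swapping `SerialBackend()` for `ThreadedBackend()` changes nothing in `Simulation`, which is the “future proofing” argued for above.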
One should note that writing code that scales well on these hardware architectures without a DSL is a monumental task; the potential inconvenience of an initially unfamiliar DSL is therefore far outweighed by the efficiency and portability of the resulting code. Indeed, writing a DSL is typically no more work than writing a traditional parallel code, as all the same elements are represented (i.e., both the domain-specific science and the massively parallel “plumbing”). While a DSL requires a language formalism, the tools available from the PPL make this straightforward, especially for a common family of languages such as we propose here. Moreover, organizing code so that the domain-specific scientific code is separated from the code needed for parallelism is typically a win in itself, as such code is easier to write, modify, and port to new architectures.
For example, Liszt is a DSL developed at Stanford and devoted to fluid flow simulation. Liszt implements PDE solvers on unstructured meshes. Here is a typical fragment of Liszt code:
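The flavor of such a fragment can be sketched in Liszt-like pseudocode (our illustration, not actual Liszt source; the names Phi, flux, Field, and SparseMatrix are assumed for exposition):

```
val Phi = Field[Edge, Double]("phi")     // field variable stored on mesh edges
val A   = SparseMatrix[Face, Edge]()     // sparse matrix indexed by topological elements
for (f <- mesh.faces) {                  // mesh access only through standard interfaces
  for (e <- f.edges) {
    A(f, e) += flux(Phi(e))              // Phi's representation is never exposed
  }
}
```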
Liszt abstracts the representation of the common objects and operations used in flow simulation. The mesh data structure is completely abstracted; all mesh access goes through standard interfaces (mesh.cell or f.edges). Field variables are associated with topological elements such as cells, faces, edges and vertices, but are accessed through methods so that their representation is never exposed. Finally, sparse matrices (e.g., matrix A in line 2 of the above code) are indexed by topological elements, not integers. This code appeals to computational scientists because it is written in a form they understand. The DSL compiler for Liszt knows what a mesh, a cell, and a face are; as a result, it has the information needed to select the data structure representations, data decomposition, and layout of field variables that are optimized for a specific architecture. Using this domain-specific approach, it is possible to generate a specialized version of the code for a given mesh running on a given architecture – an impossible task for a general-purpose compiler.
In the following sections, we discuss the four algorithmic areas that we will develop in the next grant period. These areas have two facets: they are driven by the DBPs to provide key functionality for scientific progress, and they in turn drive the definition of the DSLs (one per area) in our collaboration with the Pervasive Parallelism Lab (PPL). Our interactions with PPL will be modeled on our past GPU development experience. Algorithms will first be implemented on CPU machines for testing and validation; we will then engage PPL scientists to transfer the reference implementations for analysis and re-implementation on a multicore architecture. We benefit from substantial funding resources at PPL, so we require resources only for the Director, Prof. Hanrahan, and for our development team to “hand off” the codes. We anticipate that students in engineering, informatics, and computer science will take on research assistantships developing and testing the DSL environments. In addition to the joint GPU work on molecular dynamics, supervised by Pande and Hanrahan, one of our current post-docs, Kai Kohlhoff, has written clustering codes for multicore machines under the joint supervision of Altman, Pande, and Hanrahan; this work serves as a model for how the collaboration will proceed. We will measure the success of our DSLs subjectively, by the degree to which biomedical simulation developers (who are not expert parallel programmers) can use them to build working models that execute efficiently on the supported parallel hardware platforms. We will also objectively compare the results of these simulations with reference implementations to ensure accuracy and validity. Finally, it is important to note that the four DSLs we propose to develop will form a family of languages with a uniform syntax, so that it will be straightforward for a programmer to master all of them.
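As an illustration of the kind of serial reference implementation that would be validated on CPUs before hand-off to PPL, the following is a minimal, hypothetical Python sketch of k-centers clustering of trajectory frames (the function names and the toy Euclidean distance are our own; production trajectory-analysis codes use aligned RMSD and far larger data, and the independent per-frame distance updates are exactly what a DSL backend would parallelize).

```python
import math

def frame_dist(a, b):
    """Toy frame-to-frame distance (Euclidean; real codes use aligned RMSD)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_centers(frames, k):
    """Greedy k-centers: repeatedly pick the frame farthest from all centers."""
    centers = [0]                        # seed with the first frame
    dist = [frame_dist(f, frames[0]) for f in frames]
    while len(centers) < k:
        far = max(range(len(frames)), key=lambda i: dist[i])
        centers.append(far)
        for i, f in enumerate(frames):   # each update is independent -> parallelizable
            dist[i] = min(dist[i], frame_dist(f, frames[far]))
    # assign every frame to its nearest center
    labels = [min(centers, key=lambda c: frame_dist(frames[i], frames[c]))
              for i in range(len(frames))]
    return centers, labels
```

On a toy data set of four two-dimensional “frames” forming two well-separated pairs, the sketch recovers one center and one label per pair; the same interface, re-expressed in a trajectory-analysis DSL, would let the backend distribute the distance computations across cores.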