The end result of a structural simulation at any scale is a set of trajectories (i.e., the motion of structures over time). These trajectories can contain complex patterns that are not obvious on inspection and that require advanced informatics analyses to extract in full. Accordingly, our final area of core computational research is the creation of new methods for simulation trajectory analysis. We will create a toolset, integrated into SimTK, for finding patterns and features of biological relevance. Trajectories track a “system” consisting of a number of particles (molecular simulations), bodies (multibody simulations), or parameters (continuum simulations). A simulation state [18] is the set of parameters that describe a particular 3D conformation of a system. The simulation state space (SS space) is the space of all possible valid conformations of a system, not only those observed in the simulation data. A trajectory can be described as a traversal through the SS space. Thus, for trajectory analysis we can apply spectral methods based on eigenanalysis, independent component analysis, or other matrix decomposition methods [19, 20]. At the molecular level, we can use diffusion and other harmonic analysis methods to understand the connectivity structure of the space and obtain meaningful partitions (or SS clusters) [21]. Homological methods, in contrast, are more algebraic than analytic [22]. They are based on notions of cycles and boundaries, and define various discrete topological invariants (homology groups) that capture the connectivity structure of the trajectory. There are complex theoretical relationships between spectral and homological methods [23]. We believe that trajectory analysis constitutes a DSL with primitives such as parameters, states, distance metrics, clusters, and trajectories themselves, and we will work with PPL to define and implement our algorithms.
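As a concrete illustration of the eigenanalysis mentioned above, the following minimal sketch treats each frame of a trajectory as a point in SS space and extracts the direction of largest variance by power iteration on the covariance matrix. It is an illustration under stated assumptions only: the function name `dominant_mode`, the flat-tuple frame layout, and the choice of plain principal component analysis are ours for this example and do not describe the SimTK toolset or its API.

```python
import math

def dominant_mode(frames, iterations=200):
    """Return (variance, direction) of the largest-variance mode of a trajectory.

    frames: list of equal-length tuples, one flat coordinate tuple per time step
            (hypothetical layout chosen for this sketch).
    """
    n = len(frames)
    dim = len(frames[0])

    # Center the frames on the mean conformation.
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    centered = [[f[k] - mean[k] for k in range(dim)] for f in frames]

    # Sample covariance matrix of the coordinates.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]

    # Power iteration. Assumption: the start vector is not exactly
    # orthogonal to the dominant mode.
    v = [1.0 / (k + 1) for k in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all: nothing to extract
            break
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance captured along v.
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim))
                   for a in range(dim))
    return variance, v


# Toy trajectory (invented for illustration): motion along the line y = 2x.
toy_frames = [(float(t), 2.0 * t) for t in range(-3, 4)]
toy_variance, toy_direction = dominant_mode(toy_frames)
```

Power iteration is used here only to keep the sketch dependency-free; a full eigendecomposition from a linear algebra library would normally be preferred when several modes are needed.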
1.5.1 Identification of metastable states
We will use both spectral and homological methods to define ways to measure the “distance” between structural states. If a simulation tracks all objects, then the simulation state can include them all in this distance computation. However, it is sometimes useful to treat a subset of objects as equivalent (for example, at the molecular level one may want to treat water or lipid molecules as fungible). Spectral methods allow us to vary the degree to which we treat such bodies as distinct. We will therefore create methods for this “controlled ambiguity” in order to obtain robust metrics that are not overly sensitive to irrelevant details. We have recently defined heat kernel signatures for matching 3D shapes, which we can generalize to many applications [24]. The kernels can be used to create similarity metrics that naturally satisfy the triangle inequality. We will implement these and explore their appropriate use at different scales to find “similar” conformations of 3D structures in simulation trajectories. Once we have chosen an appropriate distance metric, we must combine the purely geometrical information it captures with the kinetic information available from the trajectories, which tells us which geometries interconvert. We will explore ways to modify the metric to bring simulation states that are adjacent in a trajectory closer together, thus recognizing the intrinsic “connectivity” of the state space.
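One simple way to picture the metric modification described above is sketched below. It is not the heat-kernel-based metric of [24]; it is a stand-in chosen for brevity. The assumptions are ours: states are flat coordinate tuples, the geometric distance is Euclidean, distances between consecutive frames are shrunk by a hypothetical factor `alpha`, and the final distance is the shortest path through the resulting graph, which keeps the triangle inequality intact.

```python
import math

def kinetic_distances(frames, alpha=0.5):
    """Pairwise distances that pull trajectory-adjacent states closer together.

    frames: list of flat coordinate tuples in trajectory order.
    alpha:  hypothetical shrink factor (0 < alpha <= 1) applied to the
            distance between consecutive frames.
    """
    n = len(frames)

    # Purely geometric information: Euclidean distance between every pair.
    d = [[math.dist(frames[i], frames[j]) for j in range(n)] for i in range(n)]

    # Kinetic information: consecutive frames are known to interconvert,
    # so their direct distance is shrunk.
    for i in range(n - 1):
        d[i][i + 1] = d[i + 1][i] = alpha * d[i][i + 1]

    # Floyd-Warshall shortest paths: the result satisfies the triangle
    # inequality even after the shrinking step.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Toy trajectory of three 2D states (invented for illustration).
toy_distances = kinetic_distances([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)])
```

The all-pairs shortest-path step costs cubic time in the number of frames, so a sketch like this is only practical for small or subsampled trajectories.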
We will explore graph partitioning using spectral ideas [25], primarily based on eigendecompositions. We will also evaluate local variants, in which one looks for a good graph cut near or around a given state region, with high conductance within the region and low conductance outside it; such a cut may naturally capture the notion of a metastable state of the system [26]. We also want to capture the intuition that a metastable state is one through which many trajectories pass. A natural way to capture this is through persistent local homology: if many different paths pass through a given state, then such “traffic hub” states can be detected [27].
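The spectral partitioning idea can be sketched in a few lines. The example below is a minimal illustration, not the methods of [25] or [26]: it assumes a connected, undirected similarity graph given as a symmetric weight matrix with a zero diagonal, splits it by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian), and reports the conductance of the resulting cut. The function names `fiedler_partition` and `conductance` are hypothetical.

```python
import math

def fiedler_partition(weights, iterations=500):
    """Two-way split of a graph by the sign of its Fiedler vector.

    weights: symmetric matrix (list of lists) of non-negative edge weights,
             zero diagonal, connected graph with at least two nodes (assumed).
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Laplacian eigenvalues are at most 2 * max degree, so shift*I - L is
    # positive semidefinite and its top eigenvectors are L's bottom ones.
    shift = 2.0 * max(degree)

    # Deterministic, non-constant start vector (assumed not orthogonal
    # to the Fiedler vector).
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        # Remove the constant component (the trivial eigenvector of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # Multiply by (shift*I - L), where L = D - W.
        w = [shift * v[i] - degree[i] * v[i]
             + sum(weights[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


def conductance(weights, labels):
    """Cut weight divided by the smaller of the two side volumes.

    Assumes both sides of the partition are non-empty.
    """
    n = len(weights)
    cut = sum(weights[i][j] for i in range(n) for j in range(n)
              if labels[i] == 0 and labels[j] == 1)
    vol0 = sum(sum(weights[i]) for i in range(n) if labels[i] == 0)
    vol1 = sum(sum(weights[i]) for i in range(n) if labels[i] == 1)
    return cut / min(vol0, vol1)


# Toy graph (invented): two tightly connected triangles joined by a weak edge.
toy_weights = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
toy_labels = fiedler_partition(toy_weights)
toy_conductance = conductance(toy_weights, toy_labels)
```

In this toy graph the two triangles play the role of candidate metastable regions: the cut separating them crosses only the weak edge, so its conductance is small compared with the connectivity inside each triangle.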
Figure 1.6. Markovian model for Aβ oligomerization. Our model was built using the different aggregation states as the Markov states; in a system with four chains, there are five such states: four monomers (MMMM), two monomers and one dimer (MMD), two dimers (DD), one monomer and one trimer (MT), and finally, one tetramer (Q). In addition, to include the effects of the low concentration found experimentally, we discriminate EC states (in which states are close) from separated states. The rate-limiting steps in the aggregation process are shown as dotted lines. The numbers associated with the transitions are transition probabilities. The significant figures were determined from the uncertainties in the transition probabilities. Some transitions with very low probability are not shown for the sake of clarity.
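To make the notion of transition probabilities in such a Markovian model concrete, the sketch below estimates them by counting transitions in a sequence of discrete state labels. This is a generic maximum-likelihood counting estimate offered as an assumption for illustration; it is not necessarily how the model in Figure 1.6 was built, and the toy sequence is invented and is not the data behind the figure. The function name `transition_probabilities` is hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate Markov transition probabilities from a state-label sequence.

    sequence: list of hashable state labels, one per observation interval.
    Returns a dict mapping each source state to a dict of
    destination state -> estimated probability.
    """
    # Count every observed (source, destination) pair of consecutive states.
    counts = defaultdict(Counter)
    for src, dst in zip(sequence, sequence[1:]):
        counts[src][dst] += 1

    # Normalize each row of counts so the probabilities sum to one.
    return {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in counts.items()
    }


# Toy label sequence (invented for illustration only).
toy_sequence = ["MMMM", "MMMM", "MMD", "MMMM", "MMD", "MT", "Q"]
toy_probabilities = transition_probabilities(toy_sequence)
```

States that never appear as a source (here the final `Q`) get no row, which is one reason real models need many trajectories and an uncertainty estimate for each probability.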