Recent advances in simulation and experiment have dramatically increased the quantity and complexity of the data produced. In particular, molecular dynamics simulations increasingly produce massive trajectories. Accurate analysis and interpretation of such dynamics are widely recognized as fundamental bottlenecks that could limit their applications, especially in the forthcoming era of exascale computing. We are developing original approaches to analyse complex dynamics and represent them in a simple, intuitive way. We are mainly interested in the analysis of state-of-the-art protein folding simulations obtained from various groups (e.g., D. Shaw). However, the analysis of other types of complex dynamics, e.g., the dynamics of a disease from longitudinal cohort studies or protein aggregation, is also of interest.
Current major projects
- Free energy landscape analysis of state-of-the-art protein folding simulations
- Methodological development of the optimal reaction coordinate framework
- Determination of optimal biomarkers from longitudinal cohorts
- Extension of Machine Learning algorithms for the analysis of dynamics.
Detailed research programme
Protein folding free energy landscapes
State-of-the-art simulations of protein folding describe folding dynamics with very high spatial and temporal resolution, inaccessible to current experimental techniques. While the results of the simulations contain, in principle, all the information about the process, extracting this information, analysing it and presenting it in a convenient form are highly non-trivial tasks. A rigorous way to analyse protein folding dynamics is to describe or approximate it as diffusion on a free energy landscape. The resulting landscapes allow one to determine, rigorously and directly, important characteristics of protein folding, e.g., the folding free energy barrier and pre-exponential factor, the protein configurations in the free energy minima and transition states, and the number of folding pathways.
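As a minimal illustration of the idea of a free energy profile along a single coordinate, the sketch below estimates F(x) = -kT ln P(x) from a histogram of samples. It is only a sketch under stated assumptions: the function name `free_energy_profile`, the bin count and the synthetic two-basin data are hypothetical, the samples are drawn independently rather than taken from a real trajectory, and no diffusion coefficient is estimated, so this is not the full diffusive description discussed above.

```python
import numpy as np

def free_energy_profile(x, bins=50, kT=1.0):
    """Estimate F(x) = -kT ln P(x) from samples of a 1D coordinate.

    Hypothetical helper for illustration; returns bin centres and the
    free energy (in units of kT) shifted so that its minimum is zero.
    """
    density, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = density > 0              # skip empty bins, where ln P is undefined
    F = -kT * np.log(density[mask])
    F -= F.min()                    # the zero of free energy is arbitrary
    return centers[mask], F

# Synthetic stand-in for a trajectory projected on one coordinate:
# two Gaussian basins (assumed data, not from any real simulation).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 0.3, 50_000),
                    rng.normal(1.0, 0.3, 50_000)])

centers, F = free_energy_profile(x, bins=60)

# Barrier height read off at the bin closest to x = 0, between the basins.
barrier = F[np.argmin(np.abs(centers))]
print(f"approximate barrier: {barrier:.2f} kT")
```

How informative such a profile is depends strongly on the coordinate onto which the trajectory is projected; a poorly chosen coordinate can hide or underestimate the barrier.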
Development of the optimal reaction coordinate framework
The free energy landscape represents the free energy as a function of chosen reaction coordinates (RCs). For such a description to be quantitatively accurate, the RCs should be chosen in an optimal way. We are developing the theory of such optimal RCs, as well as efficient practical methods to determine them from simulation trajectories or other Big Data.
Optimal biomarkers for the description of stochastic disease dynamics from longitudinal cohorts
The evolution of a disease, or the progress of a patient's recovery, is a complex process that depends on many factors. We assume that disease dynamics should be described stochastically, e.g., because of inherent randomness or a coarse-grained or incomplete description. In that case the best coordinate that describes the progress of the disease (the best biomarker) between two end states, e.g., healthy and abnormal, is the committor function (the optimal RC). In particular, it should accurately predict the odds of a positive outcome and the mean time to reach it. We have developed approaches to construct such a coordinate from an ensemble of patient trajectories in an automated way, without any disease-specific information.
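To make the notion of a committor concrete, the sketch below computes it for a small discrete Markov chain with a known transition matrix. This is a textbook illustration, not the approach for constructing the coordinate from patient trajectories: the function name `committor`, the 11-state chain and the interpretation of the state index as a severity score are all assumptions. The committor q_i is the probability of reaching end state B before end state A when starting from state i; it satisfies q_i = sum_j T_ij q_j for intermediate states, with q = 0 on A and q = 1 on B.

```python
import numpy as np

def committor(T, A, B):
    """Committor of a discrete Markov chain with row-stochastic matrix T.

    Hypothetical helper for illustration. A and B are sets of state
    indices; returns q with q[i] = P(reach B before A | start in i).
    """
    n = T.shape[0]
    B_idx = sorted(B)
    interior = [i for i in range(n) if i not in A and i not in B]

    q = np.zeros(n)
    q[B_idx] = 1.0  # boundary conditions: q = 0 on A, q = 1 on B

    # For interior states: (I - T_II) q_I = T_IB * 1
    M = np.eye(len(interior)) - T[np.ix_(interior, interior)]
    rhs = T[np.ix_(interior, B_idx)].sum(axis=1)
    q[interior] = np.linalg.solve(M, rhs)
    return q

# Assumed toy model: symmetric random walk on states 0..10, where
# state 0 plays the role of "abnormal" (A) and state 10 of "healthy" (B).
n = 11
T = np.zeros((n, n))
for i in range(1, n - 1):
    T[i, i - 1] = 0.5
    T[i, i + 1] = 0.5
T[0, 0] = T[n - 1, n - 1] = 1.0  # end states are absorbing

q = committor(T, A={0}, B={n - 1})

# Odds of the positive outcome for intermediate states: q / (1 - q).
odds = q[1:-1] / (1.0 - q[1:-1])
```

For this symmetric walk the committor grows linearly with the state index, q_i = i/10, so a state halfway between the end states has even odds of either outcome.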
Extension of Machine Learning algorithms for the analysis of dynamics
Machine Learning (ML) approaches promise to revolutionise many areas of research dealing with Big Data. However, straightforward application of ML to the analysis of molecular dynamics has met with limited success. Here we combine ideas and methods from the ML field with those from the optimal RC framework.