"au:"Kelin Xia"" — arXiv2 Search

Showing 1–20 of 53 results

Protein folding tames chaos

Kelin Xia, Guo-Wei Wei

Aug 13, 2013·q-bio.BM·PDF

Protein folding produces characteristic and functional three-dimensional structures from unfolded polypeptides or disordered coils. The emergence of extraordinary complexity in the protein folding process poses astonishing challenges to theoretical modeling and computer simulations. The present work introduces molecular nonlinear dynamics (MND), or molecular chaotic dynamics, as a theoretical framework for describing and analyzing protein folding. We unveil the existence of intrinsically low dimensional manifolds (ILDMs) in the chaotic dynamics of folded proteins. Additionally, we reveal that the transition from disordered to ordered conformations in protein folding increases the transverse stability of the ILDM. Stated differently, protein folding reduces the chaoticity of the nonlinear dynamical system, and a folded protein has the best ability to tame chaos. Additionally, we bring to light the connection between the ILDM stability and the thermodynamic stability, which enables us to quantify the disorderliness and relative energies of folded, misfolded and unfolded protein states. Finally, we exploit chaos for protein flexibility analysis and develop a robust chaotic algorithm for the prediction of Debye-Waller factors, or temperature factors, of protein structures.

Flexibility-Rigidity Index for Protein-Nucleic Acid Flexibility and Fluctuation Analysis

Kristopher Opron, Kelin Xia, Zachary F. Burton, Guo-Wei Wei

Oct 26, 2015·q-bio.BM·PDF

Protein-nucleic acid complexes are important for many cellular processes, including the most essential functions such as transcription and translation. For many protein-nucleic acid complexes, flexibility of both macromolecules has been shown to be critical for specificity and/or function. The flexibility-rigidity index (FRI) has been proposed as an accurate and efficient approach for protein flexibility analysis. In this work, we introduce FRI for the flexibility analysis of protein-nucleic acid complexes. We demonstrate that a multiscale strategy, which incorporates multiple kernels to capture various length scales in biomolecular collective motions, is able to significantly improve the state of the art in the flexibility analysis of protein-nucleic acid complexes. We take advantage of the high accuracy and ${\cal O}(N)$ computational complexity of our multiscale FRI method to investigate the flexibility of large ribosomal subunits, which is difficult to analyze by alternative approaches. An anisotropic FRI approach, which involves localized Hessian matrices, is utilized to study the translocation dynamics in an RNA polymerase.

Ollivier persistent Ricci curvature (OPRC) based molecular representation for drug design

JunJie Wee, Kelin Xia

Nov 20, 2020·q-bio.BM·PDF

Efficient molecular featurization is one of the major issues for machine learning models in drug design. Here we propose, for the first time, persistent Ricci curvature (PRC), in particular Ollivier persistent Ricci curvature (OPRC), for molecular featurization and feature engineering. The filtration process proposed in persistent homology is employed to generate a series of nested molecular graphs. The persistence and variation of Ollivier Ricci curvatures on these nested graphs are defined as Ollivier persistent Ricci curvature. Moreover, persistent attributes, which are statistical and combinatorial properties of OPRCs during the filtration process, are used as molecular descriptors and further combined with machine learning models, in particular gradient boosting trees (GBT). Our OPRC-GBT model is used in the prediction of protein-ligand binding affinity, which is one of the key steps in drug design. Based on the three most commonly used datasets from the well-established protein-ligand binding databank, i.e., PDBbind, we intensively test our model and compare it with existing models. It has been found that our model is better than all machine learning models with traditional molecular descriptors.

Weighted persistent homology for biomolecular data analysis

Zhenyu Meng, D Vijay Anand, Yunpeng Lu, Jie Wu, Kelin Xia

Mar 7, 2019·q-bio.BM·PDF

In this paper, we systematically review weighted persistent homology (WPH) models and their applications in biomolecular data analysis. Essentially, the weight value, which reflects physical, chemical and biological properties, can be assigned to vertices (atom centers), edges (bonds), or higher order simplexes (clusters of atoms), depending on the biomolecular structure, function, and dynamics properties. Further, we propose the first localized weighted persistent homology (LWPH). Inspired by the great success of element specific persistent homology (ESPH), we do not treat biomolecules as an inseparable system, as all previous weighted models do; instead, we decompose them into a series of local domains, which may overlap with each other. The general persistent homology or weighted persistent homology analysis is then applied to each of these local domains. In this way, functional properties that are embedded in local structures can be revealed. Our model has been applied to the systematic study of DNA structures. It has been found that our LWPH based features can be used to successfully discriminate the A-, B-, and Z-types of DNA. More importantly, our LWPH based PCA model can identify two configurational states of DNA structure in an ion liquid environment, which can be revealed only by the complicated helical coordinate system. The great consistency with the helical-coordinate model demonstrates that our model captures local structure variations so well that it is comparable with geometric models. Moreover, geometric measurements are usually defined in very local regions. For instance, the helical-coordinate system is limited to one or two basepairs. However, our LWPH can quantitatively characterize structure information in local regions or domains with arbitrary sizes and shapes, where traditional geometrical measurements fail.

Persistent homology analysis of osmolyte molecular aggregation and their hydrogen-bonding networks

Kelin Xia, D Vijay Anand, Shikhar Saxena, Yuguang Mu

May 28, 2019·q-bio.QM·PDF

Two types of osmolytes, i.e., trimethylamine N-oxide (TMAO) and urea, demonstrate dramatically different properties in a protein folding process. Even with the great progress in revealing the potential underlying mechanism of these two osmolyte systems, many problems still remain unsolved. In this paper, we propose to use persistent homology, a newly-invented topological method, to systematically study osmolyte molecular aggregation and their hydrogen-bonding networks from a global topological perspective. It has been found, for the first time, that TMAO and urea show two extremely different topological behaviors, i.e., extensive network and local cluster. In general, TMAO forms highly consistent large loop or circle structures at high concentrations. In contrast, urea is more tightly aggregated locally. Moreover, the resulting hydrogen-bonding networks also demonstrate distinguishable features. As the concentration increases, TMAO hydrogen-bonding networks vary greatly in their total number of loop structures, and large-sized loop structures consistently increase. In contrast, urea hydrogen-bonding networks remain relatively stable, with a slight reduction in the total loop number. Moreover, the persistent entropy (PE) is, for the first time, used to characterize the topological information of the aggregation and hydrogen-bonding networks. The average PE systematically increases with the concentration for both TMAO and urea, and decreases in their hydrogen-bonding networks. But their PE variances have totally different behaviors. Finally, topological features of the hydrogen-bonding networks are found to be highly consistent with those from the ion aggregation systems, indicating that our topological invariants can characterize intrinsic features of the "structure making" and "structure breaking" systems.
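
The persistent entropy mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used definition, the Shannon entropy of the bar lengths normalized by their total; the function name and the toy barcodes are hypothetical and are not taken from the paper.

```python
import math

def persistent_entropy(bars):
    """Shannon entropy of the normalized bar lengths of a barcode.

    bars: list of (birth, death) pairs with finite death values.
    Bars of zero or negative length are ignored.
    """
    lengths = [death - birth for birth, death in bars if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0  # empty barcode: no topological information
    return -sum((l / total) * math.log(l / total) for l in lengths)

# Hypothetical barcodes: equal bars give the maximum entropy log(n),
# while a single dominant bar gives a much lower value.
uniform = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
skewed = [(0.0, 9.7), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)]
pe_uniform = persistent_entropy(uniform)
pe_skewed = persistent_entropy(skewed)
```

Under this definition, a barcode whose bars all have similar lengths scores high, and a barcode dominated by a few long bars scores low.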

Persistent-Homology-based Machine Learning and its Applications -- A Survey

Chi Seng Pun, Kelin Xia, Si Xian Lee

Nov 1, 2018·math.AT·PDF

A suitable feature representation that can both preserve the data intrinsic information and reduce data complexity and dimensionality is key to the performance of machine learning models. Deeply rooted in algebraic topology, persistent homology (PH) provides a delicate balance between data simplification and intrinsic structure characterization, and has been applied to various areas successfully. However, the combination of PH and machine learning has been hindered greatly by three challenges, namely topological representation of data, PH-based distance measurements or metrics, and PH-based feature representation. With the development of topological data analysis, progress has been made on all three problems, but the results are widely scattered across the literature. In this paper, we provide a systematic review of PH and PH-based supervised and unsupervised models from a computational perspective. Our emphasis is on the recent development of mathematical models and tools, including PH software and PH-based functions, feature representations, kernels, and similarity models. Essentially, this paper can work as a roadmap for the practical application of PH-based machine learning tools. Further, we consider different topological feature representations in different machine learning models, and investigate their impact on protein secondary structure classification.

Multiscale persistent functions for biomolecular structure characterization

Kelin Xia, Zhiming Li, Lin Mu

Dec 26, 2016·q-bio.BM·PDF

In this paper, we introduce multiscale persistent functions for biomolecular structure characterization. The essential idea is to combine our multiscale rigidity functions with persistent homology analysis, so as to construct a series of multiscale persistent functions, particularly multiscale persistent entropies, for structure characterization. To clarify the fundamental idea of our method, the multiscale persistent entropy model is discussed in great detail. Mathematically, unlike the previous persistent entropy or topological entropy, a special resolution parameter is incorporated into our model. Various scales can be achieved by tuning its value. Physically, our multiscale persistent entropy can be used in conformation entropy evaluation. More specifically, it is found that our method incorporates a natural classification scheme. This is achieved through a density filtration of a multiscale rigidity function built from bond and/or dihedral angle distributions. To further validate our model, a systematic comparison with the traditional entropy evaluation model is done. It is found that our model is able to preserve the intrinsic topological features of biomolecular data much better than traditional approaches, particularly for resolutions in the intermediate range. Moreover, our method can be successfully used in protein classification. For a test database with around nine hundred proteins, a clear separation between all-alpha and all-beta proteins can be achieved, using only the dihedral and pseudo-bond angle information. Finally, a special protein structure index (PSI) is proposed, for the first time, to describe the "regularity" of protein structures. Essentially, PSI can be used to describe the "regularity" information in any system.

Multiscale Gaussian network model (mGNM) and multiscale anisotropic network model (mANM)

Kelin Xia, Kristopher Opron, Guo-Wei Wei

Oct 26, 2015·q-bio.BM·PDF

The Gaussian network model (GNM) and anisotropic network model (ANM) are some of the most popular methods for the study of protein flexibility and related functions. In this work, we propose generalized GNM (gGNM) and ANM methods and show that the GNM Kirchhoff matrix can be built from the ideal low-pass filter, which is a special case of a wide class of correlation functions underpinning the linear scaling flexibility-rigidity index (FRI) method. Based on the mathematical structure of correlation functions, we propose a unified framework to construct generalized Kirchhoff matrices whose matrix inverse leads to gGNMs, whereas the direct inverse of its diagonal elements gives rise to the FRI method. With this connection, we further introduce two multiscale elastic network models, namely, multiscale GNM (mGNM) and multiscale ANM (mANM), which are able to incorporate different scales into the generalized Kirchhoff matrices or generalized Hessian matrices. We validate our new multiscale methods with extensive numerical experiments. We illustrate that gGNMs outperform the original GNM method in the B-factor prediction of a set of 364 proteins. We demonstrate that for a given correlation function, FRI and gGNM methods provide essentially identical B-factor predictions when the scale value in the correlation function is sufficiently large. More importantly, we reveal intrinsic multiscale behavior in protein structures. The proposed mGNM and mANM are able to capture this multiscale behavior and thus give rise to a significant improvement of more than 11% in B-factor predictions over the original GNM and ANM methods. We further demonstrate the benefit of our mGNM in the B-factor predictions on many proteins that fail the original GNM method. We show that the present mGNM can also be used to analyze protein domain separations. Finally, we showcase the ability of our mANM for the simulation of protein collective motions.
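
As a rough illustration of the connection described in the abstract above, the sketch below builds a standard cutoff-based GNM Kirchhoff matrix (the ideal low-pass filter case) and takes the direct inverse of its diagonal as an FRI-style flexibility estimate. The cutoff value, function names and toy coordinates are assumptions made for this example; the generalized and multiscale constructions of the paper are not reproduced here.

```python
import math

def kirchhoff_matrix(coords, cutoff=7.0):
    """Cutoff-based Kirchhoff (connectivity) matrix.

    Off-diagonal entries are -1 for pairs within the cutoff distance,
    and each diagonal entry is the number of contacts of that node.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma

def diagonal_inverse(gamma):
    """Inverse of each diagonal element (requires every node to have a contact)."""
    return [1.0 / gamma[i][i] for i in range(len(gamma))]

# Hypothetical straight chain of five nodes spaced 3.8 Angstrom apart.
chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
gamma = kirchhoff_matrix(chain)
flexibility = diagonal_inverse(gamma)
```

In this toy chain the end nodes have one contact each and the interior nodes have two, so the diagonal-inverse estimate assigns the chain ends twice the flexibility of the interior.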

Persistent spectral based machine learning (PerSpect ML) for drug design

Zhenyu Meng, Kelin Xia

Feb 3, 2020·q-bio.QM·PDF

In this paper, we propose persistent spectral based machine learning (PerSpect ML) models for drug design. Persistent spectral models, including persistent spectral graph, persistent spectral simplicial complex and persistent spectral hypergraph, are proposed based on spectral graph theory, spectral simplicial complex theory and spectral hypergraph theory, respectively. Different from all previous spectral models, a filtration process, as proposed in persistent homology, is introduced to generate multiscale spectral models. More specifically, from the filtration process, a series of nested topological representations, i.e., graphs, simplicial complexes, and hypergraphs, can be systematically generated and their spectral information can be obtained. Persistent spectral variables are defined as functions of spectral variables over the filtration value. Mathematically, persistent multiplicity (of zero eigenvalues) is exactly the persistent Betti number (or Betti curve). We consider 11 persistent spectral variables and use them as features for machine learning models in protein-ligand binding affinity prediction. We systematically test our models on the three most commonly used databases, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. Our results, for all these databases, are better than all existing models, as far as we know. This demonstrates the great power of our PerSpect ML in molecular data analysis and drug design.
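
For graphs, the multiplicity of the zero eigenvalue of the graph Laplacian equals the number of connected components, i.e., the zeroth Betti number. A minimal sketch of the resulting Betti-0 curve over a distance filtration is given below; it counts components with a union-find structure instead of computing eigenvalues, and the function name and sample points are hypothetical.

```python
import math

def betti0_curve(points, radii):
    """Number of connected components of the distance graph at each radius.

    An edge (i, j) is present once the filtration value reaches the
    distance between points i and j. Values are returned in the order
    of the sorted radii.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    curve = []
    components = n
    k = 0
    for r in sorted(radii):
        # Add every edge whose length does not exceed the current radius.
        while k < len(edges) and edges[k][0] <= r:
            _, i, j = edges[k]
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j
                components -= 1
            k += 1
        curve.append(components)
    return curve

# Hypothetical point set: two close pairs separated by a larger gap.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
curve = betti0_curve(points, [0.5, 1.0, 4.0])
```

Higher-dimensional Betti numbers and the nonzero spectral variables would need boundary matrices and an eigenvalue solver, which this sketch deliberately leaves out.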

Weighted persistent homology for osmolyte molecular aggregation and hydrogen-bonding network analysis

D Vijay Anand, Kelin Xia, Yuguang Mu

Jul 14, 2019·q-bio.QM·PDF

It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Even with the enormous theoretical and experimental research work on the two osmolytes, various aspects of their underlying mechanisms still remain largely elusive. In this paper, we propose to use weighted persistent homology to systematically study osmolyte molecular aggregation and their hydrogen-bonding networks from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). From the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO shows local network structures. As the concentration increases, the circle elements in these networks show a clear increase in their total numbers and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters at around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, it has been found that our persistent radial distribution function (PRDF) from the global-scale IPH has the same physical properties as the traditional radial distribution function (RDF). Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Other than the clear difference in the first peak value of PRDFs at filtration size 4 Å, TMAO and urea also show very different behaviors in the second peak region, from filtration size 5 Å to 10 Å.

A quantitative structure comparison with persistent similarity

Kelin Xia

Jul 12, 2017·q-bio.QM·PDF

Biomolecular structure comparison not only reveals evolutionary relationships, but also sheds light on biological functional properties. However, traditional definitions of structure or sequence similarity always involve superposition or alignment and are computationally inefficient. In this paper, I propose a new method called persistent similarity, which is based on a newly-invented method in algebraic topology, known as persistent homology. Different from all previous topological methods, persistent homology is able to embed a geometric measurement into topological invariants, thus providing a bridge between geometry and topology. Further, with the proposed persistent Betti function (PBF), topological information derived from the persistent homology analysis can be uniquely represented by a series of continuous one-dimensional (1D) functions. In this way, any complicated biomolecular structure can be reduced to several simple 1D PBFs for comparison. Persistent similarity is then defined as the quotient of the sizes of the intersection area and the union area between two corresponding PBFs. If structures have no significant topological properties, a pseudo-barcode is introduced to ensure a better comparison. Moreover, a multiscale biomolecular representation is introduced through the multiscale rigidity function. It naturally induces a multiscale persistent similarity. The multiscale persistent similarity enables an objective-oriented comparison. Stated differently, it facilitates the comparison of structures at any particular scale of interest. Finally, the proposed method is validated by four different cases. It is found that the persistent similarity can be used to describe the intrinsic similarities and differences between the structures very well.
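
The intersection-over-union idea in the abstract above can be sketched as follows. As a simplifying assumption, each barcode is summarized by a plain step-function Betti curve (the number of bars alive at each filtration value) rather than the continuous PBF of the paper, and the areas are approximated on a uniform midpoint grid. The function names, grid settings and the convention for two empty barcodes are all hypothetical.

```python
def betti_curve(bars, grid):
    """Number of bars (birth, death) alive at each grid value x, birth <= x < death."""
    return [sum(1 for birth, death in bars if birth <= x < death) for x in grid]

def persistent_similarity(bars_a, bars_b, x_max=10.0, steps=1000):
    """Ratio of intersection area to union area of two Betti curves on [0, x_max]."""
    dx = x_max / steps
    grid = [(k + 0.5) * dx for k in range(steps)]  # midpoint sampling
    curve_a = betti_curve(bars_a, grid)
    curve_b = betti_curve(bars_b, grid)
    intersection = sum(min(a, b) for a, b in zip(curve_a, curve_b)) * dx
    union = sum(max(a, b) for a, b in zip(curve_a, curve_b)) * dx
    # Assumption: two empty barcodes are treated as identical.
    return intersection / union if union > 0 else 1.0

# Hypothetical barcodes: overlapping bars share one third of their union.
same = persistent_similarity([(0.0, 2.0)], [(0.0, 2.0)])
partial = persistent_similarity([(0.0, 2.0)], [(1.0, 3.0)])
disjoint = persistent_similarity([(0.0, 1.0)], [(2.0, 3.0)])
```

The score is 1 for identical curves and 0 for curves with no overlap, and it needs no superposition or alignment of the underlying structures.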

Persistent homology analysis of ion aggregation and hydrogen-bonding network

Kelin Xia

Feb 26, 2018·q-bio.QM·PDF

Despite the great advancement of experimental tools and theoretical models, a quantitative characterization of the microscopic structures of ion aggregates and their associated water hydrogen-bonding networks still remains a challenging problem. In this paper, a newly-invented mathematical method called persistent homology is introduced, for the first time, to quantitatively analyze the intrinsic topological properties of ion aggregation systems and hydrogen-bonding networks. The two most distinguishing properties of persistent homology analysis of assembly systems are as follows. First, it does not require a predefined bond length to construct the ion or hydrogen network. Persistent homology results are determined by the morphological structure of the data only. Second, it can directly measure the size of circles or holes in ion aggregates and hydrogen-bonding networks. To validate our model, we consider two well-studied systems, i.e., NaCl and KSCN solutions, generated from molecular dynamics simulations. They are believed to represent two morphological types of aggregation, i.e., local clusters and extended ion network. It has been found that the two aggregation types have distinguishable topological features and can be characterized by our topological model very well. For hydrogen-bonding networks, KSCN systems demonstrate much more dramatic variations in their local circle structures as the concentration increases. A consistent increase of large-sized local circle structures is observed and the sizes of these circles become more and more diverse. In contrast, NaCl systems show no obvious increase of large-sized circles. Instead, a consistent decline of the average size of circle structures is observed and the sizes of these circles become more and more uniform as the concentration increases.

A review of geometric, topological and graph theory apparatuses for the modeling and analysis of biomolecular data

Kelin Xia, Guo-Wei Wei

Dec 6, 2016·q-bio.BM·PDF

Geometric, topological and graph theory modeling and analysis of biomolecules are of essential importance in the conceptualization of molecular structure, function, dynamics, and transport. On the one hand, geometric modeling provides molecular surface and structural representation, and offers the basis for molecular visualization, which is crucial for the understanding of molecular structure and interactions. On the other hand, it bridges the gap between molecular structural data and theoretical/mathematical models. Topological analysis and modeling give rise to atomic critical points and connectivity, and shed light on the intrinsic topological invariants such as independent components (atoms), rings (pockets) and cavities. Graph theory analyzes biomolecular interactions and reveals biomolecular structure-function relationship. In this paper, we review certain geometric, topological and graph theory apparatuses for biomolecular data modeling and analysis. These apparatuses are categorized into discrete and continuous ones. For discrete approaches, graph theory, Gaussian network model, anisotropic network model, normal mode analysis, quasi-harmonic analysis, flexibility and rigidity index, molecular nonlinear dynamics, spectral graph theory, and persistent homology are discussed. For continuous mathematical tools, we present discrete to continuum mapping, high dimensional persistent homology, biomolecular geometric modeling, differential geometry theory of surfaces, curvature evaluation, variational derivation of minimal molecular surfaces, atoms in molecule theory and quantum chemical topology. Four new approaches, including analytical minimal molecular surface, Hessian matrix eigenvalue map, curvature map and virtual particle model, are introduced for the first time to bridge the gaps in biomolecular modeling and analysis.

Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

Kelin Xia

Jul 12, 2017·q-bio.QM·PDF

In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. In combination with the spectral graph method, I reveal the essential difference between the global scale models and local scale ones in structure clustering, i.e., different optimization on Euclidean (or spatial) distances and sequential (or genomic) distances. More specifically, clusters from global scale models optimize Euclidean distance relations. Local scale models, on the other hand, result in clusters that optimize the genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings based on my purposes. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that at the global scale, the Fiedler vector from my SeqMM bears a great similarity to the principal vector from principal component analysis, and can be used to study genomic compartments. In TAD analysis, I find that TADs evaluated at different scales are not consistent and vary a lot. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistency. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high contact frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries for different cluster numbers.

Multiscale virtual particle based elastic network model (MVP-ENM) for biomolecular normal mode analysis

Kelin Xia

Jul 31, 2017·q-bio.BM·PDF

In this paper, a multiscale virtual particle based elastic network model (MVP-ENM) is proposed for biomolecular normal mode analysis. The multiscale virtual particle model is proposed for the discretization of biomolecular density data at different scales. Essentially, the model works as a coarse-graining of the biomolecular structure, so that a delicate balance between biomolecular geometric representation and computational cost can be achieved. To form "connections" between these multiscale virtual particles, a new harmonic potential function, which considers the influence from both mass distributions and distance relations, is adopted between any two virtual particles. Unlike the previous ENMs that use a constant spring constant, a particle-dependent spring parameter is used in MVP-ENM. Two independent models, i.e., multiscale virtual particle based Gaussian network model (MVP-GNM) and multiscale virtual particle based anisotropic network model (MVP-ANM), are proposed. Even with a rather coarse grid and a low resolution, the MVP-GNM is able to predict the Debye-Waller factors (B-factors) with considerably good accuracy. Similar properties have also been observed in MVP-ANM. More importantly, in B-factor predictions, the mismatch between the predicted results and experimental ones is predominantly from higher fluctuation regions. Further, it is found that MVP-ANM can deliver very consistent low-frequency eigenmodes at various scales. This demonstrates the great potential of MVP-ANM in the deformation analysis of low resolution data. With the multiscale rigidity function, the MVP-ENM can be applied to biomolecular data represented in density distribution and atomic coordinates. Further, the great advantage of my MVP-ENM model in computational cost has been demonstrated by using two poliovirus structures. Finally, the paper ends with a conclusion.

Atomic Scale Design and Three-Dimensional Simulation of Ionic Diffusive Nanofluidic Channels

Jin Kyoung Park, Kelin Xia, Guo-Wei Wei

Mar 2, 2015·q-bio.QM·PDF

Recent advances in nanotechnology have led to rapid advances in nanofluidics, which has been established as a reliable means for a wide variety of applications, including molecular separation, detection, crystallization and biosynthesis. Although atomic and molecular level consideration is a key ingredient in the experimental design and fabrication of nanofluidic systems, atomic and molecular modeling of nanofluidics is rare, and most simulations at the nanoscale are restricted to one or two dimensions in the literature, to the best of our knowledge. The present work introduces atomic scale design and three-dimensional (3D) simulation of ionic diffusive nanofluidic systems. We propose a variational multiscale framework to represent the nanochannel in discrete atomic and/or molecular detail while describing the ionic solution as a continuum. Apart from the major electrostatic and entropic effects, the non-electrostatic interactions between the channel and solution, and among solvent molecules, are accounted for in our modeling. We derive generalized Poisson-Nernst-Planck (PNP) equations for nanofluidic systems. Mathematical algorithms, such as Dirichlet to Neumann mapping and the matched interface and boundary (MIB) methods, are developed to rigorously solve the aforementioned equations to second-order accuracy in 3D realistic settings. Three ionic diffusive nanofluidic systems, including a negatively charged nanochannel, a bipolar nanochannel and a double-well nanochannel, are designed to investigate the impact of atomic charges on channel current, density distribution and electrostatic potential. Numerical findings, such as gating, ion depletion and inversion, are in good agreement with those from experimental measurements and numerical simulations in the literature.

Weighted (Co)homology and Weighted Laplacian

Chengyuan Wu, Shiquan Ren, Jie Wu, Kelin Xia

Apr 19, 2018·math.AT·PDF

In this paper, we generalize the combinatorial Laplace operator of Horak and Jost by introducing the $φ$-weighted coboundary operator induced by a weight function $φ$. Our weight function $φ$ is a generalization of Dawson's weighted boundary map. We show that our above-mentioned generalizations include new cases that are not covered by previous literature. Our definition of weighted Laplacian for weighted simplicial complexes is also applicable to weighted/unweighted graphs and digraphs.

Matched Interface and Boundary Method for Elasticity Interface Problems

Bao Wang, Kelin Xia, Guo-Wei Wei

Dec 16, 2014·math.NA·PDF

Elasticity theory is an important component of continuum mechanics and has had widespread applications in science and engineering. Material interfaces are ubiquitous in nature and man-made devices, and often give rise to discontinuous coefficients in the governing elasticity equations. In this work, the matched interface and boundary (MIB) method is developed to address elasticity interface problems. Linear elasticity theory for both isotropic homogeneous and inhomogeneous media is employed. In our approach, Lamé's parameters can have jumps across the interface and are allowed to be position dependent in modeling isotropic inhomogeneous material. Both strong discontinuity, i.e., discontinuous solution, and weak discontinuity, namely, discontinuous derivatives of the solution, are considered in the present study. In the proposed method, fictitious values are utilized so that the standard central finite difference schemes can be employed regardless of the interface. Interface jump conditions are enforced on the interface, which, in turn, accurately determine the fictitious values. We design new MIB schemes to account for complex interface geometries. In particular, the cross derivatives in the elasticity equations are difficult to handle for complex interface geometries. We propose secondary fictitious values and construct geometry based interpolation schemes to overcome this difficulty. Numerous analytical examples are used to validate the accuracy, convergence and robustness of the present MIB method for elasticity interface problems with both small and large curvatures, strong and weak discontinuities, and constant and variable coefficients. Numerical tests indicate second order accuracy in both $L_\infty$ and $L_2$ norms.

Persistent homology analysis of protein structure, flexibility and folding

Kelin Xia, Guo-Wei Wei

Dec 8, 2014·q-bio.BM·PDF

Proteins are the most important biomolecules for living organisms. The understanding of protein structure, function, dynamics and transport is one of the most challenging tasks in biological science. In the present work, persistent homology is, for the first time, introduced for extracting molecular topological fingerprints (MTFs) based on the persistence of molecular topological invariants. MTFs are utilized for protein characterization, identification and classification. The method of slicing is proposed to track the geometric origin of protein topological invariants. Both all-atom and coarse-grained representations of MTFs are constructed. A new cutoff-like filtration is proposed to shed light on the optimal cutoff distance in elastic network models. Based on the correlation between protein compactness, rigidity and connectivity, we propose an accumulated bar length generated from persistent topological invariants for the quantitative modeling of protein flexibility. To this end, a correlation matrix based filtration is developed. This approach gives rise to an accurate prediction of the optimal characteristic distance used in protein B-factor analysis. Finally, MTFs are employed to characterize protein topological evolution during protein folding and quantitatively predict the protein folding stability. An excellent consistency between our persistent homology prediction and molecular dynamics simulation is found. This work reveals the topology-function relationship of proteins.

Fast and Anisotropic Flexibility-Rigidity Index

Kristopher Opron, Kelin Xia, Guo-Wei Wei

Dec 8, 2014·q-bio.BM·PDF

The flexibility-rigidity index (FRI) is a newly proposed method for the construction of atomic rigidity functions. The FRI method analyzes protein rigidity and flexibility and is capable of predicting protein B-factors without resorting to matrix diagonalization. A fundamental assumption used in the FRI is that protein structures are uniquely determined by various internal and external interactions, while the protein functions, such as stability and flexibility, are solely determined by the structure. As such, one can predict protein flexibility without resorting to the protein interaction Hamiltonian. Consequently, bypassing the matrix diagonalization, the original FRI has a computational complexity of O(N^2). This work introduces a fast FRI (fFRI) algorithm for the flexibility analysis of large macromolecules. The proposed fFRI further reduces the computational complexity to O(N). Additionally, we propose anisotropic FRI (aFRI) algorithms for the analysis of protein collective dynamics. The aFRI algorithms admit adaptive Hessian matrices, from a completely global 3N*3N matrix to completely local 3*3 matrices. However, these local 3*3 matrices have much non-local correlation built in. Furthermore, we compare the accuracy and efficiency of FRI with some established approaches to flexibility analysis, namely, normal mode analysis (NMA) and Gaussian network model (GNM). The accuracy of the FRI method is tested. The FRI, particularly the fFRI, is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for alpha-carbons of the HIV virus capsid (313,236 residues) in less than 30 seconds on a single processor using only one core. Finally, we demonstrate the application of FRI and aFRI to protein domain analysis.
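
A minimal sketch of the O(N^2) idea described in the abstract above: each atom's rigidity is a sum of distance-based correlation kernels over all other atoms, and its flexibility is the reciprocal of that sum, so no matrix diagonalization is involved. The generalized exponential kernel, the parameter values (`eta`, `kappa`), the function name and the toy coordinates are assumptions made for this example, and relating the flexibility values to experimental B-factors would need a further fitting step that is not shown.

```python
import math

def fri_flexibility(coords, eta=3.0, kappa=1.0):
    """Flexibility index of each atom as the reciprocal of its rigidity index.

    Rigidity of atom i is the sum over all other atoms j of the kernel
    exp(-(r_ij / eta) ** kappa). Requires at least two atoms.
    """
    n = len(coords)
    flexibility = []
    for i in range(n):
        rigidity = sum(
            math.exp(-((math.dist(coords[i], coords[j]) / eta) ** kappa))
            for j in range(n) if j != i
        )
        flexibility.append(1.0 / rigidity)
    return flexibility

# Hypothetical straight chain of five atoms spaced 3.8 Angstrom apart.
chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
flex = fri_flexibility(chain)
```

The double loop makes the quadratic cost explicit: atoms with fewer close neighbors, such as the chain ends here, receive a lower rigidity and therefore a higher flexibility value.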
