LINK: http://www.youtube.com/watch?v=-5nnciZ9hgc
It is important to note that the leading PCs are not necessarily the ones most useful in PCR: PCA selects PCs according to their ability to explain the variance in the independent variables, whereas PCR is concerned with explaining the variation in the dependent variable y. One drawback of PCR is that the resulting equations may be more difficult to interpret, for example when deciding which of the original molecular properties should be changed in order to enhance the activity [Leach2007].
The technique of partial least squares regression (PLSR) is similar to PCR, with the essential difference that the calculated quantities are chosen to explain not only the variation in the independent variables x but also the variation in the dependent variables y. PLSR expresses the dependent variable in terms of quantities called latent variables, which are linear combinations of the independent variables.
The following video gives a short explanation of PLSR:
LINK: http://www.youtube.com/watch?v=WKEGhyFx0Dg
Because both PCR and PLSR work in reduced dimensions, they give biased results; this is the price paid for handling the multicollinearity in the data. The crucial step in both PCR and PLSR is therefore to determine the proper number of PCs and latent variables, respectively. Bootstrapping and Monte Carlo methods can help with this choice.
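The choice of the number of PCs can be illustrated with a small numerical sketch. The example below is not taken from the cited literature: it assumes NumPy is available, uses synthetic collinear data, and the helper names (`pcr_fit`, `pcr_predict`, `mc_cv_error`) are hypothetical. It fits PCR for each candidate number of components and estimates the prediction error by Monte Carlo cross-validation (repeated random train/test splits).

```python
import numpy as np

def pcr_fit(X, y, k):
    """Fit principal component regression with k components.

    Returns (x_mean, y_mean, beta) so that y_hat = y_mean + (X - x_mean) @ beta.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Rows of Vt are the principal directions, ordered by explained X-variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                       # loadings, shape (n_features, k)
    T = Xc @ V                         # scores, shape (n_samples, k)
    gamma, *_ = np.linalg.lstsq(T, yc, rcond=None)
    return x_mean, y_mean, V @ gamma   # coefficients in the original variables

def pcr_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def mc_cv_error(X, y, k, n_splits=50, test_frac=0.25, seed=0):
    """Mean squared prediction error over repeated random splits."""
    rng = np.random.default_rng(seed)  # same seed -> same splits for every k
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        model = pcr_fit(X[train], y[train], k)
        resid = y[test] - pcr_predict(model, X[test])
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

# Synthetic, deliberately collinear descriptors (illustrative data only):
# four columns driven by two underlying factors t1 and t2.
rng = np.random.default_rng(0)
n = 40
t1, t2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    t1, t1 + 0.05 * rng.normal(size=n),
    t2, t2 + 0.05 * rng.normal(size=n),
])
y = 2.0 * t1 - t2 + 0.1 * rng.normal(size=n)

cv = {k: mc_cv_error(X, y, k) for k in range(1, X.shape[1] + 1)}
best_k = min(cv, key=cv.get)
print(cv, best_k)
```

Because the synthetic activity depends on two underlying factors, a single PC cannot describe it, and the cross-validated error drops sharply when the second component is added. The same loop could be used with a PLSR fit in place of `pcr_fit`.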
4. CoMFA (Comparative Molecular Field Analysis)
One of the most significant developments in QSAR in recent years was the introduction of Comparative Molecular Field Analysis (CoMFA). The aim of CoMFA is to derive a correlation between the biological activity of a series of molecules and their 3D shape, electrostatic and hydrogen-bonding characteristics [Leach2007]. The data structure used in a CoMFA analysis is derived from a series of superimposed conformations, one for each molecule in the data set. These conformations are presumed to be the biologically active structures, overlaid in their common binding mode. Each conformation is taken in turn, and the molecular fields around it are calculated. This is achieved by placing the structure within a regular lattice and calculating the interaction energy between the molecule and a series of probe groups placed at each lattice point. (In the data table each row describes one compound, and every three columns hold the Cartesian coordinates of its conformation.) The general form of the equation is

$$ \text{activity} = C + \sum_{i=1}^{N} \sum_{j=1}^{P} c_{ij} S_{ij} \tag{10.10} $$

where P is the number of probe groups, N is the number of points in the grid, C is a constant, $S_{ij}$ is the interaction energy with probe group j at grid point i, and $c_{ij}$ is the coefficient for the column in the matrix that corresponds to probe group j at grid point i [Leach2007]. Solving these equations with PCA and PLS yields predictive models for 3D QSAR. Further approaches building on these established methods are under development.
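The construction of the CoMFA descriptors described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any particular program: the two-atom "molecule", the probe charge, the Lennard-Jones parameters, the 30 kcal/mol energy cap and the function names are all hypothetical choices made for the example. For one aligned conformation it computes a steric and an electrostatic probe energy at every lattice point (P = 2, N = 64) and evaluates a linear model with one coefficient per probe and grid point.

```python
import itertools
import math

COULOMB_K = 332.06   # kcal*Angstrom/(mol*e^2): converts q1*q2/r to kcal/mol
CUTOFF = 30.0        # assumed cap on field values, kcal/mol

# Hypothetical aligned conformation: (x, y, z, partial charge).
ATOMS = [(0.0, 0.0, 0.0, -0.4), (1.2, 0.0, 0.0, 0.4)]

def clip(value, limit=CUTOFF):
    """Truncate very large energies close to the atoms."""
    return max(-limit, min(limit, value))

def field_row(atoms, grid, q_probe=1.0, eps=0.1, r_min=3.4):
    """Return the descriptor row of one molecule: for every grid point,
    a steric (Lennard-Jones 12-6) and an electrostatic (Coulomb) energy."""
    row = []
    for point in grid:
        steric = electrostatic = 0.0
        for x, y, z, q in atoms:
            r = max(math.dist(point, (x, y, z)), 1e-3)  # avoid division by zero
            s6 = (r_min / r) ** 6
            steric += eps * (s6 * s6 - 2.0 * s6)
            electrostatic += COULOMB_K * q * q_probe / r
        row.extend([clip(steric), clip(electrostatic)])
    return row

def predict_activity(row, coefficients, intercept):
    """Linear model: intercept plus the sum of coefficient * field value."""
    return intercept + sum(c * s for c, s in zip(coefficients, row))

axis = [-2.0, 0.0, 2.0, 4.0]                     # 4 x 4 x 4 lattice, in Angstrom
GRID = list(itertools.product(axis, repeat=3))
row = field_row(ATOMS, GRID)
print(len(row))   # number of descriptor columns: N * P
```

Even this tiny lattice gives 128 columns per molecule, far more than the number of compounds in a typical data set, which is why the coefficients are estimated with PLS rather than ordinary least squares.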
5. References
[Varnek2011] A. Varnek, I. I. Baskin, Mol. Inf. 2011, 30, 20-32.
[Baskin2008] I. I. Baskin, A. Varnek, Chapter 1. Fragment Descriptors in SAR/QSAR/QSPR Studies, Molecular Similarity Analysis and in Virtual Screening, 1-43. in A. Varnek, A. Tropsha (eds.), Chemoinformatics Approaches to Virtual Screening, Royal Society of Chemistry, Cambridge, 2008.
[Hansch1963] C. Hansch, R. M. Muir, T. Fujita, P. P. Maloney, F. Geiger and M. Streich, J. Am. Chem. Soc., 1963, 85, 2817-2824.
[Hansch1964] C. Hansch and T. Fujita, J. Am. Chem. Soc., 1964, 86, 1616–1626.
[Free1964] S. M. Free Jr. and J. W. Wilson, J. Med. Chem., 1964, 7, 395-399.
[Fleischer2000] R. Fleischer, P. Frohberg, A. Büge, P. Nuhn and M. Wiese, Quant. Struct.-Act. Relat., 2000, 19, 162-172.
[Hatrik1996] S. Hatrik and P. Zahradnik, J. Chem. Inf. Comput. Sci., 1996, 36, 992-995.
[Leach2007] A. R. Leach, V. J. Gillet, An Introduction to Chemoinformatics. Springer, Dordrecht, The Netherlands, 2007.
6. Questions
How can you define chem(o)informatics?
What are the benefits of a linear relationship?
What are the benefits and drawbacks of the biased regression methods (PCR, PLSR)?
List some molecular descriptors.
7. Glossary
LINK: http://www.genomicglossaries.com/content/chemoinformatics_gloss.asp
Chapter 11. Quantum Mechanics and Mixed Quantum Mechanics/Molecular Mechanics Methods to Characterize the Structure and Reactions of Biologically Active Molecules.
(Gábor Paragi, György Ferenczy)
Keywords: Born-Oppenheimer approximation, potential energy surface, Hartree-Fock method, density functional theory, mixed methods, linked atom method, strictly localized molecular orbital method.
1. Introduction
What is described here? The aim of the chapter is to provide an introduction to the theoretical background of the most commonly used high-level energy calculation methods. We review the successive steps in the simplification of the original problem and the limitations of the most commonly applied calculation methods. Finally, we give an overview of how QM- and MM-level energy calculations can be connected to each other within one large system.
What is it used for? High-level energy calculations in atomic-level theoretical investigations.
What is needed? Beginner-level knowledge of quantum mechanics.
2. The hierarchy of approximations in the quantum mechanical treatment of atoms and molecules
Currently, the most complete theoretical description of atoms and molecules is provided by quantum mechanics (QM). Its application in molecular biology therefore seems an obvious step, but the average size of biological systems, as well as their complexity, strongly limits the applicability of QM in biology. Introducing appropriate approximations in QM-level calculations, however, can help us to establish QM in the field of biology. Consequently, knowing the principles behind the applied approximations is of primary importance for judging the relevance and validity of results obtained at this level.
In the general picture, quantum mechanics is the physics of “small” systems. The meaning of “small” is relative and somewhat vague, but we can be fairly sure that molecular and sub-molecular systems are “small” enough. Like other parts of physics (e.g. classical mechanics, electrodynamics, thermodynamics), QM can be built up in an axiomatic manner. The curious reader is referred to the related chapters in [1]; we do not go into the details in the present lectures. According to these axioms, any state of a system is characterized at the QM level by a wave function, which is determined by the Schrödinger equation (in non-relativistic cases). So far we have talked about systems in general, without clearly defining their elementary building blocks.
Many biochemical investigations focus on a thorough understanding of the interactions between molecules, which usually involve bond formation or bond breaking. These processes are related to changes in the electron system; therefore the smallest elements are the electrons and the nuclei, and we investigate the systems they form, called atoms and molecules. It is known from introductory QM books that the Schrödinger equation can be solved exactly only for the simplest systems (e.g. the H2+ molecule); therefore several approximations must be applied in the investigation of biologically relevant systems. A detailed derivation of the approximations, or a discussion of possible further developments, would fill a whole book and belongs principally to the field of atomic and molecular physics or theoretical chemistry. In the present chapter we therefore only summarize briefly the most frequently applied approximations and indicate their range of validity. We believe that this is a good starting point for mainly application-oriented readers and for MS students.
3. From time-dependent systems to potential energy surface
3.1. The time-independent Schrödinger equation
As mentioned before, the time evolution of a QM system is determined by the Schrödinger equation in the non-relativistic approximation. For molecules, where the coordinates of the nuclei are denoted by Ra (X1, Y1, Z1, X2, Y2, Z2, … , XM, YM, ZM; M = number of nuclei) and the electron coordinates by rj (x1, y1, z1, x2, y2, z2, … , xN, yN, zN; N = number of electrons), the Schrödinger equation has the following form:
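$$ i\hbar \frac{\partial}{\partial t} \Theta(R_a, r_j, t) = \left[ -\sum_{a=1}^{M} \frac{\hbar^2}{2M_a} \Delta_{R_a} - \sum_{j=1}^{N} \frac{\hbar^2}{2m_e} \Delta_{r_j} + V(R_a, r_j, t) \right] \Theta(R_a, r_j, t) \tag{11.1} $$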
Here “i” denotes the imaginary unit, ħ is the reduced Planck constant, V(Ra,rj,t) denotes any potential, Δ is the Laplace differential expression and Θ(Ra,rj,t) is the total wave function of the system. Ma is the mass of nucleus a, while me is the mass of the electron. Hereafter, atomic units are applied throughout the chapter. This means that certain physical constants are set to unity, namely ħ, me, a0 (the Bohr radius) and qe (the electron charge). The energy unit is the Hartree, and the conversion to other common energy units is as follows: 1 Hartree = 2 Rydberg = 27.21 eV = 627.5 kcal/mol. Using atomic units, Eq. (11.1) turns into
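$$ i \frac{\partial}{\partial t} \Theta(R_a, r_j, t) = \left[ -\sum_{a=1}^{M} \frac{1}{2M_a} \Delta_{R_a} - \sum_{j=1}^{N} \frac{1}{2} \Delta_{r_j} + V(R_a, r_j, t) \right] \Theta(R_a, r_j, t) \tag{11.2} $$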
which is a simpler form of the same equation. According to the QM postulates, the first two terms on the right-hand side are related to the kinetic energy of the nuclei and of the electrons, respectively. The potential V(Ra,rj,t) is built up from the electrostatic potential of the nuclei and the electrons, and any further external potential (e.g. an external electrostatic or magnetic field) can appear here as an extra additive term.
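Since energies in this chapter are given in atomic units, a small conversion helper can be handy when comparing with values quoted in other units. The sketch below uses the commonly tabulated, rounded factors 1 Hartree = 2 Rydberg ≈ 27.2114 eV ≈ 627.509 kcal/mol; the function name and the dictionary are illustrative choices, not part of any library.

```python
# Value of 1 Hartree expressed in each unit (rounded standard values).
HARTREE_IN = {
    "hartree": 1.0,
    "rydberg": 2.0,
    "eV": 27.2114,
    "kcal/mol": 627.509,
}

def convert_energy(value, src, dst):
    """Convert an energy between the units listed in HARTREE_IN."""
    in_hartree = value / HARTREE_IN[src]
    return in_hartree * HARTREE_IN[dst]

# Example: the ground-state energy of the hydrogen atom is -0.5 Hartree.
e_ground = -0.5
for unit in ("rydberg", "eV", "kcal/mol"):
    print(f"{e_ground} Hartree = {convert_energy(e_ground, 'hartree', unit):.4f} {unit}")
```

Going through the Hartree as the common intermediate keeps the table to one factor per unit instead of one factor per pair of units.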
(Supplementary material) QM uses mathematical objects different from those of classical physics, so we add a few words about them separately. Readers familiar with the basics of linear algebra can safely skip this supplementary part.
Taking a set of selected real- or complex-valued functions (e.g. the set of solutions of Eq. (11.1)), one can define a linear-space or vector-space structure on this set with the help of the usual scalar multiplication and addition of functions. In this situation the elements of the set are generally called vectors. Objects which map between two vector spaces (or map a vector space onto itself) and have certain mathematical properties are called linear operators, or simply operators. A special situation arises when the image set of the mapping is the set of real numbers. In this case the operator is called a functional, and later we will work with this mathematical object in the framework of density functional theory. A good example of a functional is the definite integral: it assigns to the function in the integrand the area under its curve over the chosen domain. For the curious reader, a detailed introduction to the mathematics of vector spaces can be found in the book of P. R. Halmos [2].
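The distinction between an operator and a functional can be made concrete with a short numerical sketch. It is only an illustration: the function names are hypothetical, the integral is approximated by the trapezoidal rule and the derivative by a central difference. The derivative takes a function and returns another function (an operator), whereas the definite integral takes a function and returns a single number (a functional).

```python
import math

def definite_integral(f, a, b, n=10_000):
    """Functional: maps the function f to one number, the trapezoidal
    estimate of its integral over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def derivative(f, h=1e-5):
    """Operator: maps the function f to another function, its
    central-difference derivative."""
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)

area = definite_integral(math.sin, 0.0, math.pi)   # a number, close to 2
dsin = derivative(math.sin)                        # a function, close to cos
print(area, dsin(0.3), math.cos(0.3))

# Both mappings are linear: F[2f + 3g] = 2 F[f] + 3 F[g].
combo = definite_integral(lambda x: 2.0 * math.sin(x) + 3.0 * x, 0.0, math.pi)
print(combo, 2.0 * area + 3.0 * math.pi ** 2 / 2.0)
```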
Consequently, both the potential V(Ra,rj,t) and the Laplace expression are operators. The V(Ra,rj,t) operator acts on a wave function by multiplication, giving V(Ra,rj,t)Θ(Ra,rj,t). The Laplace operator is slightly more complicated: it is evaluated as the sum of the second-order partial derivatives with respect to the variables. Take the helium atom as an example: it has one nucleus and two electrons, therefore the independent variables of the wave function are (X1, Y1, Z1, x1, y1, z1, x2, y2, z2) in Cartesian coordinates. Thus the Laplace expression is

$$ \Delta = \frac{\partial^2}{\partial X_1^2} + \frac{\partial^2}{\partial Y_1^2} + \frac{\partial^2}{\partial Z_1^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial y_1^2} + \frac{\partial^2}{\partial z_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial y_2^2} + \frac{\partial^2}{\partial z_2^2} \tag{11.3} $$

It is worth noting that, according to the axioms of QM, physical quantities are described by operators, in contrast to classical physics, where they are usually real-valued functions. Hereafter we will use the word “operator” together with physical quantities (e.g. momentum operator, position operator, etc.), and a hat (^) above a letter denotes that it is an operator.
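As an example, the momentum operator of a particle is $\hat{p} = -i\nabla$ (in atomic units), and the kinetic energy of a particle is evaluated as $p^2/2m$. The corresponding operator is $\hat{T} = \hat{p}^2/2m = -\frac{1}{2m}\Delta$, and defining

$$ \hat{H} = -\sum_{a=1}^{M} \frac{1}{2M_a} \Delta_{R_a} - \sum_{j=1}^{N} \frac{1}{2} \Delta_{r_j} + \hat{V}(R_a, r_j, t) \tag{11.4} $$

one can write the Schrödinger equation as

$$ i \frac{\partial}{\partial t} \Theta(R_a, r_j, t) = \hat{H} \, \Theta(R_a, r_j, t) \tag{11.5} $$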
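The action of the momentum and kinetic energy operators can be checked numerically on a simple test function. The sketch below is an illustration only: it works in one dimension and in atomic units (ħ = 1), replaces the derivatives by central finite differences, and the function names are hypothetical. A plane wave exp(ikx) should be reproduced by the momentum operator up to the factor k, and by the kinetic energy operator up to the factor k²/2m.

```python
import cmath

def momentum(psi, h=1e-3):
    """Momentum operator in 1D: (p psi)(x) = -i dpsi/dx."""
    return lambda x: -1j * (psi(x + h) - psi(x - h)) / (2.0 * h)

def kinetic(psi, mass=1.0, h=1e-3):
    """Kinetic energy operator in 1D: (T psi)(x) = -(1/2m) d2psi/dx2."""
    return lambda x: -(psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (2.0 * mass * h * h)

k = 1.5   # wave number of the test function (arbitrary choice)

def plane_wave(x):
    return cmath.exp(1j * k * x)

x0 = 0.4  # arbitrary evaluation point
p_value = momentum(plane_wave)(x0) / plane_wave(x0)                  # close to k
t_value = kinetic(plane_wave)(x0) / plane_wave(x0)                   # close to k**2 / 2
p2_value = momentum(momentum(plane_wave))(x0) / (2.0 * plane_wave(x0))  # p^2 / 2m

print(p_value, t_value, p2_value)
```

Applying the momentum operator twice and dividing by 2m gives the same result as the kinetic energy operator, which is the operator counterpart of the classical relation between kinetic energy and momentum.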