Predictive Modeling in Cheminformatics

Virtual screening is the computational, or in silico, screening of biological compounds and complements the HTS process. It is used to aid the selection of compounds for screening in HTS bioassays or for inclusion in a compound-screening library.

Virtual screening can utilise several computational techniques depending on the amount and type of information available about the compounds and the target. Protein-based methods are employed when the 3D structure of the bioassay target is known. The computational techniques involve the docking (virtual binding), and subsequent scoring, of candidate ligands (the part of the compound that is capable of binding) to the protein target.

Ligand-based approaches are usually used when there are compounds known to be active or inactive for a specific target. If a few active compounds are known, then structure-similarity techniques may be used; if the activity of several compounds is known, then discriminant analysis techniques, such as machine learning approaches, may be applied. This is achieved by choosing several compounds that have known activity for a specific biological target and building predictive models that can discriminate between the active and inactive compounds. The goal is then to apply these models to other, unscreened compounds so that the compounds most likely to be active may be selected for screening. This is the approach taken in this research.

The rationale behind the use of machine learning is to discover patterns and signatures in data sets from high-throughput in vitro assays.
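To make the structure-similarity idea above concrete, here is a minimal sketch that ranks library compounds by their Tanimoto coefficient against one known active. It assumes each compound has already been turned into a binary fingerprint, represented here as the set of its "on" bit positions. The compound names and bit values are made up for illustration, and the Tanimoto coefficient is only one common choice of similarity measure, not necessarily the one used in any particular study.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: frozenset, library: dict) -> list:
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set holds the indices of the bits that are set.
known_active = frozenset({1, 4, 7, 9, 12})
library = {
    "cmpd_A": frozenset({1, 4, 7, 9, 13}),   # shares 4 bits with the query
    "cmpd_B": frozenset({2, 3, 5}),          # shares no bits with the query
    "cmpd_C": frozenset({1, 4, 7, 9, 12}),   # identical to the query
}

ranking = rank_by_similarity(known_active, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Compounds at the top of the ranking would be the first candidates to send for screening. In practice the fingerprints would come from a cheminformatics toolkit rather than being typed in by hand.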

Some things to remember before you perform Chemical Predictive Modeling



In this video, students will learn how to perform predictive modeling with two activity endpoints, that is, active and inactive compounds. I have used bioassay datasets from PubChem to model the data using Naive Bayes, Random Forest and SVM.
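As a rough illustration of two-class (active vs. inactive) modeling, the sketch below implements a small Bernoulli Naive Bayes classifier on binary fingerprints using only the Python standard library. It is not the workflow from the video: the six training "compounds", the fingerprint length of 6 bits, and the function names `train_bernoulli_nb` and `predict` are all assumptions made for this example, and a real PubChem bioassay dataset would be far larger and usually imbalanced.

```python
import math


def train_bernoulli_nb(X, y, n_bits, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing.

    X: list of sets, each holding the 'on' bit indices of one compound.
    y: list of labels (1 = active, 0 = inactive).
    Returns {label: (log_prior, [P(bit on | label) for each bit])}.
    """
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        log_prior = math.log(len(rows) / len(X))
        p_on = [
            (sum(1 for x in rows if bit in x) + alpha) / (len(rows) + 2 * alpha)
            for bit in range(n_bits)
        ]
        model[label] = (log_prior, p_on)
    return model


def predict(model, x, n_bits):
    """Return the label with the highest log posterior for fingerprint x."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, p_on) in model.items():
        score = log_prior
        for bit in range(n_bits):
            p = p_on[bit]
            score += math.log(p if bit in x else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical): actives tend to have bits 0 and 1 set,
# inactives tend to have bits 4 and 5 set.
N_BITS = 6
X_train = [{0, 1, 2}, {0, 1}, {0, 1, 3}, {4, 5}, {3, 4, 5}, {2, 4, 5}]
y_train = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(X_train, y_train, N_BITS)
print(predict(model, {0, 1, 5}, N_BITS))  # unscreened compound resembling the actives
print(predict(model, {4, 5}, N_BITS))     # unscreened compound resembling the inactives
```

Random Forest and SVM models are normally taken from an existing machine learning library rather than written by hand; the same fingerprints and labels would be passed to those models in place of the functions above.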



OCHEM is a web-based platform that automates the QSAR modeling process. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. The user-contributed database contains a set of tools for easy input, search and modification of thousands of records.



The video below describes the selection of variables and techniques to improve the accuracy of predictive models.



Links to the papers:

There are various kinds of classification models. Below I have listed some of them and their different properties, taken from Tom Mitchell's book.

Here is the link