Recent Changes

Wednesday, May 25

  1. page home edited Welcome to I571! Instructor: David Wild {djwild.jpg} Welcome Welcome to I571 Signing up as …

    Welcome to I571!
    Instructor: David Wild
    Welcome
    Welcome to I571
    Signing up as a for-credit student - IU students can sign up for this as a 3-credit-hour course using section #30761 (residential), #34705 (online) or #36485 (online Data Science). For-credit students must complete the assignments by the specified dates to receive credit. All grading will be done through the I571 Canvas Website.
    Using the Canvas site for discussion and interaction - We intend most of the interaction among students, and between students and the instructor, to take place on the I571 Canvas Website.
    7:00 pm

Monday, November 30

  1. page home edited ... Taking the next steps Graded Assignments & Coursework ... posted below 60%), 70%), …
    ...
    Taking the next steps
    Graded Assignments & Coursework
    ...
    posted below (70%, previously 60%), plus a final exam (30%, previously 40%). The assignments
    Assignment 1 - 2D representation
    Assignment 2 - Similarity Calculation and Database Searching
    11:20 am

Friday, November 27

  1. page home edited ... Taking the next steps Graded Assignments & Coursework ... be through 7 6 assignment…
    ...
    Taking the next steps
    Graded Assignments & Coursework
    ...
    be through 6 assignments (previously 7), which
    ...
    posted below (60%, previously 70%), plus a final exam (40%, previously 30%). The assignments
    Assignment 1 - 2D representation
    Assignment 2 - Similarity Calculation and Database Searching
    9:52 pm
  2. page Assignment 6 edited ... 6. Generate conformers using the fast or best settings and create a merge features pharmacopho…
    ...
    6. Generate conformers using the fast or best settings, create a merged features pharmacophore model, select the best 3 models and then plot a ROC curve of your data. For the ROC curve you need to convert the actives and the inactives separately into .ldb format and test them against your model. Have a look at this excellent blog for plotting ROC curves (both part 1 and part 2). I have also provided a PknB dataset for testing; it has 36 actives and 999 decoys (1035 compounds). Post your pharmacophore model on the cheminfoclub wiki and compare the model with the paper discussed here.
    Note: A large dataset may increase the computation time. First try a short run on 10 active and 10/20 inactive compounds.
    ...
    the database hits, sensitivity, specificity and GH score of your models.
    I have provided an R function below to calculate enrichment:
    enrichment <- function (x, y, top = 0.05, decreasing = TRUE)
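    The body of the course's enrichment() function is not shown in this excerpt, so the following is only a minimal sketch of one common definition of the enrichment factor (hit rate in the top fraction of the ranked list divided by the overall hit rate), together with sensitivity and specificity. The helper names enrichment_sketch and confusion_stats, the argument names and the toy data are all assumptions for illustration, not the assignment's code.

```r
# Hypothetical helper, NOT the course's enrichment() function.
# scores: numeric model scores; is_active: 0/1 or logical activity labels.
enrichment_sketch <- function(scores, is_active, top = 0.05, decreasing = TRUE) {
  stopifnot(length(scores) == length(is_active), top > 0, top <= 1)
  is_active <- as.logical(is_active)
  n_top  <- max(1, ceiling(top * length(scores)))   # size of the top fraction
  ranked <- order(scores, decreasing = decreasing)  # best-scored compounds first
  hit_rate_top <- mean(is_active[ranked[seq_len(n_top)]])
  hit_rate_all <- mean(is_active)
  hit_rate_top / hit_rate_all
}

# Hypothetical helper: sensitivity and specificity from predicted vs. true labels.
confusion_stats <- function(predicted, actual) {
  predicted <- as.logical(predicted)
  actual    <- as.logical(actual)
  tp <- sum(predicted & actual)
  fn <- sum(!predicted & actual)
  tn <- sum(!predicted & !actual)
  fp <- sum(predicted & !actual)
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

# Made-up toy data: 10 compounds, 3 of them active.
scores <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10)
labels <- c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0)

ef    <- enrichment_sketch(scores, labels, top = 0.2)
stats <- confusion_stats(scores >= 0.6, labels)
print(ef)
print(stats)
```

    An enrichment factor above 1 means the top of the ranked list contains proportionally more actives than the dataset as a whole.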
    6:58 am
  3. page Assignment 6 edited ... e<-enrichment(x,y,top=0.1) e= 1.1 7. 8. What are
    ...
    e<-enrichment(x,y,top=0.1)
    e= 1.1
    8. What are
    6:54 am

Tuesday, November 24

  1. page Assignment 5 edited ... fpm<-fp.to.matrix(fplist) write.csv(fpm,"Actives.csv") …
    ...
    fpm<-fp.to.matrix(fplist)
    write.csv(fpm,"Actives.csv")
    ## given a
    # creates new columns: feature.scale = (feature - mean)/std
    feature.scale = function (data, cols) {
    ...
    8. Add the Outcome column (Active or Inactive) to each csv file and merge the two files into a single file. Load the merged csv file and perform modeling experiments.
    9. Replace NA values in the data with the column means. For modeling you also need to remove the constant columns and the highly correlated columns (>0.95). Perform your modeling with the remaining attributes.
    ## Removing Constant or
    d <- data.frame(somedata)
    dropc <- apply(d, 2, function(x) { length(which(x == 0))/length(x) > .8 })
    ...
    Note: To replace missing values with column means you can use the one-liner below. To reduce variables based on correlations you can use the FSelector package or the code given below.
    ( dataset<- ifelse(is.na(data), rep(colMeans(data, na.rm=TRUE), rep(nrow(data), ncol(data))),unlist(data)) )
    ## Removing correlated
    corr.rem <- function(d, cutoff=0.9) {
    if (cutoff > 1 || cutoff <= 0) {
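    Step 9 above (mean imputation, then dropping constant and highly correlated columns) can be sketched in base R as follows. This is a minimal illustration on made-up toy data, not the course's corr.rem code, which is only partly shown in this excerpt; the data frame d and the helper name drop_correlated are assumptions.

```r
# Made-up toy data standing in for a fingerprint/descriptor table.
d <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(2, 4, 6, 8, 10),  # perfectly correlated with a after imputation
  c = c(7, 7, 7, 7, 7),   # constant column
  e = c(5, 1, 4, 2, 3)
)

# 1. Replace each NA with the mean of its column.
for (j in seq_along(d)) {
  d[[j]][is.na(d[[j]])] <- mean(d[[j]], na.rm = TRUE)
}

# 2. Drop columns that hold a single repeated value.
is_constant <- vapply(d, function(x) length(unique(x)) == 1, logical(1))
d <- d[, !is_constant, drop = FALSE]

# 3. Hypothetical helper: drop every column whose absolute correlation with
#    an earlier column exceeds the cutoff (a simple greedy rule).
drop_correlated <- function(d, cutoff = 0.95) {
  cm <- abs(cor(d))
  cm[upper.tri(cm, diag = TRUE)] <- 0  # compare each column only with earlier ones
  keep <- !apply(cm, 1, function(r) any(r > cutoff))
  d[, keep, drop = FALSE]
}

clean <- drop_correlated(d, cutoff = 0.95)
print(names(clean))
```

    The greedy rule always keeps the earlier column of a correlated pair, so the column order affects which attributes survive.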
    11:15 am

Friday, November 6

  1. page Assignment 5 edited ... 12 .Use the ROCR package to get the performance and plot the ROC plots,Lift plot for each fing…
    ...
    12. Use the ROCR package to get the performance and plot the ROC and lift plots for each fingerprint.
    13. Submit a PDF file for the Assignment 1 that contains:
    ...
    plots for Morgan, extended, PubChem and
    (ii) The confusion matrices
    (iii) A short description of which you think is the best model and why
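    As a rough sketch of step 12, the ROCR calls below build a ROC curve, a lift chart and a confusion matrix from predicted scores and true labels. The scores and labels are made-up values, and the 0.5 cutoff is an arbitrary assumption; in the assignment they would come from the model fitted for each fingerprint.

```r
library(ROCR)  # assumes the package is installed, e.g. install.packages("ROCR")

# Made-up predicted probabilities and true outcomes (1 = active, 0 = inactive).
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1, 1, 0, 1, 0, 0, 1, 0)

pred <- prediction(scores, labels)

# ROC curve: true positive rate against false positive rate.
roc <- performance(pred, measure = "tpr", x.measure = "fpr")
# Lift chart: lift against rate of positive predictions.
lift <- performance(pred, measure = "lift", x.measure = "rpp")
# Area under the ROC curve as a single summary number.
auc <- performance(pred, measure = "auc")@y.values[[1]]

plot(roc, main = "ROC curve")
abline(0, 1, lty = 2)  # reference line for a random classifier
plot(lift, main = "Lift chart")

# Confusion matrix at an assumed cutoff of 0.5.
conf_mat <- table(predicted = scores >= 0.5, actual = labels)
print(conf_mat)
print(auc)
```

    Repeating this per fingerprint and comparing the AUC values and confusion matrices gives a simple basis for the model comparison in (iii).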
    4:03 pm

Sunday, November 1

  1. page Assignment 1 edited ... 4. Show the frequency of Each Toxicophore SMARTS. 5 . Create a page and write a brief report …
    ...
    4. Show the frequency of each toxicophore SMARTS.
    5. Create a page and write a brief report on cheminfoclub.
    Note: For the 29 toxicophores, collect the names that are marked as approved in the paper. Collect the SMARTS from the supporting information PDF.
    10:00 pm

Saturday, October 31

  1. page Assignment 5 edited ... 1.Install Open Babel , R and the given R packages in windows, linux or mac OS. 2.Pubchem Bioa…
    ...
    1. Install Open Babel, R and the given R packages on Windows, Linux or macOS.
    2. PubChem BioAssay (http://www.ncbi.nlm.nih.gov/pcassay) provides access to the bioassay results of the tested compounds. Select any one of the given bioassays and perform your experiments.
    ...
    AID: 504318 e) AID: 62434
    3. Download the data for any one of the bioassays, i.e. the Active and Inactive sets of compounds, in sdf format.
    4. Using Open Babel, filter the compounds: strip the salts, remove duplicate compounds, filter on compounds whose number of heteroatoms is >= 10 and <= 60, and append the activity outcome (active or inactive) to the output file (sdf or smiles).
    11:31 am

Monday, October 5

  1. page 2D chemical database searching systems edited ... Interfaces At the client side, some kind of interface is required for searching databases. Th…
    ...
    Interfaces
    At the client side, some kind of interface is required for searching databases. This could be a machine interface (e.g. JDBC, ODBC, SOAP service, REST service) or a human interface (HTTP or client-side application). Database access through a single human interface is increasingly outdated; service-oriented architectures allow much greater flexibility to search from within a variety of applications and mashups.
    ...
    service (see Web service infrastructure)
    Example with PostgreSQL and CHORD
    For this example, we are going to be working with a very small sample dataset of common drugs:
    6:03 pm
