ASSIGNMENT 2.1

1. What are the advantages of using a SQL-based database with a cartridge for storing 2D chemical structures, over a system developed just for handling chemical structures (such as ISIS)?

2. Assume we have a PostgreSQL table called compounds with the following fields:
SMILES VARCHAR(200)
Name  VARCHAR(200)
FP BIT(166)
Activity  REAL
 
a) Write a SQL statement that will populate the FP field with 166-bit MACCS keys using the gNova cartridge.
b) Write a SQL statement that will search for all records with an activity greater than 70 that contain a pyridine ring (an aromatic ring containing a nitrogen atom).

3. Explain in plain English what the following SMARTS pattern means:
[R0;!#8]

4. Write the SMARTS definitions for Lipinski's hydrogen bond donors and acceptors:
a) Donors are defined as nitrogen or oxygen atoms that have at least one directly bonded hydrogen atom.
b) Acceptors are defined as nitrogen or oxygen atoms.

5. What is the difference between Reaction SMILES and SMIRKS?

ASSIGNMENT 2.2

1) Use any of the PubChem BioAssay confirmatory screening datasets (malaria, tuberculosis, or leishmaniasis).
2) Download the active and inactive compounds using the PubChem API or the rpubchem package in R (see the code below for how to install rpubchem).
3) If the dataset you select is small, use it as is; otherwise, sample 100 active compounds and 100 inactive compounds from the larger dataset and use them for further analysis.
4) Use different fingerprints from rcdk to plot a heatmap of the Tanimoto similarity matrix.
5) Explain:
a) The purpose of the heatmap and of hierarchical clustering of the data.
b) The purpose of using the different fingerprints under study.

Example code showing how to install and use the packages is given below; you may, however, use any kind of heatmap visualization. Make sure the row and column names of the heatmap contain the CIDs.
## Install rpubchem from GitHub (devtools provides install_github)
install.packages("devtools")
library(devtools)
install_github("abhik1368/rpubchem")
library(rcdk)
library(rpubchem)
library(gplots)

## Here 504703 is the PubChem BioAssay ID (AID)
assay = get.assay(504703, quiet=TRUE)
assaydata = assay[,c("PUBCHEM.CID","PUBCHEM.ACTIVITY.OUTCOME")]
## Keep only the compounds flagged as active and fetch their records
Active = subset(assaydata, PUBCHEM.ACTIVITY.OUTCOME == 'Active')
Activecmp = get.cid(Active$PUBCHEM.CID)

## Column 3 of Activecmp is expected to hold the SMILES strings
ActiveSmiles = lapply(Activecmp[,3], parse.smiles)
## Compute an "extended" fingerprint for every parsed molecule
cmp.fp = vector("list",length(ActiveSmiles))
for (i in seq_along(ActiveSmiles)){
    cmp.fp[[i]] = lapply(ActiveSmiles[[i]], get.fingerprint, type="extended")
}

## Pairwise Tanimoto similarity, converted to a distance for clustering
fp.sim = fp.sim.matrix(unlist(cmp.fp))
fp.dist = 1 - fp.sim
hc = hclust(as.dist(fp.dist), method="single")
## Plot the similarity matrix, ordering rows and columns by the clustering
heatmap.2(fp.sim, Rowv=as.dendrogram(hc), Colv=as.dendrogram(hc), col=colorpanel(40, "darkblue", "yellow", "white"), density.info="none", trace="none")
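
To make clear what a Tanimoto similarity matrix contains, here is a minimal base-R sketch that computes one by hand. It is an illustration only, not the rcdk implementation: the helper name tanimoto, the three 8-bit fingerprints, and the CID_1 to CID_3 labels are all made up for this example. The similarity of two fingerprints is the number of bits set in both divided by the number of bits set in at least one.

```r
# Tanimoto similarity between two binary fingerprints given as 0/1 vectors.
# (Hypothetical helper for illustration; not part of rcdk or rpubchem.)
tanimoto <- function(a, b) {
  stopifnot(length(a) == length(b))
  both   <- sum(a & b)   # bits set in both fingerprints
  either <- sum(a | b)   # bits set in at least one fingerprint
  if (either == 0) return(NA_real_)  # undefined when no bits are set
  both / either
}

# Three made-up 8-bit fingerprints, labelled with placeholder CIDs
fps <- list(
  CID_1 = c(1, 1, 0, 0, 1, 0, 1, 0),
  CID_2 = c(1, 1, 0, 0, 0, 0, 1, 0),
  CID_3 = c(0, 0, 1, 1, 0, 1, 0, 0)
)

# Pairwise similarity matrix; the list names become row and column names
sim <- sapply(fps, function(x) sapply(fps, function(y) tanimoto(x, y)))
print(round(sim, 2))
```

Because the list is named, the resulting matrix carries the labels as its row and column names, which is the form the heatmap should have when real CIDs are used.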
 

Note: you can use the video given below to follow some of the steps.