Assignment 1.1 - 2D representation

1. Generate 2 possible SMILES for adamantane. (2%)

2. What issues might come up when trying to represent hydroxypyridines? (2%)

3. Why is it desirable to always number the same molecule the same way? How can we do this? (2%)

4. What is the difference between a structural key and a hashed fingerprint? Why might you want to use a structural key to characterize a dataset containing a diverse set of drugs? (2%)

5. If two molecules have a Tanimoto coefficient of 0.89, what might that imply about their biological properties? (2%)
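
As a reminder of where a value such as 0.89 comes from: for two binary fingerprints, the Tanimoto coefficient is the number of bits set in both, divided by the number of bits set in either, i.e. c / (a + b - c). The sketch below is only an illustration. The fingerprints are made-up sets of on-bit indices rather than real molecules, and the function name `tanimoto` is our own.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    shared = len(a & b)   # c: bits set in both fingerprints
    total = len(a | b)    # a + b - c: bits set in at least one fingerprint
    # Two empty fingerprints are scored 0.0 here; this is an assumption,
    # since conventions differ between toolkits.
    return shared / total if total else 0.0


# Hypothetical fingerprints: 8 shared bits, 9 bits in the union.
fp_a = {1, 4, 7, 9, 15, 22, 30, 41, 52}
fp_b = {1, 4, 7, 9, 15, 22, 30, 41}
print(round(tanimoto(fp_a, fp_b), 2))  # 0.89
```

The value depends on the fingerprint used, so the same pair of molecules can score differently with a structural key and with a hashed fingerprint.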


Assignment 1.2 - Filtering compounds based on SMARTS

Mutagenicity is one of the most important adverse effects that prevent a compound from becoming a marketable drug. A seminal paper by Kazius et al. identified many substructures (parts of a molecule) that are responsible for mutagenicity. Such substructures are called toxicophores. The paper identified 29 toxicophores containing new substructures responsible for mutagenicity. In this assignment, your job is to collect the SMARTS definitions and use the toxicophores to filter a dataset from PubChem BioAssay. You can use any tool for the assignment, for example Open Babel, the Chemistry Development Kit (CDK), rcdk, or RDKit.

1. Download and install Open Babel on your computer. I don't have a Windows version of Open Babel to create a video, but after installation, using Open Babel from the command line is straightforward. Check out the video below to see how I use it. To print the help text with the available options, run:
PROMPT> babel -H

You can select a subset matching a SMARTS pattern. For example, to select all molecules containing bromobenzene, use:
PROMPT> babel mymols.sdf -osdf selected.sdf -s 'c1ccccc1Br'
You can also select the subset that does not match a SMARTS pattern. To select all molecules not containing bromobenzene, use:
PROMPT> babel mymols.sdf -osdf selected.sdf -v 'c1ccccc1Br'
You can of course combine options, so to join molecules and add hydrogens, type:
PROMPT> babel mymols.sdf -osdf myjoined.sdf -h -j
2. Collect the toxicophores SMARTS from
3. Download the active compounds from one of the PubChem malaria bioassay datasets (select any one): AID 449704, AID 449703, AID 504848, AID 504850. If you don't find any toxicophores, try other confirmatory screen datasets. You can also use the DrugBank dataset to find toxicophores.
4. Show the frequency of each toxicophore SMARTS.
5. Create a page and write a brief report on cheminfoclub.
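
If you choose RDKit for step 4, the counting could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution. The function names and `EXAMPLE_PATTERNS` are hypothetical, the molecules are assumed to be available as SMILES strings, and the two patterns are placeholders (bromobenzene from the example above and a generic nitro group), not the SMARTS definitions from the Kazius paper.

```python
from collections import Counter

# Placeholder patterns for illustration only; replace them with the
# toxicophore SMARTS you collected in step 2.
EXAMPLE_PATTERNS = {
    "bromobenzene": "c1ccccc1Br",
    "nitro": "[N+](=O)[O-]",
}


def rdkit_has_match(smiles, smarts):
    """Return True if the molecule (SMILES) contains the substructure (SMARTS).

    Assumes RDKit is installed. The import is local so that the counting
    function below also works with a different matcher.
    """
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(smarts)
    if mol is None or pattern is None:
        return False  # unparsable input is counted as "no match"
    return mol.HasSubstructMatch(pattern)


def toxicophore_frequency(molecules, toxicophores, has_match=rdkit_has_match):
    """Count, for each named SMARTS, how many molecules contain it."""
    counts = Counter({name: 0 for name in toxicophores})
    for smiles in molecules:
        for name, smarts in toxicophores.items():
            if has_match(smiles, smarts):
                counts[name] += 1  # one count per molecule, not per occurrence
    return counts
```

Calling `toxicophore_frequency(list_of_smiles, EXAMPLE_PATTERNS)` returns a `Counter` keyed by pattern name. Each molecule is counted at most once per toxicophore, which is one reasonable reading of "frequency". Count individual matches instead if your report calls for that.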