Where do 3D structures come from?

2D chemical structures can be derived from knowledge of the atoms that are present in a compound and how they are bonded together. This connectivity is generally known for a substance, so we do not need to consider where 2D structures come from. However, there is no a priori information available to us that would reveal what the 3D structure of a compound would be. Indeed, as we shall see, all compounds are flexible to some degree, so the 3D structure will change over time. And we must bear in mind that, as with 2D structures, we are dealing with a model, not with reality itself (which is, to the best of our knowledge so far, a grand-scale fuzzy quantum event!).

There are three main sources of 3D structural information:

Dealing with conformational flexibility

Most compounds have rotatable bonds, which means that the whole molecule can flex into many different conformers in 3D (even a "rigid" cyclohexane chair can flex, and a more flexible molecule can move a great deal more). Thus there is not just one 3D structure: for any one compound there is an infinite number of possible conformers (or a finite number if, say, we consider discrete rotation increments).
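To make the "discrete rotation units" idea concrete, here is a minimal sketch that enumerates every combination of torsion angles on a fixed grid. The function name torsion_grid and the 60 degree increment are illustrative assumptions, not part of any particular toolkit.

```python
from itertools import product

def torsion_grid(n_rotatable, increment_deg=60):
    """Yield every combination of torsion angles (in degrees) on a regular grid.

    n_rotatable   -- number of rotatable bonds in the molecule
    increment_deg -- grid spacing; a hypothetical choice for illustration
    """
    angles = range(0, 360, increment_deg)
    return product(angles, repeat=n_rotatable)

# Three rotatable bonds sampled every 60 degrees give 6**3 candidate conformers.
n_conformers = sum(1 for _ in torsion_grid(3, 60))
print(n_conformers)
```

The count grows exponentially with the number of rotatable bonds, which is one reason practical tools store or generate only a sample of low-energy conformers.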

However, not all conformers are equal. In particular, molecules prefer to be in low-energy states rather than high-energy states. Therefore we may decide to store just one (low-energy) conformer and let algorithms flex the molecule as needed, or to produce several conformers (say, a sampling of different low-energy orientations).

First, we have to decide how to determine whether a bond is rotatable. A good working definition is: any single bond which is not part of a ring, is not terminal (e.g. a bond to a methyl group) and is not in a conjugated system (e.g. an amide). However, this is not perfect: we know that bonds in conjugated systems can rotate to a degree (depending on the degree of conjugation), and rings can flex (say, between the chair and boat conformations of cyclohexane).
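The working definition above can be sketched as a simple filter. This is only an illustration: the Bond record and its in_ring and conjugated flags are hypothetical, and in practice a cheminformatics toolkit would perceive rings and conjugation for you. Only heavy atoms are listed (hydrogens are implicit), so a "terminal" bond is one where either atom has a single heavy-atom neighbour.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Bond:
    a: int                    # index of first atom
    b: int                    # index of second atom
    order: int = 1            # 1 = single, 2 = double, ...
    in_ring: bool = False     # assumed to be perceived elsewhere
    conjugated: bool = False  # assumed to be perceived elsewhere

def rotatable_bonds(bonds):
    """Return the bonds that satisfy the working definition of 'rotatable'."""
    degree = Counter()
    for bond in bonds:
        degree[bond.a] += 1
        degree[bond.b] += 1
    return [
        bond for bond in bonds
        if bond.order == 1
        and not bond.in_ring
        and not bond.conjugated
        and degree[bond.a] > 1   # neither end is a terminal atom
        and degree[bond.b] > 1
    ]

# Butane heavy-atom skeleton C0-C1-C2-C3: only the central bond qualifies.
butane = [Bond(0, 1), Bond(1, 2), Bond(2, 3)]
central = rotatable_bonds(butane)
```

As the text notes, this rule is a simplification: it ignores partial rotation about conjugated bonds and ring flexing.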

When we are discussing the rotation of rotatable bonds, you will hear two terms used: the torsion angle and the dihedral angle. These two terms are synonymous, and refer to the angle between the A-B bond and the C-D bond when considering four atoms connected in the order A-B-C-D and looking along the central B-C bond.
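As an illustration, the torsion angle for four atoms A-B-C-D can be computed from their Cartesian coordinates with the standard atan2 formulation. The function name dihedral and the test coordinates are assumptions made for this sketch; the sign convention follows from the formula used and may differ between programs.

```python
import math

def dihedral(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees, in the range [-180, 180]."""
    def sub(u, v):
        return (u[0] - v[0], u[1] - v[1], u[2] - v[2])
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1 = cross(b1, b2)   # normal to the plane through A, B, C
    n2 = cross(b2, b3)   # normal to the plane through B, C, D
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# B-C lies along the z axis; A and D are placed to give simple angles.
A, B, C = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
eclipsed = dihedral(A, B, C, (1.0, 0.0, 1.0))    # D directly "above" A
anti = dihedral(A, B, C, (-1.0, 0.0, 1.0))       # D opposite A
gauche_like = dihedral(A, B, C, (0.0, 1.0, 1.0)) # D a quarter turn from A
```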

Representing 3D conformers on computer

In addition to the information stored in the 2D structure (the atoms and how they are connected by bonds), for 3D conformers we also need to be able to store the coordinates of atoms (relative to some origin). There is no well-established linear notation for storing this information, although SLN does allow atoms to be labelled with coordinates. More usual is a connection-table type file format, often either an MDL MOL/SDF file or a Sybyl MOL2 file. Other formats can be used too, such as CML and PDB or, for the coordinates alone, a simple XYZ file.

Internally, we can create a coordinate table, which is simply an extension of the atom lookup table that stores X, Y and Z coordinates for each of the atoms relative to a defined origin. It is normal for this coordinate system to be based on Ångström (i.e. one unit is one Ångström).

Once we have a coordinate table, we can derive from it a Distance Matrix that specifies the distance (in Ångström) between any two atoms in the conformer.

Note that this also specifies a fully-connected graph! Once we have distances, we can also use Distance Geometry techniques.
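A small sketch of both data structures follows. The coordinate table holds an illustrative water geometry (O-H length of about 0.957 Ångström, H-O-H angle of about 104.5 degrees) placed with the oxygen at the origin; the layout of the table as a list of (element, x, y, z) tuples is an assumption for this example, not a standard format.

```python
from math import dist  # Python 3.8+

# Coordinate table: one row per atom, coordinates in Ångström.
coordinate_table = [
    ("O",  0.0000, 0.0000, 0.0000),
    ("H",  0.9572, 0.0000, 0.0000),
    ("H", -0.2400, 0.9266, 0.0000),
]

def distance_matrix(table):
    """Return a symmetric N x N matrix of interatomic distances in Ångström."""
    points = [row[1:] for row in table]
    return [[dist(p, q) for q in points] for p in points]

matrix = distance_matrix(coordinate_table)
for row in matrix:
    print("  ".join(f"{value:6.3f}" for value in row))
```

Every pair of atoms has an entry, bonded or not, which is why the matrix can be read as a fully-connected graph whose edge weights are distances.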

In addition to storing coordinate tables and distance matrices for 3D conformers, we can also use various ways of specifying degrees of flexibility of a compound in 3D. For example we can specify two coordinate tables, one which stores a minimum X, Y, and Z value for an atom, and one which stores a maximum value. Or we can similarly specify minimum and maximum distance matrices.
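The minimum and maximum distance matrices can be used to test whether a given conformer falls inside the allowed flexibility. The sketch below assumes all three matrices are square, share the same atom ordering, and use Ångström; the function name within_bounds and the numbers are illustrative only.

```python
def within_bounds(distances, lower, upper):
    """True if every interatomic distance lies between its lower and upper bound."""
    n = len(distances)
    return all(
        lower[i][j] <= distances[i][j] <= upper[i][j]
        for i in range(n)
        for j in range(i + 1, n)   # upper triangle is enough: matrices are symmetric
    )

# Hypothetical three-atom example.
lower = [[0.0, 1.4, 2.3], [1.4, 0.0, 1.4], [2.3, 1.4, 0.0]]
upper = [[0.0, 1.6, 2.6], [1.6, 0.0, 1.6], [2.6, 1.6, 0.0]]
conformer_a = [[0.0, 1.5, 2.5], [1.5, 0.0, 1.5], [2.5, 1.5, 0.0]]
conformer_b = [[0.0, 1.5, 3.0], [1.5, 0.0, 1.5], [3.0, 1.5, 0.0]]

a_ok = within_bounds(conformer_a, lower, upper)
b_ok = within_bounds(conformer_b, lower, upper)
```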

Generating and manipulating 3D structures with a computer

There are a variety of programs that will "convert" 2D structures (say, in SMILES format) to 3D structures. Often these will produce "valid" 3D structures, but not necessarily energy-minimized ones (unless they are combined with an energy minimization tool as described below). These programs may output a single structure or an ensemble of 3D structures. Most of these methods are fragment and rule based; that is, they split the 2D structure into small fragments that are then matched to a predefined dictionary of 3D fragments. Using a series of rules and theory, these are then combined into a full 3D structure. Examples of this kind of approach are Concord, Corina and Omega. Other methods use Distance Geometry to rapidly sample the "conformational space" of a molecule, looking for valid conformations based on distance bounds. The most prominent current example of this approach is smi23d.

Most of these methods also perform energy minimization, which can also be applied to 3D structures from any source (e.g. X-ray or NMR). An energy minimization algorithm will take a conformer as input and will attempt to rotate and flex the molecule such that the potential energy is minimized. To do this, we can apply any one of many optimization algorithms. Some of these will only find local minima (such as hill climbing), whilst others will attempt to find global minima (such as genetic algorithms, Monte Carlo, exhaustive search and simulated annealing).
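The difference between local and global search can be illustrated on a toy problem with a single rotatable bond. The energy function below is a made-up butane-like torsion profile (the parameters v1 and v3 are hypothetical, in arbitrary units), not a real force field. A greedy local search gets trapped in the nearest minimum, whereas a systematic scan over discrete start angles followed by local refinement also finds the lowest one.

```python
import math

def torsion_energy(phi_deg, v1=2.0, v3=3.0):
    """Toy torsional energy with a global minimum at 180 and local minima near +/-64."""
    phi = math.radians(phi_deg)
    return 0.5 * v1 * (1 + math.cos(phi)) + 0.5 * v3 * (1 + math.cos(3 * phi))

def local_minimize(energy, phi, step=5.0, tol=1e-4):
    """Greedy downhill search: move to a lower-energy neighbour, else halve the step."""
    while step > tol:
        best = min((phi - step, phi + step), key=energy)
        if energy(best) < energy(phi):
            phi = best
        else:
            step /= 2
    return phi

def scan_then_refine(energy, increment=30.0):
    """Systematic search: refine from a grid of start angles, keep the lowest result."""
    starts = [i * increment for i in range(int(360 / increment))]
    return min((local_minimize(energy, s) for s in starts), key=energy)

trapped = local_minimize(torsion_energy, 50.0)   # stays in the nearby local minimum
lowest = scan_then_refine(torsion_energy)        # reaches the global minimum
print(round(trapped, 1), round(lowest, 1))
```

With many rotatable bonds the systematic scan becomes too expensive, which is where stochastic methods such as Monte Carlo, simulated annealing and genetic algorithms come in.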

3D Pharmacophores

A pharmacophore is a set of molecular features that is required for binding to a particular protein target. It is almost always used to refer to structural features (or derivatives such as hydrogen bonding potential), and is usually used in reference to 3D structures. A pharmacophore may be defined as a set of features and distance bounds between these features in 3D, and can be generated either from a target or from a set of ligands. For example: "An OH group between 2 and 5 Ångström away from a carboxyl oxygen, both of which are 7-8 Ångström from a benzene ring".

A pharmacophore can be used as a query to a database too. Note that a pharmacophore search is like a substructure search in that it is a subgraph query on a fully-connected distance matrix graph. A pharmacophore can be represented in a variety of ways: for instance, a distance matrix of pharmacophore points (with a dictionary for point types which may contain coordinates of 3D substructures or SMARTS of 2D features). Note that we often need to be able to represent distance ranges (rather than exact distances) and we also may need to represent ambiguity in pharmacophore points.
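A brute-force sketch of such a query is shown below, using the OH / carboxyl oxygen / benzene ring example. It assumes the molecule's feature points (type plus 3D position, with the ring represented by its centroid) have already been perceived; the feature labels, the coordinates and the function name match_pharmacophore are all hypothetical. Real systems use much more efficient subgraph-matching algorithms.

```python
from itertools import permutations
from math import dist

def match_pharmacophore(query_types, bounds, features):
    """Return every assignment of molecule features to query points that fits.

    query_types -- feature type required at each query point
    bounds      -- {(p, q): (min, max)} distance ranges between query points, in Ångström
    features    -- [(type, (x, y, z)), ...] feature points of one conformer
    """
    hits = []
    for combo in permutations(range(len(features)), len(query_types)):
        if any(features[i][0] != t for i, t in zip(combo, query_types)):
            continue  # wrong feature type at some query point
        if all(lo <= dist(features[combo[p]][1], features[combo[q]][1]) <= hi
               for (p, q), (lo, hi) in bounds.items()):
            hits.append(combo)
    return hits

query_types = ["OH", "carboxyl_O", "benzene_ring"]
bounds = {(0, 1): (2.0, 5.0), (0, 2): (7.0, 8.0), (1, 2): (7.0, 8.0)}

conformer_features = [
    ("OH", (0.0, 0.0, 0.0)),
    ("carboxyl_O", (3.0, 0.0, 0.0)),
    ("benzene_ring", (1.5, 7.3, 0.0)),
    ("OH", (10.0, 10.0, 10.0)),   # a second OH, too far away to match
]
hits = match_pharmacophore(query_types, bounds, conformer_features)
```

For a flexible molecule the same test would be repeated over each stored conformer, and the molecule is a hit if any conformer matches.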

3D descriptors and fingerprints

Just as with 2D, we can generate 3D structural or property-based descriptors. The equivalent of 2D structural keys are 3D pharmacophore "fragments". Sometimes these are called triplets or quadruplets based on the number of atoms in each of the fragments. Note that these fragments can contain distance ranges and ambiguous points just like a full pharmacophore. For a set of molecules, there are a huge number of triplets or quadruplets that can be generated, so these are usually hashed down onto a fixed number of bits.
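One way to hash triplets onto a fixed number of bits is sketched below. Each triplet of feature points is described by its three feature types and three binned distances, put into a canonical order so that the result does not depend on how the points are listed, and then hashed to a bit position. The bin width, the fingerprint length and the use of SHA-1 are arbitrary choices for this illustration.

```python
import hashlib
from itertools import combinations, permutations
from math import dist

def triplet_fingerprint(features, n_bits=1024, bin_width=1.0):
    """Return the set of bit positions set by all pharmacophore triplets.

    features -- [(type, (x, y, z)), ...] feature points of one conformer
    """
    bits = set()
    for trio in combinations(features, 3):
        # Canonical key: the smallest description over all orderings of the three points.
        key = min(
            (ta, tb, tc,
             int(dist(a, b) // bin_width),
             int(dist(a, c) // bin_width),
             int(dist(b, c) // bin_width))
            for (ta, a), (tb, b), (tc, c) in permutations(trio)
        )
        digest = hashlib.sha1(repr(key).encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits

example_features = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.2, 0.0, 0.0)),
    ("aromatic", (1.5, 7.3, 0.0)),
    ("hydrophobe", (4.1, 2.2, 3.3)),
]
fingerprint = triplet_fingerprint(example_features)
```

Binning the distances plays the role of the distance ranges mentioned above, and different triplets may collide on the same bit, as with any hashed fingerprint.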

A variety of other kinds of descriptors can be created for 3D, ranging from atom-based (e.g. partial charges generated from semi-empirical methods) to full-molecule field-based (such as electrostatic, steric and hydrophobic fields). These can be used for a variety of applications (molecular alignment, docking, and similarity).

Databases of 3D structures

A good overview of how databases of 3D structures can be used in drug discovery is given on NetSci.

Pharmacophore searching is the equivalent of substructure searching in 2D: we supply a pharmacophore query and then return all of the molecules which could satisfy the query (either by flexing the molecule, or by storing multiple conformers).

Similarity searching in 3D can be simply a matter of calculating a Tanimoto coefficient or Euclidean distance between two fingerprints (as in 2D). However, there are several other ways of calculating 3D similarity that are not based on fingerprints: for example, by comparing distance matrices and mapping atoms in one molecule onto another, or by aligning the molecules to maximize the overlap of fields and then measuring the amount of overlap between the fields.
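For the fingerprint-based case, both measures are easy to sketch when a fingerprint is stored as the set of its "on" bit positions. The bit positions below are made up for the example, and returning 0.0 for two empty fingerprints is simply a convention chosen here.

```python
import math

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    return common / union if union else 0.0

def euclidean(fp_a, fp_b):
    """Euclidean distance between binary fingerprints: sqrt of the number of differing bits."""
    return math.sqrt(len(fp_a ^ fp_b))

fp_a = {1, 2, 3, 4}
fp_b = {3, 4, 5}
similarity = tanimoto(fp_a, fp_b)   # 2 shared bits out of 5 distinct bits
distance = euclidean(fp_a, fp_b)    # bits 1, 2 and 5 differ
```

Note that Tanimoto is a similarity (1.0 means identical fingerprints) while the Euclidean measure is a distance (0.0 means identical).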

Available 3D databases

The most comprehensive database of 3D chemical structures generated by X-ray crystallography is the Cambridge Structural Database. This database contained 469,611 structures as of January 2009. The database comes with a variety of tools for viewing and analyzing the structures, including several free services. In particular, there is a free 500-compound subset of the database available for teaching purposes. There are also a variety of databases of machine-generated 3D structures available; in particular, Indiana University hosts Pub3D, a database of PubChem structures converted to 3D with smi23d.

Reading assignments