Wednesday 18 March 2009

The Clockwisdom of SMILES

I was recently confronted with a question that many of us face at some point in our lives: how many ways can the groups attached to a chiral C be moved around in a SMILES string while retaining the clockwisdom?

What's all this about clockwisdom? Well, a chiral SMILES string can indicate R or S around a tetrahedral centre using C@ or C@@. The difference is that R and S refer to the clockwisdom of groups arranged by CIP priority (with the lowest-priority group facing away from the viewer), whereas @ and @@ refer to the clockwisdom of groups arranged in order of their appearance in the SMILES string (with the first-appearing group facing towards the viewer) [1]. Whether this was a good design decision by the Daylight gurus, I'm not 100% sure, but that's how it is.

So in short, if you change the order of groups in the SMILES string, you may need to change the clockwisdom to ensure that stereochemistry is preserved. Specifically, if you swap two groups you will get the other enantiomer ("putting the SMILES on the other face"?) unless you flip the clockwisdom; that is, Cl[C@@](Br)(C)I is the same enantiomer as Cl[C@](Br)(I)C. Another swap and we get back a SMILES string with the original clockwisdom.

So I started off by trying to think of a clever program to identify how many swaps were required to convert between two orderings of groups. Next I tried to write a few loops that would simply perform all possible swaps of groups to generate all of the rearrangements, but that missed a few. In the end, I just wrote the dumbest program I could think of and got the following results. For an original ordering of groups 0123, the following orderings have the same clockwisdom: 1032, 3021, 2013, 3210, 1320, 3102, 0123, 0231, 0312, 2301, 1203, 2130.
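For anyone who wants to reproduce that list, here is one brute-force way to do it in Python. This is only a sketch and not necessarily the program used for this post: it generates every ordering of four groups, counts how many pairwise swaps a simple selection sort needs to get back to 0123, and keeps the orderings that need an even number. The function name swaps_to_sort is made up for this example.

```python
from itertools import permutations

def swaps_to_sort(order):
    """Count the pairwise swaps a selection sort needs to restore 0, 1, 2, ..."""
    items = list(order)
    swaps = 0
    for i in range(len(items)):
        j = items.index(i)  # current position of the value that belongs at i
        if j != i:
            items[i], items[j] = items[j], items[i]
            swaps += 1
    return swaps

# An even number of swaps means the clockwisdom is unchanged.
same_clockwisdom = [
    "".join(str(g) for g in p)
    for p in permutations(range(4))
    if swaps_to_sort(p) % 2 == 0
]
print(len(same_clockwisdom), same_clockwisdom)
```

The remaining orderings (those needing an odd number of swaps) are the ones where @ and @@ would have to be exchanged to keep the same enantiomer.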

And the point of all this? OpenBabel was not generating the correct stereochemistry around tetrahedral carbons in canonical SMILES. Now fixed.

Update (19/03/09): Tim Vandermeersch pointed out to me a neat way of determining the parity of a particular ordering of groups. Simply count the number of pairs in the ordering where one number is larger than another number to its right. For example, for 1032, there are two pairs (10, 32); for 3021, there are four pairs (30, 32, 31, 21). Orderings with an even number of pairs have the same parity as the original ordering 0123 (and so the same clockwisdom), while orderings with an odd number of pairs have the opposite parity.
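The counting rule in the update fits in a few lines of Python. This is a minimal sketch; the function names inversions and same_parity are hypothetical and are not taken from OpenBabel.

```python
from itertools import combinations

def inversions(order):
    """Count pairs where a number is larger than another number to its right."""
    # combinations() keeps positional order, so a is always to the left of b.
    return sum(1 for a, b in combinations(order, 2) if a > b)

def same_parity(order):
    """True if the ordering has the same parity as the sorted ordering."""
    return inversions(order) % 2 == 0

print(inversions("1032"), same_parity("1032"))  # two pairs: 10 and 32
print(inversions("0132"), same_parity("0132"))  # one pair: 32
```

Passing the orderings as strings of digits works here because single digits compare in numerical order; for more than ten groups a tuple of integers would be the safer input.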

[1] The OpenSMILES specification on stereochemistry

Image credit: Swamibu

4 comments:

Anonymous said...

Indeed...I have been faced with this question so many times in my life that I decided to simply ignore its solution...and to my surprise, it turned out not to make the slightest difference...

Noel O'Boyle said...

Joking aside, remember that different stereoisomers of drugs have different biological properties (for example, the unfortunate case of thalidomide). Given that many drug companies store chemical data as SMILES strings, it could make quite a difference if the chirality got mixed up.

Unknown said...

The decision to use the order of ligands is exactly correct. (This is what CML uses in AtomParity). The chirality is the sign of the chiral determinant. There are 24 ways of arranging atoms round a tetrahedron - 12 of one chirality and 12 of the other. Swapping any pair flips the chirality.

There are two ways of labelling the atoms - one is explicitly as SMILES and CML do. The advantage of this is it's very simple and completely deterministic. The disadvantage is that you have to convey a fair amount of information for the labels. The advantage of CIP is that you have little information to convey. The disadvantage is that the rules can become unmanageable as Prelog admitted.

Noel O'Boyle said...

@petermr: That's a reasonable justification.