Monday, 21 May 2012

When Mol files go wrong II

With time I've become more convinced that the SMILES format is more capable of faithfully storing stereochemistry than a 2D format such as Mol. Here is another tale of woe, related to tetrahedral stereocentres with one implicit bond, illustrated and annotated by Symyx Draw (am I the only one who thinks this is better than ChemDraw?).

Did you know that the stereochemistry of the wedge is interpreted differently in the two following cases?
Easy peasy, eh? But what about the in-between case where the angle between the two plane bonds is close to 180° (see below on left)? Guess what - you're in trouble if you do this. Some software will regard the stereocentre as undefined; some will carry on regardless. If you look at the InChI string you can see that it treats the stereo as undefined, whereas the SMILES string does contain a stereocentre. In short, you've got a problem; if you're in charge of a database, you should identify such cases and fix them (manually), for example as shown on the bottom right (if that is the correct stereo) or by adding the implicit hydrogen.
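For anyone wanting to hunt for such cases, here is a minimal sketch of one way to flag them. It is not what any particular toolkit does. It assumes the atoms' 2D coordinates and the bonds have already been read out of the Mol file, that the stereocentre is the first atom of the wedge or hash bond (stereo codes 1 and 6 in the V2000 bond block), and that "close to 180°" means within 10°. The function names and the tolerance are my own inventions for illustration.

```python
import math

WEDGE, HASH = 1, 6  # Mol V2000 bond stereo codes for "up" and "down" bonds


def plane_bond_angle(centre, a, b):
    """Angle in degrees at `centre` between the bonds to points `a` and `b`."""
    v1 = (a[0] - centre[0], a[1] - centre[1])
    v2 = (b[0] - centre[0], b[1] - centre[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))


def flag_near_linear_centres(coords, bonds, tol_deg=10.0):
    """Return (atom, angle) for centres whose two plane bonds are nearly collinear.

    coords: dict mapping atom index -> (x, y)
    bonds:  list of (begin_atom, end_atom, stereo_code) tuples
    tol_deg is an assumed threshold, not a value taken from any standard.
    """
    neighbours = {}
    for begin, end, stereo in bonds:
        # The wedge only carries stereo information at its begin (narrow) end.
        neighbours.setdefault(begin, []).append((end, stereo))
        neighbours.setdefault(end, []).append((begin, 0))

    flagged = []
    for atom, nbrs in neighbours.items():
        stereo_nbrs = [n for n, s in nbrs if s in (WEDGE, HASH)]
        plane_nbrs = [n for n, s in nbrs if s not in (WEDGE, HASH)]
        # Three explicit neighbours (one implicit bond) and exactly one wedge/hash.
        if len(nbrs) == 3 and len(stereo_nbrs) == 1:
            angle = plane_bond_angle(
                coords[atom], coords[plane_nbrs[0]], coords[plane_nbrs[1]]
            )
            if angle >= 180.0 - tol_deg:
                flagged.append((atom, angle))
    return flagged
```

Anything this returns would still need a human to look at the drawing and decide what the correct stereo is; the sketch only narrows down where to look.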
In the course of other work, I've come across some instances of this problem in ChEMBL and will be talking to the team about sorting it out. Does anyone have other examples of potential stereo problems in Mol files and how to identify them?


kott said...

And where is the first part of "When mol files go wrong"?

Noel O'Boyle said...

Try the following in Google: "when mol files go wrong"