Sunday 28 February 2010

How to store stereochemistry in Mol files

It's almost a year now since Tim Vandermeersch started to remove all of the stereochemistry code from OpenBabel and rewrite it from scratch. The first fruits of this work made it into OB 2.2.3, where stereochemistry in SMILES uses the new code and works correctly.

I've helped out with the reading and writing code for various formats. Right now, I'm adding stereo (i.e. double bond stereochemistry and chirality) to the MDL Mol format. There are three places where stereochemical information can be stored in these files: the coordinates, the atom parity (in the atom block), and the bond stereo (in the bond block).
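To make the three locations concrete, here is a minimal Python sketch that pulls the relevant fields out of a single V2000 atom line and bond line. It assumes the fixed-column layout as I read it in the CTfile spec (coordinates in columns 1-30, atom parity in columns 40-42, bond stereo in columns 10-12 of the bond line); `parse_atom_line` and `parse_bond_line` are hypothetical helpers for illustration, not OpenBabel functions.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    x: float
    y: float
    z: float
    symbol: str
    parity: int  # 0 = not stereo, 1 = odd, 2 = even, 3 = either/unmarked


class Bond(NamedTuple):
    begin: int   # 1-based atom index; the narrow end of a wedge
    end: int
    order: int
    stereo: int  # single bonds: 0 = none, 1 = up, 4 = either, 6 = down


def parse_atom_line(line: str) -> Atom:
    # Fixed-width V2000 atom line: x, y, z in 10-character fields,
    # one blank, a 3-character symbol, then mass diff, charge, parity.
    return Atom(
        x=float(line[0:10]),
        y=float(line[10:20]),
        z=float(line[20:30]),
        symbol=line[31:34].strip(),
        parity=int(line[39:42]),
    )


def parse_bond_line(line: str) -> Bond:
    # Fixed-width V2000 bond line: the first four 3-character integer fields.
    return Bond(
        begin=int(line[0:3]),
        end=int(line[3:6]),
        order=int(line[6:9]),
        stereo=int(line[9:12]),
    )


atom = parse_atom_line("    0.0000    0.0000    0.0000 C   0  0  1")
bond = parse_bond_line("  1  2  1  1")
print(atom.parity, bond.stereo)  # 1 1
```

Real files carry further columns after these fields; slicing by position simply ignores them.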

My current understanding is that where 3D coordinates are present, there's no need to store stereochemical information in either the atom parity or the bond block. I think I'll probably set the atom parity anyway, since I've already written the code, and it helps, when looking at the file, to be able to identify the chiral centers easily.
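Because the parity is redundant when 3D coordinates are present, it can be derived from them. Below is a minimal sketch, assuming the parity definition as I understand it from the CTfile spec: number the neighbors by their position in the atom block (an implicit hydrogen counts as the highest), view the center with neighbor 4 pointing away from you, and assign 1 (odd) if 1, 2, 3 run clockwise or 2 (even) if counterclockwise. The name `parity_from_3d` is hypothetical, and using the center atom's own position as a stand-in for an implicit hydrogen is an assumption of this sketch.

```python
from typing import Optional, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def parity_from_3d(neighbors: Sequence[Vec], center: Optional[Vec] = None) -> int:
    """Return MDL-style parity: 1 (odd), 2 (even), or 0 if degenerate.

    `neighbors` must be ordered by atom-block index. With only three
    neighbors (implicit hydrogen), the center position stands in for
    neighbor 4 (assumption: valid for non-planar geometries).
    """
    pts = list(neighbors)
    if len(pts) == 3:
        if center is None:
            raise ValueError("center is required when there are three neighbors")
        pts.append(center)
    if len(pts) != 4:
        raise ValueError("expected three or four neighbors")
    p1, p2, p3, p4 = pts
    # Normal of the plane 1-2-3, oriented by the right-hand rule.
    normal = _cross(_sub(p2, p1), _sub(p3, p1))
    side = _dot(normal, _sub(p4, p1))
    if abs(side) < 1e-8:
        return 0  # neighbor 4 lies in the plane: no defined handedness
    # side < 0: the normal points away from neighbor 4, i.e. toward the
    # viewer, so 1 -> 2 -> 3 appears counterclockwise (even parity).
    return 2 if side < 0 else 1
```

Whether this matches a given toolkit's output is worth checking against a few known structures before relying on it.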

For 2D coordinates, there's no need to store the double bond stereochemistry (as this can be worked out from the coordinates), but chirality needs to be stored explicitly. The normal way to store it is not with the atom parity (although I'll set that anyway, for the same reasons as above) but by setting one of the bonds on the tetrahedral center to up or down.
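As an illustration of why double bond stereo needs no extra field in 2D, here is a minimal sketch that recovers cis/trans for a-b=c-d purely from the x/y coordinates, by checking which side of the b=c axis each substituent falls on. The function names are hypothetical, and the sketch ignores the "either" case as well as the choice of reference substituent when an atom has two neighbors besides its double bond partner.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def side_of_line(p: Point, q: Point, r: Point) -> float:
    """z-component of (q - p) x (r - p): positive if r lies left of p->q."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def cis_trans_2d(a: Point, b: Point, c: Point, d: Point,
                 tol: float = 1e-6) -> Optional[str]:
    """Classify a-b=c-d as 'cis' or 'trans' from 2D coordinates.

    Returns None when a or d lies on the b=c axis (e.g. a linear drawing),
    where the coordinates do not define the geometry.
    """
    s_a = side_of_line(b, c, a)
    s_d = side_of_line(b, c, d)
    if abs(s_a) < tol or abs(s_d) < tol:
        return None
    # Same side of the double bond axis means cis; opposite sides, trans.
    return "cis" if (s_a > 0) == (s_d > 0) else "trans"


# 2-butene drawn as a zigzag: methyls on opposite sides of C2=C3
print(cis_trans_2d((-0.75, -1.3), (0.0, 0.0), (1.5, 0.0), (2.25, 1.3)))  # trans
```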

For 0D files (no coordinates), there are no guidelines. I propose to store cis/trans stereo using the bond stereo (you know, UP [or DOWN] on the single bonds at both ends of a double bond means cis), and chirality using the atom parity. The MDL spec states that atom parity should be ignored on reading, but the alternatives are to just forget the stereochemistry, or else to store both cis/trans stereo *and* chirality in the bond block, which may just about be possible but is likely to be a real mess.
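The cis/trans half of the 0D proposal can be sketched as follows. This encodes only the convention suggested here (the same flag at both ends means cis), not anything defined by the MDL spec. `bond_line`, `encode_cis_trans_0d` and `decode_cis_trans_0d` are hypothetical helpers, and writing each marked single bond with the double bond atom first is an assumption of this sketch, based on the stereo flag referring to a bond's first atom.

```python
from typing import List, Optional

UP, DOWN = 1, 6  # V2000 single-bond stereo values


def bond_line(begin: int, end: int, order: int, stereo: int = 0) -> str:
    """Format a V2000 bond line (first four 3-character fields only)."""
    return f"{begin:3d}{end:3d}{order:3d}{stereo:3d}"


def encode_cis_trans_0d(a: int, b: int, c: int, d: int, is_cis: bool) -> List[str]:
    """Bond-block lines for a-b=c-d under the proposed 0D convention.

    UP at both ends marks cis; UP at one end and DOWN at the other marks
    trans. Each marked single bond starts at its double bond atom.
    """
    return [
        bond_line(b, a, 1, UP),                       # marked bond on atom b
        bond_line(b, c, 2),                           # the double bond itself
        bond_line(c, d, 1, UP if is_cis else DOWN),   # marked bond on atom c
    ]


def decode_cis_trans_0d(flag_b: int, flag_c: int) -> Optional[str]:
    """Invert the convention; None if either end is unmarked."""
    if flag_b not in (UP, DOWN) or flag_c not in (UP, DOWN):
        return None
    return "cis" if flag_b == flag_c else "trans"


for line in encode_cis_trans_0d(1, 2, 3, 4, is_cis=True):
    print(line)
```

The decoder treats DOWN/DOWN as cis too, matching the "UP [or DOWN]" wording of the proposal.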

Any thoughts?


4 comments:

greg landrum said...

Noel,

I have some comments and suggestions on this topic, but discussing it via comments in Blogger is pretty annoying because of the interface. Can you post this to a sensible mailing list (or point me to an existing post) and I'll comment there?

Noel O'Boyle said...

I've reposted to the Blue Obelisk mailing list.

Rich Apodaca said...

Or you could rephrase the post as a series of questions and use BlueObeliskExchange. Choose a good title and you'll have the canonical Web location for discussing this issue.

If you're asking whether you have the information right, I believe you do. AFAIK, double bond stereochemistry can only be set through 2D or 3D coordinates (although I have seen files incorrectly using the wedge/hash bond notation to denote it). The bond stereo flag and the coordinates together are used for 2D tetrahedral stereochemistry.

I would avoid a 0D molfile at all costs because there's no way to set any stereochemistry. The spec quite clearly says to ignore atom parity when reading a ctfile, so ignore it. Nor can bond stereo be unambiguously encoded.

Noel O'Boyle said...

@Rich: I didn't get any feedback from my question at BOxChange. Maybe a change of title would have done the trick.

In general, I don't seem to be getting a lot of sympathy for my 0D proposal. I think I'll dodo it (rather than do it). There are enough permutations and combinations to be keeping up with.