Wednesday, 23 May 2012

When Mol files go wrong III

Let's play spot the difference. Are the following structures the same? (Mol files from CHEMBL186139 and CHEMBL1180158.)
But if they're the same, then how come there are two distinct entries for this in the database? Well guess what - they don't have the same InChI:
InChI=1S/C30H36N4/c1-2-10-20-32-28-18-24-34(30-16-8-6-14-26(28)30)22-12-4-3-11-21-33-23-17-27(31-19-9-1)25-13-5-7-15-29(25)33/h5-8,13-18,23-24H,1-4,9-12,19-22H2/p+2/b31-27+,32-28?
InChI=1S/C30H36N4/c1-2-10-20-32-28-18-24-34(30-16-8-6-14-26(28)30)22-12-4-3-11-21-33-23-17-27(31-19-9-1)25-13-5-7-15-29(25)33/h5-8,13-18,23-24H,1-4,9-12,19-22H2/p+2/b31-27-,32-28+
The nitrogen attached to the ring is treated as a C=N once the two protons are added to neutralise the charge. The InChI code then considers the stereochemistry across that double bond to be defined in one case (177.2°) but undefined in the other (179.1°). Here are the pictures from winchi:
I'm not quite sure where the problem is. Is the InChI correct to make the distinction? Any thoughts?
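To see exactly where the two identifiers diverge, it helps to split them into layers and compare. The following is a minimal sketch; the helper name inchi_layers is my own, and keying the layers by their prefix letter is an assumption that holds for these two strings but not for every InChI (prefixes can repeat in more complex identifiers).

```python
def inchi_layers(inchi):
    """Split an InChI into a dict keyed by layer prefix letter.

    Assumption: each prefix letter occurs at most once, which is
    true for the two strings compared here.
    """
    parts = inchi.split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


# Everything up to the /b layer is identical in the two InChIs above.
COMMON = ("InChI=1S/C30H36N4/c1-2-10-20-32-28-18-24-34(30-16-8-6-14-26(28)30)"
          "22-12-4-3-11-21-33-23-17-27(31-19-9-1)25-13-5-7-15-29(25)33"
          "/h5-8,13-18,23-24H,1-4,9-12,19-22H2/p+2")
first = inchi_layers(COMMON + "/b31-27+,32-28?")
second = inchi_layers(COMMON + "/b31-27-,32-28+")

# Collect the layers whose contents differ between the two entries.
differing = sorted(k for k in set(first) | set(second)
                   if first.get(k) != second.get(k))
for key in differing:
    print(key, first.get(key), second.get(key))
```

Only the double bond stereo layer (/b) differs; the connectivity, hydrogen and charge layers are identical.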

Monday, 21 May 2012

When Mol files go wrong II

With time I've become more convinced that the SMILES format is more capable of faithfully storing stereochemistry than a 2D format such as Mol. Here is another tale of woe, related to tetrahedral stereocentres with one implicit bond, illustrated and annotated by Symyx Draw (am I the only one who thinks this is better than ChemDraw?).

Did you know that the stereochemistry of the wedge is interpreted differently in the two following cases?
Easy peasy, eh? But what about the in-between case where the angle between the two plane bonds is close to 180° (see below on left)? Guess what - you're in trouble if you do this. Some software will regard this as undefined, some will continue on regardless. If you look at the InChI string you can see that it regards the stereo as undefined, whereas the SMILES string does contain a stereocentre. In short, you've got a problem; if you're in charge of a database, you should identify such cases and fix them (manually), for example as shown on the bottom right (if that is the correct stereo) or by adding the implicit hydrogen.
In the course of other work, I've come across some instances of this problem in ChEMBL and will be talking to the team about sorting it out. Does anyone have other examples of potential stereo problems in Mol files and how to identify them?
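One way to go looking for such cases is to measure the angle between the two plane bonds from the 2D coordinates in the Mol file. Here is a minimal sketch of that check; the function names and the 5° tolerance are my own assumptions, not something any particular toolkit uses, and you would still need to pull the coordinates of the stereocentre and its plane neighbours out of the file yourself.

```python
import math


def bond_angle(centre, p, q):
    """Angle p-centre-q in degrees, from 2D (x, y) coordinates."""
    v1 = (p[0] - centre[0], p[1] - centre[1])
    v2 = (q[0] - centre[0], q[1] - centre[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def is_ambiguous(centre, plane_a, plane_b, tolerance=5.0):
    """True if the two plane bonds are within `tolerance` degrees of linear.

    The tolerance is an arbitrary choice for illustration.
    """
    return abs(180.0 - bond_angle(centre, plane_a, plane_b)) < tolerance


# A nearly straight pair of plane bonds is flagged...
print(is_ambiguous((0.0, 0.0), (1.0, 0.0), (-1.0, 0.02)))
# ...while a normal 120 degree arrangement is not.
print(is_ambiguous((0.0, 0.0), (-0.87, -0.5), (0.87, -0.5)))
```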

Wednesday, 2 May 2012

Speed up repeated calls to Python functions

If your Python script has repeated calls to a function with the same parameters each time, you can speed things up by caching the result. This is called memoization. It's not rocket science.

What is rocket science is that with a little bit of Python magic (see the code here), you can simply add memoization to any function with the @memoized decorator, e.g.
@memoized
def calcHOMO(smiles):
   # Generate Gaussian input file with Open Babel
   # and run Gaussian to find the HOMO.
   return homo
Calling this function with a SMILES string the first time would return the HOMO after 10 minutes. Calling it a second time would return the result instantly.
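For the curious, a decorator along these lines can be written in a few lines. This is a minimal sketch and not necessarily the same as the linked code: it assumes the function is only called with positional, hashable arguments, and slow_square is just a stand-in for an expensive calculation.

```python
import functools


def memoized(func):
    """Cache results of func, keyed on its positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Only call the real function the first time these args are seen.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


calls = []  # records every time the underlying function really runs


@memoized
def slow_square(x):
    calls.append(x)
    return x * x


slow_square(4)
slow_square(4)     # served from the cache
print(len(calls))  # the function body ran only once
```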

Update (11/05/2012): This feature is available in the Python standard library (as functools.lru_cache) as of Python 3.2. See Andrew's comment below.
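With Python 3.2 or later, the standard library version looks something like the sketch below. The function count_atoms_slowly is a made-up stand-in for an expensive calculation; maxsize=None simply means the cache is never trimmed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_atoms_slowly(smiles):
    # Stand-in for something expensive; here we just count letters.
    return sum(1 for ch in smiles if ch.isalpha())


count_atoms_slowly("CCO")
count_atoms_slowly("CCO")  # second call is answered from the cache
info = count_atoms_slowly.cache_info()
print(info.hits, info.misses)
```

The cache_info() method reports how many calls were answered from the cache, which is handy for checking that the caching is actually kicking in.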

Tuesday, 17 April 2012

Painting molecules your way - Introducing the paint format

If the range of depiction output formats (PNG, SVG and ASCII currently) provided by Open Babel is not enough for you, you can easily draw molecules yourself using the information provided by the new paint utility format.

The paint format simply describes lists of the actions required to generate a depiction of the molecule. For example, I used this information to prototype the ASCII output format in Python. Here it is in action:
>obabel -:c1ccccc1C(=O)Cl -opaint

NewCanvas 218.6 200.0
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 109.3 100.0 to 143.9 120.0
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 146.9 120.0 to 146.9 147.0
DrawLine 140.9 120.0 to 140.9 147.0
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 143.9 120.0 to 167.3 106.5
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 74.6 120.0 to 40.0 100.0
DrawLine 73.0 110.8 to 48.8 96.8
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 40.0 100.0 to 40.0 60.0
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 40.0 60.0 to 74.6 40.0
DrawLine 48.8 63.2 to 73.0 49.2
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 74.6 40.0 to 109.3 60.0
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 109.3 60.0 to 109.3 100.0
DrawLine 102.1 66.0 to 102.1 94.0
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 109.3 100.0 to 74.6 120.0
SetPenColor 0.4 0.4 0.4 1.0 (rgba)
SetPenColor 0.4 0.4 0.4 1.0 (rgba)
SetPenColor 0.4 0.4 0.4 1.0 (rgba)
SetPenColor 0.4 0.4 0.4 1.0 (rgba)
SetPenColor 0.4 0.4 0.4 1.0 (rgba)
SetPenColor 0.4 0.4 0.4 1.0 (rgba)
SetPenColor 0.4 0.4 0.4 1.0 (rgba)
SetPenColor 1.0 0.1 0.1 1.0 (rgba)
SetFontSize 16
SetFontSize 16
SetFontSize 16
DrawText 143.9 160.0 "O"
SetPenColor 0.1 0.9 0.1 1.0 (rgba)
SetFontSize 16
SetFontSize 16
SetFontSize 16
SetFontSize 16
DrawText 178.6 100.0 "Cl"
To create an example depiction I naturally turned to bananas. Why bananas? Well, they're yellow, sweet if properly ripe, and of course are full of potassium. Using the information above and the magic of SVG, we can generate the following image:

Other fruit are available, and if you would like me to put together a similar image for you, contact me and maybe we can work something out.

As an exercise for the reader, it would be cool to see the same thing done in 3D using say Blender or Povray.

Notes:
The banana SVG image is from OpenClipArt. It was modified, then I ran the Python script below, and finally trivially modified the output to give the final SVG. Here's that Python script:
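What follows is not that original script but a minimal sketch of the same idea, to show how little is needed to turn paint output into something drawable. It handles only the five commands that appear in the listing above, ignores the alpha value, and writes plain SVG lines and text rather than bananas; the function name paint_to_svg is my own.

```python
from xml.sax.saxutils import escape


def paint_to_svg(paint_text):
    """Translate paint-format lines into a minimal SVG document."""
    width = height = "0"
    colour = "rgb(0,0,0)"
    font_size = "16"
    shapes = []
    for line in paint_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        command = tokens[0]
        if command == "NewCanvas":
            width, height = tokens[1], tokens[2]
        elif command == "SetPenColor":
            # Values are floats in the range 0-1; alpha (tokens[4]) is ignored.
            r, g, b = (int(round(float(t) * 255)) for t in tokens[1:4])
            colour = "rgb(%d,%d,%d)" % (r, g, b)
        elif command == "SetFontSize":
            font_size = tokens[1]
        elif command == "DrawLine":
            # Layout is: DrawLine x1 y1 to x2 y2
            x1, y1, x2, y2 = tokens[1], tokens[2], tokens[4], tokens[5]
            shapes.append('<line x1="%s" y1="%s" x2="%s" y2="%s" stroke="%s"/>'
                          % (x1, y1, x2, y2, colour))
        elif command == "DrawText":
            # Layout is: DrawText x y "label"
            _, x, y, text = line.split(None, 3)
            label = escape(text.strip().strip('"'))
            shapes.append('<text x="%s" y="%s" font-size="%s" fill="%s">%s</text>'
                          % (x, y, font_size, colour, label))
    header = ('<svg xmlns="http://www.w3.org/2000/svg" width="%s" height="%s">'
              % (width, height))
    return "\n".join([header] + shapes + ["</svg>"])


SAMPLE = """NewCanvas 218.6 200.0
SetPenColor 0.0 0.0 0.0 1.0 (rgba)
DrawLine 109.3 100.0 to 143.9 120.0
SetPenColor 1.0 0.1 0.1 1.0 (rgba)
SetFontSize 16
DrawText 143.9 160.0 "O"
"""
print(paint_to_svg(SAMPLE))
```

To draw something other than a straight line for each bond, replace the line element with whatever shape you like, positioned and rotated to run between the two end points.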

Tuesday, 10 April 2012

Depict a chemical structure...without graphics Part III

Following on from earlier posts (here and here), I got to thinking again about molecular depiction using text. The solution I arrived at in Part II had a couple of drawbacks:
  1. It relied on an external library, aalib
  2. aalib doesn't seem to be available on Windows (at least not in a way I can use with MSVC)
  3. aalib is really aimed at bitmap depiction as ASCII, and I'm interested in vectors (lines)
So obviously there was nothing for it but to write some code myself for depicting lines as ASCII, which is essentially what I've done with ASCIIPainter, now part of Open Babel. You could use this to draw an arbitrary image, but let's check out how it works for molecules:
It didn't work quite so well for text pasted directly into this blog as the aspect ratio was quite high (i.e. low resolution in the y direction), but once I hit upon adding style="line-height:100%" to the pre tag it was much improved:
obabel -:c1cc(C(=O)Cl)ccc1 -oascii -xw60 -xa1.6

              O



             | |
             | |
             | |
             | |
             | |
             |_|                         __
            _/  \__                   __/  \__
          _/       \__             __/        \_
        _/            \_         _/     __      \__
     __/                \__   __/    __/           \__
Cl                         \_/    __/                 \__
                            |   _/                      |
                            |                           |
                            |                        |  |
                            |                        |  |
                            |                        |  |
                            |                        |  |
                            |                        |  |
                            |                        |  |
                            |                        |  |
                            |   __                      |
                            |_    \__                   _
                              \__    \__             __/
                                 \__    \_         _/
                                    \_          __/
                                      \__    __/
                                         \__/
I've added an output option to help tune the aspect ratio (-xs). Also, multimolecule output is supported, and a fun pastime is to watch ASCII depictions of large libraries fly by at the command line.
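If you want a feel for what drawing lines as ASCII involves, here is a minimal sketch in Python. It is not the algorithm used by ASCIIPainter; it just walks the line with Bresenham's algorithm on a grid of characters and picks one character per line based on its direction. All of the names are my own.

```python
def blank(width, height):
    """Create a grid of spaces, indexed as grid[y][x]."""
    return [[" "] * width for _ in range(height)]


def render(grid):
    return "\n".join("".join(row) for row in grid)


def draw_line(grid, x0, y0, x1, y1):
    """Plot a line between two grid cells using Bresenham's algorithm."""
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    # Choose a character from the direction (y increases down the screen).
    if dx == 0:
        ch = "|"
    elif dy == 0:
        ch = "-"
    elif (x1 - x0) * (y1 - y0) > 0:
        ch = "\\"
    else:
        ch = "/"
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    x, y = x0, y0
    while True:
        grid[y][x] = ch
        if x == x1 and y == y1:
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy


canvas = blank(12, 5)
draw_line(canvas, 0, 4, 4, 0)   # rising bond
draw_line(canvas, 5, 0, 9, 4)   # falling bond
print(render(canvas))
```

Since a character cell is taller than it is wide, a real implementation also has to scale the y coordinates, which is where the aspect ratio comes in.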

Wednesday, 4 April 2012

Getting your double bonds in a twist - How to depict unspecified stereo

I've just been adding depiction support for double bonds with unspecified stereo, and thinking about how this should be done: a squiggly bond for a substituent on the double bond, or make the double bond itself twisted? Actually, I didn't have to think too much as Rich and others (also mcule) have already worked through these same issues. In short, the IUPAC recommendation (from 2006) is best avoided, and a twisted bond should be used instead.

So, here's the result after implementing the twisted bond:
obabel -:"Cl/C=C/Br" -:"Cl/C=C\Br" -:"ClC=CBr"
       -O tmp.svg -xr 1


...and with an asymmetric double bond for extra pizazz:
obabel -:"Cl/C=C/Br" -:"Cl/C=C\Br" -:"ClC=CBr"
       -O tmp.svg -xr 1 -xs


Credits: Twisted double bond by me. Everything else of depiction by Chris Morley and Tim Vandermeersch. Structure layout by Sergei Trepalin.

Monday, 2 April 2012

Cheer up your LaTeX with SMILES support II

In Part I, I showed how to embed PNGs, automatically generated from SMILES by obabel, into LaTeX documents. An alternative approach is to use the SVG output from obabel.

In a comment to my earlier post, Billy suggests running the SVG through Inkscape or rsvg-convert:
You can also embed the material as a vector graphic, of course. Inkscape doesn't seem to support pipes, and rsvg-convert gives ugly output, but I'm sure there's other options.
\immediate\write18{obabel -:'#1' -osvg -p | rsvg-convert -f pdf -o smilesimg\arabic{smilescounter}.pdf}
\immediate\write18{obabel -:'#1' -O smilesimg\arabic{smilescounter}.svg -p ; inkscape -f smilesimg\arabic{smilescounter}.svg -A smilesimg\arabic{smilescounter}.pdf}
Also, if you don't want to call these applications when the graphic files aren't out of date, then use the code snippet found at the top of the 3rd page from this article.
I found a third approach on the interwebs soon after writing the initial post. It's by Jakob Lykke Andersen who converts the SVG to PDF with ImageMagick's convert. If you download the file graphviz.tex from his website, you can just include it and use it as in the following example:
The resulting PDF looks better than the original (though it could be because I didn't handle the PNGs properly in the first PDF). A nice little touch in Jakob's version is that an error box appears in the PDF if there is a problem generating the image.

Exercise for the reader:
A bit more polish is needed before these methods can be used wholesale by others. If you know a bit about LaTeX, have a go at an obabel package for CTAN.