Sunday 21 December 2008

Have your hamburger and eat it - Edit molecules in PDFs

Wouldn't it be nice to be able to copy a molecule from a PDF into a molecular drawing package? Well, here are some instructions for doing this on Windows, using BKChem as the drawing package and OSRA to do the image-to-structure conversion. Click on the image to the right for a screenshot of this in action.

(1) Install Python 2.6 (or just use 2.4 or 2.5 if you have one of these already)
(2) Install the Python Imaging Library 1.1.6 for your version of Python
(3) Download and extract
(4) Drop and convert_clipboard_image.xml into the BKChem plugins folder (Note: if the webserver is down, you can get these files here and here)
(5) Download and extract
(6) Set the environment variable OSRA to the full path to osra.exe
(7) Find the Snapshot tool (it has a picture of a camera) in your version of Adobe Reader. In version 9 it's under Tools/Select and Zoom/Snapshot Tool, and you can add it to the toolbar under Views/Toolbars/More Tools.
(8) Open a PDF of a paper containing a molecular structure (e.g. Figure 4 in this paper of mine), and use the Snapshot tool to draw a box around a molecule and hit CTRL+C to copy (if not done automatically).
(9) Start BKChem by double-clicking on (in the bkchem subfolder)
(10) Click "Plugins", "Paste and Convert Image"
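
For the curious, the conversion behind steps (6) to (10) can be sketched roughly as follows. This is not the actual plugin code, just a minimal sketch in present-day Python under a few assumptions: that PIL (or Pillow) provides ImageGrab.grabclipboard(), that OSRA is run as "osra.exe imagefile", and that it prints one SMILES string per line to standard output. The function names are my own, made up for illustration.

```python
import os
import subprocess
import tempfile


def build_osra_command(image_path, env=None):
    """Return the command line used to run OSRA on one image file."""
    env = os.environ if env is None else env
    osra_exe = env.get("OSRA")  # step (6): full path to osra.exe
    if not osra_exe:
        raise RuntimeError("Set the OSRA environment variable to the full path to osra.exe")
    return [osra_exe, image_path]


def parse_osra_output(text):
    """Assumption: OSRA prints one SMILES string per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def clipboard_image_to_smiles():
    """Save the clipboard image to a temporary PNG and run OSRA on it."""
    from PIL import ImageGrab  # imported here so the helpers above work without PIL

    image = ImageGrab.grabclipboard()
    if not hasattr(image, "save"):  # None or a list of file names: not an image
        raise RuntimeError("The clipboard does not contain an image")
    handle, image_path = tempfile.mkstemp(suffix=".png")
    os.close(handle)
    try:
        image.save(image_path)
        result = subprocess.run(
            build_osra_command(image_path),
            capture_output=True, text=True, check=True,
        )
        return parse_osra_output(result.stdout)
    finally:
        os.remove(image_path)
```

After copying a molecule with the Snapshot tool, calling clipboard_image_to_smiles() should return a list of SMILES strings, which a drawing package (or OpenBabel) can then turn into an editable structure.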

Notes:
(0) Open Source software allows you to implement crazy ideas as fast as you can think of them.
(1) This won't work with the latest exe release of BKChem as py2exe didn't include the ImageGrab module (part of PIL).
(2) For this to work with ChemDraw, I need to know how to place ChemDraw XML (which I can create with OpenBabel) on the clipboard so that ChemDraw will be able to paste it. (To be clear, I know how to place it on the clipboard in general, it's just how to place it in such a way that ChemDraw will recognise it as a chemical structure. Hmmm...just found this...)
(3) Bond angles are perturbed slightly (e.g. vertical bonds can become skewed). Maybe this can be fixed on the OSRA side.
(4) The hamburger reference and some background can be found on PMR's blog.
(5) Thanks to Leonard (see comment below) for the information on the Snapshot tool in Adobe Reader.

Friday 19 December 2008

The structure factor

Gerard Kleywegt does not take kindly to dodgy PDB files. I've been aware of his work in a general sense for a while, but after reading Practical Fragments' review of "Limitations and lessons in the use of X-ray structural information in drug design" by Andrew Davies, Stephen St-Gallay and Gerard Kleywegt I decided to read some of his papers. If you are looking for a broad overview (with excellent examples) of the issues to consider when using PDB files, this paper is a good place to start.

The PDB is a wonderful resource, but it should be remembered that crystallography provides the electron density (at a specific resolution), not the structure, and everything from that point on is modelling. In other words, mistakes can be made due either to ambiguities or to incorrect assumptions. In most (?) PDB files, the electron density can be reconstructed to some extent from structure factors, and compared to the model to find regions where there is a lack of density, and hence a lack of evidence. In particular, the ligand can get very little love from the crystallographer (see references 43-48 in the paper).

I see that Kleywegt, after 15 years at Uppsala, has recently joined the Macromolecular Structure Database Group at the EBI as head of the PDBe, the European end of the PDB. Does this mean that dodgy PDB files will be a thing of the past?

Image credit: Ethan Hein

Monday 15 December 2008

Mono, C# and OpenBabel

Matt Sprague and I have put in a bit of work making OpenBabel accessible on the .NET platform on Windows (see Matt's recent post, for example). .NET is like Microsoft's JVM. There are several .NET languages (e.g. C#, Visual Basic, IronPython), just like there are several languages for the JVM (e.g. Java, Jython, JRuby). If you write a program in any of the languages and compile it, the resulting classes can be used from any of the other languages.

Having finished the work for Windows, I started to look at Linux and MacOSX. There's an Open Source version of .NET called Mono, which is available cross-platform. After a bit of hacking around, I'm pleased to announce that the next release of OpenBabel will provide support for Mono on Linux and MacOSX. I've included example C# and IronPython scripts that use OpenBabel (see here).

One thing I was wondering though: anyone have any idea how widely Mono is used?

Thursday 4 December 2008

Cinfony paper published in Chemistry Central Journal

My Cinfony paper has just come out:
Cinfony - combining Open Source cheminformatics toolkits behind a common interface. NM O'Boyle, GR Hutchison. Chem. Cent. J. 2008, 2, 24.

The paper describes the Why? and How? of Cinfony, shows some examples of use, and discusses performance. To download Cinfony, or for documentation on using Cinfony, please see the Cinfony web site.

Update (05/12/08): Table 3 in the paper caused a stir over at Chem-Bla-ics (Who Says Java is Not Fast and Cheminformatics Benchmark Project #1) and Depth-First (Choose Java for Speed).

Here's the abstract:

Open Source cheminformatics toolkits such as OpenBabel, the CDK and the RDKit share the same core functionality but support different sets of file formats and forcefields, and calculate different fingerprints and descriptors. Despite their complementary features, using these toolkits in the same program is difficult as they are implemented in different languages (C++ versus Java), have different underlying chemical models and have different application programming interfaces (APIs).


We describe Cinfony, a Python module that presents a common interface to all three of these toolkits, allowing the user to easily combine methods and results from any of the toolkits. In general, the run time of the Cinfony modules is almost as fast as accessing the underlying toolkits directly from C++ or Java, but Cinfony makes it much easier to carry out common tasks in cheminformatics such as reading file formats and calculating descriptors.


By providing a simplified interface and improving interoperability, Cinfony makes it easy to combine complementary features of OpenBabel, the CDK and the RDKit.

Wednesday 3 December 2008

CrossRef + Ubiquity = Look up references without the pain

Imagine one of the following scenarios: you're reading a paper on your computer and want to look up a reference, or someone emails you the citation to an interesting paper, or you see some paper referenced on a website. If you want to read the paper that corresponds to the citation, you need to go to the website of the journal publisher, and enter all of the citation details or else click your way through years and journal issues to get to the paper you want. This is just the sort of tedious task that your computer should be doing for you instead.

And now, thanks to Geoffrey Bilder of CrossRef, it can. The details are described on the CrossTech blog. What it boils down to is that you can bring up the Ubiquity command window (CTRL+SPACE on Windows once Ubiquity is installed on Firefox), type "crossref" and use CTRL+V to paste a citation (or else select a citation on the webpage). After a couple of seconds, you'll see several hits for the paper in the CrossRef database, and clicking on Resolve will bring you directly to the online version of the paper. If you're feeling lucky you can just hit Enter after entering the citation and go directly to the first hit.
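
If you'd rather do the same lookup from a script, here is a minimal sketch in Python. Note that it is not the Ubiquity command itself: it assumes the present-day CrossRef REST API (the api.crossref.org/works endpoint with its query.bibliographic parameter) and the doi.org resolver, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public CrossRef REST API, not whatever the Ubiquity script calls.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(citation, rows=5):
    """Build a search URL for a free-text citation."""
    query = urllib.parse.urlencode({"query.bibliographic": citation, "rows": rows})
    return CROSSREF_WORKS + "?" + query


def extract_hits(payload):
    """Return (title, resolver URL) pairs from a decoded CrossRef response."""
    hits = []
    for item in payload.get("message", {}).get("items", []):
        doi = item.get("DOI")
        if doi:
            title = (item.get("title") or [""])[0]
            hits.append((title, "https://doi.org/" + doi))
    return hits


def lookup(citation, rows=5):
    """Query CrossRef over the network and return the best-matching hits."""
    with urllib.request.urlopen(build_query_url(citation, rows)) as response:
        return extract_hits(json.load(response))
```

Calling lookup() with a pasted citation returns the candidate papers in the order CrossRef ranks them; opening the URL of the first one is the scripted equivalent of feeling lucky and hitting Enter.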

If you haven't already installed Ubiquity (see here for more info), I think it is worth it just to be able to use this particular CrossRef script.