Friday 30 March 2012

Molecular eye candy for IPython

IPython is an enhanced Python shell that does magic and is wonderful. Apart from me, everyone else uses it. You can install it with "easy_install IPython" on Windows, Linux, etc. Recent versions have a graphical console (start with "ipython-qtconsole") to which we can add some molecular eye candy. Greg Landrum has already done this for RDKit. Here's how to do it for Pybel.

First of all, here's the before picture:
Now, copy and paste the following into the console:
pybel.Molecule._repr_png_ = lambda x: x.write("png")
And here's how it looks after:
Greg has gone a bit further with RDKit, and has enhanced substructure matches, but that's where I'll draw the line for now.

Thursday 29 March 2012

Reduce, recycle and reuse bond closure symbols?

Time for a new poll. I'd like to know what you think about reusing bond closure symbols in SMILES strings. Simply put, when writing SMILES there is a choice between reusing the bond closure symbols or using a new number each time. That is, the following molecule with two rings could be written as:
C1CC1c1ccccc1 or

Here are the pros and cons of choosing not to reuse bond closure symbols (anything missing?):

  • Easier to count up the number of rings
  • Easier for a human to find corresponding bond openings and closures
  • Easier to implement (but not a lot)
  • Reuse of bond symbols can cause confusion

  • Use of % notation for two-digit bond closures can cause confusion (people are not so familiar with it)
  • Leads to syntax like C%1145, meaning that bonds 11, 4 and 5 close here (not very intuitive)
  • Limits bond closures to 99 (but more than 99 is unlikely)
  • Not as concise as alternative (for molecules with at least 10 bond closures)

See if you can answer this poll before checking what does software A do, or standard B suggest. What do you think should be the standard approach?

Update (17/04/2012): Poll results: 8 for reuse, 3 against

Tuesday 27 March 2012

Cheer up your LaTeX with SMILES support

No-one likes an unhappy document preparation system, so let's add support for SMILES to LaTeX. LaTeX has already seen a lot of love from chemists - check out the list of chemistry packages at CTAN. Perhaps even more surprising is the fact that a couple of these packages are being actively maintained and updated (e.g. chemmacros and chemfig).

So, let's get down to business. This is a simple hack that will insert a 2D diagram of a molecule if you use the \smiles command. Here's the necessary code (at the top of the .tex file) and an example of use in the main body. The resulting PDF is here.

So how does this work? Well, it simply calls obabel at the command-line to create a .PNG file, and then adds an \includegraphics command (the idea is taken from texments). The only catch is that command-line calling of executables is only allowed if you run pdflatex with the -shell-escape option ("pdflatex -shell-escape myfile.tex"). (For the record, I did this all on Windows, with pdflatex provided by Cygwin.)

While this is, um, neat, to be honest you'd probably be better off creating the PNGs in advance if you have a lot of SMILES strings. A much cooler way to embed chemistry in LaTeX would be to write a QSAR paper using Sweave/R/CDK via Rajarshi's CDK/R packages. This is the idea of literate programming, where the document functions as a complete description of the analysis because it is the analysis; it is the ultimate in reproducible research. Has this been done? Any takers?

Tuesday 20 March 2012

Share ami

Some time ago Google shut down the Share feature of Google Reader to push everyone onto Google Plus. Unfortunately, Google Plus is a bit of a walled garden (and also it sucks compared to FriendFeed, but that's not the issue here). Unlike Google Reader, it doesn't provide an RSS feed for shared items so there's no way I can list them in the sidebar of this blog (as I used to). Instead, this post contains a selection of items I shared on Google Plus since I joined in Nov last year.

I don't know if this is going to be a regular feature, so to follow me on Google Plus, check out my profile page.

The following items are listed somewhat chronologically, youngest first:

    Monday 12 March 2012

    Sunday 11 March 2012

    Accessing the CDK from Jython in Knime

    I've been trying to find out how to use the CDK and/or Open Babel from Jython in Knime. Unfortunately, there's no documentation beyond a single example script. It's a bit strange, as having this working would enable a whole host of developers to write nodes, particularly in the area of chemistry where Python has made considerable inroads.

    After asking and not receiving any answer, I've tried to hack something together. I haven't yet figured out how to do it properly, but at least I've learnt a bit, and the end result was that I was able to calculate the molecular weights with the CDK as in the following diagram:

    Here's the code for the node. It was debugged by keeping the console log file open in gvim, and reloading (:edit) after every run. Also I had to configure the node by selecting the "Script Output" tab, choosing "Append columns to input table spec", and adding an Output column of type Double:

    The way I did this was by hacking the source code for the Jython node itself. First I figured out how to compile it (YMMV). The source is available, but for some reason various files are missing that prevent it compiling out of the box (e.g. the plugin.xml). So I ran the Create Node Wizard (manipulator), copied over various files from the source, copied over various files from the binary (the files in lib and schema), added the lib/jython.jar (as described here), and installed the org.python.plugin node (not forgetting to remove the duplicate org.knime.ext.jython).

    To add the CDK jar to the Classpath, I placed it in the lib directory, repeated the procedure described above for lib/jython.jar, and then added the following hack to PythonScriptNodeModel.execute:
    If there's someone from Knime reading this, please let me know the correct way of doing this. I know that there's an extension point for extending the Classpath of the Jython node, but how is this used?

    Friday 9 March 2012

    Han's cells and Grätzel

    My PhD supervisor, Han Vos, marked his retirement from Dublin City University this week with a symposium at Smart Surfaces 2012, a conference on solar energy devices and sensors. During my PhD I did some experimental and computational work related to dye-sensitised solar cells, and it was great to finally get to see Michael Grätzel in person talking about new developments in the field (which he originated).

    Han was a great supervisor, and as a wet-behind-the-ears postgrad I think I learnt a lot from him. Something that I'd like to pass on to any postgrads reading this is to avoid at all costs co-authoring a review with a distinguished professor unless you have an impressive biography. Consider mine from back in 2005 (click for a bigger image) ...
    Image Credits:
    (1) W. R. Browne, N. M. O'Boyle, J. J. McGarvey and J. G. Vos. Elucidating excited state electronic structure and intercomponent interactions in multicomponent and supramolecular systems Chem. Soc. Rev. 2005, 34, 641-663. Reproduced by permission of The Royal Society of Chemistry.
    (2) "Piled Higher and Deeper" by Jorge Cham. Reproduced with permission.

    Create animated graphs in a GIFfy

    Although animated GIFs are "so 90s", they are quite handy for use in presentations as they work out of the box in Powerpoint (on Windows anyway), or alternatively, they can be displayed in the browser of your choice.

    Here's an animated GIF I created to show the progress of a Genetic Algorithm towards creating a molecule with a specific electronic structure (for more details on the background, see these posts [1] and [2]):
    To make this sort of animation, the first step is to make a set of graphs. I use Matplotlib from Python for plotting, so I have a loop looking something like the following:
    vals = range(0, 10) + range(19, 100 ,10)
    for N in vals:
         homo, trans = self.xy[N]
         lumo = [x+y for x,y in zip(homo,trans)]
         pylab.title("Generation %d of 100" % (N+1,))
         pylab.plot(trans, lumo, ".")
         pylab.savefig("pictures/animation%02d.png" % N)
    You can then use ImageMagick to do the PNG to animated GIF conversion. ImageMagick is a bit like Open Babel - it is a one-stop shop for file format conversion, plus it can do a whole set of transformations, and furthermore can be accessed as a programming library. It is available cross-platform (I used it on Windows). The main binary is called "convert", and the conversion is done as follows:
    convert.exe -delay 20 -loop 0 animation*.png GA_animation.gif
    You can also specify different timings for every frame as I did above. I wrote a little script to help with this:

    Monday 5 March 2012

    Grab, build and install the development version of Open Babel on Linux Mint 12

    This is one of those "Notes to self" posts. I just installed Linux Mint 12 into a VM. Here's what I had to do to build the development version of Open Babel, complete with GUI and Python bindings. As far as I know, these instructions should also work for Ubuntu 11.10 (Oneiric oh-whatever).
    export OB=$HOME/openbabel
    sudo apt-get update
    sudo apt-get install cmake subversion g++ swig2.0 python-dev libxml2-dev wx-common wx2.8-headers libwxbase2.8-dev libeigen3-dev libwxgtk2.8-dev
    mkdir $OB
    cd $OB
    svn checkout trunk
    mkdir build
    cd build
    make -j2 && make install
    export PATH=$OB/install/bin:$PATH
    export PYTHONPATH=$OB/install/lib:$PYTHONPATH
    Thereafter, just type "svn update" in $OB/trunk, and "make && make install" in $OB/build, to get the latest and greatest.

    Friday 2 March 2012

    It all adds up to a new descriptor

    Group contribution descriptors such as TPSA and LogP are pretty popular. I've just written up some docs describing how to add a new group contribution descriptor to Open Babel. It's fairly easy to do once you have a set of SMARTS strings and contributions.
    Group contribution descriptors are a common type of molecular descriptor whose value is a sum of contributions from substructures of the molecule. Such a descriptor can easily be added to Open Babel without the need to recompile the code. All you need is a set of SMARTS strings for each group, and their corresponding contributions to the descriptor value.

    The following example shows how to add a new descriptor, hellohalo, whose value increments by 1, 2, 3 or 4 for each F, Cl, Br, and I (respectively) in the molecule...
    To read the rest, check out the development docs.

    I'm currently planning* to add a new option to make debugging of these descriptors simpler, so now might be a good time to look into implementing a descriptor if you're interested.

    If you do implement a new descriptor, please consider the trees, the number of towels washed every day, and the effort you will save another PhD student, and contribute the result back to Open Babel. If you do so under a liberal license (e.g. CC0), the same SMARTS strings can also be used by other cheminformatics toolkits.

    * Now done. In the development version, add ";debug" to the start of the pattern file.