Wednesday, 22 August 2012

Transforming molecules into...well...other molecules

It's fairly straightforward to filter structures by SMARTS patterns using Open Babel, but what if you want to transform all occurrences of a particular substructure into something else? This may be useful for structure cleanup, tautomer normalisation, or even to carry out a reaction in silico.

Anyhoo, that's enough background. Let's suppose we want to hydrogenate all instances of C=C and C=O. Just copy and paste the following into the end of plugindefines.txt:
OpTransform
hydrogenate      # ID used for commandline option
*                # Asterisk means "no datafile specified"
Hydrogenate C=C and C=O double bonds
TRANSFORM [C:1]=[C:2] >> [C:1][C:2]
TRANSFORM [C:1]=[O:2] >> [C:1][O:2]
This gives a new obabel option, --hydrogenate, that will do the job:
C:\Tools\tmp>obabel -L ops
...
hydrogenate    Hydrogenate C=C and C=O double bonds
...
C:\Tools\tmp>obabel -:C=C -osmi --hydrogenate
CC
C:\Tools\tmp>obabel -:O=CSC=CC=S -osmi --hydrogenate
OCSCCC=S
Notes:
(1) This works best in the latest SVN as Chris sorted out some longstanding bugs.
(2) I've just enabled this in Python where it works as follows:
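import pybel  # with Open Babel 3.x: from openbabel import pybel

# Set up the transform from a SMARTS pattern and its replacement
transform = pybel.ob.OBChemTsfm()
success = transform.Init("[C:1]=[C:2]", "[C:1][C:2]")
assert success

# Apply it to ethene; the result should be ethane
mol = pybel.readstring("smi", "C=C")
transform.Apply(mol.OBMol)
assert mol.write("smi").rstrip() == "CC"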

Wednesday, 15 August 2012

Using cheminformatics to guide your career path

Leaving aside the fact that the term "scientific career" is a bit of an oxymoron, I propose to show how considering the distribution of alpine flora can help guide your career choices. When Jaccard first headed for the Alps to count edelweiss, little did he know he would make a stunning discovery that would change the face of cheminformatics forever - the Tanimoto coefficient. Here I show how to extend this coefficient to career path planning.

Simply put, you should choose your next institution based on whether it has a high Tanimoto coefficient with your current one. Let's take my career as an example and show how this works:

1997-2001 University College Galway, UCG
2001-2004 Dublin City University, DCU (Tanimoto coefficient of 2/4 = 0.5)
2004-2005 University College Dublin, UCD (3/3 = 1.0)
2005-2006 Unilever Centre Cambridge, UCC (2/4 = 0.5)
2006-2009 Cambridge Crystallographic Data Centre, CCDC (2/5 = 0.4)
2009-         University College Cork, UCC (2/5 = 0.4)
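
For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes each acronym is treated as a multiset of letters, so a repeated C counts each time; that reading is inferred from the fractions listed above rather than stated anywhere. The function name acronym_tanimoto is made up for illustration.

```python
from collections import Counter


def acronym_tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient of two acronyms, each taken as a multiset of letters."""
    ca, cb = Counter(a.upper()), Counter(b.upper())
    shared = sum((ca & cb).values())  # multiset intersection: min count per letter
    total = sum((ca | cb).values())   # multiset union: max count per letter
    return shared / total if total else 0.0


# Acronyms in career order, as listed above
career = ["UCG", "DCU", "UCD", "UCC", "CCDC", "UCC"]

# Score each move against the institution before it
scores = [round(acronym_tanimoto(x, y), 2) for x, y in zip(career, career[1:])]
print(scores)  # [0.5, 1.0, 0.5, 0.4, 0.4]
```

Scoring a candidate next institution is then just a matter of calling acronym_tanimoto("UCC", candidate).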

As with any career there were some highs (Tanimoto of 1.0) and some lows (values of 0.4). Now the question is, what about my next move?

Notes: UCG is now NUIG (National University of Ireland, Galway).

Tuesday, 14 August 2012

Grant me that - My non-funding history

A successful PI here in UCC made the point to me (after a grant application of mine was bounced) that only about 1 in 6 applications of his are funded. This may have been a white lie, but I thought it'd be interesting to look back on all of my applications since I finished my PhD and see how it's gone for me (items in bold were funded):

 2005 - Marie Curie Intra-European Fellowship
 2006 - Royal Commission for the Exhibition of 1851
 2006 - EPSRC Project Grant (minor contribution)
 2007 - President of Ireland Young Researcher Award
 2008 - SFI Principal Investigator
 2008 - Wellcome Trust Research Career Development Fellowship
 2008 - Health Research Board Career Development Fellowship
 2009 - CSA Trust Jacques-Émile Dubois Grant
 2009 - SFI Starting Investigator Research Grant
 2009 - Wellcome Trust Research Career Development Fellowship
 2010 - CSA Trust Jacques-Émile Dubois Grant
 2011 - European Research Council Starting Grant
 2011 - IRCSET Enterprise Partnership Scheme
 2011 - SFI Starting Investigator Research Grant

Note that a couple of researchers out there are going a bit further than me, and actually posting up their grant applications. Check them out (via +Jan Jensen).

Image credit: The above image of Grant's Tomb (geddit?) is by ScottOldham.

It all adds up to a new descriptor Part II

As described in an earlier post, it's pretty easy to implement a new group contribution descriptor with Open Babel just by editing a text file. Well, that post inspired Andy Lang to use this feature to develop a melting point model which you can download right now and apply using Open Babel.

You can read the announcement of the model on the OB mailing list or get the model and read all the details including exactly how it was developed over at Jean-Claude Bradley's Open Notebook Science wiki. The image above by Andy Lang shows the performance of the model on a test set.

Wednesday, 8 August 2012

Presentations and Practicals from Resources for Computational Drug Discovery

As discussed previously, I recently presented on the topics of cheminformatics and protein-ligand docking. The talks were slightly updated for the occasion, and are included below.

I also supervised a practical on cheminformatics involving the Open Babel GUI. This will be available as part of the regular OB documentation, but in the meantime, you can access it at ReadTheDocs. It covers file format conversion (surprise!), depiction, filtering, and similarity and substructure searching.

Here's my cheminformatics talk:

...and my protein-ligand docking talk: