Saturday, 20 September 2008

Overview of cheminformatics toolkits

Yesterday, Andrew Dalke gave me a sneak preview of his EuroQSAR poster entitled "Python for Computational Chemistry". I see that it's now available on the web at his blog and I recommend you check it out.

It has an excellent diagram showing the history of various cheminformatics toolkits and how they relate to each other. I'm particularly pleased with the diagram as it includes some recent work of mine (Pybel and now Cinfony).

Andrew works on implementing cheminformatics systems in pharmaceutical companies and is a Python advocate. In his poster, he answers multiple "How do I do _____ in Python?" questions. If you want to support the use of Python in cheminformatics as well as let other students/coworkers see what toolkits are available, it's a really good poster to print out and stick up somewhere.

Oh yeah, in other news this month, Noel O'Blog is now being broadcasted from a secret location in University College Cork although I'll be back and forth to the CCDC on a regular basis.

Friday, 12 September 2008

Ubiquity - it's everywhere!

So I disappear for a month and when I come back everyone has dropped Greasemonkey like a hot potato (mmmmm...potatoes) and adopted Ubiquity (watch the video on that page to get an idea of what it's about). So here are a couple of Ubiquity scripts of mine which translate from DOI to ACS or RSC formatted reference.

If you install the Ubiquity extension for Firefox (use the link above), select a DOI on any page, hit the magic key (CTRL + SPACE for windows), and type one of "doi2acs", "doi2ACS", "doi2rsc" or "doi2RSC", you can get the ACS or RSC formatted reference with or without the title. This could be useful when putting together the references for a paper or so.

Comments welcome as these are my first scripts...

Tuesday, 12 August 2008

ANN: cinfony 0.8 - "the one with documentation"

I've just released a new version of cinfony. This one comes with full documentation of the API as well as a description of how the API is used. It should be ready for use on both Windows and Linux.

The website is at http://cinfony.googlecode.com and here's the front page:

Introduction

Cinfony presents a common API to several cheminformatics toolkits. It uses the Python programming language, and builds on top of OpenBabel, RDKit and the CDK.

Documentation

PubChemSR makes it to top 10

In the 3 months since publication in Chemistry Central Journal, Junguk Hur and David Wild's paper on PubChemSR has shot up the Chemistry Central rankings and has now entered the top 10 most highly accessed papers of all time (that is, since Jan 2007). I should know, as I've watched it overtake my Pybel paper and squeeze me out of that same top 10, but not for long I hope. :-)

Here's the abstract:
Background
Recent years have seen an explosion in the amount of publicly available chemical and related biological information. A significant step has been the emergence of PubChem, which contains property information for millions of chemical structures, and acts as a repository of compounds and bioassay screening data for the NIH Roadmap. There is a strong need for tools designed for scientists that permit easy download and use of these data. We present one such tool, PubChemSR.

Implementation
PubChemSR (Search and Retrieve) is a freely available desktop application written for Windows using Microsoft .NET that is designed to assist scientists in search, retrieval and organization of chemical and biological data from the PubChem database. It employs SOAP web services made available by NCBI for extraction of information from PubChem.

Results and Discussion
The program supports a wide range of searching techniques, including queries based on assay or compound keywords and chemical substructures. Results can be examined individually or downloaded and exported in batch for use in other programs such as Microsoft Excel. We believe that PubChemSR makes it straightforward for researchers to utilize the chemical, biological and screening data available in PubChem. We present several examples of how it can be used.

Thursday, 31 July 2008

Calculate circular fingerprints with Pybel II

OpenBabel 2.2.0 came out a little while ago with a lot of new features. One of the new features is a breadth-first iterator over a molecule that returns the current depth as well as the current atom (OBMolAtomBFSIter). Nice. Because now it's a doddle to write a Python script to create circular fingerprints.

The script below creates a circular fingerprint for the example molecule in Bender et al., JCICS, 2004, 44, 170 as follows:
D:\>python25 fp.py
0-C.ar;1-C.ar;1-C.ar;1-N.pl3;2-C.2;2-C.ar;2-C.ar 0
-N.pl3;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-C.ar;1-C.ar;2
-C.ar;2-C.ar;2-N.pl3 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2
-C.ar 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-
C.ar;1-C.ar;2-C.2;2-C.ar;2-C.ar 0-C.ar;1-C.2;1-C.a
r;1-C.ar;2-C.ar;2-C.ar;2-N.pl3;2-O.co2;2-O.co2 0-C
.2;1-C.ar;1-O.co2;1-O.co2;2-C.ar;2-C.ar 0-O.co2;1-
C.2;2-C.ar;2-O.co2 0-O.co2;1-C.2;2-C.ar;2-O.co2

Here's the script, which is considerably shorter than the original one.:
import pybel

# Set up a translator to Sybyl atom types
ttab = pybel.ob.OBTypeTable()
ttab.SetFromType("INT")
ttab.SetToType("SYB")

if __name__ == "__main__":
D = 3 # default max depth
mol = pybel.readstring("smi", "c1(N)ccccc1c(=O)[O-]")

fp = []
for i in range(len(mol.atoms)):
ans = []
for atom, depth in pybel.ob.OBMolAtomBFSIter(mol.OBMol, i + 1):
if depth > D: break
atomtype = ttab.Translate(atom.GetType())
ans.append("%d-%s" % (depth - 1, atomtype))
fp.append(";".join(sorted(ans)))
print "\t".join(fp)

Wednesday, 23 July 2008

Chemistry in R

For many cheminformaticians, R is the preferred way of analysing multivariate data and developing predictive models. However, it is not so widely known that there are R packages available that are directly aimed at handling chemical data.

Over the last few years, Rajarshi Guha (Indiana University) has been doing some nice work integrating the CDK and R. His publication in J. Stat. Soft., Chemical Informatics Functionality in R, describes the rcdk and rpubchem packages. The rcdk package allows the user to read SDF files directly into R, calculate fingerprints and descriptors, calculate Tanimoto values, view molecules in 2D (JChemPaint) and 3D (Jmol), calculate SMILES strings, and access the property fields of file formats. The rpubchem package is focused on downloading compounds, property values and assay data from PubChem. See also articles in R news and CDK news [1], [2], [3].

A more recent development is ChemmineR, described in the latest issue of Bioinformatics: "ChemmineR: a compound mining framework for R". The authors appear to be unaware of the earlier work by Rajarshi, and so there is no comparison of available features. However, based on the documentation on their website, it seems that much of the functionality revolves around a type of fingerprint called atom-pair descriptors (APD). SDF files, when read in, are converted to a database of APDs and these can be used for similarity searching, clustering, removal of duplicates and so on. Sets of molecules can be visualised using a web connection to the ChemMine portal (I'm not sure what software is used). According to the documentation, future work will include descriptor calculation with JOELib.

So, there you have it. An exhaustive survey of the two available methods for bringing chemistry into R. Is the time ripe for a cheminformatics equivalent to Bioconductor?

Tuesday, 15 July 2008

Review of "IronPython in Action"

The three most well-known implementations of Python are the original C implementation (CPython), one in Java (Jython), and one in .NET (IronPython). Jython makes it easy to write Python programs that use Java classes and to test a Java library at an interactive prompt. Similarly, IronPython allows Python programs to be written that use the .NET libraries (these are the same libraries used by C# and Visual Basic).

IronPython (sometimes abbreviated as FePy) is relatively new and had its first release in 2006. It is Open Source and developed by Microsoft (by the same developer who started Jython). Although .NET is strictly for Windows, IronPython can also run on top of Mono, the open source implementation of .NET which is available cross-platform.

As a cheminformatician should you be interested in IronPython or .NET? Well, Dimitris Agrafiotis thinks so. And Eli Lilly recently open sourced a life science platform implemented in .NET (here's the SF website, but where's the mailing list?). I'm not yet convinced, but I'm keeping an open mind. I'm helping Matt Sprague to prepare C# bindings for OpenBabel. I think this will make OpenBabel the first cheminformatics library accessible from C# (although probably someone will contradict me below).

At this point, if you're still interested, you should check out the first chapter of "IronPython in Action", which you can read on the web. This explains in more detail the background to IronPython and why you might want to use it. IronPython in Action will be published later this year by Manning Press, and is written by Michael Foord and Christian Muirhead. Foord (aka Fuzzyman) is a very active blogger on Python and FePy in particular, and works at a company whose main product is implemented in FePy.

Just to make it clear, this is not a book for someone who wants to pick up programming from scratch. There's little hand holding here. The obligatory introduction to Python is quickly and efficiently dealt with before moving on to the main subject - accessing .NET classes and creating a GUI application. For those coming from C# to IronPython, there is a whole chapter on unit testing with Python, as well as another that covers everything from Python magic methods to metaprogramming. A particularly nice feature of the authors' style is to logically link each section to the following, so that you always understand the point of what you've just read and where it fits into the bigger picture.

From my point of view, it's nice to see that explanations are provided for those using the free version of Visual Studio, Visual Studio Express, as this means that I can test out all of the examples myself. Other chapters yet to be finalised cover using IronPython in a webbrowser (apparently IronPython can run in Silverlight, the new browser plugin from Microsoft), and extending IronPython with C# or VisualBasic.

In short, for any serious users of IronPython, this book is a must have. It may also convince those using C# to make the switch to a better and more productive life with Python...