Wednesday 21 December 2011

Open Babel: 10 Years and Future Directions

On the Open Babel mailing list, Geoff looks back on 10 years of the project, and looks forward to the future:
As 2011 draws to a close, Open Babel is over 10 years old! At this point, it's been used by over 40 open source projects, downloaded over 200,000 times, and used in over 400 academic papers. And of course, there have been 15 releases and dozens of contributors.
...Read the rest

See you in 2012!

Monday 5 December 2011

Poll over and discuss Goslar

So maybe my poll question was not very difficult (given that 50% of you got it right), but I was quite surprised nonetheless when I came across a 1982 guest editorial by Steve Heller* in Anal. Chim. Acta entitled "Where have all the data gone?":
"Unless something real and practical is done in the near future, it will become impossible to find or use scientific data with the resulting loss of time and money for those who need to repeat experiments."
The future alluded to is the one in which we now live. A follow-up letter, "Computer readable analytical chemical data - comments on a critical need" in Trends in Anal. Chem. discusses this further.

I was put in mind of these articles at the recent German Conference on Chemoinformatics (GCC2011) in Goslar, when (a) I met Steve Heller, and (b) Prof Johnny Gasteiger highlighted this same problem in his talk as one of the outstanding challenges that we should be sorting out. PMR has of course been discussing this issue for some time, but it was the first time I'd heard Prof Gasteiger mention it.

Since I'm on the subject of the GCC, it was good to meet several people whom I know through the Open Babel mailing lists, in particular Michael Banck, who plays a major role in curating chemistry software for Debian (for example, see this list of packages). His talk is available on Lanyrd.

In a recent blogpost I mentioned that Open Access makes it easy to redistribute copies of papers, and I wondered why OA journals don't take advantage of this. Well, it turns out they do: Jan Kuras of Chemistry Central was giving out nice colour copies of the Open Babel paper printed in booklet form, along with similar booklets summarising the three series they have recently published on RDF, PubChem3D and PMR's Symposium.

And finally, here's a picture of me trying to steal a pretzel from one of the FIZ-Chemie Berlin Award winners, Dr. Volker Dirk Hähnke, who gave a very interesting talk on using sequence alignment methods to align a string representation of a chemical graph.

* You may know of Steve from such string representations as the InChI. Incidentally, I thought I was blazing a trail putting my talks on the web, but check out Steve's page.

Thursday 1 December 2011

Cinfony 1.1 released

Cinfony presents a common API to several cheminformatics toolkits. It is written in Python and builds on top of Open Babel, the RDKit, the CDK, Indigo, OPSIN and cheminformatics webservices.

Cinfony 1.1 is now available for download.

The two major additions in this release are support for the Indigo cheminformatics toolkit (the indy module) and support for OPSIN, which converts IUPAC names to structures (the opsin module).

As usual, Cinfony has been updated to use the latest stable releases of each toolkit: Open Babel 2.3.1, CDK 1.4.5, RDKit 2011.09, Indigo 1.0 and OPSIN 1.1. Installation on Windows has also been simplified somewhat as Open Babel 2.3.1 now includes the necessary .jar file and .NET libraries (for use from Jython and IronPython).

The Cinfony website has a somewhat condensed (and only slightly contrived :-) example showing the use of all of these resources in just 12 lines of Python. Here's a small example showing that roundtripping of IUPAC names is now easy to play with:
>>> from cinfony import opsin, webel
>>> mol = opsin.readstring("iupac",
>>> print webel.Molecule(mol).write("iupac")
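To make that kind of experiment repeatable over a list of names, the two calls above can be wrapped in a small helper. This is a minimal sketch, not part of Cinfony: `roundtrip_iupac` and `roundtrip_report` are hypothetical names, and the modules are passed in as parameters (so `reader` would be `cinfony.opsin` and `writer` would be `cinfony.webel`). The case-insensitive string comparison is a crude assumption, since the name that comes back from the webservice may be a valid but different IUPAC name for the same structure.

```python
def roundtrip_iupac(name, reader, writer):
    """Convert an IUPAC name to a structure and back again.

    Assumes `reader` behaves like cinfony.opsin (readstring(format, string))
    and `writer` like cinfony.webel (Molecule(mol).write(format)).
    """
    mol = reader.readstring("iupac", name)      # name -> structure
    return writer.Molecule(mol).write("iupac")  # structure -> name


def roundtrip_report(names, reader, writer):
    """Return (input name, returned name or None, matched?) for each name."""
    report = []
    for name in names:
        try:
            back = roundtrip_iupac(name, reader, writer)
        except Exception:
            # Hypothetical policy: any failure (unparseable name, webservice
            # unreachable) is recorded as None instead of aborting the loop.
            back = None
        # Crude assumption: a case-insensitive string match counts as a match;
        # a different but equally valid name for the same structure will not.
        matched = back is not None and back.strip().lower() == name.strip().lower()
        report.append((name, back, matched))
    return report
```

With Cinfony installed and network access, something like `roundtrip_report(["propan-2-ol"], opsin, webel)` would list each input name, the name that came back, and whether the two match; passing the modules in as arguments also makes it easy to test the helper with stand-in objects.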

To support Cinfony, please cite:
N.M. O'Boyle, G.R. Hutchison, Chem. Cent. J., 2008, 2, 24.