Friday 14 September 2007

RDKit: Not just yet another cheminformatics toolkit

I'm sitting here with a bandage covering my head, as RDKit has just blown my mind.

RDKit is a cheminformatics toolkit written in C++ and Python. It had been developed in-house at a company called Rational Discovery (hence the RD) since 2001. In Feb 2006, it appeared on SourceForge under a liberal license (BSD, except for the GPL Qt code), an appearance which presumably coincided with the demise of Rational Discovery (the company, not the concept, that is :-) ). And there it stayed, actively developed by two developers, but unknown among the open source chemistry community until...

A month ago, I happened to be glancing through the SourceForge software map of chemistry software and I was intrigued by the description of RDKit as "A collection of cheminformatics and machine-learning software developed at Rational Discovery". The website was pretty minimal and there didn't even appear to be any documentation. I dashed off an email to Greg Landrum, the main developer (who, it turns out, is also the developer of YAeHMOP, Yet Another extended Huckel Molecular Orbital Package), and asked him what the story was. Two days ago, he returned from holidays and pointed me to the correct website and the documentation, and I couldn't believe what I was seeing...

Some features that I think are cool:
(1) Molecules based on the Boost Graph Library
(2) All the Python stuff works for me on Windows!
(3) 2D depiction!!!
(4) 2D depiction that mimics 3D conformations!!!
(5) 2D --> 3D conversion using a method similar to Rajarshi's smi23D! (It doesn't use stochastic proximity embedding, though.)

Here's a summary of some of the rest: SMILES, substructure searching, sophisticated fingerprints, machine learning stuff, a GUI, clustering, MACCS keys, descriptors (84 or so), chemical reaction transformations, implementation of Recap (not sure what this is, but there's a ref in the docs), basic pharmacophore stuff, and two types of SSSR (there's some text about this in the docs).
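
To make the "molecule as a graph" idea and the SSSR item a bit more concrete, here is a rough sketch in plain Python. It does not use RDKit or the Boost Graph Library, and the function name ring_count and the hand-numbered atoms are made up for illustration. It relies only on the standard graph result that the number of rings in a smallest set of smallest rings equals bonds - atoms + connected components; which particular rings get chosen is the harder part, and it is not attempted here.

```python
from collections import defaultdict


def ring_count(num_atoms, bonds):
    """Number of rings in an SSSR: bonds - atoms + connected components.

    num_atoms: atoms are assumed to be numbered 0 .. num_atoms - 1
    bonds:     list of (atom, atom) index pairs, each bond listed once
    """
    # Build an adjacency list, i.e. the molecule as an undirected graph.
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in range(num_atoms):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            atom = stack.pop()
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

    return len(bonds) - num_atoms + components


# Hand-written heavy-atom graphs (hypothetical numbering, hydrogens omitted).
benzene = (6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
naphthalene = (10, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                    (5, 6), (6, 7), (7, 8), (8, 9), (9, 0)])

print(ring_count(*benzene))
print(ring_count(*naphthalene))
```

A real toolkit stores far more on each atom and bond (element, charge, bond order, aromaticity), but the ring bookkeeping starts from the same bare graph.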

Cool! There's obviously a lot of overlap between OpenBabel and RDKit, and hopefully we can use this to both projects' advantage in terms of testing against each other and developing interfaces. In the meantime, here's to diversity, and to discovering that someone else has implemented 2D depiction in C++ so I Don't Have To.

For more info, see www.rdkit.org, and in particular, the Python interface documentation which gives a good overview.

Image credit: Toolkit by Neil T (CC BY-SA 2.0)

3 comments:

Rich Apodaca said...

Another great find. Amazing how things that cool can fly under the radar like that.

Noel O'Boyle said...

The explanation must be the advanced stealth technology contained in the Invisible subdirectory.

Unknown said...

Unbelievable! They are in stealth mode since 2006-02 ;-)

By the number of recent findings I hope that we have a good air traffic control.