Saturday, 20 September 2008

Overview of cheminformatics toolkits

Yesterday, Andrew Dalke gave me a sneak preview of his EuroQSAR poster entitled "Python for Computational Chemistry". I see that it's now available on his blog, and I recommend you check it out.

It has an excellent diagram showing the history of various cheminformatics toolkits and how they relate to each other. I'm particularly pleased with the diagram as it includes some recent work of mine (Pybel and now Cinfony).

Andrew works on implementing cheminformatics systems in pharmaceutical companies and is a Python advocate. In his poster, he answers multiple "How do I do _____ in Python?" questions. If you want to support the use of Python in cheminformatics as well as let other students/coworkers see what toolkits are available, it's a really good poster to print out and stick up somewhere.

Oh yeah, in other news this month, Noel O'Blog is now being broadcast from a secret location in University College Cork, although I'll be back and forth to the CCDC on a regular basis.

5 comments:

Kris said...

What about ChemAxon? What about python for SMARTS/SMIRKS reaction processing?

Andrew Dalke said...

Yeah, Marvin came up here during the EuroQSAR poster session. I don't know enough about the history. I've never used it. I've not heard of people using it other than as a structure plugin, and I've not read any blogs or other web pages which talk about it from a user perspective.

There's also Python support in Schrödinger's code, including SMILES/SMARTS, but the API is not available publicly. I got a look at it but not enough to draw a conclusion that I want to write here. The reps here are not programmers.

Regarding SMIRKS, OEChem and Daylight both handle SMIRKS. I'm not aware of other toolkits that do. But I haven't seen code which actually uses SMIRKS, other than some example programs. Most of what I've done has been molecular analysis, not molecular editing.

Egon Willighagen said...

CDK goes back to 1998, though I even think 1997, but because things were not online on SourceForge then, and my computer from that time is broken, it's a bit hard to tell.

The CDK code is not just based on JChemPaint, but on Jmol too, as well as a project called compchem.sf.net. This latter project seems to have been removed from SF by Christoph :(

I should have backups of my old computers, and will do some chemoinformatics archeology soon.

Andrew, I saw your poster at QSAR2008 on Monday... I missed you at the Blue Obelisk/... dinner that evening...

Andrew Dalke said...

I used the end of 2001 as the start point for CDK for two reasons. The page at http://www.steinbeck-molecular.de/cdk/index.php/Main_Page says

The CDK originated in the lab of Christoph Steinbeck as a successor of his older CompChem libraries, which urgently needed a rewrite. In autumn 2000, Christoph Steinbeck, Egon Willighagen and Dan Gezelter met at Notre Dame University, South Bend, USA, to discuss the architecture of the package. On the flight back to Europe, a first draft of the data classes were written.

I also downloaded r100 of CDK from version control. The file trunk/cdk/doc/htdocs/index.html is the CDK home page but it talks about JChemPaint and the changelog.html file there says "We are moving towards a first release of the CDK." That file has a creation/modification date of 2001-02-06.

There is an older history, and for lack of space I didn't show it in my graphic, just like I didn't show earlier influences of RDKit. I also don't know the history of CDK that well, nor who was involved with Jmol, nor how people moved between the projects.

I do show that CDK derives from JChemPaint, and tried to indicate that with the thinner line.

I would like to know more about the history. If you're here at EuroQSAR, Egon, drop by my poster during the next session if you can. I'm presenting another poster about 3 m away from the one with this graphic.

I didn't know there was a Blue Obelisk meeting. I don't follow the mailing list.

Egon Willighagen said...

Andrew, I was not at QSAR2008, but in Uppsala, and met Ola (Bioclipse) on Monday, which was when I saw your poster... only just reading your reply :(

I need to dig up CDK history myself too... Dan (Jmol), Christoph (JChemPaint) and I (CML-support for both) met in September 2000 at Dan's place in South Bend (Notre Dame U), where we discussed a common library backend for Jmol and JChemPaint... this is where the CDK was 'founded'...

Christoph and I will dig up old email correspondence for JChemPaint etc... and I'll make sure that we convert that into a CDK News paper...

Jmol: http://jmol.sourceforge.net/history/

Also moved quite late to SF :(

BTW, I forgot to thank you for the interesting poster!