Noel O'Blog: Cheminformatics toolkit face-off

Wednesday, 21 May 2008

Cheminformatics toolkit face-off - SMARTS matching

SMARTS strings are really useful for substructure searching in molecules. They are sort of like regular expressions for molecules. The idea and syntax of SMARTS come from the people at Daylight.
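
To make the regular-expression analogy concrete, here is a minimal sketch of a SMARTS match using RDKit's Python bindings (it assumes RDKit is installed). The helper name `has_substructure` and the example molecules and patterns are my own illustrations; they are not taken from the test suite discussed below.

```python
from rdkit import Chem


def has_substructure(smiles, smarts):
    """Return True if the molecule given as SMILES matches the SMARTS pattern.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(smarts)
    if mol is None or pattern is None:
        raise ValueError("could not parse SMILES or SMARTS input")
    return mol.HasSubstructMatch(pattern)


# "[OX2H]" is a SMARTS pattern for a hydroxyl oxygen:
# an oxygen with two connections, one of which is a hydrogen.
hydroxyl = "[OX2H]"

for name, smiles in [("phenol", "c1ccccc1O"), ("toluene", "Cc1ccccc1")]:
    print(name, has_substructure(smiles, hydroxyl))
```

Just as a regex is compiled once and then applied to many strings, the SMARTS pattern is parsed once into a query object and can be matched against many molecules.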

Rajarshi Guha, together with Dazhi Jiao (Indiana University), has put together a test suite for SMARTS matchers and run it against the CDK, OpenBabel and OEChem. Greg contributed results for RDKit. The overall performance is described on Rajarshi's website.

Here's the current summary of the results for the 158 test cases, but check Rajarshi's page for a more up-to-date picture:

Toolkit	True	False
CDK	149	9
OpenBabel	149	9
RDKit	151	7
OpenEye	150	8
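
To put the counts on a common scale, here is a small sketch that converts them into pass rates. It assumes the "True" column counts test cases where the toolkit gave the expected answer and "False" counts the ones where it did not; the function name `pass_rate` is just for illustration.

```python
# Counts copied from the summary table above: (True, False) per toolkit.
results = {
    "CDK": (149, 9),
    "OpenBabel": (149, 9),
    "RDKit": (151, 7),
    "OpenEye": (150, 8),
}


def pass_rate(passed, failed):
    """Return the percentage of test cases counted as passed."""
    return 100.0 * passed / (passed + failed)


for toolkit, (passed, failed) in results.items():
    total = passed + failed
    print(f"{toolkit:<10} {total:>4} tests  {pass_rate(passed, failed):5.1f}% passed")
```

On these numbers the toolkits sit within about one and a half percentage points of each other, so the interesting part is which cases fail rather than the totals.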

Image: Everybody stand back by Chris Radcliff (CC BY-SA 2.0)

2 comments:

Unknown said...: Noel, honestly, the tested patterns are quite straightforward and you should try some more complicated (real world) test cases, e.g. the ones collected as JOELib SMARTS test-cases; 27 June 2008 at 19:42
Noel O'Boyle said...: For sure, Joerg - the more test cases the better, and not just for SMARTS searching. I will try these out, and also forward to Rajarshi.; 2 July 2008 at 10:02