The advantages of this for the user would be (a) a shorter learning curve: if you know how to use Pybel, you can access any of several different cheminformatics libraries with the same syntax; (b) portable scripts: the same script could carry out a particular analysis using different cheminformatics libraries, which may have different fingerprints, descriptors or implementations of particular algorithms (this is, of course, also useful for cross-checking the results of different programs); and (c) a narrower divide between different cheminformatics toolkits (interoperability!!).
The rationale behind Pybel (described in the paper) lends itself to this use. Pybel doesn't attempt to wrap all of OpenBabel's functionality, only the most common tasks in cheminformatics; for advanced options or additional functionality, you can go behind the scenes and access OpenBabel directly. As a result, I propose that the Pybel API represents a generic API (one of many possible, of course) for accessing any cheminformatics library.
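Here's a minimal sketch of that separation, written in modern Python 3 rather than the Python 2.5/Jython 2.2 used below. `NativeMolA`, `NativeMolB`, the two adapters and `advanced_feature()` are all hypothetical stand-ins, not real OpenBabel or CDK classes. Each adapter exposes only the common API (`molwt` and `atoms`) and keeps the wrapped object in a `native` attribute for everything else, which is the role `Molecule.OBMol` plays in Pybel.

```python
# A minimal, toolkit-neutral "Pybel-style" layer. NativeMolA and NativeMolB
# are hypothetical stand-ins for two toolkits' own molecule classes.


class NativeMolA:
    """Hypothetical toolkit A: exposes getter methods."""

    def __init__(self, weight, symbols):
        self._weight = weight
        self._symbols = list(symbols)

    def get_mol_wt(self):
        return self._weight

    def get_atom_symbols(self):
        return list(self._symbols)

    def advanced_feature(self):
        # Stands in for functionality the common API deliberately leaves out.
        return "toolkit A specific result"


class NativeMolB:
    """Hypothetical toolkit B: exposes plain attributes instead."""

    def __init__(self, mass, atom_list):
        self.mass = mass
        self.atom_list = list(atom_list)


class MoleculeA:
    """Adapter: gives toolkit A's molecules the common API."""

    def __init__(self, native):
        self.native = native  # escape hatch to everything toolkit A offers

    @property
    def molwt(self):
        return self.native.get_mol_wt()

    @property
    def atoms(self):
        return self.native.get_atom_symbols()


class MoleculeB:
    """Adapter: gives toolkit B's molecules the same common API."""

    def __init__(self, native):
        self.native = native  # escape hatch to everything toolkit B offers

    @property
    def molwt(self):
        return self.native.mass

    @property
    def atoms(self):
        return list(self.native.atom_list)


def describe(mol):
    """Backend-agnostic 'script': relies only on the common API."""
    return "Molecule has molwt of %.2f and %d atoms" % (mol.molwt, len(mol.atoms))


if __name__ == "__main__":
    water_a = MoleculeA(NativeMolA(18.0153, ["O", "H", "H"]))
    water_b = MoleculeB(NativeMolB(18.0106, ["O", "H", "H"]))
    for mol in (water_a, water_b):
        print(describe(mol))  # same code, different backends
    print(water_a.native.advanced_feature())  # direct access when needed
```

The point of the design is that `describe()` never makes a toolkit-specific call, so it runs unchanged against either backend, while anything outside the common API is still reachable through `native`.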
To test the idea, I have created CDKabel, a proof of concept which shows that the Chemistry Development Kit (CDK) can be accessed using Pybel syntax through Jython. CDKabel does not yet pass all of the Pybel tests, but there's enough to show that the approach has some merit. Compare the following. Here's some Python code using Pybel and OpenBabel:
C:\Documents and Settings\oboyle>python25
Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pybel import *
>>> for mol in readfile("sdf", "head.sdf"):
...     print "Molecule has molwt of %.2f and %d atoms" % (mol.molwt, len(mol.atoms))
...
Molecule has molwt of 122.12 and 15 atoms
Molecule has molwt of 332.49 and 28 atoms
>>>
Now here's some Jython code with CDKabel and CDK:
D:\Tools\CDK>set CLASSPATH=cdk-1.0.2.jar
D:\Tools\CDK>..\jython2.2.1\jython
Jython 2.2.1 on java1.6.0_05
Type "copyright", "credits" or "license" for more information.
>>> from cdkabel import *
>>> for mol in readfile("sdf", "head.sdf"):
...     print "Molecule has molwt of %.2f and %d atoms" % (mol.molwt, len(mol.atoms))
...
Molecule has molwt of 122.04 and 15 atoms
Molecule has molwt of 331.96 and 28 atoms
>>>
Well, at least they agree on the number of atoms :-) (It's my fault - CDK has like, ten different ways of calculating the molecular mass, and I just chose randomly :-) )
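The disagreement in molecular weight isn't necessarily a bug, because "molecular weight" can be computed in more than one defensible way. Here's a small Python 3 sketch comparing two of them: the average mass, from standard atomic weights, and the monoisotopic mass, from the most abundant isotope of each element. The formula C7H6O2 (15 atoms) is only an illustrative choice, not read from head.sdf, and which definition (if either) CDKabel happens to use isn't shown here.

```python
# Two common definitions of molecular mass for the same formula.

# IUPAC abridged standard atomic weights, used for the average molecular weight.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Masses of the most abundant isotopes (12C, 1H, 16O), used for the monoisotopic mass.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def molecular_mass(formula, table):
    """Sum element masses from `table` over a {symbol: count} formula."""
    return sum(table[element] * count for element, count in formula.items())


# Illustrative formula only (7 + 6 + 2 = 15 atoms).
formula = {"C": 7, "H": 6, "O": 2}
print("average:      %.2f" % molecular_mass(formula, AVERAGE_MASS))
print("monoisotopic: %.2f" % molecular_mass(formula, MONOISOTOPIC_MASS))
```

For this formula the two definitions give 122.12 and 122.04, so a shared API would also need to pin down which definition a property like `molwt` refers to.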
I've only spent a few minutes throwing CDKabel together, so it doesn't do much beyond the example shown. However, if you're interested, you can download it and try it for yourself.
I'd appreciate comments on the idea that there is a core Python API that could be usefully applied to several cheminformatics libraries. Would anyone use CDKabel if it were available?