Thursday, 5 November 2009

Introducing Webel - A cheminformatics toolkit built solely on webservices

I'd like to introduce a new Cinfony module, Webel. Like the other components of Cinfony, Webel implements a standard API (see, for example, the Pybel API) that covers a large proportion of common cheminformatics operations, including reading/writing SMILES strings and InChIs, calculation of molecular weight and formula, molecular fingerprints, SMARTS searching, and descriptor calculation.

However, unlike the other components, Webel runs entirely off web services. All cheminformatics analysis is carried out using Rajarshi's REST services (which use the CDK and are hosted at Uppsala) and the NIH's Chemical Identifier Resolver (by Markus Sitzmann; it uses Cactvs for much of its backend).
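
To give a feel for what running off web services involves, here is a rough Python 3 sketch of a molecule object that hands its conversions to the Chemical Identifier Resolver over HTTP. It is not Webel's actual code: the RemoteMolecule class, the resolver_url and http_get helpers, and the base URL (the Resolver's current public address) are assumptions made purely for illustration.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL: the Resolver's current public address. The address that
# Webel itself uses may differ.
RESOLVER = "https://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier, representation):
    """Build a URL of the form <base>/<identifier>/<representation>."""
    # Percent-encode characters that are not safe in a URL path (spaces, '#', ...).
    return "%s/%s/%s" % (RESOLVER, quote(identifier), representation)


def http_get(url):
    """Fetch a URL and return the response body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8").strip()


class RemoteMolecule:
    """Hypothetical stand-in for a webservice-backed molecule (not Webel's class)."""

    def __init__(self, identifier, fetch=http_get):
        self.identifier = identifier
        self._fetch = fetch  # injectable, so the class can be tried offline

    def write(self, representation):
        """Ask the remote service for a representation, e.g. 'smiles' or 'names'."""
        return self._fetch(resolver_url(self.identifier, representation))
```

With network access, RemoteMolecule("aspirin").write("smiles") would issue a single GET request and return whatever text the service sends back. Passing a different fetch function makes it possible to try the class without touching the network.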

To use Webel, all you need to do is download it and type "import webel" at a Python prompt (see the example code below; it's basically the same as using Pybel, if you're familiar with that).

So what are the advantages of running off webservices? First, as should be clear, there is the ease of installation. This means that Webel could easily be bundled in with some other software to provide some useful functionality. Second, Webel can still be used in environments where installation of a cheminformatics toolkit is simply not possible (more on this next week!). Third, webservices may provide additional functionality not available elsewhere (e.g. the Chemical Identifier Resolver provides name-to-structure conversion as well as InChIKey resolution). Fourth, webservices are accessed over HTTP rather than through some type of language binding. As a result, Webel works equally well from CPython, Jython or IronPython. And finally, it's just a cool idea. :-)

If you can think of any other advantages or potential applications, I'd be interested to hear them. In the meantime, here's some code that calculates the molecular weight of aspirin, its LogP, its InChI, gives alternate names for aspirin, and creates the PNG above:

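import webel

# Resolve the name "aspirin" to a structure via the web services
mol = webel.readstring("name", "aspirin")
print("The molecular weight is %.1f" % mol.molwt)
print("The InChI is %s" % mol.write("inchi"))
# calcdesc returns a dictionary mapping descriptor names to values
print("LogP values are: %s" % mol.calcdesc(["ALOGPDescriptor"]))
print("Aspirin is also known as: %s" % mol.write("names"))
# Write the depiction to aspirin.png without displaying it
mol.draw(filename="aspirin.png", show=False)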
...which gives...
The molecular weight is 180.2
The InChI is InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)/f/h11H
AuxInfo=1/1/N:5,3,4,1,2,12,6,7,11,9,8,10,13/E:(11,12)/F:5,3,4,1,2,12,6,7
LogP values are: {'ALOGPDescriptor_ALogp2': 0.10304100000000004,
'ALOGPDescriptor_AMR': 18.935400000000001}
Aspirin is also known as: ['2-Acetoxybenzoic acid', '50-78-2',
'2-Acetoxybenzenecarboxylic acid', 'Acetylsalicylate', 'Acetylsalicylic acid', 'Aspirin', ...
'Claradin', 'Clariprin', 'Colfarit', 'Decaten', 'Dolean pH 8', ...
'Acetylsalicylsaure [German]', 'Acide acetylsalicylique [French]', ...
'A6810_SIGMA', 'Spectrum5_000740', 'CHEBI:15365',...]


nyc dad said...

Does this handle stereochemistry?

Noel O'Boyle said...

The underlying data model is a SMILES string. This is capable of storing cis/trans and tetrahedral stereochemistry. So the question then is, do the webservices honor stereochemistry? In the case of the CDK webservices, there is no problem as stereochemistry doesn't affect the results (e.g. the molecular weight). In the case of the NCI services, stereochemistry appears to be preserved (you can try a chiral SMILES-->InChI-->SMILES roundtrip to test this).
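
One half of that roundtrip check can be done offline by looking at the InChI itself: stereochemistry shows up as extra layers, with /b for double-bond (cis/trans) stereo and /t, /m and /s for tetrahedral stereo. The helper below is a minimal Python 3 sketch of that idea; the function name inchi_stereo_layers is made up for illustration and is not part of Webel.

```python
def inchi_stereo_layers(inchi):
    """Return the stereo layers of an InChI as a {prefix: value} dictionary.

    'b' holds double-bond (cis/trans) stereo; 't', 'm' and 's' describe
    tetrahedral stereo. An empty dictionary means no stereo was recorded.
    """
    # Drop anything after whitespace (e.g. a trailing AuxInfo block).
    core = inchi.split()[0]
    # Skip the version prefix ('InChI=1' or 'InChI=1S') and the formula layer.
    layers = core.split("/")[2:]
    # If a layer prefix occurs twice (e.g. again after /f), the last one wins.
    return {layer[0]: layer[1:] for layer in layers if layer and layer[0] in "btms"}


# L-alanine carries tetrahedral stereo layers; aspirin has none.
l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
aspirin = "InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)/f/h11H"
print(inchi_stereo_layers(l_alanine))
print(inchi_stereo_layers(aspirin))
```

The InChI to inspect could come from mol.write("inchi"), as in the example in the post. Reading a chiral SMILES string in the first place would presumably use a "smi" format code by analogy with Pybel, but that is an assumption here.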

Unknown said...

Nice work! After reading about Cinfony, I was hesitating to use it, but this new module (Webel) seems to be what I needed. In particular, I want to align some molecules, and I came across Obfit, which seems to require a SMARTS pattern provided by the user. What I need is a method that does not require this input, and the Kabsch alignment implemented in the CDK and described on the chem-bla-ics blog seems to be what I need. I'll see if I can do this with Webel. Keep up the good work!

Noel O'Boyle said...

Thanks for the encouragement, Sargis! I think you will need access to the core CDK API to do what you want, so the Cinfony cdk module is the way to go. I intend to write a similar blog post on using this.

Markus Sitzmann said...

Hi Noel, as I wrote to you by email I really like the idea of Webel - nice work!

@nyc_dad: As Noel wrote, at the very backend the Chemical Structure Resolver uses Cactvs, which deals with stereochemistry very carefully. However, if you encounter any problems, please report them to us.

Noel O'Boyle said...

Just realised that your name isn't mentioned, Markus. I've corrected that omission.