Showing posts with label cinfony. Show all posts
Showing posts with label cinfony. Show all posts

Friday, 9 April 2010

ANN: Cinfony 1.0 released

Cinfony presents a common API to several cheminformatics toolkits. It uses the Python programming language, and builds on top of OpenBabel, RDKit, the CDK, and cheminformatics webservices.

Cinfony 1.0 is now available for download.

The two major additions in this release are support for using OpenBabel from IronPython (the ironable module), and webel - a cheminformatics toolkit that uses webservices (read more about this here).

As usual, Cinfony has been updated to use the latest stable releases of each toolkit: OpenBabel 2.2.3, CDK 1.2.5 and RDKit Q4_2009.

Thanks to Fredrik Wallner, Tom Sheldon and Cedric Moretti for reporting bugs.

Feedback, both positive and negative, is very much appreciated - I want to make Cinfony a useful toolkit for the community. That means you.

Tuesday, 30 March 2010

Cinfony slides from the ACS

Cinfony is a cheminformatics toolkit that makes it easy to access OpenBabel, RDKit, CDK and internet webservices from a Python program. Here is a talk I gave last Thursday at the ACS National Meeting in San Francisco describing Cinfony 1.0.

Monday, 16 November 2009

Cheminformatics Tutorial using Python and Silverlight

Recently I introduced Webel, a Python cheminformatics module that runs entirely on web services. One of the advantages of such a module is that it can be used in places where it is difficult to install a traditional cheminformatics toolkit. Like in your browser.

It turns out that Silverlight ("Microsoft's Flash") provides a Python interpreter that runs in your browser. Using this, Michael Foord (of IronPython in Action) has developed an interactive Python interpreter which you can use at trypython.org. It consists of two windows, one with a Python tutorial and the other with a Python prompt so that you can work through the tutorial.

After some little work, I present Try Python...with Cheminformatics. This adds Webel as well as a short tutorial that introduces many of its features. With a few more tutorials that cover SMILES, InChI and so on in more detail, this could be useful for teaching purposes as well as bridging the gap to having students develop their own Python scripts that use the CDK, OpenBabel or the RDKit.

Here is the obligatory screenshot (click for a larger version):

Thursday, 5 November 2009

Introducing Webel - A cheminformatics toolkit built solely on webservices

I'd like to introduce a new Cinfony module, Webel. Like the other components of Cinfony, Webel implements a standard API (see for example, the Pybel API) that covers a large proportion of common cheminformatics operations including reading/writing SMILES strings and InChIs, calculation of molecular weight and formula, molecular fingerprints, SMARTS searching, and descriptor calculation.

However, unlike the other components, Webel runs entirely off web services. All cheminformatics analysis is carried out using Rajarshi's REST services (which use the CDK and are hosted at Uppsala) and the NIH's Chemical Identifier Resolver (by Markus Sitzmann, and which uses Cactvs for much of its backend).

To use Webel, all you need to do is download webel.py, and type "import webel" at a Python prompt (see example code below - it's basically the same as using Pybel if you're familiar with that).

So what are the advantages of running off webservices? First, as should be clear, there is the ease of installation. This means that Webel could easily be bundled in with some other software to provide some useful functionality. Second, Webel can still be used in environments where installation of a cheminformatics toolkit is simply not possible (more on this next week!). Third, webservices may provide additional functionality not available elsewhere (e.g. the Chemical Resolver provides name-to-structure conversion as well as InChIKey resolution). Fourth, webservices are accessed across HTTP rather than through some type of language binding. As a result, Webel works equally well from CPython, Jython or IronPython. And finally, it's just a cool idea. :-)

If you can think of any other advantages or potential applications, I'd be interested to hear them. In the meanwhile, here's some code that calculates the molecular weight of aspirin, its LogP, its InChI, gives alternate names for aspirin, and creates the PNG above:

import webel

mol = webel.readstring("name", "aspirin")
print "The molecular weight is %.1f" % mol.molwt
print "The InChI is %s" % mol.write("inchi")
print "LogP values are: %s" % mol.calcdesc(["ALOGPDescriptor"])
print "Aspirin is also known as: %s" % mol.write("names")
mol.draw(filename="aspirin.png", show=False)
...which gives...
C:\Tools\cinfony\trunk\cinfony>python example.py
The molecular weight is 180.2
The InChI is InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)
/f/h11H AuxInfo=1/1/N:5,3,4,1,2,12,6,7,11,9,8,10,13/E:(11,12)/F:5,3,4,1,2,12,6,7
,11,9,10,8,13/rA:21CCCCCCCOOOCCOHHHHHHHH/rB:;a1;a2a3;;a1;a2a6;;;;s6d8s10;s5d9;s7
s12;s10;s1;s2;s3;s4;s5;s5;s5;/rC:6.3301,-.56,0;4.5981,-1.56,0;6.3301,-1.56,0;5.4
641,-2.06,0;2,-.06,0;5.4641,-.06,0;4.5981,-.56,0;4.5981,1.44,0;2.866,-1.56,0;6.3
301,1.44,0;5.4641,.94,0;2.866,-.56,0;3.7321,-.06,0;6.3301,2.06,0;6.8671,-.25,0;4
.0611,-1.87,0;6.8671,-1.87,0;5.4641,-2.68,0;2.31,.4769,0;1.4631,.25,0;1.69,-.596
9,0;
LogP values are: {'ALOGPDescriptor_ALogp2': 0.10304100000000004, 'ALOGPDescripto
r_AMR': 18.935400000000001}
Aspirin is also known as: ['2-Acetoxybenzoic acid', '50-78-2', '2-Acetoxybenzene
carboxylic acid', 'Acetylsalicylate', 'Acetylsalicylic acid', 'Aspirin', ...
'Claradin', 'Clariprin', 'Colfarit', 'Decaten', 'Dolean pH 8', ...
'Acetylsalicylsaure [German]', 'Acide acetylsalicylique [French]', ...
'A6810_SIGMA', 'Spectrum5_000740', 'CHEBI:15365',...]

Tuesday, 20 May 2008

Cheminformatics toolkit face-off - Depiction Part 2

[See update (28/10/08).]

One of the aims of the previous toolkit face-off was to get some feedback on the best options for drawing images with different toolkits. I also hoped that a comparison of the images would allow bugs to be easily identified. And finally, I thought that a bit of competition might help improve depiction and structure diagram generators across the board.

Since the last post:
  • Molinspiration have approached me to be included in the face-off
  • I've learnt that I should remove hydrogens from CDK and OASA depictions and control the size of the generated image
  • Beda has done amazing work enhancing depiction in OASA, and has in fact now released OASA separately from BKChem as an independent cheminformatics toolkit
  • Greg is just about to release a new version of the RDKit which has improved depiction, and he has fixed the bug identified by PMR
  • And I've fixed some bugs myself in cinfony
Now the images themselves (same dataset as before): [depiction] and [structure diagram generation].

Notes:
  1. The face-off involves the open source toolkits the CDK, OASA and the RDKit, and the proprietary toolkits Cactvs and molinspiration.
  2. If you want to help improve these depictions or coordinate generators, why not leave a comment below suggesting specific ways to improve them, or highlighting specific things they could do better.
  3. Double bond stereochemistry doesn't seem to be preserved by the CDK, but this is possibly my fault (I'm awaiting a reply to an email to the cdk-users mailing list)
  4. Several people complained to me last time that I didn't give OASA a fair chance. In order to make it up, this time round I've made OASA the star attraction. The coordinates generated by the three open source toolkits are all depicted by OASA

Thursday, 8 May 2008

Cheminformatics toolkit face-off - Depiction

[See update (28/10/08).]

Now for some pretty pictures as well as some not so pretty. Yes, it's the turn of the structure diagram generators (SDGs) to strut their stuff and throw some shapes. How do they perform for 100 random compounds from PubChem?

Here are my results for depiction and structure diagram generation (Note: I will move these links to my regular hosting in the near future). The images were generated using cinfony, the code is here and the dataset is here.

Some comments:
(0) Rich Apodaca has written an overview of Open Source SDGs.
(1) 2D coordinate generation is independent of depiction. A SDG typically has both parts but coordinates could be generated with one toolkit and depicted with another.
(2) Looking good is not the same as chemical accuracy. But looking good is important too! :-)
(3) OASA (Obviously Another Stupid Acronym) is a Python SDG which is part of BKchem by Beda Kosata. To depict images, it uses Pycairo. OASA currently does not support stereochemistry (e.g. across double bonds) and so the generated coordinates will not be chemically-accurate in some circumstances. This of course does not affect its ability to depict coordinates generated by other toolkits. OASA can depict wedges and bonds but I haven't yet implemented this in cinfony.
(4) It is not possible to use the CDK's depiction mechanism natively from CPython using JPype (it's fine from Jython though). That is why I use OASA to depict the CDK's generated coordinates. Technically, this is because JPype doesn't allow subclassing of Java classes (JPanel in this case). It does however, allow Python classes to implement a Java interface. It would be great if the CDK could add a convenience interface to allow this (I can supply more details off-blog).
(5) The PubChem images and coordinates are generated by the Cactvs toolkit.
(6) There are two sets of RDKit images. Since the current release has quite poor depictions, I decided to also include the results from a development branch, which uses Aggdraw.
(7) It is important to consider how to handle hydrogens. With OASA, I just drew all the hydrogens. This is probably not a good idea.
(8) It is likely that I have not used the best parameters, etc. for generating some of these images. I welcome any suggestions to improve image quality.
(9) OASA wasn't able to layout CID 1373132, and so that molecule was not included (I guess I should have just caught the exception and continued). The error message was:
Traceback (most recent call last):
File "cdkjpype.py", line 464, in draw
oasa.cairo_out.cairo_out().mol_to_cairo(mol, filename)
File "bkchem\oasa\oasa\cairo_out.py", line 85, i
n mol_to_cairo
self._draw_edge( e)
File "bkchem\oasa\oasa\cairo_out.py", line 121,
in _draw_edge
side += reduce( operator.add, [geometry.on_which_
side_is_point( 
start+end, (
self.transformer.transform_xy( a.x,a.y))) for a
in ring if a!=v1 and a!=v2])
File "bkchem\oasa\oasa\geometry.py", line 92, in
on_which_side_is_point
b = atan2( y2-y1, x2-x1)
ValueError: math domain error

(10) PubChem entries with more than 1 connected component were not included in this test. (As a result, the number of molecules shown is actually less than 100.)

Image: Creation by Ariana Rose Taylor-Stanley (CC BY 2.0)