Now for some pretty pictures as well as some not so pretty. Yes, it's the turn of the structure diagram generators (SDGs) to strut their stuff and throw some shapes. How do they perform for 100 random compounds from PubChem?
Here are my results for depiction and structure diagram generation (Note: I will move these links to my regular hosting in the near future). The images were generated using cinfony; the code is here and the dataset is here.
(0) Rich Apodaca has written an overview of Open Source SDGs.
(1) 2D coordinate generation is independent of depiction. An SDG typically has both parts, but coordinates could be generated with one toolkit and depicted with another.
(2) Looking good is not the same as chemical accuracy. But looking good is important too! :-)
(3) OASA (Obviously Another Stupid Acronym) is a Python SDG which is part of BKchem by Beda Kosata. To depict images, it uses Pycairo. OASA does not currently support stereochemistry (e.g. across double bonds), so the generated coordinates will not be chemically accurate in some circumstances. This of course does not affect its ability to depict coordinates generated by other toolkits. OASA can depict wedges and bonds but I haven't yet implemented this in cinfony.
(4) It is not possible to use the CDK's depiction mechanism natively from CPython using JPype (it's fine from Jython though). That is why I use OASA to depict the CDK's generated coordinates. Technically, this is because JPype doesn't allow subclassing of Java classes (JPanel in this case). It does, however, allow Python classes to implement a Java interface. It would be great if the CDK could add a convenience interface to allow this (I can supply more details off-blog).
(5) The PubChem images and coordinates are generated by the Cactvs toolkit.
(6) There are two sets of RDKit images. Since the current release has quite poor depictions, I decided to also include the results from a development branch, which uses Aggdraw.
(7) It is important to consider how to handle hydrogens. With OASA, I just drew all the hydrogens. This is probably not a good idea.
(8) It is likely that I have not used the best parameters, etc. for generating some of these images. I welcome any suggestions to improve image quality.
(9) OASA wasn't able to lay out CID 1373132, and so that molecule was not included (I guess I should have just caught the exception and continued). The error message was:
Traceback (most recent call last):
  File "cdkjpype.py", line 464, in draw
    oasa.cairo_out.cairo_out().mol_to_cairo(mol, filename)
  File "bkchem\oasa\oasa\cairo_out.py", line 85, in mol_to_cairo
    self._draw_edge( e)
  File "bkchem\oasa\oasa\cairo_out.py", line 121, in _draw_edge
    side += reduce( operator.add, [geometry.on_which_side_is_point( start+end, ( self.transformer.transform_xy( a.x,a.y))) for a in ring if a!=v1 and a!=v2])
  File "bkchem\oasa\oasa\geometry.py", line 92, in on_which_side_is_point
    b = atan2( y2-y1, x2-x1)
ValueError: math domain error
(10) PubChem entries with more than one connected component were not included in this test. (As a result, the number of molecules shown is actually fewer than 100.)
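Notes (9) and (10) together suggest a more forgiving batch loop: skip multi-component entries up front, and catch a failure on one molecule rather than letting it stop the run. The following is only a minimal sketch of that idea, not the script used for these images. The names is_single_component, draw_all and fake_draw are hypothetical, as are the CIDs and SMILES strings in the demo. The "." test relies on the fact that SMILES uses a dot to separate disconnected components; it is a crude heuristic (ring-closure digits can rejoin dot-separated parts), so a toolkit's own fragment count would be more reliable.

```python
def is_single_component(smiles):
    """Crude check: in SMILES, '.' separates disconnected components."""
    return "." not in smiles


def draw_all(entries, draw):
    """Try to draw every (cid, smiles) pair without stopping on errors.

    `draw` is a hypothetical callable taking (smiles, filename); in
    practice it would wrap whichever toolkit does the depiction.
    Returns three collections: drawn CIDs, skipped CIDs, and a dict
    mapping each failed CID to its error message.
    """
    drawn, skipped, failed = [], [], {}
    for cid, smiles in entries:
        if not is_single_component(smiles):
            skipped.append(cid)  # note (10): leave out multi-component entries
            continue
        try:
            draw(smiles, "%d.png" % cid)
        except Exception as exc:
            # note (9): record the failure and carry on with the next molecule
            failed[cid] = str(exc)
            continue
        drawn.append(cid)
    return drawn, skipped, failed


def fake_draw(smiles, filename):
    """Stand-in for a real depiction call; fails on one made-up input."""
    if smiles == "C1CC1":
        raise ValueError("math domain error")


# Made-up entries for illustration only; these are not the PubChem test set.
entries = [(1, "CCO"), (2, "[Na+].[Cl-]"), (3, "C1CC1")]
drawn, skipped, failed = draw_all(entries, fake_draw)
print(drawn, skipped, failed)
```

Keeping the failures in a dictionary means the error messages can still be reported at the end, which is arguably more useful than silently dropping the molecule.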
Image: Creation by Ariana Rose Taylor-Stanley (CC BY 2.0)