Thursday 29 July 2010

Roger's next move, and speeding up Python code by indenting

I met Roger Sayle the other day and he was showing me some of the work he's been doing since leaving OpenEye. Apart from OpenEye, you may remember Roger from such programs as RasMol and also GCC (he's the middle-end maintainer; this is apparently the part that optimizes your code to make it more efficient - see this quiz for examples). He's also the author of the only paper in JCIM to contain Klingon, the Lexichem paper (incidentally, this is also, as far as I know [1], the only paper in JCIM that is freely available through Author Choice).

Earlier this year, Roger set up NextMove Software, which is still in its early days but is one to keep an eye on. It currently has software to aid in checking and correcting the spelling of chemical nomenclature (e.g. in a user interface, in Word documents, or for text mining), and has software for searching databases in the works.

What I want to highlight here is a bizarre way he found to speed up a Python script (via Andrew Dalke) by turning it into a function. If you go to his page on PPAccelerator, which is all about a potential product that speeds up Pipeline Pilot scripts (by up to 37 times if interpreted, or 1083 times if compiled), you will see timing comparisons of code for the Mandelbrot fractal written in different languages.

You will see two Python v2.5 examples, the second of which is about twice as fast. Look at the code for each. The only difference in the second case is that the entire script has been wrapped in a function (main() in this case), which is called at the end of the script. This is just plain weird! I don't think I've ever heard of this effect before. Apparently, the reason is that variables inside a function are locals, which CPython accesses by position in a fixed-size array (the LOAD_FAST and STORE_FAST bytecodes), whereas module-level variables are globals that have to be looked up by name in a dictionary on every access. If this is written anywhere in the docs, I certainly can't find it.
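To see the effect for yourself, here is a minimal sketch in Python 3 (it is not the Python 2.5 Mandelbrot code from the PPAccelerator page). It compiles the same summation loop once as module-level code and once wrapped in a function, uses the standard `dis` module to list which variable-access bytecodes each version uses, and then times both. The loop and the names `MODULE_SRC`, `FUNC_SRC` and `main` are purely illustrative. Exact opcode names differ between CPython versions (newer releases fuse some instructions, e.g. `STORE_FAST_LOAD_FAST`), and the timings will depend on your machine.

```python
import dis
import timeit
import types

# Illustrative example only, not the PPAccelerator benchmark.
# Module-level version: total and i are globals stored in a namespace dict.
MODULE_SRC = """
total = 0
for i in range(100000):
    total += i
"""

# Same loop wrapped in a hypothetical main(): total and i become locals.
FUNC_SRC = """
def main():
    total = 0
    for i in range(100000):
        total += i
    return total
main()
"""


def opnames(code):
    """Return the set of bytecode instruction names used by a code object."""
    return {ins.opname for ins in dis.get_instructions(code)}


module_code = compile(MODULE_SRC, "<module>", "exec")
func_module = compile(FUNC_SRC, "<module>", "exec")
# The body of main() is stored as a nested code object in the constants.
func_code = next(c for c in func_module.co_consts if isinstance(c, types.CodeType))

module_ops = opnames(module_code)
func_ops = opnames(func_code)

if __name__ == "__main__":
    def show(ops):
        # Keep only the name/local access instructions for readability.
        return sorted(op for op in ops if "NAME" in op or "FAST" in op)

    print("module level:   ", show(module_ops))
    print("inside function:", show(func_ops))
    # Execute each version 20 times in a fresh global namespace.
    t_module = timeit.timeit(lambda: exec(module_code, {}), number=20)
    t_func = timeit.timeit(lambda: exec(func_module, {}), number=20)
    print(f"module level: {t_module:.3f}s, inside function: {t_func:.3f}s")
```

At module level you should see only LOAD_NAME/STORE_NAME, while the function body uses the *_FAST family (plus LOAD_GLOBAL for range), and that cheaper local access is where most of the speedup comes from.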

On a final note, a quick Google search brings up promise, which may be useful for exploiting such optimizations further.

Notes:

[1] The ACS don't provide any means to list Author Choice articles on their website. Author Choice articles are supposed to receive increased exposure, but the ACS seem to want to hide them.

Thursday 1 July 2010

Silicos to donate code to Open Babel

Earlier today, Dr Hans De Winter, CSO of Silicos, sent an email to the Open Babel development list announcing the company's intention to open-source its Spectrophore, pharmacophore alignment, and de novo design code:
...it is our pleasure to announce that Silicos NV, a Belgian-based company providing services in the field of computational drug discovery and virtual screening, has made a strategic decision to port its own developed software under the open source domain of Open Babel following the GNU GPL.

...The reason why we have made this strategic decision to port all our software to the open source domain is that we, as management of Silicos, strongly believe in an open innovation model, and open source is just one of these factors that make open innovation possible. For Silicos as a company, we believe that by actively participating and supporting the OB community, we could create more business in the form of services than we could otherwise...

I'm quite excited about this, both about the actual code being contributed and about what this decision signifies for Open Babel. I've always believed that contributing novel algorithms to a common resource is the best way of maximising usage; you still get full credit for your work, you still get your paper, but you are going to have basically bajillions of users. The fact that a commercial company also groks the advantages is significant.

Porting the code to Open Babel is currently under way, and a formal announcement will be made with the release of Open Babel 2.3.0 in a couple of months.