Sunday 17 December 2017

Faster toolkit, faster! Part II

A while ago I described some work I was doing to improve the overall speed of the Open Babel toolkit. Here I want to focus on a particular use case and where I've got to.

This use case is SMILES to SMILES conversion. Now, while this particular transformation might not sound very interesting, it does encompass both SMILES reading and SMILES writing in one handy package, and both of these operations are often important when dealing with databases or datasets of chemical structures. It also exercises several areas of the toolkit such as kekulization, handling of aromaticity, and stereo perception (or not, as we'll see). Canonicalization is also relevant here, but I didn't do any work on that (and it needs some).

To begin with, some timings. To convert 100K ChEMBL molecules from smi to smi took 10m7s with OB 2.4.1. With the current development version it takes 31s. One change to the defaults is that the dev version does not reperceive the stereo. If you turn on stereo perception (-aS), it takes 1m13s. You can speed things up if you also avoid reperceiving the aromaticity (-aa) and read it as provided in the input; then the conversion only takes 19s.

[21 days later] So I was originally going to describe the results from the Visual Studio profiler for this conversion. But then I said, hey, I might as well fix that one, and that one there, and, well, you know how it goes - this part is actually quite fun, when you make a small change and see the speed go up. Anyway, the conversion that used to take 19s now takes 11.0s. If you're interested, the speedups included things like replacing std::endl with "\n", caching option values, avoiding string copies, avoiding use of stringstream, avoiding SSSR calculation, and using reserve() on vectors. It was often surprising what things appeared high on the list in the profiler. I can see a few more things that could be improved, but I'm going to leave it there for the moment.
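To give a flavour of a few of those changes, here is a minimal sketch in standalone C++. It is not taken from the Open Babel source; write_lines and make_indices are hypothetical helpers made up for illustration. It shows writing "\n" instead of std::endl (which also flushes the stream on every line), passing strings by const reference to avoid copies, and calling reserve() before filling a vector whose final size is known.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Open Babel code): write one SMILES string per line.
// "\n" just appends a newline character; std::endl would append a newline
// and also flush the stream each time, which is costly in a tight loop.
void write_lines(std::ostream& out, const std::vector<std::string>& smiles) {
    for (const std::string& s : smiles) {  // const reference: no string copy
        out << s << "\n";
    }
}

// Hypothetical helper: build the indices 0..n-1.
// reserve() requests capacity for n elements up front, so the push_back
// calls below do not trigger repeated reallocation as the vector grows.
std::vector<int> make_indices(std::size_t n) {
    std::vector<int> indices;
    indices.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        indices.push_back(static_cast<int>(i));
    }
    return indices;
}
```

To try it, call write_lines with std::cout (or a std::ostringstream) and a small vector of SMILES strings from your own main(). Whether any one of these matters in a given program is something only a profiler can tell you, which was rather the point above.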

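So, in summary, this particular conversion has gone from 10m7s with OB 2.4.1 to 11.0s with the development version (reading aromaticity as provided and not reperceiving stereo), a speedup of 55x. There's always more that could be done, but it's respectable.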

Open Babel in a snap II

This is a quick update on a previous post, where I mentioned that I had made Open Babel available as a snap. That snap is the released version, 2.4.1. I have now put in place a semi-automated procedure that builds a snap of the development version. You can only have one snap version installed at a time, but you can switch between them.

To install the stable version use:
sudo snap install openbabel

To install the development version use:
sudo snap install openbabel --channel=edge

You can switch between them with:
sudo snap refresh openbabel --channel=stable # or edge

To see which you have installed, use "snap list", or run "openbabel.obabel" and look at the version number and date.

Notes: I'm using Launchpad to do this. Rather than base it directly off the openbabel master (which would require me to check in snapcraft-specific files), I've set it up so that it runs off a branch (named "snaps") in my own repo. Every so often, I merge master into this and a new snap is created. To fully automate it, I will need a cron job to do that merge automatically.