Wednesday 6 February 2013

A compilation of speeds - Compiler face-off

Let's get right into this one. I've compiled Open Babel with g++ in various ways, and am going to compare the speed with the MSVC++ release. Specifically, I'm going to compare the wallclock time to convert 10000 molecules (the first 10000 in ChEMBL 13) from an SDF file to SMILES.

Our starting point is the time for the MSVC++ compiled release:
29.6s (MSVC++ 2010 Express 32-bit)

I have a Linux Mint 12 VM (VMWare) on the same machine, so let's run the same executable under Wine on Linux:
37.3s (MSVC++ 32-bit under Wine/Linux)
...so it's slower, pretty much as expected. The not-an-emulation layer slows things down a bit.

How about the MinGW compilation described in the previous post?
24.1s (MinGW g++ 4.6.2 32-bit)
g++ beats MSVC++. To be honest, I was a bit surprised to see this, although I understand from Roger that g++ is surprisingly highly-optimised for cheminformatics toolkits. Maybe we should look into an official MinGW release in future.

What about Open Babel compiled with Cygwin's g++?
39.5s (Cygwin g++ 4.5.3 32-bit)
As expected it runs like a pig compared to the MinGW version. Cygwin's handy, but when you're in a hurry it's maybe not the best choice.

So far, so not very unexpected. Now we will enter the realm of weirdness. Let's compile it on Linux in the VM and run it there:
14.8s (Linux Mint 12 g++ 4.6.1 64-bit)

So, in short, the fastest way to run Open Babel on Windows is to use a VM to run Linux. Huh? The like-with-like comparison of MinGW's 24.1s versus Linux's 14.8s is the most intriguing. It suggests that the slowdown is due either to rubbish file I/O by Windows, or to sub-optimal platform-specific code in Open Babel's I/O handling.
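To put numbers on that comparison, here is a minimal Python sketch that expresses each timing above as a multiple of the fastest one. The times are copied from this post; the dictionary labels and the function name `relative_to_fastest` are just illustrative.

```python
# Wallclock times in seconds, copied from the results above.
timings = {
    "MSVC++ 2010 Express 32-bit (Windows)": 29.6,
    "MSVC++ 32-bit under Wine/Linux": 37.3,
    "MinGW g++ 4.6.2 32-bit (Windows)": 24.1,
    "Cygwin g++ 4.5.3 32-bit (Windows)": 39.5,
    "Linux Mint 12 g++ 4.6.1 64-bit (VM)": 14.8,
}


def relative_to_fastest(times):
    """Return each time divided by the smallest time in the mapping."""
    fastest = min(times.values())
    return {name: t / fastest for name, t in times.items()}


if __name__ == "__main__":
    ratios = relative_to_fastest(timings)
    # Print the fastest build first.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{ratio:5.2f}x  {timings[name]:5.1f}s  {name}")
```

On these figures the MinGW build takes roughly 1.6 times as long as the Linux build, bearing in mind that one is 32-bit and the other 64-bit.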

Either way, it's a pretty interesting result.

Notes:
1. Hardware was a Dell Latitude E6400 bought 3 years ago (Core 2 Duo 2.4 GHz, 4 GB RAM) running Win 7 64-bit. Each timing is the best of three, taken after the timings had stabilised (the first one or two runs are usually a second or two slower).
2. After the initial post, I compiled clang on Linux, and then used it to compile Open Babel. Running the conversion took 15.3s.
3. Also, I ran the MinGW compiled version under Linux, and it took 30.7s.
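
For anyone wanting to repeat the measurement, the best-of-three procedure in note 1 could be scripted along these lines. This is a sketch rather than what was actually used for the timings above: the function name `time_command`, the script name and the file names in the comment are hypothetical, and the `obabel` invocation (with `-l 10000` to stop after the first 10000 molecules) is an assumption about how one might run the conversion.

```python
import subprocess
import sys
import time


def time_command(cmd, warmup=2, repeats=3):
    """Return the best wallclock time in seconds over `repeats` runs of cmd.

    A few untimed warm-up runs come first, since the first one or two
    runs tend to be slower than the rest.
    """
    for _ in range(warmup):
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Pass the conversion command on the command line, for example
    # (assumed invocation, hypothetical file names):
    #   python best_of_three.py obabel chembl_13.sdf -O out.smi -l 10000
    # With no arguments, a do-nothing Python process is timed instead.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"best of 3: {time_command(cmd):.1f}s")
```

Taking the minimum rather than the mean keeps one-off disturbances (disk caching, other processes) from inflating the reported figure.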

Image credit: Venkatesh Srinivas (Extrudedaluminiu on Flickr)

10 comments:

greg landrum said...

The mingw numbers are interesting. I haven't thought about mingw in quite a while; I might need to go back and try again now.

The MSVC++ Express vs g++ values match what my experience has been with the RDKit. It's been a while since I tried it on Windows, but in this table, gulo_linux and gulo_win/gulo_win2 are the same machine:
http://code.google.com/p/rdkit/wiki/Benchmarking

Jeremy said...

Interesting results.
But I was wondering why you used a 32-bit version of the compiler on your 64-bit machine?
Also, on the Linux VM you do use the 64-bit version of the compiler for the last test. (Not sure it will affect much, but just in case.)

Is it because it doesn't work, or because there is no 64-bit compiler available for Windows?

Noel O'Boyle said...

@greg: Thanks for the link. The timings on the same machine are interesting. I've been meaning to do some sort of comparison for quite some time but never got around to it.

@Jeremy: The Cygwin compiler comes as 32-bit. There is a MinGW-64 on SourceForge, but I thought I'd better get the original MinGW working first (I know that it should work, but haven't succeeded in the past). For MSVC, I just used the official release, which is 32-bit. I can compile 64-bit myself, but I have a different MSVC compiler from the one used in the release (and I thought I'd done enough).

For sure, if it was a paper, it should either be 64-bit or 32-bit across the board.

David Lonie said...

I'm curious what the result of running in native Linux would be, just for comparison.

Was there a lot of terminal output during the compilation? I've noticed that the Windows command prompt is very slow at rendering text at times -- I've compiled some code that had a lot of warnings that took ~5 minutes to "compile" when it was printing the warnings to the terminal. Doing a clean build of the same project and piping the output to a file (or using msys) took less than 10 seconds... Granted, this was in a Windows VM, but still somewhat shocking.

Noel O'Boyle said...

Well, I can't test the native Linux really. I don't have a Linux install and using a live distro would use a RAM disk and have different performance characteristics.

I'm aware of the scroll time problem. It's true also on Linux (I think?) but not so pronounced. I don't think this would affect OB compilation that much; there's one or two lines per source file in total, so maybe 100 or 150 lines over 10-15 minutes. I guess if I used /Wall I'd see more...

John Mayfield said...

If you're on an Intel CPU, it might be worth trying one of their specific compilers. It also looks like they do a free 30-day trial: http://software.intel.com/en-us/intel-parallel-studio-xe/

Noel O'Boyle said...

Hmmm...meh...would have to register and all that. Since I wouldn't have access to it going forward, the figure isn't of much use to me in any case.

Michael Banck said...

Adding clang/llvm to the comparison would be interesting.

Noel O'Boyle said...

Linux Mint's clang appears to be broken (http://stackoverflow.com/questions/11333969/systems-clang-installation-wont-link). I'll consider compiling it myself...

Noel O'Boyle said...

clang comes in at 15.3s on Linux.