Sunday, 20 April 2014

QM Speed Test: Firefly and wrap up

You can find my latest results, this time for Firefly, over on my Github page. The summary is that it's slightly faster than NWChem for HF calculations, but much faster for B3LYP calculations. I also tried using FreeON but although it appeared to compile fine, it didn't run correctly on the CentOS system I have been using for testing.

Someone raised the point that these are not like-for-like comparisons in terms of all of the program parameters (integration grid size, convergence criteria, etc, etc.). Pedro Silva investigated the effect of grid-size on the speed and results, and tried to normalise across GAMESS, Q-Chem and Firefly. However, our intention here is to give an idea of comparative speeds using the defaults as some sort of baseline, but also because it reflects typical use.

Unfortunately, I ran out of time to continue this project back in late Jan or so. The first victim of the time shortages were the Turbomole results. The folks at COSMOlogic came forward with an evaluation copy, but I simply ran out of time before having a chance to figure it out and use it. The "tmole" input files are already there though in the repo (from Michael Banck).

The others involved have been much more active. To see all of the results gathered so far, check out the Github pages of:
Where to from here? Well, nowhere for me, at least for the immediate future. But these input files are out there now, and if you want to compare different packages, they make a good starting point. An interesting question is whether the relative speeds change once you start looking at larger systems. I don't know the answer to that...maybe you can try finding out.

Thursday, 27 February 2014

Screencasting on Windows

I always find screencasting quite difficult to set up on Windows. Here are some notes on a recent successful attempt that uses the well-known ffmpeg library/command-line tool to record from a Direct Show screen capture device.

Set up the screen capture device

Install Screen Capture Recorder and configure by selecting the shortcut "configure by resizing a transparent window". Start broadcasting the desktop by choosing the shortcut "stream desktop local LAN", and "Start Normal 10 fps".

Capture the screen

Install 32-bit ffmpeg (even on 64-bit Windows) and capture as follows (press "q" to stop):
ffmpeg -f dshow -i video=screen-capture-recorder -r 10
       -y recording.mp4

Convert to the target format

For example, convert to .wmv and retain just the first 60 seconds (see "-ss" for skipping the start and "-q:v 10" to control the size/quality):
ffmpeg -i recording.mp4 -t 60 -b:v 1000k
       -vcodec wmv2 myfile.wmv

Monday, 3 February 2014

When Python goes wrong: Off-by-one

Many niggling problems with Python were fixed in Python 3, but here's one that recently bit me in the expunged. It relates to rounding in format strings and so is a potential pitfall for anyone handling floats.

The thing is, "%f" rounds differently than "%d".
>>> "%.1f" % 1.56 ### I expect 1.6
'1.6'
>>> "%d" % 1.6    ### By analogy, I expect 2
'1'
The correct solution is either "%.0f" (yes - it exists) or explicitly round it yourself how you like, e.g. "%d" % int(x+0.5) or using round().

This was causing off-by-one results in my code which I fortunately identified prior to submitting a manuscript. But I'm disappointed, Python; and don't blame C.

Note: As a further issue, if you need to handle negative values also, doublecheck the rounding behaviour of your code...it may not always be what you expect.

Friday, 31 January 2014

Turning vim into an IDE: Part 1

Probably like many other users of vim (or gvim), I have never taken the time to set it up so that it works like a modern IDE with auto-completion, syntax checking, and whatever else. I've decided to start sorting this out.

My starting point is the following _gvimrc (in C:\Users\noel) which sets the behaviour of the tab key and indentation for several different file formats. I also highlight whitespace at the end of lines so that I don't add it by accident.

autocmd FileType python set expandtab
autocmd FileType python set sw=4
autocmd FileType python set ts=4
autocmd FileType python set autoindent

autocmd FileType cmake set expandtab
autocmd FileType cmake set sw=2
autocmd FileType cmake set ts=2
autocmd FileType cmake set autoindent

autocmd FileType html set expandtab
autocmd FileType html set sw=2
autocmd FileType html set ts=2
autocmd FileType html set autoindent

autocmd FileType javascript set expandtab
autocmd FileType javascript set sw=2
autocmd FileType javascript set ts=2
autocmd FileType javascript set autoindent

autocmd FileType css set expandtab
autocmd FileType css set sw=2
autocmd FileType css set ts=2
autocmd FileType css set autoindent

" Highlight whitespace at end of line
highlight ExtraWhitespace ctermbg=red guibg=red
autocmd Syntax * syn match ExtraWhitespace /\s\+$\| \+\ze\t/

" Don't create those annoying backup files ending with ~
set nobackup

Testing webapps from Python with Selenium

I'm a great believer in tests to ensure code quality, although naturally I don't always practise what I preach. Recently I've been working on a webapp with a Python server backend. I can test the server independently of the webapp, but testing the webapp itself involves clicking on various things, waiting for the server to respond, and then checking the result. Doing this manually quickly becomes a chore and that's without even considering different browsers.

The solution is Python of course (as usual on this blog). The aptly-named Selenium allows you to automate this sort of thing from the point of view of writing tests. It's available in several languages (I think they all use the same server backend) but I'm going to focus here on the Python. "pip install selenium" on Windows installs the basics; this will allow you to test against Firefox and Safari. For Chrome and IE, you need to download two additional files and add them to the PATH.

The Selenium docs are a bit confusing because (a) they keep referring back to an earlier version of the software (Selenium 1) which had different behaviour and (b) the commands for the Selenium IDE are very similar but not the same as the commands for the Python bindings.

Anyhoo, here's a modified version of the Selenium script that I'm using just to give some idea of how it works. When you run this at the command-line, a new Firefox window pops up, resizes to the specified dimensions, flies through the test, shuts down, and then I get the usual unittest output from the Python script. Specifying a browser name (e.g. Chrome) runs the test against Chrome instead.

import sys
import unittest

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains

import selenium.webdriver.support.ui as ui

browsers = {"firefox": webdriver.Firefox, "ie": webdriver.Ie, "chrome": webdriver.Chrome, "safari": webdriver.Safari}

class Tests(unittest.TestCase):
    def setUp(self):
        self.driver = browser()
        self.driver.set_window_size(1380, 880)
        self.driver.get("file:///C:/Tools/myfile.htm")

        # Convenience functions
        self.driver.by_id = self.driver.find_element_by_id
        self.driver.by_xpath = self.driver.find_element_by_xpath
        self.wait = ui.WebDriverWait(self.driver, 5)

    def test_the_webpage(self):
        d = self.driver
        d.refresh()

        self.drag_and_drop(d.by_id("ID1"), d.by_xpath("//div[@id='things']/img[3]"))
        self.wait.until(lambda d: d.by_xpath("//table[@id='results']"))
        self.assertEqual("45", d.by_xpath("//table[@id='results']/tbody/tr/td[2]").text)

        d.by_id("thingy").click()
        self.wait.until(lambda d: d.by_xpath("//table[@id='results']"))
        self.assertEqual("64", d.by_xpath("//table[@id='results']/tbody/tr/td[2]").text)

    def drag_and_drop(self, elem, target):
        ActionChains(self.driver).drag_and_drop(elem, target).perform()

    def tearDown(self):
        self.driver.quit()

if __name__ == "__main__":
    browsername = "firefox"
    if len(sys.argv) == 2:
        browsername = sys.argv[1].lower()
    browser = browsers[browsername]

    suite = unittest.TestLoader().loadTestsFromTestCase(Tests)
    unittest.TextTestRunner().run(suite)
Note: For another Python project related to browser automation, see also twill.

Monday, 13 January 2014

Convert distance matrix to 2D projection with Python

In my continuing quest to never use R again, I've been trying to figure out how to embed points described by a distance matrix into 2D. This can be done with several manifold embeddings provided by scikit-learn. The diagram below was generated using metric multi-dimensional scaling based on a distance matrix of pairwise distances between European cities (docs here and here).
import csv
import numpy as np
import matplotlib.pyplot as plt
from sklearn import manifold

# Distance file available from RMDS project:
#    https://github.com/cheind/rmds/blob/master/examples/european_city_distances.csv
reader = csv.reader(open("european_city_distances.csv", "r"), delimiter=';')
data = list(reader)

dists = []
cities = []
for d in data:
    cities.append(d[0])
    dists.append(map(float , d[1:-1]))

adist = np.array(dists)
amax = np.amax(adist)
adist /= amax

mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(adist)

coords = results.embedding_

plt.subplots_adjust(bottom = 0.1)
plt.scatter(
    coords[:, 0], coords[:, 1], marker = 'o'
    )
for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
    plt.annotate(
        label,
        xy = (x, y), xytext = (-20, 20),
        textcoords = 'offset points', ha = 'right', va = 'bottom',
        bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
        arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

plt.show()
Notes: If you don't specify a random_state, then a slightly different embedding may be generated each time (with arbitary rotation) in the 2D plane. If it's slow, you can use multiple CPUs via n_jobs=N.

Wednesday, 1 January 2014

QM Speed Test: NWChem

When I started the QM Speed Test I was afraid that I'd lose steam (and face!) before getting too far along. At the same time, I hoped that others might be inspired to send in some results for their systems and QM packages. This would make it easier for me (since half the work is setting up the input file), verify that my results were not pure invention, and extend the speed test to packages to which I don't have access.

I've just checked in the results for NWChem 6.3 compiled on Centos (also 6.3). But in the meanwhile, others have shot ahead by forking the speed test and checking in their own results:Fork me on GitHub
  • Eric Berquist, a grad student at the Uni of Pittsburgh, has checked in results for Dalton, ORCA, and Q-Chem.
  • Michael Banck, Debichem guru, has checked in results for all those packages available on Debian 7 (and presumably Ubuntu and other derivatives) namely NWChem, MPQC, and PSI4, as well as ERKALE.

Dudes, I salute you.

Let's start with my results. Rather than repeat the table here, you can check out the figures on the Github page. The summary is that NWChem is quite a bit faster at geo-opts than ERKALE. This is a combination of more steps in the geo-opt, and faster energy calculations. So if you are using some of ERKALE's other features, it would probably make sense to geo-opt first with NWChem. I did recompile ERKALE with "-O3 -ffast-math -funroll-loops -fPIC -march=native" and got a 10% improvement but that's not enough to change things.

From the other calculations, what do we now know? Well all of the packages tested can actually do geo-opts at the HF/6-31G and B3LYP/6-31G level of theory. Figuring this out by looking at their websites is not always as easy as you might think. Also worth noting is the relative slowdown when using B3LYP versus HF. In some cases the slowdown is of the order of 30-40%, in other cases the DFT calculations are twice as slow.

Michael's show that NWChem is faster than any of the other Open Source QM packages tested. Very useful information; again, it points at a large time-saving by using NWChem for the initial geo-opt before turning to whatever package you need for the subsequent analysis. Eric has compared several proprietary packages (I was going to say commercial but I can't tell if ORCA is available for purchase) and Q-Chem is much faster than ORCA and Dalton.

What we haven't looked at is scaling. Maybe the orders all change when we double the size of the molecule and increase the basis set. But that's a question for another day. Right now, the goal is to get as much breadth as possible on the original question.

Notes on compiling NWChem

1. Boy were there some quirks with this one. First of all, although the instructions on the website give everything in terms of the CShell (remember csh? - big on Unix, not so big these days), you can just use the bash equivalents.
2. I wanted only single-CPU calculations so I tried to turn off MPI support. My first attempt "USE_MPI=n" caused build failures, and the only way to get around build failures appeared to be to delete the folder and untar the source again. Following the instructions on this post, I used "export ARMCI_NETWORK=SOCKETS" instead.
3. My favourite quirk was that when you untar the source, the generated folder has a really long name which causes a compile failure after some time as follows:
The directory name chosen for NWCHEM_TOP is longer than
the maximum allowed value of 65 characters
current NWCHEM_TOP=/home/noel/Tools/Quantum/nwchem-6.3.revision2-src.2013-10-17/src
Renaming to "nwchem" also failed (different error - can't find something or other I think). Renaming to "nwchem-6.3" worked.
4. I couldn't figure out how to get NWChem to use the system lapack library but got it to link to blas as described in the docs.