Monday, 17 March 2008

Cube file considered harmful

Several quantum chemistry packages can be encouraged to output the electron density or the magnitude of the wavefunction for a particular orbital at grid points in a volume. This sort of volumetric data is commonly known as a cube file, as this is the name that Gaussian uses, although the volume may not necessarily be a cube. Such a cube file can be used to plot molecular orbitals at various isosurface values.

And that basically, is all they're good for. Nice pictures.

So, what's wrong with that? Well, try doing some sort of analysis with them. Let's say you're into conceptual DFT and want to start partitionally the electron density in weird and wonderful ways. You'll soon find that the cube file is unusable.

For example, during my PhD I wrote a Python script to automate the calculation of Hirshfeld atomic charges using cube files. To do this you need the electron density of the molecule itself, as well as that for each atom on its own. In my thesis, I have a test case using HCN. This required a cube of volume 10x10x12Ang3 with a grid spacing of 0.075Ang. That's pretty fine spacing, but it's required for accurate evaluation of the volume integration. And that's a pretty big cube, but it turns out that electron density falls off pretty slowly with distance.

I gave up trying calculate Hirshfeld charges for the molecules in which I was actually interested, ruthenium polypyridyl complexes. I worked out that a cube file of the appropriate size would be 13GB in size. And like I said, you don't just need one.

But what if I had access to the basis set itself and the molecular orbital coefficients? Then I could have bypassed the cube file and accessed the underlying mathematical functions. At a recent meeting on Tools, Visualisation and GUIs organised by CCP1, the GAMESS-UK developers have agreed to work with cclib to make this happen. This would allow users/developers to write software to analyse the electron density independent of any particular QM package. If you're interested, get in touch.

To finish, here are some pretty pictures relating to Hirshfeld charge analysis. First here's HCN:The next image shows (a) the promolecule (sum of the electron densities of each of the atoms on its own), (b) the electron density, and (c) the difference between a & b, referred to as the deformation density, which shows how electron density has changed on formation of the molecule.And finally, for the carbon atom, (a) the free atom density, (b) the bonded atom density, and (c) the atomic deformation density.


Geoff Hutchison said...

Since I wasn't at the conference... I assume this means that using cclib (and presumably other code like Open Babel) we can also handle things like the Molden format, which uses coefficients of the various basis functions?

I definitely agree that having the underlying wavefunctions and interoperability between multiple QM codes would be supremely useful. I've long suggested this to various packages (mostly proprietary QM codes, admittedly) and received silence.

Open standards and data of this type would be great. (I expect packages like Jmol might also like such compression of electron density and orbitals.)

Noel O'Boyle said...

Regarding Molden format, I guess so, but I am not really familiar with this. I don't like the idea personally of supporting intermediary formats; I think that misses the point of having a library like OpenBabel/cclib. For example, at the meeting someone asked whether I could get cclib to output a Dalton output file given a Gaussian output file, as he had some software that could read Dalton output files.

The GAMESS-UK guys seem to have good links with other QM codes. They also had a suggestion that they/someone contact each of the codes and ask for a specification of their machine-readable formats. In return, those programs that provide a spec get support in cclib, OpenBabel, Avogadro, GaussSum, Molekel, CCP1GUI, etc. If we get a couple of programs signed up, then we can get more.