Tuesday 13 November 2007

Using R from Python - the best of both worlds

R is a programming environment containing every known statistical method; however, it's ugly and difficult to program in. Python has only a small number of statistical methods (which are available through SciPy), but it's elegant and easy to program in. Both have graphing libraries which leave a lot to be desired in terms of ease of use, but the graphs from Python's matplotlib are a lot more polished. So, I'd like to use R from Python...

In order to do this, you will need to install the RPy library (as well as Python and R, of course). I'm working on Windows, but I've used this previously on Linux. As an example, here are three ways to access the same hclust object from Python. They each illustrate different aspects of the Python/R interface:
from rpy import *

# START OF METHOD 1 (run a block of R code directly)
hclust = r("""
a <- read.table("cpOfSimMatrix.txt")
mydist <- dist(1-a)
myHclust <- hclust(mydist) # assumed step; the name is taken from Method 3
""")
r("rm(a)") # Ends with an error if you leave anything in memory

# START OF METHOD 2 (call R functions from Python)
a = r.read_table("cpOfSimMatrix.txt")
mydist = r.dist(r["-"](1,a)) # Note the trick for '1-a' here
hclust = r.hclust(mydist) # Converts R object to Python dict

# START OF METHOD 3 (here's one I created earlier)
hclust = r('myHclust')
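
If you haven't met R's dist() before, here is a rough pure-Python sketch of what dist(1-a) works out, assuming that a is a square similarity matrix with values between 0 and 1 and that dist() is left at its default (Euclidean distance between rows). The names one_minus and row_distances are made up for illustration; they are not part of RPy or R.

```python
import math


def one_minus(matrix):
    """Turn similarities into dissimilarities, like '1-a' in R."""
    return [[1.0 - value for value in row] for row in matrix]


def row_distances(matrix):
    """Euclidean distance between every pair of rows (R's default for dist)."""
    n = len(matrix)
    distances = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((x - y) ** 2
                              for x, y in zip(matrix[i], matrix[j])))
            distances[i][j] = distances[j][i] = d
    return distances


# Toy 2x2 similarity matrix (illustrative values only)
similarity = [[1.0, 0.5],
              [0.5, 1.0]]
mydist = row_distances(one_minus(similarity))
```

Note that R's dist() returns only the lower triangle of this matrix, whereas the sketch fills in the full symmetric matrix to keep things simple.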

For further examples, check out Peter Cock's excellent pages on combining R and Python.

Thursday 8 November 2007

Take a molecule for a spin with Avogadro

Avogadro 0.2 is an early-stage release of a new open source molecular viewer and editor. It's available for Windows, Linux and Mac OS X, so there's no excuse not to try it. I took it for a test drive for the first time a week or so ago (this was on Windows), and was very impressed. Sure, there are some rough edges, as you might expect with such a low version number, but it's already capable of matching the existing competition and is sure to become the molecular editor of choice with the next release.

Once it starts, open up a molecular structure file (for example, a PDB entry) and immediately choose the navigation tool (next to the Pencil in the Tools toolbar). The left mouse button rotates the molecule, the middle zooms, and the right translates. So far, so Rasmol.

However, it's the attention to detail in the user interface that makes Avogadro stand out from the crowd. If the initial click to rotate is on an atom, the molecule rotates around that atom; otherwise, the molecule rotates around its centroid. Something else you will notice is the eye-candy, which really aids clarity. When rotating, a set of curvy arrows indicates the degree of rotation. If you choose the Bond Centric Manipulation Tool (indicated by an icon containing the number 90) and click on a bond, you will see bond angles lovingly displayed as semi-transparent segments. Then if you click and drag on an atom adjacent to the bond, the visualisation of the dihedral angles as you alter them is pretty cool (see below).

So what else has Avogadro got? I didn't try the molecular builder, the force-field optimisation or the export to POV-Ray, but they're all there. I'm more interested in what's to come. Avogadro uses a plugin architecture, which means that it will be easy to incorporate third-party add-ons. Along with forthcoming support for scripting languages, this will allow me to incorporate elements of cclib or GaussSum into Avogadro.

But that's all in the future. For now, I'm just wondering how many times I can rotate this dihedral angle until it falls off...

Wednesday 7 November 2007

Preventing Spam on MediaWiki at SourceForge

Mail spam is annoying. But spam on a wiki is even more annoying. If you've spent your free time writing documentation, there's no way you want to see junk text or link spam inserted into the middle of your flowing prose. In an earlier article, I described how to install a MediaWiki wiki on SourceForge. After a while you will start to get your first spam. And so the battle begins...

Wikis don't look very impressive as a website, but for an open source project they are a great way of maintaining documentation. Nobody likes writing documentation, which is why it needs to be made as easy as possible. For example, if a user mails the project mailing list with a question about installing that software, it usually means that the documentation needs to be updated and this can be done in about a minute. (On the other hand, if a project has a frequently-asked questions page, it means that they couldn't be bothered improving the documentation.)

Fortunately, spam can be controlled simply by using permissions (thanks to David Wild for some of these suggestions). To begin with, make sure you keep an eye on the Recent Changes feed on your wiki (use an RSS reader). (As an aside, if your Recent Changes feed is broken, a possible cause is that you have installed an extension and included a rogue blank line after the final "?>" - this happened to me.)
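
If you suspect a stray blank line after the final "?>", a few lines of Python can check for it. This is only a sketch: the function name is made up, it simply looks for the last "?>" in the text (so a "?>" inside a string in a file with no closing tag would fool it), and it relies on PHP's documented behaviour of swallowing a single newline directly after the closing tag. Anything left over after that would be sent to the browser as output, which is presumably how the feed gets broken.

```python
def stray_output_after_close_tag(php_source):
    """Return the text PHP would echo after the final '?>' ('' if none)."""
    pos = php_source.rfind("?>")
    if pos == -1:
        return ""  # no closing tag at all, so nothing can follow it
    tail = php_source[pos + 2:]
    # PHP itself discards one newline directly after the closing tag
    if tail.startswith("\r\n"):
        tail = tail[2:]
    elif tail.startswith("\n"):
        tail = tail[1:]
    return tail


# A file ending in "?>" plus one newline is fine...
clean = "<?php\n$x = 1;\n?>\n"
# ...but a second newline is the rogue blank line
broken = "<?php\n$x = 1;\n?>\n\n"

for name, source in [("clean", clean), ("broken", broken)]:
    leftover = stray_output_after_close_tag(source)
    print(name, "has stray output" if leftover else "is OK")
```

To use it on a real extension, read the file's contents into a string and pass that to the function. The other way to avoid the problem is to leave out the final "?>" altogether, which PHP allows in files that contain only PHP code.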

Next you need to edit the permissions in LocalSettings.php to disable anonymous edits and disable account creation except by users who already have accounts. Of course, the admin is always allowed to do whatever. I'll explain the rationale for this below, but first here are the settings:
# No anonymous editing allowed
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = true;
$wgGroupPermissions['sysop']['edit'] = true;
# Only users with accounts can create accounts
$wgGroupPermissions['*']['createaccount'] = false;
$wgGroupPermissions['user']['createaccount'] = true;
$wgGroupPermissions['sysop']['createaccount'] = true;

On one of the wikis I am involved with on SourceForge, we had already disabled anonymous edits. However, anyone could create an account. At some point the spammers upgraded their spam software and were then able to create accounts on the wiki. Since the RSS feed for recent changes was not working (for the reason described above), it was three days and about 1000 spam accounts later before I realised the problem. At that stage it was too late to implement the solution I described above. Instead, I created a new group 'human', added the 10 or so real accounts to that group and gave them permissions while simultaneously removing all permissions from the regular 'user' group.
# Only 'humans' can edit
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = false;
$wgGroupPermissions['human']['edit'] = true;
$wgGroupPermissions['sysop']['edit'] = true;
# Only 'humans' can create accounts
$wgGroupPermissions['*']['createaccount'] = false;
$wgGroupPermissions['user']['createaccount'] = false;
$wgGroupPermissions['human']['createaccount'] = true;
$wgGroupPermissions['sysop']['createaccount'] = true;
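
To see why the second set of settings locks out the spam accounts while the first would not have, here is a toy Python model of the two configurations. It assumes (as far as I understand MediaWiki) that every visitor belongs to the '*' group, every logged-in account also belongs to 'user', and a right is granted if any one of the account's groups has it set to true. The function and variable names are my own and have nothing to do with MediaWiki's code.

```python
def allowed(permissions, groups, right):
    """True if at least one of the account's groups grants the right."""
    return any(permissions.get(group, {}).get(right, False)
               for group in groups)


# First configuration: any registered user can edit and create accounts
FIRST = {
    "*":     {"edit": False, "createaccount": False},
    "user":  {"edit": True,  "createaccount": True},
    "sysop": {"edit": True,  "createaccount": True},
}

# Second configuration: only members of 'human' (and sysops) can
SECOND = {
    "*":     {"edit": False, "createaccount": False},
    "user":  {"edit": False, "createaccount": False},
    "human": {"edit": True,  "createaccount": True},
    "sysop": {"edit": True,  "createaccount": True},
}

ANONYMOUS = ["*"]
SPAM_ACCOUNT = ["*", "user"]           # registered, but not in 'human'
REAL_ACCOUNT = ["*", "user", "human"]  # added to 'human' by hand

for label, groups in [("anonymous", ANONYMOUS),
                      ("spam account", SPAM_ACCOUNT),
                      ("real account", REAL_ACCOUNT)]:
    print(label,
          "| first:", allowed(FIRST, groups, "edit"),
          "| second:", allowed(SECOND, groups, "edit"))
```

Under the first configuration an existing spam account can still edit, because it is in 'user'. Under the second, only accounts that were added to 'human' by hand get through.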

Bye-bye spam. Of course, I still had to revert about 30 edits...:-/

Image credit: Spam wall by freezelight

Friday 2 November 2007

ANN: cclib 0.8 released - parsers and algorithms for comp chem

On behalf of the cclib development team, I am pleased to announce that cclib 0.8 is now available for download.

cclib is an open source library, written in Python, for parsing and interpreting the results of computational chemistry packages. It currently parses output files from ADF, GAMESS (US), GAMESS-UK, Gaussian, Jaguar, Molpro and PC GAMESS. A paper is currently in press.

The main changes in this release are:
* the addition of a Molpro parser
* support for writing data to, and reading it from, files as JSON
* API additions: charge, multiplicity, Natural Orbital coefficients
* New method: Lowdin population analysis
* use of Numpy instead of Numeric

Among other data, cclib extracts:
* coordinates and energies
* information about geometry optimization
* atomic orbital information
* molecular orbital information
* information on vibrational modes
* the results of a TD-DFT calculation

cclib also provides some calculation methods for interpreting the electronic properties of molecules using analyses such as:
* Mulliken and Lowdin population analyses
* overlap population analysis
* calculation of Mayer's bond orders

For more information see our website, read the tutorial, or send us an email.