Noel O'Blog

Wednesday, 22 July 2009

Services built around open source software

Computer-aided chemistry is used today by all the major high-technology companies that are active in chemistry. Just like the meteorologist uses computers to forecast the weather, computers can be used to simulate and predict properties of molecules. This approach is documented to give companies and scientists a high return on investment. But few companies have the resources and skills to make it a reality. The cost of hardware, software, and specialized scientists makes this approach unattainable to most. hBar Lab addresses this problem by putting the required technology online. With hBar Lab there is:
No need for expensive hardware
No upfront payment for software
User-friendly interface makes it accessible for everyone, no specialized scientist necessary.

Source: hBar Lab - Computer-aided Chemistry On Demand

Support and consulting have always been ways of deriving income from open source software, but the web introduces new possibilities centered around web services. I have recently become aware of hBar Lab, whose web application is built entirely on open source software (MPQC, OpenBabel, Jmol) and who perform on-demand calculation of molecular properties:

The user login, select the property, e.g. ionization energy or geometry, and the molecule of interest, and then submit the query. The required calculations are seamlessly executed on computers in the background and once the calculations are done, the results will be returned in the user's inbox. It is as simple as that.

An interesting idea.

TwirlyMol - Status update re world domination

TwirlyMol was the world's first Javascript molecular viewer with shadows. It has been described as "and of course the shadows are cool" by Felix of Chemical Quantum Images.

Although TwirlyMol was only released into the wild to fend for itself in January, it has swiftly outpaced Chime and is rapidly approaching Jmol-like levels of deployment.

Well, almost. At least one ~~other~~ person is using it anyway. As part of a chemistry education project at the University of Wisconsin, TwirlyMol is being used on the ChemPrime wiki and on a student education portal, both of which look like two interesting resources under development. However, you should be warned - the TwirlyMol shadows have been removed!

TwirlyMol is freely available under a do-what-you-want-with-it license. You can even (*sob*) remove the shadows.

Wednesday, 15 July 2009

ANN: Symposium on Visual Analysis of Chemical Data (ACS Spring 2010)

Update 06/Sept/09: See second call for papers.

First Call for Papers:
Visual Analysis of Chemical Data
239th ACS National Meeting
San Francisco, March 21-25, 2010
CINF Division

Dear Colleagues,

We wish to announce an upcoming symposium focusing on innovative methods for visual representation and analysis of chemical data. Just as Edward Tufte has championed maximizing clarity and information content in statistical graphics, there is a need for methods to display chemical information that will maximize understanding, and allow rapid analysis and decision making.

We invite you to submit contributions that address various aspects of visualization of chemical data (such as structures, SAR data, literature, patents) including, but not limited to, the following topics:

With an ever increasing pool of descriptors, along with new and more sophisticated machine learning methods, QSAR models are becoming more difficult to interpret. How can information on model reliability, the presence of activity cliffs, and the range of applicability of a model and other relevant model properties be easily depicted?
Recently, 3D virtual worlds such as Second Life have presented new opportunities and challenges for the representation of chemical data. What is the potential of such a medium in education and communicating with the chemistry community?
Social software allows for rapid and convenient sharing of chemical data. Examples include Google Spreadsheets, ManyEyes, DabbleDB, and wikis, including Wikipedia. What are the implications for chemical research and education?
The visualization of the contents of large chemical datasets presents particular problems. How can an overview of the dataset be visualized so that it presents both the nature of the contents as well as the degree of diversity and similarity within the dataset? How can different datasets be visually compared?
Depicting 3D chemical information in 2D involves a loss of information. However, innovative 2D visualization methods can restore the most relevant information.
Chemical information comprises a diverse array of data types including chemical structures and diagrams (2D and 3D), associated assay results, conformations, QSAR models and their predictions. The visualization and integration of all these data into a single interface that aids interpretation and analysis is a continuing challenge.

We would also like to point out that sponsorship opportunities are available.

The on-line abstract submission system (PACS) will be open for submissions from 24th August. A second announcement will be made at that time.

Please contact Andrew, Jean-Claude or myself if you have any questions.

Yours sincerely,
Noel O'Boyle

On behalf of the symposium organizers:

Dr. Jean-Claude Bradley,
Drexel University, PA
bradlejc@drexel.edu

Dr. Andrew Lang,
Oral Roberts University, OK
alang@oru.edu

Dr. Noel O’Boyle,
Cambridge Crystallographic Data Centre, U.K.
oboyle@ccdc.cam.ac.uk

Image credit: prehensile

Tuesday, 7 July 2009

Sledgehammer, meet nut - Using Eclipse for Python

I usually use gvim or IDLE to edit Python files, but today I thought I'd try something a bit more heavyweight: Eclipse. Eclipse is widely used in the Java world. It's open source and freely available, and most importantly there is a Python plugin for Eclipse called PyDev.

So what does Eclipse have that IDLE doesn't? Well, integration with the Python debugger for a start. Also, the code completion is quite handy.

It also has nice integration with PyLint, which catches various errors (e.g. misspelled variables) before you run a script.

Here are some notes:

I followed these installation instructions and then sped through the manual.
PyDev currently supports Eclipse 3.2 to 3.4. It took a while to find an Eclipse download page with version 3.4 but here it is. I installed Eclipse SDK 3.4.2.
Start Eclipse, and click on Help/Software Updates. Add http://pydev.sourceforge.net/updates/ to the list of update sites. Tick the box and click Install to install PyDev.
Following the details at http://www.fabioz.com/pydev/manual_101_interpreter.html, I added a Python interpreter (Name="Python 2.5", Executable="C:\Python25\python.exe").
Installing pylint on Windows is a pain, so I used easy_install:
```
C:\Python25\Scripts\easy_install.exe pylint
```
In the PyLint configuration, you need to specify the location of lint.py. Mine was at C:\Python25\Lib\site-packages\pylint-0.18.0-py2.5.egg\pylint\lint.py.
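
As an illustration of the kind of mistake PyLint catches, here is a minimal sketch. The function and variable names are made up for this example. The typo sits in a branch that rarely runs, so the script appears to work until someone passes an empty list; a static check should report the name as an undefined variable without running anything.

```python
def mean_energy(energies):
    """Return the mean of a list of energies, or None for an empty list."""
    if not energies:
        # Typo: 'energeis' is never defined. Python only notices when this
        # branch actually executes (NameError); a linter sees it up front.
        print("No energies found in", energeis)
        return None
    return sum(energies) / len(energies)


# The common path works, so the bug can go unnoticed for a long time.
print(mean_energy([1.0, 2.0, 3.0]))
```

Calling `mean_energy([])` raises a `NameError` at runtime, which is exactly the situation the PyLint pane is meant to flag before the script is run.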

Monday, 29 June 2009

I'll fix the bug...but only if you give me a public domain test file

Recently, Avogadro/OpenBabel have been increasing their support for computational chemistry log files. I am hoping that they will learn from our experience at GaussSum/cclib.

GaussSum was the first Python program I ever wrote, and still bears the hallmarks. When I first started GaussSum (a program which analyses the results of comp chem calculations), I would use the test cases from users to fix bugs. Then over time, I'd lose the test cases as I moved from computer to computer. I couldn't place the test cases in my version control system as the test cases might have been the results of someone's research, and they mightn't be happy to see them publicly available.

Things came to a head when dealing with the parsing of vibrational frequencies in the various versions of GAMESS. It turned out that each version of GAMESS (PC-GAMESS, WinGAMESS and GAMESS US) had slightly different output for vibrational frequencies. I ended up bouncing between code that worked for WinGAMESS but not GAMESS and vice versa, depending on who sent me the last bug report. In other words, I was wasting my time fixing bugs which might reappear later. It was around this time that I realised (a) I needed a test suite, and (b) I needed public domain test files, so I could use them in my test suite.

The parser used by GaussSum is now available as a separate project, cclib, and is developed in collaboration with Adam Tenderholt and Karol Langner. This time I put a lot of thought into the test suite, and I think we've done very well. The parsers are initially developed using a set of calculations which are the same for each comp chem package; our test suite ensures that the same results are found in each case and that the units are consistent. We only fix bugs for which a public domain test file is provided ("I place this file in the public domain" is all we need to hear), and regression tests are easily added to the test suite. Our test suite has the final say on commits; commits are reverted if they cause an existing test to fail. This guarantees that cclib can only improve over time.
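
The idea of running the same checks against every package's parser can be sketched roughly as follows. This is only an illustration: the two log formats, the parser functions and the frequency values are invented for the example, and none of it is cclib's code or real GAMESS output.

```python
import re
import unittest

# Two invented log excerpts reporting the same calculation in different layouts.
LOG_A = """\
 FREQUENCY:   1648.32   3832.10   3943.55
"""
LOG_B = """\
 Mode 1 frequency = 1648.32 cm**-1
 Mode 2 frequency = 3832.10 cm**-1
 Mode 3 frequency = 3943.55 cm**-1
"""

# Values every parser must return for this calculation (all in cm^-1).
EXPECTED = [1648.32, 3832.10, 3943.55]


def parse_format_a(text):
    """Hypothetical parser: all frequencies listed on one 'FREQUENCY:' line."""
    freqs = []
    for line in text.splitlines():
        if line.strip().startswith("FREQUENCY:"):
            freqs.extend(float(x) for x in line.split()[1:])
    return freqs


def parse_format_b(text):
    """Hypothetical parser: one 'frequency = value' line per mode."""
    return [float(m) for m in re.findall(r"frequency\s*=\s*([-+]?\d+\.\d+)", text)]


class VibFreqRegressionTest(unittest.TestCase):
    # Each new public domain test file becomes another (parser, log) pair here.
    CASES = [(parse_format_a, LOG_A), (parse_format_b, LOG_B)]

    def test_all_parsers_agree(self):
        for parser, log in self.CASES:
            self.assertEqual(parser(log), EXPECTED)


# Run the suite without exiting the interpreter, so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(VibFreqRegressionTest)
result = unittest.TextTestRunner().run(suite)
```

A fix for one format that breaks another now shows up as a failing test straight away, instead of as a bug report months later.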

The inevitable consequence of this policy is that some reported bugs don't get fixed. Sometimes the reporter simply does not respond to the query to place it in the public domain. On two occasions, the reporter was working in a pharmaceutical company and felt it was more hassle than it was worth to do the necessary paperwork to place it in the public domain. So it goes... On the other hand, we do now have a set of more than 200 comp chem log files which go a long way to ensuring that our parsers can handle anything that is thrown at them. The best way of getting these files is to check the data directory of cclib out of subversion and run wget.sh.

In conclusion, if you are thinking of writing software that handles comp chem files, either try to collaborate with others who are working on the same problem (e.g. cclib or OpenBabel), or at the very least take into account some of the comments here. Otherwise, you are simply building a house of cards.

Friday, 19 June 2009

Using PyActiveResource to access ChemCaster

ChemCaster, from Rich's Metamolecular, is a platform for developing web-based cheminformatics applications. The advantage of such a system is that the user does not need to install any special software, nor does the application developer need to maintain a server.

Rich invited me to take it for a spin, so I signed up for a trial account and moved quickly on to my first problem: how do I access the API through Python?

It turns out that RESTful APIs tend to have common patterns, a fact which is taken advantage of by Active Resource, a Ruby library for defining classes which directly map onto the objects implied by a RESTful API. Or something like that - I neglected to read any documentation. Instead I just took Rich's example and tried to code it up in Python using PyActiveResource (this is a documentation-free project so using it is quite exciting).

Et voilà

Tuesday, 9 June 2009

From zero to Zotero - One man's journey out of PDF hell

Zotero is a reference management software. Sorry, let me correct that - Zotero is THE reference management software. I had tried Zotero before, and it certainly looked good; but frankly I couldn't figure out how to get it to work and so reverted to my usual system, the 'zero' of the title. Hearing the news that EndNote vs. Zotero was just thrown out of court, I decided to try it again.

And it's just amazing.

Let me begin by describing a typical workflow:
(1) Go to the summary page for an ACS paper online
(2) Click on the icon that appears in the address bar (looks like a sheet of paper with writing).

That's it. You've just saved the PDF, the HTML full-text and the paper's metadata.

If you've created an account on zotero.org (free of course!), you can synch your library so that multiple computers can share the same data. And best of all you can also synch the attachments (i.e. PDFs, HTML pages) if you have a WebDAV account (e.g. from your university or in my case, JungleDisk Plus/Amazon S3). If that wasn't enough, it also integrates with Word to make it easy to prepare a publication (~~though I haven't tested this~~ Update: it works just fine, but you first need to install the bibliographic styles you need from Zotero settings/Preferences/Styles/Get additional styles).

In other words, Zotero makes it easy to download papers, back them up, make them accessible from any computer and reference them in papers.

Zotero is open source and freely available from www.zotero.org.

Notes: I'm using Zotero 2.0b5. In the Zotero preferences (click on the gear icon), choose "Automatically attach PDFs and other files when saving items" in the General Tab. JungleDisk and Amazon cost money (we're talking around $1.50 a month), but there may be free alternatives for WebDAV. For any websites that aren't currently supported by Zotero, adding new translators has been made easy. All of the JavaScript files for the translators are stored in a folder on your computer and can easily be extended or added to. That said, I've had no trouble downloading PDFs from Sciencedirect, ACS, RSC, Wiley or BMC.

Image credit: jazzmodeus