Thursday, 16 June 2011

Using Zotero for Chemistry

Zotero keeps improving, and I was thinking it was time I started using it for my own papers. But how well do the translators work for Chemistry journals?

For each of several Chemistry publishers, I took a paper from a current issue of one of their journals and tested whether Zotero could extract the following from the paper's abstract page: the correct metadata, the abstract, the journal abbreviation, the PDF, and the full-text HTML.

The following results are ranked by how well the translator works, starting with the best:
  • ACS Journals - Missing journal abbreviation.
  • BMC Journals - Misses author initials, doesn't recognise J Cheminf, missing journal abbreviation.
  • Elsevier (J Mol Struct THEOCHEM) - Metadata has slightly wrong DOI, markup included in abstract, missing journal abbreviation.
  • Wiley (J Comput Chem) - Only metadata (no PDF or full-text HTML).
  • Springer (JCAMD) - Only metadata (no PDF or full-text HTML).
  • Oxford (Nucleic Acids Res) - Only metadata (no PDF or full-text HTML).
  • RSC (Chem Comm) - Only metadata, and even that is missing the page numbers (RSC must not be providing them to CrossRef).
I'm going to have a go at improving these by-and-by (keep an eye on my bitbucket account), but feel free to sort them out yourself if you want (leave a comment below if you do).

I was initially hesitant to do this as there was no test framework in place for translators, and there didn't seem to be much point in writing a translator that might break at any point without anyone knowing. But Avram Lyon is currently adding support for a test framework to Scaffold (the tool you use for writing Zotero translators), and so this should soon be available.

That's not normal for a correlation - Pearson vs Spearman

Let me present some random data, 100 data points with x and y values chosen from a uniform distribution between 0.8 and 0.9:
Correlation coefficients: -0.074 (Pearson), -0.073 (Spearman)

Now let's add a single data point at (0.1, 0.1):
Correlation coefficients: 0.866 (Pearson), -0.042 (Spearman)

Here's another interesting situation, with two random clusters of 50 data points each, the x and y values of the first chosen from the interval (0.0, 0.5) and those of the second from (0.5, 1.0):
Correlation coefficients: 0.668 (Pearson), 0.711 (Spearman)

What happens when we contract the areas though?
Correlation coefficients: 0.995 (Pearson), 0.719 (Spearman)
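
If you want to play with this yourself, here's a minimal sketch in plain Python of both experiments. Pearson is computed directly from its definition and Spearman as the Pearson correlation of the ranks. The helper names (`cluster`, `two_clusters`), the random seed and the amount of contraction (a half-width of 0.03 around the cluster centres) are my assumptions for illustration, so the numbers printed will not match the ones quoted above, though the pattern should be the same.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def cluster(rng, n, lo, hi):
    """Hypothetical helper: n points with x and y both uniform on (lo, hi)."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [rng.uniform(lo, hi) for _ in range(n)]
    return xs, ys


def two_clusters(rng, half_width):
    """Hypothetical helper: 50 points around (0.25, 0.25), 50 around (0.75, 0.75)."""
    x1, y1 = cluster(rng, 50, 0.25 - half_width, 0.25 + half_width)
    x2, y2 = cluster(rng, 50, 0.75 - half_width, 0.75 + half_width)
    return x1 + x2, y1 + y2


rng = random.Random(2011)  # arbitrary seed, only for repeatability

# Experiment 1: 100 uncorrelated points, then the same points plus one outlier.
xs, ys = cluster(rng, 100, 0.8, 0.9)
xs_out, ys_out = xs + [0.1], ys + [0.1]
print("random:      ", pearson(xs, ys), spearman(xs, ys))
print("with outlier:", pearson(xs_out, ys_out), spearman(xs_out, ys_out))

# Experiment 2: two clusters, first filling (0.0, 0.5) and (0.5, 1.0),
# then contracted around their centres (0.03 is an assumed value).
wide_x, wide_y = two_clusters(rng, 0.25)
tight_x, tight_y = two_clusters(rng, 0.03)
print("wide clusters: ", pearson(wide_x, wide_y), spearman(wide_x, wide_y))
print("tight clusters:", pearson(tight_x, tight_y), spearman(tight_x, tight_y))
```

The outlier barely moves Spearman because it only ever contributes one rank out of 101, however far away it sits. Likewise, contracting the clusters leaves the rank order between the clusters unchanged, so Spearman stays put while Pearson heads towards 1.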

Tuesday, 7 June 2011

A bunch of stuff: Chemfps, cinfcasts, spreadchems, and MIOSS talks

Two of Andrew Dalke's projects have finally come out of stealth mode: chem-fingerprints (announcement and project page) and the world's first cheminformatics podcast, Molecular Coding.

Rich Apodaca has also been busy adding chemistry to spreadsheets, this time to Google Spreadsheets. Check out his video which sums up the whole thing.

And all of the talks from MIOSS are now available.

Thursday, 2 June 2011

Molecular zooming with Open Babel SVG

How cool is SVG? Way, I say. Combine it with some sweet sweet JavaScript (indeed, is there any other kind?) and you can visualise large numbers of molecules with a low memory footprint (vectors don't use much memory), even in the tiny 600px margin of this blog post (scroll the mouse wheel to zoom, drag to move around):


The above shows a depiction of the first 100 molecules in PubChem, as drawn by Open Babel:
obabel PubChem3D.sdf -l100 -d -xC -O mols_100.svg

Want to see 1000 molecules?

Notes:
1. If you don't see anything above, you may want to upgrade your browser.
2. Open Babel is not the only Open Source chemistry software with SVG support: there's also the CDK, Indigo, OASA/BKChem and probably others.
3. In the Open Babel GUI, if you tick the box "Display in firefox" it shows these depictions when you convert.
4. One nice thing about SVG is that it can be styled with CSS, so that you could have a button on a website that allows you to instantly change the colours of the bonds or labels, or the line-width. I don't think that the Open Babel SVG is set up for these shenanigans at the moment, but it could easily be done.
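
As a rough illustration of that last note, here's a sketch of how a stylesheet could be injected into an SVG file using only Python's standard library. The sample SVG is a hand-written stand-in, not actual Open Babel output, and `add_css` is a hypothetical helper. Rules in a stylesheet take precedence over presentation attributes such as stroke="black", but not over inline style="..." declarations, so whether this works on a given file depends on how that file was written.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def add_css(svg_text, css):
    """Hypothetical helper: return svg_text with a <style> element prepended."""
    # Serialise SVG elements without a namespace prefix.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    style = ET.Element(f"{{{SVG_NS}}}style")
    style.set("type", "text/css")
    style.text = css
    root.insert(0, style)
    return ET.tostring(root, encoding="unicode")


# Hand-written stand-in for a depiction: one bond and one atom label.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<line x1="10" y1="10" x2="90" y2="90" stroke="black" stroke-width="1"/>'
    '<text x="10" y="95" fill="black">O</text>'
    "</svg>"
)

styled = add_css(
    sample,
    "line { stroke: steelblue; stroke-width: 3; } text { fill: crimson; }",
)
print(styled)
```

On a web page the same effect could be had by swapping the contents of the style element from JavaScript when the button is clicked.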