Tuesday, 18 December 2012

My year in review - Reviewing reviewing

Like most scientists, from time to time I get asked to review papers, and a few years ago I decided to keep track of the reviews I did for various journals. In 2010, 2011, and 2012 I was asked to review 7, 6 and 7 times respectively.

I've been trying to figure out whether this is a reasonable level of reviewing. I guess the question is, am I doing more reviewing for the chemistry community than they are doing for me? Not that I mind those freeloading slackers dumping it all on me - I'm just curious.

So what's the other side of the equation? In the same three years I've had my name attached to 8 peer-reviewed publications. If each was reviewed 2.5 times (a reasonable guesstimate), that's 20 reviews, the same as the 7 + 6 + 7 = 20 reviews I was asked to do.

And so the balance is maintained.

Note: I know what you're thinking...(eerie isn't it!)...you want to adjust the figures for multiple authors. But both of the values would have to be adjusted in the same way and so it just cancels out. (I think.)

Update (02/01/2013): The previous note is a load of rubbish. As Felix points out in the comments, I should be correcting for multiple authors. Oh well...
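
As a rough sketch of the arithmetic, and of why the multiple-author correction does not cancel out: a review I write counts in full, whereas the reviews a paper receives can be thought of as shared among its co-authors. The function name, the even split among authors, and the figure of 4 authors per paper are all assumptions made for illustration; none of them come from the post.

```python
def review_balance(reviews_done, papers, reviews_per_paper, authors_per_paper=1.0):
    """Return (my_share_of_reviews_received, balance).

    reviews_done is not divided by anything, since each review is written
    by one person. The reviews received are split evenly among co-authors,
    which is an assumption of this sketch.
    """
    reviews_received = papers * reviews_per_paper
    my_share = reviews_received / float(authors_per_paper)
    return my_share, reviews_done - my_share


reviews_done = 7 + 6 + 7  # 2010, 2011 and 2012

# Figures from the post, with no correction for co-authors
print(review_balance(reviews_done, papers=8, reviews_per_paper=2.5))

# Hypothetical: an average of 4 authors per paper (not a figure from the post)
print(review_balance(reviews_done, papers=8, reviews_per_paper=2.5, authors_per_paper=4))
```

With a single author the balance is zero; with any larger author count the share of reviews received shrinks while the reviews done stay the same, so the balance goes positive.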

Image credit: After the Edit by Laura Ritchie (LMRitchie on Flickr)

Tuesday, 11 December 2012

Cinfony 1.2 released

Cinfony presents a common API to several cheminformatics toolkits. It uses the Python programming language, and builds on top of Open Babel, the RDKit, the CDK, Indigo, OPSIN, JChem and cheminformatics webservices.

Cinfony 1.2 is now available for download.

The two major additions in this release are support for the JChem commercial cheminformatics toolkit, and the ability to specify options (via an 'opt' dictionary) for format conversion and some other operations. There were also some under-the-hood changes to consolidate source files for ease of maintenance.

These additions were principally contributed by Adrià Cereto Massagué (of Universitat Rovira i Virgili and DecoyFinder) who joined the Cinfony team (i.e. me) earlier this year.

As usual, Cinfony has been updated to use the latest stable releases of each toolkit: Open Babel 2.3.2, CDK 1.4.15, RDKit 2012.09, Indigo 1.1, JChem 5.11 and OPSIN 1.3.

The Cinfony website has a somewhat condensed example showing the use of many of these resources in a dozen lines of Python. Here's a smaller example showing part of the new functionality:
>>> from cinfony import pybel, jchem
>>> mol = pybel.readstring("smi", "CC(=O)Cl")
>>> print mol.write("smi", opt={"f": 2, "l": 1}) # Make atom 2 the first and atom 1 the last
C(=O)(Cl)C
>>> fp = jchem.Molecule(mol).calcfp("ECFP")
>>> fp.bits
[39, 47, 55, 246, 397, 429, 700, 908]

To support Cinfony, please cite:
N.M. O'Boyle, G.R. Hutchison, Chem. Cent. J., 2008, 2, 24. [link]

Tuesday, 4 December 2012

Intro to Open Babel

Recently I was asked (for the first time!) to provide introductory training for Open Babel. Here are the slides I put together:
If you're interested in this, you should follow up with the hands-on tutorial in the docs which I've mentioned previously. For further cheminformatics teaching material, see this post.

Tuesday, 20 November 2012

What's taking so long? - Profiling Open Babel

Profiling code shows where all the runtime is spent. In the case of Open Babel, profiling is a bit awkward due to its use of dynamically-loaded libraries (the format plugins, etc.). So here's how you do it on Linux...
[openbabel/build]$ rm CMakeCache.txt 
[openbabel/build]$ CXXFLAGS="-pg" LDFLAGS="-pg" cmake ../trunk -DCMAKE_INSTALL_PREFIX=../tree -DCMAKE_BUILD_TYPE=DEBUG -DBUILD_SHARED=OFF
[openbabel/build]$ make
This should successfully compile the library and plugins with profiling information, but will fail when it comes to linking one of the executables:
[openbabel/build]$ make obabel
[100%] Built target openbabel
Scanning dependencies of target obabel
[100%] Building CXX object tools/CMakeFiles/obabel.dir/obabel.o
Linking CXX executable ../bin/obabel
/usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in `/usr/lib/../lib64/libc.a(strcmp.o)' can not be used when making an executable; recompile with -fPIE and relink with -pie
collect2: error: ld returned 1 exit status
make[3]: *** [bin/obabel] Error 1
make[2]: *** [tools/CMakeFiles/obabel.dir/all] Error 2
make[1]: *** [tools/CMakeFiles/obabel.dir/rule] Error 2
make: *** [obabel] Error 2
Yikes! Using VERBOSE=1 we can see the offending command:
[openbabel/build]$ VERBOSE=1 make obabel 
cd /home/noel/Tools/openbabel/profile/build/tools && /usr/local/bin/cmake -E cmake_link_script CMakeFiles/obabel.dir/link.txt --verbose=1
/usr/local/bin/c++   -static -pg  -g -g3 -fno-inline   -pg CMakeFiles/obabel.dir/obabel.o  -o ../bin/obabel -rdynamic ../src/libopenbabel.a -Wl,-Bstatic -lpthread -Wl,-Bdynamic -lm -lz -Wl,-Bstatic 
/usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in `/usr/lib/../lib64/libc.a(strcmp.o)' can not be used when making an executable; recompile with -fPIE and relink with -pie
collect2: error: ld returned 1 exit status
With Roger's help, I was able to change this to something simpler which will compile:
[build/tools]$ /usr/local/bin/c++ -pg  -g -g3 -fno-inline   -pg CMakeFiles/obabel.dir/obabel.o  -o ../bin/obabel ../src/libopenbabel.a -lpthread -lm -lz
Success is mine AT LAST!! Now let's profile it:
[build/bin]$ export BABEL_DATADIR=wherever
[build/bin]$ ./obabel bigfile.smi -onul
[build/bin]$ gprof ./obabel > gprof.out
Now time to read the gprof manual.
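
As a starting point for digging through gprof.out, here is a minimal sketch that pulls the flat-profile table out of gprof's text output. It assumes the usual flat-profile layout (a header line running from "time" to "name", one row per function, a blank line at the end of the table). The function name parse_flat_profile and the sample rows are invented for illustration; they are not real Open Babel timings.

```python
def parse_flat_profile(text):
    """Parse the flat-profile table from gprof's text output.

    Returns a list of (percent_time, self_seconds, name) tuples in the
    order gprof printed them.
    """
    rows = []
    in_table = False
    for line in text.splitlines():
        tokens = line.split()
        if not in_table:
            # The second header line starts with "time" and ends with "name".
            if tokens and tokens[0] == "time" and tokens[-1] == "name":
                in_table = True
            continue
        if not tokens:
            break  # a blank line ends the table
        percent, _cumulative, self_seconds = (float(t) for t in tokens[:3])
        rest = tokens[3:]
        # Rows with a call count carry three more numeric columns
        # (calls, self per call, total per call) before the name.
        if len(rest) > 3 and rest[0].isdigit():
            rest = rest[3:]
        rows.append((percent, self_seconds, " ".join(rest)))
    return rows


# Invented sample in the shape of a gprof flat profile (not real data)
SAMPLE = """\
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 60.00      0.06     0.06     1200     0.05     0.07  parse_smiles(std::string const&)
 30.00      0.09     0.03                             perceive_rings()
 10.00      0.10     0.01      400     0.03     0.03  main
"""

for percent, seconds, name in parse_flat_profile(SAMPLE):
    print("%6.2f%%  %6.2fs  %s" % (percent, seconds, name))
```

To run it on the real thing, replace SAMPLE with the contents of gprof.out, e.g. open("gprof.out").read().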

We can design molecular wires for you wholesale Part II

As described previously, last year I published a paper in J. Phys. Chem. C with Geoff Hutchison on Computational Design and Selection of Optimal Organic Photovoltaic Materials.

A week or two ago someone emailed me for a copy of the paper. Fortunately, after a one year embargo, if you have an open access mandate you can request permission from the editor of an ACS journal to deposit the PDF in an institutional repository.

I went through this process a little while ago, and I am pleased to say that a copy of the PDF can now be found in University College Cork's institutional repository at http://hdl.handle.net/10468/748.

I'm not sure how people are going to find this though - it doesn't seem to be very prominent in Google. In particular, the PDF itself is not indexed (try a Google search with filetype:pdf).

Friday, 9 November 2012

Tricks with SMILES and SMARTS Part II

A much-underused feature of SMILES is the ability to apply 'atom classes' to atoms using a colon (inside square brackets). So, for example, CC and C[CH3:6] both represent ethane, but in the latter case one of the carbons is labelled as being a member of atom class 6.

So what's the meaning of atom class 6? Well, it's whatever you want - it's simply a label that you use to indicate some related information. For example, you might want to record reaction locations, or locations of common substitutions, or mappings between different molecules (reactant/product, or sub/superstructures).

Anyhoo, here's how you access the atom class information in Open Babel from Python:
>>> import pybel
>>> ob = pybel.ob
>>> mol = pybel.readstring("smi", "C[CH3:6]")
>>> print mol.write("smi")
CC

>>> print mol.write("smi", opt={"a":True})
C[CH3:6]

>>> data = ob.toAtomClassData(mol.OBMol.GetData("Atom Class"))
>>> data.HasClass(1)
False
>>> data.GetClassString(1)
''
>>> data.HasClass(2)
True
>>> data.GetClassString(2)
':6'
>>>
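
To make the notation itself concrete without needing Open Babel installed, here is a rough sketch that picks the atom class out of each bracket atom using a regular expression. It only illustrates the colon syntax; it is not a SMILES parser and does no validation. The helper name atom_classes is made up for this example.

```python
import re

# Any bracket atom, e.g. [CH3:6] or [OH]
BRACKET_ATOM = re.compile(r"\[[^\[\]]*\]")
# The atom class is a colon followed by digits just before the closing bracket
ATOM_CLASS = re.compile(r":(\d+)\]$")


def atom_classes(smiles):
    """Return (bracket_atom_text, atom_class) pairs for each bracket atom.

    atom_class is an int, or None if the bracket atom has no class.
    Atoms written without brackets (such as a bare C) cannot carry a
    class and are skipped.
    """
    result = []
    for match in BRACKET_ATOM.finditer(smiles):
        text = match.group(0)
        found = ATOM_CLASS.search(text)
        result.append((text, int(found.group(1)) if found else None))
    return result


print(atom_classes("C[CH3:6]"))
print(atom_classes("[CH3:1][OH]"))
```

For anything beyond a quick look at a string, the toolkit route above is the safer choice, since it works on the parsed molecule rather than on the text.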

Thursday, 8 November 2012

Plotting accesses on the axis Part III

Following on from Parts I and II, this is the last in a series of posts exploring how journal access statistics provided by Open Access journals such as Journal of Cheminformatics give an insight into relative impact of papers.

It's now over a year since the Blue Obelisk and Open Babel papers were published so let's look at the accesses over that period: ...close to straight lines with a defined slope.

And here's an update of the view over the first month, this time including the Universal SMILES paper: ...it seems like the accesses to this paper mirror almost exactly those of the Blue Obelisk paper.

In conclusion, it would be nice if journals provided these sorts of graphs, or if some third-party website (e.g. one of the altmetrics ones) did it. All of the data is on the website; it just needs to be collated as I've done here.