Friday 25 November 2011

Your turn - Poll up and answer

It's been a while, so here's a poll (see the sidebar on the left).

In which decade was the following statement made? Guess before you google it, and extra marks if you know who the author was also.
With the ever-increasing number of publications, coupled with higher printing costs, there is great pressure brought on journal editors to keep manuscripts as short as possible. While this is quite understandable, it has, in my opinion, lead to a very serious problem. The vast majority of (hopefully) good analytical data, such as spectroscopic, kinetic and thermodynamic measurements, is never readily made available to the scientific community. Published data are often so "compressed" that one is unable to examine alternative interpretations, as the published data are not sufficient. Partial data are preferred to complete data...

Why doesn't [______] make it a policy to require authors to submit full data on spectroscopic and other data for which there are existing data centres? Furthermore, I propose that the editors of this journal and other such journals establish criteria for collecting relevant data for which no data center exists today in order to prepare for the future. Perhaps it is time for a conference of journal editors to meet and propose a solution to this problem. Unless something real and practical is done in the near future, it will become impossible to find or use scientific data with the resulting loss of time and money for those who need to repeat experiments.
(1) I've mixed up the spelling of center/re to protect the innocent.
(2) Poll closes in 7 days.
(3) Please - no spoilers in the comments.

Friday 18 November 2011

Displaying pharmacophores from Pharao in Avogadro

Following on from the equivalent post for Pharmer, here is an extended Avogadro plugin that works for both Pharmer and Pharao generated pharmacophores.

The plugin allows the user to click through all of the pharmacophores defined in a file and display them one by one. See install instructions in the previous post. The code is kinda rough-and-ready so be prepared to hack around if you need to sort something out:

Wednesday 16 November 2011

Plotting accesses on the axis

It's quite interesting to monitor the page accesses for publications in J. Cheminf. (which incidentally has just gotten a much improved interface). While page accesses are not citations, they do provide a measure of general interest in a paper. Also, if you reach a certain level (relative to other publications in the journal), your paper gets the "Highly Accessed" badge, which helps to highlight it. PMR has been talking about this too.

I monitored the page accesses in the first month for two recent papers, the Blue Obelisk paper and the Open Babel paper. I missed a day or two here or there, but the results are shown in this Google spreadsheet graph:
It is worth noting that about half of the accesses in the month occur in the first week. At that stage only the preliminary manuscript is available (I think it was two weeks later that the final HTML and PDF was produced). Furthermore, the DOI was not registered until the HTML went live and thus you need provide the direct URL rather than DOI if promoting the paper (I've complained about this to the journal).

Thursday 3 November 2011

My new book, made of 100% recycled papers

There are many advantages associated with publishing with an Open Access journal...but I'm not going to go into these now (see recent posts on PMR's blog [e.g. this one] for a comprehensive background if the advantages are not self-evident). Here the point I want to make is that the authors of OA papers can do things with their papers which are not allowed by traditional publishers.

As a trivial example, authors of OA publications can legally distribute copies of their papers - sort of handy if you're a scientist, eh? :-) With most Open Access publishers, the author does not transfer copyright to the publisher but rather licenses it to them under a Creative Commons License. In other words, the author retains the right to do whatever they want with the paper: they can copy+paste the text into their blog (insert example here if someone can send me one), they can insert the paper as an appendix to another publication (as I did with the Open Babel book), they can magnify it to A0 size and present it as a poster or art installation or whatever... (I've sometimes wondered why Open Access journals don't make more capital out of this difference between themselves and closed journals - for example, they could hand out the top 10 most accessed papers at conferences.)

Well, in the spirit of exercising my right to recycle my OA papers as I like, I've put together a book (well, a PDF at least) consisting of the various PDFs stitched together with a Table of Contents. Here is the result (a 15MB PDF):

If I wanted to, I could upload this to Lulu and allow people to order a copy in the post, all nicely bound with a cover. Maybe another day.
Notes: The PDF was created based on comments on the Debian message board. To avoid confusion over page numbers, I added the short form of the appropriate reference to the footer of all of the papers. The following Python script was used to automate some of the steps:
import os




\\title{Open Access Publications of\\\\Noel O'Boyle}



data = {
    "BO.pdf": [15,
"""Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on

N. M. O'Boyle, R. Guha, E. L. Willighagen, S. E. Adams, J. Alvarsson, J.-C. Bradley, I. V. Filippov, R. M. Hanson, M. D. Hanwell, G. R. Hutchison, C. A. James, N. Jeliazkova, A. S. I. D. Lang, K. M. Langner, D. C. Lonie, D. M. Lowe, J. Pansanel, D. Pavlov, O. Spjuth, C. Steinbeck, A. L. Tenderholt, K. J. Theisen and P. Murray-Rust.
J. Cheminf. 2011, 3, 37."""],
    "GM.pdf": [12,
"""Userscripts for the life sciences

E. L. Willighagen, N. M. O'Boyle, H. Gopalakrishnan, D. Jiao, R. Guha, C. Steinbeck and D. J. Wild.
BMC Bioinformatcs. 2007, 8, 487."""],
    "Cinfony.pdf": [10, """Cinfony - combining Open Source cheminformatics toolkits behind a common interface

N. M. O'Boyle and G. R. Hutchison.
Chem. Cent. J. 2008, 2, 24."""],
    "Confab.pdf": [9, """Confab - Systematic generation of diverse low-energy conformers

N. M. O'Boyle, T. Vandermeersch, C. J. Flynn, A. R. Maguire and G. R. Hutchison.
J. Cheminf. 2011, 3, 8."""],
    "AntColony.pdf": [15, """Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction

N. M. O'Boyle, D. S. Palmer, F. Nigsch and J. B. O. Mitchell.
Chem. Cent. J. 2008, 2, 21."""],
    "DataAnalysis.pdf": [2, """Review of ``Data Analysis with Open Source Tools"

N. M. O'Boyle.
J. Cheminf. 2011, 3, 10."""],
    "MACIE2005.pdf": [2, """MACiE: a database of enzyme reaction mechanisms

G. L. Holliday, G. J. Bartlett, D. E. Almonacid, N. M. O'Boyle, P. Murray-Rust, J. M. Thornton and J. B. O. Mitchell.
Bioinformatics. 2005, 21, 4315-4316."""],
    "MACIE2007.pdf": [6, """MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms

G. L. Holliday, D. E. Almonacid, G. J. Bartlett, N. M. O'Boyle, J. W. Torrance, P. Murray-Rust, J. B. O. Mitchell and J. M. Thornton.
Nucleic Acid Res. 2007, 35, D515-D520."""],
    "OB.pdf": [14, """Open Babel: An open chemical toolbox

N. M. O'Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch and G. R. Hutchison.
J. Cheminf. 2011, 3, 33."""],
    "PyChem.pdf": [2, """PYCHEM: a multivariate analysis package for python

R. M. Jarvis, D. Broadhurst, H. Johnson, N. M. O'Boyle and R. Goodacre.
Bioinformatics. 2006, 22, 2565-2566."""],
    "Pybel.pdf": [7, """Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit

N. M. O'Boyle, C. Morley and G. R. Hutchison.
Chem. Cent. J. 2008, 2, 5."""]

order = [["Cheminformatics toolkits", ['Pybel', 'Cinfony', 'OB']],
         ["Enzyme reaction mechanisms", ['MACIE2005', 'MACIE2007']],
         ["QSAR", ['PyChem', 'AntColony']],
         ["The Rest", ["GM", "Confab", "DataAnalysis", "BO"]]

def formatjournal(text):
    """Format the journal metadata

    >>> formatjournal("Chem. Cent. J. 2008, 2, 5.")
    '\\\\textit{Chem. Cent. J.} \\\\textbf{2008}, \\\\textit{2}, 5.'
    broken = text.split(" ")
    pages = broken[-1][:-1]
    year = broken[-3][:-1]
    volume = broken[-2][:-1]
    journal = " ".join(broken[:-3])
    return "\\textit{%s} \\textbf{%s}, \\textit{%s}, %s." % (journal, year, volume, pages)
def test():
    import doctest

if __name__ == "__main__":
    # Sanity checks
    N = 0
    for x, y in data.iteritems():
        assert os.path.isfile(x)
        N += y[0]
    assert len(data) == sum(len(y) for x, y in order)

    # Write latex
    output = []
    for a, b, in order:
        output.append("\\part{%s}{\\Hide" % a)
        for paper in b:
            pages, biblio = data["%s.pdf" % paper]
            broken = biblio.split("\n")
            title = broken[0]
            output.append("\\chapter{%s}" % title)
            authors = broken[2]
            journaldata = formatjournal(broken[3])
    \\newpage}""" % (pages, journaldata))

    with open("generated.tex", "w") as f:
        f.write(latex % "\n".join(output))

    # Join Papers together
    pages = 1
    paper_idx = 1
    names = "A=generated.pdf"
    cat = "A2 A2 A2 A2"
    pages += 4
    for a, b in order:
        cat += " A2 A2"
        pages += 2
        for paper in b:
            # New chapters always on odd-number pages
            if pages % 2 == 0:
                cat += " A2"
                pages += 1
##            print pages
            paper_idx += 1
            papername = chr(64+paper_idx)
            names += " %s=%s.pdf" % (papername, paper)
            cat += " %s" % papername
            pages += data["%s.pdf" % paper][0]
    print >> open("run.bat", "w"), "pdftk %s cat %s output combined.pdf" % (names, cat)

    print """Now run...
pdflatex generated
pdflatex generated
pdftk combined.pdf burst output tmp\\file_%03d.pdf
pdftk generated.pdf burst output tmp\\numbers_%03d.pdf

bash-3.2$ cd tmp && for i in `seq -w 1 109`; do pdftk file_$i.pdf background numbers_$i.pdf output new-$i.pdf; done

pdftk tmp\\new-???.pdf output new.pdf