Thursday, 3 November 2011

My new book, made of 100% recycled papers

There are many advantages to publishing in an Open Access journal... but I'm not going to go into them now (see recent posts on PMR's blog [e.g. this one] for a comprehensive background if the advantages are not self-evident). The point I want to make here is that authors of OA papers can do things with their papers that traditional publishers do not allow.

As a trivial example, authors of OA publications can legally distribute copies of their papers - sort of handy if you're a scientist, eh? :-) With most Open Access publishers, the author does not transfer copyright to the publisher but rather licenses it to them under a Creative Commons License. In other words, the author retains the right to do whatever they want with the paper: they can copy+paste the text into their blog (insert example here if someone can send me one), they can insert the paper as an appendix to another publication (as I did with the Open Babel book), they can magnify it to A0 size and present it as a poster or art installation or whatever... (I've sometimes wondered why Open Access journals don't make more capital out of this difference between themselves and closed journals - for example, they could hand out the top 10 most accessed papers at conferences.)

Well, in the spirit of exercising my right to recycle my OA papers as I like, I've put together a book (well, a PDF at least) consisting of the various PDFs stitched together with a Table of Contents. Here is the result (a 15MB PDF):

If I wanted to, I could upload this to Lulu and allow people to order a copy in the post, all nicely bound with a cover. Maybe another day.

Notes: I created the PDF following comments posted on the Debian message board. To avoid confusion over page numbers, I added the short form of the appropriate reference to the footer of every page of each paper. The following Python script automates some of the steps (it uses pdflatex and pdftk):

import os

latex="""\\documentclass[12pt,a4paper]{book}
\\usepackage{multido}
\\usepackage[hmargin=.8cm,vmargin=0.5cm,nohead,nofoot]{geometry}
\\usepackage[explicit]{titlesec}

\\newcommand*\\Hide{%%
\\titleformat{\\chapter}[display]
  {}{}{0pt}{\\Huge\\thispagestyle{empty}}
\\titleformat{\\part}
  {}{}{0pt}{}
}

\\begin{document}
\\pagestyle{empty}

\\title{Open Access Publications of\\\\Noel O'Boyle}
\\maketitle
\\tableofcontents

%s

\\end{document}
"""

data = {
    "BO.pdf": [15,
"""Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on

N. M. O'Boyle, R. Guha, E. L. Willighagen, S. E. Adams, J. Alvarsson, J.-C. Bradley, I. V. Filippov, R. M. Hanson, M. D. Hanwell, G. R. Hutchison, C. A. James, N. Jeliazkova, A. S. I. D. Lang, K. M. Langner, D. C. Lonie, D. M. Lowe, J. Pansanel, D. Pavlov, O. Spjuth, C. Steinbeck, A. L. Tenderholt, K. J. Theisen and P. Murray-Rust.
J. Cheminf. 2011, 3, 37."""],
    "GM.pdf": [12,
"""Userscripts for the life sciences

E. L. Willighagen, N. M. O'Boyle, H. Gopalakrishnan, D. Jiao, R. Guha, C. Steinbeck and D. J. Wild.
BMC Bioinformatics. 2007, 8, 487."""],
    "Cinfony.pdf": [10, """Cinfony - combining Open Source cheminformatics toolkits behind a common interface

N. M. O'Boyle and G. R. Hutchison.
Chem. Cent. J. 2008, 2, 24."""],
    "Confab.pdf": [9, """Confab - Systematic generation of diverse low-energy conformers

N. M. O'Boyle, T. Vandermeersch, C. J. Flynn, A. R. Maguire and G. R. Hutchison.
J. Cheminf. 2011, 3, 8."""],
    "AntColony.pdf": [15, """Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction

N. M. O'Boyle, D. S. Palmer, F. Nigsch and J. B. O. Mitchell.
Chem. Cent. J. 2008, 2, 21."""],
    "DataAnalysis.pdf": [2, """Review of ``Data Analysis with Open Source Tools"

N. M. O'Boyle.
J. Cheminf. 2011, 3, 10."""],
    "MACIE2005.pdf": [2, """MACiE: a database of enzyme reaction mechanisms

G. L. Holliday, G. J. Bartlett, D. E. Almonacid, N. M. O'Boyle, P. Murray-Rust, J. M. Thornton and J. B. O. Mitchell.
Bioinformatics. 2005, 21, 4315-4316."""],
    "MACIE2007.pdf": [6, """MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms

G. L. Holliday, D. E. Almonacid, G. J. Bartlett, N. M. O'Boyle, J. W. Torrance, P. Murray-Rust, J. B. O. Mitchell and J. M. Thornton.
Nucleic Acids Res. 2007, 35, D515-D520."""],
    "OB.pdf": [14, """Open Babel: An open chemical toolbox

N. M. O'Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch and G. R. Hutchison.
J. Cheminf. 2011, 3, 33."""],
    "PyChem.pdf": [2, """PYCHEM: a multivariate analysis package for python

R. M. Jarvis, D. Broadhurst, H. Johnson, N. M. O'Boyle and R. Goodacre.
Bioinformatics. 2006, 22, 2565-2566."""],
    "Pybel.pdf": [7, """Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit

N. M. O'Boyle, C. Morley and G. R. Hutchison.
Chem. Cent. J. 2008, 2, 5."""]
    }

order = [["Cheminformatics toolkits", ['Pybel', 'Cinfony', 'OB']],
         ["Enzyme reaction mechanisms", ['MACIE2005', 'MACIE2007']],
         ["QSAR", ['PyChem', 'AntColony']],
         ["The Rest", ["GM", "Confab", "DataAnalysis", "BO"]]
         ]

def formatjournal(text):
    """Format the journal metadata

    >>> formatjournal("Chem. Cent. J. 2008, 2, 5.")
    '\\\\textit{Chem. Cent. J.} \\\\textbf{2008}, \\\\textit{2}, 5.'
    """
    broken = text.split(" ")
    pages = broken[-1][:-1]
    year = broken[-3][:-1]
    volume = broken[-2][:-1]
    journal = " ".join(broken[:-3])
    return "\\textit{%s} \\textbf{%s}, \\textit{%s}, %s." % (journal, year, volume, pages)
    
def test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
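    # Sanity checks: every PDF must exist, and every paper must appear in `order`
    N = 0  # total number of pages across all of the papers
    for x, y in data.items():
        assert os.path.isfile(x)
        N += y[0]
    assert len(data) == sum(len(y) for x, y in order)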

    # Write latex
    output = []
    for a, b in order:
        output.append("\\part{%s}{\\Hide" % a)
        for paper in b:
            pages, biblio = data["%s.pdf" % paper]
            broken = biblio.split("\n")
            title = broken[0]
            output.append("\\chapter{%s}" % title)
            authors = broken[2]
            journaldata = formatjournal(broken[3])
            output.append("""\\multido{}{%d}{\\null\\vfill
    %s
    \\newpage}""" % (pages, journaldata))
        output.append("}")

    with open("generated.tex", "w") as f:
        f.write(latex % "\n".join(output))

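    # Join Papers together
    # "A2" is page 2 of generated.pdf (handle A); it is used as a filler page
    # wherever generated.pdf supplies the content: four pages at the front
    # here, and two more for each part below
    pages = 1  # number of the next page to be filled
    paper_idx = 1
    names = "A=generated.pdf"
    cat = "A2 A2 A2 A2"
    pages += 4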
    for a, b in order:
        cat += " A2 A2"
        pages += 2
        for paper in b:
            # New chapters always on odd-number pages
            if pages % 2 == 0:
                cat += " A2"
                pages += 1
##            print pages
            paper_idx += 1
            papername = chr(64+paper_idx)
            names += " %s=%s.pdf" % (papername, paper)
            cat += " %s" % papername
            pages += data["%s.pdf" % paper][0]
    # Write the pdftk command to a batch file, closing the file when done
    with open("run.bat", "w") as f:
        f.write("pdftk %s cat %s output combined.pdf\n" % (names, cat))

    print("""Now run...
pdflatex generated
pdflatex generated
run.bat
pdftk combined.pdf burst output tmp\\file_%03d.pdf
pdftk generated.pdf burst output tmp\\numbers_%03d.pdf

bash-3.2$ cd tmp && for i in `seq -w 1 109`; do pdftk file_$i.pdf background numbers_$i.pdf output new-$i.pdf; done

pdftk tmp\\new-???.pdf cat output new.pdf
""")

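The page bookkeeping in the "Join Papers together" step is easier to follow in isolation, so here is a minimal Python 3 sketch of just that arithmetic. The function name `plan_layout` and its parameters `front_pages` and `part_pages` are made up for illustration and are not part of the script above; the page counts and the ordering are copied from `data` and `order`. It only computes page numbers and does not call pdftk.

```python
# Page counts and grouping copied from `data` and `order` in the script above.
PAGE_COUNTS = {
    "Pybel": 7, "Cinfony": 10, "OB": 14,
    "MACIE2005": 2, "MACIE2007": 6,
    "PyChem": 2, "AntColony": 15,
    "GM": 12, "Confab": 9, "DataAnalysis": 2, "BO": 15,
}
ORDER = [
    ("Cheminformatics toolkits", ["Pybel", "Cinfony", "OB"]),
    ("Enzyme reaction mechanisms", ["MACIE2005", "MACIE2007"]),
    ("QSAR", ["PyChem", "AntColony"]),
    ("The Rest", ["GM", "Confab", "DataAnalysis", "BO"]),
]


def plan_layout(order, page_counts, front_pages=4, part_pages=2):
    """Return (start_pages, total_pages) for the combined PDF.

    start_pages maps each paper to the 1-based page on which it begins.
    front_pages and part_pages mirror the filler pages in the script
    (hypothetical parameter names, chosen for this sketch).
    """
    next_page = 1 + front_pages
    start_pages = {}
    for _part, papers in order:
        next_page += part_pages
        for paper in papers:
            if next_page % 2 == 0:
                # Insert one filler page so the paper opens on an odd page
                next_page += 1
            start_pages[paper] = next_page
            next_page += page_counts[paper]
    return start_pages, next_page - 1


if __name__ == "__main__":
    starts, total = plan_layout(ORDER, PAGE_COUNTS)
    for paper, page in starts.items():
        print("%-12s starts on page %3d" % (paper, page))
    print("total pages:", total)
```

With the page counts above, the total comes to 109 pages, which agrees with the `seq -w 1 109` loop in the commands printed by the script.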