Sunday, 21 August 2016

My new thing - providing manuscript images as PDFs

My latest oeuvre (on the topic of which fingerprint is best) was published by J. Cheminf. a few weeks ago. For the first time, instead of providing the images as PNGs, I submitted them as PDFs.

You see, John had worked me over. At the start, I thought of a PDF as the bad boy of the journal publishing scene, the hamburger and not the cow. What appears at first to be text arranged into sentences is just a haphazard arrangement of glyphs which, through some trickery of the eye, coalesces into scientific discourse. To generate a PDF myself would be to add to this madness.

But the thing is, when you strip a PDF down to its essentials, it's a relative of a PostScript file (details omitted due to ignorance), a vector graphics format. SVG is a more popular vector graphics format, but most publishers don't support it, and so I spend a lot of time calculating DPI and inches per column and then generating a PNG. But they do often support PDFs, and these can readily be generated (with a bit of care) from many different programs. And all other things being equal, the best quality images will come from providing a vector graphics format, as the publisher can resize it without any loss of quality.
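For anyone who hasn't had the pleasure, the DPI-and-inches arithmetic for a PNG looks something like the sketch below. The column widths are made-up placeholders (not any journal's actual requirements), and the function names are just mine for illustration.

```python
# Hypothetical column widths in inches; check the journal's author guidelines.
SINGLE_COLUMN_IN = 3.5
DOUBLE_COLUMN_IN = 7.0


def pixels_for_width(width_inches, dpi):
    """Pixels needed across an image printed width_inches wide at the given DPI."""
    return round(width_inches * dpi)


def effective_dpi(pixel_width, printed_width_inches):
    """Resolution actually achieved when a raster image is printed at a given width."""
    return pixel_width / printed_width_inches


# A PNG sized for a single column at 300 DPI...
single_px = pixels_for_width(SINGLE_COLUMN_IN, 300)
print("single column at 300 DPI:", single_px, "pixels wide")

# ...drops to half the resolution if the publisher stretches it across two columns.
print("same PNG across two columns:", effective_dpi(single_px, DOUBLE_COLUMN_IN), "DPI")
```

With a vector format none of this needs doing, since there is no fixed pixel grid to get wrong.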

Below I provide details about how I generated the PDFs, but first let's look at their handling by Journal of Cheminformatics. This journal provides three views of the paper: an HTML page, an ePUB (which I won't discuss further) and a PDF. The HTML version contains embedded PNGs; they are a little small for my taste (maybe my fault - I don't know) but they are readable. So somehow they were able to convert the PDFs to images of whatever size they wanted for the HTML page. The PDF is a bit more interesting, as the images are now included as vector graphics. That is, if you keep zooming in on an image in the PDF, the lines remain sharp (in contrast to the PNGs in the HTML version).

So, in short, there seems little downside to providing PDFs, and much to gain. I'd be interested in hearing the viewpoint of anyone involved with the publishing side of things.

1. When using matplotlib to generate graphs, just give the file a .pdf extension, e.g. plt.savefig("overallperformance_%s.pdf" % benchmark, dpi=300)
2. When using Inkscape, save as PDF.
3. The hardest part was the chemical structures. I tried a variety of recipes with two different commercial programs. In the end, although ChemDraw's SVG export had the heteroatoms all over the place, the EMF export was openable by Inkscape and then I could save as PDF. (Apparently you can go direct to PDF from ChemDraw on a Mac.)
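As a rough sketch of the matplotlib route in step 1, the following writes a toy plot as a PDF and checks that the output really is a PDF. The data, figure size and function name are invented for illustration; it writes to an in-memory buffer so that it runs anywhere, but passing a filename ending in .pdf to savefig works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def save_figure_as_pdf(target):
    """Draw a small example plot and write it to target (a path or file object) as PDF."""
    fig, ax = plt.subplots(figsize=(3.5, 2.5))  # size in inches (an assumed value)
    ax.plot([0, 1, 2, 3], [0.2, 0.5, 0.7, 0.9], marker="o")  # made-up data
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    # format="pdf" is explicit here because a buffer has no file extension.
    fig.savefig(target, format="pdf", bbox_inches="tight")
    plt.close(fig)


buffer = io.BytesIO()
save_figure_as_pdf(buffer)
pdf_bytes = buffer.getvalue()

# Every PDF file begins with the "%PDF" magic bytes.
print("looks like a PDF:", pdf_bytes.startswith(b"%PDF"))
```

Lines and text in the PDF are stored as vector elements; as far as I know, the dpi argument only matters for any parts of the figure that are rasterized.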

Tuesday, 2 August 2016

Open Source Cheminformatics Toolkits - they keep on coming

It has been just over three years since I last surveyed the world of open source cheminformatics toolkits. So what's new?

  • Kekule.js - This is a JavaScript toolkit by Chen Jiang with an associated publication in JCIM this year. JavaScript cheminformatics is still in its infancy, and it's good to see a new player in this area. It's currently at version 0.7. Interestingly, it appears to include, in _extras, a JavaScript version of Open Babel created using Emscripten.
  • OpenChemLib - This Java toolkit was developed by Thomas Sander at Actelion, and is the engine behind DataWarrior. It has since been converted to JavaScript (using GWT) by Luc Patiny and used, for example, in the impressive Wikipedia Chemical Structure Explorer.
  • Lilly MedChem Rules - Strictly speaking, this is not a toolkit but a command-line Ruby application. However, there is a C++ cheminf toolkit sitting behind that application, which was developed by Ian Watson at Eli Lilly.

Let me know if I've missed anything. For a more comprehensive overview of Open Source Molecular Modelling, see the very recent paper by Pirhadi et al., which has an associated GitHub repo for keeping the information up to date.