Tuesday, 8 May 2007

Providing datasets as text files is much more useful than making them available as PDF files (hamburger, anyone?). Recently a colleague of mine, Dave Palmer, published a QSAR model of the aqueous solubility of organic compounds. I'm not sure whether at the time it was possible to provide the supporting information as a text file. In any case, the test and training sets are available on the ACS website as two PDF files rather than as text.

Thanks to the magic of the Chemical Blogspace Greasemonkey Script I can alert anyone who visits the journal website that the supporting information is now available as text files.
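
As a rough illustration of why plain text is so much handier than PDF, here is a minimal Python sketch that reads a tab-separated dataset straight into a list of records. The column names (`name`, `smiles`, `logS`), the tab delimiter and the sample values are all made up for the example; the actual supporting-information files may be laid out differently.

```python
import csv
import io

# Hypothetical sample data. The layout, column names and values are
# invented for illustration and are not taken from the real files.
SAMPLE = (
    "name\tsmiles\tlogS\n"
    "example_1\tCCO\t-1.0\n"
    "example_2\tc1ccccc1\t-2.0\n"
)


def read_dataset(handle, delimiter="\t"):
    """Return one dict per data row, with the logS column parsed as a float."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        row["logS"] = float(row["logS"])
        rows.append(row)
    return rows


# io.StringIO stands in for an open file; with a real file you would
# pass the result of open("some_file.txt", newline="") instead.
records = read_dataset(io.StringIO(SAMPLE))
print(len(records), "rows read; first compound:", records[0]["name"])
```

Getting the same table out of a PDF usually means copying and pasting followed by manual clean-up.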

petermr said...

This looks very exciting. Please clarify:
(a) has the Greasemonkey script translated the PDF to TXT, or (b) have you created separate TXT files and linked to them?
If the latter, why did you not use TXT in the first place instead of PDF?

Noel O'Boyle said...

I regret to report that hamburger-to-cow conversion is not occurring. I am simply providing the original text files which were converted by the ACS into PDFs.

I didn't submit the paper myself so I'm not sure whether it was possible to prevent them being converted into PDFs. Looking at recent issues of JCIM, I see that some authors have provided the supplementary information as .txt files, so it is certainly possible now to avoid the PDF conversion.


Hey, thanks for providing them in text format.
I spent nearly half an hour trying to convert them from PDF to text (by copying) and then to Excel by manual editing.