Friday, 3 April 2009

Some short stories

  • I want to flag up Andrew Dalke's course at the end of April on Python and cheminformatics. While I might disagree with Andrew's toolkit of choice, there's no doubt that the skills learnt will be of great benefit to any cheminformatician in their day-to-day work. As well as a cheminformatics portion, the course includes matplotlib (plotting), communicating with Excel, XML processing, subprocess (for calling command-line programs), NumPy, R, SQL and Django.
  • The first issue of Journal of Cheminformatics has hit the electronic shelves. Point your RSS reader to the feed. Best of luck to Christoph and David.
  • Is 2009 the year of OChRe on the desktop? After almost a decade of little development in this area, we have in quick succession papers on ChemReader, OSRA and now Clide Pro, an update of the venerable Clide. The techniques used by the new version are described in detail in the paper. Unfortunately, there is little in the way of comparison either to the original Clide or other OChRe software. On the plus side, the dataset of images discussed in the paper has been made available as supporting material with the intention of forming part of a community benchmark for performance comparisons (although it's not clear whether this dataset was also used for training the software).
  • There seems to be some confusion over the name of this field. Is it OSR (Optical Structure Recognition, according to OSRA), OCR (Optical Chemical Recognition, à la chemOCR), OCSR (you guessed it, Optical Chemical Structure Recognition, as referred to in the Clide Pro paper), or OChRe (Optical Chemical Recognition again, but spelling out a real word; it also has that InChI up-and-down thing going on)?
  • Did your experiments fail again? Tell me about it. I mean that literally, because you've got your choice of journals to publish in. There's the All Results Journal ("all results are good results") or (for the more mathematically inclined) Rejecta Mathematica.

1 comment:

Igor said...

The CLiDE Pro manuscript shows a very impressive success rate of ~89%.
Too bad the authors chose a completely new set of test images and not one of their older sets where the results are known for previous versions of CLiDE and OSRA, so the comparison is difficult.
Another complication is that at least some of the images seem to have been scanned at a higher resolution than the "traditional" ceiling of 300 dpi. Compare, for example, image2.tif and image3.bmp: the font height is about 22 pixels in the former and 42 pixels in the latter, which indicates that the latter was probably scanned at a resolution higher than 300 dpi.
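
As a rough back-of-the-envelope check, the font-height comparison can be turned into a resolution estimate. This is only a sketch under two assumptions that the images themselves do not confirm: that both images use the same printed font size, and that image2.tif was scanned at exactly 300 dpi. The function name `estimate_dpi` is made up for illustration.

```python
def estimate_dpi(font_height_px, reference_height_px, reference_dpi=300.0):
    """Scale a reference resolution by the ratio of rendered font heights.

    Assumes both images show text of the same printed size, so pixel
    height is proportional to scan resolution.
    """
    return reference_dpi * font_height_px / reference_height_px


# Font heights quoted above: about 22 px (image2.tif) and 42 px (image3.bmp).
estimate = estimate_dpi(42, 22)
print(round(estimate))  # 573
```

If those assumptions hold, image3.bmp would be somewhere near 600 dpi, but a larger printed font in the original document would give the same pixel height at a lower scan resolution.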