The dependencies of OSRA give an insight into the term "open source ecosystem". Rather than reinvent the wheel, OSRA makes use of open source libraries for optical character recognition (OCRAD, GOCR), bitmap to vector image conversion (POTRACE), messing about with images (ImageMagick, GREYstoration, ThinImage, CImg) and of course, cheminformatics (it uses either OpenBabel or RDKit). Luckily for Windows users, Igor provides a compiled version.
Back in July, I emailed Igor to request a new feature, the ability to output an SDF file containing the coordinates taken from the image. Three hours later he replied that he'd added it. And now, only four months later, I've gotten around to testing it (insert excuse here)...
Anyway, one nice way of testing conversion code is by roundtripping (or "There And Back Again"). So I took the now legendary depiction faceoff test file (see, for example, here), used the coordinates therein to create a PNG image using OASA (via Pybel or cinfony), ran OSRA on the resulting image to get some coordinates, and then used those coordinates to generate a PNG again. By eyeballing the two PNG images, it's possible to discover errors. So, here are the results of the OChRe.
(1) If you notice any trends in the errors, comment below and Igor might fix them.
(2) Where there are missing images after the OChRe, this is where OSRA missed a bond (probably reasonably), generated two molecules, and caused OASA a headache (it only handles single molecules).
Here's the code:
import pybel import popen2 odir = "images" for mol in pybel.readfile("sdf", "onecomponent.sdf"): title = mol.title print title mol.draw(usecoords=True, show=False, filename=os.path.join(odir, "%s_oasa.png" % title)) o, i, e = popen2.popen3("../osra-trunk/osra -f sdf %s/%s_oasa.png" % (odir, title)) osrasdf = o.read() newmol = pybel.readstring("sdf", osrasdf) try: newmol.draw(usecoords=True, show=False, filename=os.path.join(odir, "%s_osra.png" % title)) except AssertionError: print "Unconnected!"
Image credit: Jason.Hudson