Sunday, 21 December 2008

Have your hamburger and eat it - Edit molecules in PDFs

Wouldn't it be nice to be able to copy a molecule from a PDF into a molecular drawing package? Well, here are some instructions for doing this on Windows with BKChem, and using OSRA to do the conversion. Click on the image to the right for a screenshot of this in action.

(1) Install Python 2.6 (or just use 2.4 or 2.5 if you have one of these already)
(2) Install the Python Imaging Library 1.1.6 for your version of Python
(3) Download and extract BKChem-0.12.5.zip
(4) Drop convert_clipboard_image.py and convert_clipboard_image.xml into the BKChem plugins folder (Note: if the webserver is down, you can get these files here and here)
(5) Download and extract osra-mingw-1-1-0.zip
(6) Set the environment variable OSRA to the full path to osra.exe
(7) Find the Snapshot tool (it has a picture of a camera) in your version of Adobe Reader. In version 9 it's under Tools/Select and Zoom/Snapshot Tool, and you can add it to the toolbar under Views/Toolbars/More Tools.
(8) Open a PDF of a paper containing a molecular structure (e.g. Figure 4 in this paper of mine), and use the Snapshot tool to draw a box around a molecule and hit CTRL+C to copy (if not done automatically).
(9) Start BKChem by double-clicking on bkchem.py (in the bkchem subfolder)
(10) Click "Plugins", "Paste and Convert Image"
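
The core of steps (6) to (10) can be sketched in a few lines of Python. This is not the actual convert_clipboard_image.py plugin, just a minimal illustration. It assumes a current Python with PIL/Pillow installed (not the 2.6 setup above), that ImageGrab.grabclipboard() works on your platform, and that OSRA prints the recognised structure to standard output when given an image file as its only argument. The function names osra_command and clipboard_image_to_structure are hypothetical.

```python
import os
import subprocess
import tempfile


def osra_command(image_path, environ=None):
    """Build the OSRA command line from the OSRA environment variable (step 6)."""
    environ = os.environ if environ is None else environ
    osra_exe = environ.get("OSRA")
    if not osra_exe:
        raise RuntimeError("Set the environment variable OSRA to the full path to osra.exe")
    # Assumption: OSRA takes the image file name as its only required argument.
    return [osra_exe, image_path]


def clipboard_image_to_structure():
    """Grab the image on the clipboard, run OSRA on it and return OSRA's text output."""
    # Imported here so that osra_command() can be used without PIL installed.
    from PIL import ImageGrab

    image = ImageGrab.grabclipboard()
    # grabclipboard() may return None (or a list of file names) instead of an image.
    if image is None or not hasattr(image, "save"):
        raise RuntimeError("No image found on the clipboard")

    # OSRA reads from a file, so write the snapshot to a temporary PNG first.
    fd, path = tempfile.mkstemp(suffix=".png")
    os.close(fd)
    try:
        image.save(path)
        return subprocess.check_output(osra_command(path)).decode()
    finally:
        os.remove(path)
```

After copying a molecule with the Snapshot tool, calling clipboard_image_to_structure() should return whatever OSRA recognised; the plugin then hands that result to BKChem.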

Notes:
(0) Open Source software allows you to implement crazy ideas as fast as you can think of them.
(1) This won't work with the latest exe release of BKChem as py2exe didn't include the ImageGrab module (part of PIL).
(2) For this to work with ChemDraw, I need to know how to place ChemDraw XML (which I can create with OpenBabel) on the clipboard so that ChemDraw will be able to paste it. (To be clear, I know how to place it on the clipboard in general, it's just how to place it in such a way that ChemDraw will recognise it as a chemical structure. Hmmm...just found this...)
(3) Bond angles are perturbed slightly (e.g. vertical bonds can become skewed). Maybe this can be fixed on the OSRA side.
(4) The hamburger reference and some background can be found on PMR's blog.
(5) Thanks to Leonard (see comment below) for the information on the Snapshot tool in Adobe Reader.

20 comments:

  1. The Snapshot tool is also available in Reader 8 & 9 - it just doesn't clutter up the toolbar by default.

    Try Tools->Select & Zoom->Snapshot tool.

    If you plan to use it a lot, you can add it to your toolbar using Views->Toolbars->More Tools.

  2. @Leonard: Thanks for that - now I can finally upgrade :-) I did search for it in Reader 9 but it just didn't seem to be there. I will update the post.

  3. Noel,
    This is very nice! I was thinking about a tool like this myself and I am so glad to see it already implemented.
    A word about the skewed bond angles - the MCDL library is used to generate coordinates for the superatoms (CF3 for example), and unfortunately it sometimes moves the rest of the molecule around as well. If you remember from the openbabel-devel discussion, it was no easy task to get MCDL to work with the superatoms/fragments at all, so I am happy we even got this far...

  4. convert_clipboard_image.xml refers to OASA, not OSRA. Is it a typo?

  5. Ah, so that's what you were using MCDL for. But wouldn't it be better to keep the CF3 as it is (rather than expanding it), if that's how it is in the original diagram? I'm not sure if this is supported by SDF though.

    I've fixed the OASA typo - it's not the first time and I'm afraid it won't be the last.

  6. I don't think superatoms are supported by the SD format.
    With the RDKit backend the 2D coordinate generation is much better, but the stereo might get messed up. And I still have not been able to compile RDKit with MinGW.

  7. I was able to get it to work on Linux too, using desktop-data-manager and PyGTK for clipboard grabbing/communication, yay!

  8. I have to note 2 things:
    1) The results depend very much on the magnification setting in your PDF viewer when you copy the image to the clipboard. For me 200% seems to give good recognition rates.

    2) My Python knowledge is next to non-existent and BKChem crashes when I attempt to save the recognized structure as a .mol file - the error seems to be related to GTK:
    The program 'bkchem.py' received an X Window System error.
    This probably reflects a bug in the program.
    The error was 'BadWindow (invalid Window parameter)'.
    (Details: serial 6876 error_code 3 request_code 15 minor_code 0)

    Any help with solving this BadWindow problem will be most appreciated!

  9. Noel, way cool. The only drawback for most chemists would be the install process. I wonder how hard it would be to bundle up the whole package and deploy with a platform-independent installer...

    I also wonder if you could take this one step further:

    How about a system tool that will convert any image on the (Windows/Linux/Mac) clipboard to a molfile?

    That would decouple the functionality from PDFs - it would work with any image consumer, including a Web browser. It would also enable a variety of drawing packages (ISIS/Draw, ChemWriter, BKChem, JChemPaint, etc.) and chemical spreadsheets to use the conversion.

  10. @Igor: Well done. I looked into using pyGTK on Linux, but it required a newer version of pyGTK than that in the current Debian.

    Interesting point about magnification setting. I would have thought it independent.

    Regarding the error, I think you will have to post the code somewhere or contact Beda. I would test it if I could!

  11. @Rich: The procedure works off the clipboard rather than Adobe per se, and so is already independent. It's easy to stick the mol file back onto the clipboard too, but none of the drawing packages I tested (ChemDraw, BKChem, Symyx Draw) seem to support pasting a molfile from the clipboard.

    Regarding installation, it would be easy for Beda (if he wished) to bundle all of this in with BKChem so that the user would not have to install anything or set any variables.

    The only downside is platform-independence. I'm not sure that there is a cross platform way to cut/paste images from the clipboard. On that note...

    @Igor: ...why did you need to change the code to use pyGTK? Did the ImageGrab module from the Python Imaging Library not work on Linux?

  12. ImageGrab is Windows only :(
    I got the code working with PyGTK on Linux - not sure if it will also work on Windows (it just might!).

    Noel, how can I post it here, or would you prefer if I emailed it to you?

  13. Uh oh, looks like the server hosting Noel's scripts is down...

  14. Here is the code that works on Linux:
    http://osra.svn.sourceforge.net/viewvc/osra/clipboard/convert_clipboard_image.py?content-type=text%2Fplain

    You might also need desktop data manager to get images into the clipboard:
    http://data-manager.sourceforge.net/

  15. I've updated the script to run on both Linux and Windows - it uses PyGTK in the former and ImageGrab in the latter case.
    Also, I checked that it does indeed work with any clipboard manager that allows image copy/paste; Adobe Reader is not necessary.
    For Windows there is a free clipboard manager, ClipMagic, and it seems to work.

  16. Igor - before you do too much work, I'm working on a standalone OSRA convertor that sticks the SDF file back on the clipboard. Then, the individual programs just need to be able to paste from the clipboard. This should make it easier to deploy.

  17. Noel, sounds good, looking forward to checking it out!
