Friday, 10 August 2007

Access embedded molecular information in images

Recently Rich Apodaca has been discussing (here, here and here) embedding molecular information in images of molecules, such as a PNG file depicting a 2D structure.

I'm going to show how to extract this type of embedded metadata using Python.

First of all, you'll need an image to work with. Grab the PNG file, rosiglitazone.png, from Rich's post.

Next, you'll need the Python Imaging Library (PIL), a third-party Python extension library available from Pythonware.

Here's the text of an interactive Python session showing how to access the image metadata:

C:\>python
ActivePython 2.4.1 Build 247 (ActiveState Corp.) based on
Python 2.4.1 (#65, Jun 20 2005, 17:01:55) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import Image
>>> myimage = Image.open("rosiglitazone.png")
>>> dir(myimage)
['_Image__transformer', '_PngImageFile__idat', '__doc__', '__init__',
'__module__', '_copy', '_dump', '_expand', '_makesel
...
im', 'getpalette', 'getpixel', 'getprojection', 'histogram',
'im', 'info', 'load', 'load_end', 'load_prepare', 'load_read',
'mode', 'offset', 'palette', 'paste', 'png', 'point', 'putalpha',
'putdata', 'putpalette', 'putpixel', 'quantize', 'read
...
transform', 'transpose', 'verify']
>>> myimage.info
{'molfile': 'name\nparams\ncomments\n 25 27 0 0 0 0 0 0
0 0 0 V2000\n 1.6910 -6.1636 0.0000 C 0 0 0
0 0 0 0 0 0 0 0 0\n 2.5571 -6.6636 0.0000 C
0 0 0 0 0 0 0 0 0 0 0 0\n 3.4231 -6.1636
0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0\n 3.4231
-5.1636 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n
2.5571 -4.6636 0.0000 C 0 0 0 0 0 0 0 0 0 0
n 7 8 1 0 0 0 0\n 8 9 1 0 0 0 0\n 7 10 1 0
...
...
0 0 0\n 9 11 1 0 0 0 0\n 11 12 1 0 0 0 0\n 12 13
2 0 0 0 0\n 13 14 1 0 0 0 0\n 14 15 2 0 0 0 0
\n 15 16 1 0 0 0 0\n 16 17 2 0 0 0 0\n 17 12 1 0
0 0 0\n 15 18 1 0 0 0 0\n 18 19 1 0 0 0 0\n 19
20 1 0 0 0 0\n 20 21 1 0 0 0 0\n 21 22 1 0 0 0
0\n 22 23 1 0 0 0 0\n 23 19 1 0 0 0 0\n 22 24 2 0
0 0 0\n 20 25 2 0 0 0 0\nM END', 'aspect': (1, 1)}
>>> moldata = myimage.info['molfile']
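PIL handles the PNG parsing here; the 'molfile' entry in info presumably comes from a text chunk stored in the file. For anyone curious about what sits underneath, here is a minimal Python 3 sketch using only the standard library that walks a PNG's chunks and collects keyword/text pairs from tEXt and zTXt chunks. It assumes the molfile was stored in one of those two chunk types (the PNG spec also defines iTXt, which this sketch ignores) and it does not verify CRCs. The function name read_png_text is hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data):
    """Return a dict of keyword -> text from the tEXt/zTXt chunks of PNG bytes.

    Sketch only: iTXt chunks are ignored and CRCs are not checked.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # skip length, type, data and CRC
        if ctype == b"tEXt":
            # keyword, NUL separator, then Latin-1 text
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            # keyword, NUL, one compression-method byte (0 = zlib), compressed text
            key, _, rest = body.partition(b"\x00")
            texts[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"IEND":
            break
    return texts
```

Used on Rich's image, read_png_text(open("rosiglitazone.png", "rb").read()).get("molfile") should give the same string as myimage.info['molfile'], provided the molfile really is in a tEXt or zTXt chunk.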


Cool. So now we could write this information to a file (print >> open("myoutputfile.mol", "w"), moldata), or convert it into an Open Babel molecule and calculate some properties:


>>> import pybel
>>> mymol = pybel.readstring("MOL", moldata)
>>> print mymol.molwt
357.42676
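Some basic facts can also be read straight from the molfile text without Open Babel. In the MDL V2000 format the first three lines are a header (here just "name", "params" and "comments"), the fourth is the counts line whose first two fixed-width fields give the numbers of atoms and bonds, and each following atom line carries its element symbol in columns 32 to 34. The sketch below (Python 3, standard library only) assumes a V2000 molfile that keeps those fixed-width columns. summarise_molfile is a hypothetical helper, not part of pybel, and it counts only the atoms listed explicitly, so implicit hydrogens are not included.

```python
from collections import Counter


def summarise_molfile(molblock):
    """Return (atom_count, bond_count, element_counts) for a V2000 molfile string.

    Hypothetical helper; assumes standard fixed-width V2000 columns.
    """
    lines = molblock.splitlines()
    counts = lines[3]  # lines 0-2 are the three-line header block
    if "V2000" not in counts:
        raise ValueError("only V2000 molfiles are handled")
    # Counts line: atoms in columns 1-3, bonds in columns 4-6
    natoms = int(counts[0:3])
    nbonds = int(counts[3:6])
    # The atom block follows the counts line; element symbol is in columns 32-34
    elements = Counter(line[31:34].strip() for line in lines[4:4 + natoms])
    return natoms, nbonds, elements
```

For the rosiglitazone molfile above, summarise_molfile(moldata) should report 25 atoms and 27 bonds, matching its counts line, as long as the embedded text keeps the standard column layout.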
