Saturday 25 February 2012

Portrait of the molecule as a green substructure

I've already mentioned using Open Babel for depicting with SVG. That's a handy way to view a large set of molecules; you can zoom in and out and so forth. Let's look at some more of its features, as I've just been carrying them over from the SVG depiction to the PNG depiction.

Let's start with the following basic depiction. Note that information on the SVG output options (e.g. -xC) is available in the docs or via "obabel -H svg", and that you can use the mouse to zoom in, etc.
obabel dataset.sdf -O output1.svg -xC


With some magic, we can convert carboxylic acid groups (and anything else listed in the user-editable superatom.txt) into COOH in the depiction. Let's add thick lines too:
obabel dataset.sdf -O output2.svg -xC --genalias -xA -xt

We can also do some fun stuff with descriptors (see "obabel -L descriptors" for a list). Let's sort by molecular weight and replace the title with the molecular formula and molecular weight:
obabel dataset.sdf -O output3.svg -xC --sort MW \
  --title "" --append "formula MW"

You might have noticed that all of the molecules have a substructure in common. Let's highlight some of this in green, and get rid of the other colours:
obabel dataset.sdf -O output4.svg -xC \
  -xu -s "[#6]~2~[#6]NCCN=C~2 green"

And finally, if the molecules are related, it can be useful to align the depictions using a substructure in order to identify similarities and differences (this has been improved in the development version):
obabel dataset.sdf -O output5.svg -xC \
  -xu -s "[#6]~2~[#6]NCCN=C~2 green" --align
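
The wrapped commands above are each a single command line. If you want to drive them from a script, one way to sidestep line-wrapping and quoting problems is to build the argument list in Python. This is just a minimal sketch, not part of Open Babel: the helper name depict_args is made up for illustration, and it only assembles options already used in this post.

```python
import shlex


def depict_args(infile, outfile, smarts=None, color="green", align=False):
    """Build the argument list for one of the obabel depictions in this post.

    Hypothetical helper: it only assembles options that appear above.
    """
    args = ["obabel", infile, "-O", outfile, "-xC"]
    if smarts:
        # -xu drops the element colours; -s filters on the SMARTS pattern
        # and colours the matched atoms
        args += ["-xu", "-s", f"{smarts} {color}"]
    if align:
        args.append("--align")
    return args


args = depict_args("dataset.sdf", "output5.svg",
                   smarts="[#6]~2~[#6]NCCN=C~2", align=True)
# Print the equivalent single-line shell command, safely quoted
print(shlex.join(args))
```

To actually run the conversion you could pass the list to subprocess.run(args, check=True), assuming obabel is on your PATH and dataset.sdf exists.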

What other depiction features would you find useful?

12 comments:

Chris said...

I like the colour coding of the substructure. In my experiments it seems that if a molecule does not contain the substructure it does not display? Is there a way to display all but still colour code the substructure if present?

Noel O'Boyle said...

That's right. The "-s SMARTS" option is actually a SMARTS filter, which we are reusing for coloring and alignment.

I don't think there's any way to do what you suggest, but adding the functionality would be trivial, e.g. --highlight "SMARTS color SMARTS2 color2".

chris said...

That would be very useful. Perhaps it would need a default of black at the end for molecules that don't match?

Noel O'Boyle said...

Feature added in development version.
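
As a rough sketch of how this might be used: assuming the option is spelled --highlight and takes the "SMARTS color SMARTS2 color2" form suggested above (check "obabel -L ops" on your version to confirm), the call could be assembled like this. The helper name highlight_args and the filename output6.svg are made up for illustration.

```python
import shlex


def highlight_args(infile, outfile, pairs):
    """Assemble an obabel call that colours matches without filtering.

    pairs is a list of (SMARTS, colour) tuples. Assumes --highlight accepts
    "SMARTS color SMARTS2 color2", as suggested in the comment above.
    """
    spec = " ".join(f"{smarts} {color}" for smarts, color in pairs)
    # No -s here, so molecules that do not match are still depicted
    return ["obabel", infile, "-O", outfile, "-xC", "-xu", "--highlight", spec]


cmd = highlight_args("dataset.sdf", "output6.svg",
                     [("[#6]~2~[#6]NCCN=C~2", "green")])
print(shlex.join(cmd))
```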

Anonymous said...

I can't find any way to highlight a set of atoms using their numbers in an SDF/MDL file, e.g. to highlight the 1st, 3rd and 6th atoms of the structure.

Is there any option to do this?

Noel O'Boyle said...

There is an option to add the indices to the atoms (-xi), but right now it's not possible to highlight them. Could you describe where this might be useful?

Anonymous said...

It would be really useful for generating diagrams showing the results of isotopic labeling experiments, like the ones you can find on this Wikipedia page: http://en.wikipedia.org/wiki/Isotopic_labeling

Anonymous said...

Hi Noel, can you highlight only the parts of a molecule that match two SMARTS patterns, i.e. combining them with "&"?
Thanks, Mark

Noel O'Boyle said...

Can you give a simple (but real) example? I think this should be handled via the SMARTS pattern....

tsv-test said...

Very cool! Is it possible to get single svgs (in one file, but for each molecule)? Can I use highlighting in Python?
Thanks!

Noel O'Boyle said...

Check out obabel's "-m" option. It splits the input to multiple files.

You can indeed use highlighting in Python. For example, if you have a Pybel Molecule called "mol", the following will work. Afterwards, if you call 'draw' or write out an svg, any C=O will be green:

pybel._operations['highlight'].Do(mol.OBMol, "C=O green")
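
A slightly fuller sketch of the same idea, wrapped in a function. The function names are made up for illustration, and the import is guarded because the module layout differs between Open Babel releases (an assumption on my part about which one you have installed). It relies on the private _operations dictionary shown above, so it may break in other versions.

```python
def highlight_spec(smarts, color="green"):
    """Return the option text passed to the highlight operation."""
    return f"{smarts} {color}"


def write_highlighted_svg(smiles, smarts, out_path, color="green"):
    """Read a SMILES string, highlight a substructure, and write an SVG.

    Needs Open Babel's Python bindings; they are imported here so that
    this module still loads when they are missing.
    """
    try:
        from openbabel import pybel  # layout used by newer releases
    except ImportError:
        import pybel  # layout used by older releases

    mol = pybel.readstring("smi", smiles)
    # Same call as above: apply the "highlight" operation to the OBMol
    pybel._operations["highlight"].Do(mol.OBMol, highlight_spec(smarts, color))
    mol.write("svg", out_path, overwrite=True)
```

For example, write_highlighted_svg("CC(=O)O", "C=O", "acetic_acid.svg") should give a depiction of acetic acid with the C=O in green, provided your build includes the highlight operation.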

Unknown said...

I am trying to use fast search on a dataset and color the output based on the matched substructure.

I am using the command:

../openbabel-2.3.2/bin/obabel all_chemicals.fs -O out.svg -xu -xC -s "test.smiles green" -at0.8

I get an svg file with uncolored atoms.

Is this possible? If yes, is the syntax of my command correct? Please help.