Saturday 9 July 2011

The alpha and omega of SMILES strings

I've just added a feature to the SMILES output in Open Babel that allows the user to specify a particular start and end atom for a SMILES string. No, no, come back! - don't go away! - it's really useful...

If you're still here, let me explain why. It lets you take advantage of a really nice feature of SMILES strings: simply concatenating two SMILES strings generates a new molecule. I call this (as of two minutes ago) Click Comp Chem, or CCC (which, of course, is itself a SMILES string!).

With Click Comp Chem, the first atom of the second SMILES string becomes joined to the "last atom" of the first SMILES string. By "last atom", I mean the last atom that is not within parentheses (i.e. not on a branch). For example, if I append "Br" to "CC(I)(Cl)", the Br will be attached to the second C atom, not to the Cl.
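The "last atom" rule can be sketched in a few lines of Python. This is a minimal illustration, not how Open Babel implements it: it assumes a simplified tokenizer (bracket atoms, the organic subset, bonds, branches and ring-closure labels only) and just tracks the branch depth, remembering the last atom seen at depth 0. The function names are hypothetical.

```python
import re

# Simplified SMILES tokenizer (an assumption for illustration; not a full parser).
TOKEN = re.compile(
    r"\[[^\]]+\]"          # bracket atom, e.g. [NH4+] or [C@@H]
    r"|Br|Cl|[BCNOPSFI]"   # aliphatic organic subset (two-letter symbols first)
    r"|[bcnops]"           # aromatic organic subset
    r"|\*"                 # wildcard atom
    r"|%\d\d|\d"           # ring-closure labels
    r"|[-=#$:/\\.()]"      # bonds, branch parentheses, dot
)
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*")


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting anything unsupported."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES: {smiles!r}")
    return tokens


def last_main_chain_atom(smiles):
    """Return (token_index, atom_token) of the last atom not on a branch."""
    depth = 0
    last = None
    for i, tok in enumerate(tokenize(smiles)):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
        elif depth == 0 and ATOM.fullmatch(tok):
            last = (i, tok)  # ring-closure digits after it don't change this
    return last


print(last_main_chain_atom("CC(I)(Cl)"))  # the second C, not the Cl
```

Applied to the aspirin strings below, the original "c1c(C(=O)O)c(OC(=O)C)ccc1" ends on a ring carbon, whereas "c1c(C(=O)O)c(ccc1)OC(=O)C" ends on the ester methyl, so anything appended to the latter would be attached to that methyl.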

CCC is particularly useful when generating polymers from a set of monomers (which is what we were doing in our solar cell paper), or a virtual library of combinatorial chemistry products (as Jean-Claude Bradley did for his CombiUgi project, or as Duffy et al. did to create a virtual library of peptides).
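Once each fragment is written so that it starts with the atom that bonds to the previous fragment and ends (outside any branch) with the atom that bonds to the next, enumerating a library or an oligomer is plain string concatenation. The sketch below assumes that, and uses made-up fragments; the helper names are hypothetical. It does no valence checking, so a fragment whose end atom has no free valence would produce an invalid molecule. Reusing ring-closure label 1 in consecutive fragments is fine, because each label is closed before it is reused.

```python
from itertools import product


def enumerate_library(*fragment_sets):
    """Concatenate one fragment from each set, in order, for every combination.

    Assumes each fragment already starts with its attachment atom and ends,
    outside any branch, with the atom that should bond to the next fragment.
    """
    return ["".join(combo) for combo in product(*fragment_sets)]


def oligomer(repeat_unit, n):
    """Join n copies of a repeat unit written from one linking atom to the other."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return repeat_unit * n


# Hypothetical fragments: a left cap, a para-disubstituted benzene core, a right cap.
caps_left = ["C", "Cl"]
cores = ["c1ccc(cc1)"]
caps_right = ["O", "N"]

library = enumerate_library(caps_left, cores, caps_right)
for smi in library:
    print(smi)

# Thiophene written from one alpha carbon to the other, so n copies give a
# 2,5-linked oligothiophene (n = 3 is terthiophene).
print(oligomer("c1ccc(s1)", 3))
```

Because the core ends on the ring carbon para to its first atom, every product here is a para-disubstituted benzene (p-cresol, p-toluidine, 4-chlorophenol and 4-chloroaniline).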

To make use of CCC, however, you need to be able to write a SMILES string that starts with a particular atom and ends with another. I'll explain the details of how I implemented this in a separate post. For now, here it is in action on the SMILES string for aspirin, "c1c(C(=O)O)c(OC(=O)C)ccc1". Let's suppose that we want the methyl of the ester (atom 10) at the end of the SMILES, and the ring carbon para to the ester (atom 13) at the start. The new SMILES output options are "f" and "l" (for "first" and "last"); on the command line they are passed with -x, followed by the atom's number in the input:
>obabel -:"c1c(C(=O)O)c(OC(=O)C)ccc1" -osmi
c1c(C(=O)O)c(OC(=O)C)ccc1
1 molecule converted

>obabel -:"c1c(C(=O)O)c(OC(=O)C)ccc1" -osmi -xf 13
c1cc(C(=O)O)c(OC(=O)C)cc1
1 molecule converted

>obabel -:"c1c(C(=O)O)c(OC(=O)C)ccc1" -osmi -xl 10
c1c(C(=O)O)c(ccc1)OC(=O)C
1 molecule converted

>obabel -:"c1c(C(=O)O)c(OC(=O)C)ccc1" -osmi -xf 13 -xl 10
c1cc(C(=O)O)c(cc1)OC(=O)C
1 molecule converted
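To produce fragments like these from a script, one option is to wrap the command line shown above. This is a minimal sketch, assuming an obabel executable that supports the -xf/-xl options is on your PATH; the function names are hypothetical, and only the first whitespace-separated field of the output (the SMILES) is kept.

```python
import shutil
import subprocess


def obabel_command(smiles, first=None, last=None):
    """Build the obabel argument list used in the examples above."""
    cmd = ["obabel", "-:" + smiles, "-osmi"]
    if first is not None:
        cmd += ["-xf", str(first)]  # atom to start the SMILES with
    if last is not None:
        cmd += ["-xl", str(last)]   # atom to end the SMILES with
    return cmd


def write_fragment(smiles, first=None, last=None):
    """Run obabel and return the reordered SMILES string."""
    if shutil.which("obabel") is None:
        raise RuntimeError("obabel not found on PATH")
    result = subprocess.run(
        obabel_command(smiles, first, last),
        capture_output=True, text=True, check=True,
    )
    # Keep just the SMILES, dropping any trailing title field or whitespace.
    return result.stdout.split()[0]
```

Going by the output above, write_fragment("c1c(C(=O)O)c(OC(=O)C)ccc1", first=13, last=10) should return "c1cc(C(=O)O)c(cc1)OC(=O)C", ready to have another fragment appended to its methyl.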

Image credit: Alan Cleaver

4 comments:

Orion said...

Nice work Noel. Being in the business (it would seem) of snapping SMILES strings together, I can fully appreciate the value of this. Congrats on the paper, too!

Noel O'Boyle said...

"Snapping SMILES". Ah, why didn't I think of that? :-)

Jean-Claude Bradley said...

This is going to be super useful for us. We are currently making imine libraries where this will be handy.

Noel O'Boyle said...

That's great to hear. I've been thinking about this problem on and off since you asked about it before, and finally got down to trying some ideas out.