Friday, 9 November 2012

Tricks with SMILES and SMARTS Part II

A much-underused feature of SMILES is the ability to assign 'atom classes' to atoms by adding a colon followed by a number inside the square brackets. So, for example, CC and C[CH3:6] both represent ethane, but in the latter case one of the carbons is labelled as being a member of atom class 6.

So what's the meaning of atom class 6? Well, it's whatever you want - it's simply a label that you use to indicate some related information. For example, you might want to record reaction locations, or locations of common substitutions, or mappings between different molecules (reactant/product, or sub/superstructures).
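
To make the syntax concrete, here's a minimal sketch in plain Python (no toolkit involved) that pulls the class labels out of bracket atoms with a regular expression. The helper name atom_classes and the regex are my own illustration rather than anything from Open Babel, and a regex like this is no substitute for a real SMILES parser; it just shows where the label sits in the string.

```python
import re

# Matches a bracket atom ending in ":<digits>]", e.g. "[CH3:6]".
# Hypothetical pattern for illustration only; not a full SMILES parser.
BRACKET_ATOM = re.compile(r"\[([^\[\]:]+):(\d+)\]")

def atom_classes(smiles):
    """Return (atom_text, class_number) pairs for bracket atoms that carry a class."""
    return [(m.group(1), int(m.group(2))) for m in BRACKET_ATOM.finditer(smiles)]

print(atom_classes("C[CH3:6]"))  # [('CH3', 6)]
print(atom_classes("CC"))        # []
```

Atoms written without brackets (like the first C above) can't carry a class at all, which is why only the second carbon shows up.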

Anyhoo, here's how you access the atom class information in Open Babel from Python:
>>> import pybel
>>> ob = pybel.ob
>>> mol = pybel.readstring("smi", "C[CH3:6]")
>>> print mol.write("smi")

>>> print mol.write("smi", opt={"a":True})

>>> data = ob.toAtomClassData(mol.OBMol.GetData("Atom Class"))
>>> data.HasClass(1)
>>> data.GetClassString(1)
>>> data.HasClass(2)
>>> data.GetClassString(2)


Andrew Dalke said...

I'm old-school in fmcs. I repurpose isotope labels as atom classes.

How does one use SMARTS to match a specific atom class? I thought it would be "[*:6]" but that doesn't work in OB, RD, or OE. That probably means I misunderstand something:

baoilleach said...

Roger likes to use isotopes too. Maybe it's 0.001% more efficient or something. :-)

I think you know more about SMARTS than me, so if you don't know how to match it, I'm guessing it doesn't exist. Kind of a pity.

Andrew Dalke said...

I believe atom classes were added to SMILES after our formative years, and support for them is less common across the toolkits, so we might not think about it when we should.

Even if I might know more about SMARTS, it doesn't mean that I know more about SMARTS-with-atom-classes.