Tuesday, 8 May 2018

When all you want is a ring: Part II

The previous post used a fairly strict definition of a ring system; a set of atoms, all of which are in a ring, and which are connected by bonds, all of which are in a ring. But this ignores any attached atoms, and a case can be made that the removal of these attached atoms completely alters the identity of a ring system.

As a first step towards considering these atoms, a value of 2 can be passed to CopySubstructure as the fourth parameter. This sprouts dummy atoms to replace missing bonds, and so all para-substituted phenyl rings (for example) become *c1ccc(*)cc1.

To go beyond this and actually recover the identity of the attached atoms, well, that's a little bit trickier. To begin with, we need to think about which atoms we want to copy. Depending on your application, you might want to focus on atoms that influence a particular aromaticity model, or have tautomeric potential. Here I'm going to include any atom that's doubly bonded, and any non-C atoms that are singly-bonded.

The most obvious way to do this is to add these atoms to the atomsToCopy variable before doing the copy. Unfortunately, that approach quickly runs into difficulties as the atom might be attached to two rings (e.g. C1CC1OC2CCCC2), or worse still, the atom might itself be in a ring (e.g. C1CC1=C2CCCC2). In fact, the only way to do this "straightforward approach" is to first count the ring systems and label the atoms belonging to each (e.g. with a flood-fill algorithm), and then one-at-a-time copy each ring system and the attached atoms. It works, but elegant it ain't.

So let's do it a different way, a way that turns out to be quite a bit simpler and handles all of the ring systems in a single copy. Remember above when I mentioned para-substituted phenyl rings becoming *c1ccc(*)cc1? No? Well, I did. Anyway, let's suppose that we know the original atom to which each dummy atom corresponds; we could then change the dummy atom to have the same atomic number and charge as the original. Well, what d'yaknow - the fifth parameter to CopySubstructure can be used to find exactly this correspondence.

Oh yeah, there's one catch. Any stereo involving the attached atoms will be missing (I didn't implement this in CopyStructure). If this is important, then you'll need to use the first approach I described above. Anyway, here's the code, which should be used in combination with the code samples in the original post. For the two examples listed above, C1CC1OC2CCCC2 and C1CC1=C2CCCC2, it will produce C1CC1O.C1(CCCC1)O and C1CC1=C.C1(=C)CCCC1.
atomorder = ob.vectorUnsignedInt()

def CopyRingSystem_3(orig, copy):
    for atom in ob.OBMolAtomIter(orig):
        if atom.IsInRing():
    for bond in ob.OBMolBondIter(orig):
        if not bond.IsInRing():
    ok = orig.CopySubstructure(copy, atomsToCopy, bondsToExclude, 2, atomorder)
    assert ok

    mark_for_deletion = []
    for atom in ob.OBMolAtomIter(copy):
        if atom.GetAtomicNum() != 0:
        bond = next(ob.OBAtomBondIter(atom))
        bondorder = bond.GetBondOrder()
        nbr = bond.GetNbrAtom(atom)
        origatom = orig.GetAtom(atomorder[atom.GetIdx()-1])
        if bondorder == 2 or origatom.GetAtomicNum() != 6:
            # Turn the asterisk into the 'original atom'
            valence = origatom.BOSum() + origatom.GetImplicitHCount()
            atom.SetImplicitHCount(valence - bondorder)
            nbr.SetImplicitHCount(nbr.GetImplicitHCount() + bondorder)
    for atom in reversed(mark_for_deletion):

    return ok

No comments: