Thursday 31 July 2008

Calculate circular fingerprints with Pybel II

OpenBabel 2.2.0 came out a little while ago with a lot of new features. One of the new features is a breadth-first iterator over a molecule that returns the current depth as well as the current atom (OBMolAtomBFSIter). Nice. Because now it's a doddle to write a Python script to create circular fingerprints.

The script below creates a circular fingerprint for the example molecule in Bender et al., JCICS, 2004, 44, 170 as follows:
D:\>python25 fp.py
0-C.ar;1-C.ar;1-C.ar;1-N.pl3;2-C.2;2-C.ar;2-C.ar 0
-N.pl3;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-C.ar;1-C.ar;2
-C.ar;2-C.ar;2-N.pl3 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2
-C.ar 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-
C.ar;1-C.ar;2-C.2;2-C.ar;2-C.ar 0-C.ar;1-C.2;1-C.a
r;1-C.ar;2-C.ar;2-C.ar;2-N.pl3;2-O.co2;2-O.co2 0-C
.2;1-C.ar;1-O.co2;1-O.co2;2-C.ar;2-C.ar 0-O.co2;1-
C.2;2-C.ar;2-O.co2 0-O.co2;1-C.2;2-C.ar;2-O.co2

Here's the script, which is considerably shorter than the original one.:
import pybel

# Set up a translator to Sybyl atom types
ttab = pybel.ob.OBTypeTable()
ttab.SetFromType("INT")
ttab.SetToType("SYB")

if __name__ == "__main__":
D = 3 # default max depth
mol = pybel.readstring("smi", "c1(N)ccccc1c(=O)[O-]")

fp = []
for i in range(len(mol.atoms)):
ans = []
for atom, depth in pybel.ob.OBMolAtomBFSIter(mol.OBMol, i + 1):
if depth > D: break
atomtype = ttab.Translate(atom.GetType())
ans.append("%d-%s" % (depth - 1, atomtype))
fp.append(";".join(sorted(ans)))
print "\t".join(fp)

6 comments:

  1. Congrats for the Pybel paper and thanks for the script examples ... very useful for getting into it.

    ReplyDelete
  2. Thanks for the encouragement. You can see a list of examples at:
    http://openbabel.org/wiki/Developer:Python_Tutorial

    ReplyDelete
  3. Noel, is there already a C++ example for this iterator, e.g. for creating this fingerprint?

    I tried finding an example in the source code, could you pointme to one.

    ReplyDelete
  4. FOR_BFS_OF_MOL is shown in the API docs. You can access a->GetCurrentDepth() to get the depth, or *a to get the OBAtom.

    ReplyDelete
  5. Hi,
    Thanks for the example.
    Although, the generated descriptors are counts technically and not fingerprints exactly, since the generated features for a molecule are not unique but may occur multiple times.
    The script needs modification at step were the features are append to fp list. It should be
    if (";".join(sorted(ans))) not in fp:
    fp.append(";".join(sorted(ans)))


    ReplyDelete
  6. I think they can still be called fingerprints even if they use counts, just not binary fingerprints.

    Your code eliminates the duplicates, but is this necessarily an improvement?

    ReplyDelete