Thursday, 31 July 2008

Calculate circular fingerprints with Pybel II

OpenBabel 2.2.0 came out a little while ago with a lot of new features. One of the new features is a breadth-first iterator over a molecule that returns the current depth as well as the current atom (OBMolAtomBFSIter). Nice. Because now it's a doddle to write a Python script to create circular fingerprints.

The script below creates a circular fingerprint for the example molecule in Bender et al., JCICS, 2004, 44, 170 as follows:
D:\>python25 fp.py
0-C.ar;1-C.ar;1-C.ar;1-N.pl3;2-C.2;2-C.ar;2-C.ar 0
-N.pl3;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-C.ar;1-C.ar;2
-C.ar;2-C.ar;2-N.pl3 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2
-C.ar 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-
C.ar;1-C.ar;2-C.2;2-C.ar;2-C.ar 0-C.ar;1-C.2;1-C.a
r;1-C.ar;2-C.ar;2-C.ar;2-N.pl3;2-O.co2;2-O.co2 0-C
.2;1-C.ar;1-O.co2;1-O.co2;2-C.ar;2-C.ar 0-O.co2;1-
C.2;2-C.ar;2-O.co2 0-O.co2;1-C.2;2-C.ar;2-O.co2

Here's the script, which is considerably shorter than the original one.:
import pybel

# Set up a translator to Sybyl atom types
ttab = pybel.ob.OBTypeTable()
ttab.SetFromType("INT")
ttab.SetToType("SYB")

if __name__ == "__main__":
D = 3 # default max depth
mol = pybel.readstring("smi", "c1(N)ccccc1c(=O)[O-]")

fp = []
for i in range(len(mol.atoms)):
ans = []
for atom, depth in pybel.ob.OBMolAtomBFSIter(mol.OBMol, i + 1):
if depth > D: break
atomtype = ttab.Translate(atom.GetType())
ans.append("%d-%s" % (depth - 1, atomtype))
fp.append(";".join(sorted(ans)))
print "\t".join(fp)

6 comments:

Joerg Kurt Wegner said...

Congrats for the Pybel paper and thanks for the script examples ... very useful for getting into it.

baoilleach said...

Thanks for the encouragement. You can see a list of examples at:
http://openbabel.org/wiki/Developer:Python_Tutorial

Joerg Kurt Wegner said...

Noel, is there already a C++ example for this iterator, e.g. for creating this fingerprint?

I tried finding an example in the source code, could you pointme to one.

baoilleach said...

FOR_BFS_OF_MOL is shown in the API docs. You can access a->GetCurrentDepth() to get the depth, or *a to get the OBAtom.

Anonymous said...

Hi,
Thanks for the example.
Although, the generated descriptors are counts technically and not fingerprints exactly, since the generated features for a molecule are not unique but may occur multiple times.
The script needs modification at step were the features are append to fp list. It should be
if (";".join(sorted(ans))) not in fp:
fp.append(";".join(sorted(ans)))


baoilleach said...

I think they can still be called fingerprints even if they use counts, just not binary fingerprints.

Your code eliminates the duplicates, but is this necessarily an improvement?