Thursday 31 July 2008

Calculate circular fingerprints with Pybel II

OpenBabel 2.2.0 came out a little while ago with a lot of new features. One of the new features is a breadth-first iterator over a molecule that returns the current depth as well as the current atom (OBMolAtomBFSIter). Nice. Because now it's a doddle to write a Python script to create circular fingerprints.

The script below creates a circular fingerprint for the example molecule in Bender et al., JCICS, 2004, 44, 170 as follows:
D:\>python25 fp.py
0-C.ar;1-C.ar;1-C.ar;1-N.pl3;2-C.2;2-C.ar;2-C.ar 0
-N.pl3;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-C.ar;1-C.ar;2
-C.ar;2-C.ar;2-N.pl3 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2
-C.ar 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-
C.ar;1-C.ar;2-C.2;2-C.ar;2-C.ar 0-C.ar;1-C.2;1-C.a
r;1-C.ar;2-C.ar;2-C.ar;2-N.pl3;2-O.co2;2-O.co2 0-C
.2;1-C.ar;1-O.co2;1-O.co2;2-C.ar;2-C.ar 0-O.co2;1-
C.2;2-C.ar;2-O.co2 0-O.co2;1-C.2;2-C.ar;2-O.co2

Here's the script, which is considerably shorter than the original one.:
import pybel

# Set up a translator to Sybyl atom types
ttab = pybel.ob.OBTypeTable()
ttab.SetFromType("INT")
ttab.SetToType("SYB")

if __name__ == "__main__":
D = 3 # default max depth
mol = pybel.readstring("smi", "c1(N)ccccc1c(=O)[O-]")

fp = []
for i in range(len(mol.atoms)):
ans = []
for atom, depth in pybel.ob.OBMolAtomBFSIter(mol.OBMol, i + 1):
if depth > D: break
atomtype = ttab.Translate(atom.GetType())
ans.append("%d-%s" % (depth - 1, atomtype))
fp.append(";".join(sorted(ans)))
print "\t".join(fp)

6 comments:

Unknown said...

Congrats for the Pybel paper and thanks for the script examples ... very useful for getting into it.

Noel O'Boyle said...

Thanks for the encouragement. You can see a list of examples at:
http://openbabel.org/wiki/Developer:Python_Tutorial

Unknown said...

Noel, is there already a C++ example for this iterator, e.g. for creating this fingerprint?

I tried finding an example in the source code, could you pointme to one.

Noel O'Boyle said...

FOR_BFS_OF_MOL is shown in the API docs. You can access a->GetCurrentDepth() to get the depth, or *a to get the OBAtom.

Anonymous said...

Hi,
Thanks for the example.
Although, the generated descriptors are counts technically and not fingerprints exactly, since the generated features for a molecule are not unique but may occur multiple times.
The script needs modification at step were the features are append to fp list. It should be
if (";".join(sorted(ans))) not in fp:
fp.append(";".join(sorted(ans)))


Noel O'Boyle said...

I think they can still be called fingerprints even if they use counts, just not binary fingerprints.

Your code eliminates the duplicates, but is this necessarily an improvement?