The script below creates a circular fingerprint for the example molecule in Bender et al., JCICS, 2004, 44, 170 as follows:
D:\>python25 fp.py
0-C.ar;1-C.ar;1-C.ar;1-N.pl3;2-C.2;2-C.ar;2-C.ar 0
-N.pl3;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-C.ar;1-C.ar;2
-C.ar;2-C.ar;2-N.pl3 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2
-C.ar 0-C.ar;1-C.ar;1-C.ar;2-C.ar;2-C.ar 0-C.ar;1-
C.ar;1-C.ar;2-C.2;2-C.ar;2-C.ar 0-C.ar;1-C.2;1-C.a
r;1-C.ar;2-C.ar;2-C.ar;2-N.pl3;2-O.co2;2-O.co2 0-C
.2;1-C.ar;1-O.co2;1-O.co2;2-C.ar;2-C.ar 0-O.co2;1-
C.2;2-C.ar;2-O.co2 0-O.co2;1-C.2;2-C.ar;2-O.co2
Here's the script, which is considerably shorter than the original one.:
import pybel
# Set up a translator to Sybyl atom types
ttab = pybel.ob.OBTypeTable()
ttab.SetFromType("INT")
ttab.SetToType("SYB")
if __name__ == "__main__":
D = 3 # default max depth
mol = pybel.readstring("smi", "c1(N)ccccc1c(=O)[O-]")
fp = []
for i in range(len(mol.atoms)):
ans = []
for atom, depth in pybel.ob.OBMolAtomBFSIter(mol.OBMol, i + 1):
if depth > D: break
atomtype = ttab.Translate(atom.GetType())
ans.append("%d-%s" % (depth - 1, atomtype))
fp.append(";".join(sorted(ans)))
print "\t".join(fp)
Congrats for the Pybel paper and thanks for the script examples ... very useful for getting into it.
ReplyDeleteThanks for the encouragement. You can see a list of examples at:
ReplyDeletehttp://openbabel.org/wiki/Developer:Python_Tutorial
Noel, is there already a C++ example for this iterator, e.g. for creating this fingerprint?
ReplyDeleteI tried finding an example in the source code, could you pointme to one.
FOR_BFS_OF_MOL is shown in the API docs. You can access a->GetCurrentDepth() to get the depth, or *a to get the OBAtom.
ReplyDeleteHi,
ReplyDeleteThanks for the example.
Although, the generated descriptors are counts technically and not fingerprints exactly, since the generated features for a molecule are not unique but may occur multiple times.
The script needs modification at step were the features are append to fp list. It should be
if (";".join(sorted(ans))) not in fp:
fp.append(";".join(sorted(ans)))
I think they can still be called fingerprints even if they use counts, just not binary fingerprints.
ReplyDeleteYour code eliminates the duplicates, but is this necessarily an improvement?