Similarly, although Open Babel's path-based (or Daylight-type) fingerprint FP2 was developed for similarity searching of databases, we realised that users wanted to use the information in the fingerprint for other purposes. From time to time, someone would ask on the mailing list what fragments corresponded to each of the 1024 bits. At first, our response was to point out that we couldn't really say as (a) more than one fragment might correspond to a particular bit and (b) the hashing algorithm that was used to link the fragments and the bits only worked one-way.
Eventually we realised that people wanted something more, and so Chris added an output option to describe the fragments and their corresponding bits. These can be used just like fragments from other fragmentation schemes (looking for privileged fragments, unusual fragments, whatever), and the purpose of this blog post is to show how to get to grips with these fragments by visualising them.
The example molecule is:

obabel -:"N1CC1C(=O)Cl" -O example.png

And here are the corresponding fragments generated by the FP2 fingerprint:

Note: In the visualisation above, hydrogens should be ignored as they are not included in the paths (we could add an option to the SVG depiction to suppress these if necessary). Also, aromatic bonds are depicted as single bonds unless a complete aromatic ring is present in the fragment.
So...how is it done?
The first step in creating this visualisation is to generate a description of the bits in the corresponding FP2 fingerprint:
obabel -:"N1CC1C(=O)Cl" -ofpt -xs -xf FP2 > example.txt

example.txt will contain the following:
>
0 6 1 6 <670>
0 6 1 6 1 6 <260>
0 6 1 7 1 6 <693>
0 6 1 7 1 6 1 6 <9>
0 7 1 6 <82>
0 7 1 6 1 6 <906>
0 7 1 6 1 6 1 6 <348>
0 8 2 6 <623>
0 8 2 6 1 6 <329>
0 8 2 6 1 6 1 6 <652>
0 8 2 6 1 6 1 6 1 7 <635>
0 8 2 6 1 6 1 7 <653>
0 8 2 6 1 6 1 7 1 6 <46>
0 17 <17>
0 17 1 6 <328>
0 17 1 6 1 6 <219>
0 17 1 6 1 6 1 6 <1009>
0 17 1 6 1 6 1 6 1 7 <24>
0 17 1 6 1 6 1 7 <1010>
0 17 1 6 1 6 1 7 1 6 <456>
0 17 1 6 2 8 <329>
1 7 1 6 1 6 <225>

The help text for the FPT format explains what this means:
obabel -H fpt
...
For the path-based fingerprint FP2, the output from the ``-xs`` option is
instead a list of the chemical fragments used to set bits, e.g.::

  $ obabel -:"CCC(=O)Cl" -ofpt -xs -xf FP2
  >
  0 6 1 6 <670>
  0 6 1 6 1 6 <260>
  0 8 2 6 <623>
  ...etc

where the first digit is 0 for linear fragments but is a bond order
for cyclic fragments. The remaining digits indicate the atomic number
and bond order alternatively. Note that a bond order of 5 is used for
aromatic bonds. For example, bit 623 above is the linear fragment O=C
(8 for oxygen, 2 for double bond and 6 for carbon).
...

If we want to visualise these fragments, a small Python script can read example.txt, create the corresponding molecules, and write out their SMILES strings to output.smi.

Visualising a file full of SMILES strings is then easy. The following line generates the SVG depiction shown above:
obabel output.smi -O fragments.svg -xC
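The conversion script itself is not reproduced in the text above, so here is a minimal sketch of how that step could look. It is not the original script: it uses plain Python rather than a cheminformatics toolkit, and simply follows the rules from the help text (first number is 0 or the ring-closure bond order, then atomic numbers and bond orders alternate). The names fragment_to_smiles, fragments_by_bit and write_smiles, and the small SYMBOLS lookup table, are made up for illustration. Aromatic bonds are written with ":" and how a given toolkit reads those has not been checked here.

```python
# Hypothetical, deliberately small lookup table (atomic number -> SMILES symbol).
# An element missing from it raises KeyError; extend the table as needed.
SYMBOLS = {5: "B", 6: "C", 7: "N", 8: "O", 9: "F",
           15: "P", 16: "S", 17: "Cl", 35: "Br", 53: "I"}

# Bond order -> SMILES bond symbol; single bonds are left implicit.
BONDS = {1: "", 2: "=", 3: "#", 5: ":"}


def fragment_to_smiles(line):
    """Turn one fragment line such as '0 8 2 6 <623>' into (SMILES, bit).

    Returns None for lines that do not describe a fragment (e.g. the '>' line).
    """
    tokens = line.split()
    if len(tokens) < 3 or not tokens[-1].startswith("<"):
        return None
    bit = int(tokens[-1].strip("<>"))
    numbers = [int(tok) for tok in tokens[:-1]]
    ring_bond, first_atom, rest = numbers[0], numbers[1], numbers[2:]

    smiles = SYMBOLS[first_atom]
    if ring_bond != 0:
        # Cyclic fragment: open ring closure 1 on the first atom
        smiles += BONDS[ring_bond] + "1"
    # The remaining numbers alternate: bond order, atomic number, ...
    for bond, atomic_number in zip(rest[0::2], rest[1::2]):
        smiles += BONDS[bond] + SYMBOLS[atomic_number]
    if ring_bond != 0:
        # Close the ring on the last atom
        smiles += "1"
    return smiles, bit


def fragments_by_bit(lines):
    """Group fragment SMILES by the bit they set, which exposes collisions."""
    grouped = {}
    for line in lines:
        parsed = fragment_to_smiles(line)
        if parsed is not None:
            smiles, bit = parsed
            grouped.setdefault(bit, []).append(smiles)
    return grouped


def write_smiles(in_path, out_path):
    """Write one 'SMILES<tab>bit N' line per fragment found in in_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            parsed = fragment_to_smiles(line)
            if parsed is not None:
                dst.write("%s\tbit %d\n" % parsed)
```

Calling write_smiles("example.txt", "output.smi") would then produce the input for the depiction command. Grouping by bit also shows why a bit cannot be mapped back to a single fragment: in the listing for the example molecule, bit 329 is set by both "0 8 2 6 1 6" and "0 17 1 6 2 8".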
2 comments:
As an alternative (this is for a single line of the FP2 output, but is easily adapted to the whole file):
line = "5 7 5 6 5 6 5 6 5 6 5 6 <942>"
# Define the SMILES symbols for the bond types in a dictionary
bondtypes = {1: "-", 2: "=", 3: "#", 5: ":"}
# Drop the trailing <bit> token and convert the rest to integers
part = list(map(int, line.split()[:-1]))
# Do the first atom
SMILES = "[#%s]" % part[1]
# Deal with rings: a non-zero first number is the ring-closure bond order
if part[0] != 0:
    SMILES += bondtypes[part[0]] + "1"
# Now do the remaining atoms
for i in range(2, len(part)):
    if i % 2 == 0:
        # bonds
        SMILES += bondtypes[part[i]]
    else:
        # atoms
        SMILES += "[#%s]" % part[i]
# Now finish with rings
if part[0] != 0:
    SMILES += "1"
print(SMILES)
Ah! - a nice alternative, with no need to even bust out Pybel.
A minor correction, though, is that [#6] is not valid SMILES (you're thinking of SMARTS, where it matches both C and c). A simple lookup table would take care of that, of course.