Similarly, although Open Babel's path-based (or Daylight-type) fingerprint FP2 was developed for similarity searching of databases, we realised that users wanted to use the information in the fingerprint for other purposes. From time to time, someone would ask on the mailing list what fragments corresponded to each of the 1024 bits. At first, our response was to point out that we couldn't really say as (a) more than one fragment might correspond to a particular bit and (b) the hashing algorithm that was used to link the fragments and the bits only worked one-way.
Eventually we realised that people wanted something more, and so Chris added an output option to describe the fragments and their corresponding bits. These can be used just like fragments from other fragmentation schemes (looking for privileged fragments, unusual fragments, whatever), and the purpose of this blog post is to show how to get to grips with these fragments by visualising them.
The example molecule is:generated by:
obabel -:N1CC1C(=O)Cl -O example.pngAnd here are the corresponding fragments generated by the FP2 fingerprint (scroll to zoom in the image below, click+drag to pan):Note: In the visualisation above, hydrogens should be ignored as they are not included in the paths (we could add an option to the SVG depiction to suppress these if necessary). Also, aromatic bonds are depicted as single bonds unless a complete aromatic ring is present in the fragment.
So...how is it done?
The first step in creating this visualisation is to generate a description of the bits in the corresponding FP2 fingerprint:
obabel -:N1CC1C(=O)Cl -ofpt -xs -xf FP2 > example.txtexample.txt will contain the following:
> 0 6 1 6 <670> 0 6 1 6 1 6 <260> 0 6 1 7 1 6 <693> 0 6 1 7 1 6 1 6 <9> 0 7 1 6 <82> 0 7 1 6 1 6 <906> 0 7 1 6 1 6 1 6 <348> 0 8 2 6 <623> 0 8 2 6 1 6 <329> 0 8 2 6 1 6 1 6 <652> 0 8 2 6 1 6 1 6 1 7 <635> 0 8 2 6 1 6 1 7 <653> 0 8 2 6 1 6 1 7 1 6 <46> 0 17 <17> 0 17 1 6 <328> 0 17 1 6 1 6 <219> 0 17 1 6 1 6 1 6 <1009> 0 17 1 6 1 6 1 6 1 7 <24> 0 17 1 6 1 6 1 7 <1010> 0 17 1 6 1 6 1 7 1 6 <456> 0 17 1 6 2 8 <329> 1 7 1 6 1 6 <225>The help text for the FPT format explains what this means:
obabel -H fpt ... For the path-based fingerprint FP2, the output from the ``-xs`` option is instead a list of the chemical fragments used to set bits, e.g.:: $ obabel -:"CCC(=O)Cl" -ofpt -xs -xf FP2 > 0 6 1 6 <670> 0 6 1 6 1 6 <260> 0 8 2 6 <623> ...etc where the first digit is 0 for linear fragments but is a bond order for cyclic fragments. The remaining digits indicate the atomic number and bond order alternatively. Note that a bond order of 5 is used for aromatic bonds. For example, bit 623 above is the linear fragment O=C (8 for oxygen, 2 for double bond and 6 for carbon). ...If we want to visualise these fragments, a small Python script can read example.txt, create the corresponding molecules, and write out their SMILES strings to output.smi:Visualising a file full of SMILES strings is then easy. The following line generates the SVG depiction shown above:
obabel output.smi -O fragments.svg -xC