With any tool, users will often come up with ways of applying it that the developers of the tool did not originally anticipate. In the Open Babel paper, I gave an example where the InChI was used (by Fábián and Brock) as part of a workflow to identify a specific class of racemic crystals - I don't think that NIST had this use-case originally in mind.
Similarly, although Open Babel's path-based (or Daylight-type) fingerprint FP2 was developed for similarity searching of databases, we realised that users wanted to use the information in the fingerprint for other purposes. From time to time, someone would ask on the mailing list what fragments corresponded to each of the 1024 bits. At first, our response was to point out that we couldn't really say as (a) more than one fragment might correspond to a particular bit and (b) the hashing algorithm that was used to link the fragments and the bits only worked one-way.
Eventually we realised that people wanted something more, and so Chris added an output option to describe the fragments and their corresponding bits. These can be used just like fragments from other fragmentation schemes (looking for privileged fragments, unusual fragments, whatever), and the purpose of this blog post is to show how to get to grips with these fragments by visualising them.
The example molecule is:
generated by:
obabel -:N1CC1C(=O)Cl -O example.png
And here are the corresponding fragments generated by the FP2 fingerprint (scroll to zoom in the image below, click+drag to pan):
In cheminformatics, there are two reasons why one might want to calculate the RMSD between two conformers. The first is to check whether two conformers are very close in structure - e.g. for the purpose of generating a diverse set of conformers. This problem is solved using (1) a least squares alignment followed by (2) calculation of the RMSD.
The other situation is comparing two sets of 3D coordinates to see whether a prediction method has accurately reproduced experimental coordinates (e.g. docking). This just requires step (2) above.
The situation is complicated a little bit by the fact that only "heavy" atoms (i.e. non-H atoms in this context) are typically used to calculate the RMSD. A much greater complication is that automorphisms (well, isomorphisms of two molecules which are identical, to be exact) must be taken into account in both cases above. For example, consider the case where two para-substituted benzene rings must be compared; the RMSD calculation must take into account the fact that a 180 degree flip of the ring might yield a smaller RMSD.
Anyhoo, here's some Pybel code that will calculate the RMSD between a crystal pose and a set of docked poses. The code also illustrates how to access the isomorphisms. You should modify the code for your specific purpose:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters