An efficient approach to Matched Molecular Pair Analysis (MMPA) consists of fragmenting molecules at a subset of single bonds and then collating the results to find cases where there's the same scaffold (or linker) but a different R group (or groups). If my memory serves me correctly this algorithm was first described by Wagener and Lommerse, but it was the publication by Hussain and Rea that really popularised it.
What I want to discuss here is the importance of handling the 'missing group'. To simplify things, let's fragment only at non-ring single bonds that are attached to rings.
Single cut
Consider the fragmentation of the following three molecules:
Molecule A:
Molecule B:
Molecule C:
For molecules A and B, the algorithm will break the bond to the F and Cl, replacing it by a dummy atom; the two scaffolds are identical and so A and B will be identified as a matched pair.
We also want C to be identified as part of the same matched series, but because the hydrogens are implicit, there's nothing to break off that would yield the matching scaffold. You could convert all implicit Hs to explicit which, well, would work; however you would have to hand in your cheminformatics badge at the desk due to the performance hit :-). Instead, we handle this empty group separately by iterating over the ring atoms that have an implicit H and generating the same scaffold fragments that would have been generated in the explicit case, each time associating it with the *[H] R group:
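As a rough sketch of the idea in plain Python, here is a toy model in which every molecule is a benzene ring described only by its explicit substituents. Everything in it is an assumption made for illustration: the dictionary representation, the names `canonical_ring`, `single_cuts` and `build_index`, and the use of fluoro-, chloro- and plain benzene as stand-ins for A, B and C. A real implementation would work on molecular graphs and canonical SMILES from a toolkit. The point is only that the implicit-H positions are handled by a separate loop, without ever adding explicit hydrogens.

```python
from collections import defaultdict

RING_SIZE = 6  # toy model: every molecule is a substituted benzene


def canonical_ring(ring):
    """Return the smallest rotation/reflection so equivalent rings compare equal."""
    variants = []
    for seq in (tuple(ring), tuple(reversed(ring))):
        for k in range(len(seq)):
            variants.append(seq[k:] + seq[:k])
    return min(variants)


def single_cuts(substituents):
    """Yield (scaffold, r_group) pairs for one molecule.

    `substituents` maps ring position -> explicit substituent symbol;
    positions that are absent carry an implicit hydrogen.
    """
    ring = [substituents.get(pos, "H") for pos in range(RING_SIZE)]
    # Ordinary cuts: break the bond to each explicit substituent.
    for pos, sub in substituents.items():
        scaffold = ring.copy()
        scaffold[pos] = "*"
        yield canonical_ring(scaffold), "*" + sub
    # Special case: ring atoms with an implicit H give the same kind of
    # scaffold, paired with the *[H] R group. No explicit H atoms are added.
    for pos in range(RING_SIZE):
        if pos not in substituents:
            scaffold = ring.copy()
            scaffold[pos] = "*"
            yield canonical_ring(scaffold), "*[H]"


def build_index(molecules):
    """Collate all fragmentations: scaffold -> {(molecule name, R group)}."""
    index = defaultdict(set)
    for name, substituents in molecules.items():
        for scaffold, r_group in single_cuts(substituents):
            index[scaffold].add((name, r_group))
    return index


# Hypothetical stand-ins for A, B and C: fluoro-, chloro- and plain benzene.
molecules = {"A": {0: "F"}, "B": {0: "Cl"}, "C": {}}
index = build_index(molecules)
phenyl = canonical_ring(["*", "H", "H", "H", "H", "H"])
print(sorted(index[phenyl]))
```

In this toy, the phenyl scaffold ends up associated with `*F` from A, `*Cl` from B and `*[H]` from C, so all three land in the same matched series. Note that the six hydrogen positions of the stand-in for C all collapse onto one scaffold, which hints at how much of the work goes into this case.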
This is shown here for Molecule C, but the same procedure would apply to A and B. In fact, a significant proportion of the run time will just be dealing with this case as hydrogens on rings are not uncommon.
Double cut
The typical use of double cuts is to find molecules that are the same except for replacement of a linker between two rings. After a first cut, a second cut is done on the results if possible.
Molecules D and E share the same scaffolds, and so their linkers are identified as a matched pair. This isn't the case for Molecule F as it only gets as far as a single cut after which there is nothing left to cut for the next step.
The solution is to treat all single cut results as an additional outcome of the double cut process if (and only if) both ends are attached to a ring. After renumbering one of the locants, we associate the resulting scaffold pair with the empty linker: *-*. This procedure doesn't change anything for D and E, but gives the necessary result for F:
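The promotion step can be sketched with another toy model, again in plain Python and again built on assumptions: a molecule is a linear list of blocks, each either a ring (the made-up names `Ph` and `Py`) or a single acyclic atom, always written in the same left-to-right order. The function names, the `[*:1]`/`[*:2]` locant labels and the linkers chosen for the stand-ins for D, E and F are all hypothetical. `G` is an extra control with a ring at only one end of its single cut.

```python
from collections import defaultdict

RINGS = {"Ph", "Py"}  # hypothetical names for ring blocks


def cuttable_bonds(blocks):
    """Bond k joins blocks[k] and blocks[k + 1]; keep those that touch a ring."""
    return [k for k in range(len(blocks) - 1)
            if blocks[k] in RINGS or blocks[k + 1] in RINGS]


def double_cuts(blocks):
    """Return (scaffold_pair, linker) results for one toy molecule."""
    results = []
    cuts = cuttable_bonds(blocks)
    # Regular double cuts: everything between the two broken bonds is the linker.
    for n, k in enumerate(cuts):
        for m in cuts[n + 1:]:
            left = "-".join(blocks[:k + 1]) + "-[*:1]"
            linker = "[*:1]-" + "-".join(blocks[k + 1:m + 1]) + "-[*:2]"
            right = "[*:2]-" + "-".join(blocks[m + 1:])
            results.append(((left, right), linker))
    # Single cuts with a ring on both ends are promoted to double cuts
    # with the empty linker.
    for k in cuts:
        if blocks[k] in RINGS and blocks[k + 1] in RINGS:
            left = "-".join(blocks[:k + 1]) + "-[*:1]"
            right = "[*:1]-" + "-".join(blocks[k + 1:])
            right = right.replace("[*:1]", "[*:2]", 1)  # renumber one locant
            results.append(((left, right), "*-*"))
    return results


# Hypothetical stand-ins: D and E differ in the linker, F has none,
# G has a ring at only one end of its single cut.
molecules = {
    "D": ["Ph", "O", "Py"],
    "E": ["Ph", "C", "Py"],
    "F": ["Ph", "Py"],
    "G": ["Ph", "C"],
}
index = defaultdict(dict)
for name, blocks in molecules.items():
    for scaffold_pair, linker in double_cuts(blocks):
        index[scaffold_pair][name] = linker
print(dict(index))
```

Here the stand-ins for D, E and F all share the scaffold pair `Ph-[*:1]` / `[*:2]-Py`, with F contributing the `*-*` linker, while `G` yields nothing because one end of its only cut is not a ring atom.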
Are you finished?
Yes. Fortunately, these corner cases don't affect higher numbers of cuts. The smallest group that can have three or more cuts is a single atom, and this will be handled automatically.