Introducing ReplaceR, a webapp that suggests R group or linker replacements that a medicinal chemist might consider making after the query, as well as those typically made before the query. This builds upon the ideas I described in a previous post.
In the course of a medicinal chemistry project, molecules progress through a series of changes that often involve replacement of R groups (or linkers). A key insight is that there is a certain order to these changes; in general, R groups that are simpler (in size, or in ease of synthesis, or just cheaper) will be made before those are more complex. It would be unusual to make the bromide, without having already tried chloride; if you see a tetrazole, then a carboxylic acid was likely already made.
Given true time series data from projects, we could work out this information. However, even without this, we can infer similar information from co-occurrence data in matched molecular series across publications and patents in ChEMBL (see this 2019 talk and blogpost of mine for more details). To give a sense of the method, consider that butyl co-occurs with propyl in matched series in 801 papers, but given that propyl occurs in 2765 series versus butyl's 1905 we infer that butyl is typically made after propyl (if at all). Groups in the 'after' panel are sorted by co-occurence with the query. Groups in the 'before' panel are instead sorted by co-occurrence/frequency - if we didn't do this, hydrogen or methyl would always be top of the list!
Note that no biological activity data or physicochemical property data (e.g. MWs) is used in this analysis. The suggestions are not based on bioisosteres, SAR transfer or QSAR models, but are simply a reflection of historical choices made by medicinal chemists across a large number of publications. I note that the 2021 paper by Mahendra Awale et al touches on many of the same issues.
This webapp may be useful to medicinal chemists. But in particular, computational chemists and cheminformaticians will find the app and the data behind it useful to support automatic enumeration and virtual screening of likely next steps. Note that the webapp just lists the 'nearest' 12 groups before and after. If you are interested in the full dataset, or just want to give feedback, drop me a line at baoilleach@gmail.com or file an issue on GitHub.
Cheminformatics Credits: EMBL-EBI Chemical Biology Services Team for ChEMBL data, John Mayfield et al for Chemistry Development Kit and CDKDepict, Geoff Hutchison et al for Open Babel, Greg Landrum et al for RDKit, Chen Jiang for openbabel.js, Michel Moreau for rdkit.js, Peter Ertl and Bruno Bienfait for JSME, Noel O'Boyle (me) for ReplaceR.
No comments:
Post a Comment