Friday, 3 June 2022

Diagnosing problems with SMILES

For my poster at the upcoming ICCS, I wanted to categorise any problems with the SMILES strings generated by a recurrent neural network. I did this using the partialsmiles library, a validating SMILES parser I wrote a little while ago.

The speciality of this library is dealing with partial SMILES strings as they are being generated - this potentially allows you to choose an alternative token if the original token causes a problem. However, it can equally well be used with full SMILES strings. Reported errors are broken down into three catagories: valence errors, kekulisation failures and syntax errors. The error message describes the specific problem, and the index of the relevant point in the SMILES string is available. As the docs state, errors associated with the semantics of cis/trans stereo symbols are not currently handled but that's not a problem here.

I mentioned valence errors; this involves a check against a table of allowed valences. I edited the defaults to allow hypervalent nitrogen (i.e. valence 5) as it may be present in the training data. Here's a typical output:
smiles_syntax=20 smiles_valence=1 smiles_kek=22
Total errors: 43   %conversion: 95.7
 22 cases of Aromatic system cannot be kekulized
  4 cases of Unmatched close parenthesis
  3 cases of 1 branch has not been closed
  5 cases of 2 ring openings have not been closed
  6 cases of 1 ring opening has not been closed
  1 cases of 3 ring openings have not been closed
  1 cases of Uncommon valence or charge state
  1 cases of Cannot have a bond opening and closing on the same atom
...and here's the code. Note the use of defaultdict, a hidden gem of the Python library which appears in almost all of my scripts:
from collections import defaultdict

import partialsmiles as ps

if __name__ == "__main__":
    verbose = False
    fname = "Regular_SMILES_1K.smi"

    smiles_syntax = smiles_valence = smiles_kek = 0
    msgs = defaultdict(int)
    N = 0
    with open(fname) as inp:
        for line in inp:
            N += 1
            smi = line.rstrip()
            try:
                mol = ps.ParseSmiles(smi, partial=False)
            except ps.SMILESSyntaxError as e:
                if verbose:
                    print(f"SMILESSyntaxError: {e}")
                smiles_syntax += 1
                msgs[e.message] += 1
            except ps.ValenceError as e:
                if verbose:
                    print(f"ValenceError: {e}")
                smiles_valence += 1
                msgs[e.message] += 1
            except ps.KekulizationFailure as e:
                if verbose:
                    print(f"KekulizationFailure: {e}")
                smiles_kek += 1
                msgs[e.message] += 1

    print(f"{smiles_syntax=} {smiles_valence=} {smiles_kek=}")

    tot_errors = smiles_syntax + smiles_valence + smiles_kek
    print(f"Total errors: {tot_errors}   %conversion: {(N-tot_errors)*100/N:0.1f}")

    for x, y in msgs.items():
        print(f"{y:3d} cases of {x}")

Tuesday, 26 April 2022

Threading time through Vortex

Vortex (a chemical spreadsheet/visualisation software from Dotmatics) has a plugin system built around Jython. Simply drop a .vpy file into a specific scripts folder, and a menu item immediately appears in the application. Here are some notes on using this to communicate with a webserver.

Code organisation

I found it best to separate Vortex-specific code (in the .vpy files) from supporting code that could be written and tested independently. This also naturally enables reuse of code across plugins. This supporting code I put in a folder adjacent to the scripts folder, and accessed it as follows:

import os
import sys
sys.path.append(os.path.join(vortex.getVortexFolder(), "MYFOLDERNAME"))

Something to note is that the application needs to be restarted to pick up on edits made to the supporting Python codebase. This is in contrast to edits made to the .vpy files, which can be tested immediately.

Access to a JSON parser

Communicating with a webservice is most easily done with JSON. Unfortunately, the 'json' module is missing in the bundled Jython. To install your own, you can go down the Java route and download an appropriate .jar file (see for example Chris Swain here or here). As an alternative, I prefer to use Python and the same module I used back in the days of ye olde Python, simplejson, a library which is still supported. I downloaded and extracted this into the MYFOLDERNAME folder mentioned earlier, so that it was available as:

import simplejson as json
Call a webservice without blocking

Passing several hundred or more molecules to a webservice that is calculating some property can take several seconds or indeed minutes. The simple approach to this looks something like the following. Note that here we use time.sleep() as a stand-in for the webservice call:

def main():
    time.sleep(5) # pretend to contact a webservice

if __name__ == "__main__":
    main()

Unfortunately, this causes the entire application to become unresponsive until the time.sleep() is complete. This is because the main thread of a GUI application is supposed to spend its time listening out for events like you clicking on something; if it's busy doing some other work, then it can't respond to those events. The solution is to run your code on a separate thread:

class MyRunnable(java.lang.Runnable):
    def run(self):
        main()
        
if __name__ == "__main__":
    t = java.lang.Thread(MyRunnable())
    t.start()

This seems to works perfectly. The only problem I found was that it prevented any useful error messages from appearing in the console (accessed via Help/Console - diagnostic) apart from "Exception in Thread - 1" or somesuch. If this happened, I temporarily changed the code to call main() directly.

Create and close a dialog box

Given that the calculation might take a while, it's a good idea to indicate to the user that the calculation has started; for example to avoid the user starting the calculation multiple times while waiting for the result. One solution is to pop up an info box immediately to let the user know that the calculation has started, and then close it if still present at the end of the calculation. To do this, we need to create the dialog box ourselves:

import javax.swing.JOptionPane as JOptionPane

pane = JOptionPane("This might take a while...", JOptionPane.INFORMATION_MESSAGE)
dialog = pane.createDialog(None, "The calculation has begun")
dialog.setModal(False)
dialog.setVisible(True)
...
dialog.setVisible(False) # close it if still present
Conclusion

If you find this useful or have any additional tips/tricks feel free to leave a comment.

[Update 27/04/2022] On Twitter, John Mayfield adds "From a few years ago now but I remember the jython version of the Python http request was really slow and was much much faster to use Java’s libs (still via jython)". Chris Swain pointed to his Cambridge MedChem Consulting website which has a large number of useful scripts.