Tuesday 26 December 2023

Marvel at dc SMILES

In an alternate history, Perl would be of language of choice for cheminformatics, and Ivan Tubert-Brohman's plans for world domination would have come to fruition. He is the author of PerlMol (see also his origin story [hat-tip to Rich]):

PerlMol is a collection of Perl modules for chemoinformatics and computational chemistry with the philosophy that "simple things should be simple". It should be possible to write one-liners that use this toolkit to do meaningful "molecular munging". The PerlMol toolkit provides objects and methods for representing molecules, atoms, and bonds in Perl; doing substructure matching; and reading and writing files in various formats.

Chatting at the recent RDKit meeting, Ivan mentioned some cheminformatics code he was planning to release in a language/tool that predates Perl, and even C! It uses 'dc', the 'desktop calculator' utility on Linux and is now available on GitHub as dc-smiles:

I have come to realize that using RDKit to parse SMILES has been a distraction, because as it turns out, UNIX has shipped with a built-in SMILES parser since the 1970s! It is a special mode of the dc program; one only needs to invoke dc with the right one-word command to enable SMILES-parsing mode.

Given the legendary stability of UNIX, it only makes sense to switch back to using the 50-year old SMILES parser that comes with dc instead of relative newcomers such as RDKit.

Something tells me that the text above should not be taken too seriously. In any case, after checking it out of GitHub, you can try it out using something like:

./smi.sh "O[C@@H](Cl)C#N"

...which gives the following JSON output in Matt Swain's CommonChem format:

{
 "commonchem": {"version": 10},
 "defaults": {
  "atom":{"stereo":"unspecified"},
  "bond":{"stereo":"unspecified"}
 },
 "molecules": [{
  "atoms": [
   {"z": 8, "isotope": 0, "chg": 0},
   {"z": 6, "isotope": 0, "chg": 0, "impHs": 1},
   {"z": 17, "isotope": 0, "chg": 0},
   {"z": 6, "isotope": 0, "chg": 0},
   {"z": 7, "isotope": 0, "chg": 0}
  ],
  "bonds": [
   {"atoms": [3, 4], "bo": 3},
   {"atoms": [1, 3], "bo": 1},
   {"atoms": [1, 2], "bo": 1},
   {"atoms": [0, 1], "bo": 1}
  ]
 }]
}

How he got dc to do this, I don't understand, but may 2024 bring us all more weird and wacky cheminformatics!

No comments: