Friday 20 July 2018

Streamlining the handling of aromaticity

I've done a bit of work over the last year or two stripping back some accumulated cruft in Open Babel. One question that's been on my mind during this process is how much can you simplify the toolkit's core but yet leave it capable of complex behaviour?

Take the case of aromaticity handling. Just this part of the toolkit on it's own raises many questions. Should aromaticity be lazily perceived or require an explicit function call? What happens when the user modifies a molecule? What about copying a molecule? Or just copying a subset of the atoms to another molecule? What should happen when you add two molecules together? If you read aromaticity from a SMILES string, what if it's a different model than the toolkit uses internally? Should the SMILES writer reperceive aromaticity, or just use it as presented?

Often the easiest solution to these problems is to always do the maximum amount of work possible (i.e. throwing away perceived aromaticity information at every opportunity), which I wanted to avoid at all costs. So I went through removing bits and simplifying things, making sure that aromaticity information was copied where possible, and hoping that in the end all of the complex behaviour that I wanted to maintain would still be possible without adding back fixes or kludges. And fortunately it was. I even managed to add additional behaviour, an option to keep the aromaticity information as read from a SMILES string.

I'm a firm believer that there's no point adding features or improving things if you don't write documentation explaining why/how/when they should use it. What's the point doing this work if no-one knows how they can take advantage of it? So, I've just written up documentation on how Open Babel handles aromaticity. The following (exclusive!) excerpt describes how aromaticity information is stored by Open Babel.

Handling of aromaticity

The purpose of this section is to give an overview of how Open Babel handles aromaticity. Given that atoms can be aromatic, bonds can be aromatic, and that molecules have a flag for aromaticity perceived, it’s important to understand how these all work together.

How is aromaticity information stored?

Like many other toolkits, Open Babel stores aromaticity information separate from bond order information. This means that there isn’t a special bond order to indicate aromatic bond. Instead, aromaticity is stored as a flag on an atom as well as a flag on a bond. You can access and set this information using the following API functions:

  • OBAtom::IsAromatic(), OBAtom::SetAromatic(), OBBond::UnsetAromatic()
  • OBBond::IsAromatic(), OBBond::SetAromatic(), OBBond::UnsetAromatic()

There is a catch though, or rather a key point to note. OBMols have a flag to indicate whether aromaticity has been perceived. This is set via the following API functions:

  • OBMol::SetAromaticPerceived(), OBMol::UnsetAromaticPerceived()

The value of this flag determines the behaviour of the OBAtom and OBBond IsAromatic() functions.

  • If the flag is set, then IsAromatic() simply returns the corresponding value of the atom or bond flag.
  • If unset, then IsAromatic() triggers aromaticity perception (from scratch), and then returns the value of the flag.

See https://open-babel.readthedocs.io/en/latest/Aromaticity/Aromaticity.html for the nail-biting conclusion to this thrilling exposition.

No comments: