Tuesday 23 January 2018

Which came first, the atom or the bond?

Let's suppose you need to iterate over the neighbours of an atom, and you want to know both the corresponding bond order and the atomic number of the neighbour. Given that toolkits typically provide iterators over the attached bonds or the attached atoms, but not both simultaneously, how exactly should you do this?

Should you:
  • (a) iterate over the neighbouring atoms, and then request the bond joining the two atoms?
  •     for nbr in ob.OBAtomAtomIter(atom):
            bond = atom.GetBond(nbr)
    
  • or (b) iterate over the attached bonds, and ask for the atom at the other end?
  •     for bond in ob.OBAtomBondIter(atom):
            nbr = bond.GetNbrAtom(atom)
    

Obviously, either way you get your atom and bond. But which is more efficient? Clearly, the answer to this depends on the internals of the toolkit. But if you assume that each atom knows its attached atoms and bonds, then it's only the second step that determines the relative efficiency. That is:
  • (a) given two atoms find the bond that joins them, versus
  • (b) given an atom and a bond find the atom at the other end

Since the implementation of (a) will probably involve the same test that (b) is doing plus additional work, it follows that (b) must be more efficient. I never really thought about this before until I was writing the kekulization code for Open Babel. It's the sort of thing that's useful to work out once and then apply in future without thinking. Sure, the speed difference may be minimal but given that you have to choose, you might as well write it the more efficient way.

3 comments:

John Mayfield said...

It depends, but most toolkits (not all) use an incidence list rather than adjacency list. In which case as you say a) is actually doing the same b) but twice, more clear if you expand it out:

for b1 in ob.OBAtomBondIter(atom) {
nbr := b1.GetNbrAtom(atom)
for (b2 in ob.OBAtomBondIter(nbr)) {
if (b2.GetNbrAtom(nbr) == atom)
bond := b2;
}
}

John Mayfield said...

Doh formatting, good job I didn't use significant white space :-)

Noel O'Boyle said...

Good point, except about the white space :-)

John pointed out ("personal communication") that for toolkits that use incidence lists (i.e. atoms keep track of the bonds they are attached to), the iterator over attached atoms is actually a convenience function; it's just iterating over the bonds, and calling GetNbrAtom. So unless you want the atom, it's always better to iterate over the bonds.