I've been working on the SMILES code in OpenBabel over the last while. The longer I've spent on it, the more impressed I've been with how it has been handled in the code and also with what a great idea SMILES was in the first place. The same goes for InChI, which has a slightly different goal, but which goes the extra mile and solves normalisation problems which I didn't even know existed.
But do they work? Can their canonicalisation procedures ensure that two identical molecular graphs result in the same canonical SMILES or InChI?
The InChI canonicalisation procedure is summarised in Rich's post. The Daylight algorithm is in Weininger*3, JCICS, 1989, 29, 97. And the review of the field that throws both into question is Ivanciuc's review of Processing Constitutional Information in Gasteiger's Handbook of Cheminformatics.
The key question here is whether the SMILES and InChI algorithms are capable of identifying automorphisms. There is a brute force way to do this, but both SMILES and InChI try to avoid this by identifying symmetry classes using extended connectivity and various graph invariants. An explicit automorphism check is not described as part of either algorithm but yet Ivanciuc argues repeatedly (e.g. at the end of 5.1.4) that any canonicalisation algorithm that does not include an explicit automorphism check "is incomplete, and its use in a chemical database...is unreliable".
The funny thing is that although the SMILES paper came out several years prior to Gasteiger's handbook (1993 vs. 2003), it is not referenced. Furthermore, the InChI developers have followed the same route more recently.
I leave the following question as an exercise for the reader: if a counterexample to the SMILES or InChI algorithms existed, how would one find it?
Image credit: _Blaster_