I believe that the Open chemistry community will wish to move towards InChI as the definitive approach for all canonicalisation in their codes. We have found that "unique SMILES" is not precisely defined and there is no accepted reference implementation that is freely available. For example a given molecule (e.g. caffeine) has at least 9 representations on the public Web.
- Peter Murray-Rust, Feb 2005, Open Babel mailing list
Different software generates different canonical SMILES. The reason is simple: no one has described a canonicalisation scheme for SMILES that includes stereochemistry, so even if we wanted to generate the same SMILES, we could not. Back in 2005, Peter Murray-Rust (PMR) pointed out that the InChI could be used for this purpose. As ever, PMR was way ahead of his time, and to my knowledge no one took up this idea until...
A paper of mine has just been published in J. Cheminf.:
Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI
NM O'Boyle, Journal of Cheminformatics 2012, 4:22. doi:10.1186/1758-2946-4-22
I describe two approaches to generating a canonical SMILES: one round-trips the structure through the InChI (and so incorporates the InChI normalisation as a side effect), and the other just takes the canonical labels from the InChI (so the structure is unchanged). Both approaches are available in the development version of Open Babel as options to SMILES output, and should soon be available in Open Babel 2.3.2.
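To make the second idea concrete, here is a toy Python sketch. It is not the algorithm from the paper: it ignores stereochemistry, bond orders, charges and disconnected fragments, and the function name `canonical_order_smiles` and the data layout are invented for illustration. The canonical labels are supplied by hand (for ethanol they follow the connectivity layer `c1-2-3` of its InChI); a real implementation would have to obtain them from the InChI itself. The point is only that once every atom carries an agreed label, the order in which the input file lists the atoms no longer affects the output.

```python
from typing import Dict, List, Tuple


def canonical_order_smiles(symbols: Dict[str, str],
                           bonds: List[Tuple[str, str]],
                           labels: Dict[str, int]) -> str:
    """Write a SMILES string whose atom order depends only on `labels`.

    Toy assumptions: connected molecule, single bonds only, no stereo.
    """
    nbrs: Dict[str, List[str]] = {atom: [] for atom in symbols}
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)

    visited = set()
    ring_bonds = set()
    children: Dict[str, List[str]] = {atom: [] for atom in symbols}
    closures: Dict[str, List[int]] = {atom: [] for atom in symbols}

    def explore(atom: str, parent) -> None:
        # Depth-first walk, always taking neighbours in canonical label order.
        visited.add(atom)
        for nbr in sorted(nbrs[atom], key=labels.get):
            if nbr == parent:
                continue
            if nbr not in visited:
                children[atom].append(nbr)
                explore(nbr, atom)
            elif frozenset((atom, nbr)) not in ring_bonds:
                # Bond back to an already-written atom: a ring closure.
                ring_bonds.add(frozenset((atom, nbr)))
                digit = len(ring_bonds)
                closures[atom].append(digit)
                closures[nbr].append(digit)

    def emit(atom: str) -> str:
        digits = "".join(str(d) if d < 10 else f"%{d}"
                         for d in sorted(closures[atom]))
        text = symbols[atom] + digits
        kids = children[atom]
        for kid in kids[:-1]:          # all but the last child are branches
            text += "(" + emit(kid) + ")"
        if kids:
            text += emit(kids[-1])
        return text

    start = min(symbols, key=labels.get)  # begin at the lowest label
    explore(start, None)
    return emit(start)


# Ethanol listed in two different input orders, with the same labels
# attached to the same chemical atoms (CH3 = 1, CH2 = 2, O = 3).
ethanol_a = canonical_order_smiles(
    {"a1": "C", "a2": "C", "a3": "O"},
    [("a1", "a2"), ("a2", "a3")],
    {"a1": 1, "a2": 2, "a3": 3},
)
ethanol_b = canonical_order_smiles(
    {"x": "O", "y": "C", "z": "C"},
    [("x", "y"), ("z", "y")],
    {"x": 3, "y": 2, "z": 1},
)
print(ethanol_a, ethanol_b)  # both print CCO
```

Both calls give the same string because the traversal starts at the atom labelled 1 and visits neighbours in label order, whatever the input order was. The hard part, which this sketch takes for granted, is getting every toolkit to agree on the labels, and that is what borrowing them from the InChI provides.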
I'm hoping that other toolkits will see merit in this approach and add similar capability. This will allow, for the first time, different toolkits to generate the same SMILES, and it will finally be clear how different toolkits disagree on aspects of their chemical model. Only then will we make some progress on sorting out standard algorithms for stereocentre detection, aromatic models and so forth. And all this will be good for toolkits, and good for users.