In a recent post, Andrew Dalke compares SMILES parsing in Frowns (a no-longer-supported Open Source Python cheminformatics library) and the CDK (an actively developed Open Source Java cheminformatics library). Andrew is something of a parsing expert; indeed, BioPython is built upon a parser called Martel, which Andrew developed. The Frowns SMILES parser, contributed by Andrew, shows what an expert can do. It would be interesting to know whether similar code could be incorporated into the CDK and OpenBabel and, if so, what the speed tradeoff would be.
Andrew also discusses the compression of SMILES strings, a topic that James Melville and Jonathan Hirst have recently been studying. Rajarshi has reviewed their work as well.