Recently, Avogadro/OpenBabel have been increasing their support for computational chemistry log files. I am hoping that they will learn from our experience at GaussSum/cclib.
GaussSum was the first Python program I ever wrote, and still bears the hallmarks. When I first started GaussSum (a program which analyses the results of comp chem calculations), I would use the test cases from users to fix bugs. Then over time, I'd lose the test cases as I moved from computer to computer. I couldn't place the test cases in my version control system as the test cases might have been the results of someone's research, and they mightn't be happy to see them publicly available.
Things came to head when dealing with the parsing of vibrational frequencies in the various versions of GAMESS. It turned out that each version of GAMESS (PC-GAMESS, WinGAMESS and GAMESS US) had slightly different output for vibrational frequencies. I ended up bouncing between code that worked for WinGAMESS but not GAMESS and vice versa, depending on who sent me the last bug report. In other words, I was wasting my time fixing bugs which might reappear later. It was around this time that (a) I realised I needed a test suite, and (b) I needed public domain test files, so I could use them in my test suite.
The parser used by GaussSum is now available as a separate project, cclib, and is developed in collaboration with Adam Tenderholt and Karol Langner. This time I put a lot of thought into the test suite, and I think we've done very well. The parsers are initially developed using a set of calculations which are the same for each comp chem package; our test suite ensures that the same results are found in each case and that the units are consistent. We only fix bugs for which a public domain test file is provided ("I place this file in the public domain" is all we need to hear), and regression tests are easily added to the test suite. Our test suite has the final say on commits; commits are reverted if they cause an existing test to fail. This guarantees that cclib can only improve over time.
The inevitable consequence of this policy is that some reported bugs don't get fixed. Sometimes the reporter simply does not respond to the query to place it in the public domain. On two occasions, the reporter was working in a pharmaceutical company and felt it was more hassle than it was worth to do the necessary paperwork to place it in the public domain. So it goes... On the other hand, we do now have a set of more than 200 comp chem log files which go a long way to ensuring that our parsers can handle anything that is thrown at them. The best way of getting these files is to check the data directory of cclib out of subversion and run wget.sh.
In conclusion, if you are thinking of writing software that handles comp chem files, either try to collaborate with others who are working on the same problem (e.g. cclib or OpenBabel), or at the very least take into account some of the comments here. Otherwise, you are simply building a house of cards.