Thursday 20 June 2013

Least Publishable Unit #3: Novel Approach to Pharmacophore Discovery

Here's an idea I had towards the end of my last postdoc, though I only played around with it a little. It relates to the pharmacophore search software Pharmer.

I've mentioned David Koes's Pharmer here before. For the purposes of this discussion the key point is that it can do ultra-fast pharmacophore searching. The catch is that you need to generate conformers and index them in advance, but once done the search software whizzes through them. For example, check out the ZINC Pharmer applet over at the Camacho Lab's website.

Once something becomes ultra-fast, it opens up new possibilities for its use. In particular, we can consider using an ultra-fast pharmacophore search tool to discover pharmacophores in the first place.* So imagine tuning the pharmacophore definition by searching against a set of inactives seeded with actives; you could have a GA handling the tweaking of the pharmacophore and so on.
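To make the idea concrete, here is a minimal sketch of that optimisation loop. Everything below is invented for illustration: the mock_search function stands in for a real Pharmer search (in practice you would shell out to the pharmer executable and count the hits), and the pharmacophore is reduced to a bare vector of feature tolerances.

```python
import random

# Hypothetical stand-in for an ultra-fast pharmacophore search such as
# Pharmer: here a molecule "matches" if every feature tolerance of the
# pharmacophore is at least as large as that molecule's threshold.
def mock_search(tolerances, library):
    return {mol for mol, thresholds in library.items()
            if all(t >= thr for t, thr in zip(tolerances, thresholds))}

def fitness(tolerances, library, actives):
    # Reward retrieved actives, penalise retrieved inactives.
    hits = mock_search(tolerances, library)
    return len(hits & actives) - len(hits - actives)

def mutate(tolerances, step=0.2):
    # Tweak one feature tolerance at random.
    child = list(tolerances)
    i = random.randrange(len(child))
    child[i] = max(0.1, child[i] + random.uniform(-step, step))
    return child

def optimise(library, actives, n_features=3, generations=500, seed=1):
    # A bare-bones (1+1) evolutionary strategy: keep the child whenever
    # it scores at least as well as the parent.
    random.seed(seed)
    best = [2.0] * n_features   # start with a very permissive pharmacophore
    best_score = fitness(best, library, actives)
    for _ in range(generations):
        child = mutate(best)
        score = fitness(child, library, actives)
        if score >= best_score:
            best, best_score = child, score
    return best, best_score

# Toy library: three actives with tight thresholds, three inactives that
# each need at least one loose tolerance before they match.
library = {"act1": [0.5, 0.5, 0.5], "act2": [0.4, 0.6, 0.5],
           "act3": [0.6, 0.5, 0.4], "inact1": [1.5, 0.5, 0.5],
           "inact2": [0.5, 1.6, 0.5], "inact3": [0.5, 0.5, 1.4]}
actives = {"act1", "act2", "act3"}
best, score = optimise(library, actives)
print(score)   # actives retrieved minus inactives retrieved
```

The same fitness function extends naturally to the selectivity case: just add a second set of "anti-actives" (binders of the off-target) and weight matches against them more heavily.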

This method has a couple of nice features. If you are interested in selectivity (think of the 5-HT2A versus 5-HT2C I mentioned in an earlier installment) you could penalise matches against molecules known to bind to only the wrong target and reward matches to molecules known to bind to only the right one. This was why I was interested in this method, but in the end I didn't have enough data to proceed down this road; I did put Pharmer through its paces though and tried some basic optimisations of pharmacophore definitions using this approach.

In short, as far as I know no one has ever carried out pharmacophore discovery in this way, so it could be something interesting to try, especially in the context of developing a selective pharmacophore.

* I think this possibility is mentioned at the end of the Pharmer paper. But as far as I remember I thought of this independently. Makes no difference in any case.

Sunday 16 June 2013

(Almost) Translate the InChI code into JavaScript Part IV

Readers may remember that I made an attempt to convert the InChI code to JavaScript some time ago.

Recently Michał Nowotka of the ChEMBL group used the same technique (Emscripten) but a different approach to achieve the same thing. Check out the results over at the ChEMBLog.

Saturday 15 June 2013

Are you a Google Reader reader?

If you are using Google Reader to read this, you have under three weeks to migrate to another platform (Feedly looks like a good bet) before Google pulls the plug on it.

I try not to get too involved with monitoring stats, but since Google Reader is about to disappear, it's somewhat interesting to compare the popularity of different RSS feeds (note that one blog may be available through several RSS feeds) based on the number of Google Reader subscribers. This figure is available from Google Reader if you subscribe to a URL and then click 'Feed Settings' followed by 'View Details and Statistics' (I think it may even be available programmatically).

For example, for this blog, the URL has 132 subscribers. That's a good number when you consider that In The Pipeline's has 1543. But to put things in perspective, xkcd's has 246,701. :-)

Thursday 13 June 2013

Using CTest with multiple tests in a single executable

Open Babel uses CTest for its testing. While admittedly fairly basic, its integration with the CMake build system and its support for dashboard builds make it a reasonable choice for a C++ project. Out of the box, though, the test framework appears to favour a separate executable for every test, which is something of a nuisance: these take some time to build, and they don't provide much granularity over the tests (each executable is either Pass or Fail, even if it contains multiple tests).

I've just spent some time modifying Open Babel's tests so that (a) they are all bundled in a single executable and (b) the parts of multi-part tests are called separately. The CTest docs are sparse on how to do this, so here is an attempt to describe the general gist of what I've done.

First of all, the relevant CMakeLists.txt is here. The following discussion changes some details for clarity.

This shows how to define the test names and their constituent parts:
set (cpptests
     automorphism builder ...)
set (automorphism_parts 1 2 3 4 5 6 7 8 9 10)
set (builder_parts 1 2 3 4 5)
For tests where a list of parts has not been defined we add a default of 1:
foreach(cpptest ${cpptests})
  if(NOT DEFINED "${cpptest}_parts")
    set(${cpptest}_parts "1")
  endif()
endforeach()
Don't forget the .cpp files for each test:
foreach(cpptest ${cpptests})
  set(cpptestsrc ${cpptestsrc} ${cpptest}test.cpp)
endforeach()
Each of these .cpp files should contain a function with the same name as the file (minus the extension), along the following lines:
int automorphismtest(int argc, char* argv[])
{
  int defaultchoice = 1;
  int choice = defaultchoice;

  if (argc > 1) {
    if (sscanf(argv[1], "%d", &choice) != 1) {
      printf("Couldn't parse that input as a number\n");
      return -1;
    }
  }

  switch (choice) {
  case 1:
    // ...the test code for part 1 goes here...
    break;
  // Add case statements to handle values of 2-->10
  default:
    cout << "Test #" << choice << " does not exist!\n";
    return -1;
  }

  return 0;
}
Now here's the magic. Using the filenames of the .cpp files, CMake can generate a test_runner executable:
create_test_sourcelist(srclist test_runner.cpp ${cpptestsrc})
add_executable(test_runner ${srclist} obtest.cpp)
target_link_libraries(test_runner ${libs})
When it's compiled you can run the test_runner executable and specify a particular test and subtest:
./test_runner automorphismtest 1
All that's left is to tell CMake to generate the test cases:
foreach(cpptest ${cpptests})
  foreach(part ${${cpptest}_parts})
    add_test(test_${cpptest}_${part}
             ${TEST_PATH}/test_runner ${cpptest}test ${part})
    set_tests_properties(test_${cpptest}_${part} PROPERTIES ...)
  endforeach()
endforeach()
Now, when you run "make test" or CTest directly, you will see the test output:
C:\Tools\openbabel\mySmilesValence\windows-vc2008\build>ctest -R automorphism
Test project C:/Tools/openbabel/mySmilesValence/windows-vc2008/build
      Start  6: test_automorphism_1
 1/10 Test  #6: test_automorphism_1 ..............   Passed    2.83 sec
      Start  7: test_automorphism_2
 2/10 Test  #7: test_automorphism_2 ..............   Passed    0.31 sec
      Start  8: test_automorphism_3
 3/10 Test  #8: test_automorphism_3 ..............   Passed    0.13 sec
      Start  9: test_automorphism_4
 4/10 Test  #9: test_automorphism_4 ..............   Passed    0.10 sec
      Start 10: test_automorphism_5
 5/10 Test #10: test_automorphism_5 ..............   Passed    0.15 sec
      Start 11: test_automorphism_6
 6/10 Test #11: test_automorphism_6 ..............   Passed    0.12 sec
      Start 12: test_automorphism_7
 7/10 Test #12: test_automorphism_7 ..............   Passed    0.11 sec
      Start 13: test_automorphism_8
 8/10 Test #13: test_automorphism_8 ..............   Passed    0.12 sec
      Start 14: test_automorphism_9
 9/10 Test #14: test_automorphism_9 ..............   Passed    0.10 sec
      Start 15: test_automorphism_10
10/10 Test #15: test_automorphism_10 .............   Passed    0.12 sec

100% tests passed, 0 tests failed out of 10

Total Test time (real) =   4.45 sec

Monday 10 June 2013

Least Publishable Unit #2: Find corresponding Raman peaks for deuterated species

Here's a little trick I came up with during my PhD to help figure out which resonance Raman peaks corresponded to which when comparing a non-deuterated to a deuterated species. This never made it into a publication, but it's time to get it out there.

The group to which I belonged, the Han Vos group, studied the photophysics of Ru and Os polypyridyl species. A useful tool to probe the excited state is resonance Raman spectroscopy combined with deuteration of specific ligands. When you do a vibrational frequency calculation, you specify which isotopes to use when calculating the frequencies. In fact, you can instantly repeat the calculation for other isotopes without redoing the whole analysis. In this way you can calculate the frequencies for the deuterated and undeuterated forms.
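The reason the isotope swap is essentially free is that the expensive part of the calculation, the second derivatives of the energy, doesn't depend on the nuclear masses; only the mass-weighting step does. For a diatomic the whole thing collapses to ω = √(k/μ), which is enough to illustrate the point. A toy sketch (the force constant is just a ballpark value I've assumed for a C-H stretch, not taken from any calculation):

```python
import math

def harmonic_wavenumber(k, m1, m2):
    # Harmonic vibrational wavenumber (cm-1) of a diatomic, from a force
    # constant k in N/m and atomic masses m1, m2 in amu.
    AMU = 1.66053906660e-27   # kg per amu
    C = 2.99792458e10         # speed of light in cm/s
    mu = m1 * m2 / (m1 + m2) * AMU
    return math.sqrt(k / mu) / (2 * math.pi * C)

# The force constant is mass-independent; swapping 1H for 2H only
# changes the reduced mass, so no new electronic structure calculation
# is needed.
k = 500.0                                    # N/m -- assumed ballpark value
w_ch = harmonic_wavenumber(k, 12.0, 1.008)   # C-1H
w_cd = harmonic_wavenumber(k, 12.0, 2.014)   # C-2H
print(round(w_ch / w_cd, 2))  # 1.36 -- the familiar H/D red shift
```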

This gives you a set of peaks but you don't exactly know which correspond to which. For example, in the diagram below for [Ru(bpy)3]2+, if you just consider the 1H and 2H spectra, it's fairly obvious which corresponds to v5 but you're relying on guesswork for the peaks on the left. However, it's useful to know which peaks correspond to which; one reason for this is to develop forcefields for IR calculation (I forget the details).

Anyhoo, the trick I thought of is to use isotopes with fractional masses intermediate between 1H and 2H. Then it's fairly easy to trace the shift in the peaks using the diagram above. Another way to figure out the correspondence would be to look at the wiggle vectors for the frequencies and match up the ones with corresponding wiggles. I used this to verify my results.
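Here's a rough sketch of how the tracing itself could be automated, assuming you have extracted a sorted list of calculated peak positions at each fractional mass. It matches peaks between consecutive mass steps greedily by smallest shift, then composes the matchings to link each 1H peak to its 2H partner; none of this corresponds to any particular software, it's just the nearest-neighbour idea in code:

```python
def match_step(prev, cur):
    # Greedy one-to-one matching of peaks between two consecutive mass
    # steps, pairing the smallest frequency shifts first.
    pairs = sorted((abs(p - c), i, j)
                   for i, p in enumerate(prev) for j, c in enumerate(cur))
    used_i, used_j, mapping = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_i and j not in used_j:
            mapping[i] = j
            used_i.add(i)
            used_j.add(j)
    return mapping

def trace_peaks(spectra):
    # Compose the step-by-step matchings so that each peak in the first
    # (1H) spectrum is linked to its counterpart in the last (2H) one.
    total = {i: i for i in range(len(spectra[0]))}
    for prev, cur in zip(spectra, spectra[1:]):
        step = match_step(prev, cur)
        total = {i: step[j] for i, j in total.items()}
    return total

# Toy example: two peaks whose order swaps on deuteration. The strongly
# shifting peak starts at 1100 cm-1 and ends at 900; the other barely moves.
spectra = [[1000, 1100], [998, 1060], [996, 1020],
           [980, 994], [940, 992], [900, 990]]
print(trace_peaks(spectra))  # {0: 1, 1: 0} -- the peaks have crossed
```

The finer the mass steps, the smaller the shift per step and the safer the nearest-neighbour assumption, which is exactly why fractional masses help over a straight 1H-versus-2H comparison.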

And it turned out that, according to my awesome diagram which was never published, the peaks had been misassigned in the literature: v9/v'9 and v10/v'10 did not correspond; instead it was v9/v'10 and v10/v'9. Take that, literature!!

For more info, check out Chapter 5 of my thesis which I've just found someone has scanned in and deposited in DCU's Institutional Repository. Nice.

More (and more) Open Source cheminformatics toolkits

When I were a lad, there weren't many Open Source cheminformatics toolkits out there. Things are different nowadays - you can't navigate the web without falling over new cheminformatics toolkits left, right and centre. This of course has its own problems, but having interesting new ideas floating around is a sign of a healthy community.

So here is a belated update on some new (and not-so-new) developments:
  • ChemKit - This came out some time ago now with the first release in June 2012. It is a C++ cheminformatics toolkit with Python bindings developed by Kyle Lutz (now working with Marcus Hanwell at Kitware).
  • The Avalon toolkit - Also first released publicly in June 2012, this is a Java toolkit developed internally by Bernhard Rohde at Novartis. It was used, for example, by Peter Ertl and Bernhard Rohde in their recent Molecule Cloud paper.
  • Rubabel - Not quite a new toolkit, but a Ruby-friendly wrapper around the Open Babel Ruby bindings from the Prince Lab at Brigham Young University. The general idea is similar to Pybel (i.e. make it easy to use from Ruby by using Ruby idioms and language features) but it takes a completely distinct approach and hopefully will find a following among OB's Ruby users.

Is there anything I've missed? Leave a comment to let me know...