Saturday, 14 February 2015

cheminformatics.js: Conclusion

And that brings me to the end of my Emscripten adventures. I also looked into compiling Indigo, the core of which compiles fine (*), but as the SVG rendering requires Cairo, it was a no-go (although others seem to have gotten it working).

I didn't discuss performance as I didn't look into it. Emscripten generates asm.js-friendly JavaScript, which is where some of the extra speed comes from, and based on benchmarks the author claims that it runs at between half and one tenth of the speed of native code... which I think is pretty good.

For converting Java toolkits to JS, an alternative approach using Google Web Toolkit has been adopted by JMol and JME. And finally, one could actually write a cheminformatics library directly in JavaScript - this is the approach used by the Kemia project.

* A minor patch to gzguts.h is needed (add #include <unistd.h>)

Image credit: Anna Lena Schiller

Thursday, 12 February 2015

cheminformatics.js: Helium

The previous two posts covered some heavyweights; this time let's turn to a newer C++ toolkit called Helium, by Tim Vandermeersch. With this toolkit, as well as depiction, I also decided to showcase Tim's SMILES validator functionality (also to be found in Open Babel as the 'smy' SMILEY format).

And here's the result.

Coming up, how 'twas done.

Compile Boost with Emscripten

Unlike with RDKit where we managed to get around the use of Boost libraries, here we need to deal with Boost face-on as Helium links to five different Boost libraries.

Following this info on StackExchange, I ran 'normally' and then edited project-config.jam by replacing "using gcc" with:
using gcc : : "/home/noel/Tools/emscripten/em++" ;
Note that the spaces are significant (e.g. preceding the final semicolon). I also set the install directory as local:
  option.set prefix : /home/noel/Tools/boostinstall ;
  option.set exec-prefix : /home/noel/Tools/boostinstall ;
  option.set libdir : /home/noel/Tools/boostinstall/lib ;
  option.set includedir : /home/noel/Tools/boostinstall/include ;
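The "using gcc" substitution could also be scripted rather than done by hand. The following is a minimal sketch, not what was actually run for this post: it works on a scratch copy whose contents are only an assumption about what the generated project-config.jam looks like, and the variable names (EMXX, JAM, workdir) are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the scratch file below is a stand-in for the real
# project-config.jam, and the variable names are hypothetical.
set -eu

EMXX="/home/noel/Tools/emscripten/em++"   # path to em++ as used in this post

# Work on a scratch copy so the sketch is self-contained; a real run would
# point JAM at the project-config.jam in the Boost source directory.
workdir=$(mktemp -d)
JAM="$workdir/project-config.jam"
printf '%s\n' \
  'if ! gcc in [ feature.values <toolset> ]' \
  '{' \
  '    using gcc ; ' \
  '}' > "$JAM"

# Replace the bare "using gcc ;" with the em++ variant, keeping the spaces
# around each colon and before the final semicolon.
sed "s|using gcc[[:space:]]*;.*|using gcc : : \"$EMXX\" ;|" "$JAM" > "$JAM.new"
mv "$JAM.new" "$JAM"

# Show the edited line (the scratch directory is left in place for inspection).
grep 'using gcc' "$JAM"
```

Matching on "using gcc" followed by optional whitespace and a semicolon means the sketch does not depend on exactly how many spaces the generated file uses.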
For building static libraries you need the following additional changes in tools/build/src/tools/gcc.jam. Change:
toolset.flags gcc.archive .AR $(condition) : $(archiver[1]) ;
toolset.flags gcc.archive .RANLIB $(condition) : $(ranlib[1]) ;
to:
toolset.flags gcc.archive .AR $(condition) : "/full/path/to/emscripten/emar" ;
toolset.flags gcc.archive .RANLIB $(condition) : "/full/path/to/emscripten/emranlib" ;
Now use "./b2" to build the libraries (iostreams, chrono, timer, filesystem, system) as follows (the NO_BZIP2 was added because it was complaining about not being able to find the relevant header file):
./b2 link=static variant=release runtime-link=static threading=single --with-iostreams -sNO_BZIP2=1 --with-chrono --with-timer --with-filesystem --with-system  install

Compile Helium with Emscripten

CMake has some trouble finding the Boost libraries so you just have to set all of the variables manually:
cmake .. -DCMAKE_TOOLCHAIN_FILE=/home/noel/Tools/emscripten/cmake/Modules/Platform/Emscripten.cmake -DBOOST_ROOT=/home/noel/Tools/boostinstall -DBoost_INCLUDE_DIR=/home/noel/Tools/boostinstall/include -DBOOST_LIBRARYDIR=/home/noel/Tools/boostinstall/lib -DBoost_CHRONO_LIBRARY_RELEASE=/home/noel/Tools/boostinstall/lib/libboost_chrono.a -DBoost_SYSTEM_LIBRARY_RELEASE=/home/noel/Tools/boostinstall/lib/libboost_system.a -DBoost_TIMER_LIBRARY_RELEASE=/home/noel/Tools/boostinstall/lib/libboost_timer.a -DBoost_IOSTREAMS_LIBRARY_RELEASE=/home/noel/Tools/boostinstall/lib/libboost_iostreams.a -DBoost_FILESYSTEM_LIBRARY_RELEASE=/home/noel/Tools/boostinstall/lib/libboost_filesystem.a -DEIGEN3_INCLUDE_DIR=/usr/include/eigen3
make -j2
The default build has "-O2"; I changed this to "-O3" in the top-level CMakeLists.txt. It also (separately) had "-g" so I removed this.

To test, use nodejs to run helium.js.

Add Helium to a webpage

The easiest way to add an additional executable is to add another 'tool'. Place webdepict.cpp in the tools directory and edit tools/CMakeLists.txt to compile it. As before, looking at the output of "VERBOSE=1 make webdepict", we can modify it to generate the HTML page which is then tweaked as desired:
/home/noel/Tools/emscripten/em++   -O3  -pedantic -Wall -Wno-long-long -Wno-sign-compare     @CMakeFiles/webdepict.dir/objects1.rsp  -o webdepict.html  ../src/ /home/noel/Tools/boostinstall/lib/libboost_chrono.a /home/noel/Tools/boostinstall/lib/libboost_timer.a /home/noel/Tools/boostinstall/lib/libboost_system.a /home/noel/Tools/boostinstall/lib/libboost_iostreams.a /home/noel/Tools/boostinstall/lib/libboost_filesystem.a -s EXPORTED_FUNCTIONS="['_ValidateSmiles', '_SmilesToSVG']" -s DISABLE_EXCEPTION_CATCHING=0  --closure 1

cheminformatics.js: RDKit

Following on from the previous post, the second toolkit we'll look at is RDKit. Again, we want to convert RDKit to JavaScript and use it to depict SMILES in the browser.

And here's the result.

Here's how the sausage was made...

Compile RDKit with Emscripten

I used the latest RDKit release (2014-09-2). When configuring, the key insight is that by turning off every option, we don't need any boost libs. Yippee! I can see myself doing this more often in future...
~/Tools/cmake-3.1.2/bin/cmake .. -DCMAKE_TOOLCHAIN_FILE=/home/noel/Tools/emscripten/cmake/Modules/Platform/Emscripten.cmake -DRDK_BUILD_PYTHON_WRAPPERS=OFF -DRDK_BUILD_CPP_TESTS=OFF -DRDK_BUILD_SLN_SUPPORT=OFF -DBoost_INCLUDE_DIR=/home/noel/Tools/boostinstall/include
make -j2

Add RDKit to a webpage

Create a webdepict.cpp file with the SmilesToSVG functionality and compile it by linking against RDKit. The following CMakeLists.txt may be useful:
  cmake_minimum_required(VERSION 3.0)
  include_directories(${RDKIT_INCLUDE_DIR} ${Boost_INCLUDE_DIR})
  add_executable(webdepict webdepict.cpp)
  target_link_libraries(webdepict ${RDKIT_LIBRARIES})
...which can be configured with...
export RDLIB=/home/noel/Tools/rdkit-2014-09-2/lib
  ~/Tools/cmake-3.1.2/bin/cmake .. -DCMAKE_TOOLCHAIN_FILE=/home/noel/Tools/emscripten/cmake/Modules/Platform/Emscripten.cmake -DRDKIT_INCLUDE_DIR=/home/noel/Tools/rdkit-2014-09-2/Code -DBoost_INCLUDE_DIR=/home/noel/Tools/boostinstall/include -DRDKIT_LIBRARIES="${RDLIB}/;${RDLIB}/;${RDLIB}/;$RDLIB/;${RDLIB}/"
If tested with nodejs, it will write out the SVG for a molecule.

Finally, as with the previous toolkit, modify the CMake build command to create the HTML page which we then edit as desired. Note that this time we need to turn on exception catching:
/home/noel/Tools/emscripten/em++ -O3 @CMakeFiles/webdepict.dir/objects1.rsp -o webdepict.html @CMakeFiles/webdepict.dir/linklibs.rsp -s EXPORTED_FUNCTIONS="['_SmilesToSVG']" -s DISABLE_EXCEPTION_CATCHING=0

Wednesday, 11 February 2015

cheminformatics.js: Open Babel

Following on from the preamble, let's convert Open Babel into JavaScript and then use it to depict SMILES in the browser.

And here's the result.

The following gives an overview of how I did this...


I did all of this on Linux, in a Xubuntu 14.04 VM running on Windows. To begin with, I'll assume you've installed Emscripten master (and all of its associated dependencies) as well as something like CMake 3.1.2. Note that you need a recent version of CMake; for example the one provided with Ubuntu 14.04 (2.8.x) won't handle the toolchain file correctly.

Compile Open Babel with Emscripten

Check out the latest code from github. For the record, I used revision 75414ad.

We need to compile Open Babel with the plugins included statically. Also, since we only need 4 plugins (ASCII, Smiles, SVG, 2D coordinate generation), we don't want to build and include the other 100 or so. To achieve this aim, some delicate customisation of the build files with a machete is required: apply the sharp edge of said instrument to include/openbabel/plugin.h, src/plugin.cpp, src/formats/formats.cmake and src/CMakeLists.txt (*).

Building now would cause some complaints about asciipainter.cpp, so change the #include for openbabel/obutil.h to <math.h> (note to self: push this upstream).
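For anyone who prefers to script that include swap, here is a small sketch. It is an illustration only: it edits a scratch file standing in for asciipainter.cpp (which I'd expect to live under src/depict in the Open Babel tree, though that location is an assumption here), and the variable names are made up.

```shell
#!/bin/sh
# Sketch only: the scratch file is a stand-in for the real asciipainter.cpp.
set -eu

workdir=$(mktemp -d)
SRC="$workdir/asciipainter.cpp"
printf '%s\n' \
  '#include <openbabel/obutil.h>' \
  '// rest of the file is unchanged' > "$SRC"

# Swap the obutil.h include for <math.h>, leaving every other line alone.
sed 's|^#include <openbabel/obutil\.h>|#include <math.h>|' "$SRC" > "$SRC.new"
mv "$SRC.new" "$SRC"

# Show the result (the scratch directory is left in place for inspection).
cat "$SRC"
```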

Create a build directory 'embuild', and from it run cmake as follows:

~/Tools/cmake-3.1.2/bin/cmake -DCMAKE_TOOLCHAIN_FILE=/home/noel/Tools/emscripten/cmake/Modules/Platform/Emscripten.cmake .. -DBUILD_SHARED=OFF -DENABLE_TESTS=OFF 
make -j2
This builds everything, right down to an obabel.js, which you can test as follows with node.js:
nodejs obabel.js -:c1ccccc1 -oascii

Add Open Babel to a webpage

We're going to create a convenience function for use from the webpage, SmilesToSVG. The appropriate code is in webdepict.cpp. To simplify building, add webdepict alongside babel and obabel in tools/CMakeLists.txt so that it is built as part of the Open Babel build.

Running 'make webdepict/fast' will generate webdepict.js, but we want to create an HTML page instead and tweak some of the settings. I've found that the easiest way to do this is to find the command that CMake is using and then edit it. We can find this with "touch webdepict.cpp && VERBOSE=1 make webdepict/fast". I edited this to give the following:
$EMSCRIPTEN/em++   -static  -O3  @CMakeFiles/webdepict.dir/objects1.rsp  -o webdepict.html @CMakeFiles/webdepict.dir/linklibs.rsp -s EXPORTED_FUNCTIONS="['_SmilesToSVG']" --closure 1

This generates webdepict.html, which you can open in a web browser. I modified the result to add some JavaScript that uses the SmilesToSVG function, et voilà.

* A tar.gz file, containing the files I refer to, can be found here.

Saturday, 7 February 2015

cheminformatics.js: Preamble

Back in the day I attempted to convert the InChI toolkit to JavaScript with the help of a project called Emscripten. It didn't work great to be honest, but I was excited by the possibilities. Since then, both Emscripten and the project on which it builds, the LLVM/Clang compiler architecture, have become much more mature, with Emscripten being adopted as a Mozilla project (which now employs the main developer, for example), and Clang now the default compiler on Mac OS X. And maybe I've changed too...or read the manual properly...or something.

Anyhoo, let's try to convert a few C++ open source cheminformatics toolkits to JavaScript and see how it goes. The goal in each case will be to create a web page where the user can type in a SMILES string which is depicted as they type.

The first post in the series will attempt to do this with Open Babel...

Thursday, 7 August 2014

In memory of Jean-Claude Bradley Part II

Here is the talk I gave at the symposium in memory of Jean-Claude Bradley.

I start off with a description of an open notebook science experiment I did for JCB, a protein-ligand docking related to malaria. I was able to pull all the details from his wiki, the datasets, the method, the reasons for certain choices, the results and even the edit history.

Next I discuss his use of webservices to develop chemistry applications and show several examples from my past. Finally, I suggest that today JCB would use Minecraft instead of Second Life if he was looking for an immersive environment in which to build chemistry activities for students.

Thursday, 5 June 2014

In memory of Jean-Claude Bradley

This Autumn I will be attending an ACS Meeting in San Francisco for the second time. The first time was in 2010 when I co-organised a symposium with Jean-Claude Bradley and Andy Lang.

I was pretty nervous. I stumbled through some opening remarks before finding my feet and paying tribute to the memory of Warren DeLano, another pioneer of openness in chemistry. When Jean-Claude arrived the next day to chair the second session, I remember thinking wow, this guy is so relaxed and confident he can just turn up in bermuda shorts and a casual shirt and not worry about whether his tie is sending out the right signals - I wish I was like that.

Subsequently, I found out that it hadn't always been like that. He had been like, well, everyone else: wearing suits every day, eagerly trying to make a good impression, following the funding, playing the game. A day came when he tired of it, looked at what he was doing, and decided it was not going to make the world a better place. So he sat down and thought about how to identify what areas of chemistry were actually "useful":
The best answer I could come up with is to trust what human researchers have to say in their papers. I developed a set of search phrases such as "what is needed now" or "what is missing is" and ran them through Google Scholar and Scirus. One of the results was "there is a pressing need for identifying and developing new drug-based antimalarial therapies".
Reactive Reports #51, 2006 Interview with David Bradley.

...and that was how he started the Useful Chemistry project. You can see the genesis of the project in his initial blog posts.

Others have commented on his legacy in Open Notebook Science. For me, his story of starting Useful Chemistry was what impressed me most: how many of us have the courage to look at our work and ask ourselves, is it useful?

To pay tribute to his remarkable vision, I will be speaking at the Jean-Claude Bradley Memorial Symposium on July 14th, organised by Andy Lang, Tony Williams and Peter Murray-Rust in Cambridge, UK. I encourage you to come along to celebrate the work of an inspiring person and to hear how others are building on his legacy.