Friday, 13 May 2011

(Almost) Translate the InChI code into JavaScript Part II

So, following on from Part I...

Let's download the InChI code and try to convert it to JavaScript. To put some sort of figure on the size of the codebase, the C code in INCHI_API/inchi_dll comes to 106K lines (including everything via "wc -l") or 4.8M.

The usual procedure to compile the InChI code is to type "make" in INCHI_API/gcc_so_makefile. Instead, comment out line 2 of the Makefile and then do the following to run make:
export EMMAKEN_COMPILER=/home/user/Tools/llvm-2.9/cbuild/Release/bin/clang
LINKER=/home/user/Tools/emscripten-git/tools/ SHARED_LINK=/home/user/Tools/emscripten-git/tools/ C_COMPILER=/home/user/Tools/emscripten-git/tools/ make
cd result
export PATH=~/Tools/llvm-2.9/cbuild/Release/bin:$PATH
llvm-dis -show-annotations
This creates, composed of LLVM disassembled bytecode, which we now convert to JavaScript in the same way as with "Hello World" previously:
# Run emscripten
$EMSCRIPTEN/ $V8/d8 > inchi.js

# Run the Javascript using v8
$V8/d8 inchi.js
Running it, of course, does nothing - it's just a library. Well, I say "just", but it's about 400K lines of code weighing in at 15M. With minification (YUI Compressor) we can get that down to ~7.5M, which zips down to 2MB. The emscripten author recommends passing it through Google Closure (which optimises and minifies the code) but it crashes out with some complaint about hitting a recursion limit. I don't know if it's a problem with the JavaScript code, a bug in Closure or just a feature of InChI generation. It also causes Spidermonkey (and hence Firefox) to complain about maxing out on stack space. Again, I don't know whether there's a way around this.

The next step is to write some code that does something useful with the library. That's all covered in Part III of course.


Igor said...

you might have seen this already - QEMU
ported to JavaScript and running Linux in the browser!

Noel O'Boyle said...