Tuesday, 24 May 2011

(Almost) Translate the InChI code into JavaScript Part III

Following on from Parts I and II...

Ok, so we've converted the InChI library to JavaScript. There are two ways to go from here, either call it directly from JavaScript or write a C function to call it and convert that to JavaScript (and then call that). It might seem that the second plan is more work, but it actually makes things easier as calling these JavaScriptified C functions is a bit tricky especially if we need to pass anything beyond basic C types.

The following code uses the InChI API to read in an InChI, and list the atoms and bonds in a molecule:
#include <stdio.h>
#include <string.h>
#include "inchi_api.h"

int InChI_to_Struct(char* inchi)
    inchi_InputINCHI inp;
    inchi_OutputStruct out;
    int i, j, retval;

    inp.szInChI = inchi;
    memset(&out, 0, sizeof(out));

    retval = GetStructFromINCHI(&inp, &out);
    printf("number of atoms: %d\n" , out.num_atoms);
      inchi_Atom* piat = &out.atom[i];
      printf("Atom %d: %s\n", i, piat->elname);
        printf("Bond from %d to %d of type %d\n", i, piat->neighbor[j], piat->bond_type[j]);

    FreeStructFromINCHI( &out );
    return retval;

int test_InChI_to_Struct()
    int retval;
    char inchi [] = "InChI=1S/CHClO/c2-1-3/h1H";

    retval = myInChI_to_Struct(inchi);
    return retval;

I saved the above code along with the InChI library's own C files in inchi_dll/mycode.c, and added it in the two appropriate places in the Makefile so that the compilation as described in Part II created two extra functions in inchi.js.

To test at the command line, you need to edit the run() method to call InChI_to_Struct, and then call the run() method itself. When you do this, you will find that _strtod is not implemented (so you need to add an implementation - I just pass the call to _strtol) and that there is a call to some clock-related functions (I make this just return 0 - to sort this out properly you would need to look at the original C code and figure out what this function is used for in this context). So, here it is in action if I call run("InChI=1/S/CHClO/c2-1-3/h1H"):
user@ubuntu:~/Tools/inchidemo$ ~/Tools/v8-repo/d8 inchi.js 
number of atoms: 3
Atom 0: C
Bond from 0 to 1 of type 1
Bond from 0 to 2 of type 2
Atom 1: Cl
Bond from 1 to 0 of type 1
Atom 2: O
Bond from 2 to 0 of type 2

Once tested, you can make a webpage that incorporates it. Using Chrome, check out the InChI JavaScript demo.

So...does it work? Well, almost. For some simple InChIs it works perfectly. For others, it returns an error. There are a couple of ways of tracking down the problem but, you know, I have to draw the line somewhere so I'll leave that as an exercise for the reader. Also, the page needs to be refreshed after each InChI, so there's something wrong there with the way I've set it up. The file size is currently too big, but that can be reduced by leaving out unnecessary functions (for example) as well as by using the techniques discussed in the previous post. Perhaps the biggest problem is that the code maxes out the stack space on Firefox/Spidermonkey - this can probably only be addressed by discussion with the emscripten author and changes to the InChI source code.

So that's where I'll leave it for now. I'm very impressed with how well this works - the whole idea is really quite amazing and I didn't expect to get this far, especially with such a complex piece of code. I'll leave the interested reader with a few questions: can you track down all the problems and sort them out?, what other C/C++ libraries could usefully be converted to JavaScript?, and what other languages can be generated from LLVM bytecode?

Supporting info: Various versions of the InChI JavaScript code: vanilla, for running at command-line, ready for webpage, and finally minified.

Acknowledgement: Thanks to kripken, the main emscripten author, for the rapid fix to my reported bug.

1 comment:

Egon Willighagen said...

Mmmm... impressive!