Tuesday 20 November 2012

What's taking so long? - Profiling Open Babel

Profiling code shows where all the runtime is spent. In the case of Open Babel, profiling is a bit awkward due to its use of dynamically-loaded libraries (the format plugins, etc.). So here's how you do it on Linux...
[openbabel/build]$ rm CMakeCache.txt 
[openbabel/build]$ CXXFLAGS="-pg" LDFLAGS="-pg" cmake ../trunk -DCMAKE_INSTALL_PREFIX=../tree -DCMAKE_BUILD_TYPE=DEBUG -DBUILD_SHARED=OFF
[openbabel/build]$ make
This should successfully compile the library and plugins with profiling information, but will fail when it comes to linking one of the executables:
[openbabel/build]$ make obabel
[100%] Built target openbabel
Scanning dependencies of target obabel
[100%] Building CXX object tools/CMakeFiles/obabel.dir/obabel.o
Linking CXX executable ../bin/obabel
/usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in `/usr/lib/../lib64/libc.a(strcmp.o)' can not be used when making an executable; recompile with -fPIE and relink with -pie
collect2: error: ld returned 1 exit status
make[3]: *** [bin/obabel] Error 1
make[2]: *** [tools/CMakeFiles/obabel.dir/all] Error 2
make[1]: *** [tools/CMakeFiles/obabel.dir/rule] Error 2
make: *** [obabel] Error 2
Yikes! Using VERBOSE=1 we can see the offending command:
[openbabel/build]$ VERBOSE=1 make obabel 
cd /home/noel/Tools/openbabel/profile/build/tools && /usr/local/bin/cmake -E cmake_link_script CMakeFiles/obabel.dir/link.txt --verbose=1
/usr/local/bin/c++   -static -pg  -g -g3 -fno-inline   -pg CMakeFiles/obabel.dir/obabel.o  -o ../bin/obabel -rdynamic ../src/libopenbabel.a -Wl,-Bstatic -lpthread -Wl,-Bdynamic -lm -lz -Wl,-Bstatic 
/usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in `/usr/lib/../lib64/libc.a(strcmp.o)' can not be used when making an executable; recompile with -fPIE and relink with -pie
collect2: error: ld returned 1 exit status
With Roger's help, I was able to change this to something simpler which will link (dropping -static, -rdynamic and the -Wl,-Bstatic/-Wl,-Bdynamic switches):
[build/tools]$ /usr/local/bin/c++ -pg  -g -g3 -fno-inline   -pg CMakeFiles/obabel.dir/obabel.o  -o ../bin/obabel ../src/libopenbabel.a -lpthread -lm -lz
Success is mine AT LAST!! Now let's profile it:
[build/bin]$ export BABEL_DATADIR=wherever
[build/bin]$ ./obabel bigfile.smi -onul
[build/bin]$ gprof ./obabel > gprof.out
Now time to read the gprof manual.
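
While reading the manual, it can help to pull the flat profile (the first table in gprof.out) into Python so the biggest entries can be sorted or filtered. Here's a minimal sketch, assuming the standard GNU gprof flat-profile layout, where the second header line starts with "time" and ends with "name". The function name parse_flat_profile is made up for illustration; it isn't part of gprof or Open Babel.

```python
def parse_flat_profile(text):
    """Return one dict per function listed in gprof's flat profile.

    Assumes the usual GNU gprof layout: a header line beginning with
    "time" and ending with "name", then one row per function, then a
    blank line.
    """
    rows = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not in_table:
            # Skip everything up to and including the column header.
            if fields[:1] == ["time"] and fields[-1:] == ["name"]:
                in_table = True
            continue
        if not fields:
            break  # a blank line ends the table
        try:
            percent, cumulative, self_secs = (float(f) for f in fields[:3])
        except ValueError:
            break  # not a data row, so the table is over
        rest = fields[3:]
        calls = None
        # Rows with call counts carry three extra numeric columns
        # (calls, self per call, total per call) before the name.
        if len(rest) >= 4 and rest[0].isdigit():
            calls = int(rest[0])
            rest = rest[3:]
        rows.append({
            "percent": percent,
            "cumulative_seconds": cumulative,
            "self_seconds": self_secs,
            "calls": calls,
            # C++ names may contain spaces, so rejoin what is left.
            "name": " ".join(rest),
        })
    return rows
```

Something like parse_flat_profile(open("gprof.out").read())[:10] should then give the ten entries with the most self time, since gprof already sorts the flat profile that way.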

We can design molecular wires for you wholesale Part II

As described previously, last year I published a paper in J. Phys. Chem. C with Geoff Hutchison on Computational Design and Selection of Optimal Organic Photovoltaic Materials.

A week or two ago someone emailed me for a copy of the paper. Fortunately, after a one-year embargo, if you have an open access mandate you can request permission from the editor of an ACS journal to deposit the PDF in an institutional repository.

I went through this process a little while ago, and I am pleased to say that a copy of the PDF can now be found in University College Cork's institutional repository at http://hdl.handle.net/10468/748.

I'm not sure how people are going to find this though - it doesn't seem to be very prominent in Google. In particular, the PDF itself is not indexed (try a Google search with filetype:pdf).

Friday 9 November 2012

Tricks with SMILES and SMARTS Part II

A much-underused feature of SMILES is the ability to apply 'atom classes' to atoms using a colon (inside square brackets). So, for example, CC and C[CH3:6] both represent ethane, but in the latter case one of the carbons is labelled as being a member of atom class 6.

So what's the meaning of atom class 6? Well, it's whatever you want - it's simply a label that you use to indicate some related information. For example, you might want to record reaction locations, or locations of common substitutions, or mappings between different molecules (reactant/product, or sub/superstructures).

Anyhoo, here's how you access the atom class information in Open Babel from Python:
>>> import pybel
>>> ob = pybel.ob
>>> mol = pybel.readstring("smi", "C[CH3:6]")
>>> print mol.write("smi")
CC

>>> print mol.write("smi", opt={"a":True})
C[CH3:6]

>>> data = ob.toAtomClassData(mol.OBMol.GetData("Atom Class"))
>>> data.HasClass(1)
False
>>> data.GetClassString(1)
''
>>> data.HasClass(2)
True
>>> data.GetClassString(2)
':6'
>>>
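
Since the class is just a colon plus a number at the end of a bracket atom, the labels can also be pulled straight out of the SMILES string when Open Babel isn't to hand. The following is a rough sketch using only the standard library; the names atom_classes and strip_atom_classes are invented here, and it assumes plain SMILES input (not SMARTS, where a colon can mean other things).

```python
import re

# A bracket atom carrying an atom class, e.g. [CH3:6].
# Group 1 is the atom text ("CH3"), group 2 the class number ("6").
BRACKET_ATOM_WITH_CLASS = re.compile(r"\[([^\[\]:]+):(\d+)\]")


def atom_classes(smiles):
    """Return (atom_text, class_number) for each bracket atom with a class."""
    return [(m.group(1), int(m.group(2)))
            for m in BRACKET_ATOM_WITH_CLASS.finditer(smiles)]


def strip_atom_classes(smiles):
    """Remove the ':n' labels but keep the bracket atoms themselves."""
    return BRACKET_ATOM_WITH_CLASS.sub(r"[\1]", smiles)
```

Note that this only edits the text: strip_atom_classes("C[CH3:6]") leaves the brackets in place rather than simplifying to CC the way Open Babel's writer does, and the string-level view knows nothing about atom indices.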

Thursday 8 November 2012

Plotting accesses on the axis Part III

Following on from Parts I and II, this is the last in a series of posts exploring how journal access statistics provided by Open Access journals such as Journal of Cheminformatics give an insight into the relative impact of papers.

It's now over a year since the Blue Obelisk and Open Babel papers were published so let's look at the accesses over that period: ...close to straight lines with a defined slope.

And here's an update of the view over the first month, this time including the Universal SMILES paper: ...it seems like the accesses to this paper mirror almost exactly those of the Blue Obelisk paper.

In conclusion, it would be nice if journals provided these sorts of graphs, or if some third-party website (e.g. one of the altmetrics ones) did it. All of the data is on the website; it just needs to be collated as I've done here.

Wednesday 7 November 2012

Tricks with SMILES and SMARTS

Because of the relationship between SMILES and SMARTS, there are some fun tricks you can do (for some value of fun). For example, over at the NextMove blog I have written about creating a substructure hierarchy (here and here).

Here's another example I came up with in response to a recent question on the OB mailing list from Pascal Muller:
Having a molecule (let's say ethylpyridine CCc1cccnc1) and its scaffold (pyridine c1ccncc1), I would like to create a generic scaffold (smarts) for substructure searches: considered atoms become "any" atom (*), and bonds becomes "any" bond (~).
I.e., the smarts should be  CC*~1~*~*~*~*~*~1 (parts not belonging to the scaffold don't change).

Is there a way in Pybel to mutate atom / bond into "*~" apart from string replacement in the smiles? I anticipate problems with brackets doing so.
And my reply:
Atoms with atomic number 0 are written as *. Bonds with BO 4 are
written as $. So...the following hack may work for you in most cases.
:-)

>>> import pybel
>>> mol = pybel.readstring("smi", "CC(=O)Cl")
>>> mol.atoms[0].OBAtom.SetAtomicNum(0)
>>> print mol
*C(=O)Cl
>>> mol.atoms[2].OBAtom.SetAtomicNum(0)
>>> print mol
*C(=*)Cl
>>> bond = mol.OBMol.GetBond(2, 3)
>>> bond.GetBO()
2
>>> bond.SetBO(4)
>>> print mol
*C($*)Cl
>>> print mol.write("smi").replace("$", "~")
*C(~*)Cl
It probably isn't a general solution, but tricks like these can go a long way to solving a problem in many cases.