Thursday, 30 October 2008

Generating InChI's Mini-Me, the InChIKey

A recent comment about Pybel's inability to calculate the InChIKey led me to investigate. It's true enough that currently the InChIKey is not one of the available formats in Pybel. However, by accessing OpenBabel directly it's possible to generate InChIKeys. Here's how.

The InChIKey is available as an option on the InChI format. How would you find this out? Well, "babel -Hinchi" gives all of the options, one of which is option "K" indicating "output InChIKey". Here's how to do a SMILES to InChIKey conversion from Python:
import openbabel as ob

conv = ob.OBConversion()
conv.SetInAndOutFormats("smi", "inchi")
conv.SetOptions("K", conv.OUTOPTIONS)

mol = ob.OBMol()
conv.ReadString(mol, "CC(=O)Cl")
inchikey = conv.WriteString(mol).rstrip()  # drop the trailing newline
assert inchikey == "WETWJCDKMRHUPV-UHFFFAOYAQ"
A future version of Pybel will include the InChIKey format directly.

Image credit: Gustty

Tuesday, 28 October 2008

Cheminformatics toolkit face-off - Depiction Part 3

Part 1
Part 2

We've got two new additions to the face-off, both of which are still under development.

The first is a structure diagram generator which has been added to OpenBabel. This code comes from the MCDL Java applet, and has been translated from the original Java to C++ by one of its authors, Sergey Trepalin. There are a couple of rough edges so if you want anything sorted out before the next release, now's the time to test it out and get those bug reports in.

On the depiction front, Rich has just released the first beta of ChemPhoto, about which you can read more on his blog.

The same dataset was used as before, and the new additions are in the final columns: [depiction] and [structure diagram generation]. I'm sure feedback is welcome.

Notes:
(1) Some OB-generated images are missing because they took too long to generate. Remember those rough edges I mentioned...?
(2) OB-generated coordinates were depicted using OASA

Saturday, 18 October 2008

Tip for scripting a workflow

If you are writing several Python scripts that make up a workflow, e.g. if one reads an intermediate output file or pickle from another, then it's a good idea to name each Python script starting with a number. For example, the first script could be 0_parse_dockings.py, and the next one 1_calculate_enrichments.py.

This is handy for a couple of reasons:
  1. When you look at these files in six months' time, you will know in what order you should run them.
  2. When you are running the files, they will autocomplete very easily at the command line (Windows or Linux), e.g. you type "python 0", then hit TAB, and the name of the file will autocomplete. No need to think about the name of the file (is it calcresults.py or analyseresults.py?) or have the problem of several files which start with the same letter.
Anyone else got any labour-saving tips for the busy scientist?
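
As one possible answer to my own question, here's a minimal sketch of how the numeric prefixes could also drive a tiny runner script. The names ordered_scripts and run_workflow are made up for this example, and it assumes that every script in the workflow is named <number>_<something>.py and can be run without arguments:

```python
import re
import subprocess
import sys
from pathlib import Path


def ordered_scripts(directory):
    """Return the numbered Python scripts in `directory`, sorted by prefix."""
    numbered = []
    for path in Path(directory).glob("*.py"):
        match = re.match(r"(\d+)_", path.name)
        if match:  # scripts without a numeric prefix are ignored
            numbered.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(numbered)]


def run_workflow(directory):
    """Run each numbered script in order, stopping at the first failure."""
    for name in ordered_scripts(directory):
        subprocess.run([sys.executable, name], cwd=directory, check=True)
```

Sorting on the integer value of the prefix rather than on the raw filename means that a script called 10_something.py runs after 2_something.py, which a plain alphabetical sort would get wrong.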

Friday, 17 October 2008

The SCCI - towards a novel metric for analysing your publication record

What's that sound? It's the credit crunch. Long-term readers will know that I'm not one to stand by idly while the economy flounders and banks implode. Inspired by recent news that Thomson Scientific is suing Zotero, I've come up with a way to help scientists cope with the realities of the post-deprecession [1] world.

I've devised a new metric to analyse a person's publication record. It's the SCCI, Science Credit Crunch Index, a measure of the fraction of your papers that will not be available post-deprecession assuming that all of the companies/societies holding the copyright go down the tubes. In my case, it's a round 0.75. Anyone interested in helping with a bailout?
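
For anyone who wants to work out their own score, here's a minimal sketch of the calculation. The function name, the publisher_holds_copyright field and the four-paper record are all invented for illustration; the only thing taken from the definition above is that the SCCI is the fraction of papers whose copyright sits with a company or society.

```python
def scci(papers):
    """Fraction of papers whose copyright is held by a company or society."""
    if not papers:
        raise ValueError("need at least one paper to compute an SCCI")
    at_risk = sum(1 for paper in papers if paper["publisher_holds_copyright"])
    return at_risk / len(papers)


# Hypothetical publication record: three of the four papers are at risk.
my_papers = [
    {"title": "Paper A", "publisher_holds_copyright": True},
    {"title": "Paper B", "publisher_holds_copyright": True},
    {"title": "Paper C", "publisher_holds_copyright": True},
    {"title": "Paper D", "publisher_holds_copyright": False},
]

print(scci(my_papers))  # 0.75
```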

[1] It's better than a depression - it's worse than a recession.

Tuesday, 14 October 2008

Look, Cinfony no longer logo-less

Coming up with a name for a software project is pretty hard. For weeks I meditated on a remote peak trying to think of a name that encompassed the entirety of my vision for a cheminformatics toolkit. Well, that didn't work out so I just called it cinfony.

However, when it comes to a logo, it's basically a question of what's on the web that I can legally cannibalise and bung a benzene ring on top of. The result is displayed above. Not too bad, if I do say so myself, although most of the credit goes to Jean Victor Balin, OpenClipArt.org and Inkscape.

Journal of Cheminformatics - A new Open Access journal from Chemistry Central

The title says it all. Apparently, Chemistry Central is getting together a new Open Access journal, the Journal of Cheminformatics. There's a placeholder website already up, naming David Wild as editor-in-chief. There's no announcement yet on the Chemistry Central website, and no mention of a timeframe, but I guess we'll hear more in the near future...

It's a canny move, I'd say. Cheminformaticians are among the most tech-savvy of chemists, are used to constant change in their toolset (e.g. new programming languages, new libraries, new analysis methods), and are thus the most likely to adapt to new paradigms.

Wednesday, 8 October 2008

Molecular Graph-ics with Pybel

Graphs are great. There are books and books of algorithms written by generations of computer scientists that take graphs and do interesting things. And better still, there are open source programming libraries available that implement many of these algorithms. So, given a Pybel Molecule, how can it be converted for use by these libraries?

After some googling around, I found three graph libraries accessible from Python (on Windows): networkx, igraph and the Boost Graph Library (BGL). Whichever library is used, the solution is pretty much the same: iterate over all of the atoms and bonds of the molecule, and add nodes and edges to the graph.

Here's a function that takes a Pybel Molecule and returns a networkx graph (remember to install networkx first...):
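import networkx
import openbabel as ob

def mol_to_networkxgraph(mol):
    # OpenBabel atom indices start at 1, so subtract 1 to get
    # 0-based node labels.
    edges = []
    bondorders = []  # collected here, but not stored on this graph
    for bond in ob.OBMolBondIter(mol.OBMol):
        bondorders.append(bond.GetBO())
        edges.append( (bond.GetBeginAtomIdx() - 1, bond.GetEndAtomIdx() - 1) )
    g = networkx.Graph()
    g.add_edges_from(edges)
    return g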
What about making an igraph graph?
import igraph

def mol_to_igraph(mol):
    edges = []
    bondorders = []
    for bond in ob.OBMolBondIter(mol.OBMol):
        bondorders.append(bond.GetBO())
        edges.append( (bond.GetBeginAtomIdx() - 1, bond.GetEndAtomIdx() - 1) )

    # Iterating over a Pybel Molecule yields its atoms in index order
    atomtypes = [atom.type for atom in mol]

    # Store atom types on the vertices and bond orders on the edges
    g = igraph.Graph(edges=edges,
                     vertex_attrs={'atomtype': atomtypes},
                     edge_attrs={'bondorder': bondorders})
    return g
And finally, making a BGL graph:
import boost.graph

def mol_to_boostgraph(mol):
    edges = []
    bondorders = []
    for bond in ob.OBMolBondIter(mol.OBMol):
        bondorders.append(bond.GetBO())
        edges.append( (bond.GetBeginAtomIdx() - 1, bond.GetEndAtomIdx() - 1) )

    g = boost.graph.Graph(edges)

    # Attach the bond orders as an integer property on the edges
    bondordermap = g.edge_property_map("integer")
    for edge, bondorder in zip(g.edges, bondorders):
        bondordermap[edge] = bondorder
    g.edge_properties["bondorder"] = bondordermap

    # Attach the atom types as a string property on the vertices
    atomtypemap = g.vertex_property_map("string")
    atomtypes = [atom.type for atom in mol]
    for vertex, atomtype in zip(g.vertices, atomtypes):
        atomtypemap[vertex] = atomtype
    g.vertex_properties["atomtype"] = atomtypemap

    return g
Now that you have a graph you can use any of the algorithms provided by the library.
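
As a rough illustration of what that buys you, here's a small sketch using networkx. So that it runs without OpenBabel, the edge list is written out by hand as a stand-in for what the networkx function above would build: it's my own numbering of the heavy atoms of toluene (0 to 5 for the ring, 6 for the methyl carbon), not the output of any toolkit.

```python
import networkx

# Hand-written stand-in for a molecule's bond list (an assumption made for
# this example): a six-membered ring plus one substituent atom.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]

g = networkx.Graph()
g.add_edges_from(edges)

# Ring perception: a cycle basis gives one cycle per independent ring
rings = networkx.cycle_basis(g)
ring_sizes = sorted(len(ring) for ring in rings)

# Topological distance (number of bonds) between atom 6 and atom 3
topological_distance = networkx.shortest_path_length(g, source=6, target=3)

# Number of disconnected fragments
n_fragments = networkx.number_connected_components(g)

print(ring_sizes, topological_distance, n_fragments)
```

Note that a cycle basis is not the same thing as a chemist's preferred ring set for fused systems, so treat the ring sizes as a starting point only.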

Image credit: Erik Mallinson

Tuesday, 7 October 2008

Ubiquity script for SourceForge

Did you know that for any SourceForge (SF) project, its website can be found at http://whatever.sf.net or that its project page is at http://sf.net/projects/whatever? If you're a big user of SF, remembering random stuff like that can save a lot of time. However, if you want to be able to jump directly to the bugs page for a particular project, you'll need to know more than the project name - you'll need both the project ID (which is a number) and the bug tracker ID (another number).

I've just written a Ubiquity script which makes life easier if you are involved with SF projects. I can bring up the Ubiquity box, type "sf openbabel" (for example), and I'm presented with links to the project page, website, download page, bug tracker and web SVN for the OpenBabel project. This saves a lot of time clicking around on SF or trying to remember random numbers.
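
The two URL patterns that need only the project name are easy to sketch. This is Python rather than the JavaScript that Ubiquity commands are written in, and it's not the script itself: sf_links is a name invented for this example, and lower-casing the project name is my own assumption. The bug tracker and download links are left out because they need the numeric IDs mentioned above.

```python
def sf_links(project):
    """Build the SourceForge URLs that depend only on the project name."""
    name = project.strip().lower()  # assumption: project names are lower case
    return {
        "website": f"http://{name}.sf.net",
        "project page": f"http://sf.net/projects/{name}",
    }


for label, url in sf_links("openbabel").items():
    print(label, url)
```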

(For some background info on Ubiquity, see the link in my earlier post)

Wednesday, 1 October 2008

OpenBabel, IronPython and ADME filtering

Two OpenBabel-related news items...

The first release of OBDotNet is available for download from SourceForge. This allows you to use OpenBabel from C# and IronPython. Read the text file contained therein for instructions on use from IronPython. Hopefully you can extrapolate from there for C#.

Secondly, some of you may be interested in the recent publication: Lagorce, D.; Sperandio, O.; Galons, H.; Miteva, M. A.; Villoutreix, B. O. FAF-Drugs2: free ADME/tox filtering tool to assist drug discovery and chemical biology projects. BMC Bioinformatics. 2008, 9, 396. This describes a Python library for filtering compound collections using simple ADME rules and SMARTS terms. This is of particular interest to me as it is built on top of Pybel.