Noel O'Blog: August 2009

Wednesday, 26 August 2009

Using OpenBabel from Java

OpenBabel 2.2.3 has just been released and, with it, a new release of the Java bindings. The full details on using OpenBabel from Java are available on our wiki. On Windows, openbabel.jar is included with the OpenBabel GUI, so no additional installation is necessary. You just start Eclipse, add the jar file and away you go.

The following example shows how to use OpenBabel from Java. It includes an example of file format conversion, iteration over atoms, and using the SMARTS matcher.

import org.openbabel.*;

public class Test {

   public static void main(String[] args) {
       // Initialise
       System.loadLibrary("openbabel_java");

       // Read molecule from SMILES string
       OBConversion conv = new OBConversion();
       OBMol mol = new OBMol();
       conv.SetInFormat("smi");
       conv.ReadString(mol, "C(Cl)(=O)CCC(=O)Cl");
     
       // Print out some general information on the molecule, atoms
       conv.SetOutFormat("can");
       System.out.print("Canonical SMILES: " + conv.WriteString(mol));
       System.out.println("The molecular weight of the molecule is "
                  + mol.GetMolWt());
       for(OBAtom atom : new OBMolAtomIter(mol)) {
           System.out.println("Atom " + atom.GetIdx() +
                              ": atomic number = " + atom.GetAtomicNum() +
                              ", hybridisation = " + atom.GetHyb());
       }

       // What are the indices of the carbon atoms
       // of the acid chloride groups?
       OBSmartsPattern acidpattern = new OBSmartsPattern();
       acidpattern.Init("C(=O)Cl");
       acidpattern.Match(mol);
     
       vvInt matches = acidpattern.GetUMapList();
       System.out.println("There are " + matches.size() +
                          " acid chloride groups");
       System.out.print("The carbon atoms of the matches are: ");
       for(int i=0; i<matches.size(); i++)
           System.out.print(matches.get(i).get(0) + " ");
   }
}

The output is as follows:

Canonical SMILES: ClC(=O)CCC(=O)Cl
The molecular weight of the molecule is 154.97935999999999
Atom 1: atomic number = 6, hybridisation = 2
Atom 2: atomic number = 17, hybridisation = 0
Atom 3: atomic number = 8, hybridisation = 2
Atom 4: atomic number = 6, hybridisation = 3
Atom 5: atomic number = 6, hybridisation = 3
Atom 6: atomic number = 6, hybridisation = 2
Atom 7: atomic number = 8, hybridisation = 2
Atom 8: atomic number = 17, hybridisation = 0
There are 2 acid chloride groups
The carbon atoms of the matches are: 1 6

Note: although using OpenBabel from Eclipse on Windows works fine, some users have reported problems on Linux with the default OpenBabel build. You probably need to build OpenBabel statically on Linux if you want to use it from Eclipse, but I haven't tested this. In any case, you can always compile and run your Java code from the command line instead.

Thursday, 20 August 2009

MolCore - a new beginning for OpenBabel and RDKit

Is it possible to design something exactly right first time? In the world of software design, the answer is no. There are some design decisions whose impact you will only realise years down the line, perhaps as you try to extend the software to handle unforeseen uses. At that point, you're stuck with design decisions that you cannot easily change without major work.

A case in point - in OpenBabel, atoms are numbered from 1 but bonds from 0. Bug heaven.
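
To see why this mismatch bites, here is a small self-contained sketch. Note that ToyMol is a hypothetical stand-in I made up for illustration, not the OpenBabel API; all it does is mimic the "atoms from 1, bonds from 0" convention, so the two loops in main need different bounds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the OpenBabel API: it only mimics the
// "atoms numbered from 1, bonds numbered from 0" convention.
class ToyMol {
    private final List<String> atoms = new ArrayList<>();
    private final List<int[]> bonds = new ArrayList<>();

    void addAtom(String symbol) { atoms.add(symbol); }

    // begin and end are 1-based atom indices
    void addBond(int begin, int end) { bonds.add(new int[] {begin, end}); }

    int numAtoms() { return atoms.size(); }
    int numBonds() { return bonds.size(); }

    // Atoms are numbered 1..numAtoms()
    String atom(int idx) {
        if (idx < 1 || idx > atoms.size()) {
            throw new IndexOutOfBoundsException("atom index " + idx);
        }
        return atoms.get(idx - 1);
    }

    // Bonds are numbered 0..numBonds()-1
    int[] bond(int idx) { return bonds.get(idx); }

    public static void main(String[] args) {
        ToyMol mol = new ToyMol();   // heavy atoms of ethanol: C-C-O
        mol.addAtom("C");
        mol.addAtom("C");
        mol.addAtom("O");
        mol.addBond(1, 2);
        mol.addBond(2, 3);

        // Atom loop: starts at 1 and includes numAtoms()
        for (int i = 1; i <= mol.numAtoms(); i++) {
            System.out.println("Atom " + i + ": " + mol.atom(i));
        }
        // Bond loop: starts at 0 and excludes numBonds()
        for (int i = 0; i < mol.numBonds(); i++) {
            int[] b = mol.bond(i);
            System.out.println("Bond " + i + ": " + mol.atom(b[0]) + "-" + mol.atom(b[1]));
        }
    }
}
```

Copy one loop header onto the other and you either skip an item or run off the end, which is exactly the kind of bug a single consistent convention would rule out.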

A few weeks ago the first steps were taken towards sorting out issues like these: a new project, MolCore, was registered on SourceForge with the goal of developing a common Molecule object for both RDKit and OpenBabel. This will largely be based on RDKit code, but will pool together the collective wisdom of developers on both sides regarding things they wished had been done differently.

As ever with an open source project, all the discussion occurs in public, so if you are interested, check out the wiki pages and subscribe to the mailing list.

Monday, 17 August 2009

How does rescoring improve results in docking?

Despite more than a decade of research into improved scoring functions, a scoring function that can accurately predict binding affinities remains an elusive goal. Even the simpler problem of picking out the active ligands from a data set of inactive molecules is a challenge for modern scoring functions, although for a given protein a particular scoring function may work very well. While there is certainly a need for the development of improved scoring functions with better performance over a wider range of protein families, it is also important to make maximal use of currently available scoring functions. One of the ways to do this is to combine existing scoring functions in a so-called rescoring experiment.

Testing Assumptions and Hypotheses for Rescoring Success in Protein−Ligand Docking. Noel M. O'Boyle, John W. Liebeschuetz and Jason C. Cole, Journal of Chemical Information and Modeling, 2009, ASAP.

A rescoring experiment simply involves taking the docking poses found by Scoring Function A, and assessing them (after local optimization if you want to avoid artifacts) with Scoring Function B. Compared to the length of time a docking requires, rescoring is almost instant. Although rescoring has the potential to improve results in a virtual screen, it won't always. This means that it is important to understand the underlying reasons for success in rescoring. This would then allow the choice of appropriate Scoring Functions A and B.
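
Here's a rough sketch of the rescoring step in Java. Everything in it is an assumption made for illustration: Pose, ScoredPose and rescore are hypothetical names, the contacts and clashes fields are made-up stand-ins for real 3D geometry, the toy Scoring Function B is not a real scoring function, and I assume that a higher score is better. The local optimization step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical types for illustration only; not part of any docking package.
class RescoringSketch {

    // A docked pose. 'contacts' and 'clashes' are made-up stand-ins for real geometry.
    static final class Pose {
        final String ligand;
        final int contacts;
        final int clashes;

        Pose(String ligand, int contacts, int clashes) {
            this.ligand = ligand;
            this.contacts = contacts;
            this.clashes = clashes;
        }
    }

    // A pose together with the score assigned by Scoring Function B.
    static final class ScoredPose {
        final Pose pose;
        final double score;

        ScoredPose(Pose pose, double score) {
            this.pose = pose;
            this.score = score;
        }
    }

    // Assess the poses found by Scoring Function A with Scoring Function B,
    // then rank them best first (assumption: higher score = better).
    static List<ScoredPose> rescore(List<Pose> posesFromA,
                                    ToDoubleFunction<Pose> scoringFunctionB) {
        List<ScoredPose> rescored = new ArrayList<>();
        for (Pose pose : posesFromA) {
            rescored.add(new ScoredPose(pose, scoringFunctionB.applyAsDouble(pose)));
        }
        rescored.sort((a, b) -> Double.compare(b.score, a.score));
        return rescored;
    }

    public static void main(String[] args) {
        // Poses as they came out of the docking run driven by Scoring Function A
        List<Pose> posesFromA = new ArrayList<>();
        posesFromA.add(new Pose("ligand1", 10, 4));
        posesFromA.add(new Pose("ligand2", 8, 0));
        posesFromA.add(new Pose("ligand3", 5, 1));

        // Toy Scoring Function B: reward contacts, penalise clashes
        ToDoubleFunction<Pose> functionB = p -> p.contacts - 2.0 * p.clashes;

        for (ScoredPose sp : rescore(posesFromA, functionB)) {
            System.out.println(sp.pose.ligand + " " + sp.score);
        }
    }
}
```

The point of the sketch is that no new docking is done: the poses are fixed, and only the ranking changes. With the toy numbers above, ligand2 ends up ranked first under Scoring Function B.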

JCIM has just published some work of mine in which I investigate two hypotheses for rescoring success:

That rescoring success occurs due to some consensus effect between the two scoring functions that eliminates false positives.
That rescoring success occurs due to complementarity between the scoring functions; that is, the first scoring function is better at pose prediction, while the second is better at scoring actives relative to inactives.

As far as I am aware, this is the first study to investigate why rescoring can improve results in a virtual screen.

A cheminformatics journal by any other name...

Over at Wiley, QSAR & Combinatorial Science is retiring to make way for Molecular Informatics from 2010. The website is molinf.com.

The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature Molecular Informatics will publish so-called "Methods Corner" review-type articles which will feature important technological concepts and advances within the scope of the journal.

Apparently there's an "open access" option but I cannot find any details.