Noel O'Blog

Wednesday, 26 August 2009

Using OpenBabel from Java

OpenBabel 2.2.3 has just been released and with that, a new release of the Java bindings. The full details on using OpenBabel from Java are available on our wiki. On Windows openbabel.jar is included with the OpenBabel GUI so no additional installation is necessary. You just start Eclipse, add the jar file and away you go.

The following example shows how to use OpenBabel from Java. It includes an example of file format conversion, iteration over atoms, and using the SMARTS matcher.

import org.openbabel.*;

public class Test {

   public static void main(String[] args) {
       // Initialise
       System.loadLibrary("openbabel_java");

       // Read molecule from SMILES string
       OBConversion conv = new OBConversion();
       OBMol mol = new OBMol();
       conv.SetInFormat("smi");
       conv.ReadString(mol, "C(Cl)(=O)CCC(=O)Cl");
     
       // Print out some general information on the molecule, atoms
       conv.SetOutFormat("can");
       System.out.print("Canonical SMILES: " + conv.WriteString(mol));
       System.out.println("The molecular weight of the molecule is "
                  + mol.GetMolWt());
       for(OBAtom atom : new OBMolAtomIter(mol)) {
           System.out.println("Atom " + atom.GetIdx() +
                              ": atomic number = " + atom.GetAtomicNum() +
                              ", hybridisation = " + atom.GetHyb());
       }

       // What are the indices of the carbon atoms
       // of the acid chloride groups?
       OBSmartsPattern acidpattern = new OBSmartsPattern();
       acidpattern.Init("C(=O)Cl");
       acidpattern.Match(mol);
     
       vvInt matches = acidpattern.GetUMapList();
       System.out.println("There are " + matches.size() +
                          " acid chloride groups");
       System.out.print("The carbon atoms of the matches are: ");
       for(int i=0; i<matches.size(); i++)
           System.out.print(matches.get(i).get(0) + " ");
   }
}

The output is as follows:

Canonical SMILES: ClC(=O)CCC(=O)Cl
The molecular weight of the molecule is 154.97935999999999
Atom 1: atomic number = 6, hybridisation = 2
Atom 2: atomic number = 17, hybridisation = 0
Atom 3: atomic number = 8, hybridisation = 2
Atom 4: atomic number = 6, hybridisation = 3
Atom 5: atomic number = 6, hybridisation = 3
Atom 6: atomic number = 6, hybridisation = 2
Atom 7: atomic number = 8, hybridisation = 2
Atom 8: atomic number = 17, hybridisation = 0
There are 2 acid chloride groups
The carbon atoms of the matches are: 1 6

Note: although using OpenBabel from Eclipse on Windows works fine, some users have reported problems on Linux with the default OpenBabel build. You probably need to build OpenBabel statically on Linux if you want to use it from Eclipse, but I haven't tested this. In any case, you can just compile it from the command line.

Thursday, 20 August 2009

MolCore - a new beginning for OpenBabel and RDKit

Is it possible to design something exactly right first time? In the world of software design, the answer is no. There are some design decisions whose impact you will only realise years down the line, perhaps as you try to extend the software to handle unforeseen uses. At that point, you're stuck with design decisions that you cannot easily change without major work.

A case in point - in OpenBabel, atoms are numbered from 1 but bonds from 0. Bug heaven.

A few weeks ago the first steps were made in sorting out these sorts of issues; a new project, MolCore, was registered on SourceForge with the goal of developing a common Molecule object for both RDKit and OpenBabel. This will largely be based on RDKit code, but will pool together the collective wisdom of developers on both sides regarding things they wished had been done differently.

As ever with an open source project, all the discussion occurs in public so if interested check out the wiki pages and subscribe to the mailing list.

Monday, 17 August 2009

How does rescoring improve results in docking?

Despite more than a decade of research into improved scoring functions, a scoring function that can accurately predict binding affinities remains an elusive goal. Even the simpler problem of identifying ligands from a data set of inactive molecules is a challenge for modern scoring functions, although for a given protein a particular scoring function may work very well. While there is certainly a need for the development of improved scoring functions with better performance over a wider range of protein families, it is also important to make the maximal use of currently available scoring functions. One of the ways to do this is to combine existing scoring functions in a so-called rescoring experiment.

Testing Assumptions and Hypotheses for Rescoring Success in Protein−Ligand Docking Noel M. O'Boyle, John W. Liebeschuetz and Jason C. Cole, Journal of Chemical Information and Modeling, 2009, ASAP.

A rescoring experiment simply involves taking the docking poses found by Scoring Function A, and assessing them (after local optimization if you want to avoid artifacts) with Scoring Function B. Compared to the length of time a docking requires, rescoring is almost instant. Although rescoring has the potential to improve results in a virtual screen, it won't always. This means that it is important to understand the underlying reasons for success in rescoring. This would then allow the choice of appropriate Scoring Functions A and B.

JCIM has just published some work of mine in which I investigate two hypotheses for rescoring success:

That rescoring success occurs due to some consensus effect between the two scoring functions that eliminates false positives
That rescoring success occurs due to complementary between the scoring functions; that is, the first scoring function is better at pose prediction, while the second is better at scoring actives relative to inactives

As far as I am aware, this is the first study to investigate why rescoring can improve results in a virtual screen.

A cheminformatics journal by any other name...

Over at Wiley, QSAR and Combinational Science is retiring to make way for Molecular Informatics from 2010. The website is molinf.com.

The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature Molecular Informatics will publish so-called "Methods Corner" review-type articles which will feature important technological concepts and advances within the scope of the journal.

Apparently there's an "open access" option but I cannot find any details.

Wednesday, 22 July 2009

Services built around open source software

Computer-aided chemistry is used today by all the major high-technology companies that are active in chemistry. Just like the meteorologist uses computers to forecast the weather, computers can be used to simulate and predict properties of molecules. This approach is documented to give companies and scientists a high return on investment. But few companies have the resources and skills to make it a reality. The cost of hardware, software, and specialized scientists makes this approach unattainable to most. hBar Lab addresses this problem by putting the required technology online. With hBar Lab there is:
No need for expensive hardware
No upfront payment for software
User-friendly interface makes it accessible for everyone, no specialized scientist necessary.

Source: hBar Lab - Computer-aided Chemistry On Demand

Support and consulting have always been ways of deriving income from open source software, but the web introduces new possibilities centered around web services. I have recently become aware of hBar Lab, whose web application is built entirely on open source software (MPQC, OpenBabel, Jmol) and who perform on-demand calculation of molecular properties:

The user login, select the property, e.g. ionization energy or geometry, and the molecule of interest, and then submit the query. The required calculations are seamlessly executed on computers in the background and once the calculations are done, the results will be returned in the user's inbox. It is as simple as that.

An interesting idea.

TwirlyMol - Status update re world domination

TwirlyMol was the world's first Javascript molecular viewer with shadows. It has been described as "and of course the shadows are cool" by Felix of Chemical Quantum Images.

Although TwirlyMol was only released into the wild to fend for itself in January, it has swiftly outpaced Chime and is rapidly approaching Jmol-like levels of deployment.

Well, almost. At least one ~~other~~ person is using it anyway. As part of a chemistry education project at the University of Wisconsin, TwirlyMol is being used on the ChemPrime wiki and on a student education portal, both of which look like two interesting resources under development. However, you should be warned - the TwirlyMol shadows have been removed!

TwirlyMol is freely available under a do-what-you-want-with-it license. You can even (*sob*) remove the shadows.

Wednesday, 15 July 2009

ANN: Symposium on Visual Analysis of Chemical Data (ACS Spring 2010)

Update 06/Sept/09: See second call for papers.

First Call for Papers:
Visual Analysis of Chemical Data
239th ACS National Meeting
San Francisco, March 21-25, 2010
CINF Division

Dear Colleagues,

We wish to announce an upcoming symposium focusing on innovative methods for visual representation and analysis of chemical data. Just as Edward Tufte has championed maximizing clarity and information content in statistical graphics, there is a need for methods to display chemical information that will maximize understanding, and allow rapid analysis and decision making.

We invite you to submit contributions that address various aspects of visualization of chemical data (such as structures, SAR data, literature, patents) including, but not limited to, the following topics:

With an ever increasing pool of descriptors, along with new and more sophisticated machine learning methods, QSAR models are becoming more difficult to interpret. How can information on model reliability, the presence of activity cliffs, and the range of applicability of a model and other relevant model properties be easily depicted?
Recently, virtual worlds 3D such as Second Life have presented new opportunities and challenges for the representation of chemical data. What is the potential of such a medium in education and communicating with the chemistry community?
Social software allows for rapid and convenient sharing of chemical data. Examples include Google Spreadsheets, ManyEyes, DabbleDB, and wikis, including Wikipedia. What are the implications for chemical research and education?
The visualization of the contents of large chemical datasets presents particular problems. How can an overview of the dataset be visualized so that it presents both the nature of the contents as well as the degree of diversity and similarity within the dataset? How can different datasets be visually compared?
Depicting 3D chemical information in 2D involves a loss of information. However, innovative 2D visualization methods can restore the most relevant information.
Chemical information comprises a diverse array of data types including chemical structures and diagrams (2D and 3D), associated assay results, conformations, QSAR models and their predictions. The visualization and integration of all these data into a single interface that aids interpretation and analysis is a continuing challenge.

We would also like to point out that sponsorship opportunities are available.

The on-line abstract submission system (PACS) will be open for submissions from 24th August. A second announcement will be made at that time.

Please contact Andrew, Jean-Claude or myself if you have any questions.

Yours sincerely,
Noel O'Boyle

On behalf of the symposium organizers:

Dr. Jean-Claude Bradley,
Drexel University, PA
bradlejc@drexel.edu

Dr. Andrew Lang,
Oral Roberts University, OK
alang@oru.edu

Dr. Noel O’Boyle,
Cambridge Crystallographic Data Centre, U.K.
oboyle@ccdc.cam.ac.uk

Image credit: prehensile