Noel O'Blog: July 2011

Sunday, 24 July 2011

Depict a chemical structure...without graphics Part II

In an earlier post, I showed how PNGs depicting 2D molecular structures could be viewed in a text terminal. This can be useful when you're logged into a remote server and don't have access to a graphical viewer.

Just for fun, I've thrown together an ASCII depiction format for Open Babel that uses aalib. But this time, rather than create a PNG depiction and then convert that to ASCII, this "Painter" (in the language of the Open Babel API) draws the structure directly onto an ASCII canvas. This should lead to better resolution, although I haven't done a proper head-to-head comparison. Here's how it looks in action (compare to the images in my earlier post):

Note that I haven't checked this code into SVN, and I'm not sure I will. For my own reference, I've stored some of the relevant files at this link (diff against r4530). Includes code taken from aa-helper.

Thursday, 21 July 2011

Presentation on Solar Cells at Goslar (Nov 2010)

Now that the paper has come out, I think it's safe to post the corresponding talk I gave at the 6th German Conference on Chemoinformatics at Goslar last November:

De novo design of molecular wires with optimal properties for solar energy conversion

View more presentations from baoilleach

Tuesday, 19 July 2011

Displaying pharmacophores from Pharmer in Avogadro

Avogadro has a plugin architecture that allows you to drop a Python script into a particular folder to add additional functionality, whether it be a new menu item, a new display method or whatever. I've previously used this to write an extension to read comp chem log files using cclib. This time I need to add a new display method (or Engine in the lingo of Avogadro) that will depict a pharmacophore.

So what's Pharmer? Well, until the release of Pharao by Silicos last year there were no open source pharmacophore programs. And the recent release of Pharmer by David Koes bumps the number up to two. Pharmer focuses on pharmacophore search - ultra-fast pharmacophore search, that is. It does this by precomputing an index that allows very efficient searching. The common denominator of Pharmer and Pharao is not only "Phar", but also that both use the Open Babel library.

The file format used by Pharmer is simply JSON, which is easily edited by a text editor. Here's an example that was created using the ZINC Pharmer interface: HSP90.txt. If you install Avogadro 1.0.1 with Python support on Windows, also have Python 2.6 installed in C:\Python26, and copy the Python script shown below to C:\Program Files (x86)\Avogadro\bin\engineScripts, then you can add the Pharmacophore engine (click "Add" in the "Display Types" panel; note: you can add more than one Pharmacophore engine if you want to display more than one pharmacophore at the same time). Once added, click on Settings and Browse to open the Pharmer JSON file. If nothing appears on the screen, try rotating the display slightly.

Note: The image above was created by joining the output of the following Pharmer command using "obabel --join" and depicting with Avogadro:

pharmer dbsearch -dbdir db_712_actives -in HSP90.txt -out results.sdf -reduceconfs 1 -sort-rmsd

pharmacophore.py:

import os
from PyQt4.Qt import *
import Avogadro as avo
from numpy import array

# Get JSON from my own Python installation
import sys
sys.path.append("C:\\Python26\\Lib\\site-packages")
sys.path.append("C:\\Python26\\lib")
import json

colorlookup = {
  'Hydrophobic': [0, 139, 69],
  'HydrogenAcceptor': [255, 165, 0],
  'HydrogenDonor': [211, 211, 211],
  'Aromatic': [205, 0, 205],
  'PositiveIon': [0, 0, 238],
  'NegativeIon': [255, 0, 0],
  }

class Engine(QObject):
  # declare the changed() signal
  __pyqtSignals__ = ("changed()",)

  # constructor
  def __init__(self):
    QObject.__init__(self)
    self.widget = None
    self.textbox = None
    self.filename = ""
    self.pharma = []

  def name(self):
    return "Pharmacophore"

  def flags(self):
    return avo.EngineFlags.NoFlags

  def debug(self, text):
    return
    filename = os.path.join("C:\\", "Users", "Noel", "Desktop", "tmp.txt")
    output = open(filename, "a")
    print >> output, text
    output.close()

  @pyqtSignature("")
  def browse(self):
    self.filename = str(QFileDialog.getOpenFileName())
    if not self.filename:
      return
    
    self.textbox.setText(self.filename)
    with open(self.filename) as f:
      query = json.load(f)
      self.pharma = [x for x in query['points'] if x['enabled']]
    if len(self.pharma) > 0:
      for p in self.pharma:
        x += p['x']
        y += p['y']
        z += p['z']
      x /= len(self.pharma)
      y /= len(self.pharma)
      z /= len(self.pharma)
      for p in self.pharma:
        p['x'] -= x
        p['y'] -= y
        p['z'] -= z
    self.debug("Pharma: %s" % self.pharma)
    self.emit(SIGNAL("changed()"))

  @pyqtSignature("")
  def clear(self):
    self.filename = ""
    self.textbox.setText(self.filename)
    self.pharma = []
    self.emit(SIGNAL("changed()"))
    
  def settingsWidget(self):
    self.widget = QWidget()

    layout = QVBoxLayout(self.widget)

    self.textbox = QLineEdit(self.filename)
    self.textbox.setReadOnly(True)
    layout.addWidget(self.textbox)
    
    buttons = QHBoxLayout(self.widget)
    layout.addLayout(buttons)

    self.browsebtn = QPushButton("Browse")
    self.clearbtn = QPushButton("Clear")
    buttons.addWidget(self.browsebtn)
    buttons.addWidget(self.clearbtn)
    
    QObject.connect(self.browsebtn, SIGNAL("clicked()"), self, SLOT("browse()"))
    QObject.connect(self.clearbtn, SIGNAL("clicked()"), self, SLOT("clear()"))

    return self.widget

  def renderOpaque(self, pd):
    self.debug("Len of pharma: %d" % len(self.pharma))
    if len(self.pharma) == 0:
      return
    
    # Painter 
    painter = pd.painter
    # Molecule
    molecule = pd.molecule

    # Color
    color = pd.colorMap

    for point in self.pharma:
      if point['name'] in colorlookup:
        colvals = [float(x) / 255. for x in colorlookup[point['name']]] + [1.0]
        color.setFromRgba(*colvals)
      else:
        color.setFromRgba(0.3, 0.6, 1.0, 1.0)
      painter.setColor(color)        
      
      r = point['radius']
      self.debug("Radius: %.1f" % r)
      x, y, z = point['x'], point['y'], point['z']
      painter.drawSphere(array([x, y, z]), r)

Tuesday, 12 July 2011

The alpha and omega of SMILES strings Part II

In the previous post I described why it's useful to be able to generate SMILES strings starting and ending with particular atoms. This post describes how to do it.

To begin with, one way to do this is using ring closure notation, as Rajarshi and Andrew pointed out in replies to a question of mine at the Blue Obelisk Q&A early this year. Andrew went on to write a Python script that allowed complete reordering of all of the atoms of a molecule using this method.

I was hoping to find a more elegant method than using ring closures. Also, wholesale use of ring closures can cause problems with stereochemistry as this is a corner case that not all toolkits handle correctly. In any case, I kept thinking about this on and off, and eventually got around to trying some ideas out. In the end, the solution was easier than I'd thought.

A SMILES string is generated from a depth-first tree traversal of a graph. Every atom (except for the root) has a parent atom, and 0 or more child atoms. Setting the start atom is trivial; just make that the root. It turns out that setting the end atom requires only two rules: (1) parenthesise all of the child trees of the end atom, and all but the last child trees of other atoms (this latter should be the default in any case), (2) visit child trees that do not have a route (through 'unvisited' atoms) to the end atom first.

That's it...except for the corner cases. Click Comp Chem involves replacing an implicit H of the start and end atoms; if there is an explicit bracketed H present, then it needs to be removed to free up the valence. For example, if the endatom is [nH] or [C@@H] then the H needs to go. Otherwise you have weirdoddities like the following 5-coordinate C:

C[C@@H](Br)(Cl) + I gives C[C@@H](Br)(Cl)I

Note however that this may result in a SMILES string that does not accurately represent the original structure (but that's not the point of the exercise) e.g.

c1c[nH]cc1 -> c1c[n](cc1)

What about the stereochemistry? Well, I toss that out too at the moment; an alternative would be to allow the user to specify the resulting stereochemistry at the start and end atoms.

Image credit: crashoverreason

Saturday, 9 July 2011

The alpha and omega of SMILES strings

I've just added a feature to the SMILES output in Open Babel that allows the user to specify a particular start and end atom for a SMILES string. No, no, come back! - don't go away! - it's really useful...

If you're still here, let me explain why. It allows you to take advantage of a really nice feature of SMILES strings: concatenating two SMILES strings allows you to easily generate new molecules. I call this (as of two minutes ago) Click Comp Chem or CCC (which of course is a SMILES string itself!).

With Click Comp Chem, the first atom of the second SMILES string will become joined to the "last atom" of the first SMILES string. When I say "last atom", I mean the last atom not within parentheses (i.e. not on a branch), e.g. if I add an atom to "CC(I)(Cl)" by appending "Br" to the end, then the Br will be attached to the second C atom (not the Cl).

CCC is particularly useful if generating polymers from a set of monomers (which is what we were doing in our solar cell paper), or generating a virtual library of combinatorial chemistry products (such as Jean-Claude Bradley did for his CombiUgi project, or as Duffy et al did to a create virtual library of peptides).

But a requirement for CCC is the ability to write a SMILES string such that it starts with a particular atom and ends with another. I'll explain the details of how I implemented this in a separate post. For now, here it is in action for a SMILES string for aspirin, "c1c(C(=O)O)c(OC(=O)C)ccc1". Let's suppose that we want to have the methyl of the ester (atom 10) at the end of the SMILES, and to start the SMILES with the ring carbon para to the ester (atom 13). The output options are "f" and "l" for "first" and "last":

>obabel -:"c1c(C(=O)O)c(OC(=O)C)ccc1" -osmi
c1c(C(=O)O)c(OC(=O)C)ccc1
1 molecule converted

>obabel -:"c1c(C(=O)O)c(OC(=O)C)ccc1" -osmi -xf 13
c1cc(C(=O)O)c(OC(=O)C)cc1
1 molecule converted

>obabel -:"c1c(C(=O)O)c(OC(=O)C)ccc1" -osmi -xl 10
c1c(C(=O)O)c(ccc1)OC(=O)C
1 molecule converted

>obabel -:"c1c(C(=O)O)c(OC(=O)C)ccc1" -osmi -xf 13 -xl 10
c1cc(C(=O)O)c(cc1)OC(=O)C
1 molecule converted

Image credit: Alan Cleaver

Thursday, 7 July 2011

We can design molecular wires for you wholesale

As part of a project led by Dr. Geoffrey Hutchison (University of Pittsburgh), I have been collaborating on the development of a method to design so-called "molecular wires" (conducting organic polymers) for use in organic solar cells. Using a combination of computational chemistry and cheminformatics, we have been able to find organic polymers with the required electronic structure predicted to perform well as the electron donor in solar cells.

This work is now available as an ASAP at J. Phys. Chem. C:

N.M. O'Boyle, C.M. Campbell, G.R. Hutchison.
Computational Design and Selection of Optimal Organic Photovoltaic Materials
J. Phys. Chem. C 2011. In press.

Rather than repeat the abstract here, I'll summarise the general idea. Current solar cells are based on semiconductor technology and are expensive to make both in terms of materials and energy. Organic solar cells, while never going to be as efficient at converting light to electricity, offer the possibility of cheap solar energy due to the ease of manufacture and low cost of materials. In 2006, Scharber et al described how to calculate the efficiency of an organic solar cell based on the electronic structure (i.e. HOMO, LUMO) of the solar cell components, so the question was: could we find organic polymers with the required structure to maximise efficiency?

To cut a long story short, we could. What we did was combine Open Babel/Pybel (for structure generation and forcefield optimisation), Gaussian09 (for ZINDO/S//PM6 calculations), cclib (for extracting the results) and a Python script implementing a genetic algorithm, and out popped molecular wires predicted to be highly efficient. The top candidates were then filtered according to additional criteria et voilà. Overall, over 90,000 polymers were tested - still only a small fraction (about 4%) of what we'd have had to test without the genetic algorithm.

If you're interested in a copy of the paper and don't have access to the journal, get in touch - we can distribute a number of copies through the ACS website.