Tuesday 24 March 2009

The Clockwisdom of SMILES Part II

As many readers of this blog will be aware, a chiral SMILES is not a lopsided grin. Instead it is a way of describing the relative spatial arrangement of groups around a chiral centre using SMILES notation.

The examples below explore this notation. The stereotypical ones are:
A. C[C@](Br)(Cl)I - ACW(C,Br,Cl,I)
B. C[C@H](Br)I    - ACW(C,H,Br,I)
where ACW(w,x,y,z) indicates that x, y and z run anticlockwise when viewed from w. However, note that the first group does not necessarily need to appear before the chiral carbon:
C. [C@](C)(Br)(Cl)I - ACW(C,Br,Cl,I) (Same as A)
D. [C@H](C)(Br)I    - ACW(H,C,Br,I)  (Opposite of B)

What about ring closures? These are handled as follows:
E. C[C@H]1CCN1 - ACW(C,H,N,C) (the 1 indicates a bond to the chiral C)
F. C[C@]12N(CC2)C1 - ACW(C, the C1 carbon, the C2 carbon, N)
G. C[C@@]21N(CC2)C1 - CW(C, the C2 carbon, the C1 carbon, N) (the same as F)
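The "same as" and "opposite of" claims above can be checked mechanically once the neighbour order of each chiral centre has been written out by hand. The following is only a minimal sketch: it does not parse SMILES, the function names swaps_to_match and same_stereo are hypothetical, and the neighbour lists are simply transcribed from the ACW/CW descriptions in examples A to G.

```python
def swaps_to_match(order, target):
    """Count the pairwise swaps used to rearrange `order` into `target`.

    Assumes both lists hold the same, unique labels.
    """
    order = list(order)
    swaps = 0
    for i, wanted in enumerate(target):
        j = order.index(wanted, i)
        if j != i:
            order[i], order[j] = order[j], order[i]
            swaps += 1
    return swaps


def same_stereo(a, b):
    """Compare two centres, each given as (symbol, neighbours).

    `symbol` is '@' or '@@'. An odd number of swaps must be paired with a
    flipped symbol; an even number must keep the symbol unchanged.
    """
    (sym_a, nbrs_a), (sym_b, nbrs_b) = a, b
    odd_swaps = swaps_to_match(nbrs_a, nbrs_b) % 2 == 1
    flipped = sym_a != sym_b
    return odd_swaps == flipped


# Neighbour orders copied by hand from the examples above.
ex_a = ("@", ["C", "Br", "Cl", "I"])
ex_c = ("@", ["C", "Br", "Cl", "I"])
ex_b = ("@", ["C", "H", "Br", "I"])
ex_d = ("@", ["H", "C", "Br", "I"])
ex_f = ("@", ["C", "C1", "C2", "N"])
ex_g = ("@@", ["C", "C2", "C1", "N"])

print(same_stereo(ex_a, ex_c))  # True  (C is the same as A)
print(same_stereo(ex_b, ex_d))  # False (D is the opposite of B)
print(same_stereo(ex_f, ex_g))  # True  (G is the same as F)
```

The hard part, working out the neighbour order from the SMILES string itself, is deliberately left to the reader here; that is exactly where the ring closure rules matter.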

Note that ring closures directly before an atom do not indicate a bond to that atom. Try to draw the following and compare your result to that given by Daylight's Depict service:
H. [C@@]123[C@H](C(C=C3)(C)C)CC[C@@](C1)(CCC2)C
If you got that right, consider yourself a SMILES ninja.

Credit: Thanks to Craig James for his patient explanations.
Image credit: sean-b

Friday 20 March 2009

Time for a test - Any questions?

I'm a great believer in tests for code quality. In fact, I don't want to contribute code to a project if I can't add a test to the test suite. This is particularly important in collaborative projects where changes by others might impact on bugs I've fixed or features I've added. I've learned my lesson in the past. With a test suite, I can be sure that everything is still working the way I expect it.

I've recently started a new test suite for OpenBabel called obunittest. Although OpenBabel already has a test suite ("make test"), I wanted to have a test suite written in Python where people could easily add new tests.

obunittest is hosted at github and all of the necessary instructions are available at the obunittest website (just scroll down). Git itself isn't required, but you may find it interesting to use - to do so, just create an account on github and fork my project.

So this is your chance to add a test for OpenBabel. Now while this might not be everyone's idea of a fun time, if there's some feature of OpenBabel upon which you rely, write a test for it and send it to me (or "git it" to me). This will ensure that that particular feature will always work in future OpenBabel releases. The same goes if there's something that you know is currently broken - just write a test. Remember that a stitch in time means you won't be saying "darn".

Image credit: Duncan Hull (hi!)

Wednesday 18 March 2009

The Clockwisdom of SMILES

I was recently confronted with a question that many of us face at some point in our lives: how many ways can the groups attached to a chiral C be moved around in a SMILES string while retaining the clockwisdom?

What's all this about clockwisdom? Well, a chiral SMILES string can indicate R or S around a tetrahedral centre using C@ or C@@. The difference is that R or S refer to clockwisdom of groups arranged by CIP priority (with the lowest priority facing away), whereas @ and @@ refer to clockwisdom of groups arranged in order of their appearance in the SMILES string (with the first appearing facing towards) [1]. Whether this was a good design decision by the Daylight gurus, I'm not 100% sure, but that's how it is.

So in short, if you change the order of groups in the SMILES string, you may need to change the clockwisdom to ensure that stereochemistry is preserved. Specifically, if you swap two groups you will get the other enantiomer ("putting the SMILES on the other face"?) unless you flip the clockwisdom; that is, Cl[C@@](Br)(C)I is the same enantiomer as Cl[C@](Br)(I)C. Another swap and we get back a SMILES string with the original clockwisdom.

So I started off by trying to think of a clever program to identify how many swaps were required to convert between two orderings of groups. Next I tried to write a few loops that would simply perform all possible swaps of groups to generate all of the rearrangements, but that missed a few. In the end, I just wrote the dumbest program I could think of and got the following results. For an original ordering of groups 0123, the following orderings have the same clockwisdom: 1032, 3021, 2013, 3210, 1320, 3102, 0123, 0231, 0312, 2301, 1203, 2130.
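For anyone who wants to reproduce that list, here is one possible brute-force sketch. It is not necessarily the program used for the post: it starts from 0123, keeps applying every possible swap of two positions, and records for each new ordering whether it was reached by an even (0) or odd (1) number of swaps. The function name orderings_by_swap_parity is made up for illustration.

```python
from itertools import combinations


def orderings_by_swap_parity(start="0123"):
    """Map every ordering reachable from `start` to 0 (even swaps) or 1 (odd)."""
    position_pairs = list(combinations(range(len(start)), 2))
    parity = {start: 0}
    frontier = [start]
    while frontier:
        next_frontier = []
        for ordering in frontier:
            for i, j in position_pairs:
                groups = list(ordering)
                groups[i], groups[j] = groups[j], groups[i]
                swapped = "".join(groups)
                if swapped not in parity:
                    # One more swap flips the parity of the parent ordering.
                    parity[swapped] = 1 - parity[ordering]
                    next_frontier.append(swapped)
        frontier = next_frontier
    return parity


parity = orderings_by_swap_parity()
same = sorted(o for o, p in parity.items() if p == 0)
print(len(parity))  # 24 orderings in total
print(same)         # the 12 orderings with the same clockwisdom as 0123
```

The 12 orderings printed are the same ones listed above, just in sorted order.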

And the point of all this? OpenBabel was not generating the correct stereochemistry around tetrahedral carbons in canonical SMILES. Now fixed.

Update (19/03/09): Tim Vandermeersch pointed out to me a neat way of determining the parity of a particular ordering of groups. Simply count the number of pairs in the ordering where one number is larger than another number to its right. For example, for 1032, there are 2 pairs (10, 32); for 3021, there are 4 pairs (30, 32, 31, 21). Orderings with an even number of pairs have one parity while orderings with an odd number of pairs have the opposite parity.
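The counting rule in the update translates almost directly into Python. This is just a sketch of that rule; the function names count_inversions and same_parity_as_sorted are hypothetical, and orderings are assumed to be strings of single digits so that characters compare in numeric order.

```python
def count_inversions(ordering):
    """Count pairs where a value is larger than some value to its right."""
    return sum(
        1
        for i in range(len(ordering))
        for j in range(i + 1, len(ordering))
        if ordering[i] > ordering[j]
    )


def same_parity_as_sorted(ordering):
    """True if `ordering` has the same parity as the sorted ordering."""
    return count_inversions(ordering) % 2 == 0


print(count_inversions("1032"))       # 2
print(count_inversions("3210"))       # 6
print(same_parity_as_sorted("1032"))  # True
print(same_parity_as_sorted("0132"))  # False (one pair: 32)
```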

[1] The OpenSMILES specification on stereochemistry

Image credit: Swamibu

Monday 2 March 2009

Review of Hello World - Computer Programming for Kids and Other Beginners

My generation were among the first children to learn programming. Thanks to the BBC Microcomputer (in my part of the world), kids were provided with a computer and a manual that taught computer programming in both BASIC and Logo. The local library had a complete set of Usborne books that covered everything from arcade games, to fantasy adventures (the old-school text only type, that is), to assembly language programming and sorting algorithms. And these were for children.

So what was it about programming on the BBC (or ZX Spectrum or Commodore 64) that drew kids in? For me it was all about graphics. Drawing circles could only be done dot by dot and led easily to drawing ellipses, and then to hyperboloids of revolution (think cooling towers) whose top you could twist. I read Chaos by James Gleick, couldn't believe the simplicity of generating the Mandelbrot fractal, and lifted my jaw off the floor the first time my BBC drew the little Mandelbeetle (I'm not the only one - see also PMR). I did some astronomy at school and plotted the night sky for different months of the year. And so on.

Since then we've seen the rise of the PC and Windows, which in fairness had QBasic for quite some time (I am a Nibbles master). However, as David Brin pointed out ("Why Johnny can't code", 2006) today there's no easy way for kids to get hooked on programming. Even my favourite language, Python, is lacking here. Out of the box the only usable graphics library for kids is the turtle module, an implementation of Logo:
C:\Documents and Settings\oboyle> python
Python 2.6.1 (r261:67517, Dec 4 2008, ...
Type "help", "copyright", "credits" ...
>>> from turtle import *
>>> for i in range(10):
...     for j in range(5):
...         forward(100)
...         left(360/5)
...     left(360/10)
...
>>>
It seems to work quite well, although the documentation is aimed at computer science majors rather than teachers (never mind kids). Also, the demo files are only available in the source distribution (you can get them from SVN here).

While Logo might be quite good for introducing the basics of programming languages, its graphics capabilities are limited. pygame is really the way to go. This is one of the big third-party Python extensions that incorporates support for sound, graphics and input devices. As the name implies it has everything necessary to write a decent computer game (see for example, the list of pygame arcade games). The downside is that this library makes no effort to cater for kids.

Enter a recent publication from Manning, "Hello World! - Computer programming for kids and other beginners" by Warren and Carter Sande. Written with 12-year-old kids in mind, the preface makes it clear that the authors (one a 12-year-old kid himself) know their target audience well:
"For kids especially, one of the most fun parts of using a computer is playing games, with graphics and sound. We’re going to learn how to make our own games and do lots of things with graphics and sound as we go along. Here are pictures of some of the programs we’ll be making:"
(Figure published with permission from Manning Publications)

Lunar lander! Slalom racing! The Sandes have reinvented the Usborne books for the YouTellyTub generation, and then some. Assuming no previous programming knowledge (a reasonable assumption when you're 12), the book teaches Python programming with the goal of writing computer games. The initial chapters cover the basics from variables, through maths, "if" statements and loops. But there's also already the fun stuff like getting input and simple graphical dialogs, and in case attention is waning Chapter 10 (of 24) has the complete listing for a Skiing game. As the book says: "One of the great traditions of learning to program is typing in code you don’t understand. Really!"

After introducing lists, functions, objects and modules, pygame enters the picture in Chapter 16 which covers drawing, images and animation. The following chapters cover sprites and collision detection, events and sound. The final chapters return to useful Python modules such as handling strings, file input and output, and using random numbers. All of the code examples are available for download from the book's website, along with a simple installer that contains all of the examples and modules required, along with Python itself.

As you might have guessed, I think this is a great book that fills a real niche - I don't know of any other programming book on the market that targets kids. What's amazing is that it has set its sights so high, and yet manages to meet its goals. I think it would be great to see this book promoted as a way of teaching programming in primary schools. In the meanwhile if you know any 12+ kids interested in computers, give them an opportunity to develop a fascinating hobby and get them this book.