Thursday, 30 September 2010

Are you on my side or not? It's E/Z Part II

Due to popular demand (just kidding!), here is yet another post on the subject of stereochemistry in SMILES, this time focusing on the handling of ring bonds (ring opening/closure) at cis/trans stereo centers. This follows on from Part I.

The key point (and difficulty) when dealing with ring bonds on such double bonds is that, since the ring bond appears twice in the SMILES string (at both the opening and closing), the stereo symbol can appear at either occurrence or indeed both. (I think this was a mistake in the SMILES specification, but there you go.) When writing a SMILES string, the preferred syntax shows the stereo symbol only at the end of the ring bond that is on the double bond (Open Babel will only output this syntax).

The following structure will be used as an example:
So, using the preferred syntax, a SMILES string for the example structure would be:
(a) C/C=C\1/NC1

In other words, from carbon-3 it's down to the C of the ring closure, and up to the N of the ring, where up and down are relative to carbon-1.

There's no need to specify the stereochemistry of both groups on the right-hand side, of course. The following SMILES is equivalent to (a) although not so clear:
(b) C/C=C1/NC1

Coming back to SMILES (a), we could have written the stereo symbol at both ends of the ring bond, or indeed just at the ring closure:
(c) C/C=C\1/NC/1
(d) C/C=C1/NC/1

Note that the symbol used for the ring opening is the opposite of that for the ring closure. The rationale for this is that from the point of view of carbon-4, carbon-3 is up (hence C/1), whereas from the point of view of carbon-3, carbon-4 is down (hence C\1). Whatever... just stick to form (a) and you won't need to think about this; the other forms are just a source of errors. (It would have been simpler for everyone if Daylight had only allowed the stereo symbol at the carbon on the double bond.)
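To make the flip concrete, here is a minimal Python sketch. The helper name `as_opening_symbol` is made up for illustration and is not part of Open Babel or any SMILES toolkit; it only encodes the rule described above, that a symbol written at the ring closure corresponds to the opposite symbol at the ring opening.

```python
# Hypothetical helper, for illustration only (not an Open Babel function).
FLIP = {"/": "\\", "\\": "/"}


def as_opening_symbol(closure_symbol):
    """Return the ring-opening symbol equivalent to a ring-closure symbol."""
    return FLIP[closure_symbol]


# Form (d), C/C=C1/NC/1, has "/" at the ring closure.
# Moving it to the ring opening flips it, which gives form (a).
form_a = "C/C=C" + as_opening_symbol("/") + "1/NC1"
print(form_a)
```

The same mapping works in the other direction, since flipping twice gets you back where you started.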

So much for valid SMILES. How should invalid SMILES be handled? Consider the following:
(e) C/C=C\1\NC1

Both the ring closure and the N are down...? I don't think so. This should be treated as undefined stereochemistry.

How about the case where the two ring bonds have stereo symbols which are not in agreement?
(f) C/C=C\1NC\1
(g) C/C=C\1/NC\1

In both of these cases the stereochemistry for the ring bonds should be considered undefined. In Open Babel, I've chosen to handle these as follows:
(f) C/C=C\1NC\1  --> C/C=C1NC1 (ignore ring bond stereo)
                 --> CC=C1NC1 (undefined stereochemistry)

(g) C/C=C\1/NC\1 --> C/C=C1/NC1 (ignore ring bond stereo)
                 --> C/C=C\1/NC1 (defined stereochemistry)
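The rules discussed in this post can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not Open Babel's actual code: the function names are hypothetical, it looks only at the symbols around carbon-3 (the left-hand side, C/C=, is assumed to be fine), and treating (e) as undefined reflects the opinion given above rather than confirmed library behaviour.

```python
# Hypothetical sketch of the rules in this post; not Open Babel's implementation.
FLIP = {"/": "\\", "\\": "/"}


def ring_bond_symbol(opening, closure):
    """Combine the symbols at both ends of a ring bond.

    Returns the equivalent ring-opening symbol, or None if no symbol was
    given or if the two ends disagree (in which case both are ignored).
    """
    if opening and closure:
        return opening if closure == FLIP[opening] else None
    if opening:
        return opening
    if closure:
        return FLIP[closure]
    return None


def classify(opening, closure, other):
    """Classify the stereo at carbon-3 as 'defined' or 'undefined'.

    `other` is the symbol on the non-ring substituent (the N), or None.
    """
    ring = ring_bond_symbol(opening, closure)
    if ring and other:
        # Two substituents on the same atom cannot both be up or both down.
        return "defined" if ring != other else "undefined"
    return "defined" if (ring or other) else "undefined"


examples = {
    "a": ("\\", None, "/"),   # C/C=C\1/NC1
    "b": (None, None, "/"),   # C/C=C1/NC1
    "c": ("\\", "/", "/"),    # C/C=C\1/NC/1
    "d": (None, "/", "/"),    # C/C=C1/NC/1
    "e": ("\\", None, "\\"),  # C/C=C\1\NC1
    "f": ("\\", "\\", None),  # C/C=C\1NC\1
    "g": ("\\", "\\", "/"),   # C/C=C\1/NC\1
}
results = {label: classify(*symbols) for label, symbols in examples.items()}
```

With these rules, (a) to (d) and (g) come out as defined, while (e) and (f) come out as undefined.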

Image credit: Kim+5

Monday, 20 September 2010

Depict a chemical structure...without graphics

Sometimes you just need to identify the chemical structure in an SD file, but don't have access to a graphics terminal. For example, you could be logged into a server at a remote location. What to do?

Well, it turns out that you can depict pretty much anything using text symbols - this is known as ASCII art. Ubuntu has two of the main ASCII art libraries available, aa-lib and libcaca (named by a 10-year-old). With both of these there are associated viewers, asciiview and cacaview.

There are a few ways to go here: either a cheminformatics library could directly depict a molecule using ASCII art, or it could depict it using one of these libraries, or we can be lazy and just convert an existing PNG to text. The first case is likely to produce a better quality image - it is actually the subject of a paper by Raymond Carhart in JCICS in 1976 (via Pat Walters). Naturally, since this is a blog post, we will take the lazy route here and just convert from PNG to text.

So, this is the original image:
I found it better to convert to B&W by thresholding all non-white pixels to black:
convert orig.png -threshold 99% blackwhite.png


Running asciiview, we have the following:

Note that the structure is immediately clear. Still - we can do better. If we "-negate" the image first, we have:

How about for cacaview?

Not so good. However, both asciiview and cacaview have zoom and pan functionality and once we zoom in, the structure can be clearly identified:

I was originally thinking of including this functionality in Pybel (which, with the help of OASA, can generate 2D depictions as PNG files), but I think that generating such text images is best done through these ASCII art viewers, as you might need to zoom and pan to get the "full picture".
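If you don't have either viewer to hand, the threshold-and-negate idea can be imitated in plain Python. The sketch below is only a crude stand-in and is not how asciiview or cacaview work: the function name `to_text` is made up, the image is assumed to be already loaded as rows of grayscale values from 0 (black) to 255 (white), and each pixel simply becomes one character.

```python
def to_text(pixels, threshold=0.99, negate=False):
    """Render rows of grayscale values (0-255) as lines of '#' and ' '.

    Pixels at or below the threshold count as ink, roughly mimicking
    thresholding all non-white pixels to black. With negate=True the
    ink and background are swapped.
    """
    cutoff = threshold * 255
    lines = []
    for row in pixels:
        chars = []
        for value in row:
            is_ink = value <= cutoff
            if negate:
                is_ink = not is_ink
            chars.append("#" if is_ink else " ")
        lines.append("".join(chars))
    return "\n".join(lines)


# A tiny made-up "image": a vertical line, with one grey pixel in it.
image = [
    [255, 0, 255],
    [255, 200, 255],
]
normal = to_text(image)
inverted = to_text(image, negate=True)
```

A real image would need to be scaled down first, as one character per pixel quickly overflows a terminal window.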

In a comment on FriendFeed, Hari wondered whether an exact depiction of a chemical structure could be made using Unicode characters. Good question. But first, how close can we get with ASCII characters? Here's my best attempt:

           O
         //
   Cl--{/
        \
         \_____
         / --- \
        /       \
        \\     //
         \_____/
Can you do better?

Monday, 13 September 2010

How to get into cheminformatics

A chemistry undergraduate from South America recently emailed me asking about how to get into cheminformatics:
My area is chemistry and I'm very interested about cheminformatics. Actualy, I'm using Python to develop a software to make some analysis (image analysis applied to chemistry). Here in ----, the college course of chemistry don't have disciplines of informatics related.

Because of this, I got some questions, if you can answer to me, I'll be very grateful:

Have you done chemistry college or some informatics college related?
If you have done the chemistry college, how you started to work with computation applied to chemistry?
Here, in ----, actualy I think that the cheminformatics is not very known, even in the scientific field. What about in other countrys? The most of people that are working with cheminformatics have done chemistry colleges or some computation college related?


I answered as follows:
My own background is a degree in Chemistry, followed by a PhD in Inorganic Computational Chemistry (DFT calculations). In the field most people have chemistry degrees, although there are also a few computer scientists. The types of problems the two work on are often different; the computer scientists may be more interested in developing methods, while the chemists may be more interested in applying and interpreting the results. I think that most chemists would not do any informatics or programming during their degree - they would just teach themselves at the start of their PhD - it sounds like you have already done this.

When I was 12 or 13, I started programming in BASIC on my home computer and got involved in programming competitions for high school students. I didn't have a computer while in university, but during my PhD I started programming again, this time in Python. More recently, I've learnt C++ by working on Open Babel.

If you want to gain expertise in the field, I would very much encourage you to get involved in an open source cheminformatics project. You will learn a lot about programming, organising large projects, testing, how to work with other people, and so on. If you're interested in image analysis you could look at OSRA, etc. You may also want to subscribe to the blueobelisk mailing list or ask a question at blueobelisk.shapado.com.

Cheminformatics is not a very well known field - I didn't know what it was until I started doing it, even though I had done computational chemistry during my PhD. The main countries associated with cheminformatics are the UK, US and Germany, it seems to me; these are the countries where a lot of the pharmaceutical companies do drug design. But you can do cheminformatics anywhere - you just need a computer.


What advice would you give? I'm especially keen to hear from cheminformaticians from South America. (I'll point the student to this blog post)

Image credit: Duncan Hull

Friday, 3 September 2010

Area Man declares 1st Open Babble a success

You know you have arrived in Liverpool when your first sight on exiting the airport is a yellow submarine. The mug you just bought showing the cover of Sgt. Pepper's Lonely Hearts Club Band is also a good clue.

The 1st Open Babble meeting took place this week in Chester, UK, near Liverpool and Manchester. For the first time in its 7 (8?) year history, the developers of Open Babel met each other and discussed issues such as whether we should open brackets on the same line or on the next line. Well, that and other more serious questions. Many discussions aren't really suited to email, and so this was a chance for us to cover a whole variety of issues in a short space of time.

We (Marcus, Tim, Chris and I) worked off an agenda of topics pre-suggested through a Google spreadsheet, kept very brief minutes to record decisions/action items, and brought Geoff in remotely through Skype after sending around the minutes. Although we had a couple of slides prepared here and there, in general it was all round-table discussions. It worked out pretty well I think.

This Open Babble involved core developers only, and we are going to continue such meetings in future. However, just as a matter of interest, are there people out there interested in a users' meeting, e.g. a 1-day workshop or something of that nature? I've added a poll on the left-hand side where you can indicate your interest (or lack thereof). (Update: Poll now over)