Noel O'Blog: June 2008

Wednesday, 25 June 2008

ANN: cinfony 0.2 - the "easy to install" version

A new version of cinfony is now available. The good news is that it is now much easier to install for Windows users.

All you need now is to have Python and Java, download the CDK and the RDKit, edit a configuration file, and away you go using OpenBabel, the RDKit and the CDK from Python. The full instructions are here.

I've updated some of the docstrings, but documentation is still sparse and the Pybel documentation is still the best guide, as linked to on the cinfony web site.

Here's an example of what's possible with cinfony. Suppose you want to convert a SMILES string to 3D coordinates with OpenBabel, create a 2D depiction of that molecule with the RDKit, calculate descriptors with the CDK, and write out an SDF file containing the descriptor values and the 3D coordinates.

from cinfony import rdkit, cdk, pybel
mol = pybel.readstring("smi", "CCC=O")
mol.make3D()
rdkit.Molecule(mol).draw(show=False,
                         filename="aldehyde.png")
descs = cdk.Molecule(mol).calcdesc()
mol.data.update(descs)
mol.write("sdf", filename="aldehyde.sdf")

This new version of cinfony also includes Jybel, so that you can now access both the CDK and OpenBabel from Jython, as well as from CPython.

Tuesday, 24 June 2008

The Forced Authorship Licence - Get your users to write papers for you

You are probably familiar with commercial software licences. You may even have heard of open source licences. But are you familiar with the Forced Authorship Licence (FAL) model? Let me give you an example from real life (the name has been removed to focus on the actual licence):

"X is available free of charge for researchers belonging to Academic community. The download and use of X is subject to the X Academic Licence...Every users is associated with one of the X team. This will help the user in installing and using X and will be co-author of the first paper published by the user, containing X results. You must contact one of the three group leaders and discuss your proposed project before applying for the use of X."

I think this is a great idea. Think of all the publications. If I had thought of this a few years ago, I would now have 30 extra papers instead of the 30 citations of GaussSum.

But it's never too late to start. And why stop at software? If I'm going to be competing with people that use the FAL, I need to think smarter. From now on, all of my papers, software, blog posts, personal communications, and any ideas that arose while reading my papers, attending my talks or reading my posters will carry the Noel O'Blog Licence (NOABL - 'A' for apostrophe and don't you forget it). Instead of citing me, any resulting publication must carry my name as sole author, in bold - no, make that in fire in letters thirty feet high - and when applying for grants, I'm allowed to list these publications together with my own work.

Sounds fair, doesn't it? Oh - I almost forgot. It's spelt "Noel M. O'Boyle". And don't forget that apostrophe.

Image: Licences by Martin Deutsch (CC BY-NC-ND 2.0)

Monday, 23 June 2008

O No - It's cheminformatics!

It is time to cast your vote in the greatest polarising debate of our times. Yes indeed: should it be cheminformatics or chemoinformatics? Vote now (see poll on right).

Not to sway any undecided voters, but I'm definitely in favour of "cheminformatics". My main reason is that I'm worried that if the other camp win out, they'll probably decide to change more words: we'll end up doing chemoistry, like our Australian cousins.

You have 13 days to cast your vote...

An RSS feed for the CCL list

Apparently some people still read the CCL (Computational Chemistry List) using email. I gave up on that some time ago. Here's an RSS feed I threw together a while back, which you might find useful: CCL feed

If you don't know what an RSS feed is, and how it might be useful, it's quicker just to test it out than to explain. First of all, you need a feed reader: for example get an account on Google Reader, a free online RSS reader. Next, subscribe to whatever RSS feeds you want, by clicking on "Add subscription" and copying and pasting the URL of the RSS feed into the box that appears.
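Under the hood, a feed is just an XML document: a `channel` containing a list of `item` elements, each with a title, a link and so on. Fetching that document and listing its items is essentially what a feed reader does. Here is a minimal sketch of that step using only the Python 3 standard library. The sample feed and its example.com URLs are made up for illustration; a real reader would download the XML over HTTP and remember which items it has already shown you.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document (a real feed would be downloaded)
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>CCL</title>
    <link>http://www.ccl.net</link>
    <description>Example feed</description>
    <item>
      <title>First message</title>
      <link>http://example.com/1</link>
      <guid>http://example.com/1</guid>
    </item>
    <item>
      <title>Second message</title>
      <link>http://example.com/2</link>
      <guid>http://example.com/2</guid>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in list_items(SAMPLE_FEED):
    print(title, "->", link)
```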

Here's how the feed is created:

import sys
import email
import datetime
import pdb

from ftplib import FTP
from StringIO import StringIO

import PyRSS2Gen

def breakdaily(messages):
    """
    >>> import pickle
    >>> a = pickle.load(open("tmp20070105"))
    >>> len(breakdaily(a))
    1
    """
    broken = []
    message = []
    for line in messages.split("\n"):
        if line.startswith("From owner-chemistry@ccl.net"):
            broken.append("\n".join(message))
            message = []
        else:
            message.append(line)
    broken.append("\n".join(message))
    return broken[1:]

def getlatest(N):
    ftp = FTP('ftp.ccl.net')
    ftp.login('anonymous')
    ftp.cwd('/pub/chemistry/archived-messages')
    # ftp.dir()

    listoffiles = ftp.nlst()

    # Get current year
    year = datetime.datetime.now().year

    # Get N most recent messages
    messagetot = 0
    months = [str(x).zfill(2) for x in range(1, 13)]
    months.reverse()
    days = [str(x).zfill(2) for x in range(1, 32)]
    days.reverse()
    msgs = []


    while messagetot<N:
        ftp.cwd(str(year))

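        # Only visit month directories that actually exist on the server
        # (e.g. months later in the current year may not exist yet)
        availablemonths = ftp.nlst()
        for month in [x for x in months if x in availablemonths]:
            ftp.cwd(month)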
            availabledays = ftp.nlst()
            for day in [x for x in days if x in availabledays]:
                # Go thru in reverse order but exclude days that are
                # non-existent
                messages = StringIO()
                ftp.retrbinary("RETR %s" % day, messages.write)
##                pickle.dump(messages.getvalue(), open("tmp%i%s%s" % (year,month,day), "w"))
                listmsgs = breakdaily(messages.getvalue())
                # print "="*24 + "\n", messages.getvalue()
                for i,msg in enumerate(listmsgs):
                    msg_content = email.message_from_string(msg)
                    text = ""
                    for part in msg_content.walk():
                        if (part.get_content_maintype()=="text" and
                            part.get_content_type()=="text/plain"):
                            text = part.get_payload()
                            text = text.replace("\n","<br/>").decode("iso-8859-1", "strict")
                    msgs.append( (year, month, day, msg_content, i+1, text) )
                messagetot += len(listmsgs)
                if messagetot>=N:
                    break
            if messagetot>=N:
                break
            ftp.cwd("..")
        ftp.cwd("..")
        year -= 1 # Continue into the previous year

    ftp.quit()
    msgs.reverse()

    return msgs

def main():

    print "\nStarting..."

    messages = getlatest(100)
##    outputfile = open("messages.pickle", "w")
##    pickle.dump(messages, outputfile)
##    outputfile.close()
##    import sys
##    sys.exit(1)
##    messages = pickle.load(open("messages.pickle", "r"))

    rssitems = []
    for year, month, day, msg, msgnum, messagetext in messages:

        # The message's page on ccl.net serves as both link and guid
        url = "http://ccl.net/cgi-bin/ccl/message-new?%d+%s+%s+%s" % (
                year, month, day, str(msgnum).zfill(3))

        # Add the new item
        newitem = PyRSS2Gen.RSSItem(
                 title = msg['Subject'],
                 link = url,
                 description = messagetext,
        ## What's a guid? A globally unique id...used by RSS readers
        ## to determine whether they've seen a particular news item
        ## already
                 guid = PyRSS2Gen.Guid(url),
                 pubDate = msg['Date']
                 )
        rssitems.append(newitem)

    rss = PyRSS2Gen.RSS2(
        title = "CCL",
        link = "http://www.ccl.net",
        description = "RSS Feed of the world's greatest computational chemistry "
                      "mailing list, chemistry@ccl.net (the CCL list)",
        lastBuildDate = datetime.datetime.now(),
        items = rssitems)
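    # Close the file explicitly so the XML is flushed to disk
    outputfile = open("ccl.rss", "w")
    rss.write_xml(outputfile)
    outputfile.close()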

    print "Finishing...\n"

def test():
    import doctest
    doctest.testmod()

def do_debugger(type, value, tb):
    pdb.pm()

if __name__=="__main__":

    # sys.excepthook = do_debugger
    main()

    ## test()
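If you would rather not depend on PyRSS2Gen, the same kind of RSS 2.0 document can be built with the standard library alone. The following is a minimal Python 3 sketch, not a drop-in replacement for the script above: it assumes the messages have already been fetched and reduced to hypothetical (subject, URL, date string) tuples, and, like the script, it reuses each message URL as the `guid` and passes the date through as a string. The helper name `build_rss` and the sample data are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, messages):
    """Build an RSS 2.0 tree from (subject, url, date) tuples.

    Hypothetical helper: a stdlib-only stand-in for PyRSS2Gen.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for subject, url, date in messages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = subject
        ET.SubElement(item, "link").text = url
        # The message URL doubles as a globally unique id
        ET.SubElement(item, "guid").text = url
        ET.SubElement(item, "pubDate").text = date
    return ET.ElementTree(rss)


tree = build_rss(
    "CCL", "http://www.ccl.net", "Example feed",
    [("Example subject", "http://example.com/1",
      "Wed, 25 Jun 2008 10:00:00 +0000")])
print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To save the feed instead of printing it, `tree.write("ccl.rss", encoding="utf-8", xml_declaration=True)` writes the document with an XML declaration.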

Tuesday, 17 June 2008

Jmol gets competition - OpenAstexViewer now available

AstexViewer has just gone open source (LGPL), and is available from http://openastexviewer.net/.

Both Jmol and AstexViewer are 3D chemical structure applets that run in the browser. AstexViewer was originally developed by Mike Hartshorn at Astex Therapeutics for in-house visualisation of protein crystal structures. Although available at no cost for some time, it has not until now been open source.

Jmol is in many ways an open source success story. It has several enthusiastic developers (who make new releases with new features every few days!), a very busy mailing list, a large number of users worldwide, and even a recent book. It will be interesting to see whether OpenAstexViewer is sufficiently different to attract users away from this well-established project.

In any case, here's to diversity. Hopefully both projects will get interesting ideas from each other.