Noel O'Blog: July 2016

Monday, 18 July 2016

Your SD file will never be the same again...with ASCII depiction

A recent blog post showed how to depict a record in an SD file from within Vim. This of course is no help to those readers who have yet to successfully exit from a Vim session. But what if there was no need to create an ASCII depiction...because it was already there?

Yes, that's right, you have no clue what I'm talking about. What I'm saying is, why not bung in an ASCII depiction of the molecule in a property field? Well, apart from it being a bonkers idea that'll bloat the SD file to hitherto unimagined sizes (but think of the improved compression!), I can't think of any reason not to do this. It is my belief that this could finally unleash the untapped potential of ASCII depiction. And so I've added an option to the SD file writer in Open Babel to do exactly this.
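To make the idea concrete, here is a minimal Python sketch of what such a property field involves. In an SD file, each data item is a "> <NAME>" header line, followed by the value lines and a terminating blank line, all placed before the record's closing "$$$$". The sketch appends an ASCII depiction as one such item. It is not Open Babel's implementation: the function name add_sd_property and the property name "ASCII" are made up for illustration, and the depiction is assumed to arrive as a list of text rows.

```python
def add_sd_property(record, name, value_lines):
    """Return a copy of one SD record with an extra data item appended.

    record      -- text of a single SD record, ending with the '$$$$' line
    name        -- property name, written out as '> <name>'
    value_lines -- rows of the property value (e.g. an ASCII depiction)
    """
    body, sep, _ = record.rpartition("$$$$")
    if not sep:
        raise ValueError("record does not end with $$$$")
    # A blank line terminates an SD data item, so blank rows cannot be
    # stored verbatim; this sketch simply drops them.
    rows = [line.rstrip() for line in value_lines]
    rows = [line for line in rows if line]
    item = "> <%s>\n%s\n\n" % (name, "\n".join(rows))
    return body + item + "$$$$\n"
```

For example, add_sd_property(record, "ASCII", depiction_rows) inserts the item just before "$$$$". Dropping empty rows squashes the picture vertically, which is the price of keeping the data item valid in this simple version.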

John is fond of quoting Jurassic Park's "your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should." I don't know why I just mentioned that.

Thursday, 14 July 2016

Even more molecular depiction in Vim

Previous posts focused on popping up an ASCII depiction in Vim, but what about doing something with the PNG output from Open Babel? Is there any way this can be viewed from Vim? Aged readers may remember a previous blog post of mine that looked into PNG to ASCII conversion, and that would be one approach.

A more direct approach is to use Vim's built-in capabilities to view bitmap files. Well, to be exact, a specific type of old-skool bitmap called an XPM. This has a very simple format and so Vim can show the entire contents of the file and use syntax-highlighting to visualise it. The alleged use for this is to enable people to directly edit bitmaps - that's right, someone out there is using Vim as a bitmap editor.
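To give a feel for how simple the format is, here is a small Python sketch that writes an XPM (version 3) image from a grid of characters, one character per pixel. The function name to_xpm and its arguments are made up for illustration; the "/* pixels */" comment is the marker that the Vim script below searches for.

```python
def to_xpm(rows, palette, name="image"):
    """Build the text of an XPM image (sketch, one character per pixel).

    rows    -- list of equal-length strings, one character per pixel
    palette -- dict mapping each pixel character to a colour, e.g. "#000000"
    """
    height, width = len(rows), len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("rows must all have the same length")
    lines = ["/* XPM */",
             "static char *%s[] = {" % name,
             "/* columns rows colors chars-per-pixel */",
             '"%d %d %d 1",' % (width, height, len(palette))]
    # One colour definition per palette entry: <char> c <colour>
    for char, colour in palette.items():
        lines.append('"%s c %s",' % (char, colour))
    lines.append("/* pixels */")
    # Each pixel row is a quoted C string; the last one has no trailing comma
    for i, row in enumerate(rows):
        lines.append('"%s"%s' % (row, "," if i < height - 1 else ""))
    lines.append("};")
    return "\n".join(lines) + "\n"
```

Opening the result in Vim with filetype=xpm colours each character according to its palette entry, which is all the "viewer" amounts to.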

The only problem is generating the XPM file in the first place. We could probably do this directly from Open Babel as the format is fairly simple (e.g. as an option to the PNG writer), but in a certain light that just might ("just might", mind you) be viewed as feature creep. So instead, we can use ImageMagick's convert to do the job (which is available cross-platform). As before, the required script is shown below.

Once I got it working I got to thinking, "well, that's a 79x79 bitmap it's showing, which is pretty small, but what if I reduce the font size? aha! - then I can show an arbitrarily sized bitmap and show much better detail". At which point I realised that in any environment where I can change the font size in Vim, I should probably just pop up an image viewer to display the PNG (left as an exercise for the reader).

In the end, are these depictions better than the ASCII ones? Meh, probably not - which I think was one of the conclusions also from my previous foray into PNG to ASCII conversion. Oh well.

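noremap <silent> <leader>d :call SmiToPng(77)<CR>
function! SmiToPng(width)
  " Grab the whitespace-delimited word under the cursor
  let smiles = expand("<cWORD>")
  " Strip quotation marks and commas
  let smiles = substitute(smiles, "[\"',]", "", "g")
  " Handle escaped backslashes, e.g. in C++ strings (turn \\ into \)
  let smiles = substitute(smiles, '\\\\', '\\', "g")

  " Open a scratch window at the bottom of the screen
  botright new
  setlocal buftype=nofile bufhidden=wipe nobuflisted noswapfile nowrap
  let fname = tempname().'.png'
  " Depict the SMILES as a PNG with Open Babel (quoted, as SMILES contain shell metacharacters)
  call system('obabel -:'.shellescape(smiles).' -O '.fname.' -d -xm -xp '.a:width)
  " Convert the PNG to XPM on stdout (adjust the path to your ImageMagick install)
  execute '$read ! "C:\Program Files (x86)\ImageMagick-6.5.8-Q16\convert.exe" '.fname.' xpm:-'
  setlocal filetype=xpm
  " Delete the XPM header and colour table, up to and including /* pixels */
  execute "normal! ggd/pixels\<cr>dd"
  " Delete rows made of a single repeated character, i.e. blank rows
  silent! g/\v^"(\S)\1+",?/d
  " Delete the closing '};'
  execute "normal! Gdd"
  setlocal nomodifiable
  " Return the cursor to the top
  1
endfunction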
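The post-processing at the end of that function is a little cryptic, so here is a rough Python equivalent of its three deletions: the header and colour table up to the "/* pixels */" line, any row consisting of a single repeated character (a row of uniform background), and the closing "};". Like the Vim pattern, it assumes one character per pixel; the name trim_xpm is hypothetical.

```python
import re

# A pixel row made of one repeated character, e.g. "~~~~~~", (with one
# character per pixel, this is a row of a single colour, the background).
UNIFORM_ROW = re.compile(r'^"(\S)\1+",?$')

def trim_xpm(text):
    """Return the pixel rows of an XPM, minus uniform rows (sketch)."""
    lines = text.splitlines()
    # Skip the header and colour table: everything up to "/* pixels */"
    markers = [i for i, line in enumerate(lines) if line.strip() == "/* pixels */"]
    if not markers:
        raise ValueError("no /* pixels */ marker found")
    rows = lines[markers[0] + 1:]
    # Drop the closing "};" if present
    if rows and rows[-1].strip() == "};":
        rows = rows[:-1]
    return [row for row in rows if not UNIFORM_ROW.match(row)]
```

Note that "chars-per-pixel" in the header comment does not match "/* pixels */", which is why the Vim search for "pixels" lands on the right line too.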

The 7th Joint Sheffield Conference on Chemoinformatics - a real tweet

Just back from the Sheffield meeting, which takes place every three years. Great meeting as ever - tribute was paid to John Holliday for the lion's share of the organisation. I got to meet some old friends and some new. For the first time at a meeting I decided to live-tweet the talks, joining such Twitter luminaries as Wendy Warr, Mireille Krier, Nathan Brown, and Jérémy Besnard.

It worked out quite well, and kept me completely engaged and awake. When you are aware that what you write is instantly publicly visible, you really make an effort to follow method descriptions etc. so that you can adequately describe what's going on. To speed things up I decided to avoid editorialising; if the author described their method/result as the best thing since sliced bread, I dutifully reported a major advance in the field of baked goods even if I was thinking "bread is dead, baby, bread is dead". I have since learned that this is referred to as journalism.

With about 750 tweets covering 27 talks (I missed one due to flat batteries), I averaged about 27 tweets per talk, which may be just over one per slide. Afterwards I asked on Twitter whether people were annoyed or found my avalanche of tweets useful; based on 13 respondents, the results were 3 to 1 in favour of the tweets. If I do a repeat performance, next time I'll give a heads-up so people can mute me if uninterested.

I don't like my efforts disappearing into the void, so I've archived the complete list of #ShefChem16 tweets from all attendees and remotes that used that hashtag. You can relive the build-up, the talks themselves, the scones/doughnuts, the conference dinner, not to mention the queuing for taxis to the station. The talks and posters are being made available by-and-by on the conference website so you might find it interesting to look at the tweets in combination with the slides.

Notes on creating the download of tweets:
I tried to go the hi-tech route via the Twitter API, but I think it's impossible if there were more than 100 tweets in a day; the API is geared towards streaming, not historical analysis. In the end, I went to the Twitter website, searched for #ShefChem16, hit "All tweets", zoomed out and kept hitting Page Down until all the conference tweets were shown. Next I saved the generated HTML via Firebug (right click on the <body> element and choose "Copy HTML"), and extracted the tweets with the following script. Unfortunately, although it's possible to know to whom a reply has been made, the corresponding tweet id does not seem to be available, so I didn't bother handling replies in a special way.

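# vim: set fileencoding=utf-8 :
from bs4 import BeautifulSoup as bs

with open("shefchem16.html", encoding="utf-8") as inp:
    soup = bs(inp, "lxml")

HANDLESIZE = 10  # Twitter handles are truncated to this many characters

data = []
name = None
for tag in soup.find_all("div"):
    classes = tag.get("class")
    if not classes:
        continue
    if "stream-item-header" in classes:
        # The first link in the header points at /<handle>
        name = tag.a['href'][1:]
    if "js-tweet-text-container" in classes:
        # Drop the ellipsis that Twitter appends to shortened links
        tweet = tag.get_text().replace(" …", "").strip()
        # Indent continuation lines so they line up after the handle column
        tweet = tweet.replace("\n", "\n" + " " * (HANDLESIZE + 1))
        data.append("%*s %s" % (HANDLESIZE, name[:HANDLESIZE], tweet))

# The page lists the newest tweets first, so write them out oldest first
with open("tmp.txt", "w", encoding="utf-8") as f:
    for d in reversed(data):
        f.write(d + "\n")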

Image credit: Egon Willighagen on Twitter

Tuesday, 12 July 2016

Basic Graphviz input file generator in Python

Generating a Graphviz Dot input file is fairly simple, so I tend to code it up myself rather than use an existing library. However, the need to normalise the node names complicates things a little bit. Here's a copy of the one I wrote today. Note that you may need to beef up the normalisation if your node names contain additional non-alphanumeric characters, or if two different names normalise to the same identifier (e.g. "a-b" and "a b"). (Note to future self: next time just autoincrement the node labels instead of trying to use a normalised form.)

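class Graph:
    """Collect directed edges and write them out in Graphviz Dot format."""
    def __init__(self):
        self.lookup = {} # original node name -> normalised Dot identifier
        self.edges = []
    def add_node(self, x):
        if x in self.lookup: return
        nx = x
        for y in " -()[]":
            nx = nx.replace(y, "_") # normalise the label
        self.lookup[x] = nx
    def add(self, x, y):
        """Add an edge from node x to node y."""
        self.add_node(x)
        self.add_node(y)
        self.edges.append( (x, y) )
    def write(self):
        """Return the graph as a string in Dot format."""
        tmp = ["digraph {"]
        for x, y in self.edges:
            tmp.append( '%s -> %s;' % (self.lookup[x], self.lookup[y]))
        for x, y in self.lookup.items():
            # Show the original name as the label, escaping any double quotes
            tmp.append('%s [label="%s"]' % (y, x.replace('"', '\\"')))
        tmp.append("}")
        return "\n".join(tmp)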
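Following the note to future self above, here is a sketch of the autoincrement approach: each new name gets the next identifier (n0, n1, ...), and the original name is used only as the label, so no normalisation is needed and two names can never collide. The class name NumberedGraph is made up for illustration; it keeps the same add/write interface as Graph, and relies on Python 3.7+ dicts preserving insertion order so that nodes are written in the order they were first seen.

```python
class NumberedGraph:
    """Dot writer that numbers nodes n0, n1, ... instead of normalising names."""

    def __init__(self):
        self.ids = {}     # original node name -> autoincremented Dot identifier
        self.edges = []   # pairs of Dot identifiers

    def _node_id(self, name):
        # Hand out the next free identifier the first time a name is seen
        if name not in self.ids:
            self.ids[name] = "n%d" % len(self.ids)
        return self.ids[name]

    def add(self, x, y):
        """Add an edge from node x to node y."""
        self.edges.append((self._node_id(x), self._node_id(y)))

    def write(self):
        """Return the graph as a string in Dot format."""
        out = ["digraph {"]
        for name, ident in self.ids.items():
            # Escape double quotes so the name can sit inside a quoted label
            out.append('  %s [label="%s"];' % (ident, name.replace('"', '\\"')))
        for a, b in self.edges:
            out.append("  %s -> %s;" % (a, b))
        out.append("}")
        return "\n".join(out)
```

For example, adding the edges "aspirin (2-acetoxy)" to "salicylic acid" and "salicylic acid" to "phenol" yields nodes n0, n1 and n2 labelled with the original names.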

Friday, 8 July 2016

Syntax highlighting for SMILES files in Vim

An image in a recent blog post showed how the default syntax highlighting (*) in Vim looks when applied to SMILES files. A splash of pink here and there to brighten things up on a dull molecule. It's definitely doing something, but what it's doing I've no idea. So I've decided up with this I will not put any more.

Here's a simple syntax highlighting style for SMILES files. It supports three different colours for comments, the SMILES itself, and titles. If anyone has the necessary regexp-fu, it might be worth looking into highlighting different bracket levels in the SMILES string.

To use, save this file as vimfiles/syntax/smi.vim (~/.vim/syntax/smi.vim on Linux or Mac), and then use ":set filetype=smi" to turn on the highlighting. To do this automatically for *.smi files, create a file vimfiles/ftdetect/smi.vim containing the line "autocmd BufRead,BufNewFile *.smi set filetype=smi".

I also looked into doing folding for such files, that is, to use any comment as a header and then fold the subsequent SMILES. This is a format I use for storing matched series, for example. It works (see foldexpr), but it's too slow for a large file, which defeats the purpose.

" Vim syntax file
" Language: SMILES strings
" Maintainer: Noel O'Boyle
" Latest Revision: 16 June 2016

if exists("b:current_syntax")
  finish
endif

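" Title: everything from the first whitespace onwards (highlighting starts one character in)
syn match smiTitle /\v\s+.*$/hs=s+1
" SMILES string: the first run of non-whitespace characters on the line
syn match smiString "\v^\S*" nextgroup=smiTitle skipwhite
" Comment: any line starting with #
syn match smiComment "\v^#.*$"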

let b:current_syntax = "smi"

hi def link smiComment     Comment
hi def link smiString      Identifier
hi def link smiTitle       Include
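As for the folding idea mentioned above, the per-line logic a foldexpr has to supply is simple; here is a Python sketch of it, using Vim's convention that ">1" starts a level-1 fold, "1" continues one, and "0" means the line is not folded. It only mirrors the logic, not the Vim implementation, and the name fold_levels is hypothetical.

```python
def fold_levels(lines):
    """Fold level for each line, in the form a Vim foldexpr would return.

    A comment line ('#...') starts a new level-1 fold (">1"); the SMILES
    lines after it belong to that fold ("1"). Lines before the first
    comment are not folded ("0").
    """
    levels = []
    in_fold = False
    for line in lines:
        if line.startswith("#"):
            in_fold = True
            levels.append(">1")
        else:
            levels.append("1" if in_fold else "0")
    return levels
```

For a matched-series file, each "#" header line then opens a fold that swallows the SMILES beneath it, up to the next header.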

* As the Vim documentation says, apparently it's actually lexical highlighting, but everyone calls it syntax highlighting so...