Saturday 18 October 2008

Tip for scripting a workflow

If you are writing several Python scripts that together make up a workflow, e.g. where one script reads an intermediate output file or pickle written by another, it's a good idea to start the name of each script with a number. For example, the first script could be 0_parse_dockings.py, and the next one 1_calculate_enrichments.py.

This is handy for a couple of reasons:
  1. When you look at these files in 6 months' time, you will know in what order to run them.
  2. When you are running the files, they autocomplete very easily at the command line (Windows or Linux): type "python 0", hit TAB, and the name of the file is completed for you. No need to remember the name of the file (is it calcresults.py or analyseresults.py?) or to deal with several files that start with the same letter.
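As a rough illustration of the first point, here is a minimal sketch that lists the numbered scripts in a directory in the order they should be run. The function name ordered_scripts and the "<number>_name.py" pattern are assumptions made for this example; it sorts on the numeric value of the prefix, so 10_ comes after 2_ rather than before it.

```python
import re
from pathlib import Path


def ordered_scripts(names):
    """Return the names of the form '<number>_something.py', sorted by that number.

    'ordered_scripts' is a hypothetical helper written for this example.
    """
    pattern = re.compile(r"^(\d+)_.*\.py$")
    numbered = []
    for name in names:
        match = pattern.match(name)
        if match:
            # Sort on the integer value so that 10 follows 2, not 1.
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


if __name__ == "__main__":
    # Print the numbered scripts found in the current directory, in run order.
    for script in ordered_scripts(p.name for p in Path(".").glob("*.py")):
        print(script)
```

Files without a numeric prefix (helper modules, for instance) are simply left out of the listing.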
Anyone else got any labour-saving tips for the busy scientist?

1 comment:

Egon Willighagen said...

So, what happens when you need to insert something between step 3 and 4? 31, 3.1, 3a maybe?

I suggest just using a wrapper script, and noting in your LaTeX which script you ran for the table or figure you are looking at.
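A wrapper along these lines could be as small as the following sketch. The script names in STEPS are hypothetical (the names from the post without their number prefixes), and build_commands and run_workflow are illustrative helpers, not part of any library. The run order lives in one list, so inserting a step means adding one line.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical step names; the order of this list is the order of the workflow.
STEPS = [
    "parse_dockings.py",
    "calculate_enrichments.py",
]


def build_commands(steps, python=sys.executable):
    """Turn a list of script names into one command line per step."""
    return [[python, step] for step in steps]


def run_workflow(steps):
    """Run each step in order, stopping at the first one that fails."""
    for command in build_commands(steps):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a step exits with an error.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    missing = [step for step in STEPS if not Path(step).exists()]
    if missing:
        print("Missing scripts:", ", ".join(missing))
    else:
        run_workflow(STEPS)
```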

And, even more importantly: use a version control system! It basically does what a sane lab notebook does: it keeps track of everything you do, and it even takes care of the sign-off by your boss, a role that is now played by a Git/SVN service.

Adding something between steps 3 and 4? Just do it and commit it, all in your favorite script-hacking style, with no X_ semantics needed.