In the past, if I had a large number of single-CPU computationally intensive jobs and multiple CPUs to run them on, I would create a separate bash script for each CPU with a line for each calculation, e.g. ./runthis input1.smi > output1.txt. This is not super-ideal: different jobs take different lengths of time, so any CPU that finishes its bash script early just sits there idle. It also involves making N separate bash scripts.
Enter GNU parallel. This comes with several Linux distributions, but on CentOS I just quickly installed it from source. Once done, you just need to put all of the jobs in a single script and pipe it through parallel:
cat myjobs.sh | parallel -j7 # e.g. for 7 CPUs
There are a whole load of complicated ways of using parallel. I'm very happy with this one simple way.
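For example, a jobs file with one independent job per line, reusing the illustrative runthis command from above:

# myjobs.sh: one independent job per line
./runthis input1.smi > output1.txt
./runthis input2.smi > output2.txt
./runthis input3.smi > output3.txt

parallel keeps 7 of these running at once and starts the next line as soon as a CPU frees up, so nothing sits idle.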
Parallel is nice. I am also using the trick described here: http://www.linux-magazin.de/Ausgaben/2009/02/Parallelarbeit/. You can provide the number of processes, and in the end just run something like: doParallel_babel *.sdf.
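For anyone who doesn't read German: the article's trick boils down to capping the number of concurrent background jobs. A minimal sketch of what a doParallel_babel helper could look like, assuming Open Babel's babel command and an SDF-to-SMILES conversion (the article's actual code may well differ):

# Hypothetical sketch; only the function name is taken from the comment above.
doParallel_babel () {
    local nproc=4   # number of concurrent processes (assumption)
    local f
    for f in "$@"; do
        # throttle: wait while nproc background jobs are still running
        while [ "$(jobs -rp | wc -l)" -ge "$nproc" ]; do
            sleep 1
        done
        babel "$f" "${f%.sdf}.smi" &   # convert one file in the background
    done
    wait   # block until the remaining jobs finish
}
# Usage: doParallel_babel *.sdf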
There is also PPSS, the (Distributed) Parallel Processing Shell Script: https://code.google.com/p/ppss/
Filip
Thanks for the pointers.
Note to self: here's how to run gzip in parallel:
ls *.txt | parallel -j8 gzip
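If I'm reading the parallel man page right, an equivalent that avoids parsing ls output uses the ::: argument separator:

parallel -j8 gzip ::: *.txt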