GNU Parallel processing example

Developer from somewhere

Here’s a simple script, saved as initial.rb, that does some work :)

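# Simulate a slow job: log the start, wait five seconds, log the end.
def do_work(argument)
  puts "#{Time.now} Processing #{argument}"
  sleep 5
  puts "#{Time.now} Done processing #{argument}"
end

# The first command-line argument names the item to process.
do_work(ARGV[0])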

Running it with the argument 1 gives the following output on my machine:

2016-01-07 16:14:18 +0100 Processing 1
2016-01-07 16:14:23 +0100 Done processing 1

Running the script for several arguments one after another takes 5 seconds per argument. Luckily, we can use GNU Parallel to run the jobs side by side:

$ (echo 1;echo 2) | parallel -j+0 --eta 'ruby initial.rb {}'
When using programs that use GNU Parallel to process data for publication please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
Or you can get GNU Parallel without this requirement by paying 10000 EUR.

To silence this citation notice run 'parallel --bibtex' once or use '--no-notice'.


Computers / CPU cores / Max jobs to run
1:local / 4 / 2

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 2 AVG: 0.00s  local:2/0/100%/0.0s 2016-01-07 16:12:02 +0100 Processing 1
2016-01-07 16:12:07 +0100 Done processing 1
2016-01-07 16:12:02 +0100 Processing 2
2016-01-07 16:12:07 +0100 Done processing 2
ETA: 0s Left: 0 AVG: 0.00s  local:0/2/100%/2.5s

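As the timestamps show, both jobs started at 16:12:02 and finished at 16:12:07, so they ran in parallel and the whole batch took about 5 seconds instead of 10. The -j+0 flag tells parallel to run one job per CPU core (the core count plus zero extra jobs). This machine has 4 cores, but only 2 jobs were queued, so at most 2 ran at once.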
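To make the job-slot idea concrete, here is a minimal Ruby sketch of the same scheduling pattern. It is not how GNU Parallel is implemented: it uses threads inside one Ruby process rather than separate processes, and the helper names job_slots and run_in_slots are made up for this example. It assumes that -j+N means "CPU cores plus N", capped by the number of queued jobs, which matches the "4 / 2" line in the output above.

```ruby
require "etc"

# Hypothetical helper: the number of simultaneous jobs for "-j+N", assuming
# it means "CPU cores plus N", never more than the number of queued jobs.
def job_slots(extra, job_count, cores = Etc.nprocessors)
  [cores + extra, job_count].min
end

# Hypothetical helper: call the block once per argument, with at most
# `slots` calls in flight at any moment. Uses threads, not processes.
def run_in_slots(arguments, slots, &job)
  queue = Queue.new
  arguments.each { |argument| queue << argument }
  queue.close # after close, pop returns nil once the queue is empty

  workers = Array.new(slots) do
    Thread.new do
      while (argument = queue.pop)
        job.call(argument)
      end
    end
  end
  workers.each(&:join)
end

# Same shape as do_work, with a shorter sleep so the demo finishes quickly.
arguments = %w[1 2]
run_in_slots(arguments, job_slots(0, arguments.size)) do |argument|
  puts "#{Time.now} Processing #{argument}"
  sleep 0.5
  puts "#{Time.now} Done processing #{argument}"
end
```

With two arguments and at least two slots, both "Processing" lines should appear with near-identical timestamps, just like in the parallel output. Unlike parallel, this sketch does not group each job's output, so lines from different jobs may interleave.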

Alternatively, the ::: syntax passes the arguments on the command line instead of through a pipe:

parallel --eta -j+0 'ruby initial.rb {}' ::: 1 2

Or, with a shell glob, to run one job per matching file:

parallel --eta -j+0 'ruby initial.rb {}' ::: *.txt
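As a rough illustration of what the {} placeholder does in the commands above, the Ruby sketch below builds one command line per argument. The helper name expand_template is invented for this example, and the quoting is handled by Ruby's Shellwords, which only approximates whatever parallel does internally.

```ruby
require "shellwords"

# Hypothetical helper: build one command line per argument by replacing
# every "{}" in the template with the shell-escaped argument.
def expand_template(template, arguments)
  arguments.map do |argument|
    # Block form of gsub inserts the replacement literally, backslashes included.
    template.gsub("{}") { Shellwords.escape(argument) }
  end
end

# Equivalent of "::: 1 2": two arguments, two command lines.
puts expand_template("ruby initial.rb {}", %w[1 2])

# Equivalent of "::: *.txt": the shell expands the glob before parallel
# sees it; Dir.glob plays that role here.
puts expand_template("ruby initial.rb {}", Dir.glob("*.txt").sort)
```

To see the real commands without running them, GNU Parallel has a --dry-run option that prints each command line instead of executing it.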

More examples here

GNU Parallel citation:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.