-------------------------------------------------------------------------------
Parallel

Perl script to provide multi-process xargs like multi-tasking.
See "info/apps/xargs.txt"

While GNU xargs can now do parallel processing, it mixes up the output, while
parallel keeps the output of each command together, using temporary files
(see --tmpdir), and can also preserve the input order.

That means of course output is delayed until commands are finished. Results
are printed as each command finishes (the default, --group), or in input
sequence (-k or --keep-order), unless you tell it not to group
(-u or --ungroup).

-------------------------------------------------------------------------------
Citation Needed...  Arrrgghh...

NOTE: I no longer recommend "parallel" due to the 'citation notice' that is
output to stderr.  This makes it much more difficult to use and far less
portable.

But if you need to use it, you need to create a file "will-cite" in any of the
following directories to stop this 'citation notice'.

    $PARALLEL_HOME
    $XDG_CONFIG_HOME/parallel
    (for each dir in ':' list $XDG_CONFIG_DIRS)/parallel
    $HOME/.parallel

EG:
    mkdir ~/.parallel
    touch ~/.parallel/will-cite

Or add "--will-cite" to every command in the script, though the author wants
10,000 EURO to allow you to do that!

-------------------------------------------------------------------------------
Parallel vs XArgs

Parallel commands are passed to the shell, xargs commands are exec'd.

Parallel can take input from stdin, files, or arguments.
It will multi-loop over multiple inputs.

Use ::: for cli-args and :::: to loop over a file of arguments.
It will also accept stdin, but then has other limitations.

EG: this outputs 3*3*3 or 27 lines!

    seq 3 > numbers.dat
    parallel echo "{1} {2} {3}" ::: a b c :::: numbers.dat ::: X Y Z

Note {} outputs all args used, and --tag prefixes the output with that.

    parallel --tag echo "- {2} - {}" ::: a b c :::: numbers.dat ::: X Y Z

You can name those sources using --header

    parallel --header : echo '{gender}' '{size}' \
        ::: gender M F \
        ::: size S M L XL XXL

You can use columns                                   --colsep {delim}
Or even read multiple input lines (record groups)     -N#

Remote Execution

You can have it transfer the input file, or pass a 'script' to a remote
machine beforehand, and clean up afterwards.

And have a prepared cluster of machines to distribute jobs to.
See --sshloginfile

-------------------------------------------------------------------------------
With Wget/Curl

Using Parallel or Xargs -P to wget multiple files

    cat url_list |
      parallel "wget -q {} 2>/dev/null || echo {} >> url_failed "

Note wget can use 'keep-alive' with connections, so this will give each wget
a few URLs each to work with.

    parallel -j20 wget -q < links.urls

    parallel --will-cite --line-buffer nohup.out 2>&1 &

The "headers.txt" file can be used to set auth, and any cookies needed.
For example

    User-Agent: Mozilla/5.0 Chrome/56
    Accept-Encoding: gzip, deflate, sdch, br
    Cookie: JSESSIONID=DBE1FED5C040B2DF7;

NOTE: With no command, parallel takes the input as being a list of commands
to execute.

See also the script  bin/www/crawl_parallel

-------------------------------------------------------------------------------
Get a set of dated files.

For example files 01 to 10 of the last 30 days (in reverse date order)
in the form of  YYYY-MM-DD_nn.jpg

    parallel \
      wget http://site/path/'$(date -d "today -{1} days" +%Y-%m-%d)'_{2}.jpg \
        ::: $(seq 30) ::: $(seq -f %02g 10)

-------------------------------------------------------------------------------
Using parallel to run remote commands

Run a remote command on a lot of machines in parallel!

Note the order in which remote machines are selected is randomized and is
not preserved, even though the argument order is preserved.

    sed 's|^|1/r -x |' hostlist |
      parallel --sshloginfile - --keeporder \
        echo '{} : `hostname`' :::: hostlist

Preserving remote machine order means parallel needs to do the ssh itself.
Note the need to escape the backquotes because of the double shell parsing.

    parallel --keeporder \
      r -x '{}' 'echo {} : \`hostname\`' :::: hostlist

    parallel --keeporder \
      r -x '{}' 'echo {} : \`hostname\`' \
      :::: <(egrep '^\].* (RH|SOL)' ~/misc/dist_accts|awk '{print $3}'|uniq)
or
    parallel --keeporder r -x '{}' 'echo {} : \`hostname\`' \
      :::: <(find_acct -l r -g RH)

See also "info/apps/mpi_pbs.txt" for using it in PBS cluster computing.

-------------------------------------------------------------------------------
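Multi-loop inputs as plain shell loops

The 3*3*3 example under "Parallel vs XArgs" may be easier to follow when
written out as nested loops.  This is only a sketch of which combinations get
generated, not of how parallel works internally: it runs one job at a time,
and the function name "combos" and the mktemp stand-in for "numbers.dat" are
my own illustrative choices.

```shell
#!/bin/sh
# Plain-shell sketch of the combinations that
#   parallel echo "{1} {2} {3}" ::: a b c :::: numbers.dat ::: X Y Z
# loops over.  Jobs run sequentially here; parallel is not needed.

# combos FILE : print every combination of (a b c) x (lines of FILE) x (X Y Z)
# "combos" is a hypothetical helper name, not part of parallel.
combos() {
    for first in a b c; do                # first  ":::"  source  -> {1}
        while IFS= read -r second; do     # the    "::::" file    -> {2}
            for third in X Y Z; do        # second ":::"  source  -> {3}
                echo "$first $second $third"
            done
        done < "$1"
    done
}

numbers=$(mktemp)       # stand-in for numbers.dat
seq 3 > "$numbers"
combos "$numbers"       # 3 * 3 * 3 = 27 lines
rm -f "$numbers"
```

Each extra input source is one more nesting level, which is why the line
count is the product of the source sizes.

-------------------------------------------------------------------------------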