-------------------------------------------------------------------------------
Filename Path, Basename, and Suffix

  Bourne Shell
    file="/path/to.the/file.tar.gz"

    Resolve Symbolic Links (empty if none)
      readlink -f "$file"

    Extract Path  (NOTE: may be empty)
      path="`expr "$file" : '\(.*\)/'`"
      : ${path:=.}    # use '.' if no path
    OR (fails if no '/' present)
      dirname=`echo "$file" | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

    Remove Path (basename alternative)
      basename="`expr "//$file" : '.*/\([^/]*\)'`"

    Get Suffix (path kept if present)
      suffix="`expr "$file" : '.*\.\([^./]*\)$'`"

    Remove Suffix (path kept if present)
      name="`expr "$file" : '\(.*\)\.[^./]*$' \| "$file"`"

  Bash (zsh, ksh93) methods -- note the pairings
    f=path/to.the/file.tar.gz
    echo ${f##*/}        # filename (strip to last '/')    file.tar.gz
    echo ${f%/*}         # path (strip after last '/')     path/to.the
    echo ${f%.*}         # remove last suffix (can fail, if no suffix, with path)
                         #                                 path/to.the/file.tar
    echo ${f%%.*}        # remove all suffixes (fails for a '.' in path!)
                         #                                 path/to
    echo ${f/%.gz/.bz2}  # replace specific suffix         path/to.the/file.tar.bz2
    # --------
    f=${f##*/}           # extract filename (to work on)
    echo $f              #                                 file.tar.gz
    echo "${f##*.}"      # get suffix                      gz
    echo "${f%.*}"       # remove suffix                   file.tar
    echo "${f#*.}"       # get all the suffixes            tar.gz
    echo "${f%%.*}"      # get basename                    file
    # -------
    d="${d%/}"           # remove any trailing '/' on directory name

  Loop over files except those with specific suffixes
  EG: ignore files with "~" or "," appended, or with suffixes
  such as ".swp", ".rpmnew", ",v", etc.
  ASIDE: a direct glob match might be better!

    # All files, ignoring *~ and *,
    for i in $(LC_ALL=C; echo ${1%/}/*[^~,]) ; do
      [ -d $i ] && continue     # Ignore directories
      # Ignore *.{rpmsave,rpmorig,rpmnew,swp,cfsaved} scripts
      [ "${i%.cfsaved}" != "${i}" ] && continue
      [ "${i%.rpmsave}" != "${i}" ] && continue
      [ "${i%.rpmorig}" != "${i}" ] && continue
      [ "${i%.rpmnew}"  != "${i}" ] && continue
      [ "${i%.swp}"     != "${i}" ] && continue
      [ "${i%,v}"       != "${i}" ] && continue
      ...
    done

  Check input file fully.
    FILE="${1:-/dev/stdin}"
    [ ! -f $FILE ] && Usage "Error \"$FILE\": does not exist"
    [ ! -r $FILE ] && Usage "Error \"$FILE\": is not readable"

  Same file (symbolic or hardlinked), EG: same device and inode (bash)
    [ /path/to/file1 -ef /path/to/file2 ]

-------------------------------------------------------------------------------
glob, globbing, GLOB

  Globbing only happens if one of these is present outside quotes.
     *       any number of chars (not starting with '.')
     ?       any single char
     [a-zD]  any of these chars
     [^m-z]  not these chars

  NOTE: if a glob expansion fails the word is left as-is.

  Loop over matching files, but ignore glob failures
  NB: If bash is used you can use "shopt -s nullglob" to stop null glob
  failures

    for f in cdr_*.bz2; do
      [[ $f = *'*'* ]] && break   # glob failure, no files, break loop
      ...
    done

  NOTE: {a,b,c,d} is expanded separately, in sequence, before globbing,
  and each expansion may result in failed globs.

    echo {d,c,b,a}*   # =>  dl c* bin ansible apps archives

  Other shopts
    nullglob     return no args if the glob fails to match
    dotglob      include '.' at start of files in matches (except "." and "..")
    failglob     output an error and do not execute the command (expansion error)
    nocaseglob   make alphanumerics case insensitive
    extglob      extended globbing
    globstar     allow ** to match across the path char (recursive)

  List directories recursively
    shopt -s globstar
    printf "%s\n" **/
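
  A small sketch of globstar in a loop (bash 4+; "nullglob" added so an
  empty tree produces no output rather than the literal pattern)...

    shopt -s globstar nullglob
    for f in **/*.txt; do      # every *.txt file, at any depth
      echo "found: $f"
    done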
 ---

  Extended Globs

  Note that if a file matches multiple times, it does NOT double up,
  unlike what you may get using separate globs of a complex file list.

    @(glob|glob|glob)    globs must match exactly one time
    ?(glob|glob|glob)    globs match zero or one time in filename
    *(glob|glob|glob)    globs match zero or more times
    +(glob|glob|glob)    globs match one or more times
    !(glob|glob|glob)    Negative Globbing!  Must not match pattern

  Example meaning of the prefix flag...
    touch ac azc azzc azzzc
    echo a@(z)c    # =>  azc
    echo a?(z)c    # =>  ac azc
    echo a*(z)c    # =>  ac azc azzc azzzc
    echo a+(z)c    # =>  azc azzc azzzc
    echo a!(z)c    # =>  ac azzc azzzc
    echo a!(z*)c   # =>  ac
    rm a*(z)c

  Example use, list files with a compression suffix
    echo *.@(z|gz|bz2)

  Negative globs...
    echo a*        # =>  ansible apps archives
    echo !(a*)     # =>  bin dl git info lib misc projects store work

  All files except "README"
    echo !(README)

  Prevent multiple file matches using multiple globs
    echo a* -- *s  # =>  ansible apps archives -- apps archives projects
    echo @(a*|*s)  # =>  ansible apps archives projects

-------------------------------------------------------------------------------
File Descriptors...

  What file descriptors are open
  (including the 'ls' of the /dev/fd directory itself)
    ls -l /dev/fd/
  The permissions define if the descriptor is actually read or write.

-------------------------------------------------------------------------------
File Permissions and other information...

  "stat" returns file information in great depth...

    > stat .bashrc
      File: .bashrc
      Size: 363          Blocks: 8          IO Block: 4096   regular file
    Device: fd00h/64768d  Inode: 81817059    Links: 1
    Access: (0600/-rw-------)  Uid: ( 1000/ anthony)   Gid: ( 1000/ anthony)
    Access: 2019-07-03 10:58:03.586387861 +1000
    Modify: 2019-07-03 10:58:03.586387861 +1000
    Change: 2019-07-03 10:58:03.592387912 +1000
     Birth: -

  Get user and group of home directory
  NOTE: "ls" is known to munge the results in some cases...

    # find returns names, OR if no name then returns the uid or gid
    read user group < <(find "$HOME" -prune -printf "%u %g\n")
    echo "user=$user group=$group"

    # Get user and group using stat
    # WARNING: returns "UNKNOWN" if no name is present -- BAD, DO NOT USE
    read user group < <(stat --printf "%U %G" "$HOME")
    echo "user=$user group=$group"

    # Get uid and gid numbers using stat
    read uid gid < <(stat --printf "%u %g" "$HOME")
    echo "uid=$uid gid=$gid"

    # File system info
    stat --printf "FS Blocks:\n total: %b\n free: %f\n bsize: %S \n" \
        --file-system .

-------------------------------------------------------------------------------
Copy-N-Paste Problem...

  When you copy and paste text from any Xwindow application (for
  example: XTerm, "vim" in XTerm, GVim, Gedit, parcellite (clipboard),
  xselection, ...) the lines pasted contain RETURN characters as the
  end-of-line markers.

  For example, if you are reading this in a "vim" editor, try
  Copy-N-Pasting this to the command line...

      cat -v -
      This is a multi-line file
      for Copy-N-Paste Testing.

  (Press the 'return' key to see results, followed by ^D for EOF)

  The output from the copy will only use RETURN chars, without the
  expected NEWLINE characters.  Many things do not handle this properly
  (including "cat").

  This can make Copy-N-Pasting data from a file onto a command line, or
  anything else that is not a shell or an editor, a real problem.
  See next for solutions...

  Using a keyboard Paste Macro can fix this, as it types the text in
  correctly, as if it really was typed.
  See "info/X/event_handling.txt" for keyboard macro development
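
  A minimal command-line workaround (my own suggestion, not from the
  macro notes above) is to convert the pasted RETURNs as they arrive...

    tr '\r' '\n'
    (paste here, press 'return', then ^D for EOF)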
-------------------------------------------------------------------------------
HERE, Here, here files can be fun!

  Important: See the previous Copy-N-Paste problem.  Basically you
  cannot copy from an editor directly into a command, but you can have
  it as an argument to a command, to fix the problem.

  This is a typical Here file...  No indentation, and a flush-left EOF...
  It is very hard to see what is and is not part of the here file.
=======8<--------
cat <<'EOF'
This is documentation that would be printed as it appears here, until
it sees the defined label on a line by itself.

The quotes around the label define the quoting used within the herefile
itself.  For example if you use "" you can allow variable and command
substitution within the here file.

The problem is that you can NOT indent the here file text with ANY
whitespace, as that space will be part of the input.  This means it
looks totally out of place in text file documents, or program code.
EOF
=======8<--------

  This can also be useful... read until the first blank line!
    cat <<""

  Simple Indented HERE Documents
  WARNING: Next example has TAB indents (which probably are no longer present)
=======8<--------
	cat <<-'EOF'
		This used to have tabs on the front that you can use to indent
		the documentation.  The tabs get removed...  However it relies
		on TABs, and losing them can cause the shell to miss the EOF
		marker.  Tabs are not something that you can guarantee will
		not be lost!
	EOF
=======8<--------
  Losing the tabs is a major problem.  It is white space, and you
  should not rely on white space (either amount or type).

  Here we use a small program to remove indents, with a marker for
  what is the true left edge of the text.
=======8<--------
	sed 's/^ *|\{0,1\}//' <<-'EOF'
		| This is a space+bar indented here file.
		| This looks neater than a normal tab indented 'herefile'.
		|
		| It also preserves indenting in the document!
		|    Like these two lines.
		|
		| However the final 'EOF' label still needs to be either hard to the
		| left, or tab indented.  That means you're still dependent on them.
	EOF
=======8<--------
  BUT we still have an out-of-place final EOF, and handling variable
  substitution is still a problem.

 ---

  The IDEAL Indented HERE Documents

  Use single quotes for the DATA block.
  This solves both problems (for bourne shells),
  and can be used in 'copy and paste' to command-line documentation.
  The only problem is you need to escape single quotes within the text.

    echo '
      | Single '\''Quotes'\''
      | Double "Quotes"
      |    Indented Lines
      | # Hash Comments in the "|" lines are preserved!
      | #   (or can be removed by sed if desired)
      |
      # This is a HERE document, and this comment line is ignored!
      This line is also ignored as it has no "|" at start (not recommended)
      This and following Blank line is also ignored.

      | You can have Variable Inclusions such as HOME='"$HOME"'.
      | Though it requires you to swap quoting modes.
      | Basically this works extremely well, with no big problems.
      | ' | sed -n '/^ *|/!d; s///; s/^ //; p' > dest_file

  NOTE: The quoted string not only removes the need for a tab indented
  EOF, but also removes the need for the "EOF" label.  The 'sed' script
  also stops problems with EOLs that are caused by Copy-N-Paste (see
  above).

  Perl version (much simpler)
=======8<--------
echo '
   | Line One
   | Line Two (indented)
   | Line Three with '\''quoted string'\''
   | # Included comment
   # while this is a comment and is ignored!
   ' | perl -ne 's/^\s*\| ?// && print' > dest_file
=======8<--------
  NOTE that any line not starting with a 'bar' is ignored, so you can
  include installation comment(s) inside the here file, that are not
  part of the file.
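
  For repeated use the extraction can be wrapped in a tiny function (a
  sketch; the name "unbar" is my own invention)...

    unbar() { sed -n 's/^ *| \{0,1\}//p'; }   # print only the '|' lines

    echo '
      | Line One
      | Line Two
      ' | unbar > dest_file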
  This is a version that also removes comments from inside the 'here'
  file.  Useful to document installation of config files that normally
  can't accept comments, such as the firewall configuration file.
=======8<--------
echo '
   # append ports to a firewall config file
   | --port=111:tcp    # rpcd - the portmapper, to find these ports
   | --port=111:udp
   # --port=662:tcp    # statd ??? don'\''t include for now
   # --port=662:udp
   | --service=nfs     # port tcp:2049
   ' | perl -ne 's/\s*#.*$//; s/^\s*\| // && print' \
     >> /etc/sysconfig/system-config-firewall
=======8<--------

  This one flushes and pauses after each line of input.  It was used
  for slow transmission to a network MAIL server, for testing, so you
  can see the results of each line sent, immediately after that line is
  sent.  It also outputs a copy of the input to stderr (for debugging
  display).
=======8<--------
echo '
   | HELO xyzzy
   | MAIL FROM: root@nowhere.land
   | RCPT TO: anthony
   | DATA
   | From: A.Thyssen@test
   | To: no-one-really@foobar.gu.edu.au
   | Subject: Testing
   |
   | This is a test
   | .
   | QUIT
   ' | perl -ne '$|=1; s/^\s*\| ?//||next; print; print stderr; sleep 1' |
     socat - tcp:localhost:smtp,crlf
=======8<--------
  NOTE: The 'crlf' option is because many SMTP servers require
  return-newline EOLs, as per the INET RFC of the protocol.  Most LINUX
  based mail servers do not have that requirement, but many others do
  (symantec mail gateways).

-------------------------------------------------------------------------------
Signal Traps

  NOTE: Some machines use different numbers for signals after 15.
  Better to use signal names if possible.

  If a trap runs "exit" the EXIT trap will then be executed.  As such,
  cleanup code is better put in an EXIT trap, while interrupt traps run
  "exit".  If you don't use "exit" in the EXIT trap, the original exit
  value is used for the final exit code.

  Example use...

    #!/bin/sh
    setup_display() { echo ...setup...; }
    reset_display() { echo ...reset...; }

    trap 'reset_display' EXIT
    trap 'reset_display; suspend' TSTP
    trap 'setup_display' CONT
    # exit code 130 is the Interrupt code (128) plus exit value of 2 (INT)
    trap 'exit 130' HUP INT QUIT ABRT TERM

    setup_display
    sleep 20      # do things, exit as normal to reset

 ---

  You can get the exit value (for a trap EXIT) using $? at the start.

    cleanup_display() {
      exit_code=$?
      echo Exiting with $exit_code
      reset_display
      exit $exit_code
    }
    trap 'cleanup_display' EXIT

 ---

  You can also discover what command was being run using
      $BASH_COMMAND  $BASH_LINENO
  (these do not update during a trap).
  Though they are more useful for the special shell tracing traps
      DEBUG  RETURN  ERR

-------------------------------------------------------------------------------
Safe Temporary Files...

  Single File...
=======8<--------
# Set up temporary file with auto-cleanup
umask 77
tmp=$(mktemp "${TMPDIR:-/tmp}/$PROGNAME.XXXXXXXXXX") ||
  { echo >&2 "$PROGNAME: Unable to create temporary file"; exit 10;}
trap 'rm -f "$tmp"' EXIT
trap 'exit 130' HUP INT QUIT ABRT ALRM TERM

command > "$tmp"

# Remove auto-cleanup (not normally needed)
rm -f "$tmp"
trap - EXIT HUP INT QUIT ABRT ALRM TERM
=======8<--------
  Note the temporary file is already created with appropriately
  restricted permissions.

  The $TMPDIR is a typical environment variable users often set to
  specify a different 'personal' directory for temporary files
  (generally due to disk space requirements, and/or security).

  The exit value is preserved by the EXIT signal trap, unless it uses
  "exit" itself.
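
  The $PROGNAME used above is assumed to have been set earlier in the
  script, typically with the path-stripping method from the top of this
  file...

    PROGNAME=${0##*/}    # name of this script, with any path removed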
 ---

  Multiple Files (directory)...

  If multiple temporary files are needed, a better idea is to create
  your own temporary directory for them (using a -d flag)
=======8<--------
umask 77
tmpdir=`mktemp -d "${TMPDIR:-/tmp}/$PROGNAME.XXXXXXXXXX"` ||
  { echo >&2 "$PROGNAME: Unable to create temporary directory"; exit 1;}
trap 'rm -rf "$tmpdir"' EXIT                   # remove when finished (on end or exit)
trap 'exit 130' HUP INT QUIT ABRT ALRM TERM    # terminate script on signal

command > "$tmpdir/A"
=======8<--------
  Now you can use any filename in the temporary directory, and all will
  be automatically cleaned up when finished.

 ---

  The alternative to using "mktemp" is to just use "$PROGNAME.$$",
  though that may still clash, so adding an extra random number may be
  a good idea.

  In BASH use
     "$PROGNAME-$$-$RANDOM"
  In old bourne SH you can use
     "$PROGNAME-$$-`awk 'BEGIN { srand (); print rand() }'`"

-------------------------------------------------------------------------------
Creating Large Files...

  Two types...  Filled and holey (sparse) files.

 ---

  Completely Filled File (often for use as a swap file)

  Create an 8 Mbyte file
    dd if=/dev/zero of=/export/swap/XXX bs=8k count=1024

  Under solaris
    mkfile -v 8m swap-file

  Fill a whole disk partition with 4 Mbyte files (a poor man's "shred")
    # create first 4M file (index 0)
    dd if=/dev/zero of=00000 bs=4k count=1024
    # copy that file until disk full produces a write error
    i=0; while i=$(($i+1)); do
      cp 00000 `printf '%05d' $i` || break
    done

 ---

  Holey or Sparse File Creation...
  File can be bigger than the actual file system!

    dd if=/dev/zero of=/tmp/holey bs=1 count=0 seek=1000000
    perl -e 'open(FILE, ">/tmp/holey"); seek(FILE,1000000,0); print FILE "\n";'
    echo | socat -u - file:/tmp/holey,create,largefile,seek=1000000

  Under solaris, use -n with "mkfile" to make a holey file
    mkfile -n -v 4m /tmp/holey

-------------------------------------------------------------------------------
Generating file sequences...

  See also "generating a list of numbers" in "info/apps/general.txt"

  Saving data into numbered files...
  Just save files in sequence...

    for num in `seq -f %03d 999`; do
      #num=`printf %03d $num`
      # do whatever and save to next sequence file
      touch file_$num.suffix
    done

  For a restartable process...
  Start with the next file number that is not present (ignoring gaps).
  This will continue to generate files in sequence even if gaps are
  present in the previously generated files.

    # Determine the largest number of the last file generated
    num=`ls file_*.suffix 2>/dev/null | tail -n1 | sed 's/[^0-9]//g'`
    [ ! "$num" ] && num=0   # set to zero if no previous file found
    while :; do
      num=`expr $num + 1`     # next file number to be created
      num=`printf %03d $num`
      # do whatever and save to next sequence file
      touch file_$num.suffix
      # break when finished!
    done

  The above will handle files being randomly deleted, before, during,
  or after.  To allow multiple processes to create number sequences,
  some type of locking mechanism will be needed (see "file locking"),
  as well as constantly looking for the largest file number yet used.
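
  A minimal sketch of such locking, using the atomic "mkdir" method
  covered in the lock file notes later in this file...

    while ! mkdir file_seq.lock 2>/dev/null; do
      sleep 1                  # another process holds the lock
    done
    # ... find the largest number and create the next file ...
    rmdir file_seq.lock        # release the lock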
-------------------------------------------------------------------------------
For Opening files and complex stream handling
  See "file_handles.txt"

-------------------------------------------------------------------------------
For complex reading of INPUT
EG: binary input, non-blocking, interactive keystrokes, timeout, etc
  See "input_reading.txt"

-------------------------------------------------------------------------------
Character Codes

  BASH FAQ 71: Convert characters
    http://mywiki.wooledge.org/BashFAQ/071

  Showing hexadecimal, but octal is just as easy

    char=Z
    str=A-Z

    # bash built-in (note the "'" in "printf" input)
    printf '%02X' "'$char"
       5A
    # external "printf"
    printf '%02X' "'$char"
       5A
    # using "od"
    echo -n "$char" | od -An -tx1 | tr -d ' \011'
       5a
    # using "xxd"
    xxd -p -u <<< "$str"
       412D5A0A
    xxd -p -u <<< "$str" | sed 's/\(..\)/0x&, /g; s/, $//;'
       0x41, 0x2D, 0x5A, 0x0A
    # using "hexdump"  -- note the reversal!
    hexdump -e '"%X"' <<<"$str"; echo
       A5A2D41

  NOTE: the "printf"s return 00 for an empty string (correct result for
  "read -N1"), while "od" preserves the empty string (good for
  "read -n1", as EOL is empty).

    code=5a

    # bash built-in
    echo -e "\x$code"
       Z
    # external "printf"
    printf "\\x$code\n"
       Z
    # using "xxd"
    xxd -r <<<"0 $code"; echo
       Z

-------------------------------------------------------------------------------
Reading Whole Files

  Read file into a single variable
    B="`cat .plan`"
    B=$(<.plan)     # <---- *** Bashism ***

  Read file into an array of white-spaced words! - includes '#' lines
    B=($(<".plan"))

  Or into an array of lines (initial blank lines ignored)
    IFS=$'\n' B=($(<".plan"))

  Or to preserve the file as a one variable array
    OIFS="$IFS"; IFS=''
    B=($(<".project"))
    IFS="$OIFS"

  Also See "Break string into words" in "script.hints"

  Looped Read Lines
  NOTE: The "-r" option ignores backslash line continuations

    while read -r line; do
      echo "\"$line\""
    done < .plan

  However if the last line did not end in a newline it will fail to
  process that line...  This will process it, if the last line is not
  empty.

    while read -r line || [[ -n "$line" ]]; do
      echo "\"$line\""
    done < .plan

  Also "read" automatically trims leading and trailing whitespace.
  To prevent this, set IFS for the read.

    while IFS= read -r line; do
      echo "\"$line\""
    done < .plan

  You can use the temporary IFS to set a field separator.
  The '_' is a throw-away variable name.

    while IFS=: read -r user _ _ _ _ _ shell _; do
      echo "$user -> $shell"
    done < /etc/passwd

  Remove '#' full-line comments

    while IFS= read -r line; do
      [[ $line = \#* ]] && continue   # ignore comment lines
      echo "\"$line\""
    done < .plan

  In a BIG loop construct, having the filename at the end of the loop
  can make it harder to read, but using a 'cat-pipeline' has sub-shell
  issues.  Using a file descriptor can solve this.  This can also be
  used to read from the file in multiple places.
  Open it using a file descriptor (see "file_handles.txt")...

    exec 3< .plan               # open the file on descriptor 3
    while read -r line <&3; do
      echo "\"$line\""
    done
    exec 3<&-                   # close the descriptor

  Example: read the values of an XML data block into a BASH hash table
  (an associative array)...

    name="_app"  mach="na-prd-iasjet"   # OLD data
    dir="/usr/local/nagios/share/perfdata/$mach"
    xml="$dir/Linux_Check_Local_Disk_S3_-_PRD.xml"

    typeset -A data
    while IFS=$'\n' read -r line; do
      # read each line and assign to BASH hash table
      read var val < <(echo "$line" | sed 's/^ *<\([^>]*\)>/\1 /; s/<\/[^>]*>$//')
      data[$var]="$val"
    done < <(perl -e '
        $block="DATASOURCE";   # read these data blocks from XML file
        $/ = "";               # data block boundary
        while(<>) {            # look for specific block of data
          next unless /<NAME>'"$name"'<\/NAME>/;
          s/.*<$block>\n//s;   # remove prefix
          s/<\/$block>.*//s;   # remove suffix
          print;  exit;
        }' "$xml" )

    unit="${data[UNIT]}"
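
  A quick sanity check of the parsed result -- list every key and value
  that was found ("${!data[@]}" expands to the keys of a bash hash
  table; key names like UNIT come from the XML file itself)...

    for var in "${!data[@]}"; do
      printf '%s = %s\n' "$var" "${data[$var]}"
    done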
  Read whole File Speed...

    Shell built-in (bash) -- is faster
      time for I in $(seq 1 1000); do B=$(<".plan"); done
        real  0m0.679s
    Cat
      time for I in $(seq 1 1000); do B="`cat .plan`"; done
        real  0m0.939s
    Sed
      time for I in $(seq 1 1000); do B="`sed 's/^/|/' .plan`"; done
        real  0m1.444s
    Awk
      time for I in $(seq 1 1000); do B="`awk '{print "|",$0}' .plan`"; done
        real  0m1.997s

-------------------------------------------------------------------------------
Bash mapfile

  Reading a file into an array, newlines preserved (unless -t option given)

    # read file into array, stripping newlines (-t)
    #
    mapfile -t < <(seq 9)
    printf '%s\n' "${MAPFILE[@]}"

    # read equivalent (to a lines array)
    read -r -d '' -a lines < <(seq 9)
    printf '%s\n' "${lines[@]}"

    # you can specify what lines you want
    # skip first 2 lines (-s), then read next 3 lines (-n)
    #
    mapfile -t -s 2 -n 3 lines < <(seq 9)
    printf '%s\n' "${lines[@]}"

    # Read from a descriptor -u (output from seq)
    # skip 3 lines (-s), then read next 4 lines (-n)
    #
    exec {file}< <(seq 9)               # open a file
    mapfile -t -u$file -s3 -n4 lines    # read lines 4 to 7
    exec {file}<&-                      # close file
    printf '%s\n' "${lines[@]}"

  Bash mapfile callback (blocks of input)

    # With a callback:
    #   $1 index of last line read, $2 is the last line read.
    process() { echo -n line $1 ': ' "$2"; }
    mapfile -C process -c 1 < <(seq 9)
    #echo "${MAPFILE[@]}"

    # So the index matches actual line numbers, use -O 1
    unset MAPFILE   # ensure index 0 is cleared from previous run!
    mapfile -C process -c 1 -O 1 < <(seq 9)
    #echo "${MAPFILE[@]}"

    # Callback on blocks of lines...
    # Call progress every 10 lines
    progress () { echo -en "  |$1|\r"; sleep 1; }
    mapfile -n200 -c10 -O 1 -C progress lines < <(yes); \
      echo ""

    # Callback is not called if the last group has an incomplete line
    # count.  As such you will need to handle the last callback yourself.
    #
    # WARNING: last line is not yet stored into array when process is called!
    #
    process() {
      echo "   ${lines[-2]}"        # second last line at this point
      echo "   ${lines[-1]}"        # last line saved at this point
      printf "%2d: %s\n" $1 "$2"    # current line and index
    }
    unset lines
    mapfile -C process -c 3 -O 1 -t lines < <(seq 8)
    echo "${lines[@]}"
    # Note that lines 7 and 8 were NOT processed (not a whole block)

    unset lines
    mapfile -C process -c 3 -O 1 -t lines < <(seq 9)
    # But it was handled this time!  (whole block)

    # Deal with incomplete last callback (assumes -O 1 is used)
    process() {
      for (( i=last+1; i<$1 ; i++ )); do
        printf "%2d: %s\n" $i "${lines[i]}"
      done
      (( last < $1 )) && printf "%2d# %s\n" $1 "$2"
      last=$1
    }
    unset lines last
    mapfile -C process -c 5 -O 1 -t lines < <(seq 9)
    process ${#lines[@]} "${lines[-1]}"

-------------------------------------------------------------------------------
In-place Editing, in-place edit

  A good general guide...
    http://mywiki.wooledge.org/BashFAQ/021
    https://unix.stackexchange.com/questions/159513/
  Including why 'copy, and mv-replace' can be a problem, though it is
  preferred.

  Remove a specific line from CLI
    sed -i '551 d' .ssh/known_hosts
    perl -i -pe '$_="" if $. == 551' .ssh/known_hosts

  Or RE replacement....
    sed -i 's/foo/bar/g' file
    perl -i -pe 's/foo/bar/g' file

  Use a temporary file (what the previous does).
  This however will likely break symbolic and hard links.
    grep -v 'pattern' < file > temp; mv temp file

  To stop symlink breaking, you need to copy back instead of move (replace)
    grep -v 'pattern' < file > temp; cp temp file; rm temp
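
  A quick demonstration of why "mv" breaks links -- the file's inode
  changes, so any hard link keeps pointing at the old contents...

    echo data > file; ln file link    # "file" and "link" share an inode
    sed 's/data/DATA/' file > temp
    mv temp file                      # replace "file" with a NEW inode
    ls -i file link                   # inode numbers now differ!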
  Using "sponge" (only opens its argument when input is complete)...
  ASIDE: "sponge" is atomic if $TMPDIR is on the same file system.
  Also an '-a' option lets you append, without the input reading the
  data just output.

    grep -v 'pattern' < file | sponge file

  If the input and output file length remains the same (sort, tr, cut,
  shuf), that is, the read point is guaranteed to stay ahead of the
  write point, then you can use...

    seq 10 > file
    shuf < file | dd of=file conv=notrunc 2>/dev/null
    cat file

  NOTE: The file is not truncated if the result is smaller (such as for
  "sed").

  This is equivalent, opening stdout as read-write, without truncation.
  Some commands (grep, sed) may refuse to work with input and output to
  the same file.

    seq 10 > file
    shuf < file 1<> file
    cat file

  Note on speed...
    sed is slowest
    perl faster
    grep -v & move   faster as it is simpler!
    and the last (no truncation) fastest (one command)

 ---

  Using stream buffers...

    (rm -f "file"; sed 's/foo/bar/g' > "file") < "file"

  or using a delay command

    cat example.txt | (sleep 1; rm example.txt; cat > example.txt)

  OR (if the file can fit in memory/swap)

    echo "$(sed 's/foo/bar/g' filename)" > filename

  WARNING: The above can go really wrong, particularly on user
  interrupt, as it provides no way of aborting on failure.  I do not
  recommend it on anything but temporary files.

 ---

  Direct file editing...

  VIM can also do this using 'silent' (no screen) mode.
    vim -es +"%s/foo/bar/g" +"wq" foo.txt

  ED can also be more easily used with BASH, and a here file or string
    ed -s file <<< $'g/foo/s//bar/g\nw\nq'

  EX is even older than "ed" (symlink to vim).
  The 'x' is the ex/vim command to save and quit.
    ex -sc '%s/olddomain\.com/newdomain.com/g' -cx file
    ex -sc '%!sort' -cx file

-------------------------------------------------------------------------------
How do I get a particular line or range of lines from a file?

  Read into a variable, better to use "mapfile"
  EG: read line $n into $x from $file
    mapfile -ts "$((n - 1))" -n 1 x <"$file"
    printf '%s\n' "$x"

  Using external tools (N and M below stand for line numbers)...

    first line only      head -1
                         sed q
       " into variable     mapfile -t -n 1 line < <(seq 10)
    first N lines        head -N
                         sed Nq
                         awk '{print} NR==N {exit}'
                         awk 'NR==N+1 {exit} {print}'
       " into array        mapfile -t -n 5 lines < <(seq 10)
                           echo ${lines[@]}
    single line N        sed -n 'Np;Nq'
                         sed 'Nq;d'
       " get 5'th          mapfile -t -s 4 -n 1 line < <(seq 10)
    range of lines N-M (inclusive)
                         sed -n 'N,$p;Mq'
                         sed 'N,$!d;Mq'
    range of lines (exclusive of the markers)
                         sed 'N,M!d;Nd;Md'   # runs to EOF!
                         awk '/END/{A=0} {if(A) print} /START/{A=1}' files...
    Up to this line (exclusive)
                         sed '/END/Q'        # GNU-sed
                         sed -n '/END/!p;//q'
                         sed -n '/END/q;p'
    last line            sed '$!d'
    last two lines       sed '$!N;$!D'
    second last line     sed '$!{h;d;}; x'
    last N lines         tail -N
                         sed -e :a -e '$q;N;11,$D;ba'   # N=10 (N+1 in sed)

  NOTE: The `q' in the sed commands above is to avoid un-needed
  computer cycles, similarly for the exit in the awk scripts.  Sed
  could also replace the head and tail commands, in a similar fashion,
  but is slower.
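
  For a quick range, the head/tail pairing is often the most readable
  (lines 5 to 10 of a file, in this sketch)...

    head -10 file | tail -n+5    # first 10 lines, then from line 5 onward
    tail -n+5 file | head -6     # from line 5 onward, then 6 lines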
 ---

  The problem with the above is if you want the lines relative to both
  the start and the end of the file at the same time.

  If you are dealing with a REAL file and not a pipe you can use "ed",
  which will know how many lines are in the file.  However in a
  pipeline (like "sed") that is more difficult, as it will not know how
  many lines are left to go.

  All lines but the last 5 lines
    head -n-5 file                             # GNU head
    echo '1,$-5 p' | ed -s file
    sed -e :a -e '$d; N; 2,5ba' -e 'P;D' file
    # Awk using a rolling buffer!
    awk -v n=5 '{if(NR>n) print a[NR%n]; a[NR%n]=$0}'
    # Reversing the file and doing a tail...
    tac | tail -n+6 | tac    # note number is one more than needed
    tac | sed '1,5d' | tac

  NOTE: for "ed" to get the 10th last to the 5th last line (6 lines)
    echo '$-10,$-5 p' | ed -s file

  A random line from a file however has fewer simple solutions, without
  having to calculate the total number of lines in the file beforehand.
  For example, this does not work, though logically it should...

    nawk ' BEGIN {srand(); RNUM = int(LNUM * rand())+1}
           NR == RNUM {print $0; exit 0}
    ' LNUM=`wc -l < file` file

  problems
    RNUM = 0      = no output
    RNUM = 1      = first line
    RNUM = LNUM-1 = 2nd last line
    RNUM = LNUM   = NO OUTPUT    <--- The problem

  A perl solution was given in the perl cookbook, which only requires a
  single pass through a file to extract a single random line.  It works
  by incrementally adjusting the probabilities as each line is read, so
  any line gets an equal chance of being picked, in one pass.  It may
  have compounding arithmetic error problems, though I don't know for
  sure.

  You can also use programs like...
    shuf - generate random permutations (GNU)

-------------------------------------------------------------------------------
Extract using File Markers...

  Answers mostly from...
    https://stackoverflow.com/questions/7451423
  WARNING: Many of the following assume either that the file only has
  one matching line, or that matches are well separated.

  Get just the next line after match
    sed -n '/Regex/{n;p;}'
    # awk version (fails on consecutive matches)
    awk '/Regex/{getline; print}'
    # awk version without getline (implicit print)
    awk 'line && NR==line+1; /Regex/ {line=NR}'
    # Prints the first non-matching line, after a set of matches
    awk '/Regex/ {f=NR}; f && NR==f+1'

  Get N lines after the matching line
    awk -v lines=7 '/Regex/ {for(i=lines;i;--i)getline; print $0 }'

  Get ALL lines after match (exclusive)
    sed '1,/Regex/d'

  Get ALL lines after match (inclusive)
    sed -n '/Regex/,$ p'

  Print lines between markers AAAA to BBBB
    sed -n '/AAAA/,$p; /BBBB/q' file
  Excluding the end marker
    sed -n '/BBBB/q; /AAAA/,$p' file
  Exclusive of both markers
    sed -n '/BBBB/q; 1,/AAAA/d; p' file

  Print contents of lines between two tags
  EG:  <tag>
         ....
       </tag>
    awk 'BEGIN{ RS="</tag>" } {gsub(/.*<tag>/,"")} 1' file

  Print Paragraphs that contain AAA, BBB, and CCC
    sed -e '/./{H;$!d;}; x;/AAA/!d;/BBB/!d;/CCC/!d'

-------------------------------------------------------------------------------
uniq without sorting!

  WARNING: The first line of each set of duplicates is kept, not the
  last!  This can affect the resulting order of results.

    perl -ne 'print unless $a{$_}++'

    awk '!x[$0]++'
    # more efficient awk version
    awk '!($0 in a) { a[$0]; print }'
    # keeping (ignoring) empty lines
    awk '!NF || !x[$0]++'

  BASH arrays (slow)...

    declare -a LINES
    while read; do
      for n in "${LINES[@]}"; do
        if [[ $n == $REPLY ]]; then
          continue 2
        fi
      done
      LINES=("${LINES[@]}" "$REPLY")
      echo "$REPLY"
    done

  For huge files...
    https://stackoverflow.com/questions/30906191/

  Storing hash values rather than full lines (for files with long lines)

    perl -ne 'use Digest::MD5 qw(md5_base64);
              print unless $seen{md5_base64($_)}++ ' huge.txt

  Note that each hash needs 120 bytes, which could be longer than the
  original lines!
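
  If the output order does not actually matter, plain sorting is the
  memory-frugal fallback for huge files ("sort" spills to temporary
  files rather than holding everything in memory)...

    sort -u huge.txt > uniq.txt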
  Alternative using chunks (if duplicate lines are not far apart)...

    cat huge.txt | pv | parallel --pipe --keep-order --block 100M -j4 -q \
        perl -ne 'use Digest::MD5 qw(md5_base64);
                  print unless $seen{md5_base64($_)}++' \
      > uniq.txt

  Use "bloom filters" (Bloom::Faster module of Perl)
    https://metacpan.org/pod/release/PALVARO/Bloom-Faster-1.7/lib/Bloom/Faster.pm

    perl -e 'use Bloom::Faster;
             my $f = new Bloom::Faster({n => 100000000, e => 0.00001});
             while(<>) { print unless $f->add($_); }
            ' huge.txt > uniq.txt

-------------------------------------------------------------------------------
Delete lines found in one file, from another file

  Situation: you have a master file "master" and you want to remove all
  entries found in another file, leaving just the 'new' ones.

  If you can sort or re-order the master file, and all entries are
  unique, then you can use...  ("list" is given twice so "uniq -u"
  removes all of its lines)

    cat master list list | sort | uniq -u > new_master

  An alternative is to use "comm" on sorted versions of the lists.

    sort -o master master
    sort -o list list
    comm -23 master list > new_master

  Of course you can also get the items which are not present in the
  master file, and those which are in BOTH files (union), using
  different 'comm' options.

  fgrep
    fgrep -v -f list master > new_master
  WARNING: This does not limit the test to whole lines!  Small list
  elements could sub-string match a larger, longer line!!

-------------------------------------------------------------------------------
Compress Blank lines

  see "general.txt"

-------------------------------------------------------------------------------
Reverse the line order in a file

    tac                      The reverse of "cat"!

    sed '1!G;h;$!d'          # classic one-liner
    sed -n '1!G; $p; h'      # from "info sed"

    perl -e "print reverse <>"
      the perl solution is very simple!

    awk '{x[NR]=$0} END{for(i=NR;i>0;i--){print x[i]}}' infile > outfile
      NOTE: this will barf on extremely large files.

    ed - infile <<-EOF
    	g/^/m0
    	w
    EOF
      ed 'in-place' solution (twice as fast as the above awk solution)
      NOTE: this will gag on long lines instead of long files
      NOTE that  :g/^/m0  will also work directly in vi too

    vim...  reverses lines from a+1 to b
      :'a,'b g/^/ m 'a

-------------------------------------------------------------------------------
reverse characters within a line

    rev

    sed -e '
      # skip lines with less than two characters
      /../! b
      # Embed newlines at both ends
      s/^/\n/
      s/$/\n/
      # then slowly move the markers toward the middle
      :x
        s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
      tx
      # remove the newline markers
      s/\n//g
    '

-------------------------------------------------------------------------------
Exclusive Lock for a Process

=======8<--------
#!/bin/bash
#
# Using "flock" to ensure script only runs once.
# Opens a file descriptor on the script itself.
# The file descriptor will close when script exits.
#
# Get file descriptor for an Exclusive Lock
exec {script_lock}<"$0"    # open this script for the lock

# Get-Test for a lock, so we can warn we will be waiting (or aborting)
if flock --nonblock $script_lock; then
  : All good, continue
else
  # echo "Script already running... ABORTING"
  # exit 10
  # OR
  echo "Script Locked... Waiting for Release..."
  flock $script_lock
fi

echo "Exclusive Lock Achieved... Running..."
exec countdown
# OR release when ready with
#   exec {script_lock}<&-
=======8<--------
  NOTE: this is flock, which does not involve lockd.

  Older Lock File Methods...

  File Creation
    ( set -o noclobber; echo > my.lock ) ||
        echo 'Failed to create lock file'
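
  A minimal retry-loop sketch of the noclobber method (remove the lock
  in an EXIT trap, as per the temporary file section above)...

    until ( set -o noclobber; echo $$ > my.lock ) 2>/dev/null; do
      sleep 2      # someone else holds the lock
    done
    trap 'rm -f my.lock' EXIT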
  File Permission  ( file mode = 000 )

    Create a lockfile with zero permissions.  The example below uses
    this method in shell.  It creates the lockfile and places the
    current pid of the process in it.

    NOTE: a trap should be provided to remove the lock file on any
    abnormal exit by the program running this.

    WARNING: This technique does not work for ROOT, which will always
    succeed in creating the file, even if it exists.

    Lockfile() {   # create a lockfile with the process ID in it
      masksave=`umask`; umask 777
      ( echo $$ > $1 ) 2>/dev/null; success=$?
      while [ $success -ne 0 ]; do
        sleep 2
        ( echo $$ > $1 ) 2>/dev/null; success=$?
      done
      umask $masksave
    }

  Exclusive Open

    This requires the open(2) call to use the O_CREAT & O_EXCL flags to
    ensure that the file is only opened if it had to be created.  The
    csh 'noclobber' flag should do this (look at the source).
    NOTE: This does NOT work over NFS (unless version 3 release)

  Hard Links to a file 'ln'

    This works for root, but on System V the 'ln' command removes any
    file that the link is created for (ala 'mv'), as such on System V
    this fails for both users and root.  This is the method normally
    used for passwd file locking.
    NOTE: This method is known to work over NFS

  Symbolic Links 'ln -s'

    It is not known if this has the same problem on System V, or if
    this is atomic over NFS.

  File rename using rename()

    Create a unique file with the process PID, /tmp/data.lock.$PID,
    then rename it to the lock filename, /tmp/data.lock

  Lock directory instead of a file, 'mkdir'

    This should work properly in all cases, but it is unknown if this
    is atomic over NFS.

  NOTE: The lockfile often contains the process-ID of the process
  locking the file.  When a program notices that a file has a lock, it
  can then check to see if the other process still exists (using a
  kill 0 and looking at the return code and errno).

  This implementation fails miserably with NFS-mounted files, because
  they could easily be locked by a process on a remote machine.  This
  is a fairly brain-dead locking mechanism which was fine when everyone
  worked on a single VAX without networked file systems, but is now
  obviously inadequate.  Still it works for many situations.

  Alternatively, a scheme checking the lockfile creation date is
  possible.

  For more locking info please see C/locking.hints

-------------------------------------------------------------------------------
To match all files in a directory (requires 3 pattern matches)

    .[^.]  .??*  *

  The ^ may be a ! in some shells, or in really ancient shells (sh on
  ultrix) no `not' function may be provided for shell glob patterns.
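
  When GNU find is available it avoids the glob juggling entirely (a
  simple sketch; -mindepth skips the directory itself)...

    find "$dir" -mindepth 1 -maxdepth 1    # all entries, dotfiles included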
-------------------------------------------------------------------------------
Test for Newer Files

  ls:
    newer() {  # is file 1 newer than all others given
      [ `ls -1rtd "$@" | tail -1` = "$1" ]
    }
    older() {  # is file 1 older than all others given
      [ `ls -1td "$@" | tail -1` = "$1" ]
    }
    # NOTE: the use of -r ensures that ls is forced to reorder the
    # files to produce a true result.  This ensures that if the files
    # are the same age (in which case ls doesn't bother re-ordering)
    # the result is always false.

  find:
    newer () {  # is file 1 newer than file 2
      [ "`find . -name $1 -newer $2 ! -type d -print`" ]
    }

  make:  # Rich Salz
    newer () {  # is file 1 newer than file 2
      echo "$1 : $2 ; @/bin/false" >/tmp/x$$
      make -qf /tmp/x$$; status=$?
      rm -f /tmp/x$$
      exit $status
    }

  bash:
    # Note if the first file (typically source) is missing this test is
    # false; if the second file (typically destination) is missing the
    # test is true
    if [ "$file1" -nt "$file2" ]; then
      cat "$file.in" > "$file.out"
    fi

    # More complex 'make-like' example -- See use in "set_ssh_config"
    # Now if we can get this to use a list of files
    if [ "file1" -nt "merger" ] ||
       [ "file2" -nt "merger" ] ||
       [ "file3" -nt "merger" ]
    then
      :> "merger"
      [ -f "file1" ] && cat "file1" >> "merger"
      [ -f "file2" ] && cat "file2" >> "merger"
      [ -f "file3" ] && cat "file3" >> "merger"
      chmod 600 "merger"
    fi

  perl:
    if (-M $file1 < -M $file2)

  ksh:
    if [[ "$file1" -nt "$file2" ]]

  multiple find:  # Alex P. Ugolini, Jr.
    # multiple file test
    newlist=`find file1 file2 file3 file4 -newer filen -print`

  multiple ls:  # Tom Christiansen
    This is the best solution.  "ls" can list a large collection of
    files in the right order.  You can then look for files newer or
    older than a known filename.
      set `ls -td file1 file2`
      echo $1 is newer

  NOTE: the find and ls solutions could result in all the files in a
  directory being stat'ed, so caution is required if dealing with VERY
  large directories.

-------------------------------------------------------------------------------
Is a directory empty?

  # Anthony Thyssen -- my own solution - simple and obvious to script
  # Warning: "ls -A" as root on HP-UX is inverted!
  if [ -z "`ls -A "$dir"`" ]; then
    : it is empty
  fi

  # needs external commands - and can test for a specific filename count
  if [ `ls -A "$dir" | wc -l` -eq 0 ];  # then empty

  # Using GNU find (prints a dot for each file!)
  file_count=$(find "$dir" -maxdepth 1 -exec printf %.0s. {} + | wc -m)
  # a file_count of 1 = empty

  # Even more direct
  find "$dir" -maxdepth 0 -empty -exec echo {} is empty. \;

  # Bourne Shell Built-ins only, but you must be in that directory
  if [ "`cd "$dir"; echo .* * ?`" = ". .. * ?" ]  # then empty

  # Builtins only (clobbers positional parameters)
  cd "$dir"
  set -- .[!.]* ..?* *
  for f in "$@"; do
    if test -e "$f" || test -L "$f"; then
      echo "directory is non-empty"
      break
    fi
  done

  # BASH - simplest -- shopts handle the special cases
  #   nullglob - allow * to expand to zero arguments, not itself
  #   dotglob  - include 'hidden' files, except '.' and '..'
  shopt -s nullglob dotglob
  if ( f=(*); (( ! ${#f[@]} )) ); then
    echo "The current directory is empty."
  fi

  For more see https://antofthy.gitlab.io/info/apps/find.txt

  Find Empty Directories

    # Find empty directories (depth first)
    find . -depth -type d -empty -printf "rmdir %p\n"

  Bash v4 or higher (but it is not depth first!)...
  NOTE: this has a very slow startup time due to the ** glob!

    shopt -s globstar
    for dir in **/; do
      files=("$dir"/*)
      [[ ${files[@]} ]] && continue
      echo "empty dir: \"$dir\""
    done

  Pipeline Loop method (clean and versatile)...

    find "$dir" -depth -type d | while read sub; do
      # case "$sub" in */*) ;; *) continue ;; esac   # sub-dirs only
      [ "`cd "$sub"; echo .* * ?`" = ". .. * ?" ] || continue
      echo rmdir "$sub"
      #rmdir "$sub"
    done

  Only one file in a directory?

    # list the directory name of all files,
    # sort and list the file counts of each directory.
    # Use -d for directories with two or more files,
    # or -u for directories with only one file.
    #
    find . -type 'f' -printf '%h\n' | sort | uniq -c
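
  To list only the directories holding exactly one file, filter on the
  count field from "uniq -c" (a small sketch; assumes no spaces in the
  directory names)...

    find . -type f -printf '%h\n' | sort | uniq -c | awk '$1 == 1 {print $2}'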
See "info/perl/file.txt" ------------------------------------------------------------------------------- Read File Permissions (and other attributes) easilly Perl perl -e 'printf "%o\n", (stat(shift))[2];' ~ 40755 NOTE: 40000 = directory See: man 2 stat stat (Linux coreutils) Basically this is a formatable "ls" command stat ~ --printf=%A"\n" drwxr-xr-x stat ~ --printf=%a"\n" 755 ------------------------------------------------------------------------------- File modification times (preferable to the second) EG: the "ls" command fails for files older than 6 months :::::> ls -l oldfile -rwxr-xr-x 1 root 106496 Oct 11 1990 oldfile TAR (to the minute) :::> tar cf - oldfile | tar tvf - rwxr-xr-x 0/10 106496 Oct 11 12:51 1990 oldfile CPIO :::> echo oldfile | cpio -oac | cpio -ictv 209 blocks 100755 root 106496 Oct 11 12:51:48 1990 oldfile 209 blocks PERL *** :::> perl -e 'require "ctime.pl"; print &ctime((stat(shift))[9]),"\n";' \ oldfile Fri Nov 7 6:05:02 2003 NOTE: both cpio and tar methods will read the whole file, even though it isn't necessary. As such these commands can be very slow on large files. The best method is the perl one which only does the stat() system call, and does not open or access the file at all. ------------------------------------------------------------------------------- Compare log files. New log file matches the old log file but may have extra lines at the end of the file. Return true (status = 0) if this is the case. NOTE: This test does not check if the files are reversed! IE: the files match but it is the old one that is longer instead of the new log file. if comm -3 $new $old | cat $old - | cmp -s - $new; then mv $new $old # replace old log with new log elif comm -3 $old $new | cat $new - | cmp -s - $old; then : # ignore new file as it matches but is shorter else echo error # old and new log files do not match at all fi ------------------------------------------------------------------------------- Split pipe into two separate commands (executable tee) some_command | awk '{ print | "'cmd1'" ; print }' | cmd 2 See also the 'tee_pipe' and 'pee' command. Bash can do this a lot easier and more universally... some_command | tee >( cmd1 ) | cmd2 Example usage... # list the start and end of a time ordered directory (oldest and newest) ls -Flat /var/spool/clientmqueue/ |\ tee >( { sleep 0.1; tail -4 >&2; } ) |\ { head -4; echo '...'; }; sleep 0.2 # tar into multiple compression formats tar chof - dir \ | tee >( gzip -9 -c > dir.tar.gz ) \ >( xz -2e -c > dir.tar.xz ) \ | bzip2 -9 -c > dir.tar.bz2 # use and compress DU data (all files, in kilobytes) du -ak | tee >(gzip -9 > /tmp/du.gz) | xdu ------------------------------------------------------------------------------- Read from a pipeline using a filename! Some commands can only read information from a actual filename and not a from standard input. IE: the data can't be read from STDIN On linux your can force a command to read standard input, by reading from the /dev/fd0 device... command | read_from /dev/fd/0 BUT this is a pipeline, so the "read_from" gets run in a sub-shell! But a more generic solution (works on more machines) also exists... Use a named pipe! mknod pipe p command > pipe & read_from pipe I use these techniques to pre-process files for commands which does not accept standard input (or doing so has disadvantages). For Example. 
  I use these techniques to pre-process files for commands which do not
  accept standard input (or where doing so has disadvantages).
  For Example...

  * Sun pkgadd will not take standard input, so I use a named pipe to
    allow me to de-compress a gzipped pkg package into the pkgadd
    command, without needing to decompress the stored package itself.

  * SGI xfsrestore must read from a file (or device), if you want to
    still control it interactively (EG: stdin is still needed!).
    However that command can't properly read from a remote sun tape
    drive (rmt command incompatibility).  The solution was to read the
    tape through a network pipe manually using `dd' into a named pipe
    (the file could be too big to save to a temporary file), then have
    xfsrestore read from that named pipe.

      mknod /tmp/pipe p
      ssh -n -x TAPE_HOST  dd ibs=10k if=/dev/rmt/0hn > /tmp/pipe &
      xfsrestore -i -f /tmp/pipe .

  On Bash, named pipe or /dev/fd/? usage is built into the shell.

    read_from <(command)

  or for writing to the file

    write_to >(command)

  The `pipe' argument is substituted with the appropriate named pipe or
  /dev/fd/? device name for the system being used.

-------------------------------------------------------------------------------
Program output

  Buffered output can be: line by line, as printed, or in large blocks.
  (See also co-processing where this is a major problem)

  If a program is "expecting" a particular output (like a prompt) from
  a command before sending the next command to that same program, the
  output string may never be received, as it is buffered forever (the
  buffer is never full, and no newline is seen!).  This results in a
  Deadlock for interactive program control.

  The buffering is due to the interaction of different programs and the
  stdio library.

  If a program uses low-level writes, the output will be blocked in the
  system writes, as such partial lines or whole paragraphs can be the
  case.  Network packets are formed in this way.  The same thing will
  happen if a program turns off or removes the standard IO library's
  buffering mechanism (using setbuf, or setvbuf).

  However if a program uses the high level stdio library routines for
  writing (putchar, printf, etc..) the stdio library will buffer the
  output until it is flushed (written with a low-level write).  This
  will be done when:
    1/ The program does a forced flush.  This includes the cases of a
       program turning off buffering, or opening a file in append mode.
    2/ If output is to a tty (or pty), when end of line is reached.
    3/ Otherwise, when the buffer is full.

  This buffering causes many of the problems with deadlocks in a
  program's interactive (both input and output) control of another
  program.  See "interactive.hints" in this directory.

  Solutions...

  * A program such as "pty" (ask archie) can force programs to think
    they are talking to a tty when in fact they aren't.  What this
    program does is run the command in a pseudo-tty, but this you
    don't have to worry about.  In particular the pty package provides
    a script called (don't ask) "condom" which accepts a command and
    runs that command as if it was talking to a tty, regardless of the
    piping arrangements around it.

  * The "expect" package also launches the command in its own pty for
    this same reason.

  For more information on this problem see co-processes.hints
    https://antofthy.gitlab.io/info/shell/co-processes.hints
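
  One more modern option: on Linux, GNU coreutils "stdbuf" can often
  avoid the deadlock without a pty, by asking a stdio-using filter to
  line-buffer its output (it won't help programs that set their own
  buffering)...

    some_command | stdbuf -oL filter | next_command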
-------------------------------------------------------------------------------
Split files by content

  split     split up a file by size, NOT by content
  csplit    split files by content

  See https://stackoverflow.com/questions/11313852/

  Example...  split a story by chapters, to ch_00 ... ch_??

    csplit --prefix=ch_ story "/^Chapter/" '{*}'

  Default is 2 digits, or  --digits=N  or  -n N

    -s --quiet    silent, don't output segment line counts
    -k            do not remove output files on an error
    -f --prefix=  filename prefix
    -b "%02d"     format of the number suffix

  Regular Expressions
    LINE           copy up to this line, but don't include it (top of next)
    /RE/[+/-LINE]  copy up to the matching line, but don't include it
                   (top of next segment)
    %RE%[+/-LINE]  skip up to the matching line (the skipped section is
                   not output)
    '{count}'      Repeat the previous RE count times
    '{*}'          Repeat to end of file

  Output file 00 holds any lines before the first match.
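
  A worked sketch: split a (hypothetical) "mbox" mail folder into one
  file per message, using the options documented above (each message
  starts with a "From " line)...

    csplit -s -k --prefix=msg_ -b '%03d' mbox '/^From /' '{*}'

-------------------------------------------------------------------------------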