Created: 2017-11-27 ma 11:25
Getting Linux on your laptop:
“But I’m on a Mac!” you say? Well:
“But I’m on Windows!” you say? Well:
You will encounter Linux at Nikhef on the login server and on the stoomboot batch system. Both are based on Red Hat Enterprise Linux (but not actually Red Hat).
These systems are accessible via ssh.
Interaction with the system is done via command-line shell.
It is the most powerful way of commanding a system—if you can type on a keyboard.
It is used for interactive work as well as scripting (a.k.a. shell programming).
If you have to ask, it’s /bin/bash. It is the default and it is fine.
If you have particularly strong feelings about other shells, the Korn shell (ksh) and Z shell (zsh) are acceptable. If you are a C shell (csh) user, you must repent.
Stick to Bash for shell scripts. It is more portable between systems—but more importantly, between people.
Do not try to program in csh.
Purists will stick to Bourne shell (/bin/sh) for maximum portability. See Chapter 11 of the Autoconf manual on portable shell programming before you try.
Everybody wants to optimise their environment according to personal preference. The Unix attitude takes this into consideration and allows a million settings to suit everybody’s whim. The collective wisdom accumulated over four decades has delivered some tricks that everybody will want to know.
shell | sources |
---|---|
login shell | ${HOME}/.bash_profile |
non-login shell | ${HOME}/.bashrc |
Historically the Unix system distinguished between login shells and non-login shells. You would typically run only a single login shell, but a graphical environment allows opening multiple terminal windows with non-login shells.
Put all of your favourite settings in .bashrc and leave .bash_profile mostly empty except for this:
if [ -f "$HOME/.bashrc" ]; then
. "$HOME/.bashrc"
fi
The rest should be dealt with by the system-wide /etc/profile.
# .bashrc
# Get the aliases and functions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
source /cvmfs/grid.cern.ch/etc/profile.d/setup-cvmfs-ui.sh
export X509_VOMS_DIR=$HOME/.glite/vomsdir
export VOMS_USERCONF=$HOME/.glite/vomses
export PATH=/data/project/software/bin:$PATH
exec /bin/tcsh
Bash uses Emacs style keybindings by default.
Arrow up/down will cycle through your previous commands. So will Ctrl-P and Ctrl-N.
Recalling a command you typed earlier can be done with Ctrl-R:
The environment is a collection of variables (uppercase by convention) that are accessible by any program you run.
The PATH variable is a list of directories which are searched when you type a command. If you want to know where a program can be found, type:
which <command>
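A quick sketch of how PATH and which relate (the exact directories and paths will differ per system):

```shell
# PATH is a colon-separated list; print one directory per line
echo "$PATH" | tr ':' '\n'

# which walks that list and prints the first match it finds
which ls
```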
Can you spot the mistake in this .bash_profile?
if [ -f "$HOME/.bashrc" ]; then
. "$HOME/.bashrc"
fi
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
PATH=".:$HOME/bin:/sbin:/usr/sbin:$PATH"
fi
Don’t put . in your PATH. It poses a risk of running commands from random places. In general it is safer to get used to typing ./command whenever you want to execute a local script.
Are you afraid you might get tired from all that typing? What if I told you there is a way to reduce that to a minimum?
It’s called command-line completion and it comes standard with your shell. Any partly written command or argument followed by a tap on the <TAB> key will either be completed outright (when there is only one match) or, on a second tap, show a list of the possible completions.
This is something I use surprisingly often. E.g.
stat /some/path/to/file
cat /some/path/to/file
Instead of recalling the history for the second line, simply type cat <ESC>. to stick the last argument of the previous line on the end of the current line. Repeated <ESC>. will cycle back through earlier commands.
Moreover, there are piles of completion examples for all popular commands. E.g. type git <TAB> and you will be given a list of git subcommands.
Keeping a record of commands you ran earlier is quite useful. By default bash will keep track of this but there are a few useful enhancements.
# don't keep more than one copy of a repeated command
HISTCONTROL=ignoredups
# append to the history file, don't overwrite it
shopt -s histappend
# keep plenty of history
HISTSIZE=65000
# useful on systems with shared home directories
HISTFILE=${HOME}/.bash_history-$(hostname)
# keep track of time
HISTTIMEFORMAT='%F %T %Z # '
The prompt is displayed to indicate that the shell awaits your next order. Did you know you can enhance the prompt, e.g. to indicate the time, host name, and current path? Or even the current git branch name?
PS1='\u@\h:\w \A $(__git_ps1 " (%s)")\$ '
This shows:
a07@lena:/project/newton 11:24 (master)$
Often used commands can be abbreviated by creating aliases. My advice: don’t overdo it on the aliases. Stick to some of the more usual ones.
alias ls='ls --color=tty'
alias ll='ls -lhF'
alias rm='rm -i'
alias mv='mv -i'
It’s safer to protect potentially dangerous commands with a mandatory interactive flag.
More fanciful shortcuts can easily be implemented with shell scripts.
Demo time.
Interactive sessions help you work through certain problems in rapid short cycles. But it can be frustrating after a successful bout of trial-and-error to retrace your steps.
One fix could be the use of the script utility. Start it at the beginning of your session, and everything you type (and everything you see) will be recorded in a typescript file to peruse later.
A more modern solution: run a jupyter notebook with a bash kernel.
The power of bash as a command-line tool is complemented by its power as a programming language. Without learning any more commands you can start writing a shell script by writing the commands to a file.
Write your script myscript.sh
like this:
# my first script
echo "This is my first shellscript"
And then run it like
bash ./myscript.sh
Or turn it into an executable like so:
#!/bin/bash
# my first script
echo "This is my first shellscript"
followed by
chmod +x myscript.sh
./myscript.sh
A shell script is run in a separate, isolated process that can not affect the current shell environment. In particular, this does not have any effect.
#!/bin/bash
export PROJECT=newton
cd /project/$PROJECT
In these cases, a script needs to be sourced. This runs it as if you typed the commands in yourself.
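To see the difference, here is a small sketch (using a throwaway script in /tmp):

```shell
# a tiny script that only sets a variable
cat > /tmp/setenv.sh <<'EOF'
export PROJECT=newton
EOF

# running it as a child process leaves our shell untouched
bash /tmp/setenv.sh
echo "after running:  PROJECT='${PROJECT}'"   # PROJECT is still empty

# sourcing executes it in the *current* shell
. /tmp/setenv.sh
echo "after sourcing: PROJECT='${PROJECT}'"   # PROJECT='newton'
```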
John sourced the following script. What went wrong?
#!/bin/bash
export PROJECT=newton
cd /project/$PROJECT
git status
exit
The exit at the end of the script ended John’s shell (not to mention his sense of self-worth).
Can you tell what is wrong with the following script?
#!/bin/sh
i=0
while (($i < 10)) ; do
echo $i
((i++))
done
The ((...)) arithmetic compound command is not part of the POSIX standard; POSIX sh only offers $((...)) arithmetic expansion.
Write
#!/bin/bash
when you are going to use bashisms. If you don’t know what those are, assume you are.
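For reference, a POSIX-compliant version of the same loop would use test and arithmetic expansion instead:

```shell
#!/bin/sh
# count from 0 to 9 using only POSIX constructs
i=0
while [ "$i" -lt 10 ]; do
    echo "$i"
    i=$((i + 1))
done
```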
Bash supports typical programming constructs such as
if ... ; then ... ; fi
while ... ; do ... ; done
for i in ... ; do ... ; done
case word in (pattern) commands ;; ... esac
Local variables (conventionally written lowercase) can be assigned and referenced:
a=hello
echo $a
No spaces around the =.
Two flavours:
echo 'the cat ate my homework'
echo "the cat ate my homework"
The difference?
what="my homework"
echo 'the cat ate $what'
echo "the cat ate $what"
Results:
the cat ate $what the cat ate my homework
Passing arguments that may contain spaces or other funky characters is tricky.
Use "$@" wherever possible. Avoid eval.
Seriously. Don’t use eval. Just don’t.
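A small illustration of why "$@" matters: it passes each argument through intact, spaces and all (show_args is just a hypothetical helper):

```shell
show_args() {
    # print each argument on its own line, in brackets
    printf '<%s>\n' "$@"
}

show_args "two words" single
# <two words>
# <single>
```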
#!/bin/sh
proxyhost=login.nikhef.nl
proxyport=8888
while getopts :h:p: OPT; do
case $OPT in
h|+h) proxyhost="$OPTARG" ;;
p|+p) proxyport="$OPTARG" ;;
*) echo "usage: `basename $0`"\
"[+-h proxyhost] [+-p proxyport] [--] ARGS..."
exit 2 ;;
esac
done
shift `expr $OPTIND - 1`
OPTIND=1
ssh -n -N -f -D "$proxyport" "$proxyhost" "$@"
Where lurks the danger in the following code?
#!/bin/bash
PROJECT=/project/darwin
PROJECT=./darwin
for i in $PROJECT/* ; do
find $i -ls > project-$(basename $i).list
done
Without quoting the variable, file names with spaces will spell trouble.
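A quoted version of the same loop, as a sketch (the darwin directory and its contents are made up here for demonstration):

```shell
#!/bin/bash
# Quoting every expansion keeps file names with spaces intact.
PROJECT=./darwin
mkdir -p "$PROJECT/sub dir"           # a directory name with a space
touch "$PROJECT/sub dir/data file"

for i in "$PROJECT"/*; do
    find "$i" -ls > "project-$(basename "$i").list"
done

ls project-*.list                     # 'project-sub dir.list'
```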
Jeff thoroughly tested the following code. Then he changed one line. What went wrong?
#!/bin/bash
# clean up leftover files
# echo 'running in test mode'
echo 'now it's running in production'
path=var/batch/jobs
# it's ok to drop old file
retention="30"
find /$path -type f -mtime +$retention -exec rm {} +
The quote in the echo statement cancelled the quoted string, so the last quote actually started another one not cancelled until the ‘it’s’ in the comment further on.
This skips the setting of $path and that means that the cleanup script will run from /… Oops.
credit=.02
a=John\'s\ \"credit\"\ is\ \$$credit.
b='John'\''s "credit" is $'$credit.
c="John's \"credit\" is \$$credit."
d="John's"' "credit" is $'"$credit."
e="John is poor."
They’re all:
John's "credit" is $.02.
You will find yourself at times pondering why your shell script went south. Here is what you do next.
The default behaviour of a shell script is to carry on in the event of failures. It will only bomb out if it encounters a serious syntax error in the script, but no checking is done before it runs. Your script could be crawling with errors but as long as they aren't in the execution path, you’re fine.
Each command has a return value. Non-zero indicates a problem. The return value of the most recent command can be retrieved from the variable $?. Inspect it and act accordingly.
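For example, using the built-ins true and false (which succeed and fail by definition):

```shell
false
echo "false returned $?"   # false returned 1
true
echo "true returned $?"    # true returned 0
```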
Another approach is to be very strict about errors.
set -e
trap 'fail $LINENO' ERR
fail() {
echo "error on line $1" >&2
}
This will ensure that the execution stops when a non-zero return value is encountered and that the line number is printed.
Each Unix process has at least three standard data streams: one for input (stdin) and two for output (stdout and stderr).
It is useful to keep the normal output stream separate from the error stream.
Redirect both output streams to separate files.
run=`date -u +%FT%T`
./analysis.sh > "output.$run" 2> "err.$run"
Use >> to append to a file instead of overwriting.
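For example (file name made up here):

```shell
echo "run 1" > results.log    # create or overwrite
echo "run 2" >> results.log   # append
cat results.log
# run 1
# run 2
```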
Putting echo statements in your scripts may help with debugging. They should not be mixed with the standard output.
echo "now starting the frobnicator" >&2
The >&2 means that this output goes to the same stream as where stderr happens to go to.
Use set -x judiciously throughout your code to print traces of all executed statements.
set -x
foo=somevalue
echo $foo
set +x
echo done
Renders:
+ foo=somevalue
+ echo somevalue
somevalue
+ set +x
done
Dump the environment and check carefully:
PATH
LD_LIBRARY_PATH
LD_RUNPATH
PYTHONPATH
LANG, LC_*
Beware of funky influences of locale settings on the behaviour of some programs. When paranoia sets in, issue
export LC_ALL=C
export LANG=C
and try again. Also check the output of the locale command.
For completeness’ sake, here we compound stdout and stderr onto a single file.
./whatever.sh > all_the_output 2>&1
Mind the ordering. First you need to send stdout to a file, then you want to send stderr to the same stream.
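The ordering is easy to get wrong; a minimal demonstration (log file names are made up):

```shell
# correct order: point stdout at the file first, then make stderr follow it
{ echo out; echo err >&2; } > both.log 2>&1
wc -l < both.log       # 2 -- both lines captured

# wrong order: stderr is duplicated onto the terminal *before*
# stdout is redirected, so only stdout lands in the file
{ echo out; echo err >&2; } 2>&1 > only-out.log
wc -l < only-out.log   # 1 -- stderr went to the terminal
```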
Give stdout and stderr separate colors.
#!/bin/bash
# Try to redirect stdout to one pipe, stderr to another
# test number of arguments
if [ -z "$1" ]; then
echo "Usage: $0 prg [args]"
echo "Run a program, and give standard out" \
"and standard error different colors."
exit 2;
fi
exec 3>&1 ; ( exec 4>&1
( exec "$@" 2>&4 ) | ( exec 1>&3
while read -r s; do
echo -e "\e[34m$s\e[0m"
done ) ) | ( exec 1>&2
while read -r s; do
echo -e "\e[31m$s\e[0m"
done )
The Unix philosophy is “do one thing and do it well.” There are a couple of programs out there that implement the primitives for basic data manipulation.
In true self-documenting spirit, all tools have manpages. Start with man man and work your way up.
For everything else, there is Google.
Slides below here are just for reference.
cat | just listed here for the most useless use of cat award |
sed | streamline editor with regular expression powers |
awk | the duct tape of Unix tools |
grep | find strings in files |
sort | order lines |
cut | select fields from each line |
diff | show differences between files |
head/tail | tail -f is actually useful |
tar | roll directories into tarballs |
gzip | compress files or data streams |
ls | swiss army knife of file listings |
find | most of the time you want to use locate instead |
touch | create files out of nowhere, update timestamps |
cp | copy |
mv | move or rename |
ln | link |
rm | really remove |
rsync | copy on steroids |
which | where is my executable? |
stat | what can we tell about a file |
du | disk usage |
ps | list processes, like ps aux or ps -ef |
top | who is eating my cpu and memory? |
kill | sending signals |
bg/fg | background/foreground programs |
lsof | find open files |
vmstat | memory, buffers and io |
free | overview of memory |
ip | swiss army knife of network tools |
ip addr | show network addresses on this system |
ip route | show the routing table |
ping | see if we can reach a machine |
dig | query DNS |
traceroute | see which path takes us to a machine |
ssh | secure shell |
nc | netcat, less useless than cat |
apt/dpkg | Debian’s package manager |
yum/rpm | Red Hat’s package manager |
pip | Python package tool |
Below are a few examples.
This traverses a directory and finds all files of a certain name and then tries to grep for a certain pattern in these files.
find . -type d \( -path \*/.svn \
-o -path \*/.git \) -prune -o \
-type f \( -name \*.txt \) \
-exec grep --color -i -nH -e searchterm {} +
for i in `seq -f file-%03g.txt 1 100` ; do
sort -t, -n -k2 $i | cut -d, -f2,4-8 | \
tail -n 1 > ${i%.*}.ord
done
A set of 100 comma-separated data files is numerically sorted on the second field, cut to only output fields 2, 4, 5, 6, 7, and 8, and then the last line of each is saved to an output file.
du -s * | sort -n
Show which file/directory uses the most disk space.
ls -lrt # sort by timestamp
find . -mmin -10 -ls # find files changed in the last 10 minutes
At some point you will need to edit files: source code, LaTeX files, shell scripts, configuration files…
Modern Linux systems have plenty of editors to choose from.
“Emacs is like a laser guided missile. It only has to be slightly mis-configured to ruin your whole day.” —Sean McGrath
“While any text editor can save your files, only Emacs can save your soul.” —Per Abrahamsen
Emacs has a reputation for being slow and bloated, as well as overly complex. In truth, this editor has stood the test of time. There is active development and a ton of packages for every type of file and every type of workflow.
cons | pros |
---|---|
not generally installed everywhere | can edit files remotely |
steep learning curve | built-in documentation |
encourages heavy customisation | superbly extensible |
To me vi is zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth every time you use it. —Satish Reddy
Vi has two modes. The one in which it beeps and the one in which it doesn’t. —Unknown
The original text editor of Unix. Nowadays it is actually “VI Improved” or VIM, which is much more powerful. The graphical version is called gvim. It can be personalised and extended.
cons | pros |
---|---|
editing modes require practice | powerful editing with very few keystrokes |
limited extensibility | installed on nearly every system |
strictly just an editor | Remote editing at lightning speed |
The standard way to get around on systems is by way of a secure remote shell, a.k.a. ssh.
SSH will set up an encrypted communication channel between your computer and the remote server; even if the traffic across this channel is intercepted, the actual data will be indecipherable for the interceptor.
Instead of having to type your password every time it is possible to set up a public/private key pair. The private part stays with you and you alone; there is a password on it for good measure.
The public part you can spread everywhere.
ssh-keygen
cat ${HOME}/.ssh/id_rsa.pub > authorized_keys
scp authorized_keys login:.ssh/authorized_keys
Mind the permissions of authorized_keys and the .ssh directory:
drwxr-xr-x .ssh
-rw-r--r-- authorized_keys
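To tighten them further, the usual recommendation is 700 on the directory and 600 on the file. A sketch, shown here on a scratch directory (on a real system the targets would be ~/.ssh and ~/.ssh/authorized_keys):

```shell
demo=$(mktemp -d)/ssh-demo
mkdir -p "$demo"
touch "$demo/authorized_keys"
chmod 700 "$demo"                   # only the owner may enter
chmod 600 "$demo/authorized_keys"   # only the owner may read/write
stat -c '%a %n' "$demo" "$demo/authorized_keys"
```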
Logging in through a chain of servers is easier with an ssh agent. Normally an agent is already started for you.
ssh-add -l # list keys in the agent
ssh -A login # login with agent forwarding
The forwarding means that the agent can be reached through a backchannel.
There are some neat tricks that ssh can do to make life a little easier.
This goes to .ssh/config
Host login
Hostname login.nikhef.nl
DynamicForward 2020
Host *.darknet
user neo
CheckHostIP no
ProxyCommand ssh -q -A jumphost /usr/bin/nc %h %p 2>/dev/null
IdentityFile ~/.ssh/id_rsa.darknet
IdentitiesOnly yes
Host *.nikhef.nl
ControlMaster auto
ControlPath /tmp/%h-%p-%r.shared
Only the first connection needs to authenticate.
Host *
ForwardAgent yes
ForwardX11 no
ForwardX11Trusted no
User a07
HashKnownHosts yes
Copying files by SSH can be done with scp, but there is a really convenient way under Linux using the FUSE file system driver.
The sshfs command mounts a remote server directory based on your ssh authentication. It appears just like an ordinary directory.
sshfs login: /tmp/login
ll /tmp/login/
fusermount -u /tmp/login
Sometimes your remote session should last longer than your workday. Or your laptop’s battery.
The screen utility allocates a pseudo terminal attached to a background process independent of your session. You can run multiple shells in a screen and manoeuvre around with the Ctrl-A prefix. Type Ctrl-A ? for a help screen.
The tmux utility is a remake of screen, with modernised session handling, scripting, and ease of use. It is still less ubiquitous than screen so you may not have the option to run it unless you bring your own.
What? Are we going to talk about git now? How did this get on the agenda?
There is not that much we need to discuss, really. Bear with me.
It speaks to the flexibility of git that it lends itself to many different styles of collaborative version management. However, consensus seems to drift to some generally good ideas.
As usual, this is the battleground of a new holy war.
In one corner, we find gitflow. On the opposite side, a kind of anti-gitflow dubbed OneFlow.
If you are part of a team that already has a working model, the burden of choice is not upon you.
This workflow treats every type of new development as a feature. These are merged into the main development branch, from which releases are staged. The master branch only ever receives finalized releases.
A serious critique of gitflow was written up by Adam Ruka, and he countered with his own workflow. This flow is based on the idea that all of the project’s progress goes into a single branch; feature branches have to be rebased on master before they are pushed upstream. This maintains a cleaner view of the project’s history.
It all comes down to which school of merging you wish to follow. One school follows the principle that merges should be proper merges as this renders a more faithful representation of the development history. The other school adheres to the idea that rebasing produces a cleaner, if somewhat synthetic, outline of the project’s past.
A couple of months ago, a Gitlab server became available at Nikhef for all local users. It is thus far considered an experimental service, so there are no guarantees, but everybody is welcome to try it.
A local gitlab server has the advantage that software that can not be made public on github can still be managed privately in a user-friendly web environment.
Security considerations are usually not at the top of everyone’s priority list. By following the guidelines here most bases are covered.
Ask the experts.
Treat passwords with extreme care.
Passwords are considered ‘something only you know’, but as soon as you write them down somewhere, on a piece of paper or in a file, you could inadvertently share this with others.
Never put passwords in a script. There is always a better way. Be aware that passwords typed on the command line will appear in your history file.
Established practice for safely creating temporary files is by using mktemp.
tmpfile=`mktemp`
tmpdir=`mktemp -d`
This takes care of creating a new file with a randomised name that is guaranteed to be owned by the user.
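A common pattern is to combine mktemp with a trap, so the file is cleaned up even if the script dies halfway:

```shell
#!/bin/bash
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT   # runs on normal exit and on errors
echo "intermediate results" > "$tmpfile"
cat "$tmpfile"                 # intermediate results
```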
Sometimes scripts need to use a password to authenticate or unlock. The script can read the password from stdin and keep it in a local variable for the time that it is needed.
stty -echo
echo "enter password:"
read passwd
stty echo
mkproxy --passin - <<<$passwd
unset passwd
Be aware that putting passwords on the command-line means that it will show up in the process list.