Created: 2017-11-27 ma 11:25
Getting Linux on your laptop:
“But I’m on a Mac!” you say? Well:
“But I’m on Windows!” you say? Well:
You will encounter Linux at Nikhef on the login server and on the stoomboot batch system. Both are based on Red Hat Enterprise Linux (but not actually Red Hat).
These systems are accessible via ssh.
Interaction with the system is done via command-line shell.
It is the most powerful way of commanding a system—if you can type on a keyboard.
It is used for interactive work as well as scripting (a.k.a. shell programming).
If you have to ask, it’s /bin/bash. It is the default and it is fine.
If you have particularly strong feelings about other shells, the Korn shell (ksh) and Z shell (zsh) are acceptable. If you are a C shell (csh) user, you must repent.
Stick to Bash for shell scripts. It is more portable between systems—but more importantly, between people.
Do not try to program in csh.
Purists will stick to Bourne shell (/bin/sh) for maximum portability. See Chapter 11 of the Autoconf manual on portable shell programming before you try.
Everybody wants to optimise their environment according to personal preference. The Unix attitude takes this into consideration and allows a million settings to suit everybody’s whim. The collective wisdom accumulated over four decades has delivered some tricks that everybody will want to know.
shell | sources |
---|---|
login shell | ${HOME}/.bash_profile |
non-login shell | ${HOME}/.bashrc |
Historically the Unix system distinguished between login shells and non-login shells. You would typically run only a single login shell, but a graphical environment allows opening multiple terminal windows with non-login shells.
Put all of your favourite settings in .bashrc and leave .bash_profile mostly empty except for this:
if [ -f "$HOME/.bashrc" ]; then
. "$HOME/.bashrc"
fi
The rest should be dealt with by the system-wide /etc/profile.
# .bashrc
# Get the aliases and functions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
source /cvmfs/grid.cern.ch/etc/profile.d/setup-cvmfs-ui.sh
export X509_VOMS_DIR=$HOME/.glite/vomsdir
export VOMS_USERCONF=$HOME/.glite/vomses
export PATH=/data/project/software/bin:$PATH
exec /bin/tcsh
Bash uses Emacs style keybindings by default.
Arrow up/down will cycle through your previous commands. So will Ctrl-P and Ctrl-N.
Recalling a command you typed earlier can be done with Ctrl-R:
The environment is a collection of variables (uppercase by convention) that are accessible by any program you run.
The PATH variable is a list of directories which are searched when you type a command. If you want to know where a program can be found, type:
which <command>
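A quick sketch of how PATH and which relate (the exact directories and paths will differ per system):

```shell
# PATH is a colon-separated list; print one directory per line
echo "$PATH" | tr ':' '\n'

# which walks that list and prints the first match it finds
which ls
```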
Can you spot the mistake in this .bash_profile?
if [ -f "$HOME/.bashrc" ]; then
. "$HOME/.bashrc"
fi
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
PATH=".:$HOME/bin:/sbin:/usr/sbin:$PATH"
fi
Don’t put . in your PATH. It poses a risk of running commands from random places. In general it is safer to get used to typing ./command whenever you want to execute a local script.
Are you afraid you might get tired from all that typing? What if I told you there is a way to reduce that to a minimum?
It’s called command-line completion and it comes standard with your shell. Any partly written command or argument followed by a tap on the <TAB> key will either be completed outright (when there is only one match) or, on a second tap, show a list of the possible completions.
This is something I use surprisingly often. E.g.
stat /some/path/to/file
cat /some/path/to/file
Instead of recalling the history for the second line, simply type cat <ESC>. to stick the last argument of the previous line on the end of the current line. Repeated <ESC>. will cycle back through earlier commands.
Moreover, there are piles of completion examples for all popular commands. E.g. type git <TAB> and you will be given a list of git subcommands.
Keeping a record of commands you ran earlier is quite useful. By default bash will keep track of this but there are a few useful enhancements.
# don't keep more than one copy of a repeated command
HISTCONTROL=ignoredups
# append to the history file, don't overwrite it
shopt -s histappend
# keep plenty of history
HISTSIZE=65000
# useful on systems with shared home directories
HISTFILE=${HOME}/.bash_history-$(hostname)
# keep track of time
HISTTIMEFORMAT='%F %T %Z # '
The prompt is displayed to indicate that the shell awaits your next order. Did you know you can enhance the prompt, e.g. to indicate the time, host name, and current path? Or even the current git branch name?
PS1='\u@\h:\w \A $(__git_ps1 " (%s)")\$ '
This shows:
a07@lena:/project/newton 11:24 (master)$
Often used commands can be abbreviated by creating aliases. My advice: don’t overdo it on the aliases. Stick to some of the more usual ones.
alias ls='ls --color=tty'
alias ll='ls -lhF'
alias rm='rm -i'
alias mv='mv -i'
It’s safer to protect potentially dangerous commands with a mandatory interactive flag.
More fanciful shortcuts can easily be implemented with shell scripts.
Demo time.
Interactive sessions help you work through certain problems in rapid short cycles. But it can be frustrating after a successful bout of trial-and-error to retrace your steps.
One fix could be the use of the script utility. Start it at the beginning of your session, and everything you type (and everything you see) will be recorded in a typescript file to peruse later.
A more modern solution: run a jupyter notebook with a bash kernel.
The power of bash as a command-line tool is complemented by its power as a programming language. Without learning any more commands you can start writing a shell script by writing the commands to a file.
Write your script myscript.sh
like this:
# my first script
echo "This is my first shellscript"
And then run it like
bash ./myscript.sh
Or turn it into an executable like so:
#!/bin/bash
# my first script
echo "This is my first shellscript"
followed by
chmod +x myscript.sh
./myscript.sh
A shell script is run in a separate, isolated process that can not affect the current shell environment. In particular, this does not have any effect.
#!/bin/bash
export PROJECT=newton
cd /project/$PROJECT
In these cases, a script needs to be sourced. This runs it as if you typed the commands in yourself.
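To see the difference, here is a small sketch (using a throwaway script in /tmp):

```shell
# a tiny script that only sets a variable
cat > /tmp/setenv.sh <<'EOF'
export PROJECT=newton
EOF

# running it as a child process leaves our shell untouched
bash /tmp/setenv.sh
echo "after running:  PROJECT='${PROJECT}'"   # PROJECT is still empty

# sourcing executes it in the *current* shell
. /tmp/setenv.sh
echo "after sourcing: PROJECT='${PROJECT}'"   # PROJECT='newton'
```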
John sourced the following script. What went wrong?
#!/bin/bash
export PROJECT=newton
cd /project/$PROJECT
git status
exit
The exit at the end of the script ended John’s shell (not to mention his sense of self-worth).
Can you tell what is wrong with the following script?
#!/bin/sh
i=0
while (($i < 10)) ; do
echo $i
((i++))
done
The ((...)) arithmetic compound command is not part of the POSIX standard; POSIX sh only offers $((...)) arithmetic expansion.
Write
#!/bin/bash
when you are going to use bashisms. If you don’t know what those are, assume you are.
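For reference, a POSIX-compliant version of the same loop would use test and arithmetic expansion instead:

```shell
#!/bin/sh
# count from 0 to 9 using only POSIX constructs
i=0
while [ "$i" -lt 10 ]; do
    echo "$i"
    i=$((i + 1))
done
```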
Bash supports typical programming constructs such as
if ... ; then ... ; fi
while ... ; do ... ; done
for i in ... ; do ... ; done
case word in (pattern) commands ;; ... esac
Local variables (conventionally written lowercase) can be assigned and referenced:
a=hello
echo $a
No spaces around the =.
Two flavours:
echo 'the cat ate my homework'
echo "the cat ate my homework"
The difference?
what="my homework"
echo 'the cat ate $what'
echo "the cat ate $what"
Results:
the cat ate $what the cat ate my homework
Passing arguments that may contain spaces or other funky characters is tricky.
Use "$@" wherever possible. Avoid eval.
Seriously. Don’t use eval. Just don’t.
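A small illustration of why "$@" matters: it passes each argument through intact, spaces and all (show_args is just a hypothetical helper):

```shell
show_args() {
    # print each argument on its own line, in brackets
    printf '<%s>\n' "$@"
}

show_args "two words" single
# <two words>
# <single>
```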
#!/bin/sh
proxyhost=login.nikhef.nl
proxyport=8888
while getopts :h:p: OPT; do
case $OPT in
h|+h) proxyhost="$OPTARG" ;;
p|+p) proxyport="$OPTARG" ;;
*) echo "usage: `basename $0`"\
"[+-h proxyhost] [+-p proxyport] [--] ARGS..."
exit 2 ;;
esac
done
shift `expr $OPTIND - 1`
OPTIND=1
ssh -n -N -f -D "$proxyport" "$proxyhost" "$@"
Where lurks the danger in the following code?
#!/bin/bash
PROJECT=/project/darwin
PROJECT=./darwin
for i in $PROJECT/* ; do
find $i -ls > project-$(basename $i).list
done
Without quoting the variable, file names with spaces will spell trouble.
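A quoted version of the same loop, as a sketch (the darwin directory and its contents are made up here for demonstration):

```shell
#!/bin/bash
# Quoting every expansion keeps file names with spaces intact.
PROJECT=./darwin
mkdir -p "$PROJECT/sub dir"           # a directory name with a space
touch "$PROJECT/sub dir/data file"

for i in "$PROJECT"/*; do
    find "$i" -ls > "project-$(basename "$i").list"
done

ls project-*.list                     # 'project-sub dir.list'
```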
Jeff thoroughly tested the following code. Then he changed one line. What went wrong?
#!/bin/bash
# clean up leftover files
# echo 'running in test mode'
echo 'now it's running in production'
path=var/batch/jobs
# it's ok to drop old file
retention="30"
find /$path -type f -mtime +$retention -exec rm {} +
The quote in the echo statement cancelled the quoted string, so the last quote actually started another one not cancelled until the ‘it’s’ in the comment further on.
This skips the setting of $path and that means that the cleanup script will run from /… Oops.
credit=.02
a=John\'s\ \"credit\"\ is\ \$$credit.
b='John'\''s "credit" is $'$credit.
c="John's \"credit\" is \$$credit."
d="John's"' "credit" is $'"$credit."
e="John is poor."
They’re all:
John's "credit" is $.02.
You will find yourself at times pondering why your shell script went south. Here is what you do next.
The default behaviour of a shell script is to carry on in the event of failures. It will only bomb out if it encounters a serious syntax error in the script, but no checking is done before it runs. Your script could be crawling with errors but as long as they aren't in the execution path, you’re fine.
Each command has a return value. Non-zero indicates a problem. The return value of the most recent command can be retrieved from the variable $?. Inspect it and act accordingly.
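For example, using the built-ins true and false (which succeed and fail by definition):

```shell
false
echo "false returned $?"   # false returned 1
true
echo "true returned $?"    # true returned 0
```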
Another approach is to be very strict about errors.
set -e
trap 'fail $LINENO' ERR
fail() {
echo "error on line $1" >&2
}
This will ensure that the execution stops when a non-zero return value is encountered and that the line number is printed.
Each Unix process has at least three standard data streams: one for input (stdin) and two for output (stdout and stderr).
It is useful to keep the normal output stream separate from the error stream.
Redirect both output streams to separate files.
run=`date -u +%FT%T`
./analysis.sh > "output.$run" 2> "err.$run"
Use >> to append to a file instead of overwriting.
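For example (file name made up here):

```shell
echo "run 1" > results.log    # create or overwrite
echo "run 2" >> results.log   # append
cat results.log
# run 1
# run 2
```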
Putting echo statements in your scripts may help with debugging. They should not be mixed with the standard output.
echo "now starting the frobnicator" >&2
The >&2 means that this output goes to the same stream as where stderr happens to go to.
Use set -x judiciously throughout your code to print traces of all executed statements.
set -x
foo=somevalue
echo $foo
set +x
echo done
Renders:
+ foo=somevalue
+ echo somevalue
somevalue
+ set +x
done
Dump the environment and check carefully:
PATH
LD_LIBRARY_PATH
LD_RUNPATH
PYTHONPATH
LANG, LC_*
Beware of funky influences of locale settings on the behaviour of some programs. When paranoia sets in, issue
export LC_ALL=C
export LANG=C
and try again. Also check the output of the locale command.
For completeness’ sake, here we compound stdout and stderr onto a single file.
./whatever.sh > all_the_output 2>&1
Mind the ordering. First you need to send stdout to a file, then you want to send stderr to the same stream.
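The ordering is easy to get wrong; a minimal demonstration (log file names are made up):

```shell
# correct order: point stdout at the file first, then make stderr follow it
{ echo out; echo err >&2; } > both.log 2>&1
wc -l < both.log       # 2 -- both lines captured

# wrong order: stderr is duplicated onto the terminal *before*
# stdout is redirected, so only stdout lands in the file
{ echo out; echo err >&2; } 2>&1 > only-out.log
wc -l < only-out.log   # 1 -- stderr went to the terminal
```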
Give stdout and stderr separate colors.
#!/bin/bash
# Try to redirect stdout to one pipe, stderr to another
# test number of arguments
if [ -z "$1" ]; then
echo "Usage: $0 prg [args]"
echo "Run a program, and give standard out" \
"and standard error different colors."
exit 2;
fi
exec 3>&1 ; ( exec 4>&1
( exec "$@" 2>&4 ) | ( exec 1>&3
while read -r s; do
echo -e "\e[34m$s\e[0m"
done ) ) | ( exec 1>&2
while read -r s; do
echo -e "\e[31m$s\e[0m"
done )
The Unix philosophy is “do one thing and do it well.” There are a couple of programs out there that implement the primitives for basic data manipulation.
In true self-documenting spirit, all tools have manpages. Start with man man and work your way up.
For everything else, there is Google.
Slides below here are just for reference.
cat | just listed here for the most useless use of cat award |
sed | streamline editor with regular expression powers |
awk | the duct tape of Unix tools |
grep | find strings in files |
sort | order lines |
cut | select fields from each line |
diff | show differences between files |
head/tail | tail -f is actually useful |
tar | roll directories into tarballs |
gzip | compress files or data streams |
ls | swiss army knife of file listings |
find | most of the time you want to use locate instead |
touch | create files out of nowhere, update timestamps |
cp | copy |
mv | move or rename |
ln | link |
rm | really remove |
rsync | copy on steroids |
which | where is my executable? |
stat | what can we tell about a file |
du | disk usage |
ps | list processes, like ps aux or ps -ef |
top | who is eating my cpu and memory? |
kill | sending signals |
bg/fg | background/foreground programs |
lsof | find open files |
vmstat | memory, buffers and io |
free | overview of memory |
ip | swiss army knife of network tools |
ip addr | show network addresses on this system |
ip route | show the routing table |
ping | see if we can reach a machine |
dig | query DNS |
traceroute | see which path takes us to a machine |
ssh | secure shell |
nc | netcat, less useless than cat |
apt/dpkg | Debian’s package manager |
yum/rpm | Red Hat’s package manager |
pip | Python package tool |
Below are a few examples.
This traverses a directory and finds all files of a certain name and then tries to grep for a certain pattern in these files.
find . -type d \( -path \*/.svn \
-o -path \*/.git \) -prune -o \
-type f \( -name \*.txt \) \
-exec grep --color -i -nH -e searchterm {} +
for i in `seq -f file-%03g.txt 1 100` ; do
sort -t, -n -k2 $i | cut -d, -f2,4-8 | \
tail -n 1 > ${i%.*}.ord
done
A set of 100 comma-separated data files is numerically sorted on the second field, cut to only output fields 2, 4, 5, 6, 7, and 8, and then the last line of each is saved to an output file.
du -s * | sort -n
Show which file/directory uses the most disk space.
ls -lrt # sort by timestamp
find . -mmin -10 -ls # find files changed in the last 10 minutes
At some point you will need to edit files: source code, LaTeX files, shell scripts, configuration files…
Modern Linux systems have plenty of editors to choose from.
“Emacs is like a laser guided missile. It only has to be slightly mis-configured to ruin your whole day.” —Sean McGrath
“While any text editor can save your files, only Emacs can save your soul.” —Per Abrahamsen
Emacs has a reputation for being slow and bloated, as well as overly complex. In truth, this editor has stood the test of time. There is active development and a ton of packages for every type of file and every type of workflow.
cons | pros |
---|---|
not generally installed everywhere | can edit files remotely |
steep learning curve | built-in documentation |
encourages heavy customisation | superbly extensible |
To me vi is zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth every time you use it. —Satish Reddy
Vi has two modes. The one in which it beeps and the one in which it doesn’t. —Unknown
The original text editor of Unix. Nowadays it is actually “VI Improved” or VIM, which is much more powerful. The graphical version is called gvim. It can be personalised and extended.
cons | pros |
---|---|
editing modes require practice | powerful editing with very few keystrokes |
limited extensibility | installed on nearly every system |
strictly just an editor | Remote editing at lightning speed |
The standard way to get around on systems is by way of a secure remote shell, a.k.a. ssh.
SSH will set up an encrypted communication channel between your computer and the remote server; even if the traffic across this channel is intercepted, the actual data will be indecipherable for the interceptor.
Instead of having to type your password every time it is possible to set up a public/private key pair. The private part stays with you and you alone; there is a password on it for good measure.
The public part you can spread everywhere.
ssh-keygen
cat ${HOME}/.ssh/id_rsa.pub > authorized_keys
scp authorized_keys login:.ssh/authorized_keys
Mind the permissions of authorized_keys and the .ssh directory:
drwxr-xr-x .ssh
-rw-r--r-- authorized_keys
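To tighten them further, the usual recommendation is 700 on the directory and 600 on the file. A sketch, shown here on a scratch directory (on a real system the targets would be ~/.ssh and ~/.ssh/authorized_keys):

```shell
demo=$(mktemp -d)/ssh-demo
mkdir -p "$demo"
touch "$demo/authorized_keys"
chmod 700 "$demo"                   # only the owner may enter
chmod 600 "$demo/authorized_keys"   # only the owner may read/write
stat -c '%a %n' "$demo" "$demo/authorized_keys"
```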
Logging in through a chain of servers is easier with an ssh agent. Normally an agent is already started for you.
ssh-add -l # list keys in the agent
ssh -A login # login with agent forwarding
The forwarding means that the agent can be reached through a backchannel.
There are some neat tricks that ssh can do to make life a little easier.
This goes to .ssh/config
Host login
Hostname login.nikhef.nl
DynamicForward 2020
Host *.darknet
user neo
CheckHostIP no
ProxyCommand ssh -q -A jumphost /usr/bin/nc %h %p 2>/dev/null
IdentityFile ~/.ssh/id_rsa.darknet
IdentitiesOnly yes
Host *.nikhef.nl
ControlMaster auto
ControlPath /tmp/%h-%p-%r.shared
Only the first connection needs to authenticate.
Host *
ForwardAgent yes
ForwardX11 no
ForwardX11Trusted no
User a07
HashKnownHosts yes
Copying files by SSH can be done with scp, but there is a really convenient way under Linux using the FUSE file system driver.
The sshfs command mounts a remote server directory based on your ssh authentication. It appears just like an ordinary directory.
sshfs login: /tmp/login
ll /tmp/login/
fusermount -u /tmp/login
Sometimes your remote session should last longer than your workday. Or your laptop’s battery.
The screen utility allocates a pseudo terminal attached to a background process independent of your session. You can run multiple shells in a screen and manoeuvre around with the Ctrl-A prefix. Type Ctrl-A ? for a help screen.
The tmux utility is a remake of screen, with modernised session handling, scripting, and ease of use. It is still less ubiquitous than screen so you may not have the option to run it unless you bring your own.
What? Are we going to talk about git now? How did this get on the agenda?
There is not that much we need to discuss, really. Bear with me.
It speaks to the flexibility of git that it lends itself to many different styles of collaborative version management. However, consensus seems to drift to some generally good ideas.
As usual, this is the battleground of a new holy war.
In one corner, we find gitflow. On the opposite side, a kind of anti-gitflow dubbed OneFlow.
If you are part of a team that already has a working model, the burden of choice is not upon you.
This workflow treats every type of new development as a feature. These are merged into the main development branch, from which releases are staged. The master branch only ever receives finalized releases.
A serious critique of gitflow was written up by Adam Ruka, and he countered with his own workflow. This flow is based on the idea that all of the project’s progress goes into a single branch; feature branches have to be rebased on master before they are pushed upstream. This maintains a cleaner view of the project’s history.
It all comes down to which school of merging you wish to follow. One school follows the principle that merges should be proper merges as this renders a more faithful representation of the development history. The other school adheres to the idea that rebasing produces a cleaner, if somewhat synthetic, outline of the project’s past.
A couple of months ago, a Gitlab server became available at Nikhef for all local users. It is thus far considered an experimental service, so there are no guarantees, but everybody is welcome to try it.
A local gitlab server has the advantage that software that can not be made public on github can still be managed privately in a user-friendly web environment.
Security considerations are usually not at the top of everyone’s priority list. By following the guidelines here most bases are covered.
Ask the experts.
Treat passwords with extreme care.
Passwords are considered ‘something only you know’, but as soon as you write them down somewhere, on a piece of paper or in a file, you could inadvertently share this with others.
Never put passwords in a script. There is always a better way. Be aware that passwords typed on the command line will appear in your history file.
Established practice for safely creating temporary files is by using mktemp.
tmpfile=`mktemp`
tmpdir=`mktemp -d`
This takes care of creating a new file with a randomised name that is guaranteed to be owned by the user.
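A common pattern is to combine mktemp with a trap, so the file is cleaned up even if the script dies halfway:

```shell
#!/bin/bash
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT   # runs on normal exit and on errors
echo "intermediate results" > "$tmpfile"
cat "$tmpfile"                 # intermediate results
```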
Sometimes scripts need to use a password to authenticate or unlock. The script can read the password from stdin and keep it in a local variable for the time that it is needed.
stty -echo
echo "enter password:"
read passwd
stty echo
mkproxy --passin - <<<$passwd
unset passwd
Be aware that putting passwords on the command-line means that it will show up in the process list.