--help

Dennis van Dok

Nikhef Computing Course, Monday 2019-11-25

Unix for physicists

Actually, the whole experiment runs on my laptop, but people like a bit of show

The Philosophy of Unix

What? Unix has a philosophy?

Developing skills

Should you learn a new skill?

\(T\) time normally spent on related tasks
\(I\) investment
\(R\) rate of productivity increase

Learning a skill is worthwhile if \[ T \geq I + \frac{T}{R} \]

Should you learn touch typing?

keyboard.en.jpg

  • \(T \approx 1000\text{h}\)
  • \(I \approx 10\text{h}\)
  • \(R \approx 2\)

(Yes, absolutely)
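
Plugging the estimates above into the inequality (only the arithmetic is added here; the numbers are the ones listed):

\[ I + \frac{T}{R} = 10\text{h} + \frac{1000\text{h}}{2} = 510\text{h} \leq 1000\text{h} = T \]

so the investment pays for itself with roughly 490 hours to spare.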

So How Will I Know What To Learn?

\(T\), \(I\), and \(R\) can only be learned from experience.

Unix

Getting Linux on your laptop:

Apple hardware

  • OS X = Unix
  • VirtualBox/VMware
  • hard-core install Linux anyway

Microsoft Windows

Don't. But if you must:

  • Dual Boot
  • VMware/VirtualBox
  • Cygwin

Programming languages

programming-language-wordcloud.png

Scripting languages

  • No compilation required
  • Easy prototyping
  • Can be used interactively
  • Ideal to build workflows

Examples:

  • Bash
  • Python
  • Perl

Compiled languages

  • Translate down to the CPU instruction level
  • High performance
  • Various degrees of abstraction away from the underlying architecture

Examples:

  • C++
  • Fortran
  • Go

Most likely combo

Bash/C++

(Special recommendation: Python/Jupyter)

A note on C++

There is a decade of architectural development between current CPUs (AMD EPYC 7551P) and what we still had last year (Intel Xeon 5400). The clock speed, however, is still the same.

In principle, your C++ program will compile for both. In practice, making use of all the advancements in processor design takes a lot of insider knowledge of both the CPU and compiler optimization.

SSH

  • secure remote shell
  • Passwordless
  • versatile

Settings

.ssh/config

Host *.nikhef.nl
    ControlMaster auto
    ControlPath /tmp/%h-%p-%r.shared
Host *
    ForwardAgent yes
    User yournamehere
    HashKnownHosts yes

ssh public/private key

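# generate a key pair (accept the default location, choose a passphrase)
ssh-keygen
# install the public key on the server
# (note: this overwrites any existing remote authorized_keys file)
cat "${HOME}/.ssh/id_rsa.pub" > authorized_keys
scp authorized_keys login:.ssh/authorized_keys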

Permissions:

drwxr-xr-x .ssh/
-rw-r--r-- .ssh/authorized_keys
-r--r--r-- .ssh/id_rsa.pub
-r-------- .ssh/id_rsa

Agent forwarding

ssh-add -l      # list keys in the agent
ssh -A login    # login with agent forwarding

proxy from outside Nikhef

Host stbci5.proxy
    Hostname stbc-i5.nikhef.nl
    user yournamehere
    CheckHostIP no
    ProxyCommand ssh -q -A login.nikhef.nl /usr/bin/nc %h %p 2>/dev/null
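
On OpenSSH 7.3 or newer, the same hop can usually be written with ProxyJump instead of a netcat ProxyCommand. This is a sketch that reuses the host names above and assumes login.nikhef.nl permits TCP forwarding; it is an alternative, not the configuration from the talk.

```
Host stbci5.proxy
    Hostname stbc-i5.nikhef.nl
    User yournamehere
    ProxyJump login.nikhef.nl
```

With either variant, ssh stbci5.proxy should connect through the login host.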

sshfs

Fuse mount your remote home directory locally:

sshfs login.nikhef.nl: /tmp/login
ll /tmp/login/
fusermount -u /tmp/login

Command line shell

  • tell the computer what to do, one line at a time
  • most powerful way of direct interaction
  • also used for scripting and fast prototyping
  • ideal for taking notes as you go

Which shell do I need?

select your default shell at https://sso.nikhef.nl/chsh.

/bin/bash YES
/bin/zsh YES
/bin/csh NO!

Tuning

  • everything can be tuned
  • but you must resist
  • use only the common enhancements

Startup files

login shell .bash_profile
non-login shell .bashrc

This distinction is outmoded.

.bash_profile

if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi

PATH

.bashrc

if [ -d "$HOME/bin" ] ; then
   PATH="$HOME/bin:$PATH"
fi

Completions

  • pressing TAB will auto-complete your command line
  • works better with the package bash-completion installed

History

.bashrc

# don't keep more than one copy of a repeated command
HISTCONTROL=ignoredups
# append to the history file, don't overwrite it
shopt -s histappend
# keep plenty of history
HISTSIZE=65000
# useful on systems with shared home directories
HISTFILE=${HOME}/.bash_history-$(hostname)
# keep track of time
HISTTIMEFORMAT='%F %T %Z # '

History recall

  • Arrow up/down cycles through previous commands.
  • Ctrl-R reverse search in history

Recall the last argument

Seeing is believing.

stat /some/path/to/file
# now I want to run cat on the same file
cat <Alt-.>    # press Alt-. (or Esc, then .) to insert the last argument of the previous command
cat /some/path/to/file

prompt

.bashrc

PS1='\u@\h:\w \A $(__git_ps1 " (%s)")\$ '

This shows:

a07@lena:/project/newton 11:24 (master)$

aliases

alias ls='ls --color=tty'
alias ll='ls -lhF'
alias rm='rm -i'
alias mv='mv -i'

Keeping notes

  • use script to capture an entire session
  • run a jupyter notebook with a bash kernel
  • emacs org-mode babel extension

Scripting

Write myscript.sh:

# my first script
echo "This is my first shellscript"

And then run it like

bash ./myscript.sh

Turn it into an executable like so:

#!/bin/bash
# my first script
echo "This is my first shellscript"

followed by

chmod +x myscript.sh
./myscript.sh

Escaping

Make a habit out of always quoting variables like so:

"${var}"    

and you will never go wrong.
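
A minimal sketch of what the quotes protect against. The file name and the count_args helper are made up for illustration; the point is that an unquoted variable is split on whitespace before the command sees it.

```shell
#!/bin/bash
# Hypothetical value containing a space.
var="my file.txt"

# Helper (illustrative only): report how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args $var)      # word splitting turns this into two arguments
quoted=$(count_args "${var}")    # quoting keeps it as a single argument
echo "unquoted: $unquoted, quoted: $quoted"
```

This prints "unquoted: 2, quoted: 1": without quotes, a command like rm would be handed two file names instead of one.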

Eval is evil

Do not use eval ever.
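
One common alternative, shown here as a sketch rather than as the approach from the talk: build the command in a bash array and expand it with "${cmd[@]}". Each array element stays exactly one argument, so data is never re-parsed as shell code. The argument strings are made up.

```shell
#!/bin/bash
# Build the command as an array instead of a string to be eval'ed.
cmd=(printf '%s\n' "first arg" "second; echo injected")

# Expanding the array runs printf with exactly three arguments;
# the "; echo injected" part is treated as plain text, not as a command.
output=$("${cmd[@]}")
echo "$output"
```

With eval on a concatenated string, the semicolon would have started a second command.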

Parsing command-line options

#!/bin/sh
# open a SOCKS proxy (ssh -D) through an ssh host
proxyhost=login.nikhef.nl
proxyport=8888
while getopts :h:p: OPT; do
    case $OPT in
    h|+h) proxyhost="$OPTARG" ;;
    p|+p) proxyport="$OPTARG" ;;
    *) echo "usage: $(basename "$0")" \
           "[+-h proxyhost] [+-p proxyport] [--] ARGS..." >&2
       exit 2 ;;
    esac
done
# drop the options that were parsed, leaving only ARGS
shift $((OPTIND - 1))
OPTIND=1
ssh -n -N -f -D "$proxyport" "$proxyhost" "$@"

Dangers of quotes

Jeff thoroughly tested the following code. Then he changed one line. What went wrong?

#!/bin/bash
# clean up leftover files
# echo 'running in test mode'
echo 'now it's running in production'
path=var/batch/jobs
# it's ok to drop old file
retention="30"
find /$path -type f -mtime +$retention -exec rm {} +
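
A safe way to see what happens, as a sketch: the script body is written to a temporary file and the find/rm line is replaced by an echo, so nothing gets deleted. The temporary file and the "would run" message are additions for the demonstration.

```shell
#!/bin/bash
demo=$(mktemp)
# Quoted heredoc delimiter: the body is stored literally, stray quotes included.
cat > "$demo" <<'EOF'
# echo 'running in test mode'
echo 'now it's running in production'
path=var/batch/jobs
# it's ok to drop old file
retention="30"
echo "would run: find /$path -type f -mtime +$retention"
EOF
output=$(bash "$demo")
rm -f "$demo"
printf '%s\n' "$output"
```

The apostrophe in "it's" closes the string early, and the quote at the end of that line opens a new one that runs until the apostrophe in the comment. The assignment to path is swallowed by the echo, so the last line reads "would run: find / -type f -mtime +30": the cleanup would have started at the root directory.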

Debugging shell scripts

You will find yourself at times pondering why your shell script went south. Here is what you do next.

Don’t ignore errors

echo $?

Fail early and gracefully

set -e
trap 'fail $LINENO' ERR
fail() {
    echo "error on line $1" >&2
}
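
To watch the trap fire, here is a small sketch that runs the same three pieces in a child bash with a deliberately failing command. The temporary script and the "before"/"never reached" messages are made up for the demonstration.

```shell
#!/bin/bash
script=$(mktemp)
cat > "$script" <<'EOF'
set -e
trap 'fail $LINENO' ERR
fail() {
    echo "error on line $1" >&2
}
echo "before"
false                  # fails: the ERR trap runs, then set -e ends the script
echo "never reached"
EOF
# Run in a child shell so the failure does not end this script.
status=0
output=$(bash "$script" 2>&1) || status=$?
rm -f "$script"
printf '%s\n' "$output"
echo "exit status: $status"
```

The child prints "before" and an "error on line ..." message, then stops with a non-zero exit status; the final echo is never executed.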

Input, output, errors?

input stdin 0
output stdout 1
output stderr 2

Redirections

Redirect both output streams to separate files.

run=`date -u +%FT%T`
./analysis.sh > "output.$run" 2> "err.$run"

Debugging statements

echo "now starting the frobnicator" >&2

Traces

set -x
foo=somevalue
echo $foo
set +x
echo done

Renders:

+ foo=somevalue
+ echo somevalue
somevalue
+ set +x
done

Debugging—Check the environment

Dump the environment and check carefully:

  • PATH
  • LD_LIBRARY_PATH
  • LD_RUN_PATH
  • PYTHONPATH
  • LANG, LC_*

Keeping it in one file

For completeness' sake, here we combine stdout and stderr into a single file.

./whatever.sh > all_the_output 2>&1 

Mind the ordering. First you need to send stdout to a file, then you want to send stderr to the same stream.
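
A short sketch of why the order matters. The both helper and the file names are made up; redirections are processed left to right, so 2>&1 copies whatever stdout points to at that moment.

```shell
#!/bin/bash
# Helper (illustrative only): write one line to each stream.
both() { echo "to stdout"; echo "to stderr" >&2; }

tmpdir=$(mktemp -d)

# Right order: stdout goes to the file first, then stderr follows it.
both > "$tmpdir/right" 2>&1

# Wrong order: stderr is pointed at the old stdout (captured here),
# and only afterwards is stdout sent to the file.
leaked=$(both 2>&1 > "$tmpdir/wrong")

right_count=$(grep -c "to" "$tmpdir/right")
wrong_count=$(grep -c "to" "$tmpdir/wrong")
echo "right: $right_count lines, wrong: $wrong_count line, leaked: $leaked"
rm -rf "$tmpdir"
```

The correctly ordered file holds both lines; the other holds only the stdout line, and the stderr line escapes to wherever stdout pointed before.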

Common Unix tools

“do one thing and do it well.”

Regular expressions

Find e-mail addresses:

grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" 
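
A quick way to try the pattern, as a sketch: the sample sentence and addresses are made up (reserved example domains), and the \b word-boundary anchor is assumed to be supported by the local grep, as it is in GNU grep.

```shell
#!/bin/bash
# Made-up sample text.
sample='Contact alice@example.org or bob.smith@example.com for details.'

# -E: extended regular expressions, -o: print only the matching parts.
found=$(printf '%s\n' "$sample" |
    grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b")
printf '%s\n' "$found"
```

Each address is printed on its own line, ready to be piped into sort or uniq.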

regular_expressions.png

Some of the more common tools

Text manipulation

cat just listed here for the most useless use of cat award
sed stream editor with regular expression powers
awk the duct tape of Unix tools
grep find strings in files
sort order lines
cut select fields from each line
diff show differences between files
head/tail tail -f is actually useful
tar roll directories into tarballs
gzip compress files or data streams

File system

ls swiss army knife of file listings
find most of the time you want to use locate instead
touch create files out of nowhere, update timestamps
cp copy
mv move or rename
ln link
rm really remove
rsync copy on steroids
which where is my executable?
stat what can we tell about a file
du disk usage

System processes

ps list processes, like ps aux or ps -ef
top who is eating my cpu and memory?
kill sending signals
bg/fg background/foreground programs
lsof find open files
vmstat memory, buffers and io
free overview of memory

Network

ip swiss army knife of network tools
ip addr show network addresses on this system
ip route show the routing table
ping see if we can reach a machine
dig query DNS
traceroute see which path takes us to a machine
ssh secure shell
nc netcat, less useless than cat

Package management

apt/dpkg Debian’s package manager
yum/rpm Red Hat’s package manager
pip Python package tool

Pipelines

Traditional Unix tools are designed with stream processing in mind. With ‘pipes’, the tools can be linked together like pearls on a string.

Below are a few examples.

Job manipulation on stoomboot batch system

Find running jobs owned by user id and delete them (you can only delete your own jobs, of course).

qdel `qselect -u dennisvd -s "R" `    

Find and grep

This traverses a directory and finds all files of a certain name and then tries to grep for a certain pattern in these files.

find . -type d \( -path \*/.svn \
    -o -path \*/.git \) -prune -o \
    -type f \( -name \*.txt \) \
    -exec grep --color -i -nH -e searchterm {} +

manipulate a set of predictably numbered files

for i in $(seq -f file-%03g.txt 1 100) ; do
    # sort on field 2 only, keep fields 2 and 4-8, save the last line
    sort -t, -n -k2,2 "$i" | cut -d, -f2,4-8 | \
        tail -n 1 > "${i%.*}.ord"
done

Each of the 100 comma-separated data files is numerically sorted on the second field and cut down to fields 2, 4, 5, 6, 7, and 8; the last line of the result (the row with the largest value in field 2) is saved to an output file with the extension .ord.
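
The same pipeline on a single small file, as a sketch: the sample rows are made up, and -k2,2 is used to restrict the sort key to the second field, which is an assumption about the intent.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample data: eight comma-separated fields per line.
cat > file-001.txt <<'EOF'
a,3,x,1,2,3,4,5
b,10,x,6,7,8,9,0
c,2,x,1,1,1,1,1
EOF

for i in file-001.txt; do
    # numeric sort on field 2, keep fields 2 and 4-8, save the last line
    sort -t, -n -k2,2 "$i" | cut -d, -f2,4-8 | tail -n 1 > "${i%.*}.ord"
done

result=$(cat file-001.ord)
echo "$result"
cd / && rm -rf "$workdir"
```

This prints 10,6,7,8,9,0: the row with the largest second field, reduced to the selected columns.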

Disk usage report

du -s * | sort -n

Show which file/directory uses the most disk space.

Most recently changed files

ls -lrt    # sort by timestamp
find . -mmin -10 -ls    # find files changed in the last 10 minutes

Editing files

At some point you will need to edit files: source code, LaTeX files, shell scripts, configuration files…

Modern Linux systems have plenty of editors to choose from.

emacs-vi.png

Emacs

  • The thermonuclear word processor
  • Everything and the kitchen sink
  • Now with org-mode
  • \(T \approx 1000\)
  • \(I \approx \infty\)
  • \(R \approx 100\)

VIM

Originally vi; its pedigree goes back to the original Unix editor, ed.

  • \(T \approx 1000\)
  • \(I \approx 10\)
  • \(R \approx 3\)

Screen/tmux

Sometimes your remote session should last longer than your workday. Or your laptop’s battery.

The screen utility allocates a pseudo terminal attached to a background process independent of your session. You can run multiple shells in a screen and manoeuvre around with the Ctrl-A prefix. Type Ctrl-A ? for a help screen.

The tmux utility is a remake of screen, with modernised session handling, scripting, split screen, and ease of use. It is still less ubiquitous than screen so you may not have the option to run it unless you bring your own.

Git

Version control of all your work, notes, programming, etc.

  • \(T \approx 100\)
  • \(I \approx 10\)
  • \(R \approx 2\)

Workflows

(This may not be your choice to make.)

Security

Security considerations are usually not at the top of everyone’s priority list. The adage: “Convenience, Speed, Security: pick two” might as well be

Convenience, speed, security: we know you will pick convenience and speed.

Rule 1

Talk to the experts. At least once.

Rule 2—passwords

Treat passwords with extreme care.

Passwords are considered ‘something only you know’, but as soon as you write them down somewhere, on a piece of paper or in a file, you could inadvertently share this with others.

Never put passwords in a script. There is always a better way. Be aware that passwords typed on the command line will appear in your history file.

Rule 3—data

Where does this data go? Who has access to it? Last year a new EU regulation (the GDPR) went into effect, governing the handling of personal information.

For Nikhef, personal data includes user identities.

This means that publishing the output of qstat on a personal web page is already a violation!

Rules 4 through \(\infty\)

  • protect your security tokens (ssh private key)
  • strong passwords
  • different passwords everywhere
  • do not log in from a public computer
  • encrypt your phone
  • encrypt your laptop
  • encrypt your grandmother
  • program with a deep mistrust of human beings

Temporary files and directories

The established practice for safely creating temporary files is to use mktemp.

tmpfile=`mktemp`
tmpdir=`mktemp -d`

This takes care of creating a new file with a randomised name that is guaranteed to be owned by the user.
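
A sketch of how this is often combined with a cleanup trap, so temporary data does not pile up. The scratch.XXXXXX template and the file contents are made up for illustration.

```shell
#!/bin/bash
# Create a private scratch directory with a randomised name.
tmpdir=$(mktemp -d)
# Remove it when the script exits, whether it finished normally or failed.
trap 'rm -rf "$tmpdir"' EXIT

# Files can be created inside it with a template of our own choosing.
tmpfile=$(mktemp "$tmpdir/scratch.XXXXXX")
echo "intermediate result" > "$tmpfile"

# Show the permission bits: only the owner can enter the directory.
perms=$(ls -ld "$tmpdir" | cut -c1-10)
echo "$tmpdir has permissions $perms"
```

Keeping every temporary file inside one mktemp -d directory means a single rm -rf in the trap is enough.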

Using passwords in scripts

Sometimes scripts need to use a password to authenticate or unlock. The script can read the password from stdin and keep it in a local variable for the time that it is needed.

stty -echo                        # do not show what is typed
echo "enter password:"
read -r passwd                    # -r keeps backslashes intact
stty echo
mkproxy --passin - <<<"$passwd"   # quoted, so spaces survive
unset passwd

Be aware that passwords given as command-line arguments will show up in the process list.

Finally

Learn just enough Linux to get things done
http://alexpetralia.com/posts/2017/6/26/learning-linux-bash-to-get-things-done
Learning git branching
https://learngitbranching.js.org/
Advanced Bash-Scripting Guide
http://tldp.org/LDP/abs/html/
Focus Hard. In Reasonable Bursts. One Day at a Time.
https://www.calnewport.com/blog/2009/08/20/focus-hard-in-reasonable-bursts-one-day-at-a-time/
#Linux on Freenode.net IRC
https://freenode.linux.community/how-to-connect/
Gitlab server at Nikhef
https://gitlab.nikhef.nl/
Let me Google that for you
http://bfy.tw/FDe5
Emacs Org mode
http://orgmode.org/
Reveal.js
https://revealjs.com/