
Getting things done with shell scripting

Published on Fri, Apr 09, 2021 by doma team, working from Bristol, UK.

TL;DR

  • When to resort to shell scripts
    • Portability is important
    • Problem at hand is compact
    • File system interaction
    • Command-line program automation
  • When to search for an alternative
    • Extensibility is required
    • Coding mission-critical stuff
  • Shell scripting is dangerous, use shellcheck and limit yourself in idioms used

Computer consoles are the second most important UX improvement in computing, surpassed only by window managers. Consoles have come a long way: from letting the user enter programs to be executed by a single-process operating system, to becoming full toolboxes. Fortunately, most of this happened at Bell Labs under the supervision of the unstoppable innovator Douglas McIlroy.

The Multics "shell", just like the input facilities on other terminal-enabled computers of that era, was an instrument for accepting a program and executing it on Multics, the predecessor of UNIX. The earliest versions of the Multics shell already had input/output redirection, but it wasn't until Douglas McIlroy discovered (not invented) command pipelines that they became the gold standard across operating systems.

Pipelining and the UNIX philosophy allow for writing small problem-specific programs that can later be composed. But the biggest reason to use UNIX shell scripts in 2021 is portability: every modern system has a UNIX shell readily available. Furthermore, a reasonably well-written shell script is often "good enough for the job". But how does one write one? Let's explore the answer, assuming the reader already knows the very basics of shell scripting.

Minimal shell scripts in bash

Let's get the most important consideration out of the way: writing secure and reliable shell scripts is almost impossible. The least one can do is use shellcheck, which has a VSCode integration. Furthermore, be very disciplined with user inputs and do your best to quote every argument that contains a variable.

Alright, with that out of the way, let's walk through the steps one has to take to make a reasonable shell script.

Setting up your shell environment

Before we start programming, we first need to determine which shell scripting language we will use. There are three schools of thought about this:

  1. Use bash by default. Bash is a reasonable middle ground between having a lot of features and being portable, since it ships with every popular OS.
  2. Use sh by default and bash when advanced features are needed. This is a purist approach. It offers the most portability but requires distinguishing between basic features and bash-exclusive features.
  3. Use zsh, fish or some other "hipster" shell for everything. I'm mentioning this school of thought for completeness. Since it breaks portability, people who pick this option may just as well code in Python.

As one can guess, we suggest simply using bash for everything. Of course, Apple seems to be deprecating bash as the default shell, but it's not going anywhere on macOS systems. Conversely, Windows 10 supports reasonable bash integration through WSL. It requires some setup, but these days WSL2 seems to be becoming the default for Windows development.

Besides, bash has fine-grained built-in support for lowering the impact of the inevitable bugs. When setting up your script, we suggest you use the following options:

#!/usr/bin/env bash
set -euo pipefail

Optionally, add -x for easier debugging.
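To see what these options buy you, here is a small sketch (standard bash behaviour, nothing project-specific): under the default settings a failing stage in the middle of a pipeline is silently ignored, while with pipefail the whole pipeline reports failure.

```shell
#!/usr/bin/env bash
# -e: abort on the first failing command
# -u: treat expansion of unset variables as an error
# -o pipefail: a pipeline fails if ANY stage fails, not just the last one
set -euo pipefail

# Without pipefail this pipeline would "succeed", because only sort's
# exit status (0) would be reported. The `if !` guard keeps -e from
# aborting the script, so we can demonstrate the detection.
if ! false | sort > /dev/null; then
  echo "pipeline failure detected"
fi
```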

Command line argument processing

If you, for some reason, want to use shell to write something huge like a full-blown issue tracker with a git backend, you'll need to use getopts or make your own dispatch system. In this post, however, we shall consider simple argument processing. We heavily advocate for short and concise scripts that do one thing and one thing only, after all.

First things first, let's see how to print help:

if [[ "$1" == "--help" || "$1" == "-h" ]]; then
  cat <<EOH
frgtmv: for each file read from STDIN, forget its filename entirely or amend part of it.
...
''frgtmv'' will then ''mv'' each of these files to ''\$(date +'%Y%m%d%H%M%S%N')'', preserving the file extension.
...
EOH
  exit
fi

Key points:

  1. We're using the <<EOH / EOH "heredoc" syntax, and we make sure not to indent the lines under it.
  2. We're escaping special characters like $ with a backslash. If we didn't, bash would expand them, running command substitutions like the date call in a subshell.
  3. Don't forget to exit after printing the help!
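As an aside, if the help text is dense with special characters, an alternative to backslash-escaping is quoting the heredoc delimiter (<<'EOH' instead of <<EOH), which disables all expansion inside the body:

```shell
# Quoting the delimiter makes the heredoc body entirely literal:
# no escaping of $, backticks, or backslashes is needed.
cat <<'EOH'
frgtmv will mv each file to $(date +'%Y%m%d%H%M%S%N'), preserving the extension.
EOH
```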

Now let's use -n, a predicate that checks whether a string is non-empty, to prepare the variables we need. Often we want to set up defaults at the beginning of the file:

_mode="forget"
_pattern_from=""
_replace_with=""

if [ -n "$1" ]; then
  _mode="amend"
  _pattern_from="$1"
fi

if [ -n "$2" ]; then
  _replace_with="$2"
fi

Sometimes you need to exit if an argument is not supplied. For this we use -z, which checks whether a string is empty:

# Exit if file or directory is not submitted or not a valid file or directory
if [ -z "$1" ]; then
  echo "We really need the first argument"
  exit 228
fi

STDIN, pipes and GNU parallel

Sometimes you need to receive input on STDIN, through a pipe or from user input. This is done using read -r:

while read -r _x; do
  mv -v "$_x" "$(date +'%Y%m%d%H%M%S%N').${_x#*.}"
done
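One refinement worth considering: read -r still strips leading and trailing IFS whitespace, so a filename like " spaced.txt" would be mangled. Clearing IFS for just the read call avoids that:

```shell
# IFS= applies only to this one read; surrounding whitespace in the
# incoming line survives intact.
while IFS= read -r _x; do
  mv -v "$_x" "$(date +'%Y%m%d%H%M%S%N').${_x#*.}"
done
```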

If you care about performance more than about portability, use cat - to pass your STDIN to GNU parallel, following this pattern:

function forget() {
  mv -v "$2" "$(date +'%Y%m%d%H%M%S%N').$1.${2#*.}"
}
export -f forget # (A)

if [[ $_mode == "forget" ]]; then
  cat - | parallel forget {%} {} # (B)
fi

In (A), the parallel payload is implemented as a bash function. Notice the export statement! We suggest writing payloads that receive two variables: the parallel job ID ({%}) and the current item read from the STDIN stream ({}). The payload is called from (B).

Here are some interesting parallel techniques:

  • --keep-order guarantees that the order of the input is kept. It requires several file handles per input, which may turn out to be a bottleneck.
  • find . -print0 | parallel -0 f {} works in null-terminated mode.
  • parallel 'echo "{%}:{1}:{2}";' ::: 1 2 ::: a b c will parallelise over the Cartesian product of the input sets {1,2} × {a,b,c}.

Bash parameter expansion

Confusing many, if not all, new shell users, "parameter expansion" has a veil of mystery around it. In our humble opinion, there are several causes for this effect:

  1. Parameter expansion is a blanket name that unites accessing values and substituting values.
  2. Within these use-cases, there is a myriad of conditional behaviours and they are decided based on the kind of parameter.
  3. There isn't much expansion going on. The word "expanded" is simply an arcane way of saying "reduced to a value".

Let's start from the beginning. A parameter in bash is either a variable (like $HOME), a positional "argument" parameter (like $1), or a special parameter (like $@).

"Expansion" is the process of reduction of parameters to values. Variables expand in the way one would expect from variables. Special parameters, however, can have context-dependent expansions. Expansions have special syntaxes to tack on additional computations like string substitution, length calculation, etc.

Let's use the following variable as an example: x="a.b.c". Here is our list of the most frequently used parameter expansions:

  1. Stripping. Use-case: get a file extension or remove a file extension.
    • ${x%.*} ≡ a.b
    • ${x%%.*} ≡ a
    • ${x#*.} ≡ b.c
    • ${x##*.} ≡ c
  2. String replacement.
    • ${x/./\!} ≡ a!b.c
    • ${x//./\!} ≡ a!b!c
  3. Array enumeration with IFS and [@].
    • IFS="."; for v in ${x[@]}; do echo -n "($v)"; done ≡ (a)(b)(c)
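The stripping and replacement expansions above can be verified with a few echo calls (run as a script; in an interactive shell the ! would additionally need escaping against history expansion):

```shell
x="a.b.c"
echo "${x%.*}"   # strip shortest suffix match:  a.b
echo "${x%%.*}"  # strip longest suffix match:   a
echo "${x#*.}"   # strip shortest prefix match:  b.c
echo "${x##*.}"  # strip longest prefix match:   c
echo "${x/./!}"  # replace first dot:            a!b.c
echo "${x//./!}" # replace every dot:            a!b!c
```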

You can also construct arrays with a "compound assignment":

x=(a b c)
for v in "${x[@]}"; do
    echo -n "($v)"
done

IFS manipulation is not normally needed. If you're changing the splitting context, you should see if there is another way. You might be falling victim to the XY problem.
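For instance, if the goal is to split a string into pieces, a scoped alternative to changing IFS globally is to prefix a single read call with the assignment (the variable name parts is ours):

```shell
x="a.b.c"
# The IFS assignment applies only to this one `read` invocation;
# -a fills an array, -r disables backslash mangling.
IFS="." read -ra parts <<< "$x"
for v in "${parts[@]}"; do
  echo -n "($v)"
done  # prints (a)(b)(c)
```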


This tutorial should provide good-enough techniques for quickly and effectively implementing shell scripts that do what you want them to. Quote your variables, fail fast, make backups to recover from destructive changes, don't use too many "advanced features" since they are error-prone, and good luck!

We leave you with some shell scripts we wrote that push the boundaries of what shell should be used for:

If there is community interest, we will take some time to cover extensible shell scripting in Haskell with Turtle. As usual, reach out to us on Twitter or in the comments on the Dev.to or Medium mirrors.