Commandline

Bash variable scope and pipelines

I alluded to this nuance involving variable scope in my post on automating PDF processing, but I wanted to expand on it a bit.

Consider this little snippet:

i=0
printf "foo:bar:baz:quux" | grep -o '[^:]\+' | while read -r line ; do
   printf "Inner scope: %d - %s\n" "$i" "$line"
   ((i++))
   [ "$i" -eq 3 ] && break
done
printf "====\nOuter scope\ni = %d\n" "$i"

If you run this as a script, rather than interactively in the shell, what will i be in the outer scope? And why?
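
With bash's default settings, each command in a pipeline runs in its own subshell, so the increments to i happen in a copy of the variable that is discarded when the loop ends, and the outer scope still sees 0. What follows is a minimal sketch of one common workaround, assuming bash (process substitution is not available in plain POSIX sh). The loop reads from a process substitution instead of sitting at the end of a pipe, so it runs in the current shell:

```bash
#!/usr/bin/env bash
# Sketch: same loop as the snippet above, but the input arrives via
# process substitution, so the while loop is not part of a pipeline
# and runs in the current shell rather than in a subshell.
i=0
while read -r line ; do
   printf "Inner scope: %d - %s\n" "$i" "$line"
   ((i++))
   [ "$i" -eq 3 ] && break
done < <(printf "foo:bar:baz:quux" | grep -o '[^:]\+')
# Here i keeps the value it reached inside the loop.
printf "====\nOuter scope\ni = %d\n" "$i"
```

In this variant the outer scope reports i = 3, because the loop body and the final printf share the same shell.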

Automating the handling of bank and financial statements

In my perpetual effort to get out of work, I’ve developed a suite of automation tools to help file statements that I download from banks, credit cards and others. While my setup described here is tuned to my specific needs, any of the ideas should be adaptable for your particular circumstances. For the purposes of this post, I’m going to assume you already have Hazel. None of what follows will be of much use to you without it. I’ll also emphasize that this is a macOS-specific post. Bear in mind, too, that companies have the nasty habit of tweaking their statement formats. That fact alone makes any approach like this fragile; so be aware that maintaining these rules is just part of the game. With that out of the way, let’s dive in.

sterilize-ng: a command-line URL sterilizer

Introducing sterilize-ng [GitHub link] - a URL sterilizer made to work flexibly on the command line.

Background

The surveillance capitalist economy is built on the relentless tracking of users. Imagine going about town running errands but everywhere you go, someone is quietly following you. When you pop into the grocery, they examine your receipt. They look into the bags to see what you bought. Then they hop in the car with you and keep careful records of where you go, how fast you drive, whom you talk with on the phone. This is surveillance capitalism - the relentless harvesting of the “digital exhaust” left by our actions online.

Splitting a string on the command line - the search for the one-liner

It seems like the command line is one of those places where you can accomplish crazy efficient things with one-liners.

Here’s a perfect use case for a CLI one-liner:

In Anki, I often add lists of synonyms and antonyms to my vocabulary cards, but I like them formatted as a bulleted list. My usual route to that involves Markdown. But how to convert this:

известный, точный, определённый, достоверный

to

- `известный`
- `точный`
- `определённый`
- `достоверный`

After trying to come up with a single text replacement strategy to make this work, the best I could do was this:
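
For illustration only, here is one possible sketch of the conversion; it is not necessarily the one-liner this post arrives at. It assumes the input is a single comma-separated line: tr splits it into one item per line, and sed strips the leading space and wraps each item as a Markdown bullet. The variable name bullets is just a placeholder for this example.

```bash
#!/usr/bin/env bash
# Sketch: turn "a, b, c" into a Markdown bulleted list of code spans.
# tr puts each comma-separated item on its own line;
# sed drops leading spaces and wraps the item as "- `item`".
bullets=$(printf '%s\n' 'известный, точный, определённый, достоверный' \
   | tr ',' '\n' \
   | sed -E 's/^ *(.*)$/- `\1`/')
printf '%s\n' "$bullets"
```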

sed matching whitespace on macOS

sed is such a useful pattern-matching and substitution tool for work on the command line. But there’s a little quirk on macOS that will trip you up. It tripped me up. On most platforms, \s is the character class for whitespace. It’s ubiquitous in regexes. But on macOS, it doesn’t work. In fact, it silently fails.

Consider this bash one-liner which looks like it should work but doesn’t:

# should print I am corrupt (W.Barr)
# instead it prints I am corrupt by W.Barr
echo "I am corrupt by W.Barr" | sed -E 's|^(.+)\sby\s(.+)|\1 (\2)|g'

What does work is the character class [:space:]:
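
A minimal sketch of the same substitution with the POSIX character class in place of \s. Note that [:space:] is only valid inside a bracket expression, which is why it appears as [[:space:]]. The variable name fixed is just a placeholder for this example.

```bash
#!/usr/bin/env bash
# Same substitution as above, with [[:space:]] replacing \s.
# [:space:] must itself sit inside brackets, hence the doubled [[ ]].
fixed=$(echo "I am corrupt by W.Barr" | sed -E 's|^(.+)[[:space:]]by[[:space:]](.+)|\1 (\2)|g')
echo "$fixed"
# prints: I am corrupt (W.Barr)
```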