sed matching whitespace on macOS

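sed is a handy pattern-matching and substitution tool for work on the command line. But there’s a little quirk on macOS that will trip you up. It tripped me up. In GNU sed, and in most regex flavors, \s is the shorthand for whitespace. It’s ubiquitous in regexes. But \s is a GNU extension rather than part of POSIX, and in sed on macOS it doesn’t work. Worse, it fails silently: the pattern simply doesn’t match, and sed prints the input unchanged without reporting an error.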

Consider this bash one-liner which looks like it should work but doesn’t:

# should print I am corrupt (W.Barr)
# instead it prints I am corrupt by W.Barr
echo "I am corrupt by W.Barr" | sed -E 's|^(.+)\sby\s(.+)|\1 (\2)|g'
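If you are not sure how the sed on your own machine treats \s, a quick probe can tell you. This is a minimal sketch; the function name sed_supports_backslash_s is made up for illustration, and the result depends on which sed you have installed.

```shell
# Hypothetical probe: succeeds only if the local sed treats \s as whitespace.
sed_supports_backslash_s() {
  [ "$(printf 'a b\n' | sed -E 's|\s|_|')" = "a_b" ]
}

if sed_supports_backslash_s; then
  printf '%s\n' 'this sed treats \s as whitespace'
else
  printf '%s\n' 'this sed does not; use a POSIX character class instead'
fi
```

The test input deliberately contains no letter s, so a sed that reads \s as a plain s leaves it unchanged as well.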

What does work is the character class [:space:]:

# prints I am corrupt (W.Barr)
echo "I am corrupt by W.Barr" | sed -E 's|^(.+)[[:space:]]by[[:space:]](.+)|\1 (\2)|g'

A literal space with no character class also works, although it matches only the space character and not tabs:

# prints I am corrupt (W.Barr)
echo "I am corrupt by W.Barr" | sed -E 's|^(.+) by (.+)|\1 (\2)|g'

The [:blank:] character class also works:

# prints I am corrupt (W.Barr)
echo "I am corrupt by W.Barr" | sed -E 's|^(.+)[[:blank:]]by[[:blank:]](.+)|\1 (\2)|g'
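The variants above can be folded into one portable helper. This is only a sketch: the function name parenthesize_author is hypothetical, and it assumes the separator may be any run of spaces or tabs. It relies on POSIX character classes, so it should behave the same with the macOS sed and with GNU sed.

```shell
# Hypothetical helper: rewrite "<text> by <author>" as "<text> (<author>)".
# [[:space:]]+ accepts one or more whitespace characters around "by".
parenthesize_author() {
  sed -E 's|^(.+)[[:space:]]+by[[:space:]]+(.+)$|\1 (\2)|'
}

# Separated by single spaces
printf 'I am corrupt by W.Barr\n' | parenthesize_author
# Separated by tabs
printf 'I am corrupt\tby\tW.Barr\n' | parenthesize_author
```

Both calls print I am corrupt (W.Barr).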

Bracket expressions in sed

It turns out that if you RTFM for sed, the explanation is clear. The sed manual documents several character classes, and each one is valid only inside a bracket expression, which is why the examples above use double brackets: [[:space:]], not [:space:]. Pertinent to our issue, the [:space:] character class matches the following: tab, newline, vertical tab, form feed, carriage return, and space. On the other hand, [:blank:] is more restrictive, matching only space and tab. The manual is definitely worth looking at because other familiar shorthand classes are not available either. For example, \w is unusable; the closest substitute is [:alnum:] (strictly, \w also matches the underscore, so the exact equivalent is [[:alnum:]_]), as in:

# prints foobar
echo "foo        bar" | sed -E 's|^([[:alnum:]]+)[[:space:]]+([[:alnum:]]+)$|\1\2|g'
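One caveat is that \w in GNU sed also matches the underscore, which [:alnum:] does not. If your words may contain underscores, add the underscore to the bracket expression. The sketch below uses a hypothetical helper name, join_words, to show the difference:

```shell
# Hypothetical helper: join two whitespace-separated words, allowing underscores.
join_words() {
  sed -E 's|^([[:alnum:]_]+)[[:space:]]+([[:alnum:]_]+)$|\1\2|'
}

# prints foo_1bar_2
printf 'foo_1   bar_2\n' | join_words

# With [[:alnum:]] alone the pattern does not match, so the line is printed unchanged
printf 'foo_1   bar_2\n' | sed -E 's|^([[:alnum:]]+)[[:space:]]+([[:alnum:]]+)$|\1\2|'
```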

References

  • macOS man page for sed - no mention of \s though.
  • This question about whitespace and sed on Superuser is worth reviewing.
  • The sed manual section on character classes and bracket expressions is a must-read. (Or the contents page of the sed manual.)