Splitting a string on the command line - the search for the one-liner

It seems like the command line is one of those places where you can accomplish crazy efficient things with one-liners.

Here’s a perfect use case for a CLI one-liner:

In Anki, I often add lists of synonyms and antonyms to my vocabulary cards, but I like them formatted as a bulleted list. My usual route to that involves Markdown. But how to convert this:

известный, точный, определённый, достоверный

to

- `известный`
- `точный`
- `определённый`
- `достоверный`

After trying to come up with a single text replacement strategy to make this work, the best I could do was this:

#!/bin/bash

words="известный, точный, определённый, достоверный";
echo "$words" | sed -E 's/, /\n/g' | sed -E 's/(.*)/- `\1`/g'

Sometimes, if I get really irritated at sed, which is more often than I’d like, I’ll switch to sd, which has more straightforward syntax.[1]

#!/bin/bash

words="известный, точный, определённый, достоверный";
echo "$words" | sd ", " "\n" | sd '(.*)\n' '\u002d \u0060$1\u0060\n';

In both of these cases, the process requires two steps because of the way sed and sd work. First, we strip out the delimiters, then we capture what’s left and format it.
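For comparison, the same split-then-format idea can be sketched in Python, where both steps fit in one expression. This is only an illustration; the function name `to_bullets` is hypothetical and not part of the original workflow.

```python
def to_bullets(words, delimiter=","):
    """Split a delimited string and format each item as a Markdown bullet."""
    # Step 1: split on the delimiter and trim surrounding whitespace.
    items = [w.strip() for w in words.split(delimiter)]
    # Step 2: wrap each item in back-ticks and prefix it with a dash.
    return "\n".join(f"- `{item}`" for item in items)


print(to_bullets("известный, точный, определённый, достоверный"))
```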


  1. Usually. In the sd example, you’ll see that I had to resort to Unicode escapes in the replacement string, because it doesn’t like the dash and back-tick symbols.

A Keyboard Maestro macro to edit Anki sound files

Often when I import a pronunciation file into Anki, from Forvo for example, the volume isn’t quite right or there’s a lot of background noise, and I want to edit the sound file. How?

The solution for me, as is often the case, is a Keyboard Maestro macro.

Prerequisites

  • Keyboard Maestro - if you are a macOS power user and don’t have KM, then you’re missing out on a lot.
  • Audacity - the multi-platform FOSS audio editor

Outline of the approach

Since Keyboard Maestro won’t know the path to our file in Anki’s collection.media directory, we have to find it. But the first task is to extract the filename. In the Anki note field, it’s going to have this format:

[sound:forvo-e21a80cf-285b8575-3972ebd2-24eaa712-d8e5cc26.mp3]

To extract the filename forvo-e21a80cf-285b8575-3972ebd2-24eaa712-d8e5cc26.mp3 we can just use sed:

sed -E 's/\[.*:(.*)\]/\1/g'

And to find the file in the macOS file system:

mdfind -name "$fn"

But we want to restrict our search to a collection.media directory because the file might be cached somewhere else. For example, Awesome TTS caches a copy of generated or downloaded files. Here, we can pipe our mdfind results to grep:

mdfind -name "$fn" | grep -E 'collection\.media'

Putting it all together, this is the script we’ll use in the KM macro:

#!/bin/bash

fn=$(pbpaste | sed -E 's/\[.*:(.*)\]/\1/g')
# open with Audacity but only if file is found in 
# a path with collection.media
open -a /Applications/Audacity.app \
     "$(mdfind -name "$fn" | grep -E 'collection\.media')"

Now we just have to make sure that the field contents are on the clipboard for pbpaste to work. In KM, we will just add Select All and Copy actions (⌘A and ⌘C).
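The extract-and-filter logic of the script can also be sketched in Python, which makes it easier to check in isolation. This is an illustrative equivalent rather than the macro itself; the function names and the sample paths are hypothetical.

```python
import re


def sound_filename(field):
    """Return the filename inside an Anki [sound:...] tag, or None if absent."""
    match = re.search(r"\[sound:([^\]]+)\]", field)
    return match.group(1) if match else None


def in_collection_media(paths):
    """Keep only paths that pass through a collection.media directory."""
    return [p for p in paths if "/collection.media/" in p]


field = "[sound:forvo-e21a80cf-285b8575-3972ebd2-24eaa712-d8e5cc26.mp3]"
fn = sound_filename(field)

# Hypothetical search results: one cached copy, one in the media directory.
candidates = [
    "/tmp/cache/" + fn,
    "/Users/me/Anki/User 1/collection.media/" + fn,
]
print(in_collection_media(candidates))
```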

Querying the Anki database when the application is running

When the Anki application is open on the desktop, it places a lock on the sqlite3 database such that it can’t be queried by another process. One workaround is to try to open the database and if it fails, then make a temporary copy and query that. Of course, this only works with read-only queries. Here’s the basic strategy:

#!/usr/local/bin/python3
# -*- coding: utf-8 -*-

# requires python >= 3.8 to run because of anki module

from anki import Collection, errors

# COLLECTION_PATH is the full path to the collection.anki2 file, defined elsewhere

if __name__ == "__main__":
    try:
        col = Collection(COLLECTION_PATH)
    except errors.DBError:
        # anki is open, copy to temp file
        import tempfile
        import shutil
        import os

        with tempfile.TemporaryDirectory() as tmpdir:
            dst = os.path.join(tmpdir, 'collectiontemp.anki2')
            shutil.copy(COLLECTION_PATH, dst)
            col = Collection(dst)
            # do something with Anki db

Note that the tempfile context manager discards the temporary copy of the database when the with block exits. If there are actions on the collection that are common to the Anki-is-open and Anki-is-not-open paths, those should be abstracted into a separate function.
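One way to keep the shared work in a single place is to wrap the open-or-copy fallback in a context manager, so the caller writes its queries once. The following is a minimal sketch under that assumption; `open_or_copy` and its parameters are hypothetical names, and the opener is passed in so the sketch does not depend on the anki module.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def open_or_copy(path, opener, locked_error=Exception):
    """Yield opener(path); if that raises locked_error, open a temporary copy."""
    tmpdir = None
    try:
        handle = opener(path)
    except locked_error:
        # The original is locked: copy it aside and open the copy instead.
        tmpdir = tempfile.TemporaryDirectory()
        dst = os.path.join(tmpdir.name, "collectiontemp.anki2")
        shutil.copy(path, dst)
        handle = opener(dst)
    try:
        yield handle
    finally:
        if tmpdir is not None:
            tmpdir.cleanup()  # the temporary copy is discarded here


# Intended use with the names from the script above (read-only queries only):
#
#     with open_or_copy(COLLECTION_PATH, Collection, errors.DBError) as col:
#         do_something(col)
```

Whichever path is taken, the body of the with block is the same, and the temporary copy only lives as long as that block.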

Normalizing spelling in Russian words containing the letter ё

The Russian letters ё and e have a complex and troubled relationship. The two letters are pronounced differently, but usually appear the same in written text. This presents complications for Russian learners and for text-to-speech systems. In several recent projects, I have needed to normalize the spelling of Russian words. For example, if I have the written word определенно, is the word actually определенно? Or is it определённо?

This was a larger challenge than I imagined. Apart from udar1, I failed to find any off-the-shelf solutions to what I call normalizing the spelling of words that should be spelled with ё. It turns out that the Russian language Wiktionary respects URLs whether spelled with ё or e. Therefore, one way of normalizing the spelling is to query Wiktionary and grab the headword from the page. Normally I don’t like creating this sort of dependency, but it’s the only solution that has presented itself so far. Here’s the approach I took:

Scraping Russian word definitions from Wiktionary: utility for Anki

While my Russian Anki deck contains around 27,000 cards, I’m always making more. (There are a lot of words in the Russian language!) Over the years, I’ve become more and more efficient with card production, but one of the missing pieces was finding a code-readable source of word definitions. There’s no shortage of dictionary sites, but scraping data from any site is complicated by the ways in which front-end developers spread the semantic content across multiple HTML tags arranged in deep and cryptic hierarchies. Yes, we can cut-and-paste, but my quest is about nearly completely automating quality card production. This is a quick post on a method for scraping word definitions from Wiktionary.

Encoding of the Cyrillic letter й - a UTF-8 gotcha

In the process of writing and maintaining a service that checks Russian word frequencies, I noticed a peculiar phenomenon: certain words could not be located in a sqlite database that I knew actually contained them. For example, a query for the word английский consistently failed, whereas other words would succeed. Eventually the commonality between the failures became obvious. All of the failures contained the letter й, which led me down a rabbit hole of character encoding and this specific case where it can go astray.
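One well-known way a lookup for a word containing й can fail is Unicode normalization: the letter can be stored as a single code point (U+0439) or as и (U+0438) followed by a combining breve (U+0306). The two forms look identical but compare as different strings. Whether this is exactly what happened here is an assumption; the sketch below only demonstrates the mismatch and how normalizing both sides to NFC removes it.

```python
import unicodedata

composed = "\u0439"          # й as a single code point
decomposed = "\u0438\u0306"  # и followed by a combining breve

# The two strings render the same but are not equal.
print(composed == decomposed)

# Normalizing to NFC makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)

# The same applies to whole words: normalize before storing and before querying.
word_nfd = unicodedata.normalize("NFD", "английский")
word_nfc = unicodedata.normalize("NFC", word_nfd)
print(len(word_nfd), len(word_nfc))
```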

Dynamic DNS - auto-updating from macOS

To run a little project (that I’ll describe at some point in the future) I have to run a small web server from my home computer, one that happens to run macOS. More than anything else, this is just a record of what I did to get it running, in case a) I have to do it again, or b) someone else finds it useful.

Sign up for dynamic DNS service

I signed up for service with dynv6 because I saw it recommended elsewhere and it didn’t look creepy like some of the other options. I just signed up with email - through an email proxy anonymizer, because I’m paranoid. After verifying my email, I was able to create a new “zone”, basically a record of my public IP address linked to custom DNS.

No sir, I do not want Big Sur

Maybe I’m just getting cranky after over a year of on-again-off-again pandemic lockdowns, but I’ve had it with Apple’s heavy-handed attempts to get me to upgrade to Big Sur. Mind you, I have nothing against it. It’s just an operating system. I don’t particularly like its translucent bubbly iOS look. But I could live with it.

But I don’t want it. I depend on a very unorthodox setup. I have a lot of infrastructure tools that depend on certain versions of Python to be in just the right place. Every single macOS major upgrade breaks all of this and I spend days picking up the pieces. I’m tired of Apple messing with it. So when my system launched into what seems like an unbidden upgrade process today, I lost it.

Dynamically loading Javascript in Anki card templates

The ability to execute Javascript in Anki card templates offers users flexibility in displaying data. In Anki 2.1, though, the asynchronous execution of Javascript means that user script functionality is not entirely predictable. This post on r/Anki discusses an approach for dynamically loading Javascript resources and ensuring that they are available when the card is displayed. Since I modularize my Javascript code so that it can be flexibly deployed to different card types, I extended this method to allow the template developer to load multiple scripts in one <script> block.

Fixing CodeRunner jQuery injection

CodeRunner is one of my favourite development environments on macOS. I use it for small one-off projects or for testing concepts for integration into larger projects. But in version 4.0.3, jQuery injection in a custom HTML page is broken, giving the error:

It’s probably due to some unescaped bit of code in their minified jQuery, but I didn’t have time to work that hard. Instead I reported the error to the developer and fixed it myself. The original (default) run script for jQuery is: