Anki

Although I’ve been writing about Anki for years, it’s been in bits and pieces. Solving little problems. Creating efficiencies. But I realized that I’ve never taken a top-down approach to my Anki language learning system. So consider this post the launch of that overdue effort.

Caveats

A few caveats at the outset:

I’m not a professional language tutor or pedagogue of any sort, really. Much of what I’ve developed, I’ve done through trial and error, some intuition, and some reading on relevant topics.
People learn differently and have different goals. This series will be exclusively focused on language-learning. There are similarities between this type of learning and the memorization of bare facts. But there are important differences, too.
As I get further into the details, more and more of what I discuss will be macOS-specific. I’m not particularly opinionated about operating systems. My preference has more to do with the accumulated weight of what I’m accustomed to and, as a consequence, the potential pain of switching. In the sections that deal with macOS-specific solutions, feel free to skip over that content or read it with a view toward parallel tools on whatever OS you are using.
I use Anki almost exclusively for Russian language acquisition and practice. Of necessity, some particularities of the language are going to dictate the specific issues that you need to solve. For example, if verbs of motion aren’t part of the grammar of your target language (TL), then rather than getting lost in those weeds, think about what unique counterparts your TL does have and how you might adapt the approaches I’m presenting.

With that out of the way, let’s dive in!

For one-off projects that target Anki collections, I often use Python in a standalone application rather than an Anki add-on. Since I’m not going to distribute these little creations that are specific to my own needs, there’s no reason to create an add-on. These are just a few notes - nothing comprehensive - on the process.

One thing to be aware of is that the version of the Python anki module must match the version of the Anki desktop application exactly, down to the last component of the version number. If you are running Anki 2.1.48 on your desktop but have the Python module built for 2.1.49, it will not work. This is a huge irritation, and there’s no backwards compatibility; the versions must match precisely.
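As a small illustration of how strict that comparison is, here is a minimal sketch of a guard you could run before touching a collection. The function name versions_match is hypothetical, and how you obtain the two version strings (from the application's About box, from your package manager, and so on) is left as an assumption.

```python
def versions_match(app_version: str, module_version: str) -> bool:
    """Return True only if every dotted component of the two versions is identical."""
    # Compare component by component so that "2.1.48" and "2.1.49" are
    # treated as different even though major and minor numbers agree.
    return app_version.strip().split(".") == module_version.strip().split(".")
```

With the example above, versions_match("2.1.48", "2.1.49") is False, so a script could stop early with a clear message instead of failing somewhere inside the module.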

One of the things that I love about Keyboard Maestro is the ability to chain together disparate technologies to achieve some automation goal on macOS.

In most of my previous posts about Keyboard Maestro macros, I’ve used Python or shell scripts, but I decided to draw on some decades-old experience with Perl to do a little text processing for a specific need.

Background

I want this text from Wiktionary:

to look like this:

Often when I import a pronunciation file into Anki, from Forvo for example, the volume isn’t quite right or there’s a lot of background noise, and I want to edit the sound file. How?

The solution for me, as is often the case, is a Keyboard Maestro macro.

Prerequisites

Keyboard Maestro - if you are a macOS power user and don’t have KM, then you’re missing out on a lot.
Audacity - the multi-platform FOSS audio editor

Outline of the approach

Since Keyboard Maestro won’t know the path to our file in Anki’s collection.media directory, we have to find it. But the first task is to extract the filename. In the Anki note field, it’s going to have this format:
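Anki stores an audio reference in a note field as a [sound:...] tag. Assuming that form, a minimal Python sketch of the two steps (pull out the filename, then build the full path) could look like the following. The names sound_filenames, media_path and media_dir are hypothetical, and the macro itself may well do this differently.

```python
import os
import re

# Matches tags such as [sound:pronunciation.mp3] and captures the filename.
SOUND_TAG = re.compile(r"\[sound:([^\]]+)\]")


def sound_filenames(field_text):
    """Return every filename referenced by a [sound:...] tag in the field."""
    return SOUND_TAG.findall(field_text)


def media_path(media_dir, filename):
    """Join the collection.media directory (assumed known) with a filename."""
    return os.path.join(media_dir, filename)
```

The path returned by media_path is what would be handed to Audacity; locating media_dir for a given profile is a separate step not shown here.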

When the Anki application is open on the desktop, it places a lock on the sqlite3 database such that it can’t be queried by another process. One workaround is to try to open the database and if it fails, then make a temporary copy and query that. Of course, this only works with read-only queries. Here’s the basic strategy:

#!/usr/local/bin/python3
# -*- coding: utf-8 -*-

# requires python >= 3.8 to run because of anki module

import os
import shutil
import tempfile

from anki import Collection, errors

# placeholder: set this to the full path of your collection.anki2 file
COLLECTION_PATH = "/path/to/collection.anki2"

if __name__ == "__main__":
    try:
        col = Collection(COLLECTION_PATH)
    except errors.DBError:
        # anki is open, copy to temp file and query the copy (read-only)
        with tempfile.TemporaryDirectory() as tmpdir:
            dst = os.path.join(tmpdir, 'collectiontemp.anki2')
            shutil.copy(COLLECTION_PATH, dst)
            col = Collection(dst)
            # do something with Anki db, before the temp directory is removed

Note that the tempfile context manager will discard the copied database as soon as the with block exits, so any work on the copy has to happen inside that block. If there are actions on the collection that are common to the Anki-is-open and Anki-is-not-open paths, then those should be abstracted into a separate function.
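One way to arrange that abstraction is sketched below. It is only an illustration: run_on_collection is a hypothetical helper, and the opener and the lock exception are passed in as parameters so the sketch does not assume anything about a particular version of the anki module.

```python
import os
import shutil
import tempfile


def run_on_collection(path, action, open_collection, lock_error):
    """Call action(col) on the collection at path, or on a temporary copy if it is locked.

    open_collection: callable that opens a collection given a path (assumed).
    lock_error: exception type raised when the collection is locked (assumed).
    """
    try:
        col = open_collection(path)
    except lock_error:
        # Collection is locked (Anki is open): work on a throwaway copy.
        with tempfile.TemporaryDirectory() as tmpdir:
            dst = os.path.join(tmpdir, "collectiontemp.anki2")
            shutil.copy(path, dst)
            # The action must finish inside the with block, because the
            # copy is deleted when the block exits.
            return action(open_collection(dst))
    return action(col)
```

With the script above, the call might be run_on_collection(COLLECTION_PATH, my_query, Collection, errors.DBError), where my_query is whatever read-only function you need. As before, this only makes sense for read-only work, since changes made to the copy are thrown away.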

While my Russian Anki deck contains around 27,000 cards, I’m always making more. (There are a lot of words in the Russian language!) Over the years, I’ve become more and more efficient with card production, but one of the missing pieces was finding a code-readable source of word definitions. There’s no shortage of dictionary sites, but scraping data from any site is complicated by the ways in which front-end developers spread the semantic content across multiple HTML tags arranged in deep and cryptic hierarchies. Yes, we can cut-and-paste, but my quest is about nearly completely automating quality card production. This is a quick post on a method for scraping word definitions from Wiktionary.

It’s possible to use cloze deletion cards within standard Anki note types using the Anki Cloze Anything setup. But additional scripts are required to allow it to function seamlessly in a typical language-learning environment. I’ll show you how to flexibly display a sentence with or without Anki Cloze Anything markup and also not break AwesomeTTS.

Anki’s built-in cloze deletion system

The built-in cloze deletion feature in Anki is an excellent way for language learners to actively test their recall. For example, a cloze deletion note type with the following content requires the learner to supply the missing word:
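Anki's built-in cloze markup wraps the hidden text as {{c1::answer}}, optionally with a hint as {{c1::answer::hint}}. To make the idea concrete, here is a minimal Python sketch that shows what the learner sees for such a field and how to recover the plain sentence. The function names and the sample sentence are illustrative assumptions; this is not how Anki itself renders cards.

```python
import re

# {{c1::answer}} or {{c1::answer::hint}}
CLOZE = re.compile(r"\{\{c(\d+)::(.*?)(?:::(.*?))?\}\}")


def strip_cloze(text):
    """Return the sentence with all cloze markup removed."""
    return CLOZE.sub(lambda m: m.group(2), text)


def render_question(text, active=1):
    """Hide the cloze numbered `active`; show the others as plain text."""
    def repl(m):
        number, answer, hint = int(m.group(1)), m.group(2), m.group(3)
        if number != active:
            return answer
        return f"[{hint}]" if hint else "[...]"
    return CLOZE.sub(repl, text)
```

For a field containing "Я {{c1::читаю}} книгу.", render_question produces the prompt with the verb hidden, and strip_cloze gives back the full sentence, which is the form a text-to-speech tool would need.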

I think this is the last word on fixing Knowclip .apkg files. I’ve developed the fix in bits and pieces, but hopefully this settles the subject. See my previous articles, here and here, for the details.

The issue, again, is that Knowclip gives these notes and cards sequential id values starting at 1. But Anki uses the note.id and the card.id as the creation date. I logged it as an issue on GitHub, but as of 2021-04-15 no action has been taken.

(N.B. A much-improved version of this script is published in a later post)

Fixing the Knowclip note files as I described previously, it turns out, is only half of the fix for the broken .apkg files. You also need to fix the cards table. Why? Same reason. The rows are numbered sequentially from 1. But since Anki uses the card id field as the date added, the added field is always wrong. Again, the fix is simple:
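To illustrate the idea only, here is a minimal sketch that shifts sequential ids into the epoch-millisecond range Anki expects. It assumes simplified notes and cards tables where cards.nid points at notes.id; shift_ids and base_ms are hypothetical names, and a real collection has more columns and bookkeeping than this covers, so work on a copy.

```python
import sqlite3
import time


def shift_ids(conn, base_ms):
    """Add base_ms to every note id, card id and card note reference.

    base_ms is an epoch time in milliseconds; it must be larger than any
    existing id so the shifted ids cannot collide with unshifted ones.
    """
    with conn:  # commit on success, roll back on error
        conn.execute("UPDATE notes SET id = id + ?", (base_ms,))
        # Keep cards pointing at their (now shifted) notes.
        conn.execute(
            "UPDATE cards SET nid = nid + ?, id = id + ?", (base_ms, base_ms)
        )


def now_ms():
    """Current time as epoch milliseconds, a reasonable choice for base_ms."""
    return int(time.time() * 1000)
```

Because each row keeps its original offset from base_ms, the relative order of the notes and cards is preserved.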

(N.B. A much-improved version of this script is published in a later post)

Language learners who want to develop their listening comprehension skills often turn to YouTube for videos that feature native language content. Often these videos have subtitles in the original language. A handful of applications allow users to take these videos along with their subtitles and chop them up into sentence-length bites that are suitable for Anki cards. One such application is Knowclip. Indeed, for macOS users, it’s one of the few viable options.¹

A deep dive into my Anki language learning: Part I (Overview and philosophy)

Caveats

Accessing Anki collection models from Python

Using Perl in Keyboard Maestro macros

Background

A Keyboard Maestro macro to edit Anki sound file

Prerequisites

Outline of the approach

Querying the Anki database when the application is running

Scraping Russian word definitions from Wiktionary: utility for Anki

Extending the Anki Cloze Anything script for language learners

Anki’s built-in cloze deletion system

Complete fix for broken Knowclip .apkg files

Fixing Knowclip .apkg files: one more thing

Fixing Knowclip Anki apkg creation dates