Anki

Recently I’ve been wondering how long it would take me to get to 1,000,000 reviews. Right now I’m sitting at between 800,000 and 900,000 reviews and for no other reason than pure ridiculous curiosity I was curious whether I could get SQLite to compute it directly for me. Turns out the answer is “yes, you can.”

Here’s the query in its gory detail and then I’ll walk through how it works:

While working on a project that requires querying the Anki database directly outside of the Anki desktop application, I encountered an interesting issue with sqlite3 collations. This is is just a short post about how I went about registering a collation in order to execute SQL queries against the Anki db.

The problem

Let’s try a simple query. Open the Anki database:

Like many Anki users, I keep track of my streaks because it motivates me to do my reviews each day. But since life gets in the way sometimes, I may miss my reviews in one or more decks. It has been years since I’ve neglected to do all of my reviews; but sometimes I will forget to come back later in the day to finish up some of my decks. Since I like to have a clean review heatmap, I will “fix” my streak in a skipped deck.

As its name implies, the AwesomeTTS Anki add-on is awesome. It’s nearly indispensable for language learners.

You can use it in one of two ways:

Subscribe on your own to the text-to-speech services that you plan to use and add those credentials to AwesomeTTS. (à la carte)
Subscribe to the AwesomeTTS+ service and gain access to these services. (prix fixe)

Because I had already subscribed to Google and Azure TTS before AwesomeTTS+ came on the scene, there was no reason for me to pay for the comprehensive prix fixe option. Furthermore, since I’ve never gone above the free tier on any of these services, it makes no sense for me to pay for something I’m already getting for free. For others, the convience of a one-stop-shopping experience probably makes the AwesomeTTS+ service worthwhile.

Most language learners are familiar with Forvo, a site that allows users to download and contribute pronunciations for words and phrases. For my Russian studies, I make daily use of the site. In fact, to facilitate my Anki card-making workflow, I am a paid user of the Forvo API. But that’s where the trouble started.

When the Forvo API works, it works OK, often extremely slow. But lately, it has been down more than up. In an effort to patch my workflow and continue to download Russian word pronunciations, I wrote this little scraper. I’d prefer to use the API, but experience has shown now that the API is slow and unreliable. I’ll keep paying for the API access, because I support what the company does. And as often as not when a company offers a free service, it’s likely to be involved in surveillance capitalism. So I’d rather companies offer a reliable product at a reasonable price.

Recently, someone asked a question on r/Anki about changing and existing cloze-type note to a regular note. Part of the solution involves stripping the cloze markup from the existing cloze’d field. A cloze sentence has the form Play {{c1::studid}} games. or Play {{c1::stupid::pejorative adj}} games.

To handle both of these cases, the following regular expression will work. Just substitute for $1.

\{\{c\d::([^:\}]+)(?:::+[^\}]*)*\}\}

However, the Cloze Anything markup is different. It uses ( and ) instead of curly braces. If we want to flexibly remove both the standard and Cloze Anything markup, then our pattern would look like:

I make a lot of Anki cards, so I’m on a constant quest to make the process more efficient. Like a lot of language-learners, I use images on my cards where possible in order to make the word or sentence more memorable.

Process

When I find an image online that I want to use on the card, I download it to ~/Documents/ankibound. A Hazel rule then grabs the image file and converts it to a .webp file with relatively low quality and a maximum horizontal dimension of 200px. The size and low quality allow me to store lots of images without overwhelming storage capacity, or more importantly, resulting in long synchronization times.

Anki users are protective of their streak - the number of consecutive days they’ve done their reviews. Right now, for example, my streak is 621 days. So if you miss a day for whatever reason, not only do you have to deal with double the number of reviews, but you also deal with the emotional toll of having lost your streak.

You can lose your streak for one of several reasons. You could have simply been lazy. You may have forgotten that you didn’t do your Anki. Or travel across timezones put you in a situation where Anki’s clock and your clock differ. Others have described a procedure for resetting the computer’s clock as a way of recovering a lost streak. It apparently works though I haven’t tried it. Instead I’ll focus on a technique that involves working directly with the Anki database.

Welcome to Part III of a deep dive into my Anki language learning decks. In Part I I covered the principles that guide how I setup my decks and the overall deck structure. In the lengthy Part II I delved into my vocabulary deck. In this installment, Part III, we’ll cover my sentence decks.

Principles

First, sentences (and still larger units of language) should eventually take precedence in language study. What help is it to know the word for “tomato” in your L2, if you don’t know how to slice a tomato, how to eat a tomato, how to grow a tomato plant? Focus on larger units of language increases your success rate in integrating vocabulary into daily use.

In Part I of my series on my Anki language-learning setup, I described the philosophy that informs my Anki setup and touched on the deck overview. Now I’ll tackle the largest and most complex deck(s), my vocabulary decks.

First some FAQ’s about my vocabulary deck:

Do you organize it as L1 → L2 or as L2 → L1, or both? Actually, it’s both and more. Keep reading.
Do you have separate subdecks by language level, or source, or some other characteristic? No, it’s just a single deck. First, I’m perpetually confused by how subdecks work. I’d rather subdecks just act as organizational, not functional, tools. But other users don’t see it that way. That’s why I use tags rather than subdecks to organize content.¹
Do you use frequency lists? No, I extract words from content that I’m reading, that I encounter when listening to moviews or podcasts, or words that my tutor mentions in conversation. That’s what goes in Anki.

Since this is a big topic, I’m going to start with a quick overview of the fields in the main note type that populates my vocabulary deck and then go into each one in more detail and how they fit together in each of my many card types. At the very end of the post, I’ll talk about verb cards which are similar in most ways to the straight vocabulary card, but which account from the complexities of the Russian verbal system.²

When will I get to 1,000,000 Anki reviews?

Registering a custom collation to query Anki database

The problem

Fix your Anki streak - the script edition

AwesomeTTS Anki add-on: Use Amazon Polly

Scraping Forvo pronunciations

A regex to remove Anki's cloze markup

Anki: Insert the most recent image

Process

Altering Anki's revlog table, or how to recover your streak

A deep dive into my Anki language learning: Part III (Sentences)

Principles

A deep dive into my Anki language learning: Part II (Vocabulary)