Anki
Factor analysis of failed language cards in Anki
After developing a rudimentary approach to detecting resistant language learning cards in Anki, I began teasing out individual factors. Once I was able to adjust the number of lapses for the age of the card, I could examine the effect of different factors on the difficulty score that I described previously.
Findings
Some of the interesting findings from this analysis:
- Prompt-answer direction - 62% of lapses were in the Russian → English (recognition) direction.1
- Part of speech - Over half (51%) of lapses were among verbs. Since the Russian verbal system is rich and complex, it’s not surprising to find that verb cards often fail.
- Noun gender - Between a fifth and a quarter (22%) of all lapses were among neuter nouns, and among lapses on noun cards alone, neuter nouns accounted for 69%. This, too, makes intuitive sense because neuter nouns often denote abstract concepts that are difficult to picture mentally. For example, the Russian words for community, representation, and indignation are all neuter nouns.
Interventions
With a better understanding of the factors that contribute to lapses, it is easier to anticipate failures before they accumulate. For example, I will immediately implement a plan to surround new neuter nouns with a larger variety of audio and sample sentence cards. For new verbs, I’ll do the same, ensuring that I include multiple forms of the verb, varying the examples by tense, number, person, aspect and so on.
Refactoring Anki language cards
Parsing Russian Wiktionary content using XPath
As readers of this blog know, I’m an avid user of Anki to learn Russian. I have a number of sources for reference content that go onto my Anki cards. Notably, I use Wiktionary to get word definitions and the word with the proper syllabic stress marked. (This is an aid to pronunciation for Russian language learners.)
Since I’m lazy to the core, I came up with a systematic way of grabbing the stress-marked word from the Wiktionary page using lxml and XPath.
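As a rough illustration of the idea, here is a minimal sketch that pulls a stress-marked word out of markup with an XPath-style query. It makes several assumptions: it uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) rather than lxml so that it runs without extra installs, and both the sample markup and the "headword" class name are hypothetical, not taken from a real Wiktionary page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; a real Wiktionary page is structured differently.
SAMPLE = '<div><p><strong class="headword">соба́ка</strong> (noun)</p></div>'


def stressed_headword(markup):
    """Return the text of the first <strong class="headword"> element, or None."""
    root = ET.fromstring(markup)
    # ElementTree understands a limited XPath dialect, including attribute predicates.
    return root.findtext(".//strong[@class='headword']")


print(stressed_headword(SAMPLE))
```

With lxml, the same kind of query would go through its own xpath() method, which supports full XPath 1.0 expressions.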
Directly setting an Anki card's interval in the sqlite3 database
Regex to match a cloze
Anki and some other platforms use a particular format to signify cloze deletions in flashcard text. It has a format like any of the following:
{{c1::dog::}}
{{c2::dog::domestic canine}}
Here’s a regular expression that matches the content of cloze deletions in an arbitrary string, keeping only the main clozed word (in this case dog.)
{{c\d::(.*?)(::[^:]+)?}}
Here it is in action in a Python script:
import re
def stripCloze(searchText):
    # Replace each cloze deletion with its main clozed text (capture group 1)
    return re.sub(r'{{c\d::(.*?)(::[^:]+)?}}', r"\1", searchText)
print(stripCloze("The {{c1::passengers::tourist riders}} spotted a breaching {{c2::whale}}."))
It should return The passengers spotted a breaching whale.
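The pattern above expects a single-digit cloze number and a non-empty hint. A slightly more permissive variant, offered only as a sketch (the function name strip_cloze and the pattern changes are my own assumptions, not part of the original script), also accepts multi-digit numbers such as c10 and an empty hint as in {{c1::dog::}}:

```python
import re

# Variant pattern (an assumption, not the original):
#   \d+       allows cloze numbers above 9
#   [^:}]*    allows the hint after the second "::" to be empty
CLOZE = re.compile(r"\{\{c\d+::(.*?)(?:::[^:}]*)?\}\}")


def strip_cloze(search_text):
    """Replace every cloze deletion with its main clozed text."""
    return CLOZE.sub(r"\1", search_text)


print(strip_cloze("The {{c1::passengers::tourist riders}} spotted a breaching {{c2::whale}}."))
print(strip_cloze("A {{c10::dog::}} barked."))
```

The braces are escaped here purely for readability; the behavior on the original example sentence is unchanged.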
An alternative method for keyboard input switching on macOS
Sunday, September 16, 2018
Regex 101 is a great online regex tester.
Speaking of regular expressions, for the past year, I’ve used an automated process for building Anki flash cards. One of the steps in the process is to download Russian word pronunciations from Wiktionary. When Wiktionary began publishing transcoded mp3 files rather than just ogg files, they broke the URL scheme that I relied on to download content. The new regex for this scheme is: (?:src=.*:)?src=\"(\/\/.*\.mp3)
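To show how that pattern might be applied, here is a minimal sketch in Python. The sample markup and the example.org URL are hypothetical (not copied from a real Wiktionary page), the function name find_mp3_url is mine, and prepending "https:" to the protocol-relative URL is an assumption about how the result would be used. The backslash escapes on the slashes and the quote are dropped because Python's re module does not need them.

```python
import re

# The pattern quoted above, written as a Python raw string.
MP3_SRC = re.compile(r'(?:src=.*:)?src="(//.*\.mp3)')


def find_mp3_url(html):
    """Return the first mp3 URL matched by the pattern, or None if there is none."""
    match = MP3_SRC.search(html)
    # The captured URL is protocol-relative ("//host/path"), so add a scheme.
    return "https:" + match.group(1) if match else None


# Hypothetical markup for illustration only.
sample = '<source src="//example.org/transcoded/Ru-word.ogg/Ru-word.ogg.mp3" type="audio/mpeg">'
print(find_mp3_url(sample))
```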
Peering into Anki using R
Yet another diversion to keep me from focusing on actually using Anki to learn Russian. I stumbled on the R programming language, a language that focuses on statistical analysis.
Here are a couple of snippets that begin to scratch the surface of what’s possible. Important caveat: I’m an R novice at best. There are probably much better ways of doing some of this…
Counting notes with a particular model type
Here we’ll use R to do what we did previously with Python.
Anki database adventures: Counting notes by model type
Continuing my series on accessing the Anki database outside of the Anki application environment, here’s a piece on accessing the note type model. You may wish to start here with the first article on accessing the Anki database. This is geared toward macOS. (If you’re not on macOS, then start here instead.)
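As a rough sketch of the task in this post's title, the following counts notes per note type with Python's sqlite3 module. It assumes the older collection layout, in which the col table holds the note type models as JSON keyed by model id and each row of the notes table carries that id in its mid column; newer Anki versions may store this differently. The in-memory tables and sample rows are simplified stand-ins, not a real collection.

```python
import json
import sqlite3


def count_notes_by_model(conn):
    """Return a {model name: note count} dict for an Anki-style database."""
    models_json = conn.execute("SELECT models FROM col").fetchone()[0]
    # Model ids are JSON object keys (strings); notes.mid stores them as integers.
    names = {int(mid): model["name"] for mid, model in json.loads(models_json).items()}
    rows = conn.execute("SELECT mid, COUNT(*) FROM notes GROUP BY mid").fetchall()
    return {names.get(mid, str(mid)): count for mid, count in rows}


# Simplified stand-in for a collection file (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE col (models TEXT)")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, mid INTEGER)")
conn.execute(
    "INSERT INTO col VALUES (?)",
    (json.dumps({"1001": {"name": "Basic"}, "1002": {"name": "Cloze"}}),),
)
conn.executemany("INSERT INTO notes (mid) VALUES (?)", [(1001,), (1001,), (1002,)])

print(count_notes_by_model(conn))
```

When pointing something like this at a real collection, working on a copy of the file while Anki is closed avoids any risk to the live database.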
The note type model
Since notes contain flexible fields in Anki, the model for a note type is in JSON. The best guess definition of the JSON is: