Fixing Knowclip .apkg files: one more thing

(N.B. A much-improved version of this script is published in a later post)

Fixing the Knowclip note files as I described previously turns out to be only half of the fix for the broken .apkg files. You also need to fix the cards table. Why? Same reason: the rows are numbered sequentially from 1. But since Anki uses the card id field as the date added, the added field is always wrong. Again, the fix is simple:

#!/usr/bin/env python3

import sqlite3
import time
import datetime

db_path = '/path/to/knowclip/generated/apkg/collection.anki2'
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

q = 'UPDATE cards SET id = nid + 10'
cursor.execute(q)
conn.commit()
conn.close()
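To see why a sequential id produces a bogus added date, here is a minimal sketch that reads an id as epoch milliseconds. The helper name id_to_datetime is mine, for illustration only; it is not part of Anki or Knowclip.

```python
from datetime import datetime, timezone

def id_to_datetime(card_id_ms: int) -> datetime:
    """Interpret a note or card id as milliseconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(card_id_ms / 1000, tz=timezone.utc)

# A sequential id such as 1 lands at the very start of 1970 (UTC).
print(id_to_datetime(1))
# An id built from a real timestamp maps back to that date.
print(id_to_datetime(1617613200000))  # 2021-04-05 09:00:00+00:00
```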

Here’s an improved version of the previous script, one that incorporates changes to both the notes and the cards tables.

#!/usr/bin/env python3

import sqlite3
import time
import datetime

# start date for new cards
dt = input("Creation date 2021-04-05: ")
(year,month,day) = [int(x) for x in dt.split('-')]
tm = input("Creation time 05:00: ")
(hr, minute) = [int(x) for x in tm.split(':')]
new_date = datetime.datetime(year, month, day, hr, minute)
new_epoch = int(new_date.timestamp() * 1000)
print(f'Will begin create date with epoch ms {new_epoch}.')

db_path = '/path/to/knowclip/generated/apkg/collection.anki2'
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
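# collect the existing (sequential) note ids
q = 'SELECT id FROM notes'
cursor.execute(q)
note_rows = cursor.fetchall()
note_chg_sql_cmds = []
for note_row in note_rows:
	# queue the note id change; it runs after the cards are re-pointed
	qn = f'UPDATE notes SET id = {new_epoch} WHERE id = {note_row[0]}'
	note_chg_sql_cmds.append(qn)
	# point this note's cards at the new note id
	qc = f'UPDATE cards SET nid = {new_epoch} WHERE nid = {note_row[0]}'
	new_epoch = new_epoch + 100
	cursor.execute(qc)
conn.commit()
# derive each card id from its note id so the added date is correct
# (assumes one card per note, since card ids must be unique)
qc = 'UPDATE cards SET id = nid + 10'
cursor.execute(qc)
conn.commit()
# now apply the queued note id changes
for sql in note_chg_sql_cmds:
	cursor.execute(sql)
conn.commit()
conn.close()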

Fixing Knowclip Anki apkg creation dates

(N.B. A much-improved version of this script is published in a later post)

Language learners who want to develop their listening comprehension skills often turn to YouTube for videos that feature native language content. Often these videos have subtitles in the original language. A handful of applications allow users to take these videos along with their subtitles and chop them up into sentence-length bites that are suitable for Anki cards. One such application is Knowclip. Indeed, for macOS users, it’s one of the few viable options.1

As expected, as of version 0.10.2-beta, it has a few rough edges, one of which is that the notes it generates have the incorrect date. All of the creation dates are 1969-12-31, which is how the zero point of the Unix epoch (1970-01-01 00:00 UTC) displays in time zones west of UTC. As far as I can tell, it doesn’t cause any problems once the Knowclip-generated apkg file is imported into Anki, but it’s an irritating bug.

To fix this issue, it’s important to recognize that an apkg file is just a regular zip file in disguise. So the first step is to rename the file that Knowclip generates to something.zip, then decompress it. Inside the decompressed directory, you’ll see a number of files, including a collection.anki2 file. That’s the SQLite file we’ll be targeting.
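If you would rather not rename and unzip by hand, the same step can be sketched with Python's zipfile module, which does not care about the file extension. The function name unpack_apkg and the paths passed to it are placeholders of mine, not anything Knowclip or Anki provides.

```python
import zipfile
from pathlib import Path

def unpack_apkg(apkg_path: str, out_dir: str) -> Path:
    """Extract an .apkg (a zip archive) and return the path to collection.anki2."""
    with zipfile.ZipFile(apkg_path) as zf:
        zf.extractall(out_dir)
    db_path = Path(out_dir) / "collection.anki2"
    if not db_path.exists():
        raise FileNotFoundError(f"no collection.anki2 found in {apkg_path}")
    return db_path
```

The returned path is what you would use as db_path in the script.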

We’ll need to correct both the notes and cards tables. For convenience, we can do this in a script:

#!/usr/bin/env python3

import sqlite3
import time
import datetime

# start date for new cards
new_date = datetime.datetime(2021,4,5,5,0)
new_epoch = int(new_date.timestamp() * 1000)
print(f'Will begin create date with epoch ms {new_epoch}.')

db_path = '/path/to/knowclip/generated/apkg/collection.anki2'
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
q = 'SELECT id FROM notes'
cursor.execute(q)
note_rows = cursor.fetchall()
note_chg_sql_cmds = []
for note_row in note_rows:
	qn = f'UPDATE notes SET id = {new_epoch} WHERE id = {note_row[0]}'
	note_chg_sql_cmds.append(qn)
	qc = f'UPDATE cards SET nid = {new_epoch} WHERE nid = {note_row[0]}'
	new_epoch = new_epoch + 100
	cursor.execute(qc)
conn.commit()
for sql in note_chg_sql_cmds:
	cursor.execute(sql)
conn.commit()
conn.close()

Just run this script against the collection.anki2 file before importing, and you’ll have the correct date inside of Anki.
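Since the script works on the unpacked collection.anki2, the directory has to be zipped back into an .apkg before Anki can import it. A minimal sketch of that step follows, assuming the unpacked files belong at the root of the archive; repack_apkg is a name I made up for illustration.

```python
import zipfile
from pathlib import Path

def repack_apkg(src_dir: str, apkg_path: str) -> None:
    """Zip every file under src_dir, stored at the archive root, into apkg_path."""
    src = Path(src_dir)
    with zipfile.ZipFile(apkg_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(src.rglob("*")):
            if f.is_file():
                # store paths relative to src_dir so nothing is nested in a folder
                zf.write(f, f.relative_to(src).as_posix())
```

Write the new .apkg somewhere outside src_dir; otherwise the archive would try to include itself.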


  1. subs2srs is frequently featured and offers similar features, but it is Windows-only. ↩︎

Generating HTML from Markdown in Anki fields

I write in Markdown because it’s much easier to keep the flow of writing going without taking my hands off the keyboard.

I also like to write content in Anki cards in Markdown. Over the years there have been various ways of supporting this through add-ons:

  • The venerable Power Format Pack was great but no longer supports Anki 2.1, so it became useless.
  • Auto Markdown worked for a while but as of Anki version 2.1.41 does not.
  • After Auto Markdown stopped working, I installed the supposed fix Auto Markdown - fix version but that didn’t work either.
  • It’s possible that the Mini Format Pack will work, but honestly I’m tired of the constant break-fix-break-fix cycle with Anki.

The problem

The real problem with Markdown add-ons for Anki is the same as every other add-on. They are all hanging by a thread. Almost every minor point upgrade of Anki breaks at least one of my add-ons. It’s nearly impossible to determine in advance whether an Anki upgrade is going to break some key functionality that I rely on. And add-on developers, even prominent and prolific ones, come and go when they get busy, distracted or disinterested. It’s one of the most frustrating parts of using Anki.

Pre-processing Russian text for the AwesomeTTS add-on in Anki

The Anki add-on AwesomeTTS has been a vital tool for language learners using the Anki application on the desktop. It allows you to have elements of the card read aloud using text-to-speech capabilities. The new developer of the add-on has added a number of voice options, including the Microsoft Azure voices. The neural voices for Russian are quite good. But they have one major issue: the syllabic stress marks that are sometimes seen in text intended for language learners cause the Microsoft Azure voices to grossly mispronounce the word.
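One way to pre-process such text is to strip the stress marks before it reaches the voice. This is only a sketch, and it assumes the stress is encoded as the combining acute accent (U+0301) following the stressed vowel; strip_stress_marks is a hypothetical helper, not part of AwesomeTTS.

```python
COMBINING_ACUTE = "\u0301"  # assumed encoding of the stress mark

def strip_stress_marks(text: str) -> str:
    """Remove combining acute accents, leaving the letters themselves untouched."""
    return text.replace(COMBINING_ACUTE, "")

print(strip_stress_marks("молоко\u0301 и замо\u0301к"))  # молоко и замок
```

Letters such as й and ё are not affected, because their marks are different code points.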

Factor analysis of failed language cards in Anki

After developing a rudimentary approach to detecting resistant language learning cards in Anki, I began teasing out individual factors. Once I was able to adjust the number of lapses for the age of the card, I could examine the effect of different factors on the difficulty score that I described previously.

Findings

Some of the interesting findings from this analysis:

  • Prompt-answer direction - 62% of lapses were in the Russian → English (recognition) direction.1
  • Part of speech - Over half (51%) of lapses were among verbs. Since the Russian verbal system is rich and complex, it’s not surprising to find that verb cards often fail.
  • Noun gender - Between a fifth and a quarter (22%) of all lapses were among neuter nouns; among lapses on noun cards alone, neuter nouns accounted for 69%. This, too, makes intuitive sense because neuter nouns often represent abstract concepts that are difficult to represent mentally. For example, the Russian words for community, representation, and indignation are all neuter nouns.

Interventions

With a better understanding of the factors that contribute to lapses, it is easier to anticipate failures before they accumulate. For example, I will immediately implement a plan to surround new neuter nouns with a larger variety of audio and sample sentence cards. For new verbs, I’ll do the same, ensuring that I include multiple forms of the verb, varying the examples by tense, number, person, aspect and so on.

Refactoring Anki language cards

Regardless of how closely you adhere to the 20 rules for formulating knowledge, there are cards that seem destined to leechdom. For me, part of the problem is that with languages, straight-up vocabulary cards take words out of the rich context in which they exist in the wild. With my maturing collection of Russian decks, I recently started to go through these resistant cards and figure out why they are so difficult.

Parsing Russian Wiktionary content using XPath

As readers of this blog know, I’m an avid user of Anki to learn Russian. I have a number of sources for reference content that go onto my Anki cards. Notably, I use Wiktionary to get word definitions and the word with the proper syllabic stress marked. (This is an aid to pronunciation for Russian language learners.)

Since I’m lazy to the core, I came up with a systematic way of grabbing the stress-marked word from the Wiktionary page using lxml and XPath.

Being grateful for those who push our buttons


We need people to push our buttons, otherwise how are we to know what buttons we have?

Jetsunma Tenzin Palmo
  Ten Percent Happier podcast, February 8, 2021

Jetsunma Tenzin Palmo is a Buddhist nun interviewed on the excellent Ten Percent Happier podcast. It’s always possible to reframe situations where someone “pushes our buttons” to see it as an opportunity to better understand that there are these buttons, these sensitivities that otherwise evade our awareness.

Directly setting an Anki card's interval in the sqlite3 database

It’s always best to let Anki set intervals according to its view of your performance on testing. That said, there are times when directly altering the interval makes sense. For example, to build out a complete representation of the entire Russian National Corpus, I’m forced to enter vocabulary terms that should be obvious to even elementary Russian learners but which aren’t yet in my nearly 24,000 card database. Therefore, I’m entering these cards gradually. When they come up as new cards, I pass them as “Easy” on the first appearance, converting them to review cards. But ideally, I’d like to send them away for years.
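As a rough sketch of what a direct edit could look like, the snippet below works on a cut-down, in-memory stand-in for the cards table. It rests on my understanding of the schema, which you should verify against your own collection: ivl is the interval in days, due for a review card is a day count relative to the collection's creation, type 2 marks a review card, mod is a modification time in epoch seconds, and usn = -1 flags the row for syncing. The function set_interval and its today parameter are hypothetical.

```python
import sqlite3
import time

def set_interval(conn: sqlite3.Connection, card_id: int, days: int, today: int) -> None:
    """Set a review card's interval to `days` and move its due day to match.

    `today` is the collection's current day number (an assumption: the caller
    must work this out from the collection itself).
    """
    conn.execute(
        "UPDATE cards SET ivl = ?, due = ?, mod = ?, usn = -1 "
        "WHERE id = ? AND type = 2",
        (days, today + days, int(time.time()), card_id),
    )
    conn.commit()

# Demonstration against a minimal in-memory table, not a real collection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cards (id INTEGER PRIMARY KEY, type INTEGER, "
    "due INTEGER, ivl INTEGER, mod INTEGER, usn INTEGER)"
)
conn.execute("INSERT INTO cards VALUES (1617613200010, 2, 105, 4, 0, 0)")
set_interval(conn, 1617613200010, days=3650, today=101)
print(conn.execute("SELECT ivl, due FROM cards").fetchone())  # (3650, 3751)
```

Work on a backup of the collection with Anki closed; a bad write here is not something the application will undo for you.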

Where the power lies in 2021

From an article recently on the BBC Russian Service:

Блокировка уходящего президента США в “Твиттере” и “Фейсбуке” привела к необычной ситуации: теоретически Трамп еще может начать ядерную войну, но не может написать твит.

“Blocking the outgoing U.S. President from Twitter and Facebook has led to an unusual situation: theoretically Trump can still start a nuclear war, but cannot write a Tweet.”

In only a week, he won’t be able to do either. But while celebrating the deplatforming of this vicious clown, I have a tinge of worry about what it means for the future of democracy. It almost goes without saying that social networks have become nearly the de facto equals of representative government in the U.S.