Posts

Scraping Russian word definitions from Wiktionary: utility for Anki

While my Russian Anki deck contains around 27,000 cards, I’m always making more. (There are a lot of words in the Russian language!) Over the years, I’ve become more and more efficient at card production, but one missing piece was a machine-readable source of word definitions. There’s no shortage of dictionary sites, but scraping data from any of them is complicated by the way front-end developers spread semantic content across multiple HTML tags arranged in deep and cryptic hierarchies. Yes, we can cut and paste, but my quest is to automate quality card production almost completely. This is a quick post about a method for scraping word definitions from Wiktionary.
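The post’s own scraping code isn’t reproduced in this excerpt, but a minimal sketch of the idea using only Python’s standard library might look like the following. It assumes the English Wiktionary page HTML (for example, from https://en.wiktionary.org/wiki/<word>) has already been downloaded, that the Russian section begins at an element with id="Russian" and runs until the next <h2>, and that each top-level <li> of an <ol> in that section is one definition. Those markup assumptions are mine, and Wiktionary’s HTML does change, so check them against a live page. `RussianDefinitionParser` and `extract_definitions` are hypothetical names.

```python
from html.parser import HTMLParser

LIST_TAGS = {"ol", "ul", "dl"}


class RussianDefinitionParser(HTMLParser):
    """Collect top-level definition text from the Russian section of a
    Wiktionary page. Markup assumptions (not a Wiktionary guarantee):
    the section starts at the element with id="Russian", ends at the next
    <h2>, and each top-level <li> of an <ol> is one definition."""

    def __init__(self):
        super().__init__()
        self.in_russian = False  # inside the Russian language section?
        self.list_stack = []     # list elements currently open in the section
        self.buffer = None       # text fragments of the current definition
        self.definitions = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "h2":
            # Every <h2> starts a new language section.
            self.in_russian = False
            self.list_stack = []
            self.buffer = None
        if attributes.get("id") == "Russian":
            self.in_russian = True
        if not self.in_russian:
            return
        if tag in LIST_TAGS:
            classes = (attributes.get("class") or "").split()
            # Mark reference lists so their items are never taken as definitions.
            self.list_stack.append("refs" if "references" in classes else tag)
        elif tag == "li" and self.list_stack == ["ol"]:
            self.buffer = []  # a top-level definition begins

    def handle_endtag(self, tag):
        if not self.in_russian:
            return
        if tag in LIST_TAGS and self.list_stack:
            self.list_stack.pop()
        elif tag == "li" and self.list_stack == ["ol"] and self.buffer is not None:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.definitions.append(text)
            self.buffer = None

    def handle_data(self, data):
        # Ignore text inside nested lists (sub-senses, usage examples).
        if self.buffer is not None and self.list_stack == ["ol"]:
            self.buffer.append(data)


def extract_definitions(html: str) -> list[str]:
    """Return the top-level Russian definitions found in a page's HTML."""
    parser = RussianDefinitionParser()
    parser.feed(html)
    parser.close()
    return parser.definitions
```

Skipping text inside nested lists keeps usage examples (typically rendered as <dl>) and sub-senses out of each definition string; if you want the sub-senses too, you would collect <li> items one list level deeper.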

Encoding of the Cyrillic letter й - a UTF-8 gotcha

In the process of writing and maintaining a service that checks Russian word frequencies, I noticed a peculiar phenomenon: certain words could not be found in an SQLite database that I knew actually contained them. For example, a query for the word английский consistently failed, whereas other words succeeded. Eventually the commonality between the failures became obvious: all of them contained the letter й, which led me down a rabbit hole of character encoding and this specific case where it can go astray.
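The excerpt doesn’t spell out the root cause, but one well-known way й goes astray is Unicode normalization: the letter can be stored either as the single code point U+0439 (the composed, NFC form) or as и (U+0438) followed by a combining breve, U+0306 (the decomposed, NFD form). The two render identically but compare unequal, so an exact-match SQL query misses. The sketch below assumes that is the failure mode; the `words` table, the frequency value, and the helper names are made up for illustration.

```python
import sqlite3
import unicodedata


def to_nfc(text: str) -> str:
    """Return the composed (NFC) form, in which й is the single code point U+0439."""
    return unicodedata.normalize("NFC", text)


composed = to_nfc("английский")                      # each й is U+0439
decomposed = unicodedata.normalize("NFD", composed)  # each й is U+0438 + U+0306

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, freq INTEGER)")
# Hypothetical import that happened to store the decomposed spelling.
conn.execute("INSERT INTO words VALUES (?, ?)", (decomposed, 42))


def naive_lookup(word):
    # '=' compares the stored bytes exactly, so the two spellings never match.
    return conn.execute("SELECT freq FROM words WHERE word = ?", (word,)).fetchone()


# Make the Python normalizer callable from SQL as nfc(...).
conn.create_function("nfc", 1, to_nfc)


def normalized_lookup(word):
    # Normalize both the stored value and the query term before comparing.
    return conn.execute(
        "SELECT freq FROM words WHERE nfc(word) = ?", (to_nfc(word),)
    ).fetchone()


print(len(composed), len(decomposed))  # 10 12
print(naive_lookup(composed))          # None
print(normalized_lookup(composed))     # (42,)
```

Calling a function on the column works for ad hoc queries, but it stops SQLite from using a plain index on `word`; normalizing to NFC once on insert, and again on every query term, is usually the better long-term fix.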

Dynamic DNS - auto-updating from macOS

To run a little project (which I’ll describe at some point in the future), I have to run a small web server from my home computer, one that happens to run macOS. More than anything else, this is just a record of what I did to get it running, in case a) I have to do it again, or b) someone else finds it useful.

Sign up for dynamic DNS service

I signed up for the service with dynv6 because I saw it recommended elsewhere and it didn’t look creepy like some of the other options. I signed up with just an email address, routed through an email anonymizing proxy because I’m paranoid. After verifying my email, I was able to create a new “zone”, which is basically a record linking my public IP address to a custom DNS name.
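Keeping the zone current means telling dynv6 whenever the home IP address changes. Here is a minimal Python sketch of such an update call. The endpoint URL and the `hostname`, `token`, and `ipv4` parameter names (including `ipv4=auto`, meaning “use the address this request comes from”) are my understanding of dynv6’s HTTP update API, not something from this post, so verify them against the dynv6 documentation; `build_update_url` and `update_zone` are hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: dynv6's HTTP update endpoint and parameter names; verify them
# against the dynv6 documentation before relying on this.
UPDATE_ENDPOINT = "https://dynv6.com/api/update"


def build_update_url(zone: str, token: str, ipv4: str = "auto") -> str:
    """Build the update URL; ipv4="auto" (assumed) means "use the caller's address"."""
    query = urlencode({"hostname": zone, "token": token, "ipv4": ipv4})
    return f"{UPDATE_ENDPOINT}?{query}"


def update_zone(zone: str, token: str, ipv4: str = "auto", timeout: float = 10.0) -> str:
    """Send the update request and return the server's plain-text reply."""
    with urlopen(build_update_url(zone, token, ipv4), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would be something like `update_zone("myhost.dynv6.net", "YOUR-TOKEN")` with your own zone and token. On macOS, a script like this could be run periodically by a launchd agent using the `StartInterval` key, so the record follows the home IP address as it changes.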

No sir, I do not want Big Sur

Maybe I’m just getting cranky after more than a year of on-again, off-again pandemic lockdowns, but I’ve had it with Apple’s heavy-handed attempts to get me to upgrade to Big Sur. Mind you, I have nothing against it. It’s just an operating system. I don’t particularly like its translucent, bubbly iOS look, but I could live with it.

But I don’t want it. I depend on a very unorthodox setup, with a lot of infrastructure tools that need particular versions of Python to be in just the right place. Every major macOS upgrade breaks all of this, and I spend days picking up the pieces. I’m tired of Apple messing with it. So when my system launched into what seemed like an unbidden upgrade process today, I lost it.

Dynamically loading Javascript in Anki card templates

The ability to execute Javascript in Anki card templates offers users flexibility in displaying data. In Anki 2.1, though, the asynchronous execution of Javascript means that user script functionality is not entirely predictable. This post on r/Anki discusses an approach for dynamically loading Javascript resources and ensuring that they are available when the card is displayed. Since I modularize my Javascript code so that it can be flexibly deployed to different card types, I extended this method to allow the template developer to load multiple scripts in one <script> block.

Fixing CodeRunner jQuery injection

CodeRunner is one of my favourite development environments on macOS. I use it for small one-off projects or for testing concepts for integration into larger projects. But in version 4.0.3, jQuery injection in a custom HTML page is broken, giving the error:

It’s probably due to some unescaped bit of code in their minified jQuery, but I didn’t have time to dig that deep. Instead, I reported the error to the developer and fixed it myself. The original (default) run script for jQuery is:

Extending the Anki Cloze Anything script for language learners

It’s possible to use cloze deletion cards with standard Anki note types using the Anki Cloze Anything setup. But additional scripts are required for it to function seamlessly in a typical language-learning environment. I’ll show you how to flexibly display a sentence with or without Anki Cloze Anything markup, without breaking AwesomeTTS.

Anki’s built-in cloze deletion system

The built-in cloze deletion feature in Anki is an excellent way for language learners to actively test their recall. For example, a cloze deletion note type with the following content requires the learner to supply the missing word:
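To make the mechanics concrete, here is a simplified Python emulation of how a cloze field turns into card faces. Anki’s cloze syntax marks hidden text as {{c1::answer}} or, with a hint, {{c1::answer::hint}}; on the front of card 1, the c1 deletions become [...] (or [hint]) while other deletions show their text. This is an approximation for illustration, not Anki’s actual renderer (which, among other things, wraps the revealed answer in HTML for highlighting), and the function names are hypothetical.

```python
import re

# {{cN::text}} or {{cN::text::hint}}; the lazy groups stop at the first "::" or "}}".
CLOZE = re.compile(r"\{\{c(\d+)::(.*?)(?:::(.*?))?\}\}")


def render_front(field: str, card: int) -> str:
    """Hide the deletions belonging to `card`; show all other deletions as plain text."""
    def replace(match: re.Match) -> str:
        if int(match.group(1)) == card:
            hint = match.group(3)
            return f"[{hint}]" if hint else "[...]"
        return match.group(2)
    return CLOZE.sub(replace, field)


def render_back(field: str) -> str:
    """Reveal every deletion (Anki would also highlight the tested one)."""
    return CLOZE.sub(lambda match: match.group(2), field)


sentence = "Я {{c1::живу::verb}} в {{c2::Москве}}."
print(render_front(sentence, 1))  # Я [verb] в Москве.
print(render_front(sentence, 2))  # Я живу в [...].
print(render_back(sentence))      # Я живу в Москве.
```

Because each cN number produces its own card, one sentence with two deletions tests two different words.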

Complete fix for broken Knowclip .apkg files

I think this is the last word on fixing Knowclip .apkg files. I’ve developed the fix in bits and pieces; see my previous articles, here and here, for the details.

The issue, again, is that Knowclip gives these notes and cards sequential id values starting at 1. But Anki uses note.id and card.id as creation timestamps (milliseconds since the Unix epoch), so these cards appear to have been created in January 1970. I logged it as an issue on GitHub, but as of 2021-04-15 no action has been taken.
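This isn’t the script from those posts, but the core of the repair can be sketched against the SQLite database inside an .apkg (a zip archive whose collection.anki2, or collection.anki21 in newer exports, is an SQLite database). The sketch assumes Anki’s `notes` table with an `id` column and `cards` table with `id` and `nid` columns, and that the deck has no review history (`revlog`) pointing at the old card ids. `fix_sequential_ids` is a hypothetical name.

```python
import sqlite3
import time
from typing import Optional


def fix_sequential_ids(conn: sqlite3.Connection, base_ms: Optional[int] = None) -> None:
    """Renumber notes and cards so their ids are millisecond timestamps.

    Assumes Anki's schema (notes.id, cards.id, cards.nid) and that the old ids
    are small sequential integers, so the new, much larger ids cannot collide.
    """
    if base_ms is None:
        base_ms = int(time.time() * 1000)  # "now" in ms since the Unix epoch

    # Materialize the old ids first, since the loop rewrites them.
    note_ids = [row[0] for row in conn.execute("SELECT id FROM notes ORDER BY id")]
    for offset, old_id in enumerate(note_ids):
        new_id = base_ms + offset
        conn.execute("UPDATE notes SET id = ? WHERE id = ?", (new_id, old_id))
        # Keep each card pointing at its renumbered note.
        conn.execute("UPDATE cards SET nid = ? WHERE nid = ?", (new_id, old_id))

    card_ids = [row[0] for row in conn.execute("SELECT id FROM cards ORDER BY id")]
    for offset, old_id in enumerate(card_ids):
        conn.execute("UPDATE cards SET id = ? WHERE id = ?", (base_ms + offset, old_id))

    conn.commit()
```

Offsetting each id by one millisecond preserves the original ordering while keeping ids unique. Run it on an unzipped copy of the collection file, then re-zip; `datetime.fromtimestamp(note_id / 1000)` should then show a plausible creation time.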

Fixing Knowclip Anki apkg creation dates

(N.B. A much-improved version of this script is published in a later post)

Language learners who want to develop their listening comprehension skills often turn to YouTube for videos that feature native-language content. Often these videos have subtitles in the original language. A handful of applications let users take these videos, along with their subtitles, and chop them into sentence-length bites that are suitable for Anki cards. One such application is Knowclip. Indeed, for macOS users, it’s one of the few viable options.[1]