Normalizing spelling in Russian words containing the letter ё

The Russian letters ё and e have a complex and troubled relationship. The two letters are pronounced differently, but usually appear the same in written text. This presents complications for Russian learners and for text-to-speech systems. In several recent projects, I have needed to normalize the spelling of Russian words. For examples, if I have the written word определенно , is the word actually определенно ? Or is it определённо ?

Scraping Russian word definitions from Wikitionary: utility for Anki

While my Russian Anki deck contains around 27,000 cards, I’m always making more. (There are a lot words in the Russian language!) Over the years, I’ve become more and more efficient with card production but one of the missing pieces was finding a code-readable source of word definitions. There’s no shortage of dictionary sites, but scraping data from any site is complicated by the ways in which front-end developers spread the semantic content across multiple HTML tags arranged in deep and cryptic hierarchies.

Encoding of the Cyrillic letter й - a UTF-8 gotcha

In the process of writing and maintaining a service that checks Russian word frequencies, I noticed peculiar phenomenon: certain words could not be located in a sqlite database that I knew actually contained them. For example, a query for the word - английский consistently failed, whereas other words would succeed. Eventually the commonality between the failures became obvious. All of the failures contained the letter й , which led me down a rabbit hole of character encoding and this specific case where it can go astray.

Dynamic DNS - auto-updating from macOS

To run a little project (that I’ll describe at some point in the future) I have to run a small web server from my home computer, one that happens to run macOS. More than anything else, this is just a reply of what I did to get it running in case: a) I have to do it again, or b) Someone else can find it useful. Sign up for dynamic DNS service I signed up for service with dynv6 because I saw it recommended elsewhere and it didn’t look creepy like some of the other options.

Fixing CodeRunner jQuery injection

CodeRunner is one of my favourite development environments on macOS. I use it for small one-off projects or for testing concepts for integration into larger projects. But in version 4.0.3, jQuery injection in a custom HTML page is broken, giving the error: It’s probably due to some unescaped bit of code in their minified jQuery, but I didn’t have time to work that hard. Instead I reported the error to the developer an fixed it myself.

Parsing Russian Wiktionary content using XPath

As readers of this blog know, I’m an avid user of Anki to learn Russian. I have a number of sources for reference content that go onto my Anki cards. Notably, I use Wiktionary to get word definitions and the word with the proper syllabic stress marked. (This is an aid to pronunciation for Russian language learners.) Since I’m lazy to the core, I came up with a system way of grabbing the stress-marked word from the Wiktionary page using lxml and XPath.

Friday, January 29, 2021

Custom aliases in oh-my-zsh With oh-my-zsh, you can store custom aliases in multiple (?per application) file under .oh-my-zsh/custom giving them .zsh file extensions.1 ¶For example, in my hugo.zsh file, I have: alias hnewtil="/Users/alan/Documents/blog/ojisan/scripts/" alias gtojisan="cd /Users/alan/Documents/blog/ojisan; ls -l;" Executing inline Python in a shell script It’s possible using the -c command.2 python -c 'import foo;' ↩︎ ↩︎

Wednesday, January 27, 2021 has a CSS library that’s quite nice. I often use Bootstrap; but I like some of the visual features here better. For example, I like their tags because they have more flexible use of colour. If you want to fetch from a Python dictionary, but you need a default value, this is how you do it: upos_badge = {'noun': 'lime','verb': 'amber', 'adv': 'blue',} badge_class_postfix = upos_badge.get(value.lower(), 'light-grey') I recently learned about DeepL as an alternative to Google Translate.

More on integrating Hazel and DEVONthink

Since DEVONthink is my primary knowledge-management and repository tool on the macOS desktop, I constantly work with mechanisms for efficiently getting data into and out of it. I previously wrote about using Hazel and DEVONthink together. This post extends those ideas about and looks into options for preprocessing documents in Hazel before importing into DEVONthink as a way of sidestepping some of the limitations of Smart Rules in the latter. I’m going to work from a particular use-case to illustrate some of the options.

Regex to match a cloze

Anki and some other platforms use a particular format to signify cloze deletions in flashcard text. It has a format like any of the following: {{c1::dog::}} {{c2::dog::domestic canine}} Here’s a regular expression that matches the content of cloze deletions in an arbitrary string, keeping only the main clozed word (in this case dog.) {{c\d::(.*?)(::[^:]+)?}} To see it in action, here it is in action in a Python script: