programming

sterilize-ng: a command-line URL sterilizer

Introducing sterilize-ng [GitHub link] - a URL sterilizer made to work flexibly on the command line. Background: The surveillance capitalist economy is built on the relentless tracking of users. Imagine going about town running errands, but everywhere you go, someone is quietly following you. When you pop into the grocery, they examine your receipt. They look into the bags to see what you bought. Then they hop in the car with you and keep careful records of where you go, how fast you drive, and whom you talk with on the phone.

Using Perl in Keyboard Maestro macros

One of the things that I love about Keyboard Maestro is the ability to chain together disparate technologies to achieve some automation goal on macOS. In most of my previous posts about Keyboard Maestro macros, I’ve used Python or shell scripts, but I decided to draw on some decades-old experience with Perl to do a little text processing for a specific need. Background: I want this text from Wiktionary: to look like this:

Stripping Russian stress marks from text from the command line

Russian text intended for learners sometimes contains marks that indicate the syllabic stress. The stress is usually rendered as a vowel followed by a combining diacritical mark, typically the combining acute accent U+0301. Here are a couple of ways of stripping these marks on the command line. First is a version using Perl:
    #!/bin/bash
    f='покупа́ешья́'
    echo "$f" | perl -C -pe 's/\x{301}//g;'
And then another using the sd tool:
    #!/bin/bash
    f='покупа́ешья́'
    echo "$f" | sd "\u0301" ""
Both rely on finding the combining diacritical mark and removing it with a regex.
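For comparison, the same idea can be sketched in Python. This is not from the original post; the function name strip_stress is made up for illustration, and the sketch assumes the stress is always encoded as a separate combining acute accent (U+0301) rather than as a precomposed accented letter.

```python
# Hypothetical helper: removes the combining acute accent used as a stress mark.
COMBINING_ACUTE = "\u0301"


def strip_stress(text: str) -> str:
    """Return text with every combining acute accent (U+0301) removed."""
    # A plain replace is enough here because the mark is a separate code point
    # that follows the vowel it modifies.
    return text.replace(COMBINING_ACUTE, "")


if __name__ == "__main__":
    # Same sample string as the shell examples, with the accents written as escapes.
    print(strip_stress("покупа\u0301ешья\u0301"))
```

Unlike the shell versions, this does not need a regex at all, since only one fixed code point is being removed.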

Splitting a string on the command line - the search for the one-liner

It seems like the command line is one of those places where you can accomplish crazy efficient things with one-liners. Here’s a perfect use case for a CLI one-liner: In Anki, I often add lists of synonyms and antonyms to my vocabulary cards, but I like them formatted as a bulleted list. My usual route to that involves Markdown. But how to convert this:
    известный, точный, определённый, достоверный
to
    - `известный`
    - `точный`
    - `определённый`
    - `достоверный`
After trying to come up with a single text replacement strategy to make this work, the best I could do was this:
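The transformation itself can be illustrated with a short Python sketch. This is not the one-liner the post arrives at; it only shows the split-and-reformat step, and the function name to_markdown_list is an assumption made for this example.

```python
def to_markdown_list(line: str) -> str:
    """Turn a comma-separated string into a Markdown bullet list of code spans."""
    # Split on commas, trim surrounding whitespace, and skip empty pieces.
    items = [item.strip() for item in line.split(",")]
    return "\n".join(f"- `{item}`" for item in items if item)


if __name__ == "__main__":
    print(to_markdown_list("известный, точный, определённый, достоверный"))
```

Splitting first and formatting each item afterwards avoids having to express the whole conversion as a single text replacement.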

Normalizing spelling in Russian words containing the letter ё

The Russian letters ё and е have a complex and troubled relationship. The two letters are pronounced differently, but usually appear the same in written text. This presents complications for Russian learners and for text-to-speech systems. In several recent projects, I have needed to normalize the spelling of Russian words. For example, if I have the written word определенно, is the word actually определенно? Or is it определённо?

Scraping Russian word definitions from Wiktionary: utility for Anki

While my Russian Anki deck contains around 27,000 cards, I’m always making more. (There are a lot of words in the Russian language!) Over the years, I’ve become more and more efficient with card production, but one of the missing pieces was finding a code-readable source of word definitions. There’s no shortage of dictionary sites, but scraping data from any site is complicated by the ways in which front-end developers spread the semantic content across multiple HTML tags arranged in deep and cryptic hierarchies.

Encoding of the Cyrillic letter й - a UTF-8 gotcha

In the process of writing and maintaining a service that checks Russian word frequencies, I noticed a peculiar phenomenon: certain words could not be located in a SQLite database that I knew actually contained them. For example, a query for the word английский consistently failed, whereas other words would succeed. Eventually the commonality between the failures became obvious. All of the failures contained the letter й, which led me down a rabbit hole of character encoding and this specific case where it can go astray.
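One common way a lookup like this can fail is a mismatch between Unicode normalization forms: й can be stored as the single code point U+0439 or as и (U+0438) followed by a combining breve (U+0306). The sketch below assumes that is the cause here, which the excerpt does not confirm, and the helper name nfc is made up for illustration.

```python
import unicodedata

COMPOSED = "\u0439"          # й as one precomposed code point
DECOMPOSED = "\u0438\u0306"  # и followed by a combining breve


def nfc(text: str) -> str:
    """Normalize text to NFC so that both spellings of й compare equal."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # The two forms look identical on screen but differ as strings and as bytes.
    print(COMPOSED == DECOMPOSED)
    print(COMPOSED.encode("utf-8"), DECOMPOSED.encode("utf-8"))
    # After normalization they match.
    print(nfc(COMPOSED) == nfc(DECOMPOSED))
```

Under that assumption, normalizing both the stored words and the query string to the same form before comparing would make the two spellings match.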

Dynamic DNS - auto-updating from macOS

To run a little project (that I’ll describe at some point in the future) I have to run a small web server from my home computer, one that happens to run macOS. More than anything else, this is just a record of what I did to get it running, in case (a) I have to do it again, or (b) someone else finds it useful. Sign up for dynamic DNS service: I signed up for service with dynv6 because I saw it recommended elsewhere and it didn’t look creepy like some of the other options.

Fixing CodeRunner jQuery injection

CodeRunner is one of my favourite development environments on macOS. I use it for small one-off projects or for testing concepts for integration into larger projects. But in version 4.0.3, jQuery injection in a custom HTML page is broken, giving the error: It’s probably due to some unescaped bit of code in their minified jQuery, but I didn’t have time to work that hard. Instead I reported the error to the developer and fixed it myself.

Parsing Russian Wiktionary content using XPath

As readers of this blog know, I’m an avid user of Anki to learn Russian. I have a number of sources for reference content that go onto my Anki cards. Notably, I use Wiktionary to get word definitions and the word with the proper syllabic stress marked. (This is an aid to pronunciation for Russian language learners.) Since I’m lazy to the core, I came up with a systematic way of grabbing the stress-marked word from the Wiktionary page using lxml and XPath.
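The general shape of such an extraction can be sketched as follows. To stay dependency-free, this example uses the limited XPath subset in Python's standard xml.etree.ElementTree rather than lxml, and it runs against a made-up, simplified fragment: the markup, the headword class name, and the function name extract_headword are all assumptions, not the real Wiktionary page structure or the post's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real Wiktionary pages are structured differently.
SAMPLE = '<div><p><b class="headword">покупа\u0301ть</b> - to buy</p></div>'


def extract_headword(markup: str):
    """Return the text of the first <b class="headword"> element, or None."""
    root = ET.fromstring(markup)
    # ElementTree supports a small XPath subset, including attribute predicates.
    node = root.find(".//b[@class='headword']")
    return node.text if node is not None else None


if __name__ == "__main__":
    print(extract_headword(SAMPLE))
```

With lxml, a parsed tree would instead be queried through its xpath() method, which supports full XPath 1.0 expressions, but the idea of selecting one element by a path expression is the same.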