regex

Regex to match a cloze

Anki and some other platforms use a particular format to mark cloze deletions in flashcard text. A cloze looks like either of the following: {{c1::dog::}} {{c2::dog::domestic canine}} Here’s a regular expression that matches cloze deletions in an arbitrary string and captures only the main clozed word (in this case, dog): {{c\d::(.*?)(::[^:]+)?}} Here it is in action in a Python script:
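A minimal sketch of such a script, using Python's built-in `re` module. The function names `cloze_words` and `strip_clozes` and the sample sentence are illustrative assumptions, not taken from the original post; the braces are escaped here only to make the pattern unambiguous.

```python
import re

# The pattern from the text, with the literal braces escaped.
# Group 1 is the clozed word; the optional group 2 is the "::hint" part.
CLOZE = re.compile(r"\{\{c\d::(.*?)(::[^:]+)?\}\}")


def cloze_words(text: str) -> list[str]:
    """Return the main clozed word of every cloze deletion in text."""
    return [match.group(1) for match in CLOZE.finditer(text)]


def strip_clozes(text: str) -> str:
    """Replace each cloze deletion with its main clozed word."""
    return CLOZE.sub(r"\1", text)


if __name__ == "__main__":
    sample = "The {{c1::dog}} is a {{c2::mammal::class of animal}}."
    print(cloze_words(sample))
    print(strip_clozes(sample))
```

Two properties of the pattern as written are worth keeping in mind. `\d` matches a single digit, so a cloze numbered c10 or higher would need `\d+`. And because `[^:]+` requires a non-empty hint, a cloze with an empty hint such as {{c1::dog::}} is captured as `dog::` rather than `dog`.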

Sunday, September 16, 2018

Regex 101 is a great online regex tester. Speaking of regular expressions, for the past year I’ve used an automated process for building Anki flashcards. One of the steps in the process is to download Russian word pronunciations from Wiktionary. When Wiktionary began publishing transcoded mp3 files rather than just ogg files, it broke the URL scheme that I relied on to download content. The new regex for this scheme is: (?

Detecting Russian letters with regex

How to identify Russian letters in a string? The short answer is: [А-Яа-яЁё] but depending on your regex flavor, [\p{Cyrillic}] might work. What in the world does this regex mean? It’s just like [A-Za-z] with a twist. The Ёё at the end adds support for ё (“yo”), which is a Cyrillic letter but sits outside the contiguous А–я range in Unicode (Ё is U+0401 and ё is U+0451, while А–я spans U+0410 to U+044F). See this question on Stack Overflow.
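A short sketch of the character class in Python. The helper names `contains_russian` and `russian_letters` are hypothetical. Python's built-in `re` module does not support `\p{Cyrillic}`, so this sketch uses only the explicit range.

```python
import re

# Basic Russian alphabet: the contiguous range А-я plus Ё/ё,
# which are encoded outside that range.
RUSSIAN_LETTER = re.compile(r"[А-Яа-яЁё]")

# The same class without Ёё, to show what the extra letters add.
WITHOUT_YO = re.compile(r"[А-Яа-я]")


def contains_russian(text: str) -> bool:
    """True if text contains at least one Russian letter."""
    return RUSSIAN_LETTER.search(text) is not None


def russian_letters(text: str) -> list[str]:
    """Return every Russian letter in text, in order."""
    return RUSSIAN_LETTER.findall(text)


if __name__ == "__main__":
    print(contains_russian("hello"))       # Latin only
    print(contains_russian("привет"))      # Russian word
    print(russian_letters("abc ёж"))       # mixed Latin and Russian
    print(WITHOUT_YO.search("ё") is None)  # ё is missed without Ёё
```

Keep in mind that `\p{Cyrillic}`, where a regex flavor supports it, covers the whole Cyrillic script and so matches letters from other languages as well.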