programming

A macOS text service for morphological analysis and in situ marking of Russian syllabic stress

Building on my earlier explorations of the UDAR project, I’ve created a macOS Service-like method for in-situ marking of syllabic stress in arbitrary Russian text. The following video shows it in action: The Keyboard Maestro macro is simple; we execute the following script, bracketed by Copy and Paste: #!/Users/alan/.pyenv/shims/python3 import xerox import udar import re rawText = xerox.paste() doc1 = udar.Document(rawText, disambiguate=True) searchText = doc1.stressed() result = re.sub(r'( ,)', ",", searchText) xerox.

Beginning to experiment with Stanza for natural language processing

After installing Stanza as a dependency of UDAR, which I recently described, I decided to play around with what it can do. Installation The installation is straightforward and is documented on the Stanza getting started page. First, sudo pip3 install stanza Then install a model. For this example, I installed the Russian model: #!/usr/local/bin/python3 import stanza stanza.download('ru') Usage Part-of-speech (POS) and morphological analysis Here’s a quick example of POS analysis for Russian.

Automated marking of Russian syllabic stress

One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinant of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no texts intended for native speakers do. The placement of stress is inferred from memory and context. I was delighted to discover Dr. Robert Reynolds’ work on natural language processing of Russian text to mark stress based on grammatical analysis of the text.

sed matching whitespace on macOS

sed is such a useful pattern-matching and substitution tool for work on the command line. But there’s a little quirk on macOS that will trip you up. It tripped me up. On most platforms, \s is the character class for whitespace. It’s ubiquitous in regexes. But on macOS, it doesn’t work. In fact, it silently fails. Consider this bash one-liner which looks like it should work but doesn’t: # should print I am corrupt (W.

Partitioning a large directory into subdirectories by size

Since I’m not fond of carrying around all my photos on a cell phone where they’re perpetually at risk of loss, I periodically dump the image and video files to a drive on my desktop for later burning to optical disc. Saving these images in archival form is a hedge against the bet that my existing backup system won’t fail someday. I’m using Blu-ray optical discs to archive these image and video files; each stores 25 GB of data.
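One plausible way to split such a directory into disc-sized groups is a first-fit-decreasing pass over the file sizes. The sketch below is an illustration only, not the script from the post: the function names `list_files` and `partition_by_size` are hypothetical, and the capacity constant assumes a 25 GB disc counted in decimal bytes (usable space on a real disc is somewhat lower).

```python
import os
from typing import Iterable, List, Tuple

# Assumption: 25 GB counted as decimal bytes; leave headroom in practice.
DISC_CAPACITY = 25 * 10**9


def list_files(directory: str) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for each regular file directly in directory."""
    with os.scandir(directory) as entries:
        return [(e.path, e.stat().st_size) for e in entries if e.is_file()]


def partition_by_size(
    files: Iterable[Tuple[str, int]], capacity: int = DISC_CAPACITY
) -> List[List[str]]:
    """Group files so that each group's total size fits within capacity.

    Uses first-fit decreasing: the largest files are placed first, each into
    the first group that still has room. This is a heuristic, so the number
    of groups is not guaranteed to be minimal.
    """
    groups = []  # each entry is [remaining_bytes, [paths]]
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{path} is larger than a single disc")
        for group in groups:
            if group[0] >= size:
                group[0] -= size
                group[1].append(path)
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([capacity - size, [path]])
    return [paths for _, paths in groups]
```

Calling `partition_by_size(list_files("/path/to/dump"))` returns one list of paths per disc; moving each list into its own subdirectory is then a matter of `shutil.move`.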

A folder-based image gallery for Hugo

Hugo is the platform I use to publish this weblog. Occasionally I have the need to include a collection of images in a post. Mostly this comes up on other sites that I publish. Fancybox can do this; but it wasn’t immediately clear how to direct Fancybox to create a gallery of images in a page based on images in a directory. Previously, I’ve solved this in different ways, but I was anxious to find a simple shortcode-based method.

An alternative method for keyboard input switching on macOS

macOS offers a variety of virtual keyboard layouts which are accessible through System Preferences > Keyboard > Input Sources. Because I spend about half of my time writing in Russian and half in English, rapid switching between keyboard layouts is important. Optionally, in the Input Sources preference pane, you can choose to use the Caps Lock key to toggle between sources. This almost always works well, with the exception of Anki.