Programming

The AppleScript Safari API is apparently quite finicky and rejects Russian Cyrillic characters when loading URLs.

For example, the following URL https://en.wiktionary.org/wiki/стоять#Russian throws an error in AppleScript. Instead, Safari requires URL’s of the form https://en.wiktionary.org/wiki/%D1%81%D1%82%D0%BE%D1%8F%D1%82%D1%8C#Russian whereas Chrome happily consumes whatever comes along. So, we just need to encode the URL thusly:

use framework "Foundation"

-- encode Cyrillic test as "%D0" type strings
on urlEncode(input)
   tell current application's NSString to set rawUrl to stringWithString_(input)
   -- 4 is NSUTF8StringEncoding
   set theEncodedURL to rawUrl's stringByAddingPercentEscapesUsingEncoding:4 
   return theEncodedURL as Unicode text
end urlEncode

When researching Russian words for vocabulary study, I use the URL encoding handler to load the appropriate words into several reference sites in sequential Safari tabs.

While largely opaque to most users, Facebook and Google massage any links that you acquire on their sites to include data used to track you around the web. This script attempts to strip these surveillance parameters from the URL’s. It is by no means all-inclusive. Imaginably, there are links that I haven’t yet encountered and that need to be considered in a future version. So consider this a proof-of-concept.

The problem

For example, I performed a Google search¹ for “Smarties”. Inspecting the first link - to Wikipedia, I see:

As part of a Hazel rule to process downloaded mp3 files, I worked out a couple different methods for extracting the ID3 title tag. Not rocket science, but it took a little time to sort out. Both rely on non-standard third-party tools, both for parsing the text and for extracting the ID3 tags.

Extracting ID3 title with `ffprobe`

ffprobe is part of the ffmpeg suite of tools which on macOS can be installed with Homebrew. If you don’t have the latter, go install it now; because it opens up so many tools for your use. In this case, it makes ffmpeg available via brew install ffmpeg.

Having fallen in love with Keyboard Maestro for its flexibility in macOS automation, I began experimenting with scripting in various languages, like my old favourite Perl. That’s when the fun began. How do we access KM variables inside a Perl script.

Let’s see what the documentation says:

So the documentation clearly states that this script

#!/usr/bin/perl

print scalar reverse $KMVAR_MyVar;

should work if I have a KM variable named MyVar. But, you guessed it - it does not.

Justification

Although caching can make page loads notably faster, it comes with a cost. Browsers aren’t always capable of taking note when a cached resource has changed. I’ve noticed recently that Safari utterly refuses to reload .css files even after emptying the browser cache and clearing the web history.

Background

With a lot of help from the a pair of articles written by Ukiah Smith, I’ve developed a workflow for dealing with this problem during the deployment process. He describes two approaches to the problem of static asset caching, one an improvement on the other. I’ve implemented something like what he describes using the git file hash to modify the filename of the css files. When the client browser sees a new filename, it always reloads the resource. So the problem is to figure out how to only change the filename when the contents have changed. Let’s say you tweak a css parameter and want to ensure that client browsers load the correct version. We can use the git file hash, and append it on the filename. Then the only remaining problem is to make sure that the page head template knows how to find the correct version to bake into the pages. Here, our approach is the same as Smith’s.

Building on my earlier explorations of the UDAR project, I’ve created a macOS Service-like method for in-situ marking of syllabic stress in arbitrary Russian text. The following video shows it in action:

The Keyboard Maestro is simple; we execute the following script, bracketed by Copy and Paste:

After installing Stanza as dependency of UDAR which I recently described, I decided to play around with what is can do.

Installation

The installation is straightforward and is documented on the Stanza getting started page.

First,

sudo pip3 install stanza

Then install a model. For this example, I installed the Russian model:

#!/usr/local/bin/python3
import stanza
stanza.download('ru')

Usage

Part-of-speech (POS) and morphological analysis

Here’s a quick example of POS analysis for Russian. I used PrettyTable to clean up the presentation, but it’s not strictly-speaking necessary.

One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinate of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no tests intended for native speakers do. The placement of stress is inferred from memory and context.

I was delighted to discover Dr. Robert Reynolds’ work on natural language processing of Russian text to mark stress based on grammatical analysis of the text. What follows is a brief description of the installation and use of this work. The project page on Github has installation instructions; but I found a number of items that needed to be addressed that were not covered there. For example, this project (UDAR) depends on Stanza; which in turn requires a language-specific (Russian) model.

sed is such a useful pattern-matching and substitution tool for work on the command line. But there’s a little quirk on macOS that will trip you up. It tripped me up. On most platforms, \s is the character class for whitespace. It’s ubiquitous in regexes. But on macOS, it doesn’t work. In fact, it silently fails.

Consider this bash one-liner which looks like it should work but doesn’t:

# should print I am corrupt (W.Barr)
# instead it prints I am corrupt by W.Barr
echo "I am corrupt by W.Barr" | sed -E 's|^(.+)\sby\s(.+)|\1 (\2)|g'

What does work is the character class [:space:]:

Since I’m not fond of carrying around all my photos on a cell phone where they’re perpetually at list of loss, I peridiocally dump the image and video files to a drive on my desktop for later burning to optical disc.¹ Saving these images in archival form is a hedge against the bet that my existing backup system won’t fail someday.

I’m using Blue-Ray optical discs to archive these image and video files; and each stores 25 GB of data. So my plan was to split the iPhone image dump into 24 GB partitions. H

Programming

URL-encoding URLs in AppleScript

Stripping surveillance parameters from Facebook and Google links

The problem

Extracting ID3 tags from the command line - two methods

Extracting ID3 title with `ffprobe`

Using variables in Keyboard Maestro scripts

Hugo cache busting

Justification

Background

A macOS text service for morphological analysis and in situ marking of Russian syllabic stress

Beginning to experiement with Stanza for natural language processing

Installation

Usage

Part-of-speech (POS) and morphological analysis

Automated marking of Russian syllabic stress

sed matching whitespace on macOS

Partitioning a large directory into subdirectories by size

The problem

Extracting ID3 title with ffprobe

Justification

Background

Installation

Usage

Part-of-speech (POS) and morphological analysis

Extracting ID3 title with `ffprobe`