Extracting the title of a web page from the command line

I was using a REST API at https://textance.herokuapp.com/title but it seems awfully fragile. Sure enough this morning, the entire application is down. It’s also not open-source and I have no idea who actually runs this thing.

Here’s the solution:

#!/bin/bash

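# Read the URL from the macOS clipboard
url=$(pbpaste)
# Fetch the page quietly and print the content of its og:title meta tag
curl -s "$url" | pup 'meta[property=og:title] attr{content}'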

It does require pup. On macOS, you can install it with brew install pup.

There are other ways to do this with regular expressions and no dependency on pup, but parsing HTML with regex is not such a good idea.
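For anyone who wants to avoid both pup and regex, one option is a real HTML parser from a standard library. The following is only a minimal sketch, assuming Python 3 is available. The names OgTitleParser and extract_og_title are made up for illustration and are not part of the script above. It takes HTML that has already been fetched (for example by curl) and returns the og:title content, if any.

```python
from html.parser import HTMLParser
from typing import Optional


class OgTitleParser(HTMLParser):
    """Remembers the content of the first <meta property="og:title"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.og_title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once a title has been found.
        if tag != "meta" or self.og_title is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:title":
            self.og_title = attr_map.get("content")


def extract_og_title(html: str) -> Optional[str]:
    """Return the og:title of an HTML document, or None if it has none."""
    parser = OgTitleParser()
    parser.feed(html)
    return parser.og_title


if __name__ == "__main__":
    sample = '<html><head><meta property="og:title" content="Hello"></head></html>'
    print(extract_og_title(sample))
```

Pages without an og:title tag simply yield None here, so a caller could fall back to the regular title element in that case.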

Monday, May 23, 2022

We are all confident idiots - David Dunning (of the Dunning-Kruger effect) in this 2014 article discusses the effect and gives some interesting narrative commentary.

An ignorant mind is precisely not a spotless, empty vessel, but one that’s filled with the clutter of irrelevant or misleading life experiences, theories, facts, intuitions, strategies, algorithms, heuristics, metaphors, and hunches that regrettably have the look and feel of useful and accurate knowledge.

The effect has taken a lot of hits recently over the statistical underpinning in the original paper, but the overconfidence effect is demonstrably alive and well.


An interesting article on frisson, which is a complex of physical and emotional phenomena that occur on encountering some aesthetic stimulus. In the case of the article, that stimulus is music. I get this. Certain types of passages reliably create this sense of ecstasy and longing and, physically, goosebumps.


A list of PKM systems. That’s “personal knowledge management” systems. I use DEVONthink for storing, linking and synthesizing notes, but I have a weakness for investigating other systems.


Improving the efficiency of Hugo static site deployment to S3. I’m really proud of this solution to a vexing problem. My upload times for this site are now around 45 seconds, compared to 10 minutes previously.

Friday, May 20, 2022

“Enlightenment is the absolute cooperation with the inevitable.” - Anthony De Mello. Although he writes like a Buddhist, apparently he’s a Jesuit.


Three-line (though non-standard) interlinear glossing

Still thinking about interlinear glossing for my language learning project. The leipzig.js library is great but my use case isn’t really what the author had in mind. I really just need to display a unit consisting of the word as it appears in the text, the lemma for that word form, and (possibly) the part of speech. For academic linguistics purposes, what I have in mind is completely non-standard.

The other issue with leipzig.js for my use case is that I need to be able to respond to click events on individual words so that they can be tagged, defined or otherwise worked with. It’s straightforward how I could apply CSS id attributes to word-level elements to support that functionality.
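To make the three-line unit concrete, here is a rough sketch of how it might be modeled and rendered, independent of any glossing library. Everything in it is an assumption made for illustration: the GlossUnit class, the render_units function, the class names, and the unit-N id scheme. The idea is only that each word-level element gets its own id so that a click handler could find it later.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class GlossUnit:
    """One display unit: surface word, lemma, and optional part of speech."""
    word: str
    lemma: str
    pos: Optional[str] = None


def render_units(units: List[GlossUnit]) -> str:
    """Render each unit as a <div> with a unique id and one <span> per line."""
    blocks = []
    for index, unit in enumerate(units):
        spans = [
            f'<span class="word">{escape(unit.word)}</span>',
            f'<span class="lemma">{escape(unit.lemma)}</span>',
        ]
        # The part-of-speech line is optional, so emit it only when present.
        if unit.pos:
            spans.append(f'<span class="pos">{escape(unit.pos)}</span>')
        blocks.append(
            f'<div class="gloss-unit" id="unit-{index}">' + "".join(spans) + "</div>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [GlossUnit("книги", "книга", "NOUN"), GlossUnit("и", "и")]
    print(render_units(sample))
```

Stacking the spans vertically within each unit and laying the units out side by side would be left to a stylesheet.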

Splitting text into sentences: Russian edition

Splitting text into sentences is one of those tasks that looks simple but on closer inspection is more difficult than you think. A common approach is to use regular expressions to divide up the text on punctuation marks. But without adding layers of complexity, that method fails on some sentences. This is a method using spaCy.
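To illustrate the failure mode, here is a small sketch of the naive regex approach. It is not the spaCy method itself. The function name naive_split and the sample text are made up for this example. The abbreviation г. (for "year") ends in a period, so a rule that splits after every period breaks the first sentence in two.

```python
import re
from typing import List


def naive_split(text: str) -> List[str]:
    """Split on whitespace that follows a period, question mark, or exclamation mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


if __name__ == "__main__":
    # Two sentences; the first contains the abbreviation "г." mid-sentence.
    text = "Он родился в 1799 г. в Москве. Это известно всем."
    for piece in naive_split(text):
        print(piece)
```

The sample has two sentences, but the naive rule produces three pieces because it cuts after "г.". Patching this with a list of abbreviations is exactly the kind of added complexity mentioned above.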