Posts

I was using a REST API at https://textance.herokuapp.com/title to fetch page titles, but it seemed awfully fragile. Sure enough, this morning the entire application is down. It’s also not open-source, and I have no idea who actually runs this thing.

Here’s the solution:

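#!/bin/bash

# Read the URL from the macOS clipboard
url=$(pbpaste)
# Fetch the page silently and print the content of its og:title meta tag
curl -s "$url" | pup 'meta[property=og:title] attr{content}'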

It does require pup. On macOS, you can install it with brew install pup.

There are other ways that use regular expressions and avoid the dependency on pup, but parsing HTML with regex is not such a good idea.
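For comparison, here is a minimal sketch of the regex route using only grep and sed. The function name extract_og_title is made up for illustration. It assumes the meta tag sits on a single line and uses double-quoted attributes, which is exactly the kind of assumption that makes this approach brittle.

```shell
#!/bin/bash

# Hypothetical helper (not from the original script): read HTML on stdin and
# print the content attribute of the first og:title meta tag.
# Assumes the tag is on one line and its attributes are double-quoted.
extract_og_title() {
  grep -o '<meta[^>]*property="og:title"[^>]*>' \
    | sed -n 's/.*content="\([^"]*\)".*/\1/p' \
    | head -n 1
}
```

It could be used as curl -s "$url" | extract_og_title. A tag split across lines, or one that uses single quotes, would silently produce no output, which is why the pup version is the safer choice.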

Interesting links and a few thoughts for Monday, May 23, 2022.

“Enlightenment is the absolute cooperation with the inevitable.” - Anthony De Mello. Although he writes like a Buddhist, apparently he’s a Jesuit.

Making the Hugo → S3 upload process much more efficient by tracking file hashes.
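The general idea of hash tracking can be sketched as follows. This is not the script from the post, only an illustration under assumptions: the function names, the manifest file, and the choice of SHA-256 are all hypothetical. It also assumes the manifest lives outside the site directory.

```shell
#!/bin/bash

# Hypothetical sketch: print files whose SHA-256 hash is new or changed
# since the previous run, then update the manifest.

# Use sha256sum where available (most Linux systems), else shasum (macOS).
hash_cmd() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$@"
  else
    shasum -a 256 "$@"
  fi
}

# changed_files <site_dir> <manifest_file>
changed_files() {
  local dir="$1" manifest="$2" current
  current=$(mktemp)
  # Create an empty manifest on the first run.
  touch "$manifest"
  # One "<hash>  <path>" line per file, sorted so comm can compare them.
  (cd "$dir" && find . -type f | while IFS= read -r f; do
    hash_cmd "$f"
  done | sort) > "$current"
  # Lines present only in the new listing are new or modified files;
  # strip the hash and the two-space separator to leave the path.
  comm -13 "$manifest" "$current" | sed 's/^[^ ]*  //'
  mv "$current" "$manifest"
}
```

On the first run every file is reported. On later runs only files whose content changed are listed, and those paths can be handed to whatever upload command is in use.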

Dealing with PUNCT nodes in interlinear glossing.

Still thinking about interlinear glossing for my language learning project. The leipzig.js library is great, but my use case isn’t really what the author had in mind. I really just need to display a unit consisting of the word as it appears in the text, the lemma for that word form, and (possibly) the part of speech. For academic linguistics purposes, what I have in mind is completely non-standard.

The other issue with leipzig.js for my use case is that I need to be able to respond to click events on individual words so that they can be tagged, defined, or otherwise worked with. It’s straightforward how I could apply CSS id attributes to word-level elements to support that functionality.

leipzig.js is a library for applying interlinear gloss to texts for linguistic analysis. In this post, I experiment a little with this library to evaluate whether it would work for a little project of mine.

Starting a new devlog about Hedghog, a new language learning app and some thoughts about the interlinear display of lemmas.

Splitting text into sentences is one of those tasks that looks simple but, on closer inspection, is more difficult than you think. A common approach is to use regular expressions to divide up the text on punctuation marks. But without adding layers of complexity, that method fails on some sentences. This is a method using spaCy.

More outages at Pinboard. It was time to find an open-source self-hosted alternative. Espial, it turns out, works great.

Posts

Extracting the title of a web page from the command line

Monday, May 23, 2022

Friday, May 20, 2022

Hugo static site upload woes and a way forward

Interlinear glossing dealing with punctuation

Three-line (though non-standard) interlinear glossing

Experimenting with leipzig.js for interlinear gloss

Hedghog and interlinear lemmas

Splitting text into sentences: Russian edition

What's up with Pinboard? And an alternative