programming

Extracting title title of a web page from the command line

I was using a REST API at https://textance.herokuapp.com/title but it seems awfully fragile. Sure enough this morning, the entire application is down. It’s also not open-source and I have no idea who actually runs this thing. Here’s the solution: #!/bin/bash url=$(pbpaste) curl $url -so - | pup 'meta[property=og:title] attr{content}' It does require pup. On macOS, you can install via brew install pup. There are other ways using regular expressions but no dependency on pup but parsing HTML with regex is not such a good idea.

Three-line (though non-standard) interlinear glossing

Still thinking about interlinear glossing for my language learning project. The leizig.js library is great but my use case isn’t really what the author had in mind. I really just need to display a unit consisting of the word as it appears in the text, the lemma for that word form, and (possibly) the part of speech. For academic linguistics purposes, what I have in mind is completely non-standard. The other issue with leizig.

Splitting text into sentences: Russian edition

Splitting text into sentences is one of those tasks that looks simple but on closer inspection is more difficult than you think. A common approach is to use regular expressions to divide up the text on punction marks. But without adding layers of complexity, that method fails on some sentences. This is a method using spaCy.

Bash variable scope and pipelines

I alluded to this nuance involving variable scope in my post on automating pdf processing, but I wanted to expand on it a bit. Consider this little snippet: i=0 printf "foo:bar:baz:quux" | grep -o '[^:]+' | while read -r line ; do printf "Inner scope: %d - %s\n" $i $line ((i++)) [ $i -eq 3 ] && break; done printf "====\nOuter scope\ni = %d\n" $i; If you run this script - not in interactive mode in the shell - but as a script, what will i be in the outer scope?

Automating the handling of bank and financial statements

In my perpetual effort to get out of work, I’ve developed a suite of automation tools to help file statements that I download from banks, credit cards and others. While my setup described here is tuned to my specific needs, any of the ideas should be adaptable for your particular circumstances. For the purposes of this post, I’m going to assume you already have Hazel. None of what follows will be of much use to you without it.

Bulk rename tags in DEVONthink 3

In DEVONthink, I tag a lot. It’s an integral part of my strategy for finding things in my paperless environment. As I wrote about previously hierarchical tags are a big part of my organizational system in DEVONthink. For many years, I tagged subject matter with tags that emmanate from a single tag named topic_, but it was really an unnecessary top-level complication. So, the first item on my to-do list was to get rid of the all tags with a topic_ first level.