Three-line (though non-standard) interlinear glossing

Still thinking about interlinear glossing for my language learning project. The leipzig.js library is great, but my use case isn’t really what the author had in mind. I really just need to display a unit consisting of the word as it appears in the text, the lemma for that word form, and (possibly) the part of speech. For academic linguistics purposes, what I have in mind is completely non-standard.

The other issue with leipzig.js for my use case is that I need to be able to respond to click events on individual words so that they can be tagged, defined or otherwise worked with. It’s not straightforward how I could apply id attributes to word-level elements to support that functionality.

So I’m back to a CSS-only solution.

Here’s what a three-line CSS-only interlinear glossing display might look like:

You can find the code, in progress as always, in a JSFiddle.
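For reference, generating the markup for such a unit could look roughly like the sketch below. It is not the code from the JSFiddle; the token shape (`form`, `lemma`, `pos`), the class names and the `word-N` id scheme are all assumptions made for illustration. Each unit gets its own id so that a single delegated click handler can work out which word was clicked.

```javascript
// Sketch only: the token shape { form, lemma, pos }, the class names and the
// "word-N" id scheme are assumptions, not taken from the JSFiddle.

// Escape text so it can be placed safely inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one gloss unit: word form, lemma and (optionally) part of speech.
function renderGlossUnit(token, index) {
  const lines = [
    `<span class="gloss-form">${escapeHtml(token.form)}</span>`,
    `<span class="gloss-lemma">${escapeHtml(token.lemma)}</span>`,
  ];
  if (token.pos) {
    lines.push(`<span class="gloss-pos">${escapeHtml(token.pos)}</span>`);
  }
  return `<span class="gloss-unit" id="word-${index}">${lines.join("")}</span>`;
}

// Render a whole sentence as a sequence of gloss units, one per line.
function renderGloss(tokens) {
  return tokens.map(renderGlossUnit).join("\n");
}

// In a browser, one delegated listener handles clicks on every word.
if (typeof document !== "undefined") {
  document.addEventListener("click", (event) => {
    const unit = event.target.closest(".gloss-unit");
    if (unit) {
      console.log("clicked", unit.id); // tag, define or otherwise work with it
    }
  });
}
```

Stacking the three lines is then left to CSS, for instance `display: inline-flex; flex-direction: column` on `.gloss-unit`.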

One of my priorities is going to be dealing with punctuation. It looks messy and unrefined right now. First, the punctuation marks need to be glommed onto the previous word rather than standing alone. Second, there’s no need to display either a lemma or a POS for punctuation marks. It’s going to need either JavaScript running on the page to dynamically deal with the UI, or something on the backend. Most likely the former.
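A minimal sketch of the JavaScript route might look like this. The token shape (`form`, `lemma`, `pos`) and the `PUNCT` tag are assumptions for illustration; the function attaches each punctuation mark to the word before it, so the mark never gets a lemma or POS line of its own.

```javascript
// Sketch only: the token shape { form, lemma, pos } and the "PUNCT" tag
// are assumptions; the tagger's actual output may differ.

// True when the token is tagged as punctuation or consists solely of
// Unicode punctuation characters.
function isPunctuation(token) {
  return token.pos === "PUNCT" || /^\p{P}+$/u.test(token.form);
}

// Attach each punctuation mark to the preceding word so that it never
// appears as a separate gloss unit with its own lemma and POS.
function mergePunctuation(tokens) {
  const merged = [];
  for (const token of tokens) {
    const previous = merged[merged.length - 1];
    if (isPunctuation(token) && previous) {
      previous.form += token.form;
    } else if (isPunctuation(token)) {
      // Leading punctuation with no word before it: keep it, without gloss lines.
      merged.push({ form: token.form, lemma: "", pos: "" });
    } else {
      merged.push({ ...token }); // copy so the input array is left untouched
    }
  }
  return merged;
}
```

Opening quotation marks and dashes would probably need different treatment, since gluing them onto the previous word looks wrong; this sketch doesn't attempt that.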

Experimenting with leipzig.js for interlinear gloss

One of the key features of my language learning app Hedghog is the display of source text with interlinear gloss. This is of huge benefit in understanding highly-inflected languages. Right now I’m playing around with different ways of achieving this sort of display. I stumbled on leipzig.js which is a library for formatting interlinear gloss according to the Leipzig Rules.

I like what I see, but my first inclination is to get under the hood and fix some of the CSS. For example, the original text is displayed in italic. This is fine, and it may be the convention in linguistics circles, but some Russian letters are a little confusing to Russian learners when displayed in oblique type. It’s not difficult to fix.

Here’s what it looks like:

I just needed to apply some of my own CSS to achieve the desired appearance - Leipzig Rules or not.

.gloss__line--0 {
    font-family: "Georgia";
    font-size: 20px;
}

.gloss__line--1 {
    color: gray;
}

.gloss__word .gloss__line:first-child {
    font-style: normal !important;
}

And the minimal example in Russian:

<html>

  <head>
    <link rel="stylesheet" href="//cdn.jsdelivr.net/npm/leipzig@latest/dist/leipzig.min.css">
  </head>

  <body>
    <div data-gloss>
      <p>Дональд Трамп - нелепый болван, который был избран президентом.</p>
      <p>дональд трамп - нелепый болван который был избрать президент.</p>
      <p>‘Donald Trump is a ridiculous moron who was elected president.’</p>
    </div>
    <script src="//cdn.jsdelivr.net/npm/leipzig@latest/dist/leipzig.min.js"></script>
    <script>
      document.addEventListener('DOMContentLoaded', function() {
        var glosser = Leipzig();
        glosser.gloss();
      });
    </script>
  </body>
</html>

This minimal example as a JSFiddle

More on interlinear gloss

Splitting text into sentences: Russian edition

Splitting text into sentences is one of those tasks that looks simple but on closer inspection is more difficult than you think. A common approach is to use regular expressions to divide up the text on punctuation marks. But without adding layers of complexity, that method fails on some sentences. This is a method using spaCy.
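To make the failure mode concrete, here is a small sketch of the naive regex approach. It does not reproduce the spaCy method; the sample text and the splitting pattern are illustrative assumptions. The abbreviation "г." (for "год", year) ends in a period, so the splitter cuts the first sentence in two.

```javascript
// Naive splitter: break after ".", "!" or "?" when followed by whitespace.
// The pattern is an illustrative assumption, not taken from the post.
function naiveSplitSentences(text) {
  return text.split(/(?<=[.!?])\s+/u).filter((s) => s.length > 0);
}

// Two sentences; the first contains the abbreviation "г.".
const sample = "Пушкин родился в 1799 г. в Москве. Он умер в 1837 году.";

const pieces = naiveSplitSentences(sample);
console.log(pieces.length); // 3, although there are only 2 sentences
console.log(pieces);
```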

My favourite Cyrillic font

I’ve tried a lot of fonts for Cyrillic. My favourite is Georgia.

As a non-native Russian speaker, I find that serif fonts, either on-screen or in print, make the text so much more legible.

The cancellation of Russian music

Free speech in Russia has never been particularly favoured. The Romanov dynasty remained in power long past their expiration date by suppressing waves of free thought, from the ideals of the Enlightenment to the anti-capitalist ideals of Marx and Engels. At least, until the 1917 Revolution. And even then, the Bolsheviks continued to suppress dissent for the entire seventy-something year history of the Soviet Union. Perestroika and the collapse of the Soviet Union promised change. But the change was fleeting.

Bash variable scope and pipelines

I alluded to this nuance involving variable scope in my post on automating PDF processing, but I wanted to expand on it a bit.

Consider this little snippet:

i=0
printf "foo:bar:baz:quux" | grep -o '[^:]\+' | while read -r line ; do
   printf "Inner scope: %d - %s\n" "$i" "$line"
   ((i++))
   [ $i -eq 3 ] && break;
done
printf "====\nOuter scope\ni = %d\n" $i;

If you run this as a script rather than interactively in the shell, what will i be in the outer scope? And why?

Automating the handling of bank and financial statements

In my perpetual effort to get out of work, I’ve developed a suite of automation tools to help file statements that I download from banks, credit cards and others. While my setup described here is tuned to my specific needs, any of the ideas should be adaptable for your particular circumstances. For the purposes of this post, I’m going to assume you already have Hazel. None of what follows will be of much use to you without it. I’ll also emphasize that this is a macOS-specific post. Bear in mind, too, that companies have the nasty habit of tweaking their statement formats. That fact alone makes any approach like this fragile; so be aware that maintaining these rules is just part of the game. With that out of the way, let’s dive in.

Bulk rename tags in DEVONthink 3

In DEVONthink, I tag a lot. It’s an integral part of my strategy for finding things in my paperless environment. As I wrote about previously, hierarchical tags are a big part of my organizational system in DEVONthink. For many years, I tagged subject matter with tags that emanate from a single tag named topic_, but it was really an unnecessary top-level complication. So, the first item on my to-do list was to get rid of all the tags with a topic_ first level.