Russian

Beginning to experiment with Stanza for natural language processing

After installing Stanza as a dependency of UDAR, which I recently described, I decided to play around with what it can do.

Installation

The installation is straightforward and is documented on the Stanza getting started page.

First,

sudo pip3 install stanza

Then install a model. For this example, I installed the Russian model:

#!/usr/local/bin/python3
import stanza

# One-time download of the Russian models
stanza.download('ru')

Usage

Part-of-speech (POS) and morphological analysis

Here’s a quick example of POS analysis for Russian. I used PrettyTable to clean up the presentation, but it’s not strictly necessary.
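A minimal sketch of this kind of analysis follows. It assumes the Russian model has already been downloaded, and it uses plain string padding in place of PrettyTable so that it has no extra dependency. The helper names (pos_rows, format_table, analyze) and the sample sentence are mine, chosen for illustration only.

```python
def pos_rows(doc):
    """Collect (text, lemma, UPOS, features) for every word in a parsed document."""
    return [
        (word.text, word.lemma, word.upos, word.feats or "")
        for sentence in doc.sentences
        for word in sentence.words
    ]


def format_table(rows, headers=("Word", "Lemma", "POS", "Features")):
    """Render rows as left-aligned columns (a plain stand-in for PrettyTable)."""
    table = [headers, *rows]
    widths = [max(len(str(row[i])) for row in table) for i in range(len(headers))]
    return "\n".join(
        "  ".join(str(cell).ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    )


def analyze(text):
    """Run Stanza's Russian pipeline; needs stanza.download('ru') beforehand."""
    import stanza  # imported lazily so the helpers above work without Stanza

    nlp = stanza.Pipeline(lang="ru", processors="tokenize,pos,lemma")
    return nlp(text)


# Usage (after the model download):
#   print(format_table(pos_rows(analyze("Мама мыла раму."))))
```

Each row pairs a word with its lemma, its universal POS tag, and its morphological features (case, number, gender, and so on), which is the part I find most useful for Russian.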

Automated marking of Russian syllabic stress

One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinant of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no texts intended for native speakers do. Native speakers infer the placement of stress from memory and context.

I was delighted to discover Dr. Robert Reynolds’ work on natural language processing of Russian text to mark stress based on grammatical analysis of the text. What follows is a brief description of the installation and use of this work. The project page on GitHub has installation instructions, but I found a number of items that needed to be addressed that were not covered there. For example, this project (UDAR) depends on Stanza, which in turn requires a language-specific (Russian) model.

More chorus repetition macros for Audacity

In a previous post I described macros to support certain tasks in generating source material for L2 chorus repetition practice. Today, I’ll describe two other macros that automate this practice by slowing the playback speed of the repetition.

Background

I’ve described the rationale for chorus repetition practice in previous posts. The technique I describe here is to slow the sentence playback speed by applying the Change Tempo... effect^[Change tempo effect in the Audacity manual] in Audacity, so that the learner can build up speed by starting with slower repetitions. In my own practice, I will often begin complex Russian sentences at -50% speed and progress to -25% speed before practicing the pronunciation at native-level speed. Practicing at slow speeds gives the learner time to appreciate how syllables are connected to each other. The prosody is more apparent.
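To get a feel for what those percentages mean in practice, here is a small arithmetic sketch. It assumes that a tempo change of p percent scales the clip length by 1 / (1 + p/100), which is my understanding of how the effect behaves. The function name and the 6-second example clip are hypothetical.

```python
def stretched_duration(seconds, percent_change):
    """Clip length after a tempo change given in percent (negative = slower)."""
    if percent_change <= -100:
        raise ValueError("percent_change must be greater than -100")
    return seconds / (1 + percent_change / 100)


# A 6-second sentence at the speeds mentioned above
for pct in (-50, -25, 0):
    print(f"{pct:+d}%: {stretched_duration(6.0, pct):.1f} s")
```

Under that assumption, a 6-second sentence lasts 12 seconds at -50% and 8 seconds at -25%, so each repetition loop grows accordingly.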

Audacity macros to support chorus repetition practice

Achieving fluid, native-quality speech in a second language is a difficult task for adult learners. For several years, I’ve used Dr. Olle Kjellin’s method of “chorus repetition” for my Russian language study. In this post, I’m presenting a method for scripting Audacity to facilitate the development of audio source material to support his methodology.

Background

For detailed background on the methodology, I refer you to Kjellin’s seminal paper “Quality Practise Pronunciation with Audacity - The Best Method!” on the subject of chorus repetition practice. The first half of the paper outlines the neurophysiologic rationale for the method and the second half describes the practical use of the cross-platform tool Audacity to generate source material for this practice.

Scripting Apple Music on macOS for chorus repetition practice

This is an update to my previous post on automating iTunes on macOS to support chorus repetition practice. You can read the original post for the theory behind the idea, but in short, one way of developing prosody and quality pronunciation in a foreign language is to do mass repetitions in chorus with a recording of a native speaker.

Because in macOS 10.15, iTunes is no more, I’ve updated the script to work with the new Music app. It turns out that it’s a lot simpler. No need to dive into the application classes.

L-R method for language learning

I’ve recently discovered the L-R system of language learning and have been setting up to use it.

The idea is that you begin with long texts - novels, for example - in your target language (L2) and follow a systematic approach to reading and listening.

L-R system in a nutshell

Here are the steps:

  1. Read the text in L1 (your native language) and become familiar with it.^[I rephrased this instruction from other sources that say “read the translation” because what if the text itself is a translation? For example, my first text to try this with is Гарри Поттер и философский камень which was originally written in English and then translated into Russian, among other languages. So, it’s best to say for the first step “Read the text in your L1 and become very very familiar with what it says.”]
  2. Listen to the recording and simultaneously read the text in L2.
  3. Listen to the recording while reading the text in L1.
  4. Repeat after the speaker. But only do this once you truly understand the meaning of what you’re repeating. The goal is meaning, not only pronunciation.
  5. Translate the text from L1 to L2 by covering up one side while reading the other.

Tips

It is hard, very hard in fact, to find parallel texts. Even if you can find .txt documents online, the formatting is a challenge. Columns in Word or Pages simply don’t work, because the L2 and L1 don’t line up properly. So here’s what I did for formatting Harry Potter and the Philosopher’s Stone^[Before you accuse me of intellectual property theft, I will mention that I own both the Russian translation and the English language original in book form. So no harm done to anyone.]:

Sunday, September 16, 2018

Regex 101 is a great online regex tester.


Speaking of regular expressions, for the past year, I’ve used an automated process for building Anki flash cards. One of the steps in the process is to download Russian word pronunciations from Wiktionary. When Wiktionary began publishing transcoded mp3 files rather than just ogg files, they broke the URL scheme that I relied on to download content. The new regex for this scheme is: (?:src=.*:)?src=\"(\/\/.*\.mp3)
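As a rough illustration of how that pattern can be applied, here is a minimal Python sketch. The sample markup and file name are made up for the example and are not copied from Wiktionary, and the helper name find_mp3_url is hypothetical. It also assumes the captured protocol-relative URL can simply be prefixed with https:.

```python
import re

# The pattern from the post, unchanged; group 1 is the protocol-relative mp3 URL
MP3_SRC = re.compile(r'(?:src=.*:)?src=\"(\/\/.*\.mp3)')


def find_mp3_url(html):
    """Return the first mp3 URL found in the markup, or None if there is none."""
    match = MP3_SRC.search(html)
    return "https:" + match.group(1) if match else None


# Hypothetical markup resembling a transcoded audio source tag
SAMPLE = (
    '<source src="//upload.wikimedia.org/wikipedia/commons/transcoded/'
    '0/0a/Ru-example.ogg/Ru-example.ogg.mp3" type="audio/mpeg">'
)

print(find_mp3_url(SAMPLE))
```

Because the .* in the capture group is greedy, the match runs to the last .mp3 on the line, so it is worth feeding the pattern one tag or one line at a time.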