As readers of this blog know, I’m an avid user of Anki to learn Russian. I have a number of sources for reference content that go onto my Anki cards. Notably, I use Wiktionary to get word definitions and the word with the proper syllabic stress marked. (This is an aid to pronunciation for Russian language learners.)
Since I’m lazy to the core, I came up with a system way of grabbing the stress-marked word from the Wiktionary page using lxml and XPath.
Previously, I wrote about adding syllabic stress marks to Russian text. Here’s a method for doing the opposite - that is, removing such marks (ударение) from Russian text.
Although there may well be a more sophisticated approach, regex is well-suited to this task. The problem is that
def string_replace(dict,text): sorted_dict = {k: dict[k] for k in sorted(dict)} for n in sorted_dict.keys(): text = text.replace(n,dict[n]) return text dict = { "а́" : "а", "е́" : "е", "о́" : "о", "у́" : "у", "я́" : "я", "ю́" : "ю", "ы́" : "ы", "и́" : "и", "ё́" : "ё", "А́" : "А", "Е́" : "Е", "О́" : "О", "У́" : "У", "Я́" : "Я", "Ю́" : "Ю", "Ы́" : "Ы", "И́" : "И", "Э́" : "Э", "э́" : "э" } print(string_replace(dict, "Существи́тельные в шве́дском обычно де́лятся на пять склоне́ний.
The AppleScript Safari API is apparently quite finicky and rejects Russian Cyrillic characters when loading URLs.
For example, the following URL https://en.wiktionary.org/wiki/стоять#Russian throws an error in AppleScript. Instead, Safari requires URL’s of the form https://en.wiktionary.org/wiki/%D1%81%D1%82%D0%BE%D1%8F%D1%82%D1%8C#Russian whereas Chrome happily consumes whatever comes along. So, we just need to encode the URL thusly:
use framework "Foundation" -- encode Cyrillic test as "%D0" type strings on urlEncode(input) tell current application's NSString to set rawUrl to stringWithString_(input) -- 4 is NSUTF8StringEncoding set theEncodedURL to rawUrl's stringByAddingPercentEscapesUsingEncoding:4 return theEncodedURL as Unicode text end urlEncode When researching Russian words for vocabulary study, I use the URL encoding handler to load the appropriate words into several reference sites in sequential Safari tabs.
I was puzzled by this sentence on the BBC Russian Service:
Нет свидетельств тому, что на нынешних выборах дело обстоит иначе.
ББС Мошенничество на выборах в США? Проверяем факты в речи Трампа It means “There is no evidence that in the current election things are any different." but the puzzle isn’t the meaning, it’s the grammatical case in which the author has placed the demonstrative pronoun то , which is dative here тому .
There’s a phenomenon that verteran Anki users are familiar with - the so-called “Anki hell” or “ease hell.”
Origins of ease hell The descent into ease hell has to do with the way Anki handles correct and incorrect answers when it presents cards for review. Ease is a numerical score associated with every card in the database and represents a valuation of the difficulty of the card. By default, when cards graduate from the learning phase, an ease of 250% is applied to the card.
While Russian text intended for native speakers doesn’t show accented vowel characters to point out the syllabic stress (ударение) , many texts intended for learners often do have these marks. But how to apply these marks when typing?
Typically, for Latin keyboards on macOS, you can hold down the key (like long-press on iOS) and a popup dialog will show you options for that character. But in the standard Russian phonetic keyboard it doesn’t work.
Building on my earlier explorations of the UDAR project, I’ve created a macOS Service-like method for in-situ marking of syllabic stress in arbitrary Russian text. The following video shows it in action:
The Keyboard Maestro is simple; we execute the following script, bracketed by Copy and Paste:
#!/Users/alan/.pyenv/shims/python3 import xerox import udar import re rawText = xerox.paste() doc1 = udar.Document(rawText, disambiguate=True) searchText = doc1.stressed() result = re.sub(r'( ,)', ",", searchText) xerox.
After installing Stanza as dependency of UDAR which I recently described, I decided to play around with what is can do.
Installation The installation is straightforward and is documented on the Stanza getting started page.
First,
sudo pip3 install stanza Then install a model. For this example, I installed the Russian model:
#!/usr/local/bin/python3 import stanza stanza.download('ru') Usage Part-of-speech (POS) and morphological analysis Here’s a quick example of POS analysis for Russian.
One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinate of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no tests intended for native speakers do. The placement of stress is inferred from memory and context.
I was delighted to discover Dr. Robert Reynolds’ work on natural language processing of Russian text to mark stress based on grammatical analysis of the text.
In a previous post I described macros to support certain tasks in generating source material for L2 chorus repetition practice. Today, I’ll describe two other macros that automate this practice by slowing the playback speed of the repetition.
Background I’ve described the rationale for chorus repetition practice in previous posts. The technique I describe here is to slow the sentence playback speed to give the learner time to build speed by practicing slower repetitions.