One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinant of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no texts intended for native speakers do. The placement of stress is inferred from memory and context.
I was delighted to discover Dr. Robert Reynolds' work on natural language processing of Russian text to mark stress based on grammatical analysis of the text. What follows is a brief description of the installation and use of this work. The project page on GitHub has installation instructions, but I found a number of items that needed to be addressed that were not covered there. For example, this project (UDAR) depends on Stanza, which in turn requires a language-specific (Russian) model.
The first step is to install a few dependencies:
- Install the pexpect module:
sudo pip3 install pexpect
- Install stanza
sudo pip3 install stanza
- Install Stanza’s Russian model:
#!/usr/local/bin/python3
import stanza
stanza.download('ru')
Note that my python3 is the Homebrew version, so your hashbang may be different.
- Install UDAR itself:
sudo pip3 install --user git+https://github.com/reynoldsnlp/udar
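Before going further, it can be worth confirming that the three packages are visible to the interpreter you plan to use, since pip3 and python3 do not always point at the same installation. The following is a minimal sketch that relies only on the standard library; the helper name `check_modules` is hypothetical and is not part of UDAR or Stanza.

```python
#!/usr/bin/env python3
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The three packages installed in the steps above.
    for name, found in check_modules(["pexpect", "stanza", "udar"]).items():
        print(f"{name}: {'installed' if found else 'MISSING'}")
```

A package reported as MISSING was most likely installed for a different Python than the one running the script.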
See the project page on GitHub for more comprehensive details. Following the documentation, I was quickly able to create my own example:
#!/usr/local/bin/python3
import udar
doc1 = udar.Document('Моя собака внезапно прыгнула на стол.')
print(doc1.stressed())
which prints the correctly-marked Моя соба́ка внеза́пно пры́гнула на сто́л.
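The stress marks in this output appear to be combining acute accents (U+0301) placed after the stressed vowel. That is an assumption based on the printed result, not something I have verified against UDAR's source. Under that assumption, a short standard-library sketch can count the marks or strip them to recover the plain text. The names `STRESS_MARK`, `strip_stress`, and `count_stress` are hypothetical.

```python
STRESS_MARK = "\u0301"  # combining acute accent (assumed to be the stress marker)


def strip_stress(text):
    """Remove every combining acute accent from the text."""
    return text.replace(STRESS_MARK, "")


def count_stress(text):
    """Count how many combining acute accents the text contains."""
    return text.count(STRESS_MARK)


# The marked sentence from above, written with explicit escapes for clarity.
stressed = "Моя соба\u0301ка внеза\u0301пно пры\u0301гнула на сто\u0301л."
print(strip_stress(stressed))
print(count_stress(stressed))
```

Stripping the marks and comparing the result with the input sentence is a quick way to confirm that stress marking changed nothing else in the text.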
I’m looking forward to exploring the capabilities of this NLP tool further.
- Reynolds, Robert J. “Russian natural language processing for computer-assisted language learning: capturing the benefits of deep morphological analysis in real-life applications.” PhD diss., UiT–The Arctic University of Norway, 2016. https://hdl.handle.net/10037/9685
- UDAR - NLP system for applying syllabic stress markings