
Stripping Russian syllabic stress marks in Python

I have written previously about stripping syllabic stress marks from Russian text using a Perl-based regex tool. But I needed a means of doing it solely in Python, so this just extends that idea.

#!/usr/bin/env python3
def strip_stress_marks(text: str) -> str:
    b = text.encode('utf-8')
    # correct error where latin accented ó is used
    b = b.replace(b'\xc3\xb3', b'\xd0\xbe')
    # correct error where latin accented á is used
    b = b.replace(b'\xc3\xa1', b'\xd0\xb0')
    # correct error where latin accented é is used
    b = b.
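For reference, here is a minimal sketch in the same spirit, not taken verbatim from the post: it fixes a few of the Latin-for-Cyrillic mojibake substitutions at the byte level (the é mapping is my assumed continuation of the truncated code above), then drops the combining acute accent (U+0301) that marks stress on properly encoded text.

import unicodedata

def strip_stress_marks(text: str) -> str:
    b = text.encode("utf-8")
    # Latin accented vowels sometimes appear where stressed Cyrillic vowels belong.
    b = b.replace(b"\xc3\xb3", b"\xd0\xbe")  # ó -> о
    b = b.replace(b"\xc3\xa1", b"\xd0\xb0")  # á -> а
    b = b.replace(b"\xc3\xa9", b"\xd0\xb5")  # é -> е (assumed continuation)
    decoded = b.decode("utf-8")
    # Decompose so every stressed vowel is a base letter plus U+0301, drop the
    # combining acute accent, then recompose.
    decomposed = unicodedata.normalize("NFD", decoded)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))

print(strip_stress_marks("до́брый ве́чер"))  # -> добрый вечер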

Accessing Anki collection models from Python

For one-off projects that target Anki collections, I often use Python in a standalone application rather than an Anki add-on. Since I’m not going to distribute these little creations that are specific to my own needs, there’s no reason to create an add-on. These are just a few notes - nothing comprehensive - on the process. One thing to be aware of is that there must be a perfect match between the Anki major and minor version numbers for the Python anki module to work.
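Not from the post itself, but a minimal sketch of the kind of standalone access described here. It assumes a recent anki package in which Collection is imported from anki.collection (the module layout has moved between versions), and the collection path is a hypothetical example that you would adjust for your own profile.

from anki.collection import Collection  # import location varies across Anki versions

# Hypothetical path; adjust for your own Anki profile.
COLLECTION_PATH = "/Users/alan/Library/Application Support/Anki2/User 1/collection.anki2"

col = Collection(COLLECTION_PATH)
try:
    # Each model (note type) is a dict-like object with a 'name' key.
    for model in col.models.all():
        print(model["name"])
finally:
    col.close()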

Generating HTML from Markdown in Anki fields

I write in Markdown because it’s much easier to keep the flow of writing going without taking my hands off the keyboard. I also like to write content in Anki cards in Markdown. Over the years there have been various ways of supporting this through add-ons: The venerable Power Format Pack was great but no longer supports Anki 2.1, so it became useless. Auto Markdown worked for a while but as of Anki version 2.
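The add-ons mentioned above do this conversion inside Anki. As a standalone illustration of the underlying step, here is a short sketch using the python-markdown package; this is my choice for the example, not necessarily what those add-ons use internally.

import markdown  # pip install markdown

md_source = "Write in **Markdown**, store *HTML* in the Anki field."
html = markdown.markdown(md_source)
print(html)
# <p>Write in <strong>Markdown</strong>, store <em>HTML</em> in the Anki field.</p>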

Parsing Russian Wiktionary content using XPath

As readers of this blog know, I’m an avid user of Anki to learn Russian. I have a number of sources for reference content that go onto my Anki cards. Notably, I use Wiktionary to get word definitions and the word with the proper syllabic stress marked. (This is an aid to pronunciation for Russian language learners.) Since I’m lazy to the core, I came up with a systematic way of grabbing the stress-marked word from the Wiktionary page using lxml and XPath.
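A minimal sketch of that kind of lookup, not the post’s exact code: the example word is mine, and the XPath assumes the stressed headword sits in an element carrying the class "headword", which depends on Wiktionary’s current markup and may change.

import requests
from lxml import html

url = "https://en.wiktionary.org/wiki/молоко"  # hypothetical example word
page = html.fromstring(requests.get(url, timeout=10).content)

# Assumption: the stress-marked headword is in a <strong class="...headword...">.
matches = page.xpath('//strong[contains(@class, "headword")]/text()')
if matches:
    print(matches[0])  # e.g. молоко́ with the stress mark in place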

Wednesday, January 27, 2021

W3schools.com has a CSS library that’s quite nice. I often use Bootstrap, but I like some of the visual features here better. For example, I like their tags because they have more flexible use of colour. If you want to fetch from a Python dictionary but need a default value, this is how you do it:

upos_badge = {'noun': 'lime', 'verb': 'amber', 'adv': 'blue'}
badge_class_postfix = upos_badge.get(value.lower(), 'light-grey')

I recently learned about DeepL as an alternative to Google Translate.
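A self-contained version of that snippet: the excerpt leaves value undefined, so here I supply a hypothetical part-of-speech tag to show the fallback in action.

upos_badge = {'noun': 'lime', 'verb': 'amber', 'adv': 'blue'}

value = 'ADJ'  # hypothetical tag that is not a key in the dict
badge_class_postfix = upos_badge.get(value.lower(), 'light-grey')
print(badge_class_postfix)  # light-grey, the default returned by dict.get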

Removing stress marks from Russian text

Previously, I wrote about adding syllabic stress marks to Russian text. Here’s a method for doing the opposite - that is, removing such marks (ударение) from Russian text. Although there may well be a more sophisticated approach, regex is well-suited to this task. The problem is that

def string_replace(dict, text):
    sorted_dict = {k: dict[k] for k in sorted(dict)}
    for n in sorted_dict.keys():
        text = text.replace(n, dict[n])
    return text

dict = {
    "а́": "а", "е́": "е", "о́": "о", "у́": "у", "я́": "я", "ю́": "ю", "ы́": "ы", "и́": "и", "ё́": "ё",
    "А́": "А", "Е́": "Е", "О́": "О", "У́": "У", "Я́": "Я", "Ю́": "Ю", "Ы́": "Ы", "И́": "И", "Э́": "Э", "э́": "э"
}

print(string_replace(dict, "Существи́тельные в шве́дском обычно де́лятся на пять склоне́ний.
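A compact alternative under the same mapping idea, not from the post: build a single regex alternation from the dictionary keys so the text is scanned once, with each stressed vowel replaced by its plain counterpart.

import re

STRESSED_TO_PLAIN = {
    "а́": "а", "е́": "е", "о́": "о", "у́": "у", "я́": "я", "ю́": "ю", "ы́": "ы", "и́": "и", "ё́": "ё",
    "А́": "А", "Е́": "Е", "О́": "О", "У́": "У", "Я́": "Я", "Ю́": "Ю", "Ы́": "Ы", "И́": "И", "Э́": "Э", "э́": "э",
}

# One alternation over all stressed vowels; each match is mapped through the dict.
pattern = re.compile("|".join(map(re.escape, STRESSED_TO_PLAIN)))

def strip_stress(text: str) -> str:
    return pattern.sub(lambda m: STRESSED_TO_PLAIN[m.group(0)], text)

print(strip_stress("Существи́тельные в шве́дском обычно де́лятся на пять склоне́ний."))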

A macOS text service for morphological analysis and in situ marking of Russian syllabic stress

Building on my earlier explorations of the UDAR project, I’ve created a macOS Service-like method for in-situ marking of syllabic stress in arbitrary Russian text. The following video shows it in action. The Keyboard Maestro macro is simple; we execute the following script, bracketed by Copy and Paste:

#!/Users/alan/.pyenv/shims/python3
import xerox
import udar
import re

rawText = xerox.paste()
doc1 = udar.Document(rawText, disambiguate=True)
searchText = doc1.stressed()
result = re.sub(r'( ,)', ",", searchText)
xerox.
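The excerpt cuts off on the last line. Given the Copy/Paste bracketing described above, the ending is presumably a call that puts the stressed text back on the clipboard; a sketch of the complete script under that assumption:

#!/Users/alan/.pyenv/shims/python3
# Sketch only; the final xerox.copy() call is my assumption, not the post's verbatim code.
import re

import udar   # UDAR project: morphological analysis and stress marking
import xerox  # simple clipboard access

rawText = xerox.paste()                      # grab the text copied by the macro
doc1 = udar.Document(rawText, disambiguate=True)
searchText = doc1.stressed()                 # text with syllabic stress marked
result = re.sub(r"( ,)", ",", searchText)    # tidy stray spaces before commas
xerox.copy(result)                           # hand the stressed text back for pasting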

Beginning to experiment with Stanza for natural language processing

After installing Stanza as a dependency of UDAR, which I recently described, I decided to play around with what it can do. Installation: the installation is straightforward and is documented on the Stanza getting started page. First, sudo pip3 install stanza. Then install a model. For this example, I installed the Russian model:

#!/usr/local/bin/python3
import stanza
stanza.download('ru')

Usage: part-of-speech (POS) and morphological analysis. Here’s a quick example of POS analysis for Russian.
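The excerpt stops before the example itself; here is a minimal sketch of what such a query looks like with Stanza’s documented pipeline API. The example sentence is mine, not the post’s.

import stanza

# Assumes the Russian model has already been downloaded as shown above.
nlp = stanza.Pipeline("ru", processors="tokenize,pos,lemma")
doc = nlp("Существительные в шведском обычно делятся на пять склонений.")

# Print each word with its lemma, universal POS tag, and morphological features.
for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.lemma, word.upos, word.feats)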

Automated marking of Russian syllabic stress

One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinant of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no texts intended for native speakers do. The placement of stress is inferred from memory and context. I was delighted to discover Dr. Robert Reynolds' work on natural language processing of Russian text to mark stress based on grammatical analysis of the text.

Scripting thumbnail image file creation on macOS

One of the sites that I manage uses a jQuery-based image gallery to display images in a grid. The script decides which thumbnail to use based on how large an image is needed. A series of suffixes à la Flickr^[Well, sort of. I don’t think this is exactly what Flickr uses; and I made up the _q suffix for the less-than-500px image.] is used to signify classes of image size.
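The post scripts this on macOS; as an illustrative sketch only, here is the same idea in Python with Pillow rather than a macOS tool. The _q suffix and the 500 px size come from the note above; the file path is a hypothetical example.

from pathlib import Path
from PIL import Image  # pip install Pillow

def make_small_thumbnail(source: Path, max_px: int = 500, suffix: str = "_q") -> Path:
    """Write a copy of source no larger than max_px on its longest side."""
    target = source.with_name(f"{source.stem}{suffix}{source.suffix}")
    with Image.open(source) as im:
        im.thumbnail((max_px, max_px))  # shrinks in place, preserving aspect ratio
        im.save(target)
    return target

print(make_small_thumbnail(Path("gallery/photo001.jpg")))  # -> gallery/photo001_q.jpg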