bash

Converting Cyrillic UTF-8 text encoded as Latin-1

This may be obvious to some, but recognizing a character encoding visually, at a glance, is not always easy. For example, pronunciation files downloaded from Forvo have names like this:

pronunciation_ru_оÑ‚бывание.mp3

How can we extract the actual word from this gibberish? Ideally, the filename should reflect the actual word uttered in the pronunciation file, after all.

Step 1: Extracting the interesting bits

The gibberish begins after the pronunciation_ru_ prefix and ends before the file extension.
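Both the extraction and the repair can be sketched with shell parameter expansion plus `iconv`. This is a minimal sketch, not the post's own method. It assumes the whole stem was garbled the same way, with UTF-8 bytes misread as Latin-1 and then re-saved as UTF-8, so encoding the text back to Latin-1 recovers the original bytes. The function name `forvo_word` is hypothetical. If the garbling went through Windows-1252 instead (a character such as ‚ is byte 0x82 in that code page, not in Latin-1), `WINDOWS-1252` may be the charset to try.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recover the word from a Forvo-style filename such as
# pronunciation_ru_<garbled>.mp3, ASSUMING the stem is UTF-8 text whose bytes
# were misread as Latin-1 and then re-encoded as UTF-8.
forvo_word() {
  local stem=${1##*/}              # drop any leading directories
  stem=${stem#pronunciation_ru_}   # strip the fixed prefix
  stem=${stem%.*}                  # strip the file extension
  # Encoding the mojibake back to Latin-1 yields the original bytes,
  # which are valid UTF-8 and display as the intended Cyrillic word.
  printf '%s\n' "$stem" | iconv -f UTF-8 -t LATIN1
}
```

Once the output looks right, something like `mv "$f" "pronunciation_ru_$(forvo_word "$f").mp3"` would rename the file.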

accentchar: a command-line utility to apply Russian stress marks

I’ve written a lot about applying and removing syllabic stress marks in Russian text because I use them often when making Anki cards. This iteration is a command-line tool that applies the stress mark at a particular character index. The advantage of these little shell tools is that they are composable, so they can be integrated into different tools as the need arises.

    #!/usr/local/bin/zsh
    # Parse -i (character index) and -w (word) options
    while getopts i:w: flag
    do
        case "${flag}" in
            i) index=${OPTARG};;
            w) word=${OPTARG};;
        esac
    done
    # Take the word from -w if given, otherwise read it from stdin
    if [[ -n $word ]]; then
        temp=$word
    else
        read -r temp
    fi
    outword=""
    # Walk the word one character at a time
    for (( i=0; i<${#temp}; i++ )); do
        thischar="${temp:$i:1}"
        if (( i == index )); then
            thischar=$(echo $thischar | perl -C -pe 's/(.
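The core operation can be sketched independently of the option parsing. Russian stress marks are commonly written as the combining acute accent U+0301 placed after the stressed vowel; the hypothetical `accent_at` function below inserts it after the character at a 0-based index. It is a sketch, not the accentchar script itself. Like the original, it leans on Perl's `-C` switch, here as `-CSA`, so that the standard streams and the command-line arguments are decoded as UTF-8 and `substr` counts characters rather than bytes, independent of the locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the author's accentchar script: insert a combining
# acute accent (U+0301) after the character at a 0-based index.
accent_at() {
  local word=$1 index=$2
  # -CSA: STDIN/STDOUT/STDERR and @ARGV are treated as UTF-8,
  # so length() and substr() operate on characters, not bytes.
  perl -CSA -e '
    my ($w, $i) = @ARGV;
    die "index out of range\n" if $i < 0 || $i >= length($w);
    substr($w, $i + 1, 0, "\x{301}");   # insert U+0301 after character $i
    print "$w\n";
  ' "$word" "$index"
}
```

For example, `accent_at вода 3` should print вода́. A zsh wrapper with `-i`/`-w` options like the one above could call it in place of the per-character loop.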

sterilize-ng: a command-line URL sterilizer

Introducing sterilize-ng [GitHub link], a URL sterilizer made to work flexibly on the command line.

Background

The surveillance capitalist economy is built on the relentless tracking of users. Imagine going about town running errands, but everywhere you go, someone is quietly following you. When you pop into the grocery store, they examine your receipt. They look into your bags to see what you bought. Then they hop in the car with you and keep careful records of where you go, how fast you drive, and whom you talk with on the phone.
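The excerpt does not say how sterilize-ng decides what to strip. As a minimal, hypothetical illustration of the general idea of sterilizing a URL, the `sterilize_url` function below drops a few common tracking query parameters and keeps everything else in order. The parameter list (`utm_*`, `fbclid`, `gclid`) is an assumption, not sterilize-ng's rule set, and the sketch does no percent-decoding or other URL normalization.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not sterilize-ng's implementation: remove common
# tracking query parameters from a URL and keep everything else intact.
# The parameter list (utm_*, fbclid, gclid) is an illustrative assumption.
sterilize_url() {
  local url=$1 frag="" base query out="" param
  local IFS='&'                          # query parameters are '&'-separated
  local -a params
  # Set aside any #fragment so it can be reattached at the end.
  if [[ $url == *'#'* ]]; then
    frag="#${url#*#}"
    url=${url%%#*}
  fi
  # Nothing to clean if there is no query string.
  if [[ $url != *'?'* ]]; then
    printf '%s\n' "$url$frag"
    return
  fi
  base=${url%%\?*}
  query=${url#*\?}
  read -ra params <<< "$query"           # split on '&' without globbing
  for param in "${params[@]}"; do
    case $param in
      utm_*|fbclid=*|gclid=*) ;;         # drop known trackers
      *) out+="${out:+&}$param" ;;       # keep everything else, in order
    esac
  done
  printf '%s\n' "$base${out:+?$out}$frag"
}
```

For example, `sterilize_url 'https://example.com/page?utm_source=news&id=42'` prints `https://example.com/page?id=42`.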

Extracting ID3 tags from the command line - two methods

As part of a Hazel rule to process downloaded mp3 files, I worked out a couple of different methods for extracting the ID3 title tag. Not rocket science, but it took a little time to sort out. Both rely on non-standard third-party tools, both for parsing the text and for extracting the ID3 tags.

Extracting ID3 title with ffprobe

ffprobe is part of the ffmpeg suite of tools, which on macOS can be installed with Homebrew.
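A minimal sketch of the ffprobe approach, wrapped in a hypothetical `id3_title` function. `-show_entries format_tags=title` restricts ffprobe's output to the title tag, and the `default` writer's `nw=1:nk=1` options (short for `noprint_wrappers` and `nokey`) strip the section wrapper and the key, leaving only the value. The fallback to the bare filename is an added assumption, useful if a Hazel rule should still produce a name when the tag is missing.

```shell
#!/usr/bin/env bash
# Sketch: print an MP3's ID3 title via ffprobe (from the ffmpeg suite),
# falling back to the filename without extension if the tag is empty.
id3_title() {
  local file=$1 title
  # Only the title tag, printed as a bare value with no wrapper or key.
  title=$(ffprobe -v error -show_entries format_tags=title \
            -of default=nw=1:nk=1 "$file")
  if [[ -z $title ]]; then
    title=${file##*/}      # assumed fallback: basename...
    title=${title%.*}      # ...minus the extension
  fi
  printf '%s\n' "$title"
}
```

On macOS, `brew install ffmpeg` provides `ffprobe`.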

Partitioning a large directory into subdirectories by size

Since I’m not fond of carrying around all my photos on a cell phone, where they’re perpetually at risk of loss, I periodically dump the image and video files to a drive on my desktop for later burning to optical disc.[1] Saving these images in archival form is a hedge against the possibility that my existing backup system fails someday. I’m using Blu-ray optical discs to archive these image and video files, and each disc stores 25 GB of data.
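One way to sketch the partitioning in shell: walk the files in name order and start a new subdirectory whenever the next file would push the running total past the disc's capacity. This is a minimal sequential fill under stated assumptions (a flat directory, hypothetical `part_NNN` subdirectory names, sizes in bytes), not the post's own method and not an optimal bin-packing. A single file larger than the limit ends up alone in an oversized part.

```shell
#!/usr/bin/env bash
# Sketch: move the regular files in a directory into numbered subdirectories
# (hypothetical names part_001, part_002, ...) so that no subdirectory's
# total exceeds a byte limit, filling them sequentially in name order.
partition_by_size() {
  local src=$1 limit=$2
  local part=1 used=0 size f dest
  for f in "$src"/*; do
    [[ -f $f ]] || continue               # skip subdirectories, incl. earlier parts
    size=$(wc -c < "$f" | tr -d ' ')      # size in bytes; tr strips BSD wc padding
    # Start a new part when this file would overflow a non-empty one.
    if (( used > 0 && used + size > limit )); then
      part=$((part + 1))
      used=0
    fi
    printf -v dest '%s/part_%03d' "$src" "$part"
    mkdir -p "$dest"
    mv "$f" "$dest/"
    used=$((used + size))
  done
}
```

Passing a limit somewhat below the nominal 25 GB (for example `24000000000`) leaves headroom for filesystem overhead on each disc.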