Registering a custom collation to query Anki database

While working on a project that requires querying the Anki database directly outside of the Anki desktop application, I encountered an interesting issue with sqlite3 collations. This is is just a short post about how I went about registering a collation in order to execute SQL queries against the Anki db.

CAUTION
Before you run anything that accesses the Anki database, you absolutely should backup your database first.

The problem

Let’s try a simple query. Open the Anki database:

sqlite3 collection.anki2
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.

Great, we’re in. Now let’s see what our decks are in this collection:

sqlite> SELECT name FROM decks;
Default

That works. But now let’s try a WHERE clause:

sqlite> SELECT name FROM decks WHERE name != "dogfish";
Parse error: no such collation sequence: unicase

This is the crux of the problem we need to address here. We can use a built-in collation such as NOCASE and it works in some, but not all cases. For example:

sqlite> SELECT name FROM decks WHERE name COLLATE NOCASE != "dogfish";
Default

But if we added a deck to our collection called Ёлка, then we can demonstrate how that fails. Since the е character can be interpreted as either ё or е this is an opportunity for the straight NOCASE collation to fail. Let’s see if it does:

sqlite> SELECT id, name FROM decks;
1|Default
1735994559972|Ёлка

So far, so good, but we aren’t doing in text comparisons yet. How about:

sqlite> SELECT name FROM decks WHERE name COLLATE NOCASE = "елка";
# crickets...

This is why we need a custom Unicode-sensitive collation when performing certain text comparisons in queries against the Anki database. Before outlining such a collation, let’s take one step back and discuss what sqlite3 collations are.

A brief introduction to SQLite collations

Collations are just rules about the ordering of text. Does a come before or after b? However the built-in collations NOCASE, RTRIM, and BINARY aren’t sufficient for dealing with comparisons in diverse internationalized text. This is why we often need to provide a custom collation.

A custom unicase collation for Rust

Since the project I’m working on is written in Rust, I’ll show you how to register a custom unicase collation in Rust, but it can easily be adapted to other languages as well through their own sqlite3 connector.

The Cargo.toml file will need to include unicase = "2.6.0" (or whatever version of that module is current.)

To register the collation:

fn register_unicase_collation(conn: &Connection) -> Result<()> {
    conn.create_collation("unicase", |s1: &str, s2: &str| {
        let s1_key = UniCase::new(s1);
        let s2_key = UniCase::new(s2);
        s1_key.cmp(&s2_key)
    })?;
    Ok(())
}

and to open the database with this collation:

fn open_database_with_collation(db_path: &str) -> Result<Connection> {
    let conn = Connection::open(db_path)?;
    register_unicase_collation(&conn)?;
    Ok(conn)
}

Now to open our database, it’s just:

let conn = open_database_with_collation(self.db_path.to_str().unwrap())?;

What Anki tables use the unicase collation?

To answer that question, we can query the schemas:

#!/bin/bash

if [ "$#" -ne 1 ]; then
    echo "Usage: $0 database_path"
    exit 1
fi

DB=$1

sqlite3 "$DB" "
SELECT m.tbl_name as 'Table',
       group_concat(p.name) as 'Columns with UNICASE'
FROM sqlite_master m
JOIN pragma_table_info(m.tbl_name) p
WHERE m.sql LIKE '%COLLATE unicase%'
  AND p.type LIKE 'text%'
  AND m.sql LIKE '%' || p.name || ' text%COLLATE unicase%'
GROUP BY m.tbl_name;" -box

which gives us:

┌─────────────┬──────────────────────┐
│    Table    │ Columns with UNICASE │
├─────────────┼──────────────────────┤
│ deck_config │ name                 │
│ decks       │ name                 │
│ fields      │ name                 │
│ notetypes   │ name                 │
│ tags        │ tag                  │
│ templates   │ name                 │
└─────────────┴──────────────────────┘

Whenever we are executing queries that involve comparisons that involve these tables and columns, we need to employ the unicase collation that we have register, e.g.:

// SQL query with the `COLLATE unicase` collation
let query = "
            SELECT name
            FROM decks
            WHERE name COLLATE unicase LIKE '%' || ?1 || '%'
            ORDER BY name COLLATE unicase;
        ";

// Open the database and register the `unicase` collation
let conn = open_database_with_collation(self.db_path.to_str().unwrap())?;
let mut stmt = conn.prepare(query)?;

let matching_decks = stmt
    .query_map(params![deck_name], |row| row.get(0))?
    .collect::<Result<Vec<String>, _>>()?;

Fix your Anki streak - the script edition

Like many Anki users, I keep track of my streaks because it motivates me to do my reviews each day. But since life gets in the way sometimes, I may miss my reviews in one or more decks. It has been years since I’ve neglected to do all of my reviews; but sometimes I will forget to come back later in the day to finish up some of my decks. Since I like to have a clean review heatmap, I will “fix” my streak in a skipped deck.

Yes, this is “cheating”; but applied rarely, I gives me no moral qualms. YMMV.

I’ve described a manual process previously in which we execute queries directly against the Anki sqlite3 database. It works, but you have to deal with “bare metal” interaction with the database. There’s some risk involved. To make the process a little easier I’ve developed the following script. It just automates the review date correction so that you don’t have to interact directly with the database. I’ll walk you through the process; which does require a little technical facility, but only a little.

WARNING
Backing up your collection before running this script is strongly recommended.

Prerequisites and installation

N.B.: I work most of the time on macOS and have almost no experience on the Windows ecosystem. I’m sure this could be adapted to work on Windows; but that’s for someone else to do.

  1. Ruby is installed by default on macOS; so you should be good there. If you want to be sure, you can check by going to the Terminal and typing which ruby. You should get something like:
➜  ~ which ruby
/Users/alan/.rbenv/shims/ruby
➜  ~
  1. You will need to install a couple Ruby gems.
gem install sqlite3
gem install tzinfo
  1. Copy the Ruby script (see below for the entire listing)
  2. Install the script
cd ~/Documents  # or wherever you want to put the script
pbpaste > anki_streak_fix.rb
  1. Your collection name is not going to be “Alan - Russian” so you can use any text editor (e.g. TextEdit) to change that in the code.

At this point should have everything you need installed on the system.

Usage

  1. Open Anki and do a couple review in a deck where you missed your streak yesterday.
  2. Now quit Anki.
  3. Run the script from the Terminal:

cd ~/Documents # or wherever to saved the script ruby anki_streak_fix.rb “your_deck_name” –simulate

This should show you which cards will be moved to yesterday. If you’re satisfied with how that looks, then run the script without the --simulate flag.

Source code for the script

#!/usr/bin/env ruby

require 'sqlite3'
require 'optparse'
require 'time'
require 'tzinfo'

def get_system_timezone
  begin
    TZInfo::Timezone.get(Time.now.zone)
  rescue TZInfo::InvalidTimezoneIdentifier
    puts "Unknown system time, default to America/Toronto"
    TZInfo::Timezone.get('America/Toronto')
  end
end

class AnkiCollection
  def initialize(collection_name)
    base_path = "~/Library/Application Support/Anki2/"
    @path = base_path + collection_name + "/collection.anki2"
  end
  
  def collection_path
    File.expand_path(@path)
  end
end

class AnkiProcessor
  def initialize(deck_name, simulate: false)
    @deck_name = deck_name
    @simulate = simulate
    @db_path = AnkiCollection.new("Alan - Russian").collection_path
  end
  
  def process
    rid_string = generate_rid_string
    note_ids = fetch_reviewed_notes
    
    if note_ids.empty?
      puts "No notes found for today in deck '#{@deck_name}'"
      return
    end
    
    process_notes(note_ids, rid_string)
  end
  
  private
  
  def generate_rid_string
    system_timezone = get_system_timezone
    puts "Using timezone: #{system_timezone.identifier}"
    
    today = Time.now
    local_midnight = system_timezone.local_to_utc(Time.new(today.year, today.month, today.day))
    start_time = local_midnight.to_i * 1000
    end_time = (local_midnight + 86400).to_i * 1000
    
    "rid:#{start_time}:#{end_time}"
  end
  
  def fetch_reviewed_notes
    query = <<-SQL
      SELECT DISTINCT notes.id
      FROM cards
      JOIN notes ON cards.nid = notes.id
      JOIN decks ON cards.did = decks.id
      JOIN revlog ON cards.id = revlog.cid
      WHERE decks.name COLLATE NOCASE = ?
      AND date(revlog.id/1000, 'unixepoch', 'localtime') = date('now', 'localtime')
      ORDER BY notes.id;
    SQL
    
    begin
      db = SQLite3::Database.new(@db_path)
      db.results_as_hash = true
      db.execute(query, @deck_name)
    rescue SQLite3::Exception => e
      puts "Database error: #{e.message}"
      []
    ensure
      db&.close
    end
  end
  
  def process_notes(notes, rid_string)
    # Extract start and end times from rid_string
    start_time = rid_string.split(':')[1]
    end_time = rid_string.split(':')[2]
    
    begin
      db = SQLite3::Database.new(@db_path)
      
      notes.each do |row|
        note_id = row['id']
        
        if @simulate
          puts "Would execute: UPDATE revlog for note #{note_id} (#{start_time} to #{end_time})"
        else
          update_query = <<-SQL
            UPDATE revlog
            SET id = id - 86400000
            WHERE id IN (
              SELECT r.id
              FROM revlog r 
              INNER JOIN cards c ON r.cid = c.id
              INNER JOIN notes n ON n.id = c.nid
              WHERE n.id = ?
                AND r.id >= ?
                AND r.id < ?
            );
          SQL
          
          db.execute(update_query, [note_id, start_time, end_time])
          puts "Note date updated successfully for #{note_id}"
        end
      end
    rescue SQLite3::Exception => e
      puts "Database error: #{e.message}"
    ensure
      db&.close
    end
  end
end

# Parse command line arguments
options = {simulate: false}
parser = OptionParser.new do |opts|
  opts.banner = "Usage: #{$0} [options] DECK_NAME"
  opts.on('-s', '--simulate', 'Simulate ankifix calls (print only)') do |s|
    options[:simulate] = s
  end
end

parser.parse!

if ARGV.empty?
  puts parser.help
  exit 1
end

# Run the processor
processor = AnkiProcessor.new(ARGV[0], simulate: options[:simulate])
processor.process

If you have any difficulties or you have ideas for improvements, I can try to help. See my contact page.

An API (sort of) for adding links to ArchiveBox

I use ArchiveBox extensively to save web content that might change or disappear. While a REST API is apparently coming eventually, it doesn’t appear to have been merged into the main fork. So I cobbled together a little application to archive links via a POST request. It takes advantage of the archivebox command line interface. If you are impatient, you can skip to the full source code. Otherwise I’ll describe my setup to provide some context.

A Keyboard Maestro action to save bookmarks to Espial

So this is a little esoteric, but it meets a need I encountered; and it may meet yours if you use Espial, Keyboard Maestro and are on macOS.

For several years I’ve been using Espial a bookmark manager that looks and feels like Pinboard, but is both self-hosted and drama-free1. Espial is easy to setup, stores its data in a comprehensible sqlite database and has an API, which comes in handy when it came to solving the problem I encountered.

Louisiana and the Ten Commandments

Recently, the governor of Louisiana signed a bill requiring all public school classrooms in the state to display a poster-sized copy of the Ten Commandments. In the “Beforetimes” (before the current partisan Supreme Court took shape), this would have been struck down immediately as a violation of the Establishment Clause of the First Amendment. This bill is a clear violation of that clause. I imagine that the justices will dance around the cultural and historical significance of the document without stopping to consider the state’s motives in passing this law. While the proponents of the Ten Commandments aren’t wrong about its historical significance, the U.S. Constitution and its Amendments arguably hold more importance from the secular perspective that one must adopt in a public school.

Improving vegetable seed germination with chemical pretreatment

Some vegetable seeds, particularly many exotic chilli pepper varieties and some Asian eggplants are tricky to germinate. After trying the obvious things - cold-induced forced dormancy (cold stratification), abundant moisture, high humidity, and temperatures over 80F, I’ve found that some seeds simply do not germinate with much success at all. But having read a number of articles on this problem, we decided to try an intensive chemical process to see if we could achieve better results. And it looks successful.

A quick word on ATtiny 1-series interrupts

The Atmel AVR 8-bit microcontrollers have always been a favourite for tinkering; and the massive popularity of the Arduino based on the ATmega 168 and 328 MCUs introduced a lot of hobbyists to this series. The companion ATtiny series from Atmel were the poor stepchildren of the ATmega controllers to an extent - useful for small projects but often quite limited. However, the acquisition of Atmel by Microchip Technology in 2016 ushered in a new series of MCUs bearing the same moniker of ATtiny, but much more capable and innovative. They have been around for a while now, but many hobbyists are just beginning to poke around with these new capable MCUs.

FreeRTOS stack size on ESP32 - words or bytes?

Although FreeRTOS1 is an indispensible tool for working on anything more than the simplest application on ESP32, there are some difficulties to master, such as multitasking. Multitasking using FreeRTOS is accomplished by creating tasks with xTaskCreate() or xTaskCreatePinnedToCore(). In both of these calls, one of the parameters is uxStackDepth which is the allocated stack size for the task. The FreeRTOS documentation on the subject is clear about the units for uxStackDepth:

Our vermiculture process: A sustainable contribution

Several people have asked me how we manage a very productive vegetable garden; so I’ve written this post as a brief description of one aspect our our approach - vermiculture.

One of our overarching family goals is sustainable living. It’s basically about leaving a small footprint. A practical component of this philosophical stance is dealing with food waste. We deal with kitchen waste with a combination of bokashi composting and vermicomposting (also known as vermiculture) It’s not for the faint-of-heart and some are horrified to learn that I keep thousands - possibly hundreds of thousands - of worms in our basement. But some have asked me to describe our process; so this article is meant just to document it. There is a lot of art and science to vermiculture and this is not meant to be a definitive guide to vermiculture.

An approach to interleaved and variable musical practice: Tools and techniques

“How do you get to Carnegie Hall” goes the old joke. “Practice, practice, practice.” But of course there’s no other way. If the science of talent development has taught us anything over the last fifty years, it’s that there is no substitute for strategic practice. Some even argue that innate musical abilities don’t exist. Whether it’s nature, nurture, or both, show me a top-notch musician and I’ll show you a person who has learned to practice well. Here we’ll take a dive into a set of practice techniques that I’ve developed, along with tools to realize them in the practice room.