Peering into Anki using R

Yet another diversion to keep me from focusing on actually using Anki to learn Russian. I stumbled onto the R programming language, which is designed for statistical computing and analysis.

Here are a couple of snippets that begin to scratch the surface of what’s possible. Important caveat: I’m an R novice at best, so there are probably much better ways of doing some of this…

Counting notes with a particular model type

Here we’ll use R to do what we did previously with Python.

First load some of the libraries we’ll need:

library(rjson)
library(RSQLite)
library(DBI)

Next we’ll connect to the database and extract the model information:

# connect to the Anki database
dbpath <- "path to your collection"
con <- dbConnect(RSQLite::SQLite(), dbname = dbpath)

# get information about the models
modelInfo <- as.character(dbGetQuery(con,'SELECT models FROM col'))
models <- fromJSON(modelInfo)

Since the model information is stored as JSON, we’ll parse it into a data frame from which we can extract the model ID we need.

names <- c()
mid <- names(models)
for(i in 1:length(mid))
{
  names[i] <- models[[mid[i]]]$name
}
models <- data.frame(cbind(mid,names))

Next we’ll extract the model ID (mid) from this data frame so that we can find all of the notes with that model ID:

verbmid <- as.numeric(as.character(models[models$names=="Русский - глагол","mid"]))

# query the notes database for notes with this model
query <- paste("SELECT COUNT(id) FROM notes WHERE mid =",verbmid)
res <- as.numeric(dbGetQuery(con,query))

And of course, close the connection to the Anki SQLite database:

dbDisconnect(con)

As of this writing, res tells me I have 702 notes with the verb model type (named “Русский - глагол” in my collection).
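
For comparison, here’s roughly the same count done in Python with nothing but the standard library’s sqlite3 and json modules. This is a quick sketch rather than the code from the earlier Python post, and it reuses the same placeholder collection path as above:

import json
import sqlite3

con = sqlite3.connect("path to your collection")

# the models JSON lives in the single-row col table
models = json.loads(con.execute("SELECT models FROM col").fetchone()[0])

# keys are model IDs (as strings); each value includes the model's name
verb_mid = next(mid for mid, m in models.items() if m["name"] == "Русский - глагол")

count = con.execute("SELECT COUNT(id) FROM notes WHERE mid = ?",
                    (int(verb_mid),)).fetchone()[0]
print(count)

con.close()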

Counting hours per month in Anki

Ever wonder how many hours per month you spend reviewing in Anki? Here’s an R program that will grab review time information from the database and plot it for you. I ran across the original idea in this blog post by Gene Dan, but did a little work on the x-axis scale to get it to display correctly.

library(RSQLite)
library(DBI)
library(rjson)
library(anytime)
library(sqldf)
library(zoo)
library(ggplot2)

dbpath <- "/Users/alan/Library/Application Support/Anki2/Alan - Russian/collection.anki2"
con <- dbConnect(RSQLite::SQLite(), dbname = dbpath)
#get reviews
rev <- dbGetQuery(con,'select CAST(id as TEXT) as id
                  , CAST(cid as TEXT) as cid
                  , time
                  from revlog')

cards <- dbGetQuery(con,'select CAST(id as TEXT) as cid, CAST(did as TEXT) as did from cards')

#Get deck info - from the decks field in the col table
deckinfo <- as.character(dbGetQuery(con,'select decks from col'))
decks <- fromJSON(deckinfo)

names <- c()
did <- names(decks)
for(i in 1:length(did))
{
  names[i] <- decks[[did[i]]]$name
}

decks <- data.frame(cbind(did,names))
#decks$names <- as.character(decks$names)

cards_w_decks <- merge(cards,decks,by="did")
#Date is UNIX timestamp in milliseconds, divide by 1000 to get seconds
rev$revdate <- as.yearmon(anydate(as.numeric(rev$id)/1000))

# Assign deck info to reviews
rev_w_decks <- merge(rev,cards_w_decks,by="cid")
time_summary <- sqldf("select revdate, sum(time) as Time from rev_w_decks group by revdate")
time_summary$Time <- time_summary$Time/3.6e+6 # convert milliseconds to hours

ggplot(time_summary,aes(x=revdate,y=Time))+geom_bar(stat="identity",fill="#d93d2a")+
  scale_x_yearmon()+
  ggtitle("Hours per Month") +
  xlab("Review Date") +
  ylab("Time (hrs)") +
  theme(axis.text.x=element_text(hjust=2,size=rel(1))) +
  theme(plot.title=element_text(size=rel(1.5),vjust=.9,hjust=.5)) +
  guides(fill = guide_legend(reverse = TRUE))

dbDisconnect(con)

You should get a plot like the one at the top of the post.

I’m eager to learn more about R and to apply it to understanding my performance in Anki.

Language word frequencies

Since one of the cornerstones of my approach to learning the Russian language has been to track how many words I’ve learned and their frequencies, I was intrigued to read the following statistics today:

  • The 15 most frequent words in the language account for 25% of all the words in typical texts.
  • The first 100 words account for 60% of the words appearing in texts.
  • 97% of the words one encounters in an ordinary text will be among the 4000 most frequent words.

In other words, if you learn the 4000 most frequent words of a language, you’ll recognize nearly every word you encounter in an ordinary text.
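
As a sanity check on numbers like these, cumulative coverage is easy to compute yourself from any frequency list. Here’s a minimal Python sketch, assuming a hypothetical file freq.txt with one “word count” pair per line:

counts = []
with open("freq.txt", encoding="utf-8") as f:  # hypothetical frequency list
    for line in f:
        word, count = line.split()
        counts.append(int(count))

counts.sort(reverse=True)  # most frequent first
total = sum(counts)

for n in (15, 100, 4000):
    coverage = sum(counts[:n]) / total
    print(f"top {n} words cover {coverage:.1%} of all word occurrences")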

Source: “Five Cornerstones for Second-Language Acquisition - the Neurophysiological Opportunist’s Way” by Olle Kjellin, M.D., Ph.D., who cites The Cambridge Encyclopedia of Language (Crystal, 1995) as the original source.

Anki database adventures: Counting notes by model type

Continuing my series on accessing the Anki database outside of the Anki application environment, here’s a piece on accessing the note type model. You may wish to start here with the first article on accessing the Anki database. This is geared toward mac OS. (If you’re not on mac OS, then start here instead.)

The note type model

Since notes in Anki contain flexible fields, the model for a note type is stored as JSON in the models column of the col table. The structure isn’t formally documented, so what follows is a best guess at the definition of that JSON.
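
One way to get a feel for the structure is simply to load the JSON and inspect an entry; the Python sketch below does exactly that against a hypothetical collection path. Among the keys on each model you should see name, flds (the field definitions), and tmpls (the card templates), along with several others such as css.

import json
import sqlite3

con = sqlite3.connect("collection.anki2")  # hypothetical path; point this at your collection
models = json.loads(con.execute("SELECT models FROM col").fetchone()[0])

# top-level keys are model IDs; each value describes one note type
for mid, model in models.items():
    print(mid, model["name"])    # every model has at least a name
    print(sorted(model.keys()))  # plus keys such as flds, tmpls, css, and others
    break                        # just inspect the first one

con.close()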

Accessing the Anki database with Python: Working with a specific deck

I previously wrote about accessing the Anki database using Python on mac OS. Extending that post, I’ll show how to work with a specific deck in this short post.

To use a named deck you’ll need its deck ID. Fortunately there’s a built-in method for finding a deck ID by name:

col = Collection(COLLECTION_PATH)
dID = col.decks.id(DECK_NAME)

Now, in queries against the cards and notes tables, we can apply the deck ID to restrict results to that deck. For example, to find all of the cards currently in the learning stage:
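
One way to do that looks roughly like this: the cards table has a queue column marking a card’s state, and queue = 1 is Anki’s learning queue (queue = 3 holds learning cards scheduled past a day boundary). col.db.list returns the first column of each matching row:

# count this deck's cards that are currently in the learning queue
learning_ids = col.db.list("SELECT id FROM cards WHERE did = ? AND queue = 1", dID)
print(len(learning_ids))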

Working with the Anki database on mac OS using Python

Not long ago I ran across this post detailing a method for opening and inspecting the Anki database using Python outside the Anki application environment. However, the approach requires linking to the Anki code base, which is inaccessible on mac OS because the Python code is packaged inside a Mac app on this platform.

The solution I’ve found is inelegant, but it just involves downloading the Anki code base to a location on your file system where you can link to it from your code. You can find the Anki code here on GitHub.
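
In practice that just means putting the downloaded source on Python’s module search path before importing anything from it. A minimal sketch, assuming one of the older, pure-Python Anki releases and a hypothetical clone location:

import sys

# hypothetical location of the downloaded Anki source
sys.path.append("/Users/you/code/anki")

from anki import Collection  # now resolves against the source tree added above

col = Collection("/path/to/collection.anki2")  # hypothetical collection path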

Process automation in building Anki vocabulary cards

For the last two years, I’ve been working through a 10,000-word Russian vocabulary list ordered by frequency. I have a goal of finishing the list before the end of 2019. This requires not only stubborn persistence but also an efficient process for collecting the information that goes onto my Anki flash cards.

My manual process has been to work from a Numbers spreadsheet. As I collect information about each word from several websites, I log it in this table.

More Javascript with Anki

I wrote a piece previously about using JavaScript in Anki cards. Although I haven’t found many uses for this idea, it does come up from time to time, including a recent use case I’m writing about now.

After downloading a popular French frequency list deck for my daughter to use, I noticed that it omits the gender of nouns in the French prompt. In school, I was always taught to memorize the gender along with the noun. For example, when you memorize the word for law, “loi”, you should memorize it with either the definite article “la” or the indefinite article “une”, so that the feminine gender of the noun is inseparable from the noun itself. But this deck has only the bare noun as a prompt, and I was afraid that my daughter would fail to memorize each noun’s gender. JavaScript to the rescue.

An approach to dealing with spurious sensor data in Indigo

Spurious sensor data can wreak havoc in an otherwise finely-tuned home automation system. I use temperature data from an Aeotec MultiSensor 6 to monitor the environment in our greenhouse. Living in Canada, I cannot rely solely on passive systems to maintain the temperature, particularly at night. So, using the temperature and humidity measurements transmitted back to the controller over Z-Wave, I control devices inside the greenhouse that heat and humidify the environment.

Follow the intent.

With Trump, the usual advice to “follow the money” doesn’t work because Congress refuses to force him to disclose his conflicts of interest. As enormous and material as those conflicts must be, I’m just going to focus on what I can see with my own eyes: the man’s apparent intent.

In his public life, Donald Trump has never done anything that did not personally and directly benefit him. Most of us, as we go through life, assemble a collection of acts that are variously self-serving and other-serving. This is the way of life. Normal life. With Trump, not so. Even his meager philanthropic acts are tainted with controversy. The man simply cannot act in a sacrificial way. He is incurable.^[In a campaign event in Fort Dodge, Iowa on November 12, 2015, Trump claimed that rival Ben Carson was “pathological” and that “…if you’re pathological, there’s no cure for that, folks, okay? There’s no cure for that.” Since Trump’s own psychopathology is widely questioned, one wonders if he, too, is incurable. Given that narcissistic personality disorder is almost certainly among the potential diagnoses, he probably is incurable.]