I’ve always despised iTunes and because of that I began to use Clementine several years ago, after trying out lots of unsatisfying music player alternatives on OS X. At the end of the musical year, it’s time to draw some conclusions, like what’s the top 10 songs and albums of the year. Fortunately, Clementine stores all information about your music collection in an SQLite database and hence it is possible to answer these questions with some SQL.
Clementine’s main database is usually called clementine.db and is located in its user configuration directory. On OS X, this is ~/Library/Application Support/Clementine. We should create a copy of the database file before we mess with it, even if we’ll only use SELECT
statements. After that, we can open the database with the Terminal command sqlite3 clementine_copy.db
. First of all, for better display let’s enable “column mode” and display the column headers:
The only database table that’s interesting for us is called “songs”. It contains almost all information about each single song in your music collection. We can have a look at its fields by printing its schema:
For our “top of the year” statistics, the most interesting fields will be playcount
and rating
, the latter describing the number of “stars” in the score rating with a number between 0 and 1.0, or -1 if the song is unrated. The score
field seems to be some kind of internal rating which I couldn’t find out what it’s for or how it’s calculated.
With this information, we can already make a simple top 10 list of songs by play count in year 2015:
However, this is not very satisfying, because many songs have the same play count and the rating is not considered at all. Let’s include the rating and also filter out songs that were not rated. Furthermore, we need to cap the rating to a maximum value of 1.0, because 5-star songs sometimes get higher values than 1.0 (I couldn’t find out why).
We can either primarily sort by play count or by rating, both is not very satisfying. The problem is that the play count is highly dependent on when the song was added to the collection. Songs released earlier in the year will more likely have higher play counts than winter releases. What we need is a “play count by day” since the song was added. We can calculate it by something like “play count / days(now - date_added)”. Fortunately, the “ctime” field contains the date when the song file was created. Although this is not necessarily the date the song was added to the collection, this is the most accurate we can get. Together with SQLite’s date and time functions we can get a “play count by day (ppd)” number and use it for sorting:
The rating is still important, so we should include it in some kind of weighted score. Let’s say this score should be made up by 50% rating and 50% of the “ppd” value so we can write the following:
In order to find out the top albums, we just need to modify the query so that all songs are grouped by the album name and use the average song play count and average rating of the album. The “*1.0” multiplications are only there to force floating-point division.