I’ve always despised iTunes and because of that I began to use Clementine several years ago, after trying out lots of unsatisfying music player alternatives on OS X. At the end of the musical year, it’s time to draw some conclusions, like what’s the top 10 songs and albums of the year. Fortunately, Clementine stores all information about your music collection in an SQLite database and hence it is possible to answer these questions with some SQL.
Clementine’s main database is usually called clementine.db and is located in its user configuration directory. On OS X, this is ~/Library/Application Support/Clementine. We should create a copy of the database file before we mess with it, even if we’ll only use SELECT
statements. After that, we can open the database with the Terminal command sqlite3 clementine_copy.db
. First of all, for better display let’s enable “column mode” and display the column headers:
sqlite> .mode column
sqlite> .headers on
The only database table that’s interesting for us is called “songs”. It contains almost all information about each single song in your music collection. We can have a look at its fields by printing its schema:
sqlite> .schema songs
CREATE TABLE songs (
/* Metadata from taglib */
title TEXT,
album TEXT,
artist TEXT,
albumartist TEXT,
composer TEXT,
track INTEGER,
disc INTEGER,
bpm REAL,
year INTEGER,
genre TEXT,
comment TEXT,
compilation INTEGER,
[...]
/* Information about the file on disk */
directory INTEGER NOT NULL,
filename TEXT NOT NULL,
mtime INTEGER NOT NULL,
ctime INTEGER NOT NULL,
filesize INTEGER NOT NULL
, [...], playcount INTEGER NOT NULL DEFAULT 0, lastplayed INTEGER, rating INTEGER, [...], skipcount INTEGER NOT NULL DEFAULT 0, score INTEGER NOT NULL DEFAULT 0, [...]);
[...]
For our “top of the year” statistics, the most interesting fields will be playcount
and rating
, the latter describing the number of “stars” in the score rating with a number between 0 and 1.0, or -1 if the song is unrated. The score
field seems to be some kind of internal rating which I couldn’t find out what it’s for or how it’s calculated.
With this information, we can already make a simple top 10 list of songs by play count in year 2015:
sqlite> SELECT artist,album,title,playcount,rating
...> FROM songs WHERE year = 2015
...> ORDER BY playcount DESC LIMIT 10;
artist album title playcount rating
---------- ---------- ------------ ---------- ----------
Aloa Input Mars etc Far Away Sun 14 0.8
[...]
Will Samso Animal Han Your Golden 13 0.8
[...]
Aloa Input Mars etc Perry 12 1.3
Will Butle Policy Take My Side 12 0.8
[...]
However, this is not very satisfying, because many songs have the same play count and the rating is not considered at all. Let’s include the rating and also filter out songs that were not rated. Furthermore, we need to cap the rating to a maximum value of 1.0, because 5-star songs sometimes get higher values than 1.0 (I couldn’t find out why).
SELECT artist,album,title,playcount,MIN(rating,1.0) AS capped_rating
FROM songs WHERE year = 2015 AND rating >= 0
ORDER BY capped_rating DESC,playcount DESC LIMIT 10;
We can either primarily sort by play count or by rating, both is not very satisfying. The problem is that the play count is highly dependent on when the song was added to the collection. Songs released earlier in the year will more likely have higher play counts than winter releases. What we need is a “play count by day” since the song was added. We can calculate it by something like “play count / days(now - date_added)”. Fortunately, the “ctime” field contains the date when the song file was created. Although this is not necessarily the date the song was added to the collection, this is the most accurate we can get. Together with SQLite’s date and time functions we can get a “play count by day (ppd)” number and use it for sorting:
SELECT
artist,album,title,playcount,rating,
playcount/(julianday('now')-julianday(datetime(ctime, 'unixepoch'))) AS ppd
FROM songs
WHERE year = 2015
ORDER BY ppd DESC
LIMIT 10;
The rating is still important, so we should include it in some kind of weighted score. Let’s say this score should be made up by 50% rating and 50% of the “ppd” value so we can write the following:
SELECT
artist,album,title,playcount,rating,
playcount/(julianday('now')-julianday(datetime(ctime, 'unixepoch')))*0.5+MIN(rating,1.0)*0.5
AS weighted_score
FROM songs
WHERE year = 2015 and rating >= 0
ORDER BY weighted_score DESC
LIMIT 10;
Villagers Darling Arithmetic Courage
Melanie De Biasio No Deal Remixed I’m Gonna Leave You (C
Portable feat. Lcio DJ-Kicks Surrender (Kosi Edit)
Courtney Barnett Sometimes I Sit And Kim's Caravan
Kamasi Washington The Epic Isabelle
Kurt Vile B'lieve I'm Goin Down Pretty Pimpin
Will Butler Policy Anna
Rocko Schamoni & Le Die Vergessenen Angela
Bob Moses All In All Far From The Tree
Dan Mangan + Blacksm Club Meds Forgetery
In order to find out the top albums, we just need to modify the query so that all songs are grouped by the album name and use the average song play count and average rating of the album. The “*1.0” multiplications are only there to force floating-point division.
SELECT
artist, album,
SUM(playcount)*1.0/COUNT(title) AS avg_playcount,
AVG(MIN(rating,1.0)) AS avg_rating,
SUM(playcount)*1.0/COUNT(title)/(julianday('now')-julianday(datetime(ctime, 'unixepoch')))*0.5
+AVG(MIN(rating,1.0))*0.5
AS weighted_score
FROM songs
WHERE year = 2015 and rating >= 0
GROUP BY album
ORDER BY weighted_score DESC
LIMIT 10;
Viet Cong Viet Cong
Built to Spill Untethered Moon
Dan Mangan + Blacksm Club Meds
Courtney Barnett Sometimes I Sit And Think, And Sometimes I Just Sit
Villagers Darling Arithmetic
Will Butler Policy
Kamasi Washington The Epic
A Tribe Called Knarf Es ist die Wahrheit obwohl es nie passierte
Kurt Vile B'lieve I'm Goin' Down
Ought Sun Coming Down