I’ve always despised iTunes and because of that I began to use Clementine several years ago, after trying out lots of unsatisfying music player alternatives on OS X. At the end of the musical year, it’s time to draw some conclusions, like what’s the top 10 songs and albums of the year. Fortunately, Clementine stores all information about your music collection in an SQLite database and hence it is possible to answer these questions with some SQL.
Clementine’s main database is usually called clementine.db and is located in its user configuration directory. On OS X, this is ~/Library/Application Support/Clementine. We should create a copy of the database file before we mess with it, even if we’ll only use
SELECT statements. After that, we can open the database with the Terminal command
sqlite3 clementine_copy.db. First of all, for better display let’s enable “column mode” and display the column headers:
sqlite> .mode column sqlite> .headers on
The only database table that’s interesting for us is called “songs”. It contains almost all information about each single song in your music collection. We can have a look at its fields by printing its schema:
sqlite> .schema songs CREATE TABLE songs ( /* Metadata from taglib */ title TEXT, album TEXT, artist TEXT, albumartist TEXT, composer TEXT, track INTEGER, disc INTEGER, bpm REAL, year INTEGER, genre TEXT, comment TEXT, compilation INTEGER, [...] /* Information about the file on disk */ directory INTEGER NOT NULL, filename TEXT NOT NULL, mtime INTEGER NOT NULL, ctime INTEGER NOT NULL, filesize INTEGER NOT NULL , [...], playcount INTEGER NOT NULL DEFAULT 0, lastplayed INTEGER, rating INTEGER, [...], skipcount INTEGER NOT NULL DEFAULT 0, score INTEGER NOT NULL DEFAULT 0, [...]); [...]
For our “top of the year” statistics, the most interesting fields will be
rating, the latter describing the number of “stars” in the score rating with a number between 0 and 1.0, or -1 if the song is unrated. The
score field seems to be some kind of internal rating which I couldn’t find out what it’s for or how it’s calculated.
With this information, we can already make a simple top 10 list of songs by play count in year 2015:
sqlite> SELECT artist,album,title,playcount,rating ...> FROM songs WHERE year = 2015 ...> ORDER BY playcount DESC LIMIT 10; artist album title playcount rating ---------- ---------- ------------ ---------- ---------- Aloa Input Mars etc Far Away Sun 14 0.8 [...] Will Samso Animal Han Your Golden 13 0.8 [...] Aloa Input Mars etc Perry 12 1.3 Will Butle Policy Take My Side 12 0.8 [...]
However, this is not very satisfying, because many songs have the same play count and the rating is not considered at all. Let’s include the rating and also filter out songs that were not rated. Furthermore, we need to cap the rating to a maximum value of 1.0, because 5-star songs sometimes get higher values than 1.0 (I couldn’t find out why).
SELECT artist,album,title,playcount,MIN(rating,1.0) AS capped_rating FROM songs WHERE year = 2015 AND rating >= 0 ORDER BY capped_rating DESC,playcount DESC LIMIT 10;
We can either primarily sort by play count or by rating, both is not very satisfying. The problem is that the play count is highly dependent on when the song was added to the collection. Songs released earlier in the year will more likely have higher play counts than winter releases. What we need is a “play count by day” since the song was added. We can calculate it by something like “play count / days(now - date_added)”. Fortunately, the “ctime” field contains the date when the song file was created. Although this is not necessarily the date the song was added to the collection, this is the most accurate we can get. Together with SQLite’s date and time functions we can get a “play count by day (ppd)” number and use it for sorting:
SELECT artist,album,title,playcount,rating, playcount/(julianday('now')-julianday(datetime(ctime, 'unixepoch'))) AS ppd FROM songs WHERE year = 2015 ORDER BY ppd DESC LIMIT 10;
The rating is still important, so we should include it in some kind of weighted score. Let’s say this score should be made up by 50% rating and 50% of the “ppd” value so we can write the following:
SELECT artist,album,title,playcount,rating, playcount/(julianday('now')-julianday(datetime(ctime, 'unixepoch')))*0.5+MIN(rating,1.0)*0.5 AS weighted_score FROM songs WHERE year = 2015 and rating >= 0 ORDER BY weighted_score DESC LIMIT 10;
Villagers Darling Arithmetic Courage Melanie De Biasio No Deal Remixed I’m Gonna Leave You (C Portable feat. Lcio DJ-Kicks Surrender (Kosi Edit) Courtney Barnett Sometimes I Sit And Kim's Caravan Kamasi Washington The Epic Isabelle Kurt Vile B'lieve I'm Goin Down Pretty Pimpin Will Butler Policy Anna Rocko Schamoni & Le Die Vergessenen Angela Bob Moses All In All Far From The Tree Dan Mangan + Blacksm Club Meds Forgetery
In order to find out the top albums, we just need to modify the query so that all songs are grouped by the album name and use the average song play count and average rating of the album. The “*1.0” multiplications are only there to force floating-point division.
SELECT artist, album, SUM(playcount)*1.0/COUNT(title) AS avg_playcount, AVG(MIN(rating,1.0)) AS avg_rating, SUM(playcount)*1.0/COUNT(title)/(julianday('now')-julianday(datetime(ctime, 'unixepoch')))*0.5 +AVG(MIN(rating,1.0))*0.5 AS weighted_score FROM songs WHERE year = 2015 and rating >= 0 GROUP BY album ORDER BY weighted_score DESC LIMIT 10;
Viet Cong Viet Cong Built to Spill Untethered Moon Dan Mangan + Blacksm Club Meds Courtney Barnett Sometimes I Sit And Think, And Sometimes I Just Sit Villagers Darling Arithmetic Will Butler Policy Kamasi Washington The Epic A Tribe Called Knarf Es ist die Wahrheit obwohl es nie passierte Kurt Vile B'lieve I'm Goin' Down Ought Sun Coming Down