I have created a small visualization tool for animating integer sequences (e.g. the decimal expansion of π, e or other constants) that are available in the On-Line Encyclopedia of Integer Sequences. It was just a training project to learn the excellent visualization library d3.js, but it turned out that some of the sequences create quite beautiful animations with this tool. Hence I decided to put it online so you can play around with it, and also to publish the source code on GitHub.
Cross joins, which form the Cartesian product of two datasets, are quite a useful operation when you need to run calculations on all possible combinations of the rows in these datasets, for example calculating the age difference between each pair of people drawn from two groups. Another example would be calculating the distance between several origin cities and several destination cities. This is a good use case for pandas, which helps you work with large datasets efficiently in Python. But although the library supports most common join operations on DataFrames, it lacked support for cross joins at the time of writing. However, cross joins can be created with a little workaround using pandas.merge(), which I will demonstrate with a small example.
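As a minimal sketch of such a workaround (not necessarily the exact code from the full post), both DataFrames can be given a temporary column with the same constant value and then merged on it, so that every row on the left matches every row on the right. The helper name cross_join, the column name _key and the sample data are assumptions made for illustration.

```python
import pandas as pd


def cross_join(left, right):
    """Return the Cartesian product of two DataFrames (hypothetical helper)."""
    # A constant key makes every left row match every right row.
    merged = left.assign(_key=1).merge(
        right.assign(_key=1), on="_key", suffixes=("_left", "_right")
    )
    # The temporary key is no longer needed after the merge.
    return merged.drop(columns="_key")


# Made-up sample data: two groups of people.
group_a = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 45]})
group_b = pd.DataFrame({"name": ["Carol", "Dave", "Eve"], "age": [25, 50, 38]})

pairs = cross_join(group_a, group_b)
pairs["age_diff"] = pairs["age_left"] - pairs["age_right"]
print(pairs)
```

The result has 2 × 3 = 6 rows, one per combination. Recent pandas releases (1.2 and later) also accept how="cross" in merge, which makes the temporary key unnecessary there.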
I needed an algorithm for clustering multidimensional vectors in Python, and I remembered that I had once implemented the Linde-Buzo-Gray algorithm, also known as the Generalized Lloyd algorithm (for a better explanation see this website), in Java. Since it does the job well, I decided to implement it in Python, too. I put the result on GitHub along with a small IPython notebook with which you can visualize the results of the clustering process. The red dots are the detected clusters of the input data (blue crosses).
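For readers unfamiliar with the algorithm, here is a rough pure-Python sketch of the general idea. It is not the implementation from the GitHub repository. It starts with a single codevector (the mean of all data), repeatedly splits every codevector into two slightly shifted copies, and refines them with Lloyd iterations. The function names, the additive split by epsilon and the stopping rule are assumptions chosen for brevity.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg(data, codebook_size, epsilon=0.01, tolerance=1e-6, max_iterations=100):
    """Sketch of LBG clustering; the codebook doubles in every round, so the
    result has the smallest power-of-two size >= codebook_size."""
    codebook = [mean_vector(data)]
    while len(codebook) < codebook_size:
        # Splitting step: replace each codevector by two shifted copies.
        codebook = [
            [value + sign * epsilon for value in codevector]
            for codevector in codebook
            for sign in (+1, -1)
        ]
        previous = float("inf")
        for _ in range(max_iterations):
            # Assign every data vector to its nearest codevector.
            cells = [[] for _ in codebook]
            distortion = 0.0
            for vector in data:
                distances = [squared_distance(vector, c) for c in codebook]
                nearest = distances.index(min(distances))
                cells[nearest].append(vector)
                distortion += distances[nearest]
            distortion /= len(data)
            # Move each codevector to the centroid of its cell;
            # an empty cell keeps its previous codevector.
            codebook = [
                mean_vector(cell) if cell else codevector
                for cell, codevector in zip(cells, codebook)
            ]
            # Stop once the average distortion no longer improves noticeably.
            if previous - distortion <= tolerance * max(distortion, 1e-12):
                break
            previous = distortion
    return codebook


# Made-up sample data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers = lbg(points, 2)
print(sorted(centers))
```

For this toy input the two codevectors end up at the centroids of the two groups. Because the split shifts all components in the same direction, data that is separated only along a perpendicular direction may need a different split strategy.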
Sometimes it is necessary to normalize a string by removing all kinds of diacritics (accents), underlines or other “marks” that can be attached to characters in Unicode. This is important, for example, for full-text search or text mining. Transliteration to ASCII characters is not an option because it would also eliminate, for example, Greek, Russian or other non-Latin characters. With the help of the PyICU library, the task can easily be achieved:
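    from icu import UnicodeString, Transliterator, UTransDirection

    def normalize_string(s):
        # Convert the Python string to an ICU UnicodeString.
        u = UnicodeString(s)
        # Decompose (NFD), remove all marks, then recompose (NFC).
        t = Transliterator.createInstance("NFD; [:M:] Remove; NFC", UTransDirection.FORWARD)
        t.transliterate(u)  # modifies u in place
        return str(u)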
After converting a Python string to an ICU UnicodeString object, we can apply a transliteration operation that is defined as "NFD; [:M:] Remove; NFC". This operation means that the Unicode string is first decomposed (NFD), then all characters in the class “marks” are removed (“[:M:] Remove”) and finally the string is recomposed (NFC). At the end, the UnicodeString object is converted back to a Python string.
After wrapping this code in a function (called normalize_string here), we can use it as follows and see that it works (underlines may not be displayed correctly in your browser):
    normalize_string('café')
    > 'cafe'
    normalize_string('αυτῷ μνήμ̳η̳ς')
    > 'αυτω μνημης'
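The same three steps (decompose, remove marks, recompose) can also be illustrated without PyICU, using only Python's standard unicodedata module. This is a sketch of the idea rather than a drop-in replacement for the ICU transliterator, and the function name strip_marks is made up for this example.

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks from text (hypothetical helper name)."""
    # Step 1: canonical decomposition, e.g. "é" becomes "e" + combining acute.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop every character whose general category is a mark (Mn, Mc, Me).
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )
    # Step 3: recompose whatever is left.
    return unicodedata.normalize("NFC", without_marks)


print(strip_marks("café"))  # cafe
```

Characters without attached marks, such as plain Greek or Cyrillic letters, pass through unchanged, which is the point of not transliterating to ASCII.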
I’ve always despised iTunes, and because of that I began to use Clementine several years ago, after trying out lots of unsatisfying music player alternatives on OS X. At the end of the musical year, it’s time to draw some conclusions, such as what the top 10 songs and albums of the year were. Fortunately, Clementine stores all information about your music collection in an SQLite database, and hence it is possible to answer these questions with some SQL.
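To give an idea of what such queries could look like, here is a small sketch using Python's sqlite3 module. The table name songs and the columns artist, album, title and playcount are assumptions about the database layout, so check the actual schema of your Clementine database before relying on them. The example runs against a made-up in-memory table instead of a real music library, and it ranks by total play count rather than by plays within a single year.

```python
import sqlite3


def top_songs(conn, limit=10):
    """Most played songs, assuming a songs table with a playcount column."""
    query = """
        SELECT artist, title, playcount
        FROM songs
        WHERE playcount > 0
        ORDER BY playcount DESC, title ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def top_albums(conn, limit=10):
    """Most played albums, summing the play counts of their songs."""
    query = """
        SELECT artist, album, SUM(playcount) AS plays
        FROM songs
        GROUP BY artist, album
        HAVING SUM(playcount) > 0
        ORDER BY plays DESC, album ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Made-up sample data in an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songs (artist TEXT, album TEXT, title TEXT, playcount INTEGER)"
)
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?, ?)",
    [
        ("Artist A", "Album X", "Song 1", 12),
        ("Artist A", "Album X", "Song 2", 7),
        ("Artist B", "Album Y", "Song 3", 15),
        ("Artist B", "Album Y", "Song 4", 0),
    ],
)

print(top_songs(conn, limit=2))
print(top_albums(conn))
```

To run the queries against a real library, the in-memory connection would be replaced by a connection to the database file, preferably a copy so that the player's data is not touched.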