Welcome to the personal website of Markus Konrad. Here you can find blog posts about computer science topics, as well as information about my own open-source software projects.

Blog Posts

A circular integer sequences visualization tool done with d3.js

Posted at 05 Nov 2016
Tags: visualization, javascript, d3js

Integer sequence OEIS A002262

I have created a small visualization tool for animating integer sequences (e.g. the decimal expansion of π, e or other constants) that are available on the On-Line Encyclopedia of Integer Sequences (OEIS). It was just a training project to learn the excellent visualization library d3.js, but it turned out that some of the sequences create quite beautiful animations with this tool. Hence I decided to put it online so you can play around with it, and to publish the source code on GitHub.

Why You Should Never Use MongoDB

Posted at 27 Sep 2016
Tags: databases, nosql, link

Cross join / cartesian product between pandas DataFrames

Posted at 16 Apr 2016
Tags: python, pandas, datascience

Cross joins, which form the cartesian product of two datasets, are a useful operation when you need to run calculations on all possible combinations of the rows in these datasets, for example calculating the age difference between each pair of people from two groups. Another example would be calculating the distance between several origin cities and several destination cities. This is a good use case for pandas, which helps you work with large datasets efficiently in Python. Although the library supports the most common join operations on DataFrames, it lacks support for cross joins. However, cross joins can be created with a small workaround using pandas.merge(), which I will demonstrate with a small example.
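A minimal sketch of one common form of this workaround, assuming a temporary constant key column: both DataFrames get the same helper column, pandas.merge() joins on it, and the column is dropped afterwards. The function name cross_join, the column name _cross_key and the sample data are made up for illustration and are not taken from the full post.

```python
import pandas as pd


def cross_join(left, right):
    """Return the cartesian product of two DataFrames via a constant helper key."""
    key = "_cross_key"  # hypothetical helper column, assumed not to exist in either frame
    merged = pd.merge(left.assign(**{key: 1}), right.assign(**{key: 1}), on=key)
    return merged.drop(columns=key)


# Made-up sample data: two groups of people with their ages
group_a = pd.DataFrame({"name_a": ["Ann", "Bob"], "age_a": [30, 41]})
group_b = pd.DataFrame({"name_b": ["Cem", "Dia", "Eve"], "age_b": [25, 38, 52]})

# Every row of group_a is paired with every row of group_b (2 x 3 = 6 rows)
pairs = cross_join(group_a, group_b)
pairs["age_diff"] = (pairs["age_a"] - pairs["age_b"]).abs()
print(pairs)
```

Newer pandas releases (1.2 and later) also accept how="cross" in pandas.merge(), which makes the helper column unnecessary there.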

Read on…

Most software already has a “golden key” backdoor: the system update

Posted at 28 Feb 2016
Tags: link, security

A Python script to convert BibTeX to BibJSON

Posted at 18 Feb 2016
Tags: python, bibtex

I made a small Python script that converts BibTeX to BibJSON. It uses BibtexParser to parse the input BibTeX data and then constructs the JSON format as proposed by the BibJSON draft.
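The mapping step can be sketched roughly as follows. This is not the script itself: it assumes an entry has already been parsed into a flat dict with "ENTRYTYPE" and "ID" keys (the shape BibtexParser 1.x produces), and it assumes the BibJSON conventions of an author list of objects with a "name" field and a journal object with a "name" field. The function name entry_to_bibjson and the sample entry are made up.

```python
import json


def entry_to_bibjson(entry):
    """Convert one parsed BibTeX entry (a flat dict) into a BibJSON-style record."""
    record = {}
    for key, value in entry.items():
        if key == "ENTRYTYPE":
            record["type"] = value
        elif key == "ID":
            record["id"] = value
        elif key == "author":
            # BibTeX separates multiple authors with " and "
            record["author"] = [{"name": name.strip()} for name in value.split(" and ")]
        elif key == "journal":
            record["journal"] = {"name": value}
        else:
            # All other fields are copied unchanged
            record[key] = value
    return record


# Made-up entry in the assumed parsed form
entry = {
    "ENTRYTYPE": "article",
    "ID": "doe2016",
    "author": "Doe, Jane and Roe, Richard",
    "title": "An Example Title",
    "journal": "Journal of Examples",
    "year": "2016",
}
record = entry_to_bibjson(entry)
print(json.dumps(record, indent=2))
```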

An equation that debunks conspiracy theories

Posted at 18 Feb 2016
Tags: link, science, probability

Unexpected gender bias on GitHub

Posted at 14 Feb 2016
Tags: link, opensource, github, gender

Python implementation of Linde-Buzo-Gray algorithm

Posted at 10 Jan 2016
Tags: python, clustering, datascience, imageproc

I needed an algorithm for clustering multidimensional vectors in Python and remembered that I had once implemented the Linde-Buzo-Gray algorithm, also known as the Generalized Lloyd algorithm, in Java (for a better explanation see this website). Since it does the job well, I decided to implement it in Python. I put the result on GitHub along with a small IPython notebook with which you can visualize the results of the clustering process. The red dots are the detected clusters of the input data (blue crosses):

IPython matplotlib output example of LBG algorithm
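For readers unfamiliar with the algorithm, here is a compact pure-Python sketch of the general idea, not the code from the GitHub repository: start with a single codevector (the mean of all points), split every codevector into two slightly perturbed copies, and refine the codebook with Lloyd iterations until the average distortion stops improving. The function names, the additive perturbation eps and the sample points are assumptions made for this sketch.

```python
import math


def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, codebook, tol=1e-6, max_iter=100):
    """Refine a codebook: assign points to the nearest codevector, then re-centre."""
    prev = math.inf
    for _ in range(max_iter):
        clusters = [[] for _ in codebook]
        distortion = 0.0
        for p in points:
            dists = [sq_dist(p, c) for c in codebook]
            best = dists.index(min(dists))
            clusters[best].append(p)
            distortion += dists[best]
        distortion /= len(points)
        # An empty cluster keeps its previous codevector
        codebook = [mean_vector(cl) if cl else c for cl, c in zip(clusters, codebook)]
        # Stop once the relative improvement in distortion is negligible
        if prev - distortion <= tol * max(distortion, 1e-12):
            break
        prev = distortion
    return codebook


def lbg(points, size, eps=0.01):
    """Grow a codebook by repeated splitting; size is assumed to be a power of two."""
    codebook = [mean_vector(points)]
    while len(codebook) < size:
        # Split every codevector into two copies shifted by +eps and -eps
        codebook = [[x + s for x in c] for c in codebook for s in (eps, -eps)]
        codebook = lloyd(points, codebook)
    return codebook


# Made-up sample data: two well-separated groups of 2D points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = lbg(points, 2)
print(codebook)
```

With this input, the two resulting codevectors end up at the centres of the two groups.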

Removing diacritics, underlines and other marks from unicode strings in Python

Posted at 05 Jan 2016
Tags: icu, python, unicode

Sometimes it is necessary to normalize a string by removing all kinds of diacritics (accents), underlines or other “marks” that can be attached to characters in Unicode. This is important, for example, for full-text search or text mining. Transliteration to ASCII characters is not an option because it would also eliminate Greek, Russian or other non-Latin characters. With the help of the PyICU library, the task can easily be achieved:

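from icu import UnicodeString, Transliterator, UTransDirection

def normalize_string(s):
    # Decompose (NFD), remove all marks ([:M:]), then re-compose (NFC)
    u = UnicodeString(s)
    t = Transliterator.createInstance("NFD; [:M:] Remove; NFC", UTransDirection.FORWARD)
    t.transliterate(u)  # changes u in place
    return str(u)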

After converting a Python string to an ICU UnicodeString object, we can apply a transliteration operation that is defined as "NFD; [:M:] Remove; NFC". This operation means that the Unicode string is first decomposed (NFD), then all characters in the character class “marks” are removed ("[:M:] Remove") and finally the string is re-composed (NFC). At the end, the UnicodeString object is converted back to a Python str.

With these steps wrapped in a function named normalize_string, we can use it as follows and see that it works (underlines may not be displayed correctly in your browser):

normalize_string('café')
> 'cafe'
normalize_string('αυτῷ μνήμ̳η̳ς')
> 'αυτω μνημης'
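If PyICU is not available, the same decompose, remove, re-compose pipeline can be approximated with the standard library's unicodedata module. This is a swapped-in technique and not the PyICU approach shown above: it drops every character whose general category starts with "M" (the mark categories Mn, Mc and Me). The function name strip_marks is made up for this sketch.

```python
import unicodedata


def strip_marks(s):
    """Remove combining marks from a string while keeping the base characters."""
    # NFD splits characters such as 'é' into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", s)
    # Keep only characters that are not in a mark category (Mn, Mc, Me)
    kept = "".join(ch for ch in decomposed if not unicodedata.category(ch).startswith("M"))
    # NFC re-composes whatever can still be composed
    return unicodedata.normalize("NFC", kept)


print(strip_marks("café"))  # cafe
```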

Finding out annual music favorites from Clementine music player

Posted at 30 Dec 2015
Tags: sql, sqlite, music, clementine

I’ve always despised iTunes, so I began to use Clementine several years ago, after trying out lots of unsatisfying alternative music players on OS X. At the end of the musical year, it’s time to draw some conclusions, like what the top 10 songs and albums of the year are. Fortunately, Clementine stores all information about your music collection in an SQLite database, and hence it is possible to answer these questions with some SQL.
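To give a rough idea of what such queries can look like, here is a self-contained sketch using Python's sqlite3 module and an in-memory database. The table name songs, its columns (title, artist, album, year, playcount) and the sample rows are assumptions for illustration only and are not taken from Clementine's actual schema. The sketch also assumes that "of the year" means music released in a given year, ranked by play count.

```python
import sqlite3

# In-memory stand-in for the music database (hypothetical schema and data)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songs (title TEXT, artist TEXT, album TEXT, year INTEGER, playcount INTEGER)"
)
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?, ?, ?)",
    [
        ("Song One", "Artist A", "Album X", 2015, 42),
        ("Song Two", "Artist A", "Album X", 2015, 17),
        ("Song Three", "Artist B", "Album Y", 2015, 30),
        ("Old Song", "Artist C", "Album Z", 2009, 99),
    ],
)


def top_songs(conn, year, limit=10):
    """Most played songs released in the given year."""
    query = """
        SELECT artist, title, playcount
        FROM songs
        WHERE year = ?
        ORDER BY playcount DESC, title
        LIMIT ?
    """
    return conn.execute(query, (year, limit)).fetchall()


def top_albums(conn, year, limit=10):
    """Albums released in the given year, ranked by the summed play count of their songs."""
    query = """
        SELECT artist, album, SUM(playcount) AS plays
        FROM songs
        WHERE year = ?
        GROUP BY artist, album
        ORDER BY plays DESC, album
        LIMIT ?
    """
    return conn.execute(query, (year, limit)).fetchall()


print(top_songs(conn, 2015))
print(top_albums(conn, 2015))
```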

Read on…

 