Welcome to the personal website of Markus Konrad. Here you can find blog posts about computer science related topics, as well as information about my own open-source software projects.

Blog Posts

A synthesized music video with MoviePy

Posted at 29 Jul 2018
Tags: music, video, visualization, synthesis, python

After getting inspired by a blog post about using Voronoi tessellation to get interesting visual effects and another blog post on onset detection and video synthesis with Python, I decided to combine both for a music video of the band I’m playing in, kiriloff. Here it is:

MoviePy provides the video editing functions in Python, and aubio makes it possible to detect when notes are played (onset detection). Input video clips then get the special “spider web effect” by constructing Voronoi diagrams according to the detected onsets and their amplitude. For this, I convert each input video frame to a binary image and sample some of the white pixels; the sample size depends on the onset amplitude, so more pixels are sampled in the louder parts. NumPy’s where function yields the coordinates of the sampled pixels, and SciPy constructs a Voronoi diagram from those coordinates. The resulting lines are rendered and MoviePy writes the frames to the video file.
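The sampling and Voronoi steps described above might look roughly like the following sketch. It assumes the frame is already a binary NumPy array and that the onset amplitude has been scaled to the range 0 to 1. The function names, the max_points parameter and the linear mapping from amplitude to sample size are illustrative assumptions, not taken from the project's code. Onset detection with aubio, rendering and the MoviePy parts are left out.

```python
import numpy as np
from scipy.spatial import Voronoi


def sample_white_pixels(binary_frame, amplitude, max_points=200, rng=None):
    """Sample (x, y) coordinates of white pixels; louder onsets give more points."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.where(binary_frame)        # indices of all white pixels
    coords = np.column_stack((cols, rows))     # store them as (x, y) pairs
    # Assumed mapping: the sample size grows linearly with the amplitude in [0, 1].
    # At least 4 points are requested because a 2D Voronoi diagram needs them.
    n_samples = min(len(coords), max(4, int(amplitude * max_points)))
    chosen = rng.choice(len(coords), size=n_samples, replace=False)
    return coords[chosen]


def voronoi_segments(points):
    """Return the finite Voronoi edges as pairs of (x, y) end points."""
    vor = Voronoi(points)
    segments = []
    for start, end in vor.ridge_vertices:
        if start >= 0 and end >= 0:            # -1 marks an edge that runs to infinity
            segments.append((vor.vertices[start], vor.vertices[end]))
    return segments


# Toy frame: a white rectangle on a black background.
frame = np.zeros((60, 80), dtype=bool)
frame[10:50, 20:60] = True

rng = np.random.default_rng(0)
quiet = sample_white_pixels(frame, amplitude=0.1, rng=rng)
loud = sample_white_pixels(frame, amplitude=0.9, rng=rng)
segments = voronoi_segments(loud)
```

Each entry in segments is a pair of end points that could be drawn as a line onto the output frame.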

For more details, see the project page and the GitHub repository.

A circular integer sequences visualization tool done with d3.js

Posted at 05 Nov 2016
Tags: visualization, javascript, d3js

Integer sequence OEIS A002262

I have created a small visualization tool for animating integer sequences (e.g. the decimal expansion of π, e or other constants) that are available on the On-Line Encyclopedia of Integer Sequences. It was just a training project to learn the excellent visualization library d3.js, but it turned out that some of the sequences create quite beautiful animations with this tool. Hence I decided to put it online so you can play around with it, and to publish the source code on GitHub.

Cross join / cartesian product between pandas DataFrames

Posted at 16 Apr 2016
Tags: python, pandas, datascience

Cross joins, which form the cartesian product between two datasets, are a quite useful operation when you need to run calculations on all possible combinations of the rows in these datasets, for example calculating the age difference between each pair of persons from two groups of people. Another example would be calculating the distance between several origin cities and several destination cities. This is a good use case for pandas, which helps to work with large datasets efficiently in Python. But although the library supports most common join operations on DataFrames, it lacks support for cross joins. However, cross joins can be created with a little workaround using pandas.merge(), which I will demonstrate with a small example.
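A common form of such a workaround, which may differ in detail from the one in the full post, is to give both DataFrames a constant helper column and merge on it. The sketch below uses the age difference example from above. The helper column name _cross_key, the function name cross_join and the sample data are made up for illustration. Newer pandas releases (1.2 and later) also accept how="cross" in merge() directly.

```python
import pandas as pd


def cross_join(left, right, suffixes=("_a", "_b")):
    """Cartesian product of two DataFrames via a constant helper key."""
    key = "_cross_key"  # hypothetical helper column, assumed not to exist in the inputs
    merged = pd.merge(
        left.assign(**{key: 1}),
        right.assign(**{key: 1}),
        on=key,
        suffixes=suffixes,
    )
    return merged.drop(columns=key)


# Made-up sample data: two groups of people.
group_a = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 41]})
group_b = pd.DataFrame({"name": ["Cem", "Dee", "Eve"], "age": [25, 35, 52]})

pairs = cross_join(group_a, group_b)
pairs["age_diff"] = (pairs["age_a"] - pairs["age_b"]).abs()
print(pairs)
```

Because every row carries the same key value, the merge matches each row of the first DataFrame with each row of the second, giving 2 × 3 = 6 rows here.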

Read on…

A Python script to convert BibTeX to BibJSON

Posted at 18 Feb 2016
Tags: python, bibtex

I made a small Python script that converts BibTeX to BibJSON. It uses BibtexParser to parse the input BibTeX data and then constructs the JSON format as proposed by the BibJSON draft.
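The conversion step could look roughly like the following sketch. It is not the script itself: it starts from already parsed entries and assumes each entry is a dict with the entry type under ENTRYTYPE and the citation key under ID (the convention used by BibtexParser 1.x). The mapping to type, id and a list of author objects follows my reading of the BibJSON draft; the function name to_bibjson and the sample entry are made up.

```python
import json


def to_bibjson(entries):
    """Convert parsed BibTeX entries (one dict per entry) to a BibJSON-style collection."""
    records = []
    for entry in entries:
        record = {}
        for field, value in entry.items():
            if field == "ENTRYTYPE":
                record["type"] = value
            elif field == "ID":
                record["id"] = value
            elif field == "author":
                # BibTeX separates authors with " and "; emit one object per author.
                record["author"] = [{"name": name.strip()} for name in value.split(" and ")]
            else:
                record[field] = value  # copy all other fields unchanged
        records.append(record)
    return {"records": records}


# Made-up entry in the assumed parser output format.
entries = [
    {
        "ENTRYTYPE": "article",
        "ID": "doe2015",
        "author": "Doe, Jane and Roe, Richard",
        "title": "An Example Article",
        "year": "2015",
    }
]

collection = to_bibjson(entries)
print(json.dumps(collection, indent=2))
```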

Python implementation of Linde-Buzo-Gray algorithm

Posted at 10 Jan 2016
Tags: python, clustering, datascience, imageproc

I needed an algorithm for clustering multidimensional vectors in Python and remembered that I had once implemented the Linde-Buzo-Gray algorithm, also known as the Generalized Lloyd algorithm, in Java (for a better explanation see this website). Since it does the job well, I decided to implement it in Python, too. I put the result on GitHub along with a small IPython notebook with which you can visualize the results of the clustering process. The red dots are the detected clusters of the input data (blue crosses):

IPython matplotlib output example of LBG algorithm
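For readers unfamiliar with the algorithm, here is a compact pure-Python sketch of the usual textbook formulation: start with the mean of all vectors as the only code vector, split every code vector into two slightly shifted copies, refine them with Lloyd iterations until the distortion stops improving, and repeat until the codebook has the desired size. It is not the code from the repository; the function names, the additive split and the default parameters are assumptions for illustration.

```python
def _mean(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def _sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg(vectors, codebook_size, epsilon=0.01, tol=1e-6, max_iter=100):
    """Return a codebook; codebook_size is assumed to be a power of two."""
    codebook = [_mean(vectors)]
    while len(codebook) < codebook_size:
        # Splitting step: replace each code vector by two shifted copies.
        codebook = [[c + shift for c in code]
                    for code in codebook
                    for shift in (epsilon, -epsilon)]
        previous = float("inf")
        for _ in range(max_iter):
            # Assign every input vector to its nearest code vector.
            clusters = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                dists = [_sqdist(v, code) for code in codebook]
                nearest = dists.index(min(dists))
                clusters[nearest].append(v)
                distortion += dists[nearest]
            # Move each code vector to the mean of its cluster (keep it if empty).
            codebook = [_mean(cluster) if cluster else code
                        for cluster, code in zip(clusters, codebook)]
            # Stop when the distortion no longer improves noticeably.
            if previous - distortion <= tol * max(distortion, 1e-12):
                break
            previous = distortion
    return codebook


# Made-up data: two small, well separated groups of 2D points.
data = [[0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11]]
centers = lbg(data, codebook_size=2)
print(sorted(centers))
```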

Removing diacritics, underlines and other marks from unicode strings in Python

Posted at 05 Jan 2016
Tags: icu, python, unicode

Sometimes it is necessary to normalize a string by removing all kinds of diacritics (accents), underlines or other “marks” that can be attached to characters in Unicode. This is important, for example, for full-text search or text mining. Transliteration to ASCII characters is not an option because this would, for example, also eliminate Greek, Russian or other non-Latin characters. With the help of the PyICU library, the task can easily be achieved:

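from icu import UnicodeString, Transliterator, UTransDirection

def normalize_string(s):
    # convert the Python str to an ICU UnicodeString, which is modified in place
    u = UnicodeString(s)
    # decompose (NFD), remove all marks ([:M:]), then re-compose (NFC)
    t = Transliterator.createInstance("NFD; [:M:] Remove; NFC", UTransDirection.FORWARD)
    t.transliterate(u)
    return str(u)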

After converting a Python string to an ICU UnicodeString object, we can apply a transliteration operation that is defined as "NFD; [:M:] Remove; NFC". This operation means that the Unicode string is first decomposed (NFD), then all characters of the class “marks” are removed ("[:M:] Remove") and finally the string is re-composed (NFC). At the end, the UnicodeString object is converted back to a Python str.

With this code wrapped in a function named normalize_string, we can use it as follows and see that it works (underlines may not be displayed correctly in your browser):

normalize_string('café')
> 'cafe'
normalize_string('αυτῷ μνήμ̳η̳ς')
> 'αυτω μνημης'
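If PyICU is not available, the same three steps (decompose, remove marks, re-compose) can be sketched with Python's standard unicodedata module. This is an alternative for illustration, not the PyICU approach shown above; the function name strip_marks is made up. Characters whose Unicode general category starts with "M" are the marks that [:M:] refers to.

```python
import unicodedata


def strip_marks(s):
    """Remove all combining marks from a string using only the standard library."""
    decomposed = unicodedata.normalize("NFD", s)      # split base characters and marks
    without_marks = "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")  # drop Mn, Mc and Me characters
    )
    return unicodedata.normalize("NFC", without_marks)   # re-compose what is left


print(strip_marks("café"))       # cafe
print(strip_marks("Ångström"))   # Angstrom
```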

Finding out annual music favorites from Clementine music player

Posted at 30 Dec 2015
Tags: sql, sqlite, music, clementine

I’ve always despised iTunes and because of that I began to use Clementine several years ago, after trying out lots of unsatisfying music player alternatives on OS X. At the end of the musical year, it’s time to draw some conclusions, like what the top 10 songs and albums of the year are. Fortunately, Clementine stores all information about your music collection in an SQLite database and hence it is possible to answer these questions with some SQL.
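To give an idea of the kind of query involved, here is a self-contained sketch that runs against a toy in-memory SQLite table. The table and column names (songs, year_added, playcount and so on) and all rows are invented for this example and are not Clementine's actual schema, so the queries would need to be adapted to the real database.

```python
import sqlite3

# Toy stand-in for a music library; the schema is an assumption, not Clementine's.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songs (title TEXT, artist TEXT, album TEXT, "
    "year_added INTEGER, playcount INTEGER)"
)
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?, ?, ?)",
    [
        ("Song 1", "Artist A", "Album X", 2015, 40),
        ("Song 2", "Artist A", "Album X", 2015, 25),
        ("Song 3", "Artist B", "Album Y", 2015, 55),
        ("Song 4", "Artist B", "Album Z", 2014, 90),
    ],
)


def top_songs(conn, year, limit=10):
    """Most played songs among those added in the given year."""
    query = (
        "SELECT artist, title, playcount FROM songs "
        "WHERE year_added = ? ORDER BY playcount DESC, title LIMIT ?"
    )
    return conn.execute(query, (year, limit)).fetchall()


def top_albums(conn, year, limit=10):
    """Albums added in the given year, ranked by the summed play counts of their songs."""
    query = (
        "SELECT artist, album, SUM(playcount) AS plays FROM songs "
        "WHERE year_added = ? GROUP BY artist, album "
        "ORDER BY plays DESC, album LIMIT ?"
    )
    return conn.execute(query, (year, limit)).fetchall()


print(top_songs(conn, 2015))
print(top_albums(conn, 2015))
```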

Read on…

Scraping data from Facebook groups and pages

Posted at 23 Dec 2015
Tags: facebook, scraping, nlp, python, php

I’ve written a small set of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages. It is available on GitHub. It can be used to collect posts and comments (including their hierarchical structure and some metadata) from public groups and pages automatically. For closed groups, it is necessary to save the HTML output manually and parse it with a provided Python script.

After collecting the data, statistical analyses can be performed on it. For now, identifying and counting nouns as described in a previous blog post is implemented.

Extracting Nouns in German texts with Python using Pattern library and libleipzig

Posted at 13 Dec 2015
Tags: python, nlp, pattern, libleipzig

Extracting nouns in their base form (lemmata) from German texts can be easily done using Python and the Pattern library, especially its pattern.de module. However, using pattern.de alone often leads to unsatisfying results, because the base form is often not determined correctly. The results can be improved using libleipzig, which queries the Wortschatz Uni Leipzig database.

Read on…

Finally valid and free SSL certificates

Posted at 08 Dec 2015
Tags: ssl, server

After years of self-signed certificates, I finally have valid, trusted SSL certificates for my domains thanks to Let’s Encrypt. Although the service is still in beta, getting the certificates issued worked flawlessly using the letsencrypt command line tool and the short instructions from the official documentation.

Now it’s no problem to access HTTP, CalDAV, email or other services on my server without being bugged by “invalid certificate” warnings or, in the case of many mobile operating systems, needing to install a self-signed certificate manually on the client device.

 