From now on, I will post links here to articles that I write for the WZB Data Science Blog from time to time. They will be prefixed with “[WZB]”.
Cross joins, which form the Cartesian product of two datasets, are useful whenever you need to run a calculation on every possible combination of rows from those datasets, for example calculating the age difference between each pair of people drawn from two groups. Another example is calculating the distances between several origin cities and several destination cities. This is a good use case for pandas, which helps you work with large datasets efficiently in Python. Although the library supports the most common join operations on DataFrames, at the time of writing it lacks built-in support for cross joins. They can, however, be created with a small workaround using pandas.merge(), which I will demonstrate with a small example.
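
To make the idea concrete before going into detail, here is a minimal sketch of one common form of this workaround. It assumes that both DataFrames get a temporary constant key column, so that merging on it pairs every row of one with every row of the other. The city data and the column name `_key` are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: origin and destination cities.
origins = pd.DataFrame({"origin": ["Berlin", "Hamburg"]})
destinations = pd.DataFrame({"destination": ["Munich", "Cologne", "Dresden"]})

# Give every row in both DataFrames the same temporary key. Merging on
# that key then matches each origin row with each destination row,
# which is exactly the Cartesian product.
cross = pd.merge(
    origins.assign(_key=1),
    destinations.assign(_key=1),
    on="_key",
).drop(columns="_key")  # the helper column is no longer needed

print(cross)
```

The result has one row per origin/destination pair, so its length is the product of the two input lengths. Newer pandas releases (1.2 and later) also accept `how="cross"` in `pandas.merge()`, so the workaround is mainly relevant for older versions.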