Jekyll
2022-08-14T10:47:49+02:00
https://mkonrad.net/rss.xml
mkonrad.net
Personal website of Markus Konrad from Berlin: Computer Science related topics, own open-source software projects.

An update to the @ZwoSchlagzeilen Twitter-Bot
2022-08-14T10:12:34+02:00
https://mkonrad.net/2022/08/14/zweischlagzeilen-update
<p>I recently updated my <a href="https://twitter.com/ZwoSchlagzeilen">@ZwoSchlagzeilen</a> Twitter bot. I replaced the complicated, mostly rule-based language generation algorithm with a statistical approach using a <a href="https://web.stanford.edu/%7Ejurafsky/slp3/3.pdf">trigram language model</a>. Details can be found in the <a href="https://github.com/internaut/zweischlagzeilen">GitHub repository</a>.</p>
<p>Strictly speaking, the bot no longer mixes two headlines with this approach. Instead, it takes a part of a randomly sampled headline as a “seed” and then randomly samples each new word conditional on the previous two words. So it mixes one headline with words from a corpus of headlines (taken from the news of the last two weeks).</p>
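<p>The sampling step can be sketched in a few lines of Python. This is only a minimal illustration of a trigram model under simplifying assumptions, not the bot's actual code (see the repository for that): the tiny English corpus, the function names <code>build_trigram_model</code> and <code>generate</code>, the whitespace tokenization and the fixed seed length of two words are all made up for this example.</p>

```python
import random
from collections import defaultdict


def build_trigram_model(headlines):
    """Map each pair of consecutive words to the words observed right after it."""
    model = defaultdict(list)
    for headline in headlines:
        words = headline.split()  # naive whitespace tokenization (assumption)
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)].append(w3)
    return model


def generate(headlines, model, rng, max_words=12):
    """Start from the first two words of a random headline, then sample word by word."""
    seed = rng.choice([h for h in headlines if len(h.split()) >= 2])
    words = seed.split()[:2]
    while len(words) < max_words:
        candidates = model.get((words[-2], words[-1]))
        if not candidates:  # no known continuation for this word pair: stop
            break
        # Duplicates in the list make frequent continuations more likely.
        words.append(rng.choice(candidates))
    return ' '.join(words)


# Hypothetical toy corpus; the real bot uses recent news headlines.
corpus = [
    'city council approves new bike lanes',
    'city council rejects new airport plans',
    'government approves new climate plans',
]
model = build_trigram_model(corpus)
headline = generate(corpus, model, random.Random(42))
print(headline)
```

<p>Storing every observed continuation in a plain list keeps the sketch short: sampling uniformly from that list is equivalent to sampling from the estimated conditional distribution of the next word given the previous two.</p>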
<p>I observed the generated tweets for the last two weeks and found the results better than those generated with the previous approach. Still, some generated headlines don’t make sense at all, but these imperfections are what makes it fun.</p>

memex – encrypted chronological note keeping tool for Unix CLIs
2022-06-29T10:44:23+02:00
https://mkonrad.net/2022/06/29/memex

Neural Network from Scratch in JuliaLang
2022-05-23T10:02:23+02:00
https://mkonrad.net/2022/05/23/neural-network-from-scratch-julia

Some thoughts about the use of cloud services and web APIs in social science research
2022-03-07T10:41:23+01:00
https://mkonrad.net/2022/03/07/some-thoughts-about-the-use-of-cloud-services-and-web-apis-in-social-science-research

Continuous Integration testing with GitHub Actions using tox and hypothesis
2022-03-04T16:20:00+01:00
https://mkonrad.net/2022/03/04/continuous-integration-testing-with-github-actions-using-tox-and-hypothesis

Batch transfer GitLab projects with the GitLab API
2022-02-22T11:35:25+01:00
https://mkonrad.net/2022/02/22/batch-transfer-gitlab-projects-with-the-gitlab-api

Property-based testing
2021-11-07T11:49:00+01:00
https://mkonrad.net/2021/11/07/property-based-testing
<p>Although I have been using <a href="https://hypothesis.works/articles/what-is-property-based-testing/">property-based testing</a> for several years, I keep being surprised by its ability to find obscure bugs before they occur in production. I regularly use the <a href="https://github.com/HypothesisWorks/hypothesis">Hypothesis</a> Python package in software projects, and it recently helped me catch a very simple but still somewhat surprising bug while working on <a href="https://github.com/WZBSocialScienceCenter/tmtoolkit">tmtoolkit</a>:</p>
<p>I had a simple function that converted <a href="https://www.cs.toronto.edu/%7Ekrueger/csc209h/tut/line-endings.html">Windows line breaks to UNIX line breaks</a>. Something like:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">def linebreaks_win2unix(text):
    return text.replace('\r\n', '\n')
</code></pre></figure>
<p>What could go wrong with such a simple function? Let’s write a property-based test with Hypothesis that generates strings with a maximum length of 20 characters from an alphabet of <code>a</code>, <code>b</code>, <code>c</code>, space, carriage return <code>\r</code> and line feed <code>\n</code>. We only check one property: after conversion, there shouldn’t be any Windows line breaks <code>\r\n</code> left in the converted string.</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">from hypothesis import given, strategies as st

# Generate short strings from a small alphabet that includes \r and \n.
@given(text=st.text(alphabet=list('abc \r\n'), max_size=20))
def test_linebreaks_win2unix(text):
    # Property: no Windows line break survives the conversion.
    assert '\r\n' not in linebreaks_win2unix(text)
</code></pre></figure>
<p>If you (like me) didn’t give the problem much thought because it seemed too simple, you’ll be surprised that the test fails:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">Falsifying example: test_linebreaks_win2unix(
    text='\r\r\n',
)

    @given(text=st.text(alphabet=list('abc \r\n'), max_size=20))
    def test_linebreaks_win2unix(text):
>       assert '\r\n' not in linebreaks_win2unix(text)
E       AssertionError: assert '\r\n' not in '\r\n'
</code></pre></figure>
<p>On second thought, though, it is clear why this happens: if a string contains <code>\r\r\n</code>, only the last two characters are replaced by <code>\n</code>. The leading <code>\r</code> and the new <code>\n</code> then form <code>\r\n</code>, so the result still contains a Windows line break. A possible solution is to repeat the replacement until no <code>\r\n</code> is left:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">def linebreaks_win2unix(text):
    # Repeat the replacement, because a single pass can create new '\r\n' pairs.
    while '\r\n' in text:
        text = text.replace('\r\n', '\n')
    return text
</code></pre></figure>
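<p>As a side note, the same result can also be obtained in a single pass with a regular expression that removes every run of <code>\r</code> characters directly in front of a <code>\n</code>. The following sketch is not taken from tmtoolkit; the function names are made up for this example, and instead of Hypothesis it uses a brute-force check over all short strings from a small alphabet, so that it runs with the standard library alone.</p>

```python
import re
from itertools import product


def linebreaks_win2unix_loop(text):
    """Iterative variant: repeat until no Windows line break is left."""
    while '\r\n' in text:
        text = text.replace('\r\n', '\n')
    return text


def linebreaks_win2unix_regex(text):
    """Single-pass variant: drop every run of '\\r' that directly precedes a '\\n'."""
    return re.sub(r'\r+\n', '\n', text)


def all_strings(alphabet, max_size):
    """Yield every string over `alphabet` with at most `max_size` characters."""
    for size in range(max_size + 1):
        for chars in product(alphabet, repeat=size):
            yield ''.join(chars)


# Exhaustively compare both variants on all strings of up to 6 characters.
for text in all_strings('a\r\n', 6):
    converted = linebreaks_win2unix_regex(text)
    assert '\r\n' not in converted
    assert converted == linebreaks_win2unix_loop(text)
```

<p>Both variants leave a lone <code>\r</code> that is not followed by <code>\n</code> untouched. Whether that is the desired behavior depends on the input you expect.</p>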
<p>It’s natural not to think about such issues: when you imagine which input such a function could receive, you think of files whose lines end with <code>\r\n</code>. You probably wouldn’t come up with edge cases like a file that contains <code>\r\r\n</code>. But it is these edge cases that make programs fail, not the “regular” inputs. Property-based testing helps to find such edge cases.</p>

Problems using a serial console with the Raspberry Pi 3
2021-07-17T16:32:00+02:00
https://mkonrad.net/2021/07/17/raspi-serial-console-problems
<p>For a recent hobby project, I wanted to access a Raspberry Pi 3 B+ via serial console using a USB-to-serial cable. The setup is <a href="https://learn.adafruit.com/adafruits-raspberry-pi-lesson-5-using-a-console-cable">fairly easy</a> and basically involves enabling the serial console via <code>raspi-config</code>, connecting the USB-to-serial (a.k.a. 
<em>USB-to-TTL</em>) adapter cable to the right pins (the UART pins) and using a serial console emulator like <code>screen</code>, <code>minicom</code> or PuTTY to connect to the Raspi.</p>
<p>Despite this straightforward setup, I couldn’t get the serial console connection running. The console emulator (I’m using <code>screen</code> on Linux) was either not responding or showing gibberish. I tried other console emulators, other USB-to-serial adapters and a fresh installation of Raspberry Pi OS – still, nothing worked. There was not much advice on the web, either. <a href="https://www.raspberrypi.org/forums/viewtopic.php?t=153514">Most posts</a> only reiterated enabling the serial console via <code>raspi-config</code> or <code>/boot/config.txt</code> or highlighted that using the correct baud rate was important. Others recommended <a href="https://blog.adafruit.com/2016/03/07/raspberry-pi-3-uart-speed-workaround/">disabling Bluetooth and remapping the GPIO pins to use the hardware UART on the Raspi 3</a>, which I didn’t want to mess around with.</p>
<p>In the end, I found the solution by chance: most of the time, my Raspi complained about low voltage, but I couldn’t fix this since none of the power supplies and USB cables that I tried seemed to satisfy the machine. The Raspi worked anyway, so I didn’t bother much at the time. However, I noticed that the serial console sometimes worked for a few seconds, and this happened to coincide with the few seconds when there was no “low voltage” warning. Ultimately, the solution was to use a USB-to-TTL adapter with a strong enough power supply and short cables (long cables cause voltage loss), which fixed the low voltage problem and with it the unstable serial console connection. It’s important to note that you should <a href="https://learn.adafruit.com/adafruits-raspberry-pi-lesson-5-using-a-console-cable/connect-the-lead#powering-via-cable-1961421-6">disconnect the Raspi’s standard power supply (via micro-USB) before you connect the USB-to-TTL adapter’s power supply</a>.</p>
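<p>If you want to check for the low voltage condition from a script instead of watching for the warning, Raspberry Pi OS ships the <code>vcgencmd</code> tool, whose <code>get_throttled</code> subcommand reports a bit mask. The following sketch decodes the two under-voltage bits. The function names are made up for this example, the sample value is hypothetical, and the bit positions (0 and 16) are taken from the Raspberry Pi documentation as I understand it, so please double-check them for your firmware version.</p>

```python
import subprocess

UNDER_VOLTAGE_NOW = 1 << 0        # bit 0: under-voltage detected right now
UNDER_VOLTAGE_OCCURRED = 1 << 16  # bit 16: under-voltage has occurred since boot


def parse_throttled(output):
    """Parse a line like 'throttled=0x50005' into the two under-voltage flags."""
    value = int(output.strip().split('=', 1)[1], 16)
    return {
        'under_voltage_now': bool(value & UNDER_VOLTAGE_NOW),
        'under_voltage_occurred': bool(value & UNDER_VOLTAGE_OCCURRED),
    }


def read_throttled():
    """Run `vcgencmd get_throttled` (only available on the Raspberry Pi itself)."""
    result = subprocess.run(['vcgencmd', 'get_throttled'],
                            capture_output=True, text=True, check=True)
    return parse_throttled(result.stdout)


# Hypothetical sample value for illustration, not measured output.
status = parse_throttled('throttled=0x50005\n')
print(status)
```

<p>On the Raspi itself, you would call <code>read_throttled()</code> before opening the serial console and only continue if neither flag is set.</p>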
<p>So the moral of the story is that the UART pins of the Raspi (at least of my version 3) seem to be <em>very</em> sensitive to power supply problems and you should always make sure that your Raspi doesn’t complain about “low voltage” when using a serial console connection.</p>

Spatially weighted averages in R with sf
2021-07-01T15:51:10+02:00
https://mkonrad.net/2021/07/01/spatially-weighted-averages-in-r-with-sf

Clustered standard errors with R
2021-05-18T14:38:04+02:00
https://mkonrad.net/2021/05/18/clustered-standard-errors-with-r