Small Facebook Discussion Analysis Toolkit

2015-12-01 • facebook, scraping, language processing, python, php

The Small Facebook Discussion Analysis Toolkit is a collection of tools for (semi-)automatically collecting and analyzing data from online discussions in Facebook groups and on Facebook pages. Posts and comments, including their hierarchical structure and some metadata, are collected either through the Facebook API (for public groups and pages) or by parsing Facebook’s HTML files (for closed groups and pages). The data is saved as JSON and can be used for different analyses. So far, the only supported analysis is counting nouns in German-language posts and comments.
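To illustrate what working with such hierarchical JSON data can look like, here is a minimal Python sketch that walks nested posts and comments. The field names (`id`, `author`, `message`, `comments`) and the sample data are assumptions made for this example; the toolkit's actual JSON layout may differ.

```python
import json

# Hypothetical record layout: the toolkit's real field names may differ.
SAMPLE = """
[
  {
    "id": "p1",
    "author": "Alice",
    "message": "Erster Beitrag",
    "comments": [
      {"id": "c1", "author": "Bob", "message": "Antwort", "comments": [
        {"id": "c2", "author": "Alice", "message": "Danke", "comments": []}
      ]},
      {"id": "c3", "author": "Carol", "message": "Noch eine Antwort", "comments": []}
    ]
  }
]
"""


def iter_messages(items, depth=0):
    """Yield (depth, author, message) for every post and nested comment."""
    for item in items:
        yield depth, item.get("author"), item.get("message", "")
        # Recurse into replies; a missing "comments" key means no replies.
        yield from iter_messages(item.get("comments", []), depth + 1)


posts = json.loads(SAMPLE)
flat = list(iter_messages(posts))

# Print the discussion as an indented thread.
for depth, author, message in flat:
    print("  " * depth + f"{author}: {message}")
```

Flattening the tree this way keeps the nesting depth available for analyses that care about it, while letting text analyses simply iterate over all messages.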

Data from public groups and pages is collected with a set of PHP scripts that use the Facebook Graph API. Data from closed groups and pages can only be retrieved by parsing the HTML pages that Facebook generates, which is done with a set of Python scripts.
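The HTML parsing step could be sketched roughly as follows, using only Python's standard library. The markup, the class names (`post`, `author`, `message`) and the `PostParser` class are invented for this example. Facebook's real HTML is far more complex and changes frequently, so this shows the general idea only, not the toolkit's actual scripts.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup: Facebook's real HTML looks different.
SAMPLE_HTML = """
<div class="post">
  <span class="author">Alice</span>
  <p class="message">Erster Beitrag</p>
</div>
<div class="post">
  <span class="author">Bob</span>
  <p class="message">Zweiter Beitrag</p>
</div>
"""


class PostParser(HTMLParser):
    """Collect author and message text from each (assumed) post container."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # field currently being filled, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "post" in classes:
            # A new post container starts a new record.
            self.posts.append({"author": "", "message": ""})
        elif self.posts and "author" in classes:
            self._field = "author"
        elif self.posts and "message" in classes:
            self._field = "message"

    def handle_endtag(self, tag):
        # Any closing tag ends the current field in this flat sample markup.
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.posts[-1][self._field] += data.strip()


parser = PostParser()
parser.feed(SAMPLE_HTML)
parser.close()
posts = parser.posts

# Serialize the extracted records as JSON, keeping umlauts readable.
print(json.dumps(posts, ensure_ascii=False, indent=2))
```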

Noun identification and counting are also done in Python. The basic approach to extracting the nouns for analysis is described in this blog post.
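The linked post describes the approach actually used. Purely as an illustration of the task, the sketch below uses a much simpler stand-in heuristic: since German nouns are capitalized, it counts every capitalized word that is not the first word of a sentence. This misses sentence-initial nouns and also counts proper names and the formal pronoun "Sie", so it is a rough approximation. The function name and sample texts are made up for this example.

```python
import re
from collections import Counter


def count_capitalized_nouns(texts):
    """Naive heuristic: count capitalized, non-sentence-initial words."""
    counts = Counter()
    for text in texts:
        # Split into rough sentences at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            # Runs of letters only (this includes umlauts and ß).
            words = re.findall(r"[^\W\d_]+", sentence)
            # Skip the first word: it is capitalized regardless of word class.
            for word in words[1:]:
                if word[0].isupper():
                    counts[word] += 1
    return counts


texts = [
    "Das Wetter ist heute schön. Wir gehen in den Park.",
    "Im Park spielt ein Hund mit dem Ball.",
]
counts = count_capitalized_nouns(texts)

# List the candidates, most frequent first.
for word, n in counts.most_common():
    print(word, n)
```

A more reliable analysis would use a part-of-speech tagger trained on German instead of relying on capitalization alone.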

The source code is available in the GitHub repository.