EXAMINING POLITICAL POLARIZATION IN THE GERMAN BUNDESTAG USING LARGE LANGUAGE MODELS: HISTORICAL TRENDS AND A CONTEMPORARY ANALYSIS

Overview

This repository contains the code and resources for our thesis, which analyzes polarization in speeches held in the German Bundestag. We aim to detect, evaluate, and analyze polarizing language through sentiment analysis, structural analysis, and fine-tuned transformer models.

Contributors

Key Components

Data Processing Pipelines
The repository includes pipelines for ingesting, cleaning, and preprocessing all speeches provided by the German Bundestag, building on the Open Discourse project.
Analyses
- Analysis of electoral terms 17-20, plus an in-depth analysis of electoral term 20.
- Sentiment analysis leveraging dictionaries and a bag-of-words approach.
- Structural analysis to identify patterns and trends in speech interruptions, reactions, and debate types.
Fine-Tuned Transformer Models
- BERT, GPT-4o mini, and Llama 3.1 8B Instruct: LLMs fine-tuned to detect polarizing language in Bundestag speeches, based on our manually labelled corpus.
Model Comparison
Provides a comparative analysis of the fine-tuned transformer models, showing how effectively each identifies polarization on its own and when combined in different ensemble approaches.

Repository Structure

1. `analyses/`

Contains Jupyter Notebooks for key analyses:
- RQ1_Developments.ipynb: Analysis of electoral terms 17-20.
- RQ2_Electoral_Term_20.ipynb: Analysis specific to the 20th electoral term.
- Sentiment_Analysis.ipynb: Detailed analysis of speech sentiment.
- Structural_Analysis.ipynb: Analysis of structural aspects of the different kinds of Bundestag debates, including the interruptions and reactions recorded by the stenographers.
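
As a rough illustration of the kind of structural signal described above, the sketch below counts stenographic remarks by type. It assumes that reactions appear inline in the speech text as German parenthetical notes such as "(Beifall bei der SPD)". The `REACTION_KEYWORDS` mapping and the `count_reactions` helper are hypothetical names chosen for this example and are not taken from the notebooks.

```python
import re
from collections import Counter

# Hypothetical mapping from German stenographic keywords to reaction types.
REACTION_KEYWORDS = {
    "Beifall": "applause",
    "Zuruf": "interjection",
    "Lachen": "laughter",
    "Widerspruch": "objection",
}

# Matches any parenthesised remark, e.g. "(Beifall bei der SPD)".
PAREN_PATTERN = re.compile(r"\(([^()]*)\)")


def count_reactions(speech_text: str) -> Counter:
    """Count parenthesised remarks by the first keyword they contain."""
    counts = Counter()
    for remark in PAREN_PATTERN.findall(speech_text):
        for keyword, label in REACTION_KEYWORDS.items():
            if keyword in remark:
                counts[label] += 1
                break  # assign at most one label per remark
    return counts


example = (
    "Wir brauchen eine neue Politik. (Beifall bei der SPD) "
    "Das ist die Wahrheit. (Zuruf von der CDU/CSU: Unsinn!) "
    "(Lachen bei der AfD)"
)
print(count_reactions(example))
```

Remarks that match none of the keywords are ignored, so the keyword list would need to be extended to cover the full range of reactions in real protocols.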

2. `data/`

Contains scripts for scraping raw data from the Bundestag API, as well as scripts for data extraction, preprocessing, cleaning, and exploratory data analysis:
- open_discourse/: API calls and XML parsers for data ingestion, plus preprocessing and cleaning pipelines for data transformation. Leans heavily on the Open Discourse project.
- postprocessing_speeches/: Collecting corpus statistics and corpus paraphrasing.
- preprocessing/: Scripts for manual and automated data labeling.
- 05-staedte.xlsx: Dataset taken from Statistisches Bundesamt (2022) to map birthplaces of politicians for data enrichment.
- Data_Cleaning.ipynb: Data cleaning and exploratory data analysis.
- Master_Dataframe.ipynb: Data integration for further analysis.

3. `model_comparison/`

Evaluation notebooks and data comparing fine-tuned transformer models:
- BERTval.xlsx, GPTval.xlsx, LLAMAval.csv: Validation set output of the fine-tuned transformer models.
- Model_Comparison.ipynb: Comprehensive analysis and comparison of model results.
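
The ensemble approaches used in the notebook are not spelled out here. As one possible illustration, the sketch below combines per-model labels by simple majority vote. The function name `majority_vote`, the binary 0/1 labels, and the example predictions are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists into one list by majority vote.

    `predictions` holds one list of labels per model, all of equal length.
    Ties fall to the label predicted by the earliest model in the list.
    """
    if not predictions:
        return []
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must label the same number of items")
    combined = []
    for labels in zip(*predictions):
        # most_common(1) yields the top label; equal counts keep first-seen order
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Made-up predictions for four speech segments (1 = polarizing, 0 = not).
bert_preds = [1, 0, 1, 1]
gpt_preds = [1, 1, 0, 1]
llama_preds = [0, 0, 1, 1]
print(majority_vote([bert_preds, gpt_preds, llama_preds]))
```

With an odd number of models and binary labels, ties cannot occur. Weighted voting or stacking would be alternatives when the models differ strongly in accuracy.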

4. `sentiment/`

Resources for sentiment analysis:
- sent_dictionary_1.csv: Sentiment dictionary taken from Haselmayer et al. (2017).
- sent_dictionary_2.csv: Sentiment dictionary taken from Rauh (2018).
- sentiment_score_calculation.ipynb: Implementation of sentiment score calculations.
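
To make the dictionary-based, bag-of-words idea concrete, here is a minimal sketch of one way such a score could be computed. It is not the formula from the notebook: the word-to-weight dictionary layout, the averaging over all tokens, and the names `tokenize` and `sentiment_score` are assumptions, and the weights in `toy_dictionary` are made up.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens (keeps ä, ö, ü, ß)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def sentiment_score(text: str, dictionary: dict) -> float:
    """Mean dictionary weight over all tokens; 0.0 for text without tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    # Words missing from the dictionary contribute a weight of 0.0.
    total = sum(dictionary.get(token, 0.0) for token in tokens)
    return total / len(tokens)


# Toy dictionary with made-up weights, for illustration only.
toy_dictionary = {"gut": 1.0, "schlecht": -1.0, "katastrophe": -2.0}
print(sentiment_score("Das ist gut, aber die Lage ist schlecht.", toy_dictionary))
```

Because word order is ignored, negations such as "nicht gut" are scored like "gut". That is a known limitation of plain bag-of-words scoring.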

5. `topic_modelling/`

Includes resources for topic modeling:
- topic_list.csv: Topics and associated words with weights identified from speeches.
- topic_modelling.ipynb: Implementation of topic modeling using LDA.

6. `transformer_models/`

Contains implementation and fine-tuning of transformer models:
- BERT/
- GPT/
- LLAMA/
- Tokenization.ipynb: Notebook to segment longer speeches for model ingestion.
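
Segmenting is needed because transformer models accept only a limited number of tokens per input. The sketch below shows one common approach, a sliding window with overlap. The helper name `segment_tokens`, the whitespace tokenization, and the default values for `max_len` and `overlap` are assumptions made for this example; the notebook may use a model-specific tokenizer and different limits.

```python
def segment_tokens(tokens, max_len=512, overlap=64):
    """Split a token list into windows of at most `max_len` tokens.

    Consecutive windows share `overlap` tokens so that context at the
    window boundaries is not lost entirely.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    segments = []
    for start in range(0, len(tokens), step):
        segments.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the speech
    return segments


# Whitespace tokens stand in for real model tokens in this toy example.
speech = "Sehr geehrte Frau Präsidentin liebe Kolleginnen und Kollegen vielen Dank"
for segment in segment_tokens(speech.split(), max_len=4, overlap=1):
    print(" ".join(segment))
```

A larger overlap preserves more context across segment boundaries, at the cost of more segments per speech.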
