|
Description
|
This dataset and replication package accompany the paper “Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media” by Alejandro De La Fuente-Cuesta, Alberto Martínez-Serra, R. Nienke Visscher, and Ana S. Cardenal (2025). The materials include the data and code needed to reproduce all analyses presented in the main text and the Supplementary Information.

The study investigates whether large language models (LLMs) can accurately classify news content as political or non-political from the URL alone, compared with classification based on the full text, across five countries (France, Germany, Spain, the UK, and the US). Using web-tracking data and manually coded ground-truth labels, we benchmark multiple state-of-the-art LLMs (Gemma-3-27B, Mistral-3.1-24B, Qwen-32B, Llama-3.1-8B, and DeepSeek-R1-Distill-Qwen-7B) to assess their performance, precision–recall trade-offs, and sources of bias in URL-only classification.

The dataset is derived from web-tracking records of news consumption across these five democratic countries. A subset of 1,140 URLs was manually coded by human annotators as either Political (POL) or Non-political (NON) content to serve as the gold standard. LLM predictions were then compared against these human labels to compute accuracy, F1, precision, recall, and Cohen’s kappa. All personal data were anonymized before analysis, and all procedures complied with GDPR and institutional ethical guidelines.

Replication Instructions
1. Open the .Rmd files in RStudio (R ≥ 4.2).
2. Install the packages listed in the setup section of each file.
3. Knit the documents to reproduce the corresponding .html outputs.
All analyses use open-source R packages and can be fully reproduced on any standard machine.

If you use these data or materials, please cite: Martínez-Serra, A., De la Fuente-Cuesta, A., Visscher, R. N., & Cardenal, A. S. (2025). Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media. (2025-10-22)
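
For readers unfamiliar with the evaluation metrics named in the description (accuracy, precision, recall, F1, and Cohen’s kappa), the sketch below shows one way they can be computed when model predictions are compared against human POL/NON labels. It is an illustration only and is not taken from the replication package, whose code is in R. The function name `binary_metrics`, the choice of POL as the positive class, and the toy label lists are assumptions made for this example.

```python
def binary_metrics(gold, pred, positive="POL"):
    """Compare predicted labels with gold labels for a two-class task.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    n = len(gold)
    pairs = list(zip(gold, pred))

    # Confusion-matrix cells, treating `positive` as the class of interest.
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = n - tp - fp - fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance given each rater's marginal label frequencies.
    p_e = (((tp + fn) / n) * ((tp + fp) / n)
           + ((tn + fp) / n) * ((tn + fn) / n))
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Toy labels, invented for this example (not drawn from the dataset).
gold = ["POL", "POL", "POL", "NON", "NON", "NON", "NON", "POL"]
pred = ["POL", "POL", "NON", "NON", "NON", "POL", "NON", "POL"]

metrics = binary_metrics(gold, pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when the POL and NON classes are unbalanced.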
|
|
Notes
| Related Material: https://arxiv.org/pdf/2506.17435 |