Part 1: Document Description

Citation

Title: Replication Data for: Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media

Identification Number: doi:10.82201/8UPPY6

Distributor: BSC Dataverse

Date of Distribution: 2025-10-23

Version: 2

Bibliographic Citation: DE LA FUENTE CUESTA, ALEJANDRO; Alberto Martínez Serra; Nienke Visscher; Cardenal, Ana S., 2025, "Replication Data for: Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media", https://doi.org/10.82201/8UPPY6, BSC Dataverse, V2, UNF:6:j4VEdzl/g0wp+AX0/Pik1w== [fileUNF]

Citation

Title: Replication Data for: Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media

Identification Number: doi:10.82201/8UPPY6

Authoring Entity:
DE LA FUENTE CUESTA, ALEJANDRO (Barcelona Supercomputing Center)
Alberto Martínez Serra (https://ror.org/05sd8tv96)
Nienke Visscher (Barcelona Supercomputing Center)
Cardenal, Ana S. (https://ror.org/01f5wp925)

Date of Production: 2025-10-22

Software used in Production: R

Grant Number: C005/24-ED CV1

Distributor: BSC Dataverse

Access Authority: DE LA FUENTE CUESTA, ALEJANDRO

Access Authority: Ana Sofia Cardenal

Depositor: DE LA FUENTE CUESTA, ALEJANDRO

Date of Deposit: 2025-10-22

Holdings Information: https://doi.org/10.82201/8UPPY6

Study Scope

Keywords: Social Sciences, Large Language Models, Media analysis, Political Content, URL

Topic Classification: Large Language Models

Abstract:
This dataset and replication package accompany the paper “Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media” by Alejandro De La Fuente-Cuesta, Alberto Martínez-Serra, R. Nienke Visscher, and Ana S. Cardenal (2025). The materials include the data and code needed to reproduce all analyses presented in the main text and the Supplementary Information.

The study investigates whether large language models (LLMs) can accurately classify news content as political or non-political from URLs alone, relative to classification based on the full text, across five countries (France, Germany, Spain, the UK, and the US). Using web-tracking data and manually coded ground-truth labels, we benchmark several state-of-the-art open-source LLMs (Gemma-3-27B, Mistral-3.1-24B, Qwen-32B, Llama-3.1-8B, and DeepSeek-R1-Distill-Qwen-7B) to assess their performance, their precision–recall trade-offs, and the sources of bias in URL-only classification.

The dataset is derived from web-tracking records of news consumption in these five democratic countries. Human annotators manually coded a subset of 1,140 URLs as either Political (POL) or Non-political (NON) content; these labels serve as the gold standard. LLM predictions were then compared against the human labels to compute accuracy, precision, recall, F1, and Cohen’s kappa. All personal data were anonymized before analysis, and all procedures complied with the GDPR and institutional ethical guidelines.

Replication Instructions: Open the .Rmd files in RStudio (R ≥ 4.2), install the packages listed in the setup section of each .Rmd file, and knit the documents to reproduce the corresponding .html outputs. All analyses rely on open-source R packages and can be reproduced on a standard machine.

If you use these data or materials, please cite: Martínez-Serra, A., De la Fuente-Cuesta, A., Visscher, R. N., & Cardenal, A. S. (2025). Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media.
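
As a rough illustration of how the agreement metrics listed in the abstract derive from a confusion matrix, the following Python sketch computes them for one model's predictions. It assumes gold and predicted labels are the strings "POL" and "NON" and treats POL as the positive class; the function name `binary_metrics` and the toy labels are hypothetical, and this is not the authors' R implementation in Replication_Material.Rmd.

```python
from typing import Dict, Sequence


def binary_metrics(
    gold: Sequence[str], pred: Sequence[str], positive: str = "POL"
) -> Dict[str, float]:
    """Compare LLM predictions with human gold-standard labels.

    Hypothetical helper. Assumes labels are "POL"/"NON" strings and that
    POL is the positive class; any other label is counted as negative.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")

    n = len(gold)
    pairs = list(zip(gold, pred))
    # Confusion-matrix counts with `positive` as the positive class.
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = n - tp - fp - fn

    accuracy = (tp + tn) / n
    # Empty denominators are reported as 0.0 here (a choice, not a standard).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance, computed from each rater's marginal label frequencies.
    p_o = accuracy
    gold_pos, pred_pos = tp + fn, tp + fp
    gold_neg, pred_neg = tn + fp, tn + fn
    p_e = (gold_pos * pred_pos + gold_neg * pred_neg) / (n * n)
    # Kappa is undefined when chance agreement is 1; return NaN in that case.
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else float("nan")

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


if __name__ == "__main__":
    # Toy labels (hypothetical, not taken from the dataset).
    gold = ["POL", "POL", "NON", "NON", "POL", "NON"]
    pred = ["POL", "NON", "NON", "POL", "POL", "NON"]
    # Prints accuracy, precision, recall and f1 of 0.667 and kappa of 0.333.
    for name, value in binary_metrics(gold, pred).items():
        print(f"{name}: {value:.3f}")
```

Kappa is included alongside accuracy because, with an unbalanced POL/NON split, a model can reach high accuracy largely through chance agreement; kappa discounts that.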

Time Period: 2022-02-22 to 2022-06-05

Unit of Analysis: LLMs

Universe: Top open-source LLMs as of April 2025

Kind of Data: News

Notes:
Related Material: https://arxiv.org/pdf/2506.17435

Methodology and Processing

Type of Research Instrument: Structured

Sources Statement

Data Access

Notes: CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)

Other Study Description Materials

File Description--f8917

File: metadata_llm_classification.tab
Notes:
UNF:6:3JOQw070RCv+caIx4qwiRg==
Data needed to replicate the analyses in the main text.

File Description--f8914

File: SI_metadata_llm_classification.tab
Notes:
UNF:6:VV9+42PqeCoCKyXOS7EEPQ==
Data needed to replicate the analyses in the Supplementary Information.

List of Variables:

| File | Variable Format | Summary Statistics | Notes (UNF) |
|---|---|---|---|
| f8917 | character | | UNF:6:nuhhgLaoWxwcOIckIpUuSQ== |
| f8917 | character | | UNF:6:NVbZ2uYlVkWaBa4vh2DexA== |
| f8917 | character | | UNF:6:xfGwQdIbHeQ4zDMcXX7DlA== |
| f8917 | character | | UNF:6:xpbKpjnW4oqUn66QAOj3Lg== |
| f8917 | character | | UNF:6:WmkGslO9HgITyAY0jDrjhA== |
| f8917 | character | | UNF:6:2nEgjXh39uUvlMa5HFBsfA== |
| f8917 | character | | UNF:6:LHUdxLsCnVBweNYBfLyeVw== |
| f8917 | character | | UNF:6:8NJXkN70+3znlX++JuBrLQ== |
| f8917 | character | | UNF:6:kBzBPdf3ZsU6oE7yfl1GAw== |
| f8917 | character | | UNF:6:T0FyBqX2zNHqkQAa/khLkw== |
| f8917 | numeric | Mean 7.27; Min. 1; Max. 705; StDev 28.47; Valid 7,250 | UNF:6:7inlR1If6qg3pFl9lcXRag== |
| f8914 | character | | UNF:6:3KSj/q2CDB3AGg128S1qyQ== |
| f8914 | character | | UNF:6:AxacQbjXHyjoDx47hU8kBA== |
| f8914 | character | | UNF:6:lz5MN1i7ouXUi0jOA/FEGw== |
| f8914 | character | | UNF:6:yVqEvQGJQaOqcOyXd+P9YA== |
| f8914 | character | | UNF:6:YW91E5doPz+IQMrK/BzHIw== |
| f8914 | character | | UNF:6:33G3MteSM+tBYnvjKXhcGg== |
| f8914 | character | | UNF:6:vN1DXvPN0sP81v//QT0zgQ== |
| f8914 | character | | UNF:6:vKT2kuHcOofe6bf6j8uaeg== |
| f8914 | character | | UNF:6:RDuCvGZZrgIxCzIaSC6o7g== |
| f8914 | character | | UNF:6:MNtSGg18LvdTqE9Z1j4G2w== |
| f8914 | character | | UNF:6:gfcKP/ejOyo5dIpqgPhX3A== |
| f8914 | character | | UNF:6:EElL1HlimSs4MqIP7G8CdQ== |
| f8914 | character | | UNF:6:FbH76btPuTn3jXbU7XHEfA== |

| Label | Description | MIME Type |
|---|---|---|
| README.md | | text/markdown |
| Replication_Material.html | Rendered HTML output with all tables and figures in the paper. | text/html |
| Replication_Material.Rmd | R Markdown code that reproduces all tables and figures in the paper. | text/x-r-notebook |
| Replication_SI.html | Rendered HTML with the code and results of the Supplementary Information. | text/html |
| Replication_SI.Rmd | R Markdown code that reproduces the Supplementary Information. | text/x-r-notebook |