BSC AI Factory

BSC AI Factory Data Catalog

BSC AI Factory Software Catalog

11 to 20 of 29 Results

CA-EN_Parallel_Corpus

Nov 3, 2025 - Language Technologies Laboratory

De Luca Fornaciari, Francesca; Villegas, Marta; Melero, Maite; Mash, Audrey, 2025, "CA-EN_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/ERUHKY, BSC Dataverse, V2

The CA-EN Parallel Corpus is a Catalan-English textual dataset of parallel sentences created to support Catalan in NLP tasks, specifically Machine Translation. The dataset can be used to train Bilingual Machine Translation models between English and Catalan in either direction, as well as Multilingual Machine Translation models.

XitXat

Sep 29, 2025 - Language Technologies Laboratory

Rodriguez-Penagos, Carlos; Armentano i Oller, Carme; Villegas, Marta, 2025, "XitXat", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/642QYD, BSC Dataverse, V2

XitXat is a conversational dataset consisting of 950 chatbot–user conversations across 10 different domains. The conversations were created using the Wizard-of-Oz method. User interactions are annotated with intents and relevant slots, following the attached annotation guidelines. The dataset is designed to support research in natural language unde...

VeritasQA

Sep 29, 2025 - Language Technologies Laboratory

Aula-Blasco, Javier; Falcão, Júlia; Villegas, Marta; Sotelo, Susana; Paniagua Suárez, Silvia, 2025, "VeritasQA", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/65WHD9, BSC Dataverse, V2

VeritasQA is a context‑ and time‑independent truthfulness benchmark, comprising 353 questions and answers inspired by common misconceptions that are not tied to a specific country or recent event. It is designed for multilingual transferability, offering versions in Spanish, Catalan, Galician, and English. The benchmark aims to test the tendency of...

EQ-Bench_ca

Sep 29, 2025 - Language Technologies Laboratory

Rivera Hidalgo de Torralba, Paula; Gonzalez-Agirre, Aitor; Villegas, Marta; Aula-Blasco, Javier; Saiz Antón, José Javier, 2025, "EQ-Bench_ca", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/UECWEX, BSC Dataverse, V2

EQ-Bench_ca is the Catalan translation and linguistic adaptation of EQ-Bench, a dataset for evaluating emotional reasoning in language models via dialogue prompts. It is intended to reflect how emotional expression and perception vary across languages, enabling evaluation in Catalan.

EQ-bench_es

Sep 29, 2025 - Language Technologies Laboratory

Saiz Antón, José Javier; Rivera Hidalgo de Torralba, Paula; Gonzalez-Agirre, Aitor; Villegas, Marta; Aula-Blasco, Javier, 2025, "EQ-bench_es", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/PVIYPG, BSC Dataverse, V3

EQ‑bench_es is the Spanish translation and adaptation of EQ‑Bench, designed for evaluating emotional reasoning in language models via dialogue prompts in Spanish.

WikiCAT_ca

Sep 29, 2025 - Language Technologies Laboratory

Armentano i Oller, Carme; Rodriguez-Penagos, Carlos; Gonzalez-Agirre, Aitor; Villegas, Marta, 2025, "WikiCAT_ca", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/TIODTW, BSC Dataverse, V1

WikiCAT_ca is a text classification dataset in Catalan, automatically constructed from Catalan Wikipedia and Wikidata sources. Its purpose is to support multi-class topic classification of Wikipedia article text. The dataset contains article texts paired with one of 13 topic labels (e.g. “Ciència i Tecnologia”, “Economia”, “Esport”, etc.). It i...

CaBBQ

Sep 29, 2025 - Language Technologies Laboratory

Ruiz-Fernández, Valle; Gonzalez-Agirre, Aitor; Villegas, Marta; Falcão, Júlia; Vasquez Reina, Luis Antonio, 2025, "CaBBQ", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/1OO2M0, BSC Dataverse, V1

CaBBQ is the Catalan adaptation of the BBQ benchmark, adapted to the Catalan language and the social context of Spain. It aims to evaluate social bias in language models via a multiple-choice QA task, following the same 10 social categories as EsBBQ.

EsBBQ

Sep 29, 2025 - Language Technologies Laboratory

Ruiz-Fernández, Valle; Gonzalez-Agirre, Aitor; Falcão, Júlia; Vasquez Reina, Luis Antonio; Villegas, Marta, 2025, "EsBBQ", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/MJGCT3, BSC Dataverse, V1

EsBBQ is an adaptation of the original BBQ benchmark to Spanish and the Spanish social context. It is used to evaluate social bias in language models via a multiple‑choice question answering task along 10 social categories (Age, Disability, Gender, LGBTQIA, Nationality, Physical Appearance, Race/Ethnicity, Religion, Socioeconomic Status, Spanis...

ViquiQuAD

Sep 29, 2025 - Language Technologies Laboratory

Gutiérrez-Fandiño, Asier; Armengol-Estapé, Jordi; de Gibert, Ona; Gonzalez-Agirre, Aitor; Armentano i Oller, Carme; Rodriguez-Penagos, Carlos; Villegas, Marta, 2025, "ViquiQuAD", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/0DRCY6, BSC Dataverse, V1

ViquiQuAD is an extractive question answering (QA) dataset in Catalan, built from original Catalan Wikipedia articles (i.e. not translations). The dataset contains contexts drawn from Wikipedia, and for each context, 1 to 5 QA pairs in which the answer is a span in the context. It is intended for training and evaluating extractive-QA models in Cata...

Composite Database of Ultra-Large Chemical Libraries

Sep 29, 2025 - Life Sciences

Filella Merce, Isaac; Guallar, Victor; Isaac Soul Garcia, 2025, "Composite Database of Ultra-Large Chemical Libraries", https://doi.org/10.82201/6043UT, BSC Dataverse, V1

The Composite Database consists of approximately 120 billion molecules sourced from five ultra-large (> 100 million compounds) and nine large publicly available chemical libraries. Developed to support early-stage drug discovery, it is the largest publicly available database of enlisted molecules, readily accessible for efficient analog searches an...
