Metrics: 33,620 Downloads
1 to 3 of 3 Results
Sep 18, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-GL_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/VUTENU, BSC Dataverse, V1
The CA-GL Parallel Corpus is a Catalan-Galician synthetic dataset of parallel sentences created to support the use of co-official languages from Spain, such as Catalan and Galician, in NLP tasks, specifically Machine Translation. The dataset can be used to train Bilingual Machine Translation models between Galician and Catalan in any direction, as...
Sep 18, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-EU_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/A9UJA9, BSC Dataverse, V1
The CA-EU Parallel Corpus is a Catalan-Basque synthetic dataset of parallel sentences created to support the use of co-official languages from Spain, such as Catalan and Basque, in NLP tasks, specifically Machine Translation. The dataset can be used to train Bilingual Machine Translation models between Basque and Catalan in any direction, as well a...
Sep 16, 2025 - Language Technologies Laboratory
Saiz Antón, José Javier; Palomar-Giner, Jorge; Villegas, Marta, 2025, "CATalog", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/FAFYBH, BSC Dataverse, V2
CATalog is a diverse, open-source Catalan corpus for language modelling. It consists of text documents from 26 different sources, including web crawling, news, forums, digital libraries and public institutions, totaling 17.45 billion words.