1 to 10 of 29 Results
BSC AI Factory Software Catalog (Barcelona Supercomputing Center)
Dec 3, 2025
BSC AI Factory Data Catalog (Barcelona Supercomputing Center)
Dec 3, 2025
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Aleix Sant Savall; Melero, Maite; Villegas, Marta, 2025, "ES-OC_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/TTRGIC, BSC Dataverse, V2
The ES-OC Parallel Corpus is a synthetic Spanish-Aranese dataset created to support the use of under-resourced languages from Spain, such as Aranese, in NLP tasks, specifically Machine Translation. Aranese is a variant of the Occitan language spoken in the Val d'Aran, Spain, where it is recognised as a co-official language. The dataset can be used...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Aleix Sant Savall; Melero, Maite; Villegas, Marta, 2025, "ES-AST_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/2BK1NZ, BSC Dataverse, V2
The ES-AST Parallel Corpus is a Spanish-Asturian dataset created to support the use of under-resourced languages from Spain, such as Asturian, in NLP tasks, specifically Machine Translation. This dataset aggregates both synthetic and authentic data, and can be used to train Bilingual Machine Translation models between Asturian and Spanish in any di...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Aleix Sant Savall; Melero, Maite; Villegas, Marta, 2025, "ES-AN_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/ELCJXZ, BSC Dataverse, V2
The ES-AN Parallel Corpus is a mainly synthetic Spanish-Aragonese dataset created to support the use of under-resourced languages from Spain, such as Aragonese, in NLP tasks, specifically Machine Translation. The dataset can be used to train Bilingual Machine Translation models between Aragonese and Spanish in any direction, as well as Multilingual...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Melero, Maite; Villegas, Marta; Liao, Xixian, 2025, "CA-ZH_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/MFXKNG, BSC Dataverse, V2
The CA-ZH Parallel Corpus is a Catalan-Chinese textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Chinese and Catalan in any direction, as well as Multilingual Machine Translation models. The...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-IT_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/BTMN1V, BSC Dataverse, V2
The CA-IT Parallel Corpus is a Catalan-Italian textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Italian and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-FR_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/V7AZKB, BSC Dataverse, V2
The CA-FR Parallel Corpus is a Catalan-French textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between French and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-DE_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/DYAZII, BSC Dataverse, V2
The CA-DE Parallel Corpus is a Catalan-German textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between German and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-PT_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/XYBQ8Q, BSC Dataverse, V2
The CA-PT Parallel Corpus is a Catalan-Portuguese textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Portuguese and Catalan in any direction, as well as Multilingual Machine Translation models...