Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Sant Savall, Aleix; Melero, Maite; Villegas, Marta, 2025, "ES-OC_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/TTRGIC, BSC Dataverse, V2
The ES-OC Parallel Corpus is a synthetic Spanish-Aranese dataset created to support the use of under-resourced languages from Spain, such as Aranese, in NLP tasks, specifically Machine Translation. Aranese is a variant of the Occitan language spoken in the Val d'Aran, Spain, where it is recognised as a co-official language. The dataset can be used...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Sant Savall, Aleix; Melero, Maite; Villegas, Marta, 2025, "ES-AST_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/2BK1NZ, BSC Dataverse, V2
The ES-AST Parallel Corpus is a Spanish-Asturian dataset created to support the use of under-resourced languages from Spain, such as Asturian, in NLP tasks, specifically Machine Translation. This dataset aggregates both synthetic and authentic data, and can be used to train Bilingual Machine Translation models between Asturian and Spanish in any di...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Sant Savall, Aleix; Melero, Maite; Villegas, Marta, 2025, "ES-AN_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/ELCJXZ, BSC Dataverse, V2
The ES-AN Parallel Corpus is a mainly synthetic Spanish-Aragonese dataset created to support the use of under-resourced languages from Spain, such as Aragonese, in NLP tasks, specifically Machine Translation. The dataset can be used to train Bilingual Machine Translation models between Aragonese and Spanish in any direction, as well as Multilingual...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Melero, Maite; Villegas, Marta; Liao, Xixian, 2025, "CA-ZH_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/MFXKNG, BSC Dataverse, V2
The CA-ZH Parallel Corpus is a Catalan-Chinese textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Chinese and Catalan in any direction, as well as Multilingual Machine Translation models. The...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-IT_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/BTMN1V, BSC Dataverse, V2
The CA-IT Parallel Corpus is a Catalan-Italian textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Italian and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-FR_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/V7AZKB, BSC Dataverse, V2
The CA-FR Parallel Corpus is a Catalan-French textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between French and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-DE_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/DYAZII, BSC Dataverse, V2
The CA-DE Parallel Corpus is a Catalan-German textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between German and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-PT_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/XYBQ8Q, BSC Dataverse, V2
The CA-PT Parallel Corpus is a Catalan-Portuguese textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Portuguese and Catalan in any direction, as well as Multilingual Machine Translation models...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Villegas, Marta; Melero, Maite; Mash, Audrey, 2025, "CA-EN_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/ERUHKY, BSC Dataverse, V2
The CA-EN Parallel Corpus is a Catalan-English textual dataset of parallel sentences created to support Catalan in NLP tasks, specifically Machine Translation. The dataset can be used to train Bilingual Machine Translation models between English and Catalan in any direction, as well as Multilingual Machine Translation models.
Sep 29, 2025 - Language Technologies Laboratory
Rodriguez-Penagos, Carlos; Armentano i Oller, Carme; Villegas, Marta, 2025, "XitXat", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/642QYD, BSC Dataverse, V2
XitXat is a conversational dataset consisting of 950 chatbot–user conversations across 10 different domains. The conversations were created using the Wizard-of-Oz method. User interactions are annotated with intents and relevant slots, following the attached annotation guidelines. The dataset is designed to support research in natural language unde...
