Metrics
33,620 Downloads
The BSC Dataverse is the institutional research data repository of the Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS). It aims to enable the storage, sharing, and discovery of research data produced by BSC researchers, collaborators, and affiliated projects.
31 to 40 of 107 Results
Nov 3, 2025 - RES Users' Conference 2025
Duró Diaz, Josep Maria, 2025, "Towards an open-access dataset of the flow over realistic urban geometries: high-fidelity simulations and validation", https://doi.org/10.82201/CWCU4Q, BSC Dataverse, V1
This work presents the development of a high-resolution, open-access dataset of urban airflow over a realistic district in Barcelona, based on large-eddy simulations (LES) performed for 16 different wind directions. The simulations are conducted over a highly detailed computational domain that faithfully reproduces the real urban geometry, using gr...
Nov 3, 2025 - A Decade of DerStandard Forum Interactions
Fraxanet Morales, Emma; Gómez, Vicenç; Kaltenbrunner, Andreas; Pellert, Max, 2025, "(Data Records) A Decade of News Forum Interactions: Threaded Conversations, Signed Votes, and Topical Tags", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/P32CXW, BSC Dataverse, V2, UNF:6:MmzkAl6KMTPYXLdJYALuKw== [fileUNF]
This dataset contains the full set of data records described in the "A Decade of News Forum Interactions: Threaded Conversations, Signed Votes, and Topical Tags" publication. It includes the data records for User-level metadata, Comment-level data, Voting behavior, Article metadata, and Pre-computed text embeddings. Additionally, it includes: Annot...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Sant Savall, Aleix; Melero, Maite; Villegas, Marta, 2025, "ES-OC_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/TTRGIC, BSC Dataverse, V2
The ES-OC Parallel Corpus is a synthetic Spanish-Aranese dataset created to support the use of under-resourced languages from Spain, such as Aranese, in NLP tasks, specifically Machine Translation. Aranese is a variant of the Occitan language spoken in the Val d'Aran, Spain, where it is recognised as a co-official language. The dataset can be used...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Sant Savall, Aleix; Melero, Maite; Villegas, Marta, 2025, "ES-AST_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/2BK1NZ, BSC Dataverse, V2
The ES-AST Parallel Corpus is a Spanish-Asturian dataset created to support the use of under-resourced languages from Spain, such as Asturian, in NLP tasks, specifically Machine Translation. This dataset aggregates both synthetic and authentic data, and can be used to train Bilingual Machine Translation models between Asturian and Spanish in any di...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Sant Savall, Aleix; Melero, Maite; Villegas, Marta, 2025, "ES-AN_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/ELCJXZ, BSC Dataverse, V2
The ES-AN Parallel Corpus is a mainly synthetic Spanish-Aragonese dataset created to support the use of under-resourced languages from Spain, such as Aragonese, in NLP tasks, specifically Machine Translation. The dataset can be used to train Bilingual Machine Translation models between Aragonese and Spanish in any direction, as well as Multilingual...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Melero, Maite; Villegas, Marta; Liao, Xixian, 2025, "CA-ZH_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/MFXKNG, BSC Dataverse, V2
The CA-ZH Parallel Corpus is a Catalan-Chinese textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Chinese and Catalan in any direction, as well as Multilingual Machine Translation models. The...
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-IT_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/BTMN1V, BSC Dataverse, V2
The CA-IT Parallel Corpus is a Catalan-Italian textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Italian and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-FR_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/V7AZKB, BSC Dataverse, V2
The CA-FR Parallel Corpus is a Catalan-French textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between French and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-DE_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/DYAZII, BSC Dataverse, V2
The CA-DE Parallel Corpus is a Catalan-German textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between German and Catalan in any direction, as well as Multilingual Machine Translation models.
Nov 3, 2025 - Language Technologies Laboratory
De Luca Fornaciari, Francesca; Mash, Audrey; Melero, Maite; Villegas, Marta, 2025, "CA-PT_Parallel_Corpus", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/XYBQ8Q, BSC Dataverse, V2
The CA-PT Parallel Corpus is a Catalan-Portuguese textual dataset created to support Catalan in NLP tasks, specifically Machine Translation. The dataset is structured at the sentence level and can be used to train Bilingual Machine Translation models between Portuguese and Catalan in any direction, as well as Multilingual Machine Translation models...