AMSMB HTR model for Medieval Notarial Manuscripts (perma:BSC/AQRKOH)
(AMSMB HTR model)


Document Description

Citation

Title:

AMSMB HTR model for Medieval Notarial Manuscripts

Identification Number:

perma:BSC/AQRKOH

Distributor:

BSC Dataverse

Date of Distribution:

2025-06-26

Version:

2

Bibliographic Citation:

Berganzo-Besga, Iban; Coll Ardanuy, Mariona, 2025, "AMSMB HTR model for Medieval Notarial Manuscripts", https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/AQRKOH, BSC Dataverse, V2

Study Description

Citation

Title:

AMSMB HTR model for Medieval Notarial Manuscripts

Alternative Title:

AMSMB HTR model

Identification Number:

perma:BSC/AQRKOH

Authoring Entity:

Berganzo-Besga, Iban (https://ror.org/05sd8tv96)

Coll Ardanuy, Mariona (https://ror.org/05sd8tv96)

Other identifications and acknowledgements:

Arxiu dels Marquesos de Santa Maria de Barberà

Other identifications and acknowledgements:

Arxiu Municipal de Vilassar de Dalt

Producer:

Barcelona Supercomputing Center

Date of Production:

2025-03-14

Software used in Production:

Python

Software used in Production:

Kraken

Software used in Production:

ketos

Grant Number:

C005/24-ED CV1

Distributor:

BSC Dataverse

Distributor:

Barcelona Supercomputing Center

Access Authority:

Iban Berganzo-Besga

Depositor:

Coll Ardanuy, Mariona

Date of Deposit:

2025-06-25

Date of Distribution:

2025-06-25

Holdings Information:

https://dataverse.bsc.es/dataset.xhtml?persistentId=perma:BSC/AQRKOH

Study Scope

Keywords:

Arts and Humanities, handwritten text recognition, Late Middle Ages, automatic transcription

Topic Classification:

medieval history, artificial intelligence, archival history, diplomatics, paleography, document analysis and recognition

Abstract:

AMSMB HTR is a handwritten text recognition (HTR) model trained using Kraken (https://kraken.re/main/index.html). It uses the Tridis v1 model (https://zenodo.org/records/10788591) as its base and is fine-tuned on the AMSMB HTR dataset (https://dataverse.bsc.es/citation?persistentId=perma:BSC/0VB0MC); in the paper, this model is named ft:v1. Details on how the model was trained are reported in our GitLab repository (https://gitlab.bsc.es/cssh/releases/medieval-transcription/) and in our paper (see citation information below).
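A minimal sketch of how the published model file (htr_amsmb.mlmodel) can be applied to a single page image with the Kraken command-line tool, wrapped in Python. It assumes a recent Kraken release in which `segment -bl` runs Kraken's default baseline segmentation model and `ocr -m` loads a recognition model from a local path. The page and output file names are hypothetical. The training settings actually used for this model are documented in the GitLab repository and are not reproduced here.

```python
import shlex
import subprocess
from pathlib import Path


def build_kraken_command(image: str, output: str,
                         model: str = "htr_amsmb.mlmodel") -> list[str]:
    """Build a kraken CLI call: baseline segmentation, then recognition.

    Assumes a recent Kraken release where `segment -bl` uses the default
    baseline segmentation model and `ocr -m` loads a recognition model.
    """
    return [
        "kraken",
        "-i", image, output,   # input page image and plain-text output file
        "segment", "-bl",      # baseline (line) segmentation
        "ocr", "-m", model,    # recognition with the AMSMB HTR model
    ]


def transcribe(image: str, output: str,
               model: str = "htr_amsmb.mlmodel") -> str:
    """Run kraken on one page image and return the transcribed text.

    Requires Kraken to be installed and the model file to be available locally.
    """
    subprocess.run(build_kraken_command(image, output, model), check=True)
    return Path(output).read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical file names; replace with a real page image.
    print(shlex.join(build_kraken_command("page_001.jpg", "page_001.txt")))
```

Running the script as-is only prints the command, `kraken -i page_001.jpg page_001.txt segment -bl ocr -m htr_amsmb.mlmodel`; calling `transcribe()` actually invokes Kraken and therefore needs the model file downloaded from this dataset.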

Time Period:

1208-01-01 to 1499-12-31

Date of Collection:

2024-08-01 to 2025-03-14

Country:

Spain

Geographic Coverage:

Catalonia, Balearic Islands

Geographic Bounding Box:

  • West Bounding Longitude: -1.27
  • East Bounding Longitude: 4.64
  • South Bounding Latitude: 38.37
  • North Bounding Latitude: 43.05

Kind of Data:

Kraken handwritten text recognition model (.mlmodel)

Notes:

Related Datasets: AMSMB HTR dataset: https://dataverse.bsc.es/citation?persistentId=perma:BSC/0VB0MC

Methodology and Processing

Sources Statement

Origins of Sources:

Arxiu dels Marquesos de Santa Maria de Barberà: https://arxiumarquesosdebarbera.cat/

Documentation and Access to Sources:

The digitized images used to train this model were provided by the Arxiu Municipal de Vilassar de Dalt (AMVD) through its agreement with the Arxiu dels Marquesos de Santa Maria de Barberà (AMSMB). For further details, see the dataset datasheet at: https://dataverse.bsc.es/citation?persistentId=perma:BSC/0VB0MC.

Data Access

Notes:

CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)

Other Study Description Materials

Related Publications

Citation

Title:

Evaluating Handwritten Text Recognition in Medieval Notarial Manuscripts: A New Dataset and Comprehensive Analysis

Bibliographic Citation:

Mariona Coll Ardanuy, Iban Berganzo-Besga, Ramon Sarobe, and Coral Cuadrada. 2025 (forthcoming). Evaluating Handwritten Text Recognition in Medieval Notarial Manuscripts: A New Dataset and Comprehensive Analysis. In International Conference on Document Analysis and Recognition.

Other Study-Related Materials

Label:

htr_amsmb.mlmodel

Notes:

application/octet-stream

Other Study-Related Materials

Label:

README.md

Notes:

text/markdown