A Knowledge Graph for Spanish American Notary Records

To enable easy access to 17th-century Spanish American notary records using deep learning and knowledge management technologies

Using recent advances in deep learning and knowledge management, we will develop a tool to manage and analyze about 220,000 pages of digital images of seventeenth-century manuscripts held at the Archivo General de la República Argentina (National Archives) in Buenos Aires. This software will enable twenty-first-century scholars to read and analyze seventeenth-century Spanish American notary records quickly and to find relevant content in these documentary collections efficiently.
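
The poster does not describe how the knowledge graph is stored. As a minimal sketch, assume that facts extracted from the records are kept as subject-predicate-object triples and indexed for lookup. The class `TripleStore`, the identifiers `deed_001`, `person_A`, `notary_X`, and the predicates `grantor` and `notary` are hypothetical placeholders, not the project's schema or real archival data.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        # Secondary indexes so lookups by subject or object avoid a full scan.
        self.by_subject = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)
        self.by_object[obj].add(triple)

    def query(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        if subject is not None:
            candidates = self.by_subject.get(subject, set())
        elif obj is not None:
            candidates = self.by_object.get(obj, set())
        else:
            candidates = self.triples
        return sorted(
            t for t in candidates
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Illustrative placeholder facts, not real records.
store = TripleStore()
store.add("deed_001", "grantor", "person_A")
store.add("deed_001", "notary", "notary_X")
store.add("deed_002", "grantor", "person_A")
store.add("deed_002", "notary", "notary_Y")

# Which documents name person_A as grantor?
print(store.query(predicate="grantor", obj="person_A"))
# -> [('deed_001', 'grantor', 'person_A'), ('deed_002', 'grantor', 'person_A')]
```

Indexing by subject and by object keeps the two most common questions (everything known about one document, or every document naming one person) from scanning the whole collection; a production system would more likely rely on an RDF triple store or a graph database.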

Our Challenges

The documents in this repository contain a wide variety of handwriting styles.

The scans contain several types of noise, including discoloration, stains, ink bleed, and smudges.

Written in cursive, historical scripts frequently feature irregular letterforms and capitalization, abbreviations, archaic spelling, and words run together.

Preprocessing techniques are applied to clean the images without affecting the written content.
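
The poster does not name the preprocessing techniques used. One common first step for noisy document scans is global binarization with Otsu's method, which picks the gray level that maximizes the between-class variance of dark (ink) and light (paper) pixels. The pure-Python sketch below assumes 8-bit grayscale images stored as lists of rows; the function names `otsu_threshold` and `binarize` and the sample patch are hypothetical.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold for a flat list of 8-bit grayscale values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels with value <= t (ink class)
    sum_dark = 0     # sum of their intensities
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark  # pixels with value > t (paper)
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance scaled by total**2 (does not change the argmax).
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image, threshold):
    """Map pixels <= threshold to 0 (ink) and all others to 255 (paper)."""
    return [[0 if p <= threshold else 255 for p in row] for row in image]


# Hypothetical 3x4 grayscale patch: low values are ink, high values are paper.
image = [
    [200, 210, 205, 40],
    [198, 35, 45, 207],
    [30, 202, 199, 211],
]
t = otsu_threshold([p for row in image for p in row])
print(t)  # -> 45
print(binarize(image, t))
```

A single global threshold handles uniform discoloration well but can struggle with local stains or ink bleed; adaptive methods such as Sauvola's thresholding compute a separate threshold for each neighborhood instead.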

Paleography experts take an active part in information extraction to ensure that the content obtained from the images is accurate.

Our Solution

Optical character recognition (OCR) automatically converts images of printed or handwritten text into machine-readable, editable, and searchable text. Researchers have approached OCR with a range of methods, and in recent years deep learning has achieved remarkable success in image understanding and classification, image segmentation, speech recognition, and natural language processing.
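
The poster does not specify the recognition model. Many deep-learning handwriting recognizers output, for each narrow slice (frame) of a text-line image, a probability distribution over characters plus a special blank symbol, and are trained with Connectionist Temporal Classification (CTC). Assuming such a model, the sketch below shows greedy (best-path) CTC decoding; the alphabet, the probabilities, and the convention that class 0 is the blank are toy assumptions, and `ctc_greedy_decode` is a hypothetical name.

```python
BLANK = 0  # assumption: class index 0 is reserved for the CTC blank


def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy (best-path) CTC decoding.

    frame_probs: one probability list per frame; index 0 is the blank and
    index i > 0 corresponds to alphabet[i - 1].
    """
    # 1. Take the most probable class at each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Collapse consecutive repeats and 3. drop blanks.
    decoded = []
    prev = None
    for k in best_path:
        if k != prev and k != BLANK:
            decoded.append(alphabet[k - 1])
        prev = k
    return "".join(decoded)


# Toy example with made-up probabilities; classes are blank, "a", "e", "l".
frames = [
    [0.1, 0.1, 0.7, 0.1],    # e
    [0.2, 0.1, 0.6, 0.1],    # e (repeat, collapsed)
    [0.1, 0.1, 0.1, 0.7],    # l
    [0.8, 0.1, 0.05, 0.05],  # blank separates the two l's
    [0.1, 0.1, 0.1, 0.7],    # l
    [0.1, 0.7, 0.1, 0.1],    # a
]
print(ctc_greedy_decode(frames, "ael"))  # -> ella
```

The blank frame is what lets the decoder emit the double "l" in "ella"; without it, the two "l" frames would collapse into one. Beam search combined with a language model often improves on greedy decoding, which may matter for archaic spellings and abbreviations.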

Our software for fast and effective analysis of seventeenth-century notary records

Acknowledgments
We thank the National Endowment for the Humanities (NEH, Grant No. HAA-271747-20 and Grant No. HAA-287903-22), the Missouri Institute for Defense and Energy (UMKC MIDE), the UMKC Funding for Excellence Program, the UMKC/IDEAS Collaborative Data Science Grant, and the University of Missouri System Tier 3 Strategic Investment Grant for supporting this project.

This is an ongoing collaboration among the University of Missouri-Kansas City, the University of Missouri-Columbia, and the National Archives of Argentina.

This work was supported by the National Endowment for the Humanities under Grant No. HAA-271747-20.