Information extraction from historical collections

Last updated on Nov 2, 2023

Gallery, Library, Archive, and Museum (GLAM) collections contain large troves of visual and textual information. Their digitisation is opening up new opportunities to automatically extract their contents and build knowledge graphs from them. Machine learning methods need to be adapted to this task, which poses challenges such as noisy input, a wide variety of sources and domains, and a lack of linguistic resources. This open-ended project aims to design and develop machine learning methods for the automatic extraction of information from GLAM collections. Tasks of interest include text recognition, named entity recognition, object detection, and similarity search.

Methodologically, I work with transfer learning, to reuse existing resources; active learning, to leverage expert knowledge while training models; and eXplainable AI (XAI), to open models up to human inspection and understanding.

Related

Publications

Unsilencing colonial archives via automated entity recognition

Archives and AI: An Overview of Current Debates and Future Perspectives