Information extraction from historical documents

Historical collections contain large troves of documents with rich visual and textual information. Their digitisation is opening up new opportunities to automatically extract content and populate information systems. Machine learning methods need to be adapted to the task, since challenges include input noise, source and domain variety, and a lack of linguistic resources. This open-ended project aims to design, develop and test machine learning methods for the automatic extraction of information from historical documents.

Methodologies of interest also include transfer learning, to re-use existing resources; active learning, to leverage expert knowledge in training AI models; and eXplainable AI, to open the AI black box to human inspection and understanding.