Virtual Record Treasury of Ireland

Research Guide

Knowledge Graph

Research Guide Subtitle: Knowledge Graph for Irish History Research Guide
Contributors: Lynn Kilgallon, Fabrizio Orlandi, Katie Miller, Peter Crooks, Declan O’Sullivan
First published: 2022

What is the Knowledge Graph for Irish History?

The Knowledge Graph for Irish History is a dynamic tool that allows people to explore the information contained in historical records and databases. It contains robust, peer-reviewed historical data, presented using an ontology (a formal framework of concepts and the relationships between them) specifically designed for historical research. The Knowledge Graph is the first resource of this kind for Irish historical research. Through exploring the Knowledge Graph, users can discover new connections in Ireland’s past.

What value does the graph add to the Virtual Record Treasury of Ireland?

The Knowledge Graph provides an easy access point for historians and the interested public to explore the rich information contained within the Virtual Record Treasury of Ireland. On its own, the Virtual Record Treasury of Ireland is already a substantial database of records on Irish history; with the addition of the Knowledge Graph, it also becomes a valuable tool for conducting historical research.

The Knowledge Graph allows us, for example, to identify connections between people, places and offices across time. It is as if you were able to read every document in the Virtual Record Treasury and keep all the information in your head. If you could do so, you might spot that the people who served as sheriff in a certain county across many decades were all part of the one extended family, or were all members of the same club or association, revealing hidden social trends or political networks.
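The sheriff example above amounts to a query over the graph. The following Python sketch shows the idea on a handful of toy triples. Everything in it is hypothetical: the `person:`, `office:` and `family:` identifiers are placeholders, and the property names `heldOffice` and `memberOfFamily` are assumptions, not the Knowledge Graph's actual vocabulary. Against a real RDF store the same question would more usually be posed in SPARQL, and it would also need to take dates of office into account.

```python
from collections import defaultdict

# Hypothetical toy triples: (subject, predicate, object).
# Identifiers and property names are placeholders for illustration only.
TRIPLES = [
    ("person:A", "heldOffice", "office:sheriff-county-X"),
    ("person:B", "heldOffice", "office:sheriff-county-X"),
    ("person:C", "heldOffice", "office:sheriff-county-X"),
    ("person:A", "memberOfFamily", "family:F1"),
    ("person:B", "memberOfFamily", "family:F1"),
    ("person:C", "memberOfFamily", "family:F2"),
]


def holders_by_family(triples, office):
    """Group everyone who held `office` by the family they belonged to."""
    holders = {s for s, p, o in triples if p == "heldOffice" and o == office}
    groups = defaultdict(set)
    for s, p, o in triples:
        if p == "memberOfFamily" and s in holders:
            groups[o].add(s)
    return dict(groups)


# A family with several office-holders hints at a dynastic pattern.
print(holders_by_family(TRIPLES, "office:sheriff-county-X"))
```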

What is its future potential?

The Knowledge Graph has been built as a dynamic resource, meaning it can grow and change. More information can be added to the Knowledge Graph, incorporating new records or other existing datasets. A Knowledge Graph allows not only humans but also AI algorithms to process the data, supporting advanced, automated discovery of new links, as well as classification, inference and pattern recognition. The Knowledge Graph will continue to grow, becoming an invaluable research tool and database for information on Irish history.

What kind of data does it contain?

The Knowledge Graph contains data on people, places, offices, and organisations mentioned in a wide range of historical records. These people, places, offices, and organisations exist as individual, unique ‘entities’ in the Knowledge Graph. 

The Knowledge Graph contains information about each entity’s attributes and relationships. This allows users to learn about specific entities, and to see how different historical entities were connected to one another. 

What are the attributes we capture for different types of entities?

For people, the Knowledge Graph captures name, gender, family relationships, office, occupation, rank, associated organisations, associated places, country, the century in which they lived, and dates of birth, death and/or floruit (the period in which their career took place). For example, the person, or entity, Margaret Butler had the following attributes: she was female; she was the second daughter of the 8th earl of Kildare; she was countess of Ormond; she was associated with Kilkenny and Tipperary; and she was born about 1471 and died in 1542. Many of these attributes link her to other entities: people, places and offices.
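A minimal sketch, assuming a simple tuple representation, of how Margaret Butler's attributes could be written as (subject, predicate, object) statements. The `kg:` identifiers and the property names are hypothetical, and treating the countess of Ormond as an office is an assumption made for illustration. Literal values such as gender and dates end a chain, while values that are themselves entities can be followed on to further people, places and offices.

```python
# Hypothetical identifiers and property names, for illustration only.
MARGARET = "kg:person/margaret-butler"

TRIPLES = [
    (MARGARET, "gender", "female"),
    # Second daughter of the 8th earl of Kildare.
    (MARGARET, "daughterOf", "kg:person/8th-earl-of-kildare"),
    # Modelled here as an office (assumption).
    (MARGARET, "heldOffice", "kg:office/countess-of-ormond"),
    (MARGARET, "associatedPlace", "kg:place/kilkenny"),
    (MARGARET, "associatedPlace", "kg:place/tipperary"),
    (MARGARET, "birthDate", "c.1471"),
    (MARGARET, "deathDate", "1542"),
]


def linked_entities(triples, subject):
    """Return the objects of `subject` that are entities (here: IDs prefixed 'kg:')."""
    return [o for s, _, o in triples if s == subject and o.startswith("kg:")]


for entity in linked_entities(TRIPLES, MARGARET):
    print(entity)
```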

For places, the Knowledge Graph captures the place name, country, whether the entity is a place and/or an organisation, the place type, the organisation type, and any other place in which the entity is contained. For example, the entity ‘Kinawley’ has the following attributes: it is a civil parish within the diocese of Clogher and the county of Fermanagh. The entity ‘Diocese of Clogher’, in turn, extends across counties Tyrone and Fermanagh and parts of Armagh.
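Because a place can sit in more than one hierarchy at once (Kinawley belongs to both a diocese and a county), ‘contained in’ links form a graph rather than a single tree. The Python sketch below, assuming the links are held in a plain dictionary, walks them upwards breadth-first to find every containing place. The link from County Fermanagh to Ireland is an assumption added to give the walk a second level; the guide records country as a separate attribute.

```python
from collections import deque

# Hypothetical "contained in" links. A place may have several parents,
# e.g. a civil parish sits in both a diocese and a county.
CONTAINED_IN = {
    "Kinawley": ["Diocese of Clogher", "County Fermanagh"],
    "County Fermanagh": ["Ireland"],  # assumption, for illustration only
}


def all_containers(place, contained_in):
    """Breadth-first walk up the 'contained in' links, returning every ancestor once."""
    seen = []
    queue = deque(contained_in.get(place, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.append(parent)
            queue.extend(contained_in.get(parent, []))
    return seen


print(all_containers("Kinawley", CONTAINED_IN))
# ['Diocese of Clogher', 'County Fermanagh', 'Ireland']
```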

For offices, the Knowledge Graph captures office names, country, office type (types 1–4, which categorise the office at four levels of granularity), associated organisation, and any office to which the entity is subject, which it counsels, and/or by which it is overseen.

For organisations, the Knowledge Graph captures the organisation names, country, organisation type, start and end dates, and any organisation which the entity is a part of, accounts to, appeals to, and/or succeeds. 
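The office and organisation attributes listed above could be held as simple records before conversion to RDF. The Python dataclasses below are a hypothetical sketch of such a schema, not the project's actual one: the field names are assumptions, and the four `type_` fields stand in for office types 1–4. The helper turns an office's relation fields into edges that could later be written out as triples.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Office:
    name: str
    country: str
    # Office types 1-4: the same office classified at four levels of granularity.
    type_1: Optional[str] = None
    type_2: Optional[str] = None
    type_3: Optional[str] = None
    type_4: Optional[str] = None
    organisation: Optional[str] = None
    subject_to: List[str] = field(default_factory=list)
    counsels: List[str] = field(default_factory=list)
    overseen_by: List[str] = field(default_factory=list)


@dataclass
class Organisation:
    name: str
    country: str
    organisation_type: Optional[str] = None
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    part_of: List[str] = field(default_factory=list)
    accounts_to: List[str] = field(default_factory=list)
    appeals_to: List[str] = field(default_factory=list)
    succeeds: List[str] = field(default_factory=list)


def office_edges(office: Office) -> List[tuple]:
    """Turn an office's relation fields into (subject, predicate, object) edges."""
    edges = []
    for predicate in ("subject_to", "counsels", "overseen_by"):
        for target in getattr(office, predicate):
            edges.append((office.name, predicate, target))
    if office.organisation:
        edges.append((office.name, "organisation", office.organisation))
    return edges


# Placeholder values, not real historical data.
office_a = Office(name="Office A", country="Ireland",
                  organisation="Organisation C", overseen_by=["Office B"])
print(office_edges(office_a))
# [('Office A', 'overseen_by', 'Office B'), ('Office A', 'organisation', 'Organisation C')]
```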

Where does the data come from?

The data populating the Knowledge Graph comes from a wide variety of sources, including original documents, printed historical sources (such as lists, indexes, maps, and calendars) and authoritative online databases (such as the Dictionary of Irish Biography). The data contained in the Knowledge Graph spans more than seven centuries, from medieval to early modern to modern Ireland.

Key datasets in the graph include:

  • Modern Place: over 60,000 modern place-names in a hierarchy of all counties, baronies, parishes and townlands in Ireland, generating 1,450,640 triples
  • Early-Modern Place: approximately 44,000 early-modern place-names from seventeenth-century sources (primarily the Down Survey of Ireland and the Books of Survey and Distribution), interlinked with modern locations, generating 58,384 triples
  • Dictionary of Irish Biography: over 10,000 persons from the Dictionary of Irish Biography, generating 581,596 triples
  • Medieval People: over 2,000 persons from c.1200–1500, drawn initially from Philomena Connolly (ed.), Irish Exchequer Payments, generating 76,099 triples
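Each figure above counts triples: single (subject, predicate, object) statements such as ‘Kinawley is contained in County Fermanagh’. A quick Python check adds up the reported counts for these four key datasets; the total covers only the datasets listed here, not necessarily the whole Knowledge Graph.

```python
# One triple: a single subject-predicate-object statement.
example_triple = ("Kinawley", "is contained in", "County Fermanagh")

# Triple counts reported for the four key datasets listed above.
TRIPLE_COUNTS = {
    "Modern Place": 1_450_640,
    "Early-Modern Place": 58_384,
    "Dictionary of Irish Biography": 581_596,
    "Medieval People": 76_099,
}

total = sum(TRIPLE_COUNTS.values())
print(f"{total:,} triples across the four key datasets")
# 2,166,719 triples across the four key datasets
```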

How is the data organised?

Data in the Knowledge Graph is mainly organised by historical period: Medieval (approximately 1200–1550), Early Modern (1550–1800) and Modern (post-1800). This structure reflects the source material from which data in the Graph is collected, although there is always some overlap between these categories.
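Because the period boundaries are approximate and overlap, a record whose dates straddle a boundary can belong to more than one period. The Python sketch below assigns a year or a span of years to every period it overlaps, using the approximate boundaries given above. Treating the boundary years as inclusive on both sides, and the function name `periods_for`, are assumptions made for illustration.

```python
# Approximate period boundaries from the guide; None marks an open end.
# Inclusive boundaries on both sides are an assumption that reflects the overlap.
PERIODS = [
    ("Medieval", 1200, 1550),
    ("Early Modern", 1550, 1800),
    ("Modern", 1800, None),
]


def periods_for(start, end=None):
    """Return every period that overlaps the span start..end (a single year if end is None)."""
    end = start if end is None else end
    return [name for name, p_start, p_end in PERIODS
            if end >= p_start and (p_end is None or start <= p_end)]


print(periods_for(1471, 1542))  # ['Medieval']
print(periods_for(1530, 1580))  # ['Medieval', 'Early Modern']
print(periods_for(1850))        # ['Modern']
```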

Each historical period corresponds to a distinct dataset, or ‘subgraph’, within the Knowledge Graph. There is a Medieval Subgraph, an Early Modern Subgraph, and a Modern Subgraph. Each subgraph contains information about people, places, offices and organisations.

What are the subgraphs?

Subgraphs are different groupings of data contained within the Knowledge Graph. There are subgraphs for each entity type (person, place, office, organisation) in each time period (medieval, early modern, modern), resulting in a total of 12 ‘core’ subgraphs in the Knowledge Graph for Irish History.
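The count of 12 follows from pairing each of the four entity types with each of the three periods (4 × 3 = 12). A short Python sketch enumerates the pairs; the naming scheme for the subgraphs is hypothetical.

```python
from itertools import product

ENTITY_TYPES = ["person", "place", "office", "organisation"]
TIME_PERIODS = ["medieval", "early-modern", "modern"]

# One core subgraph per (entity type, period) pair; names are hypothetical.
core_subgraphs = [f"{period}-{entity}"
                  for entity, period in product(ENTITY_TYPES, TIME_PERIODS)]

print(len(core_subgraphs))  # 12
```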

While most subgraphs are organised temporally, others are thematic: for instance, one subgraph contains all the chief governors of Ireland from the middle ages to the twenty-first century. Another thematic subgraph is based on the Dictionary of Irish Biography, which contains information about nearly 11,000 of the most notable figures in Irish history.

This structure reflects how historians typically organise historical data: according to theme, and according to time period. Organising our data in this way allows users to isolate groups of information if they wish to focus on a particular historical period or theme.

How was the Graph created?

The Knowledge Graph was created through a collaboration between humanities scholars on the Beyond 2022 project team and computer scientists in the ADAPT Centre, School of Computer Science & Statistics, Trinity College Dublin. The first phase of development involved creating frameworks, or ‘schemas’, for organising and capturing information found in historical records; this process resulted in the creation of structured, or machine-readable, data. Some of the data in the Knowledge Graph has been curated and enriched manually by subject-matter experts; other data has been drawn from external, authoritative datasets. Knowledge engineers then transform the structured data into RDF (Resource Description Framework), a standard model for representing structured data on the web.
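The guide does not say exactly which RDF serialisation or tooling the transformation step uses (it elsewhere mentions RDF/XML). As a hedged sketch of what turning structured data into RDF produces, the Python below writes triples in N-Triples, a W3C line-based RDF syntax chosen here only because it is easy to read; a production pipeline would more likely rely on an RDF library such as rdflib. The base IRI `https://example.org/kg/` and the `containedIn` property are hypothetical.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def is_iri(value):
    """Treat http(s) strings as IRIs; everything else becomes a literal."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def literal(value):
    """Serialise a Python value as an N-Triples literal."""
    if isinstance(value, int) and not isinstance(value, bool):
        return f'"{value}"^^<{XSD_INTEGER}>'
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'


def to_ntriples(triples):
    """Return one N-Triples line per (subject, predicate, object) triple."""
    lines = []
    for s, p, o in triples:
        if not (is_iri(s) and is_iri(p)):
            raise ValueError(f"subject and predicate must be IRIs: {(s, p)}")
        obj = f"<{o}>" if is_iri(o) else literal(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


EX = "https://example.org/kg/"  # hypothetical base IRI
print(to_ntriples([
    (EX + "place/kinawley", EX + "containedIn", EX + "place/county-fermanagh"),
    (EX + "place/kinawley", RDFS_LABEL, "Kinawley"),
]))
```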

Data populating the Knowledge Graph is drawn from authoritative, peer-reviewed historical sources and added to the Knowledge Graph through a careful and rigorous process. Data is extracted (curated) from these sources either by historians collecting information by hand or through automated Natural Language Processing (NLP), which can recognise named entities in digitised documents. The provenance of the data in each graph and subgraph is recorded within the Knowledge Graph itself. Data is collected in different schemas according to entity type (person, place, office or organisation). Schemas help to organise the data and capture granular information about each entity and about how entities relate to each other.
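To give a feel for the automated extraction step, the Python below spots known names in a sentence by dictionary (gazetteer) lookup. This is deliberately a much simpler technique than the statistical named-entity recognition the guide refers to, which would normally use a trained model (for example, in a library such as spaCy). The gazetteer entries and entity identifiers are hypothetical, and the example sentence is illustrative.

```python
import re

# Hypothetical gazetteer: surface form -> entity identifier.
GAZETTEER = {
    "Kinawley": "place:kinawley",
    "Fermanagh": "place:county-fermanagh",
    "Clogher": "place:diocese-of-clogher",
}


def spot_entities(text, gazetteer):
    """Return (start, end, surface form, entity id) for each whole-word gazetteer match."""
    if not gazetteer:
        return []
    # Longest names first, so multi-word names win over names they contain.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.start(), m.end(), m.group(1), gazetteer[m.group(1)])
            for m in pattern.finditer(text)]


for match in spot_entities("The parish of Kinawley lay in the diocese of Clogher.", GAZETTEER):
    print(match)
```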

How do we curate data from scratch?

When data is compiled and curated manually by a subject-matter expert, the process begins with gathering information from historical sources and entering it into a spreadsheet with predefined categories. These spreadsheets are then mapped to the Resource Description Framework (RDF), and the RDF output is checked for errors. Once stored in RDF, the information from the spreadsheets can populate the Knowledge Graph as, for example, person-entities, attributes, and relationships. We can also link each person-entity, attribute and relationship back to its original source, creating linked data within the Knowledge Graph.
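The spreadsheet-to-RDF steps above can be sketched in Python with the standard `csv` module. The column names, the `person:`/`source:` prefixes and the sample rows are hypothetical (the second row is deliberately incomplete), and the required-field check is only a simple stand-in for the project's error checking. Each statement is emitted as a quad that carries its source, one possible way of linking every attribute back to where it came from.

```python
import csv
import io

# Hypothetical spreadsheet with predefined columns.
SHEET = """person_id,name,gender,office,source
p1,Margaret Butler,female,countess of Ormond,src1
p2,,male,sheriff,src2
"""

REQUIRED = ("person_id", "name", "source")
ATTRIBUTES = ("name", "gender", "office")


def rows_to_quads(sheet_text):
    """Map spreadsheet rows to (subject, predicate, object, source) statements.

    Rows missing a required value are reported as (line number, missing columns)
    instead of being mapped.
    """
    quads, errors = [], []
    reader = csv.DictReader(io.StringIO(sheet_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            errors.append((line_no, missing))
            continue
        subject = f"person:{row['person_id']}"
        source = f"source:{row['source']}"
        for column in ATTRIBUTES:
            value = (row.get(column) or "").strip()
            if value:
                # Every statement carries the source it was curated from.
                quads.append((subject, column, value, source))
    return quads, errors


quads, errors = rows_to_quads(SHEET)
for quad in quads:
    print(quad)
print("rows needing attention:", errors)
```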

How do we interlink with external data sets?

The Knowledge Graph is represented using the Resource Description Framework (RDF), a W3C standard, in its XML-based serialisation (RDF/XML). RDF allows us to define the relationship between two entities: for example, that Person A is a parent of Person B, or that town X is within county Y. In the same way, we can relate an entity to a resource described in another RDF dataset, linking our information with information outside our system. For example, person entities in our Knowledge Graph that have been extracted from the DIB are linked not only to their original DIB web pages, but also to their corresponding Wikidata entries. Wikidata, a collaboratively edited knowledge graph run by the Wikimedia Foundation alongside Wikipedia, is a cornerstone of the Linked Open Data cloud, which connects many different datasets.
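A minimal sketch of such external links, written as N-Triples lines in Python. `owl:sameAs` and `rdfs:seeAlso` are real W3C vocabulary terms and `http://www.wikidata.org/entity/` is Wikidata's entity namespace, but which properties the Knowledge Graph actually uses for these links is an assumption here; the local IRI, the DIB page address and the `Q_EXAMPLE` identifier are placeholders, not real records.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def external_links(kg_iri, wikidata_qid, dib_page):
    """Return N-Triples lines linking a local person IRI to external resources."""
    return [
        # Identity claim: both IRIs denote the same person.
        f"<{kg_iri}> <{OWL_SAME_AS}> <{WIKIDATA_ENTITY}{wikidata_qid}> .",
        # Weaker pointer to a web page with more information.
        f"<{kg_iri}> <{RDFS_SEE_ALSO}> <{dib_page}> .",
    ]


# Hypothetical IRIs and identifier, for illustration only.
for line in external_links("https://example.org/kg/person/p1",
                           "Q_EXAMPLE",
                           "https://example.org/dib/p1"):
    print(line)
```

The two predicates differ in strength: `owl:sameAs` tells a reasoner to merge everything said about both IRIs, so it suits a carefully checked person-to-person match, whereas `rdfs:seeAlso` only points to a related page and carries no identity claim.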