Content from Introduction to Linked Open Data in the Humanities


Last updated on 2025-04-08

Estimated time: 12 minutes

Overview

Questions

  • What is Linked Open Data, and how does it differ from other data models?
  • Why are standardized identifiers (e.g., URIs) essential for LOD?
  • How can the subject-predicate-object model be used to describe LOD?
  • What are real-world examples of Linked Open Data in the humanities?

Objectives

  • Explain the concept of Linked Open Data (LOD) in your own words.
  • Distinguish between “Linked Data” and “Linked Open Data” using an example.
  • Describe the importance of standardized identifiers (e.g., URIs) for linking data.
  • Represent simple Linked Open Data relationships using the subject-predicate-object model.

Introduction


In this lesson, we explore the fundamentals of Linked Open Data (LOD). What is it, and why is it important? To answer these questions, we will break the term down step by step and look at the words linked, open, and data in turn. The first and most fundamental question is: what type of data are we dealing with, and in what form does data exist when we talk about LOD?

Discussion

Discussion: What is data?

When we talk about data, people often understand different things by it, and no one can quite put their finger on what it actually means. Try to approach the term on a linguistic level and find out what it could mean.

It is not easy to find a universal definition, but most of the time data is described as something from the real world that was observed and then written down. In the humanities this could be a letter or an archaeological object. In other fields of research, data is often a measurement result or an observation. What they have in common is that they try to depict a part of the real world. In our digital age, this data is ideally digitised, and digitisation brings its own challenges. It is impossible to capture the chosen section of the real world in its entirety, which means that people with domain knowledge have to decide what to record and what to leave out. This decision always depends on the application and is individual to each project.

Now that we understand what data is, we want to look at how it can be captured and digitised, which is why we will look at the L from LOD next.

Discussion

Discussion: What requirements should data fulfil?

Discussion

Discussion: What data modelling options do you know?

Imagine you are a researcher studying Vincent van Gogh and want to build a collection of information about him. You could gather details about his paintings, his friends, the places he visited, and much more. Probably the most common way would be to store this information in a table. This has various advantages, but also disadvantages. As with collecting and recording data, there is no clear answer as to which type of modelling is correct; it remains individual and depends above all on the project. If you want to combine your own data with other data, such as information about Van Gogh’s home town or his circle of acquaintances, it becomes difficult to represent this in a table. The question now is: how do we structure our knowledge in a way that is easy to share, connect, and expand?

Structuring Knowledge: The Subject-Predicate-Object Model


Consider the following information about Vincent van Gogh: he was born in Zundert and painted Starry Night.

One way to structure and link knowledge is to break it down into simple relationships using the subject-predicate-object model. This model is a fundamental method for structured data representation:

Callout

The subject-predicate-object model

Subject: The entity being described.

Predicate: The relationship or attribute.

Object: The value or linked entity.

For example, if we want to express that Vincent van Gogh painted Bouquet of Sunflowers, we can structure it like this:

Subject | Predicate | Object
Bouquet of Sunflowers | was painted by | Vincent van Gogh

By structuring information in this way, we ensure that the knowledge we store—namely, that Vincent van Gogh painted this artwork—is precise and easy to understand. We reduce the sentence to the essential elements, making it easier to store and process.

Now, if we wanted to store additional paintings by Vincent van Gogh, we could use the same format. Adding another painting to the table would look like this:

Subject | Predicate | Object
Bouquet of Sunflowers | was painted by | Vincent van Gogh
Starry Night | was painted by | Vincent van Gogh

However, at this point, our data is still in a tabular format, which is not the format used in LOD.
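The same rows can also be written down directly as a list of three-part statements. The following Python sketch is only an illustration: the `Triple` type and the `subjects_for` helper are names invented for this example and are not part of any LOD standard or library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in the subject-predicate-object model."""
    subject: str
    predicate: str
    object: str


# The two table rows from above, written as triples.
triples = [
    Triple("Bouquet of Sunflowers", "was painted by", "Vincent van Gogh"),
    Triple("Starry Night", "was painted by", "Vincent van Gogh"),
]


def subjects_for(predicate: str, obj: str) -> list[str]:
    """Return every subject that is linked to `obj` via `predicate`."""
    return [t.subject for t in triples
            if t.predicate == predicate and t.object == obj]


print(subjects_for("was painted by", "Vincent van Gogh"))
```

Because every statement has the same three parts, a question such as "which works were painted by Vincent van Gogh?" becomes a simple filter over the list.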

Triples Visualized


To visualize how Linked Open Data works, imagine a mind map. Write Vincent van Gogh in the center of a page and draw lines to various related terms:

  • One line connects Bouquet of Sunflowers with the label was painted by.

  • Another line connects Zundert (his birthplace) with the label was born in.

  • A third line connects Zundert with Netherlands with the label is part of.

Each of these connections expands the knowledge network—a simple version of what we call the LOD cloud. The more connections we create, the richer and more meaningful our dataset becomes. The resulting mind map would look like this:

By visualizing the data, it becomes easier to see why this way of storing and structuring knowledge is so efficient and valuable. Imagine a much larger mind map with significantly more information. This could reveal connections between people that were previously invisible. Furthermore, if researchers from different locations collaborate on such a mind map, additional insights and knowledge can be discovered. In a very theoretical and ideal scenario, it would be possible to draw a mind map containing all the information in the world and use it to find a connection from you to Bill Gates.

In essence, we are working with graphs—more specifically, directed graphs that follow a particular reading direction. Each connection has a clear subject, predicate, and object, forming what’s known as a triple.
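To make the idea of a directed graph more concrete, the three mind map connections described above can be stored as edges and searched for a chain of statements between two entities. This is a minimal sketch in Python; the function name `find_path` and the choice of a breadth-first search are assumptions made for illustration only.

```python
from collections import deque
from typing import Optional

# The three connections from the mind map, read as directed edges.
triples = [
    ("Bouquet of Sunflowers", "was painted by", "Vincent van Gogh"),
    ("Vincent van Gogh", "was born in", "Zundert"),
    ("Zundert", "is part of", "Netherlands"),
]

# Adjacency list: subject -> list of (predicate, object) pairs.
graph: dict[str, list[tuple[str, str]]] = {}
for s, p, o in triples:
    graph.setdefault(s, []).append((p, o))


def find_path(start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search that follows edges in reading direction only."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _predicate, neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no chain of statements leads from start to goal


print(find_path("Bouquet of Sunflowers", "Netherlands"))
print(find_path("Netherlands", "Zundert"))
```

The second call finds no path because the edges are directed: the data states that Zundert is part of the Netherlands, not the other way round.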

Challenge

Exercise: Create a Graph

Look at one of the following texts and try to visualise the information from it in a mind map. Pay attention to decisions that need to be made and possible problems that may arise. You can use whatever tool you like to draw the mind map. One possibility is Excalidraw, an open source tool that also supports working in a group. Go into breakout rooms and create a graph with Excalidraw. Try to find connections you could model in that graph.


Group 1: Vincent van Gogh was born in Zundert, the Netherlands, in 1853 and is a Post-Impressionist artist. In his youth, he developed a strong interest in art and initially studied in The Hague. He later moved to Paris, where he gained his first insights into modern art.


Group 2: Van Gogh created numerous famous paintings. The masterpiece ‘Starry Night’ was created in Saint-Rémy-de-Provence and belongs to the Post-Impressionist era. The painting can be found in the Museum of Modern Art in Manhattan.

One way to visualize both texts in one graph is the following. If your solution looks different, this does not necessarily mean that it is wrong. It is, as always in data modelling, individual and decision based.

Content from The concept of IRIs


Last updated on 2025-04-08

Estimated time: 12 minutes

Overview

Questions

  • How do IRIs eliminate ambiguity when different datasets use similar titles?
  • What are the essential components of an IRI, and how do they work together to ensure uniqueness?
  • Why are namespaces crucial for maintaining clarity and consistency in linked data?

Objectives

  • Explain what IRIs are.
  • Explain why they are important in Linked Open Data.
  • Explain the structure of an IRI.
  • Explain what namespaces are.

Ambiguity


Now that the fundamental concept behind the subject-predicate-object model is understood, a problem arises when trying to connect your own data with external datasets or when modeling knowledge unambiguously: How can we ensure that we are talking about the same objects?

Consider the following example:

Suppose you are searching the database of the Metropolitan Museum of Art for Van Gogh’s painting Wheat Field with Cypresses. It can quickly become unclear which painting you are referring to, because you find:

  1. An entry titled Wheatfield with Cypresses, attributed to Vincent van Gogh, painted in 1889.

  2. Another entry titled Cypresses, also attributed to Vincent van Gogh, from 1889.

  3. Another entry titled Wheat Field, also by Vincent van Gogh from 1888.

  4. Yet another entry with the same title Wheat Field, but by Jean Jacques de Boissieu from 1772.

This highlights that names are often not unique. While additional details like year and artist usually clarify which object is meant, this is not always guaranteed. Humans can often resolve such ambiguities using context, but computers struggle with this.

Since a subject-predicate-object model does not have the context of a larger text, ambiguity can be resolved using IDs. Museums, for instance, assign unique IDs to paintings, ensuring that even those with identical names are distinguishable. This concept is adapted and expanded for LOD to work in an open, large-scale environment.

Now, with these IDs, it becomes possible to refer to one exact painting and be sure that everyone within the same context understands which artwork is meant. However, an ID on its own is just a number, and in another context the same number might refer to a different object. The number alone is therefore not free from ambiguity, so we need a way to state that exact “context” as well.

To resolve this problem, we can use a URI, a Uniform Resource Identifier. By combining a unique ID with a well-defined namespace, a URI guarantees global uniqueness. The namespace acts like a contextual “container” that ensures the ID is interpreted in a specific environment, making it unambiguous no matter where or when it is used. Similarly, IRIs extend this principle by allowing a broader set of characters, accommodating diverse languages and scripts. Together, the use of namespaces and IDs ensures that every resource is uniquely identifiable at all times and in every context. The namespace is the “context” mentioned before.
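The effect of the namespace can be sketched in a few lines of Python. The two namespaces below are hypothetical (`example.org` addresses invented for this illustration), as is the helper name `make_iri`; the point is only that the same local ID leads to two different identifiers.

```python
# Hypothetical namespaces for two collections that both use the ID "1889".
MUSEUM_A = "https://example.org/museum-a/object/"
MUSEUM_B = "https://example.org/museum-b/object/"


def make_iri(namespace: str, local_id: str) -> str:
    """Combine a namespace and a local ID into one identifier."""
    return namespace + local_id


iri_a = make_iri(MUSEUM_A, "1889")
iri_b = make_iri(MUSEUM_B, "1889")

print(iri_a)
print(iri_b)
print(iri_a == iri_b)  # the IDs are equal, the full identifiers are not
```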

Understanding IRIs


Applied to the painting Wheatfield with Cypresses, the Metropolitan Museum of Art does not provide a guaranteed way to reference this so-called resource unambiguously. While we can use the link to the museum’s website, there is no guarantee that this link will remain unchanged over time. If the URL were to change, our reference would no longer work.

To avoid this problem, certain providers offer ways to generate IRIs. One example we want to examine is Wikidata, the structured data repository behind Wikipedia. If we search for Wheatfield with Cypresses on Wikidata, we also find multiple entries. Looking at this entry, we can already see the associated ID in the page title. The link to the page, https://www.wikidata.org/wiki/Q26221215, forms the IRI. The first part, https://www.wikidata.org/wiki/, is the namespace, which is predefined, while the second part, Q26221215, is the ID, which is unique within this namespace. The combination of both elements ensures that this object can be referenced unambiguously in different contexts. Like subjects, predicates also need an IRI that describes exactly what the predicate means. Wikidata provides such properties in its List of Properties. For example, we can find an IRI for the property place of birth.
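The split into namespace and ID can be reproduced with a short Python sketch. The helper `split_iri` is a made-up name, and cutting at the last `/` or `#` is a common convention assumed here for illustration, not a rule that every IRI follows.

```python
def split_iri(iri: str) -> tuple[str, str]:
    """Split an IRI into (namespace, local ID) at the last '/' or '#'."""
    cut = max(iri.rfind("/"), iri.rfind("#")) + 1
    return iri[:cut], iri[cut:]


namespace, local_id = split_iri("https://www.wikidata.org/wiki/Q26221215")
print(namespace)  # https://www.wikidata.org/wiki/
print(local_id)   # Q26221215
```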

Conclusion


URIs and IRIs form the bedrock of Linked Open Data by ensuring that every digital resource, such as Van Gogh’s Wheatfield with Cypresses, has a unique, reliable address. By breaking down the structure of these identifiers and understanding the role of namespaces, we see how ambiguity is resolved. This system not only enhances clarity but also fosters global collaboration and deeper insights in research.

Key Points

Internationalized Resource Identifier

  • Is used to prevent ambiguities
  • Needs to be defined and openly accessible on the Internet
  • Is necessary for subjects, predicates and objects
  • Is created from a namespace in combination with an ID

Content from Introduction to RDF and Basic Modeling


Last updated on 2025-04-08

Estimated time: 22 minutes

Overview

Questions

  • What is RDF, and why is it used in Linked Open Data?
  • How does RDF structure information?
  • How can we represent real-world relationships using RDF?

Objectives

  • Explain the purpose and structure of RDF.
  • Model basic relationships using RDF.
  • Explain limitations of N-Triples.

As we have already learnt, information can be stored and displayed using the subject-predicate-object model. To avoid ambiguities, IRIs are used for the respective parts. We need a similar level of standardisation for the notation itself, so that a computer handles the data correctly. The standard we want to use in LOD is RDF.

RDF (Resource Description Framework) is a universal data model designed to represent relationships between entities in a structured way. It provides a standardized format for expressing knowledge using the subject-predicate-object model, which allows different datasets to be linked together and ensures interoperability.

Using RDF, we can describe facts in a machine-readable way. Each RDF statement, called a triple, consists of a subject, a predicate, and an object.

For example, we can express the relationship between the painting Wheatfield with Cypresses and Vincent van Gogh using an RDF triple with IRIs:

<https://www.wikidata.org/wiki/Q26221215> <https://www.wikidata.org/wiki/Property:P170> <https://www.wikidata.org/wiki/Q5582>.

This triple means that Wheatfield with Cypresses was painted by Vincent van Gogh.

Using N-Triples as the Simplest RDF Representation

The format used above is called N-Triples, which is the simplest, standardized way to represent RDF data. The basic notation is quite simple. You just need to write your IRIs in angle brackets and end each statement with a period. N-Triples is a line-based format where each fact is written on a single line in the form:

<subject> <predicate> <object>.

This simplicity makes N-Triples easy to parse and store, making it ideal for exchanging RDF data between different systems.

Callout

Writing RDF

  • Write your information in the subject-predicate-object model
  • Use IRIs for the subject, predicate and object
  • Put the IRIs in angle brackets
  • End each statement with a period.
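The steps in this callout can be sketched as a small Python function. The name `to_ntriples` is invented for this example, and the validity check is deliberately rough (it only rejects a few characters that may not appear inside an IRI); a real project would more likely rely on an RDF library.

```python
def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Format three IRIs as one N-Triples line."""
    for iri in (subject, predicate, obj):
        # Spaces, angle brackets and quotes are not allowed inside an IRI.
        if any(ch in iri for ch in ' <>"'):
            raise ValueError(f"not a valid IRI: {iri!r}")
    # IRIs go in angle brackets; the statement ends with a period.
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(
    "https://www.wikidata.org/wiki/Q26221215",
    "https://www.wikidata.org/wiki/Property:P170",
    "https://www.wikidata.org/wiki/Q5582",
)
print(line)
```

Passing a plain name such as "Vincent van Gogh" instead of an IRI raises an error, which mirrors the rule that all three parts must be IRIs in this basic form.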

Blank Nodes and Literals in RDF

In RDF, not every object needs a globally unique identifier (IRI). Sometimes, an entity exists that doesn’t need to be explicitly named—this is where Blank Nodes come in.

A Blank Node represents an entity that exists in the data model but has no specific identifier. It is useful when:

  • The entity doesn’t have a meaningful global ID.
  • It is only relevant in a limited local context.

For example, if a painting has an unknown artist, we might use a blank node to represent the missing information:

<IRI for Unknown Painting> <IRI for "was painted by"> _:b1.

Here, _:b1 is a blank node acting as a placeholder for an unknown artist.

Additionally, RDF supports Literals, which represent concrete values such as dates, numbers, or text instead of links to other entities.

For example, adding the creation year of Wheatfield with Cypresses:

<https://www.wikidata.org/wiki/Q26221215> <IRI for "was painted in"> "1889"^^<http://www.w3.org/2001/XMLSchema#integer>.

This triple states that Wheatfield with Cypresses was painted in the year 1889, with the data type explicitly set as an integer.
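The three kinds of terms can be told apart by their notation alone. The following Python sketch shows this with simplified regular expressions; the patterns and the function name `classify` are assumptions for illustration and cover only the forms used in this lesson (for example, language tags on literals are ignored).

```python
import re

# Simplified patterns; the full N-Triples grammar is more permissive.
IRI = re.compile(r"^<([^<>\s]+)>$")
BLANK_NODE = re.compile(r"^_:(\w+)$")
LITERAL = re.compile(r'^"(.*)"(?:\^\^<([^<>\s]+)>)?$')


def classify(term: str) -> str:
    """Return 'IRI', 'blank node' or 'literal' for one written term."""
    if IRI.match(term):
        return "IRI"
    if BLANK_NODE.match(term):
        return "blank node"
    if LITERAL.match(term):
        return "literal"
    raise ValueError(f"unrecognised term: {term!r}")


print(classify("<https://www.wikidata.org/wiki/Q5582>"))
print(classify("_:b1"))
print(classify('"1889"^^<http://www.w3.org/2001/XMLSchema#integer>'))
```

Note that the datatype of a literal is itself an IRI and is therefore written in angle brackets after the `^^` marker.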

Challenge

Exercise: Creating RDF Triples

Convert the following statements, or your mind map from the first task, into RDF triples using the correct structure. Use Wikidata for IRIs and properties:

  1. Vincent van Gogh was born in Zundert.
  2. Starry Night was painted by Vincent van Gogh.
  3. The painting The Potato Eaters was created in 1885.

One possible solution in N-Triples:

<https://www.wikidata.org/wiki/Q5582> <https://www.wikidata.org/wiki/Property:P19> <https://www.wikidata.org/wiki/Q9883>.

<https://www.wikidata.org/wiki/Q45585> <https://www.wikidata.org/wiki/Property:P170> <https://www.wikidata.org/wiki/Q5582>.

<https://www.wikidata.org/wiki/Q154469> <https://www.wikidata.org/wiki/Property:P571> "1885"^^<http://www.w3.org/2001/XMLSchema#integer>.

Key Points

Resource Description Framework

  • Gives a standardised form to write subject-predicate-object modelled data
  • Is used to connect data on a global scope
  • One format to write RDF is N-Triples
  • IRIs are written in angle brackets
  • Statements end with a period

Content from Model Linked Data


Last updated on 2025-03-09

Estimated time: 12 minutes

Overview

Questions

  • What are IRIs, literals and blank nodes, and when should you use them?
  • How to write down RDF?
  • What is Turtle?
  • What is a vocabulary and a namespace?
  • What is RDF Schema and what is it used for?

Objectives

  • Differentiate between the concepts of IRIs, literals and blank nodes.
  • Know when to use IRIs, literals and blank nodes.
  • Write down RDF in Turtle format.
  • Understand what a vocabulary is used for in linked open data.
  • Remember where to look for the concepts of the RDF schema vocabulary and their definitions.

Different concepts in RDF


  • IRI
  • Literal
  • Blank node

Serialization formats


A serialisation format answers the question of how to write RDF down so that a machine understands it. Turtle is one example of such a format.

Turtle example


##namespaces
##statements
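
As a rough illustration of the two parts of a Turtle document, namespace declarations followed by statements, here is a Python sketch that writes a triple in a Turtle-like form. The prefix names `wd` and `wdp` and the helper names are assumptions chosen for this example; they simply abbreviate the Wikidata page addresses used earlier in this lesson.

```python
# Assumed prefixes; the names "wd" and "wdp" are illustrative choices.
PREFIXES = {
    "wd": "https://www.wikidata.org/wiki/",
    "wdp": "https://www.wikidata.org/wiki/Property:",
}

TRIPLES = [
    ("https://www.wikidata.org/wiki/Q45585",
     "https://www.wikidata.org/wiki/Property:P170",
     "https://www.wikidata.org/wiki/Q5582"),
]


def shorten(iri: str) -> str:
    """Replace the longest matching namespace with its prefix."""
    by_length = sorted(PREFIXES.items(), key=lambda kv: -len(kv[1]))
    for prefix, namespace in by_length:
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"  # no prefix applies: keep the full IRI


def to_turtle(triples: list[tuple[str, str, str]]) -> str:
    """Write namespace declarations first, then one statement per line."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    for triple in triples:
        lines.append(" ".join(shorten(term) for term in triple) + " .")
    return "\n".join(lines)


print(to_turtle(TRIPLES))
```

The statement line comes out as `wd:Q45585 wdp:P170 wd:Q5582 .`, which is shorter and easier to read than the same triple with three full IRIs.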

Other common serialization formats:

  • RDF/XML
  • JSON-LD

You do not have to know all the serialization formats. There are plenty of converter tools on the web, for example the EasyRDF Converter or the RDF Converter by Zazuko.

Challenge

Write down the information in Turtle

List some statements and ask the learners to transform them to valid Turtle.

TODO
Valid turtle.

Vocabularies & namespaces


RDF Schema


RDF Schema is an extension of the basic RDF vocabulary that you already know from before. You can always go to the published RDF Schema specification (https://www.w3.org/TR/rdf11-schema/) and look up the terms (concepts), their meanings and the rules.

  • Classes
    • rdfs:Resource
    • rdfs:Class
    • rdfs:Literal
  • Properties
    • rdfs:domain
    • rdfs:range
    • rdfs:subClassOf
    • rdfs:label
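
To give a feeling for what a property such as rdfs:subClassOf expresses, the following Python sketch walks up a small class hierarchy. The classes `ex:Painting` and `ex:Artwork` are hypothetical names invented for this example, and the sketch assumes each class has at most one direct superclass and that the hierarchy contains no cycles, which RDF Schema itself does not require.

```python
# child class -> parent class, standing in for rdfs:subClassOf statements.
# "ex:Painting" and "ex:Artwork" are hypothetical example classes.
SUBCLASS_OF = {
    "ex:Painting": "ex:Artwork",
    "ex:Artwork": "rdfs:Resource",
}


def all_classes(cls: str) -> list[str]:
    """Return `cls` followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


# Something described as an ex:Painting is also an ex:Artwork.
print(all_classes("ex:Painting"))
```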
Challenge

Apply RDF Schema to our artwork example

Fill in the cloze or the gaps in the picture.

Key Points
  • keypoint 1
  • keypoint 2

Content from Create Linked Data


Last updated on 2025-03-09

Estimated time: 42 minutes

Overview

Questions

  • What possibilities do I have to write RDF statements?
  • What is OpenRefine useful for with the focus on Linked Open Data?
  • How do I create an RDF dataset using OpenRefine?
  • How do I reconcile my data by comparing it to authoritative datasets?
  • How do I export and save the transformed RDF data with OpenRefine?

Objectives

  • Create Linked Data from a CSV table.
  • Get to know the functionalities and the graphical user interface of OpenRefine.
  • Apply the theory learnt about RDF, namespaces and RDFS to a dataset.
  • Understand how reconciliation services are used to annotate data.
  • Memorize how to export and save the transformed RDF data with OpenRefine.

Using OpenRefine to transform a dataset from CSV to RDF


  • Open OpenRefine
  • Load Example Dataset
  • Get to know the GUI
  • Work with the RDF Transform extension
Key Points
  • keypoint 1
  • keypoint 2

Content from Query Linked Data with SPARQL


Last updated on 2025-02-11

Estimated time: 12 minutes

Overview

Questions

  • What is SPARQL?

Objectives

  • Write a SPARQL query.

Content from Publish your Linked Data


Last updated on 2025-02-11

Estimated time: 12 minutes

Overview

Questions

  • How do I publish my linked data?

Objectives

  • Publish linked data with GitHub.