The concept of IRIs
Last updated on 2026-05-22
Overview
Questions
- How do IRIs eliminate ambiguity when different datasets use similar titles?
- What are the essential components of an IRI, and how do they work together to ensure uniqueness?
- Why are namespaces crucial for maintaining clarity and consistency in linked data?
Objectives
- Explain what IRIs are
- Explain why they are important in Linked Open Data.
- Explain the structure of an IRI
- Explain what namespaces are
Ambiguity
Now that the fundamental concept behind the subject-predicate-object model is understood, a problem arises when trying to connect your own data with external datasets or when modeling knowledge unambiguously: How can we ensure that we are talking about the same objects?
Consider the following example:
Suppose you are searching the database of the Metropolitan Museum of Art for Van Gogh’s painting Wheat Field with Cypresses. It quickly becomes unclear which painting you are actually talking about, because you find:
- An entry titled Wheatfield with Cypresses, attributed to Vincent van Gogh, painted in 1889.
- Another entry titled Cypresses, also attributed to Vincent van Gogh, from 1889.
- Another entry titled Wheat Field, also by Vincent van Gogh, from 1888.
- Yet another entry with the same title Wheat Field, but by Jean Jacques de Boissieu, from 1772.
This highlights that names are often not unique. While additional details like year and artist usually clarify which object is meant, this is not always guaranteed. Humans can often resolve such ambiguities using context, but computers struggle with this.
Since a subject-predicate-object model does not have the context of a larger text, ambiguity can be resolved using IDs. Museums, for instance, assign unique IDs to paintings, ensuring that even those with identical names are distinguishable. This concept is adapted and expanded for LOD to work in an open, large-scale environment.
With these IDs, it becomes possible to refer to one exact painting and be sure that everyone within the same context understands which artwork is meant. An ID on its own, however, is just a number, and in another context the same number might refer to a different object. The number alone is therefore not free from ambiguity, so we also need a way to identify that exact “context”.
From IDs to URIs
To resolve this problem, we can use a URI, a Uniform Resource Identifier. The key idea is to combine a unique ID with a namespace; together they form a globally unique address.
Think of it like a postal address. If someone tells you “the house is number 42”, that is not very helpful: there are thousands of houses numbered 42 in the world. But “42 Baker Street, London” is unambiguous. The house number is the ID, and the street and city together form the namespace: they provide the context that makes the number meaningful.
Back to our example: the Metropolitan Museum of Art assigns the
internal ID 436535 to Wheatfield with Cypresses.
Another museum might use the very same number for a completely different
object. But by combining that ID with the name of the institution as the
namespace, the combination
Metropolitan Museum of Art / 436535 is already unambiguous.
The namespace is therefore not just a technical prefix; it is a
declaration of context: “this ID belongs to this institution, and
has a defined meaning there”.
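The effect of adding a namespace can be sketched in a few lines of Python. Only the Met ID 436535 comes from the example above; the second museum ("Example Museum") and its object are hypothetical, invented purely to show the collision.

```python
# Records keyed by the bare ID: a second museum reusing the number
# silently overwrites the first entry.
by_id = {}
by_id["436535"] = "Wheatfield with Cypresses (Metropolitan Museum of Art)"
by_id["436535"] = "Some other object (hypothetical Example Museum)"

# Records keyed by (namespace, ID): both entries survive side by side.
by_namespace_and_id = {}
by_namespace_and_id[("Metropolitan Museum of Art", "436535")] = "Wheatfield with Cypresses"
by_namespace_and_id[("Example Museum", "436535")] = "Some other object"

print(len(by_id))                # 1
print(len(by_namespace_and_id))  # 2
```

The namespace does the same job here as the street and city in the postal address: it tells us whose number 436535 is meant.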
In Linked Open Data, this principle is taken a step further: namespaces are defined as addresses on the internet, so that every resource can be looked up and referenced globally by anyone. This turns a namespace-and-ID combination into a URI (Uniform Resource Identifier), a unique, web-resolvable address. IRIs (Internationalized Resource Identifiers) extend this further by also supporting non-Latin scripts such as Arabic, Chinese, or Japanese.
URI and IRI in a nutshell
A URI is a unique, web-resolvable address for an object, made up of two parts:
- The namespace gives the context: a web address identifying the institution or system that manages the data
- The ID identifies the specific object within that context
Together: namespace + ID = URI, a globally
unique address that anyone can look up.
An IRI works exactly the same way, but also supports non-Latin characters.
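As a rough illustration of this summary, the following Python sketch joins a namespace and an ID into one identifier, and shows one way an IRI containing non-Latin characters can be turned into an ASCII-only URI by percent-encoding. The `https://example.org/...` namespaces are placeholders, not real datasets, and `make_iri` is a hypothetical helper name. The sketch only handles non-Latin characters in the path, not in the domain name.

```python
from urllib.parse import quote, unquote


def make_iri(namespace: str, local_id: str) -> str:
    """Join a namespace and an ID into a single identifier."""
    return namespace + local_id


# Placeholder namespace with Latin characters only: already a valid URI.
uri = make_iri("https://example.org/paintings/", "436535")

# Placeholder namespace containing Japanese characters: an IRI, not a plain URI.
iri = make_iri("https://example.org/絵画/", "436535")

# Percent-encode the non-ASCII characters (as UTF-8) to get an ASCII-only URI.
iri_as_uri = quote(iri, safe=":/")

print(uri)
print(iri_as_uri)
print(unquote(iri_as_uri) == iri)  # True: the encoding is reversible
```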
Understanding IRIs
In the case of the painting Wheatfield with Cypresses, the Metropolitan Museum of Art does not provide a guaranteed way to reference this so-called resource unambiguously. While we can use the link to the museum’s website, there is no guarantee that this link will remain unchanged over time. If the URL were to change, our reference would no longer work.
To avoid this problem, certain providers offer ways to generate IRIs. One example we want to examine is Wikidata, the structured data repository behind Wikipedia. If we search for Wheatfield with Cypresses on Wikidata, we also find multiple entries. Looking at this entry, we can already see the associated ID in the page title. The link to the page _https://www.wikidata.org/wiki/Q26221215_ forms the IRI. The first part, _https://www.wikidata.org/wiki/_, is the namespace, which is predefined, while the second part, Q26221215, is the ID, which is unique within this namespace. The combination of both elements ensures that this object can be referenced unambiguously in different contexts.
Like subjects, predicates need an IRI as well, one that describes exactly what the predicate means. Wikidata also provides some properties in its List of Properties. For example, we can find an IRI for the property place of birth.
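The split into namespace and ID can be reproduced with a short Python sketch. It assumes that the ID is everything after the last `/`, which holds for the Wikidata link above but not for every namespace (some end in `#`, for example); `split_iri` is a hypothetical helper name.

```python
def split_iri(iri: str) -> tuple[str, str]:
    """Split an IRI at its last '/' into (namespace, ID)."""
    namespace, _, local_id = iri.rpartition("/")
    return namespace + "/", local_id


namespace, local_id = split_iri("https://www.wikidata.org/wiki/Q26221215")
print(namespace)  # https://www.wikidata.org/wiki/
print(local_id)   # Q26221215
```

Concatenating the two parts again gives back the original IRI, which is what makes the namespace-plus-ID view a lossless way of reading the identifier.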
Exercise
One Entity, Two IRIs
In LOD, the goal is to identify every resource unambiguously. But what happens when different systems each assign their own IRI to the same entity?
- Open Wikidata and find the entry for Vincent van Gogh. Note down his IRI.
- Now open VIAF (Virtual International Authority File) and search for Vincent van Gogh. Note down his IRI there as well.
- Identify the namespace and the ID in each IRI.
- You now have two different IRIs for the same person. Does this contradict the LOD principle of unambiguous identification? Look carefully at the Wikidata entry for Van Gogh. Can you find anything that addresses this problem?
Wikidata IRI: https://www.wikidata.org/wiki/Q5582
Namespace: https://www.wikidata.org/wiki/
ID: Q5582
VIAF IRI: https://viaf.org/viaf/9854560
Namespace: https://viaf.org/viaf/
ID: 9854560
Both IRIs are internally unambiguous. Within their own system, each
points to exactly one entity. The apparent contradiction is resolved by
the fact that LOD systems can explicitly declare that two IRIs refer to
the same thing. In the Wikidata entry for Van Gogh, you can find the
property VIAF ID with the value 9854560,
a direct link to the VIAF record. The two systems are already
connected.
This is a fundamental pattern in LOD: rather than forcing a single global ID on every entity, different institutions maintain their own IRIs and link them to each other. How exactly this linking works will be covered in a later chapter.
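As a very simplified picture of this pattern, the Python sketch below rebuilds the VIAF IRI from the VIAF ID stored in Wikidata and records that both IRIs refer to the same entity. The lookup table and the helper `refers_to_same` are illustrative assumptions only; they are not how LOD systems formally express such links.

```python
WIKIDATA_IRI = "https://www.wikidata.org/wiki/Q5582"
VIAF_NAMESPACE = "https://viaf.org/viaf/"

# Value of the "VIAF ID" property as found on the Wikidata entry.
viaf_id = "9854560"

# Rebuild the VIAF IRI from its namespace and the stored ID.
viaf_iri = VIAF_NAMESPACE + viaf_id

# Record, in both directions, that the two IRIs refer to the same entity.
same_entity = {
    WIKIDATA_IRI: {viaf_iri},
    viaf_iri: {WIKIDATA_IRI},
}


def refers_to_same(a: str, b: str) -> bool:
    """True if a and b are identical or recorded as equivalent."""
    return a == b or b in same_entity.get(a, set())


print(viaf_iri)
print(refers_to_same(WIKIDATA_IRI, viaf_iri))  # True
```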
Conclusion
URIs and IRIs form the bedrock of Linked Open Data by ensuring that every digital resource, such as Van Gogh’s Wheatfield with Cypresses, has a unique, reliable address. By breaking down the structure of these identifiers and understanding the role of namespaces, we see how ambiguity is resolved. This system not only enhances clarity but also fosters global collaboration and deeper insights in research.
Internationalized Resource Identifiers (IRIs)
- Are used to prevent ambiguities
- Need to be defined and openly accessible on the internet
- Are necessary for objects and predicates
- Are created from a namespace in combination with an ID