Summary and Schedule

OpenRefine is a powerful, free, open-source tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. This lesson was especially developed for researchers in the humanities who want to learn how to improve the quality of their research data. It is designed for participants with no previous experience.

In scientific research, data is rarely perfect. Information is often collected from many different sources, such as archives, museums, or fieldwork, and combined into a single dataset. This can lead to a variety of problems: names spelled in different ways, missing or inconsistent values, dates in different formats, or duplicate entries. Sometimes data is entered manually, which can introduce typos or errors. Other times, data comes from automated exports or digitization projects, which may not follow a consistent structure. As a result, researchers often face the challenge of working with data that is not ready for analysis.

Before you can draw meaningful conclusions from your data, it is essential to clean and organize it. This process helps ensure that your analysis is accurate and reliable. OpenRefine is designed to make this step easier, even for those with no technical background. With OpenRefine, you can quickly identify and fix errors, standardize formats, and prepare your data for further research.

Learning objectives

By the end of this lesson, you will be able to:

start a new OpenRefine project and import data
work with a subset of your data with the help of facets and filters
correct errors and reduce variations in your data through facets and clustering
transform your data for future analysis
undo and redo your cleaning steps
enrich your data with the help of a reconciliation service
save and export cleaned data as well as data cleaning steps
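
To give a sense of what "reducing variations through clustering" means, the sketch below groups spellings that differ only in case, punctuation, accents, or word order. It is loosely modelled on key-based clustering in general and is an approximation for illustration, not OpenRefine's own implementation. The function names `fingerprint` and `cluster` and the example values are made up for this sketch and are not taken from the lesson dataset.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified key that ignores case, punctuation, accents and word order."""
    text = value.strip().lower()
    # Decompose accented characters and drop the combining marks (e.g. "ó" becomes "o").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation such as commas and periods.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate tokens, sort them, and join them back.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by key and keep only groups with more than one distinct spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


# Made-up example values (an assumption for illustration only).
names = ["Pablo Picasso", "picasso, pablo", "PABLO PICASSO",
         "Joan Miró", "Joan Miro", "Paul Klee"]
clusters = cluster(names)
for key, members in clusters.items():
    print(key, "->", members)
```

Each printed group is a set of spellings that probably refer to the same thing. In OpenRefine you review such groups and decide yourself whether to merge them.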

Setup Instructions

Download files required for the lesson

00h 00m

1. Introduction to OpenRefine

What is OpenRefine and how can it help with messy data?
What kinds of tasks and analyses can you perform with OpenRefine?

00h 00m

2. Importing Data and Getting to Know the OpenRefine User Interface

How do I start a new project in OpenRefine?
How do I import a CSV file?
What options and settings are available during import?

00h 00m

3. Exploring Data

What options does OpenRefine offer for data exploration?
What is a facet and how does it help me explore data?
How do facets differ from filters?

00h 00m

4. Custom Facets and GREL

When do we need a custom facet instead of a built-in one?
How can GREL help us filter or transform data more flexibly?

00h 00m

5. Transforming Data

How can we clean and standardize the ArtistBio values in OpenRefine?
What is the difference between finding issues (facets) and fixing them (transformations & clustering)?

00h 00m

6. Reconciling Data with External Data Sources

What does it mean to reconcile data?
Why is reconciliation useful in humanities research?
How can we use OpenRefine to enrich our dataset with identifiers and structured information?

00h 35m

7. Undo, Redo, and Exporting Workflows

How can we go back to an earlier step if we realize we made a mistake?
How can we save our cleaning process to repeat it later or share it with colleagues?

00h 35m

8. Resources for Future Self-study

TODO

00h 35m

Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

Getting started

To follow this lesson, you need to install OpenRefine on your computer and download a data file.

Prerequisite

Dataset

The dataset used in this lesson is a subset of the Museum of Modern Art Collection dataset. It contains fewer columns than the original and has been intentionally ‘messed up’ a little bit.

Download the CSV data file to your computer.
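
If you want to confirm that the download is a readable CSV file before opening it in OpenRefine, a short optional check could look like the sketch below. The file name `moma_artworks.csv` is hypothetical (replace it with the name of the file you actually downloaded), and the sketch assumes the file is UTF-8 encoded and has a header row.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file with a header row."""
    # Assumption: the file is UTF-8 encoded.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    # Hypothetical file name: replace it with the file you downloaded.
    csv_path = Path("moma_artworks.csv")
    if csv_path.exists():
        columns, rows = summarize_csv(csv_path)
        print(f"{len(columns)} columns, {rows} data rows")
        print("Columns:", ", ".join(columns))
    else:
        print(f"{csv_path} not found; check your download folder.")
```

This step is not required for the lesson. OpenRefine shows the same information in its import preview.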

Prerequisite

Software Setup

For this lesson you will need OpenRefine and a web browser. Note: OpenRefine is a Java program that runs on your own machine, not in the cloud. You interact with it through your browser, but it does not need an internet connection to run.

Check that you have Firefox, Edge, Opera, Chrome, Chromium or Safari installed and set as your default browser. OpenRefine opens in your default browser. It will not run correctly in Internet Explorer, and it occasionally has issues in Firefox.
Download the software from openrefine.org/download and check below for further instructions depending on your operating system.

Windows

Unzip the downloaded file into a directory by right-clicking and selecting “Extract…”. Name that directory something like OpenRefine.
Go to your newly created OpenRefine directory.
Launch OpenRefine by double-clicking openrefine.exe. This opens a black command prompt window first; ignore it and wait for OpenRefine to open in your web browser, which is where you will interact with the program.

If Windows displays a blue notification titled Microsoft Defender SmartScreen prevented an unrecognized app from starting, click on More info and then click on Run anyway.

If you are using a different browser, or OpenRefine does not open automatically, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to open the OpenRefine interface.

macOS

Unzip the downloaded file into a directory by double-clicking it. Name that directory something like OpenRefine.
Go to your newly created OpenRefine directory.
Drag the OpenRefine icon into the Applications folder, then Ctrl-click it and select Open….

If macOS shows a notification that it cannot verify the developer when you try to run the program, click Cancel. Then right-click or Ctrl-click the icon and select Open. The notification will now have an Open button. If macOS still does not let you open the program, repeat the process; the Open button will appear the second time. For additional details, consult the OpenRefine installation guide.

If you are using a different browser, or OpenRefine does not open automatically, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to open the OpenRefine interface.

Linux

Unzip the downloaded file into a directory. Name that directory something like OpenRefine.
Navigate to your newly created OpenRefine directory using the command line.
Type ./refine in the terminal from within the OpenRefine directory.
If you are using a different browser, or OpenRefine does not open automatically, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to open the OpenRefine interface.
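
On any operating system, if the browser page stays empty, it can help to check whether anything is answering at the addresses above. The sketch below does this with Python's standard library. The function name `is_reachable` and the three-second timeout are choices made for this illustration, not part of OpenRefine.

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so something is listening.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timed out, or not a valid HTTP response.
        return False


if __name__ == "__main__":
    for url in ("http://127.0.0.1:3333/", "http://localhost:3333/"):
        status = "reachable" if is_reachable(url) else "not reachable"
        print(f"{url}: {status}")
```

If both addresses are reported as not reachable, OpenRefine is most likely not running yet. Start it again as described for your operating system and wait a few moments before retrying.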

Getting help

If you encounter problems installing or running OpenRefine, good sources of support are the OpenRefine mailing list and user forum. Include your operating system (Windows, macOS, or Linux) in your search to find the threads most relevant to your issue.

You may also want to check the Stack Overflow OpenRefine tag.

If you want to know more about installation, upgrades, and configuration, the OpenRefine installation manual is a good resource.

Exiting OpenRefine

To exit OpenRefine, close all the browser tabs or windows, then navigate to the command line window. To close this window and ensure OpenRefine exits properly, hold down [control] and press [c] on your keyboard. This will save all changes to your projects.