Summary and Setup

OpenRefine is a powerful, free, open source tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.

In scientific research, data is rarely perfect. Information is often collected from many different sources, such as archives, museums, or fieldwork, and combined into a single dataset. This can lead to a variety of problems: names spelled in different ways, missing or inconsistent values, dates in different formats, or duplicate entries. Sometimes data is entered manually, which can introduce typos or errors. Other times, data comes from automated exports or digitization projects, which may not follow a consistent structure. As a result, researchers often face the challenge of working with data that is not ready for analysis.

Before you can draw meaningful conclusions from your data, it is essential to clean and organize it. This process helps ensure that your analysis is accurate and reliable. OpenRefine is designed to make this step easier, even for those with no technical background. With OpenRefine, you can quickly identify and fix errors, standardize formats, and prepare your data for further research.

This lesson was especially developed for researchers in the humanities who want to learn how to improve the quality of their research data. It is designed for participants with no previous experience.

Learning objectives


By the end of this lesson, you will be able to:

  • start a new OpenRefine project and import data
  • work with a subset of your data with the help of facets and filters
  • correct errors and reduce variations in your data through facets and clustering
  • transform your data for future analysis
  • undo and redo your cleaning steps
  • enrich your data with the help of a reconciliation service
  • save and export cleaned data as well as data cleaning steps
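
To give a feel for what "reducing variations through clustering" means, here is a minimal Python sketch of the idea behind key-based clustering: values that reduce to the same simplified key are grouped as likely variants of one another. This is only an illustration under simplifying assumptions, not OpenRefine's actual implementation; the function names `fingerprint` and `cluster` and the sample names are made up for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a simplified comparison key for a text value."""
    # Decompose accented characters and drop the combining marks (ü -> u).
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase the text and remove ASCII punctuation.
    cleaned = no_accents.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop repeated words, sort, and join again.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a key; keep only groups with real variation."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


if __name__ == "__main__":
    # Invented sample values, not taken from the lesson dataset.
    names = ["Müller, Hans", "hans muller", "Anna Schmidt", "Hans  Müller"]
    for group in cluster(names):
        print(group)
```

In this sketch the three spellings of the same name end up in one group, while "Anna Schmidt" is left alone because nothing else shares its key. In OpenRefine you do not write any code for this; the clustering dialog does the grouping and lets you choose which spelling to keep.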

Getting started


To follow this lesson, you need to install OpenRefine on your computer and download a data file.

Prerequisite

Dataset

The dataset used in this lesson is a subset of the Metropolitan Museum of Art’s Open Access Initiative dataset. It contains information about the objects in the museum (e.g. title, culture, artist biography). The number of columns has been reduced, and the data has been intentionally ‘messed up’ a little.

Download the CSV data file to your computer.
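
If you would like to confirm that the download worked before opening it in OpenRefine, the following optional Python sketch reads the column names and counts the rows. It is not part of the lesson itself. The function name `preview_csv` and the file name in the usage comment are placeholders, and the sketch assumes the file is UTF-8 encoded.

```python
import csv
from pathlib import Path


def preview_csv(path, max_rows=3):
    """Return the header, the first few rows, and the number of data rows."""
    # Assumption: the file is UTF-8 encoded and has a header row.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        first_rows = []
        total = 0
        for row in reader:
            total += 1
            if len(first_rows) < max_rows:
                first_rows.append(row)
    return header, first_rows, total


# Usage (replace the placeholder with the path of the file you downloaded):
# header, first_rows, total = preview_csv("your-downloaded-file.csv")
# print(header)
# print(total, "data rows")
```

An empty header or a row count of zero would suggest that the download is incomplete and should be repeated.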

Prerequisite

Software Setup

For this lesson you will need OpenRefine and a web browser. Note: OpenRefine is a Java program that runs on your own machine, not in the cloud. Its interface opens in your browser, but no internet connection is needed to run it.

  1. Check that you have one of the following browsers installed and set as your default browser: Firefox, Edge, Opera, Chrome, Chromium, or Safari. OpenRefine runs in your default browser. It will not run correctly in Internet Explorer, and it occasionally has issues in Firefox.
  2. Download the software from openrefine.org/download and follow the installation instructions for your operating system.
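
Once OpenRefine has been installed and started, you can optionally check from Python whether it is reachable on your machine. This is a minimal sketch, not an official tool. It assumes OpenRefine is listening at its usual default address, 127.0.0.1 on port 3333; if your installation is configured differently, change the values. The function name `is_port_open` is made up for this example.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if something accepts connections at host:port."""
    try:
        # Try to open a TCP connection and close it again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Assumption: OpenRefine uses its default address and port.
    if is_port_open("127.0.0.1", 3333):
        print("Something is listening on port 3333; try http://127.0.0.1:3333/ in your browser.")
    else:
        print("Nothing is listening on port 3333; OpenRefine may not be running.")
```

The check only shows that some program accepts connections on that port, so opening the address in your browser remains the real test.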

Getting help

If you encounter problems installing or running OpenRefine, a good source of support is the OpenRefine mailing list and user forum. Include your operating system when searching to find the most relevant answers for your issue, such as threads related to Windows, macOS, or Linux.

You may also want to check the Stack Overflow OpenRefine tag.

For more details about installation, upgrades, and configuration, the OpenRefine installation manual is a good resource.

Exiting OpenRefine

To exit OpenRefine, close all the browser tabs or windows, then navigate to the command line window. To close this window and ensure OpenRefine exits properly, hold down [control] and press [c] on your keyboard. This will save all changes to your projects.