Summary and Schedule

OpenRefine is a powerful, free, open source tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. This lesson was developed especially for researchers in the humanities who want to improve the quality of their research data. It is designed for participants with no previous experience.

Learning objectives


By the end of this lesson, you will be able to:

  • start a new OpenRefine project and import data
  • work with a subset of your data with the help of facets and filters
  • correct errors and reduce variations in your data through facets and clustering
  • transform your data for future analysis
  • undo and redo your cleaning steps
  • enrich your data with the help of a reconciliation service
  • save and export cleaned data as well as data cleaning steps
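
To give a feel for what facets and clustering do, here is a minimal Python sketch. It is not OpenRefine's own code. A text facet is essentially a count of each distinct value in a column, and one common clustering approach groups values that share a normalized key. The `fingerprint` function below is a simplified approximation of that idea, and the sample values are invented for illustration.

```python
import string
from collections import Counter, defaultdict


def text_facet(values):
    """Count how often each distinct value occurs, like a text facet."""
    return Counter(values)


def fingerprint(value):
    """Simplified normalization key (an approximation, not OpenRefine's exact method)."""
    # Trim, lowercase, and drop punctuation.
    cleaned = value.strip().lower().translate(str.maketrans("", "", string.punctuation))
    # Split into words, remove duplicates, sort, and rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group distinct values that share the same fingerprint key."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    # Only keys with more than one spelling are candidate clusters.
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


# Invented sample column, not taken from the lesson dataset.
sample = ["American", "american ", "American", "French", "French.", "Italian"]
facet_counts = text_facet(sample)
clusters = cluster(sample)
print(facet_counts)
print(clusters)
```

In this sketch the facet shows that the exact string "American" occurs twice, while the clustering step proposes merging spellings that differ only in case, whitespace, or punctuation. In OpenRefine you do the same work interactively and decide yourself which variants to merge.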

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

Getting started


To follow this lesson, you need to install OpenRefine on your computer and download a data file.

Dataset

The dataset used in this lesson is a subset of the Metropolitan Museum of Art’s Open Access Initiative dataset, which contains information about the objects in the museum (e.g. title, culture, artist biography). The number of columns has been reduced, and the data has been intentionally ‘messed up’ a little.

Download the CSV data file to your computer.
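
If you would like to confirm that the download worked before opening OpenRefine, a quick look at the file can help. The following is a minimal sketch using only the Python standard library. The file name `met_objects.csv` is a hypothetical placeholder for wherever you saved the file, and the two-row sample is invented rather than taken from the dataset.

```python
import csv
import io


def summarize_csv(handle):
    """Return the column names and the number of data rows in a CSV file object."""
    reader = csv.reader(handle)
    header = next(reader, [])  # first row holds the column names
    row_count = sum(1 for _ in reader)  # remaining rows are data
    return header, row_count


# Invented two-row sample standing in for the real file.
sample = io.StringIO("Title,Culture\nVase,Greek\nBowl,greek \n")
columns, rows = summarize_csv(sample)
print(columns, rows)

# For the downloaded file (hypothetical name), something like:
# with open("met_objects.csv", newline="", encoding="utf-8") as f:
#     print(summarize_csv(f))
```

This step is optional. OpenRefine shows a preview of the columns and rows when you import the file.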

Software Setup

For this lesson you will need OpenRefine and a web browser. Note: OpenRefine is a Java program that runs on your own machine (not in the cloud). You use it through your web browser, but no internet connection is needed to run it.

  1. Check that you have Firefox, Edge, Opera, Chrome, Chromium, or Safari installed and set as your default browser. OpenRefine opens in your default browser. It will not run correctly in Internet Explorer, and it occasionally has issues in Firefox.
  2. Download the software from openrefine.org/download and check below for further instructions depending on your operating system.

Getting help

If you encounter problems installing or running OpenRefine, good sources of support are the OpenRefine mailing list and user forum. Include your operating system (Windows, macOS, or Linux) when searching to find the answers most relevant to your issue.

You may also want to check the Stack Overflow OpenRefine tag.

For more details about installation, upgrades, and configuration, the OpenRefine installation manual is a good resource.

Exiting OpenRefine

To exit OpenRefine, close all of its browser tabs or windows, then switch to the command line window. To close this window and ensure OpenRefine exits properly, hold down [control] and press [c] on your keyboard. This will save all changes to your projects.