Summary and Schedule
OpenRefine is a powerful, free, open source tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. This lesson was developed especially for researchers in the humanities who want to learn how to improve the quality of their research data. It is designed for participants with no previous experience.
Learning objectives
By the end of this lesson, you will be able to:
- start a new OpenRefine project and import data
- work with a subset of your data with the help of facets and filters
- correct errors and reduce variations in your data through facets and clustering
- transform your data for future analysis
- undo and redo your cleaning steps
- enrich your data with the help of a reconciliation service
- save and export cleaned data as well as data cleaning steps
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Introduction to OpenRefine | TODO |
Duration: 00h 00m | 2. Importing Data and Getting to Know the OpenRefine User Interface | TODO |
Duration: 00h 00m | 3. Exploring Data | TODO |
Duration: 00h 00m | 4. Transforming Data | TODO |
Duration: 00h 00m | 5. Filtering and Sorting Data | TODO |
Duration: 00h 00m | 6. Reconciling Data with External Data Sources | TODO |
Duration: 00h 00m | 7. Exporting and Saving Data and Cleaning Steps | TODO |
Duration: 00h 00m | 8. Resources for Future Self-study | TODO |
Duration: 00h 00m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Getting started
To follow this lesson, you need OpenRefine installed on your computer and a downloaded copy of the data file.
Dataset
The dataset used in this lesson is a subset of the Metropolitan Museum of Art’s Open Access Initiative dataset, which contains information about the objects in the museum (e.g. title, culture, artist biography). The number of columns has been reduced, and the data has been intentionally ‘messed up’ a little.
Download the CSV data file to your computer.
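If you would like to confirm that the download is intact before importing it, a quick check outside OpenRefine can help. The following is a minimal Python sketch and not part of the lesson itself. The function name `summarize_csv` is made up for illustration, and it assumes the file is UTF-8 encoded with the column names in the first row.

```python
import csv


def summarize_csv(path, encoding="utf-8-sig"):
    """Return (column_names, number_of_data_rows) for a CSV file.

    Assumption: the first row holds the column names. The "utf-8-sig"
    codec reads plain UTF-8 and also strips a byte order mark if present.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])            # first row: column names
        row_count = sum(1 for _ in reader)   # every remaining row is data
    return header, row_count
```

Calling `summarize_csv` with the path of the downloaded file returns the list of column names and the number of data rows. If the column list is empty or the row count is zero, the download is probably incomplete.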
Software Setup
For this lesson you will need OpenRefine and a web browser. Note: OpenRefine is a Java program that runs on your own machine, not in the cloud. You use it through your web browser, but it does not need an internet connection to run.
- Check that you have one of the following browsers installed and set as your default browser: Firefox, Edge, Opera, Chrome, Chromium, or Safari. OpenRefine opens in your default browser. It will not run correctly in Internet Explorer, and it occasionally has issues in Firefox.
- Download the software from openrefine.org/download and check below for further instructions depending on your operating system.
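Because OpenRefine runs as a local program that you reach through the browser, you can check whether it has started by asking its local address for a response. The sketch below is a hedged illustration only. It assumes OpenRefine's default address `http://127.0.0.1:3333/` (adjust it if you configured a different port), and the function name `openrefine_is_running` is hypothetical. It shows only that something answers at that address, not that the answering program is OpenRefine.

```python
import urllib.error
import urllib.request

# Assumption: OpenRefine's default local address; change it if you use another port.
DEFAULT_URL = "http://127.0.0.1:3333/"


def openrefine_is_running(url=DEFAULT_URL, timeout=3.0):
    """Return True if an HTTP server answers successfully at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as not running.
        return False
```

If the function returns `False` after you have started OpenRefine, wait a few seconds and try again, since the program can take a moment to start.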
Getting help
If you encounter problems installing or running OpenRefine, good sources of support are the OpenRefine mailing list and user forum. Include your operating system (Windows, macOS, or Linux) when searching so that you find the threads most relevant to your issue.
You may also want to check the Stack Overflow OpenRefine tag.
If you want more details about installation, upgrades, and configuration, the OpenRefine installation manual is a good resource.