Introduction to OpenRefine


  • OpenRefine is a free, open-source tool for cleaning, organizing, and exploring messy data.
  • You can import, filter, sort, and analyze your data without any programming experience.
  • OpenRefine supports many data formats and can be extended with extensions and custom scripts (for example, expressions written in GREL or Jython).
  • Using OpenRefine helps you prepare your data for analysis, making your research more accurate, efficient, and enjoyable.

Importing Data and Getting to Know the OpenRefine User Interface


  • You can import data into OpenRefine from many formats, such as CSV, TSV, Excel, JSON, and XML.
  • Adjust the import settings (for example, character encoding, column separator, and header rows) so that your data is read correctly, and check the preview before creating the project.
  • Functions for working with your data are accessed from the drop-down arrow menu next to each column header.
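Why the import settings matter can be sketched outside OpenRefine. The Python example below is only an illustration, assuming a small hypothetical semicolon-separated, UTF-8 file; the function `preview` is a hypothetical helper, not part of OpenRefine. It shows how a wrong separator or a wrong encoding garbles the parsed columns, which is exactly what the import preview lets you catch before creating a project.

```python
import csv
import io

# Hypothetical file content: semicolon-separated, UTF-8 encoded, one header row.
RAW = "title;creator;year\nFaust;Goethe, Johann Wolfgang von;1808\nWoyzeck;Büchner, Georg;1879\n".encode("utf-8")


def preview(data, encoding="utf-8", separator=",", has_header=True, limit=5):
    """Decode and parse delimited data with explicit settings; return (header, first rows)."""
    text = data.decode(encoding)
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    if has_header:
        header, body = rows[0], rows[1:]
    else:
        # Without a header row, fall back to generic column names.
        header = [f"Column {i + 1}" for i in range(len(rows[0]))]
        body = rows
    return header, body[:limit]


# Wrong separator: lines are split at the commas inside names, so columns do not line up.
print(preview(RAW, separator=","))
# Wrong encoding: "Büchner" is garbled into "BÃ¼chner".
print(preview(RAW, encoding="latin-1", separator=";"))
# Correct settings: three clean columns.
print(preview(RAW, separator=";"))
```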

Exploring Data


  • Facets provide an interactive overview of the values in a column and help you explore your data.
  • Multi-valued cells must be split before accurate faceting is possible.
  • Numeric and Timeline facets require converting text values into numbers or dates first.
  • Scatterplot facets help explore relationships between two numeric columns.
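The points about splitting multi-valued cells and converting text to numbers can be illustrated with a rough Python analogy. This sketch assumes a hypothetical "creator" column that uses "|" as the separator, and models a text facet simply as a count of distinct values; `text_facet` and `to_number` are hypothetical helpers, not OpenRefine functions.

```python
from collections import Counter

# Hypothetical "creator" column where some cells hold several names separated by "|".
creators = ["Marx, Karl|Engels, Friedrich", "Goethe, Johann Wolfgang von", "Marx, Karl", ""]


def text_facet(values):
    """Count how often each distinct value occurs, similar to the choices in a text facet."""
    return Counter(values)


# Faceting the raw cells treats "Marx, Karl|Engels, Friedrich" as one opaque value.
raw_counts = text_facet(creators)

# After splitting multi-valued cells, each name is counted on its own (blank cells dropped).
split_values = [name for cell in creators for name in cell.split("|") if name]
split_counts = text_facet(split_values)


def to_number(text):
    """Convert text to an integer, or None if it is not numeric (a 'non-numeric' value)."""
    try:
        return int(text)
    except ValueError:
        return None


# A numeric facet can only place values that were converted to numbers first.
years = [to_number(y) for y in ["1867", "1808", "n.d.", "1879"]]

print(raw_counts)
print(split_counts)
print(years)
```

In the raw counts "Marx, Karl" appears only once, because the co-authored cell hides the second occurrence; after splitting, it is counted twice.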

Custom Facets and GREL


  • Custom facets group data using computed results from a GREL expression, not only the original cell values.
  • GREL (General Refine Expression Language) is a lightweight language that allows you to inspect, transform, and classify data inside OpenRefine.
  • Custom facets let you ask flexible questions about your data, such as identifying multiple creators or unusually long titles.
  • With conditional expressions like if(), you can define new categories that support deeper exploration and data-quality checks.
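A custom facet groups rows by a computed label rather than the cell itself. The Python sketch below mimics that idea on hypothetical rows; the GREL expressions in the comments are illustrative equivalents, and the 20-character threshold for a "long title" is an arbitrary assumption chosen for the example.

```python
from collections import Counter

# Hypothetical rows with a multi-valued "creator" cell and a "title" cell.
rows = [
    {"creator": "Marx, Karl|Engels, Friedrich", "title": "Manifest der Kommunistischen Partei"},
    {"creator": "Goethe, Johann Wolfgang von", "title": "Faust"},
    {"creator": "Büchner, Georg", "title": "Woyzeck"},
]


def creator_count_class(cell):
    # Comparable GREL: if(value.contains("|"), "multiple creators", "single creator")
    return "multiple creators" if "|" in cell else "single creator"


def title_length_class(cell, limit=20):
    # Comparable GREL: if(value.length() > 20, "long title", "short title")
    return "long title" if len(cell) > limit else "short title"


def custom_facet(rows, column, expression):
    """Group rows by the computed result of `expression`, not by the raw cell value."""
    return Counter(expression(row[column]) for row in rows)


creator_facet = custom_facet(rows, "creator", creator_count_class)
title_facet = custom_facet(rows, "title", title_length_class)
print(creator_facet)
print(title_facet)
```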

Transforming Data


  • Transformations modify the content of cells, while column operations reshape the structure of the dataset.
  • Literal GREL replacements help remove unwanted characters and prepare text for further processing.
  • Splitting columns separates different types of information, making the data easier to analyze and clean.
  • Clustering identifies similar but inconsistently written values and supports manual standardization.
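Clustering by key collision can be sketched in a few lines. The Python function below is an approximation of OpenRefine's documented "fingerprint" keying method (trim, lowercase, remove punctuation, fold accented characters to ASCII, sort and deduplicate tokens); it may differ from OpenRefine in edge cases. The names and the helper `merge` are hypothetical; `merge` stands in for the manual step where you choose the standardized spelling.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Approximate OpenRefine's 'fingerprint' key for one cell value."""
    key = value.strip().lower()
    key = re.sub(r"[^\w\s]", "", key)  # drop punctuation and control characters
    # Fold accented characters to plain ASCII (e.g. "ü" -> "u").
    key = unicodedata.normalize("NFKD", key).encode("ascii", "ignore").decode("ascii")
    tokens = sorted(set(key.split()))  # order-insensitive, duplicates removed
    return " ".join(tokens)


def key_collision_clusters(values):
    """Group values whose fingerprints collide; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def merge(values, cluster, new_value):
    """Replace every member of a reviewed cluster with the chosen spelling."""
    members = set(cluster)
    return [new_value if v in members else v for v in values]


names = ["Marx, Karl", "Karl Marx", "marx, karl ", "Büchner, Georg", "Buchner, Georg",
         "Goethe, Johann Wolfgang von"]
clusters = key_collision_clusters(names)
print(clusters)
print(merge(names, clusters[0], "Marx, Karl"))
```

The algorithm only proposes groups; deciding that "Marx, Karl" is the preferred form remains a human choice.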

Reconciling Data with External Data Sources


  • Reconciliation links text strings to unique identifiers in external databases.
  • This makes your dataset more precise, reusable, and comparable across projects.
  • OpenRefine provides a structured workflow for reconciliation: propose → review → confirm → enrich.
  • The human researcher stays in control: machines suggest, but you decide.
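The propose → review → confirm → enrich cycle can be modeled offline. This Python sketch is only an analogy: instead of querying a real reconciliation service, it scores names against a small hypothetical authority file with `difflib.SequenceMatcher`. The identifiers (`auth:001` and so on) and the `cautious_reviewer` callback are invented for illustration, with the callback standing in for the researcher's decision.

```python
from difflib import SequenceMatcher

# Hypothetical local authority file: identifier -> preferred label and an extra attribute.
AUTHORITY = {
    "auth:001": {"label": "Marx, Karl", "born": 1818},
    "auth:002": {"label": "Engels, Friedrich", "born": 1820},
    "auth:003": {"label": "Goethe, Johann Wolfgang von", "born": 1749},
}


def propose(name, limit=3):
    """Propose: score every authority record against the cell text, best first."""
    scored = [
        (round(SequenceMatcher(None, name.lower(), rec["label"].lower()).ratio(), 2), ident)
        for ident, rec in AUTHORITY.items()
    ]
    return sorted(scored, reverse=True)[:limit]


def reconcile(name, review):
    """Review and confirm: the `review` callback picks an identifier or returns None."""
    return review(name, propose(name))


def enrich(identifier, field):
    """Enrich: pull an extra attribute from the confirmed record."""
    return AUTHORITY[identifier][field] if identifier else None


def cautious_reviewer(name, candidates):
    # Stand-in for a person: accept only an exact match, otherwise leave the cell unmatched.
    score, ident = candidates[0]
    return ident if score == 1.0 else None


for cell in ["Marx, Karl", "Engels, F."]:
    match = reconcile(cell, cautious_reviewer)
    print(cell, propose(cell)[0], match, enrich(match, "born"))
```

"Engels, F." still gets a ranked proposal, but it is left unmatched until someone confirms it.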

Undo, Redo, and Exporting Workflows


  • OpenRefine records every transformation you make.
  • The Undo/Redo panel lets you move backward and forward through your cleaning process.
  • Workflows can be extracted as JSON from the Undo/Redo panel and reapplied to other projects, which supports transparency and reproducibility.
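The idea of a recorded, replayable history can be sketched as a toy model. In this Python example the operation dictionaries only loosely follow the fields of OpenRefine's exported "core/text-transform" operations (`op`, `columnName`, `expression`, `description`). The `EXPRESSIONS` lookup that maps GREL text to Python functions, the `History` class, and the single-column data are hypothetical simplifications, not how OpenRefine works internally.

```python
import json

# Hypothetical lookup from GREL expression text to a Python equivalent.
EXPRESSIONS = {"value.trim()": str.strip, "value.toUppercase()": str.upper}


def text_transform(column, expression):
    # Field names loosely follow OpenRefine's exported text-transform operations.
    return {"op": "core/text-transform", "columnName": column, "expression": expression,
            "description": f"Text transform on cells in column {column} using expression {expression}"}


def replay(operations, values):
    """Apply a list of operations to a single column of values (columnName is ignored)."""
    for op in operations:
        func = EXPRESSIONS[op["expression"]]
        values = [func(v) for v in values]
    return values


class History:
    """Toy undo/redo history: recorded operations plus a cursor over them."""

    def __init__(self, values):
        self.initial = list(values)
        self.steps = []
        self.position = 0  # number of steps currently applied

    def apply(self, operation):
        del self.steps[self.position:]  # a new step discards any undone steps
        self.steps.append(operation)
        self.position += 1

    def undo(self):
        self.position = max(0, self.position - 1)

    def redo(self):
        self.position = min(len(self.steps), self.position + 1)

    def current(self):
        # Rebuild the current state by replaying applied steps on the original data.
        return replay(self.steps[: self.position], self.initial)

    def export(self):
        # Only the applied operations are exported, never the data itself.
        return json.dumps(self.steps[: self.position], indent=2)


history = History(["  faust ", "woyzeck"])
history.apply(text_transform("title", "value.trim()"))
history.apply(text_transform("title", "value.toUppercase()"))
after_both = history.current()
history.undo()
after_undo = history.current()
history.redo()
workflow = history.export()
# Reapply the exported workflow to a different dataset.
other_project = replay(json.loads(workflow), [" werther "])
print(after_both, after_undo, other_project)
```

Because the export contains operations rather than results, the same JSON can be run against any dataset with a matching column.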

Resources for Future Self-study