Introduction to OpenRefine
- OpenRefine is a free, open-source tool for cleaning, organizing, and exploring messy data.
- You can easily import, filter, sort, and analyze your data, even without technical experience.
- OpenRefine supports many data formats and can be extended with add-ons and custom scripts for even more possibilities.
- Using OpenRefine helps you prepare your data for analysis, supporting transparent and reproducible research practices.
Importing Data and Getting to Know the OpenRefine User Interface
- OpenRefine organizes your work in projects.
- You can import data from different sources and in different formats into OpenRefine.
- Adjust the import settings so that your data is read correctly, and preview the results before creating the project.
- The main components of the user interface are the grid, the grid header, the project bar, and the Facet/Filter and Undo/Redo tabs.
Exploring Data
- Facets provide an interactive overview of the values in a column and help you explore your data.
- Multi-valued cells must be split before accurate faceting is possible.
- Numeric and Timeline facets require converting text values into numbers or dates first.
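The effect of multi-valued cells on a facet can be sketched outside OpenRefine. The following Python snippet is only an illustration of the idea, not OpenRefine's own code; the `text_facet` helper, the `|` separator, and the sample creator names are assumptions made for this example.

```python
from collections import Counter

def text_facet(cells, separator=None):
    """Count how often each value occurs, similar to a text facet's value list."""
    counts = Counter()
    for cell in cells:
        # Without a separator, the whole cell is treated as a single value.
        parts = cell.split(separator) if separator else [cell]
        counts.update(part.strip() for part in parts)
    return counts

# Hypothetical "creator" column in which two cells hold several values.
creators = ["Smith, A.", "Smith, A.|Lee, B.", "Lee, B.|Khan, C."]

unsplit = text_facet(creators)               # each multi-valued cell is its own entry
split = text_facet(creators, separator="|")  # values are counted individually

print(unsplit)
print(split)
```

In the unsplit facet, "Smith, A." appears once and the combined cells show up as separate entries; after splitting, each creator is counted across all rows.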
Custom Facets and GREL
- Custom facets group data using computed results from a GREL expression, not only the original cell values.
- GREL is a lightweight language that allows you to inspect, transform, and classify data inside OpenRefine.
- Custom facets let you ask flexible questions about your data, such as identifying multiple creators or unusually long titles.
- With conditional expressions like if(), you can define new categories that support deeper exploration and data-quality checks.
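To make the idea of a computed facet concrete, here is a minimal Python analogue. It mirrors roughly what a GREL expression such as if(value.length() > 20, "long", "short") would compute for each cell; the `custom_facet` and `length_category` names, the 20-character limit, and the sample titles are assumptions chosen for illustration.

```python
from collections import Counter

def custom_facet(cells, expression):
    """Group cells by the result of an expression rather than by their raw value."""
    return Counter(expression(value) for value in cells)

def length_category(value, limit=20):
    """Classify a title as "long" or "short", like a conditional if() expression."""
    return "long" if len(value) > limit else "short"

# Hypothetical "title" column.
titles = [
    "Maps",
    "A very long descriptive title of a survey report",
    "Letters",
]

facet = custom_facet(titles, length_category)
print(facet)
```

The facet no longer lists the titles themselves; it lists the categories produced by the expression, which is what makes custom facets useful for data-quality checks.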
Transforming Data
- Transformations modify the content of cells, while column operations reshape the structure of the dataset.
- Literal GREL replacements help remove unwanted characters and prepare text for further processing.
- Splitting columns separates different types of information, making the data easier to analyze and clean.
- Clustering identifies similar but inconsistently written values and supports manual standardization.
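One common way to find inconsistently written values is key collision: each value is reduced to a normalized key, and values sharing a key form a cluster. The sketch below is a simplified Python approximation of a fingerprint-style key, not OpenRefine's exact implementation; the function names and sample values are assumptions.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a normalized key: trim, lowercase, strip accents and punctuation,
    then sort the unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accent marks
    text = re.sub(r"[^\w\s]", "", text)                    # drop punctuation
    tokens = sorted(set(text.split()))
    return " ".join(tokens)

def cluster(values):
    """Group values that share a fingerprint; keep groups with several variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Hypothetical name column with three spellings of the same person.
names = ["Müller, Anna", "anna muller", "Muller Anna ", "Schmidt, Eva"]
clusters = cluster(names)
print(clusters)
```

The code only proposes groups. Deciding which spelling becomes the standard form remains a manual step, as in OpenRefine's clustering dialog.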
Reconciling Data with External Data Sources
- Reconciliation links text strings to unique identifiers in external databases.
- This makes your dataset more precise, reusable, and comparable across projects.
- OpenRefine provides a structured workflow for reconciliation: propose → review → confirm → enrich.
- The human researcher stays in control: machines suggest, but you decide.
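The propose, review, confirm, enrich sequence can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `AUTHORITY` table, its identifiers, and the similarity scoring with `difflib` are invented for illustration and do not represent any real reconciliation service or OpenRefine's matching logic.

```python
from difflib import SequenceMatcher

# Hypothetical authority file; identifiers and records are made up.
AUTHORITY = {
    "ID-001": {"label": "Berlin", "country": "Germany"},
    "ID-002": {"label": "Bern", "country": "Switzerland"},
}

def propose(text, limit=3):
    """Rank authority records by string similarity to the cell text."""
    scored = [
        (SequenceMatcher(None, text.lower(), record["label"].lower()).ratio(), ident)
        for ident, record in AUTHORITY.items()
    ]
    return sorted(scored, reverse=True)[:limit]

def confirm(candidates, chosen_id):
    """Accept only an identifier that a person picked from the proposed candidates."""
    if chosen_id not in {ident for _, ident in candidates}:
        raise ValueError(f"{chosen_id} was not among the proposed candidates")
    return chosen_id

def enrich(ident, field):
    """Pull an additional field from the matched authority record."""
    return AUTHORITY[ident][field]

candidates = propose("Berlin (city)")    # propose
chosen = confirm(candidates, "ID-001")   # review and confirm (human decision)
country = enrich(chosen, "country")      # enrich
print(candidates, chosen, country)
```

The `confirm` step deliberately takes the identifier as an argument instead of accepting the top-ranked candidate automatically, reflecting the principle that the machine suggests and the researcher decides.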
Undo, Redo, and Exporting Workflows
- OpenRefine records every transformation you make.
- The Undo/Redo tab lets you move backward and forward through your cleaning process.
- Workflows can be exported as JSON and reapplied to other projects, ensuring transparency and reproducibility.
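Because the exported workflow is plain JSON, it can be inspected with any JSON tool before it is reapplied. The Python sketch below reads a hand-written, abridged history and lists its steps; the column names and descriptions are hypothetical, and real exports typically contain more fields per operation than shown here.

```python
import json

# Abridged, hand-written example in the shape of an exported operation history.
# Column names and descriptions are hypothetical.
history_json = """
[
  {
    "op": "core/text-transform",
    "columnName": "title",
    "expression": "value.trim()",
    "description": "Text transform on cells in column title"
  },
  {
    "op": "core/column-split",
    "columnName": "creator",
    "description": "Split column creator by separator"
  }
]
"""

def summarize(history_text):
    """Return (operation, column) pairs in the order the steps were recorded."""
    operations = json.loads(history_text)
    return [(step["op"], step.get("columnName")) for step in operations]

steps = summarize(history_json)
for number, (op, column) in enumerate(steps, start=1):
    print(f"{number}. {op} on column {column}")
```

Reviewing the steps this way helps confirm that the target project has the expected columns before the history is applied to it.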