Introduction to OpenRefine
- OpenRefine is a free, open-source tool for cleaning, organizing, and exploring messy data.
- You can easily import, filter, sort, and analyze your data, even without technical experience.
- OpenRefine supports many data formats and can be extended with add-ons and custom scripts for even more possibilities.
- Using OpenRefine helps you prepare your data for analysis, making your research more accurate, efficient, and enjoyable.
Importing Data and Getting to Know the OpenRefine User Interface
- You can import data in many different formats into OpenRefine.
- Adjust import settings to ensure your data is read correctly and preview the results before starting.
- Functions for working with your data are accessed through the arrow button next to each column header.
Exploring Data
- Facets provide an interactive overview of the values in a column and help you explore your data.
- Multi-valued cells must be split before accurate faceting is possible.
- Numeric and Timeline facets require converting text values into numbers or dates first.
- Scatterplot facets help explore relationships between two numeric columns.
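Why splitting matters for faceting can be illustrated outside OpenRefine. The following is a minimal Python sketch, not OpenRefine code: the sample values, the `|` separator, and the helper names `text_facet` and `split_multivalued` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample column: several values share one cell, separated by "|".
cells = ["history|maps", "maps", "history|letters|maps", ""]


def text_facet(values):
    """Count how often each distinct non-empty value occurs, like a text facet."""
    return Counter(v for v in values if v)


def split_multivalued(values, separator="|"):
    """Split every cell on the separator and flatten the parts into one list."""
    return [part.strip() for cell in values for part in cell.split(separator)]


# Without splitting, each combined cell counts as its own distinct value.
unsplit = text_facet(cells)

# After splitting, every individual value is counted wherever it appears.
split = text_facet(split_multivalued(cells))

print("unsplit:", dict(unsplit))
print("split:  ", dict(split))
```

In the unsplit count, "maps" appears only once because the combined cells are treated as separate values; after splitting, all three occurrences are counted.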
Custom Facets and GREL
- Custom facets group data using computed results from a GREL expression, not only the original cell values.
- GREL is a lightweight language that allows you to inspect, transform, and classify data inside OpenRefine.
- Custom facets let you ask flexible questions about your data, such as identifying multiple creators or unusually long titles.
- With conditional expressions like if(), you can define new categories that support deeper exploration and data-quality checks.
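The idea of faceting on a computed result can be sketched in Python. This is an analogy rather than GREL itself: the records, the `|` separator, the 30-character threshold, and the function names are all hypothetical. The comments show GREL expressions that would play a roughly equivalent role.

```python
from collections import Counter

# Hypothetical records; the "|" separator and the threshold are assumptions.
records = [
    {"title": "Map of Vienna", "creator": "Anna Berg"},
    {"title": "A very long descriptive title of a photograph album",
     "creator": "Karl Holm|Eva Lind"},
    {"title": "Letters", "creator": "Eva Lind"},
]


def creator_category(value, separator="|"):
    # Rough GREL counterpart: if(value.split("|").length() > 1, "multiple", "single")
    return "multiple" if len(value.split(separator)) > 1 else "single"


def title_category(value, threshold=30):
    # Rough GREL counterpart: if(value.length() > 30, "long", "short")
    return "long" if len(value) > threshold else "short"


# A custom facet counts the computed categories, not the raw cell values.
creator_facet = Counter(creator_category(r["creator"]) for r in records)
title_facet = Counter(title_category(r["title"]) for r in records)

print(dict(creator_facet))
print(dict(title_facet))
```

The computed categories never have to be stored in the data; they exist only to group rows for inspection.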
Transforming Data
- Transformations modify the content of cells, while column operations reshape the structure of the dataset.
- GREL replacements of literal text, for example with replace(), help remove unwanted characters and prepare text for further processing.
- Splitting columns separates different types of information, making the data easier to analyze and clean.
- Clustering identifies similar but inconsistently written values and supports manual standardization.
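One common way to find inconsistently written values is key collision: each value is reduced to a normalized key, and values that share a key become a candidate cluster. The Python sketch below is a simplified illustration of that idea under stated assumptions; the sample names and the `fingerprint` and `cluster` helpers are hypothetical, and OpenRefine's own clustering methods may differ in detail.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a value to a normalized key (simplified key-collision fingerprint)."""
    text = value.strip().lower()
    # Decompose accented characters and drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into tokens, drop duplicates, and sort so that word order is ignored.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample values.
names = ["Müller, Anna", "Anna Müller", "anna muller ", "Schmidt, Karl"]

for group in cluster(names):
    print(group)
```

The sketch only proposes groups. Deciding which spelling becomes the standard form remains a manual step, as in the clustering dialog.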
Reconciling Data with External Data Sources
- Reconciliation links text strings to unique identifiers in external databases.
- This makes your dataset more precise, reusable, and comparable across projects.
- OpenRefine provides a structured workflow for reconciliation: propose → review → confirm → enrich.
- The human researcher stays in control: machines suggest, but you decide.
Undo, Redo, and Exporting Workflows
- OpenRefine records every transformation you make.
- The Undo/Redo panel lets you move backward and forward through your cleaning process.
- Workflows can be exported as JSON and reapplied to other projects, ensuring transparency and reproducibility.
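Because the exported workflow is plain JSON, it can be inspected with any JSON tool before it is reapplied. The Python sketch below assumes the export is a list of operation objects; the field names (`op`, `columnName`, `description`) and the two sample operations are assumptions for illustration and should be checked against an actual export.

```python
import json

# Hypothetical excerpt of an exported workflow; the field names are assumptions.
exported = """
[
  {"op": "core/text-transform",
   "columnName": "title",
   "expression": "value.trim()",
   "description": "Trim whitespace in column title"},
  {"op": "core/column-split",
   "columnName": "creator",
   "separator": "|",
   "description": "Split column creator by separator"}
]
"""

operations = json.loads(exported)


def summarize(ops):
    """Return one numbered line per recorded operation."""
    return [
        f"{number}. {op['op']} on {op.get('columnName', '?')}"
        for number, op in enumerate(ops, start=1)
    ]


for line in summarize(operations):
    print(line)
```

Reviewing such a summary before reapplying a workflow helps confirm that the column names in the target project match those recorded in the export.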