tidyr is an R package for creating tidy data, where each variable is a column, each observation is a row, and each value is a cell. This standardized data structure is used throughout the tidyverse ecosystem and reduces time spent reformatting data during analysis.
The package provides five main categories of functions: pivoting between long and wide formats (pivot_longer(), pivot_wider()), rectangling nested lists like JSON into tibbles, nesting and unnesting grouped data frames, splitting and combining character columns, and handling missing values. It supersedes the older reshape and reshape2 packages with a focused design specifically for data tidying rather than general reshaping or aggregation.
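As a minimal sketch of the pivoting functions named above, the example below reshapes a small table from wide to long and back. The `wide` table and its column names are made up for illustration; only `pivot_longer()` and `pivot_wider()` come from tidyr.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year.
wide <- tibble::tibble(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 21)
)

# Wide -> long: each (country, year) pair becomes its own row.
long <- pivot_longer(
  wide,
  cols = c(`2019`, `2020`),
  names_to = "year",     # former column names go here (as character)
  values_to = "cases"    # former cell values go here
)

# Long -> wide: the inverse operation restores the original layout.
wide_again <- pivot_wider(long, names_from = year, values_from = cases)
```

In the long form every variable (country, year, cases) has its own column, which is the tidy layout described above.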
Resources featuring tidyr
Hadley Wickham - R in Production
Visit https://rstats.ai for information on upcoming conferences.
Abstract: In this talk, we delve into the strategic deployment of R in production environments, guided by three core principles that elevate your work from individual exploration to scalable, collaborative data science. Putting R into production is not just about executing code but about crafting solutions that are robust, repeatable, and collaborative:
- Not just once: Successful data science projects are not one-offs; they run repeatedly for months or years. I’ll discuss some of the challenges of creating R scripts and applications that run repeatedly, handle new data seamlessly, and adapt to evolving analytical requirements without constant manual intervention. This principle ensures your analyses are enduring assets, not throwaway toys.
- Not just my computer: The transition from development on your laptop (usually Windows or Mac) to a production environment (usually Linux) introduces a number of challenges. Here, I’ll discuss some strategies for making R code portable, how you can minimise pain when something inevitably goes wrong, and a few unresolved auth challenges that we’re currently working on.
- Not just me: R is not just a tool for individual analysts but a platform for collaboration. I’ll cover some of the best practices for writing readable, understandable code, and how you might go about sharing that code with your colleagues. This principle underscores the importance of building R projects that are accessible, editable, and usable by others, fostering a culture of collaboration and knowledge sharing.
By adhering to these principles, we pave the way for R to be a powerful tool not just for individual analyses but as a cornerstone of enterprise-level data science solutions. Join me to explore how to harness the full potential of R in production, creating workflows that are robust, portable, and collaborative.
Bio: Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz.
Mastodon: https://fosstodon.org/@hadleywickham
Presented at the 2024 New York R Conference (May 17, 2024). Hosted by Lander Analytics (https://landeranalytics.com).

posit::conf(2023) Workshop: Introduction to Data Science with R and Tidyverse
Register now: http://pos.it/conf
Instructors: Posit Academy Instructors
Workshop Duration: 2-Day Workshop
This course is ideal for:
- those new to R or the Tidyverse
- anyone who has dabbled in R, but now wants a rigorous foundation in up-to-date data science best practices
- SAS and Excel users looking to switch their workflows to R
This is not a standard workshop, but a six-week online apprenticeship that culminates in two in-person days at posit::conf(2023). Begins August 7th, 2023. No knowledge of R required. Visit posit.co/academy to learn more about this uniquely effective learning format.
Here, you will learn the foundations of R and the Tidyverse under the guidance of a Posit Academy mentor and in the company of a close group of fellow learners. You will be expected to complete a weekly curriculum of interactive tutorials, and to attend a weekly presentation meeting with your mentor and fellow students. Topics will include the basics of R, importing data, visualizing data with ggplot2, wrangling data with dplyr and tidyr, working with strings, factors, and date-times, modelling data with base R, and reporting reproducibly with Quarto.
posit::conf(2023) Workshop: Tidy time series and forecasting in R
Register now: http://pos.it/conf
Instructor: Rob J Hyndman
Workshop Duration: 2-Day Workshop
This course is for you if you:
- already use the tidyverse packages in R such as dplyr, tidyr, tibble and ggplot2
- need to analyze large collections of related time series
- would like to learn how to use some tidy tools for time series analysis including visualization, decomposition and forecasting
It is common for organizations to collect huge amounts of data over time, and existing time series analysis tools are not always suitable to handle the scale, frequency and structure of the data collected. In this workshop, we will look at some packages and methods that have been developed to handle the analysis of large collections of time series.
On day 1, we will look at the tsibble data structure for flexibly managing collections of related time series. We will look at how to do data wrangling, data visualizations and exploratory data analysis. We will explore feature-based methods to explore time series data in high dimensions. A similar feature-based approach can be used to identify anomalous time series within a collection of time series, or to cluster or classify time series. Primary packages for day 1 will be tsibble, lubridate and feasts (along with the tidyverse of course).
Day 2 will be about forecasting. We will look at some classical time series models and how they are automated in the fable package, and we will explore the creation of ensemble forecasts and hybrid forecasts. Best practices for evaluating forecast accuracy will also be covered. Finally, we will look at forecast reconciliation, allowing millions of time series to be forecast in a relatively short time while accounting for constraints on how the series are related.
Hadley Wickham | testthat 3.0.0 | RStudio (2020)
In this webinar, I’ll introduce some of the major changes coming in testthat 3.0.0. The biggest new idea in testthat 3.0.0 is the idea of an edition. You must deliberately choose to use the 3rd edition, which allows us to make breaking changes without breaking old packages. testthat 3e deprecates a number of older functions that we no longer believe are a good idea, and tweaks the behaviour of expect_equal() and expect_identical() to give considerably more informative output (using the new waldo package).
testthat 3e also introduces the idea of snapshot tests, which record expected values in external files rather than in code. This makes them particularly well suited to testing user output and complex objects. I’ll show off the main advantages of snapshot testing, and why it’s better than our previous approaches of verify_output() and expect_known_output().
Finally, I’ll go over a bunch of smaller quality-of-life improvements, including tweaks to test reporting and improvements to expect_error(), expect_warning() and expect_message().
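To make the edition idea concrete, here is a small sketch of a test that opts in to the 3rd edition and uses a few of the expectations mentioned above. The test name and values are illustrative; `local_edition()` is used here so the example is self-contained, whereas a package would normally declare the edition once in its DESCRIPTION file (`Config/testthat/edition: 3`).

```r
library(testthat)

test_that("3rd edition expect_equal() tolerates floating-point error", {
  # Opt in to the 3rd edition for this test only.
  local_edition(3)

  x <- sqrt(2)^2                 # numerically 2, but not bit-for-bit
  expect_equal(x, 2)             # passes: compared with a small tolerance
  expect_false(identical(x, 2))  # the values are not exactly identical

  # Conditions are matched against a regular expression.
  expect_error(stop("file not found"), "not found")
  expect_warning(warning("deprecated"), "deprecated")
})
```

Snapshot tests are left out of this sketch because they write their expected output to files and are meant to be run from a package's test directory.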
Webinar materials: https://rstudio.com/resources/webinars/testthat-3/
About Hadley: Hadley Wickham is the Chief Scientist at RStudio, a member of the R Foundation, and Adjunct Professor at Stanford University and the University of Auckland. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. You may be familiar with his packages for data science (the tidyverse: including ggplot2, dplyr, tidyr, purrr, and readr) and principled software development (roxygen2, testthat, devtools, pkgdown). Much of the material for the course is drawn from two of his existing books, Advanced R and R Packages, but the course also includes a lot of new material that will eventually become a book called “Tidy tools”.

Data Manipulation Tools: dplyr – Pt 3 Intro to the Grammar of Data Manipulation with R
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
dplyr docs: dplyr.tidyverse.org/reference/
- http://dplyr.tidyverse.org/reference/union.html
- http://dplyr.tidyverse.org/reference/intersect.html
- http://dplyr.tidyverse.org/reference/set_diff.htm
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered, Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator:
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1 Making your data suitable for R
- 01:40 tidyr “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg
Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
- tidyr docs: tidyr.tidyverse.org/reference/
- tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
- dplyr docs: http://dplyr.tidyverse.org/reference/
- dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
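The dplyr verbs listed in the Pt. 3 outline can be combined with the pipe roughly as follows. This is a sketch on a made-up `scores` table, not code taken from the video; the column names and the cut-off of 65 are assumptions chosen for illustration.

```r
library(dplyr)

# Hypothetical data: scores for students in two classes.
scores <- tibble(
  class   = c("a", "a", "b", "b"),
  student = c("s1", "s2", "s3", "s4"),
  score   = c(80, 90, 70, 60)
)

class_summary <- scores %>%
  select(class, score) %>%                # keep only the columns we need
  filter(score >= 65) %>%                 # keep rows that meet a condition
  mutate(score_pct = score / 100) %>%     # add a derived column
  group_by(class) %>%                     # later verbs operate per class
  summarise(
    mean_pct = mean(score_pct),           # one summary row per class
    n = n()
  ) %>%
  arrange(desc(mean_pct))                 # sort classes by mean, descending
```

Each verb takes a data frame and returns a data frame, which is what lets the steps chain with `%>%`.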
Tidy Data and tidyr – Pt 2 Intro to Data Wrangling with R and the Tidyverse
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
http://tidyr.tidyverse.org/reference/
- http://tidyr.tidyverse.org/reference/gather
- http://tidyr.tidyverse.org/reference/spread
- http://tidyr.tidyverse.org/reference/unite
- http://tidyr.tidyverse.org/reference/separate
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered, Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator:
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1 Making your data suitable for R
- 01:40 tidyr “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
- 15:00 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg
Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
- tidyr docs: tidyr.tidyverse.org/reference/
- tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
- dplyr docs: http://dplyr.tidyverse.org/reference/
- dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
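The four tidyr functions covered in the Pt. 2 outline can be sketched on a tiny made-up data frame as shown below. The `wide` table is an assumption for illustration. Note that current tidyr documentation marks gather() and spread() as superseded by pivot_longer() and pivot_wider(), although they remain available.

```r
library(tidyr)

# Hypothetical wide data: two measurements (x, y) per id.
wide <- data.frame(id = 1:2, x = c(5, 6), y = c(7, 8))

# gather(): collapse the x and y columns into key/value pairs (wide -> long).
long <- gather(wide, key = "variable", value = "value", x, y)

# spread(): the inverse of gather() (long -> wide).
back <- spread(long, key = variable, value = value)

# unite(): paste several columns into one, here "1_x", "2_x", ...
united <- unite(long, "id_var", id, variable, sep = "_")

# separate(): split one character column into several columns.
separated <- separate(united, id_var, into = c("id", "variable"), sep = "_")
```

After the round trip through unite() and separate(), `id` is a character column, because separate() does not convert types unless asked to.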
What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered, Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator:
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1 Making your data suitable for R
- 01:40 tidyr “Tidy” Data introduced and motivated
- 08:15 tidyr::gather
- 12:38 tidyr::spread
- 15:30 tidyr::unite
- 15:30 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
- 15:00 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg
Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
- tidyr docs: tidyr.tidyverse.org/reference/
- tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
- dplyr docs: http://dplyr.tidyverse.org/reference/
- dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
New York Times, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, by Steve Lohr, Aug. 17, 2014: https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
Working with Two Datasets: Binds, Set Operations, and Joins – Pt 4 Intro to Data Manipulation
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
dplyr docs: dplyr.tidyverse.org/reference/
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered, Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator:
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1 Making your data suitable for R
- 01:40 tidyr “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg
Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
- tidyr docs: tidyr.tidyverse.org/reference/
- tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
- dplyr docs: http://dplyr.tidyverse.org/reference/
- dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
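The binds, set operations, and joins from the Pt. 4 outline can be sketched on two small made-up tables as shown below. The tables `a`, `b`, `p`, and `q` are assumptions for illustration. Note that the set-difference function exported by dplyr is setdiff().

```r
library(dplyr)

# Hypothetical tables that share an id column but cover different ids.
a <- tibble(id = 1:3, x = c("a", "b", "c"))
b <- tibble(id = 2:4, y = c("B", "C", "D"))

# Joins match rows by key; they differ in which unmatched rows are kept.
left  <- left_join(a, b, by = "id")    # all rows of a (ids 1, 2, 3)
inner <- inner_join(a, b, by = "id")   # only ids present in both (2, 3)
full  <- full_join(a, b, by = "id")    # ids from either table (1 to 4)

# Set operations compare whole rows, so both inputs need the same columns.
p <- tibble(id = 1:3)
q <- tibble(id = 2:4)
either <- union(p, q)       # rows in p or q, duplicates removed
both   <- intersect(p, q)   # rows in both p and q
only_p <- setdiff(p, q)     # rows in p that are not in q

# Binds simply stack tables, without any matching.
stacked <- bind_rows(p, q)                                  # 6 rows
widened <- bind_cols(a, tibble(z = c(TRUE, FALSE, TRUE)))   # 3 columns
```

Because bind_cols() pairs rows purely by position, a join is usually the safer choice whenever a key column is available.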

