forcats: tools for working with categorical variables (factors)
forcats is an R package designed to simplify working with factors, which are R’s way of handling categorical variables with fixed sets of possible values. It provides tools for common factor manipulation tasks like reordering levels, changing values, and collapsing infrequent categories.
The package addresses the pain points of factor manipulation in R through functions that reorder factors by frequency (fct_infreq()), by another variable (fct_reorder()), manually (fct_relevel()), or collapse rare levels into an “other” category (fct_lump()). These tools are particularly useful for improving data visualization and analysis workflows. forcats is part of the tidyverse collection of packages and integrates seamlessly with other tidyverse tools.
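As a rough illustration of the four functions named above, here is a minimal sketch on a small made-up vector. The `species` and `weight` data are hypothetical and exist only for this example; the function names come from the text. In newer forcats releases, `fct_lump_n()` covers the same use as `fct_lump(n = ...)`.

```r
library(forcats)

# Hypothetical toy data: species and body weight of seven pets
species <- factor(c("dog", "cat", "dog", "fish", "dog", "cat", "lizard"))
weight  <- c(20, 4, 25, 0.1, 30, 5, 0.3)

# By default, factor levels are sorted alphabetically
levels(species)

# Reorder levels by frequency, most common level first
by_freq <- fct_infreq(species)

# Reorder levels by the median of another variable (ascending)
by_weight <- fct_reorder(species, weight, .fun = median)

# Manually move a chosen level to the front
manual <- fct_relevel(species, "fish")

# Keep the 2 most common levels, collapse the rest into "Other"
lumped <- fct_lump(species, n = 2)

levels(by_freq)
levels(by_weight)
levels(manual)
table(lumped)
```

Each call returns a new factor with the same values (or collapsed values, for `fct_lump()`) and a different level order, so the result can be passed straight to a plotting or modeling function that respects level order.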
Resources featuring forcats
Amelia McNamara | Working with categorical data in R without losing your mind | RStudio (2019)
Categorical data, called “factor” data in R, presents unique challenges in data wrangling. R users often look down on tools like Excel for automatically coercing variables to incorrect datatypes, but factor data in R can produce very similar issues. The stringsAsFactors=HELLNO movement and standard tidyverse defaults have moved us away from the use of factors, but they are sometimes still necessary for analysis. This talk will outline common problems arising from categorical variable transformations in R, and show strategies to avoid them, using both base R and the tidyverse (particularly dplyr and forcats functions).
Materials: http://www.amelia.mn/WranglingCats.pdf
Related paper (from the DSS collection): http://bitly.com/WranglingCats https://peerj.com/collections/50-practicaldatascistats/
About the Author
Amelia McNamara
My work is focused on creating better tools for novices to use for data analysis. I have a theory about what the future of statistical programming should look like, and am working on next steps toward those tools. For more on that, see my dissertation. My research interests include statistics education, statistical computing, data visualization, and spatial statistics. At the moment, I am very interested in the effects of parameter choices on data analysis, particularly data visualizations. My collaborator Aran Lunzer and I have produced an interactive essay on histograms, and an initial foray into the effects of spatial aggregation. I talked more about spatial aggregation in my 2017 OpenVisConf talk, How Spatial Polygons Shape Our World.
