The rstudio-conf repository is a comprehensive archive of materials from rstudio::conf, the annual conference for data science professionals working with R and Python. It organizes presentations, workshop materials, keynote recordings, and session slides from multiple years of conferences.
This repository provides access to educational content covering foundational topics like tidyverse and Shiny as well as advanced subjects including machine learning with tidymodels, package development, and data visualization. Materials are organized by year and presenter, making it straightforward to find specific sessions or explore workshop content from expert practitioners in the R and Python ecosystems. It serves as both a learning resource for those who couldn’t attend and a reference library for conference attendees.
Resources featuring rstudio-conf
10 Years of Data Science Tools…and What Happens Next (Jonathan McPherson) | posit::conf(2025)
Speaker(s): Jonathan McPherson
Abstract:
In this talk, I’ll reflect on a decade of work on RStudio and the principles of tool-building that have led it to become the standard data science environment for R. We’ll talk about how those same principles have guided the development of Positron, a new data science environment from Posit, and how you can apply them to your own tool-building work.
Slides: https://github.com/rstudio/rstudio-conf/blob/main/2025/jonathanmcpherson/10%20Years%20of%20Data%20Science%20Tools.key
posit::conf(2025)
Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
What we’re doing to make Quarto fast(er) (Carlos Scheidegger, Posit) | posit::conf(2025)
Speaker(s): Carlos Scheidegger
Abstract: Quarto is a powerful system, but its performance leaves much to be desired. In this talk, I’ll go through the things that make Quarto slow, and I will describe the journey I’m taking in 2025 to fix the issues. This is going to be a deeper technical talk on performance analysis, profiling, and will include discussing the custom tooling we’ve had to build to measure performance in a system as complex as Quarto.
Quarto markdown repo: https://github.com/quarto-dev/quarto-markdown

Abigail Haddad - GitHub: How To Tell Your Professional Story
GitHub is more than just a version control tool, it’s a way of explaining your professional identity to prospective employers and collaborators – and you can build your profile now, before you’re looking for new opportunities. This talk is about how to think of GitHub as an opportunity, not a chore, and how to represent yourself well without making developing your GitHub profile into a part-time job. I’ll talk about why GitHub adds value beyond a personal website, what kinds of projects are helpful to share, and some good development practices to get in the habit of, regardless of your project specifics.
Talk by Abigail Haddad
Slides: https://github.com/rstudio/rstudio-conf/tree/master/2024/abigailhaddad/haddad_2024_posit_slides.pdf
Brandon Sucher - Beyond the Classroom: Unspoken Realities of a Data Science Career
Embarking on a data science career extends well beyond academic knowledge. In many ways, the learning has just begun. Soft skills have become increasingly valuable, with effective collaboration being essential for success. Additionally, there may be moments when advocating for your own work is crucial, turning data scientists into persuasive salespeople for their own insights and contributions. In this talk, I’ll touch on some of the aspects of a data science job that aren’t talked about as frequently, including onboarding successfully, becoming a subject matter expert, and understanding the end-to-end data workflow.
Talk by Brandon Sucher
Slides: https://github.com/rstudio/rstudio-conf/blob/master/2024/brandonsucher/Posit_Conf_2024_Slides.pdf
Claire Bai - Translating clinical guidance to actionable insights with R
COTA’s team of oncologists and data scientists curate real-world data used by life science companies and healthcare partners to inform drug development and patient care. Over time, we have received many of the same questions from our data users, which indicated a dire need for translating our internal clinical guidance and data model knowledge into a tool for successfully navigating our data. We developed rwnavigator, an R package that helps users easily prepare COTA data for analysis with time-to-event packages. As first-time package developers, we ran into many challenges as we created, tested, and deployed rwnavigator. We hope to share with the greater R community our motivations for developing this package and best practices we learned along the way.
Talk by Claire Bai
Slides: https://github.com/rstudio/rstudio-conf/blob/master/2024/clairebai/rwnavigator_FINAL.pptx
Eric Leung - R Scripts to Databricks: Lessons in Production Workflow
This talk is about how a team at The Walt Disney Company spent the past year taking a local R workflow into production. The updated process uses a mix of R, Python, SQL, Databricks, and a Tableau dashboard, all of which involve multiple teams and stakeholders. The project started as a manual monthly process to measure the effect of ESPN’s live TV marketing to get consumers to convert to cross-channel platforms related to ESPN. We then needed not only to automate this process but also to scale it to measure the effect of marketing. The few lessons shared are: don’t reinvent the wheel; use not only the best tool for the job, but the best available; take time to get used to new tools.
Talk by Eric Leung
Slides: https://github.com/rstudio/rstudio-conf/tree/master/2024/ericleung/Leung_PositConf_Lessons.pptx
Alex Farach | Let’s start at the beginning - bits to character encoding in R | RStudio (2022)
Attendees will receive a broad overview of the encoding and decoding process in the human-to-computer loop, how bits are used, and the math that gets us to common bit values. A brief history of ASCII, Latin-1, and UTF-8 will be provided as well.
Attendees will also be exposed to how character encoding works in R and in the tidyverse.
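As a rough illustration of the encoding step described above, the following Python sketch (not taken from the talk materials) encodes three sample characters under ASCII, Latin-1, and UTF-8 and prints the resulting bits. The helper names `to_bits` and `encode_or_none` are hypothetical and exist only for this example.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as an 8-bit binary string, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


def encode_or_none(text: str, encoding: str):
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# "A" (U+0041), "e with acute accent" (U+00E9), and the euro sign (U+20AC)
samples = ["A", "\u00e9", "\u20ac"]
encodings = ["ascii", "latin-1", "utf-8"]

for ch in samples:
    for enc in encodings:
        encoded = encode_or_none(ch, enc)
        shown = to_bits(encoded) if encoded is not None else "not representable"
        # !a prints an ASCII-safe representation of the character
        print(f"{ch!a} in {enc}: {shown}")
```

ASCII covers only the first character, Latin-1 stores the accented letter in a single byte, and UTF-8 needs two bytes for the accented letter and three for the euro sign.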
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/alexfarach/bits_to_character_in_R_RSTUDIO%20-%20Alex%20F.pdf
Session: Lightning Talks
Beatriz Milz | Making Awesome Automations with GitHub Actions | Posit (2022)
This talk is an introduction to GitHub Actions (GHA), which is a feature from GitHub that allows us to automate several tasks in R. In this presentation, I aim to answer these questions: “What is GitHub Actions? How can I run R Scripts with it?”. I will list supplementary materials that are helpful to learn how to start automating tasks in R projects and packages.
Talk materials are available at https://beamilz.com/talks/en/2022-rstudio-conf/
Session: Lightning Talks
Colin Gillespie | Comparing Package Versions with Diffify | Posit (2022)
Even when we run the simplest of R scripts, we are using dozens of R packages. We use packages for data cleaning, writing reports, graphics and modelling. One of the strengths of R is the depth of its packages.
Unfortunately, packages change and break our code. Not all R packages have a NEWS file, and even for those that do, it might not be complete.
The diffify service aims to make comparing between package versions easier. For example, is there a new Import? Or perhaps a package has been removed from Suggests? Maybe the arguments of a function have changed? Or a function is no longer exported. Diffify can help.
NB: We have completed the back-end infrastructure, and are currently working on the front-end. Expected launch: ~May 1st
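The comparisons listed in the abstract (new Imports, removed Suggests, changed arguments, functions no longer exported) can be pictured as set differences between two metadata snapshots. The Python sketch below is only an illustration of that idea and says nothing about how diffify is implemented; the snapshot layout, the package contents, and the function `diff_versions` are all assumptions made up for this example.

```python
def diff_versions(old: dict, new: dict) -> dict:
    """Compare two hypothetical package-metadata snapshots."""
    changes = {}
    # Dependency fields and exported functions are compared as sets.
    for field in ("Imports", "Suggests", "exports"):
        before, after = set(old.get(field, [])), set(new.get(field, []))
        changes[field] = {
            "added": sorted(after - before),
            "removed": sorted(before - after),
        }
    # Functions present in both versions whose argument lists differ.
    old_args, new_args = old.get("args", {}), new.get("args", {})
    changes["changed_args"] = sorted(
        name
        for name in old_args.keys() & new_args.keys()
        if old_args[name] != new_args[name]
    )
    return changes


# Made-up snapshots of two versions of an imaginary package.
v1 = {
    "Imports": ["cli", "rlang"],
    "Suggests": ["knitr", "testthat"],
    "exports": ["read_data", "summarise_data"],
    "args": {"read_data": ["path"], "summarise_data": ["x"]},
}
v2 = {
    "Imports": ["cli", "rlang", "vctrs"],
    "Suggests": ["testthat"],
    "exports": ["read_data"],
    "args": {"read_data": ["path", "encoding"]},
}

report = diff_versions(v1, v2)
for field, change in report.items():
    print(field, change)
```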
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/colingillespie/2022-07-27_rstudio-conf%20-%20Colin%20Gillespie.pdf
Session: Lightning Talks
David Smith | Zero-setup R workshops with GitHub Codespaces | RStudio (2022)
If you’ve ever tried to run a workshop using R, you’ll be aware of the challenges of getting everyone’s laptop set up to be able to run your R scripts, R Markdown documents, or Jupyter Notebooks without errors.
What if you could host a workshop using R that required no setup from the participants at all? With GitHub Codespaces, a GitHub repository becomes a cloud-based engine for running R in a container with a single click. Every participant, regardless of the power, configuration or operating system of their laptop will have the same experience, all with NO setup in advance.
In this talk, I’ll describe the process and share tips for setting up a GitHub repository for an R-based workshop to take advantage of GitHub Codespaces.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/davidsmith/Zero%20Setup%20Workshops%20RStudioConf%202022%20-%20David%20Smith.pdf
Session: Lightning Talks
Davis Vaughan | It’s about time | RStudio (2022)
Dealing with date-times is hard. Dealing with date-times without the proper tooling is even harder! clock is an R package that aims to provide comprehensive and safe handling of date-times. It goes beyond the date and date-time types that base R provides, implementing new types for year-month, year-quarter, ISO year-week, and many other date-like formats, all with up to nanosecond precision. In this talk, you’ll see how clock emphasizes “safety first” when manipulating date-times, and how these new date-time types can be used in your own work.
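One way to picture a "safety first" approach to date arithmetic is month addition that refuses to silently produce or adjust an impossible date such as 31 February. The Python sketch below illustrates that idea only; it is not clock's R interface, and the function `add_months` with its `invalid` parameter and the values `"error"`, `"previous"`, and `"next"` are assumptions of this example.

```python
import calendar
from datetime import date


def add_months(d: date, n: int, invalid: str = "error") -> date:
    """Add n months to d, making the handling of impossible dates explicit."""
    total = d.year * 12 + (d.month - 1) + n
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    if d.day <= last_day:
        return date(year, month, d.day)
    # The target month is too short for d.day: the caller decides what happens.
    if invalid == "previous":
        return date(year, month, last_day)  # clamp to the end of the month
    if invalid == "next":
        year, month_index = divmod(total + 1, 12)
        return date(year, month_index + 1, 1)  # roll to the next valid day
    raise ValueError(
        f"{year}-{month:02d}-{d.day:02d} does not exist; "
        "choose invalid='previous' or invalid='next'"
    )


start = date(2022, 1, 31)
print(add_months(start, 1, invalid="previous"))
print(add_months(start, 1, invalid="next"))
try:
    add_months(start, 1)
except ValueError as err:
    print("refused:", err)
```

Raising by default forces the caller to state which resolution they want instead of receiving a date they did not ask for.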
Talk materials are available at https://speakerdeck.com/davisvaughan/2022-rstudio-conf-its-about-time
Session: Lightning Talks

Dewey Dunnington | Accelerating geospatial computing using Apache Arrow | RStudio (2022)
The ‘arrow’ R package and wider Apache Arrow ecosystem provide an end-to-end solution for querying and computing on in-memory and bigger-than-memory data sets using the Apache Arrow C++ library. In this talk we introduce the ‘geoarrow’ package, which extends Arrow to provide efficient columnar storage for spatial types and functions to support spatial queries in the Arrow compute engine. We focus on a workflow where (1) data are stored in multiple files that can be hosted remotely (e.g., on S3-compatible storage), (2) queries are processed batchwise and in parallel allowing for efficient processing of bigger-than-memory geospatial data and (3) results can be passed without copying to Rust, Python, or other R packages for further analysis.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/deweydunnington/Accelerating%20geospatial%20computing%20using%20Apache%20Arrow%20-%20Dewey%20Dunnington.pdf
Session: Lightning Talks
George Stagg | WebR: R compiled for WebAssembly and running in the browser | RStudio (2022)
In this talk I introduce webR, a port of R to WebAssembly using Emscripten. WebR brings a full R environment to the browser, enabling R code execution, numerical analysis, loading packages and more. No local or cloud-based R servers are required as all computation is performed within the browser. I give a brief overview of our build process for webR, describing the toolchain and some of the issues we encountered. A publicly available web-based R session is demonstrated, with package and plotting support.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/georgestagg/webr%20-%20George%20Stagg.pdf
Session: Lightning Talks

Hannah Frick | Censored - Survival Analysis in Tidymodels | Posit (2022)
tidymodels is extending support for survival analysis and censored is a new parsnip extension package for survival models. It offers various types of models: parametric models, semi-parametric models like the Cox model, and tree-based models like decision trees, boosted trees, and random forests. They all come with the consistent parsnip interface so that you can focus on the modelling instead of details of the syntax. Happy modelling!
Talk materials are available at https://hfrick.github.io/rstudio-conf-2022
Session: Updates from the tidymodels team

Jamie Ralph | Developing internal tools for multi-lingual teams | RStudio (2022)
Internal packages are great for boosting productivity and promoting good practice, but what kinds of challenges do we face when designing solutions for multi-lingual teams? Here I will advocate for a design approach we are using at Bumble to build Python and R packages with the same foundations. I will discuss the benefits of this approach for the developer and the wider organisation.
Talk materials are available at https://github.com/jamie-ralph/rstudio-conf-2022
Session: Some of my best friends use Python
Josiah Parry | Exploratory Spatial Data Analysis in the tidyverse | RStudio (2022)
R has come quite a long way to enable spatial analysis over the past few years. Packages such as sf have made spatial analysis and mapping easier for many. However, adoption of R for spatial statistics and econometrics has been limited. Many spatial analysts, researchers, and practitioners lean on Python libraries such as pysal.
In this talk I briefly discuss my journey through spatial analysis and introduce a new package, sfdep, which provides a tidy interface to spatial statistics and, notably, exploratory spatial data analysis. sfdep is an interface to the spdep package and also implements other common exploratory spatial statistics.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/josiahparry/rstudio__conf(2022L)%20-%20Josiah%20Parry.pdf
Session: Lightning Talks
Kirill Müller | dm: Analyze, build and deploy relational data models | RStudio (2022)
dm bridges the gap in the data pipeline between standalone data frames and relational databases. Implementing a “grammar of joined tables”, it provides a consistent set of verbs for consuming, creating, and deploying relational data models. In this talk I present a short overview of how dm can help your data analysis and ETL processes, and highlight recent developments.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/kirillmuller/dm-rstudioconf2022.pdf/
Session: Databases
Lightning Talk | Andreas Hofheinz | leafdown: Interactive Multi-layer maps in Shiny apps
Interactive maps are indispensable tools for exploring spatial datasets because of their real-world context and intuitiveness. For a comprehensive understanding of the data, it is often necessary to switch between several map layers (such as states and counties) and to analyze multiple variables simultaneously - both of which are challenging. In this talk, I will show how we can overcome these challenges using the leafdown package, which allows us to create multi-layer maps embedded in Shiny apps.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/andreashofheinz/leafdown_presentation%20-%20Andreas%20H.pdf
Session: Lightning Talks
Mark Rieke | Intro to Workboots: Make Prediction Intervals from Tidymodel Workflows | Posit (2022)
Sometimes, we want a model that generates a range of possible outcomes around each prediction. Other times, we just care about point predictions and may opt to use a fancy model like XGBoost. But what if we want the best of both worlds: getting a range of predictions while still using a fancy model? That’s where bootstrapping comes to the rescue! By using bootstrap resampling, we can create many models that produce a prediction distribution – regardless of the model type! In this talk, I’ll give an overview of bootstrap resampling for prediction, the pros/cons of this method, and how to implement it as a part of a tidymodel workflow with the workboots package.
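The bootstrap idea in this abstract can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the workboots implementation: the data are synthetic, the model is an ordinary least-squares line rather than XGBoost, and the helper names `fit_line` and `bootstrap_interval` are hypothetical. Each resample is refit, its prediction at the new point is perturbed by one of that fit's own residuals, and quantiles of the collected draws form the interval.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def bootstrap_interval(xs, ys, x_new, n_boot=500, level=0.95, seed=0):
    """Return (lower, median, upper) for a prediction at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    draws = []
    for _ in range(n_boot):
        # Resample rows with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: the slope is undefined
        intercept, slope = fit_line(bx, by)
        # Add a resampled residual so the spread reflects noise, not just fit error.
        residuals = [y - (intercept + slope * x) for x, y in zip(bx, by)]
        draws.append(intercept + slope * x_new + rng.choice(residuals))
    draws.sort()
    tail = (1 - level) / 2
    lower = draws[int(tail * (len(draws) - 1))]
    upper = draws[int((1 - tail) * (len(draws) - 1))]
    return lower, statistics.median(draws), upper


# Synthetic data: y = 1 + 2x plus Gaussian noise.
data_rng = random.Random(42)
xs = [float(i) for i in range(20)]
ys = [1 + 2 * x + data_rng.gauss(0, 1) for x in xs]

lower, middle, upper = bootstrap_interval(xs, ys, x_new=10.0)
print(f"prediction at x=10: {middle:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

Because nothing in the resampling loop depends on the model being linear, the same pattern applies to any model that can be refit and asked for a prediction, at the cost of fitting it many times.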
Talk materials are available at https://github.com/markjrieke/rstudio-conf-2022
Session: Machine learning
Matthew Kay | Visualizing distributions and uncertainty using ggdist | RStudio (2022)
I propose a talk on visualizing distributions and uncertainty using {ggdist}. I will describe how to think systematically about distributional visualization as mappings of PDFs, CDFs, and quantile functions onto aesthetics, and how support for this enables creative and easy exploration of the space of possible uncertainty visualizations. I will highlight features like true gradient support in R 4.1, support for distribution vector datatypes, and the automatic binwidth-selecting geom_dots(). I expect to leave the audience with: (1) a systematic way to think about visualizing distributions and uncertainty in the grammar of graphics and (2) an understanding of how to actually do it using ggdist.
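To make the three functions named in the abstract concrete, here is a small Python sketch using the standard library's `statistics.NormalDist`. It only computes the values a plot could map onto aesthetics and does not use or imitate ggdist; the aesthetic suggestions in the comments and the helper `central_interval` are assumptions of this example.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)
grid = [x / 2 for x in range(-6, 7)]  # -3.0, -2.5, ..., 3.0

# PDF values: could drive the thickness of a density "slab".
density = [dist.pdf(x) for x in grid]

# CDF values: could drive a fill or transparency gradient.
cumulative = [dist.cdf(x) for x in grid]


def central_interval(d: NormalDist, width: float):
    """Quantile function: endpoints of the central interval of the given width."""
    tail = (1 - width) / 2
    return d.inv_cdf(tail), d.inv_cdf(1 - tail)


for width in (0.66, 0.95):
    low, high = central_interval(dist, width)
    print(f"{width:.0%} interval: {low:.2f} to {high:.2f}")
```

All three views come from the same distribution object, which is what makes it possible to swap one visual encoding for another without recomputing anything about the underlying uncertainty.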
Talk materials are available at https://www.mjskay.com/presentations/rstudio-conf-2022-talk.pdf
Session: Lightning Talks
Melissa Van Bussel | Achieving a seamless workflow between R, Python and SAS from within RStudio
Some of my best friends use Python…and all of my coworkers use SAS.
Statistics Canada is the official statistical agency of Canada and employs over 6,000 employees. Statistics Canada has a legal obligation to ensure that personal information collected for statistical purposes is kept strictly confidential. An internal system which prevents the release of confidential information is only implemented in SAS. As such, many Analysts and Data Scientists at Statistics Canada must use the SAS programming language as part of their workflow. It is therefore imperative to find ways to work with open source programming languages and SAS seamlessly. I will present a method for achieving a harmonious workflow between R, Python, and SAS, all within RStudio.
Talk materials are available at https://github.com/melissavanbussel/rstudio-conf-2022
Session: Some of my best friends use Python
Nicholas Tierney | The Future of Missing Data | Posit (2022)
If you do data analysis, you encounter missing data. Missing data upsets data analysis workflow because you have to make decisions on how to deal with it - do you impute the values? Remove them? These each have consequences! The data we often encounter does not always arrive with a research question in mind, so how do you understand why you have missing values? When I first encountered missing data I was incredibly frustrated at how hard it was to understand and explore it. This frustration led me to create two R packages to explore missing data, {naniar} and {visdat}. In this talk I will showcase how to use these tools to explore missing data, as well as new features that have not been presented, and planned advances.
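A first step in exploring missingness is simply counting it per variable before deciding whether to impute or remove anything. The Python sketch below shows that kind of summary on made-up data; it does not reproduce the {naniar} or {visdat} interfaces, and the function `missing_summary` and the example records are hypothetical.

```python
def missing_summary(rows, columns):
    """Return (column, n_missing, pct_missing) tuples, most-missing first."""
    n = len(rows)
    summary = []
    for col in columns:
        n_miss = sum(1 for row in rows if row.get(col) is None)
        pct = 100 * n_miss / n if n else 0.0
        summary.append((col, n_miss, pct))
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Made-up records; None marks a missing value.
rows = [
    {"age": 34, "income": None, "city": "Leeds"},
    {"age": None, "income": None, "city": "York"},
    {"age": 51, "income": 42000, "city": "Hull"},
    {"age": 29, "income": None, "city": "Bath"},
]

for col, n_miss, pct in missing_summary(rows, ["age", "income", "city"]):
    print(f"{col}: {n_miss} missing ({pct:.0f}%)")
```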
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/nicholastierney/The%20Future%20of%20NA%20Data.pdf
Session: Lightning Talks
Nicola Rennie | Say Hello! to Multilingual Shiny Apps | Posit (2022)
Multilingual shiny apps are not straightforward to build. Translation affects almost every single aspect of an app. Although there are a few packages designed to automate the translation process, they tend to only work for the most widely spoken languages.
Using a bilingual English-Welsh shiny app we developed to present public health data as a case study, this talk will discuss:
• how we built a multilingual shiny app;
• how translation affected design decisions;
• how we overcame the main issues faced;
• and most importantly, what we’d do differently next time.
By the end of this talk, you will have a better understanding of how to translate your Shiny app to help you to share your app with a much wider audience.
Talk materials are available at https://nrennie.rbind.io/talks/2022-july-rstudio-conf/
Session: Lightning Talks
Rebecca Hadi | Exploring Query Optimization: How a few lines of code can save hours of time
If you find yourself waiting hours for your queries to run, this talk is for you. Learn from my query mistakes and avoid crashing your database. In this talk, you’ll learn about minor code changes that can dramatically improve query run time.
Talk materials are available at https://github.com/bhadi26/rstudio-conf-2022-slides/blob/main/rebecca-hadi-r-studio-presentation-2022.pptx
Session: Databases
Zhian N. Kamvar | Building Accessible Lessons with R and Friends | RStudio (2022)
The Carpentries is a global community of volunteers who collaboratively develop and deliver lessons to build capacity in data and coding skills to researchers worldwide. In the recent redesign of our lesson infrastructure (serving more than 100 lessons, used daily by more than 5K learners), we replaced embedded Jekyll templates with a workbench of modular and accessible packages using R and Pandoc. By leveraging renv and knitr for R-based lessons, we provide a seamless and collaborative lesson development experience that maximizes reproducibility and minimizes frustration so authors can focus on the contents, not the tooling. We demonstrate how anyone can use our infrastructure to build customised and accessible sites for their own lessons or tutorials.
Talk materials are available at https://github.com/zkamvar/rstudio-conf-2022
Session: It takes a village: building communities of practice
Rebecca Barter | Becoming an R blogger | RStudio (2020)
Blogging is an excellent way to learn, improve your communication skills, and gain exposure in the R and data science communities. In this talk, I will discuss how and why I started blogging, and why you should too. I will guide you through choosing topics, writing your blog using RStudio and blogdown, hosting it on netlify, and sharing your blog with the world. This talk is for you if you’ve wanted to start a blog on R, data science, or to showcase your data analyses, but don’t know where to start.
Materials:
GitHub: https://github.com/rlbarter/rstudio-conf-2020-blogger-slides
Slides (PDF): https://github.com/rlbarter/rstudio-conf-2020-blogger-slides/blob/master/Becoming%20an%20R%20blogger
Jonathan McPherson | New language features in RStudio | RStudio (2019)
RStudio 1.2 dramatically improves support for many languages frequently used alongside R in data science projects, including SQL, D3, Stan, and Python. In this talk, you’ll learn how to use RStudio 1.2’s new language features to work more efficiently and fluidly in multi-lingual projects.
VIEW MATERIALS https://github.com/rstudio/rstudio-conf/tree/master/2019/RStudio_1.2_Language_Features--Jonathan_McPherson
About the Author: Jonathan McPherson
Jonathan is a software engineer at RStudio working on the IDE. In the past, he’s written Web applications at a nuclear site in the desert, exploratory information visualization systems at UC Davis, and features for flagship Office products and modern web applications at Microsoft.
Jeroen Ooms | A preview of Rtools 4.0 | RStudio (2019)
Rtools is getting a major upgrade. In addition to the latest gcc, it now includes a full build system and package manager to build, install, and distribute external C/C++/Fortran libraries needed by R packages. It thereby bridges the long-standing gap between Windows and macOS/Linux with respect to the availability of high-quality, up-to-date system libraries. In this talk, we will show how to build and install system libraries with Rtools, and manage your Rtools build environment. It should be interesting both for Windows users and for non-Windows package authors who are interested in reducing the pain of making things work on Windows.
VIEW MATERIALS https://resources.rstudio.com/rstudio-conf-2019/a-preview-of-rtools-4-0
About the Author: Jeroen Ooms
Postdoc hacker for @ropensci at UC Berkeley.

Amanda Gadrow | Getting it right: Writing reliable and maintainable R code | RStudio (2019)
How can you tell that your scripts, applications, and package functions are working as expected? Are you sure that when you make changes in one part of the code, it won’t break something in another part? Have you thought deeply about how the consumers of your code (including Future You) will use it, maintain it, fix it, and improve it? Code quality is essential not only for reliable results but also for your script’s maintainability and your users’ satisfaction. Quality can be measured in part with targeted testing, and fortunately, there are several effective and easy-to-use code testing tools available in R. This talk will discuss some of the most useful testing packages, covering both concepts and examples.
VIEW MATERIALS https://github.com/rstudio/rstudio-conf/tree/master/2019/Testing_R_Code--Amanda_Gadrow
About the Author: Amanda Gadrow
Amanda is a software engineer with many years’ experience writing automated test frameworks for enterprise software. She started learning R when she joined RStudio in 2016, and has been basking in its glory ever since. Amanda leads the QA and Support teams, and spends a significant amount of time analyzing customer data to improve the products and optimize support. She is a co-organizer of R-Ladies Columbus, and an avid musician on the side.
Thomas Lin Pedersen | gganimate live cookbook | RStudio (2019)
Animation of data visualisation is becoming increasingly popular both as an attention grabber on social media and as a way to tell small data stories. gganimate is a package that extends ggplot2 for making animations and provides a grammar of animation on top of the grammar of graphics. This talk will quickly introduce gganimate, and then dive into a series of different animations and show how they were made and how they could be changed or expanded.
Slides: https://data-imaginist.com/slides/rstudioconf2019
Resources: https://resources.rstudio.com/rstudio-conf-2019/gganimate-live-cookbook
Discussion: https://community.rstudio.com/t/gganimate-live-cookbook-thomas-lin-pedersen-rstudio-conf-2019l-video/24852

Max Kuhn | parsnip A tidy model interface | RStudio (2019)
parsnip is a new tidymodels package that generalizes model interfaces across packages. The idea is to have a single function interface for types of specific models (e.g. logistic regression) that lets the user choose the computational engine for training. For example, logistic regression could be fit with several R packages, Spark, Stan, and Tensorflow. parsnip also standardizes the return objects and sets up some new features for some upcoming packages.
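The design described here, one model specification with a swappable computational engine and a standardized return object, can be sketched in Python. This illustrates the pattern only and is not parsnip's API: it uses simple linear regression instead of logistic regression to stay short, and the names `LinearReg`, `set_engine`, `ENGINES`, and both engine functions are hypothetical.

```python
def _fit_closed_form(xs, ys):
    """Engine 1: ordinary least squares via the textbook formulas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def _fit_gradient_descent(xs, ys, steps=5000, learning_rate=0.01):
    """Engine 2: minimise mean squared error iteratively."""
    n = len(xs)
    intercept = slope = 0.0
    for _ in range(steps):
        errors = [intercept + slope * x - y for x, y in zip(xs, ys)]
        intercept -= learning_rate * 2 * sum(errors) / n
        slope -= learning_rate * 2 * sum(e * x for e, x in zip(errors, xs)) / n
    return intercept, slope


ENGINES = {
    "closed_form": _fit_closed_form,
    "gradient_descent": _fit_gradient_descent,
}


class LinearReg:
    """A model specification that is independent of how it gets fitted."""

    def __init__(self, engine="closed_form"):
        self.engine = engine

    def set_engine(self, engine):
        if engine not in ENGINES:
            raise ValueError(f"unknown engine: {engine!r}")
        return LinearReg(engine)

    def fit(self, xs, ys):
        intercept, slope = ENGINES[self.engine](xs, ys)
        # Every engine returns the same shape of result.
        return {"engine": self.engine, "intercept": intercept, "slope": slope}


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x

spec = LinearReg()
exact = spec.fit(xs, ys)
iterative = spec.set_engine("gradient_descent").fit(xs, ys)
print(exact)
print(iterative)
```

Calling code only ever sees the specification and the standardized result, so switching engines is a one-word change rather than a rewrite against a different package's syntax.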
VIEW MATERIALS https://github.com/rstudio/rstudio-conf/tree/master/2019/Parsnip--Max_Kuhn
About the Author: Max Kuhn
Dr. Max Kuhn is a Software Engineer at RStudio. He is the author or maintainer of several R packages for predictive modeling including caret, Cubist, C50 and others. He routinely teaches classes in predictive modeling at rstudio::conf, Predictive Analytics World, and UseR! and his publications include work on neuroscience biomarkers, drug discovery, molecular diagnostics and response surface methodology. He and Kjell Johnson wrote the award-winning book Applied Predictive Modeling in 2013.

Javier Luraschi | Scaling R with Spark | RStudio (2019)
This talk introduces new features in sparklyr that enable real-time data processing, brand new modeling extensions and significant performance improvements. The sparklyr package provides an interface to Apache Spark to enable data analysis and modeling on large datasets through familiar packages like dplyr and broom.
VIEW MATERIALS https://github.com/rstudio/rstudio-conf/tree/master/2019/Scaling%20R%20with%20Spark%20-%20Javier%20Luraschi
About the Author: Javier Luraschi
Javier is a Software Engineer with experience in technologies ranging from desktop, web, mobile and backend to augmented reality and deep learning applications. He previously worked for Microsoft Research and SAP and holds a double degree in Mathematics and Software Engineering.
Nic Crane | The future’s Shiny: Pioneering genomic medicine in R | Posit (2019)
Shiny’s expanding capabilities are rapidly transforming how it is used in an enterprise. This talk details the creation of a large-scale application, supporting hundreds of concurrent users, making use of the future and promises packages. The 100,000 genomes project is an ambitious exercise that follows on from the Human Genome Project - aiming to put the UK at the forefront of genomic medicine, with the NHS as the first health service in the world to offer precision medicine to patients with rare diseases and cancer. Data is at the heart of this project; not only the outputs of the genomic sequencing, but vast amounts of metadata used to track progress against the 100,000 genome target and the status and path of each case through the sample tracking pipeline. In order to make this data readily available to stakeholders, Shiny was used to create an application containing multiple interactive dashboards. A scaled-up version of the app is being rolled out in early 2019 to a much larger audience to support the National Genomics Informatics Service, with the challenge of creating a complex app capable of supporting so many users without grinding to a halt. In this talk, I will explain why Shiny was the obvious technology choice for this task, and discuss the design decisions which enabled this project’s success.
VIEW MATERIALS https://github.com/thisisnic/rstudio-conf-2019
About the Author: Nic Crane
Nic Crane is a Data Scientist at Elucidata, and has formerly worked for Mango Solutions and IBM. She is passionate about learning and teaching all things data science.