tidyverse
Easily install and load packages from the tidyverse
The tidyverse package provides a single command to install and load a collection of R packages that share common data structures and design principles. It bundles core packages for data analysis workflows, including tools for visualization (ggplot2), manipulation (dplyr), tidying (tidyr), import (readr), and functional programming (purrr).
The package solves the problem of managing multiple dependencies by loading nine core packages at once and providing utilities to check for package conflicts and updates. It also installs additional packages for working with specific data types (dates, times, factors, strings) and importing data from various sources (Excel, SPSS, JSON, web APIs). The shared API design across all tidyverse packages means they work together seamlessly without requiring different syntax or data structure conversions.
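As a sketch of that workflow, the one-line install and load plus the bundled conflict and update utilities look like this (the package list reflects tidyverse 2.0's documented core set):

```r
# One command installs the whole collection:
# install.packages("tidyverse")

# One command attaches the nine core packages
# (ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, lubridate):
library(tidyverse)

# Utilities for managing the bundle:
tidyverse_conflicts()  # list functions masked by other attached packages
tidyverse_update()     # report which tidyverse packages are out of date

# Non-core packages are installed but must be attached explicitly,
# e.g. readxl for Excel import:
library(readxl)
```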
Resources featuring tidyverse
Coding vs. thinking programmatically | Samia Baig | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
This week’s guest was Samia Baig, Senior Data Scientist/Data Engineer at Johnson & Johnson Innovative Medicine!
Some topics covered in this week’s Hangout were transitioning from a background in pharmacy and public health to a data career in pharma, distinguishing the responsibilities of data scientists versus analytics engineers, strategies for making data pipelines more robust (and convincing your team that you NEED robust pipelines in the first place), and the value of joining open-source communities like Tidy Tuesday.
Resources mentioned in the video and chat:
- Posit Data Science Lab → https://pos.it/dslab
- Tidy Tuesday GitHub Repository → https://github.com/rfordatascience/tidytuesday
- {dbplyr} → https://dbplyr.tidyverse.org/
- The Missing Semester of Your CS Education → https://missing.csail.mit.edu/
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh The Lab: https://pos.it/dslab LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps
00:00 Introduction
03:22 “Do you feel like analytics engineer is a good descriptor for what you do?”
04:44 “How did you get into data from being on a pharmacist’s job path?”
11:55 “What was it like in the move that you made from public health to pharma?”
16:57 “What do you say are distinguishing factors between data science and engineering?”
20:16 “What are the most popular tools that you and your team use, in your job at J&J?”
24:00 “What do you use SQL in?”
27:40 “How would you go about convincing a team of the need for a more robust pipeline?”
31:10 “Can you define robust?”
33:31 “Do you happen to have any specific resources or strategies or examples that might help students or others with that mindset of thinking programmatically?”
37:06 “Are there any non data science skills that are very helpful in your either current or former job?”
40:23 “Is there any kind of community among data scientists across the whole company?”
45:44 “What are your biggest data challenges that you have?”
46:12 “If you had a magic wand, what problem would you solve in that area?”
49:52 “What is a piece of career advice that maybe you wish you could go back in time and give yourself?”
Data analysis with Posit AI-assistants | Sara Altman & Simon Couch | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Sara Altman, who walks through using Posit’s AI assistants to analyze data, including a sneak peek at Posit Assistant, and Simon Couch drops by to give us a demo of the reviewer package! Together, Sara and Simon author the Posit AI Newsletter, the best place to stay up to date on new tools and to find advice on being an informed, level-headed AI user.
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Sara Altman, Simon Couch
Sara’s Bluesky: https://bsky.app/profile/sara-altman.bsky.social Sara’s LinkedIn: https://www.linkedin.com/in/sarakaltman/ Sara’s GitHub: https://github.com/skaltman Posit AI Newsletter by Sara and Simon: https://posit.co/blog/?category=roundups
Resources from the hosts and chat:
- Positron IDE → https://positron.posit.co/
- Databot Extension → https://positron.posit.co/databot.html
- Getting started with Positron Assistant → https://positron.posit.co/assistant-getting-started.html
- Posit Assistant (Private Beta) → https://posit-ai-beta.share.connect.posit.cloud/
- Reviewer Package (by Simon Couch) → https://github.com/simonpcouch/reviewer
- ellmer Package → https://ellmer.tidyverse.org/
- chatlas Package → https://github.com/posit-dev/chatlas
- Read the Posit AI Newsletter → https://posit.co/blog/?category=roundups
- Sign up to get the Posit AI Newsletter → http://pos.it/ai-news
- Simon’s blog post about local LLMs not quite being ready for primetime → https://posit.co/blog/local-models-are-not-there-yet/
- Join the waitlist for Posit AI in RStudio → https://posit.co/products/ai/
- Posit AI Known Issues & FAQs → https://posit-ai-beta.share.connect.posit.cloud/#frequently-asked-questions-faqs
- Blog post from Simon and Sara about privacy and LLMs → https://posit.co/blog/trust-llm-tools/
- DS Lab YouTube playlist → https://youtube.com/playlist?list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&si=7tmU6EAJpO5S7GBh
Timestamps
00:00 Introduction
07:23 “Would you mind real quick just briefly explaining the differences between Positron Assistant and Databot?”
15:01 “Is there any way to configure reasoning efforts when signing in with GitHub Copilot?”
15:49 “Does Databot already support other providers beyond Claude?”
20:36 “What are the cases with monetary penalty in the console output?”
22:14 “Do you happen to know if the column names of the dataset are very, very messy?”
23:18 “Can you add skills to Databot?”
26:36 “This code isn’t being saved anywhere. So where does it go?”
27:38 “Is there a way to know what all the slash commands are?”
28:51 Requesting Databot to use the namespace operator
33:58 “Is there a way to search within that Databot pane?”
39:34 “Have you noticed any time differences with how quickly things run in RStudio versus Positron?”
40:33 “What happens if you open that URL that it mentions at the bottom in your browser?”
40:50 Clarifying the difference between Posit Assistant and Positron Assistant
43:18 “What is the typical token burn rate?”
53:31 “Is this on CRAN and working in both Positron and RStudio?”

The mall package: using LLMs with data frames in R & Python | Edgar Ruiz | Data Science Lab
On this call, Libby Heeren is joined by Edgar Ruiz as they walk through how mall works (with ellmer) in R, and then in Python. The mall package lets you use LLMs to process tabular data or vectors, letting you do things such as feeding it a column of reviews and asking mall to use an Anthropic model via ellmer to add a column of summaries or sentiments. Follow along with the code here: https://github.com/LibbyHeeren/mall-package-r
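That pattern looks roughly like the following sketch (the model name and sample data here are assumptions, not Edgar's exact code; it assumes Ollama is running locally):

```r
library(mall)

# Tell mall which model to use. This assumes a local Ollama install with a
# "llama3.2" model pulled; an ellmer chat object can be supplied instead to
# use a hosted provider such as Anthropic.
llm_use("ollama", "llama3.2", seed = 100)

reviews <- data.frame(
  review = c(
    "This works great, highly recommend!",
    "Broke after two days. Very disappointed."
  )
)

# Each llm_*() verb appends a new LLM-generated column to the data frame.
reviews |>
  llm_sentiment(review) |>
  llm_summarize(review, max_words = 5)
```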
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Edgar Ruiz
Edgar’s Bluesky: https://bsky.app/profile/theotheredgar.bsky.social Edgar’s LinkedIn: https://www.linkedin.com/in/edgararuiz/ Edgar’s GitHub: https://github.com/edgararuiz
Resources from the hosts and chat:
- Ollama → https://ollama.com/download
- Posit Data Science Lab → https://posit.co/dslab
- mall package → https://mlverse.github.io/mall/
- ellmer package → https://ellmer.tidyverse.org/
- Libby’s Positron theme (Catppuccin) → https://marketplace.visualstudio.com/items?itemName=Catppuccin.catppuccin-vsc
- GitHub repo with Libby and Edgar’s code → https://github.com/LibbyHeeren/mall-package-r
- LLM providers supported by ellmer → https://ellmer.tidyverse.org/index.html#providers
- vitals package → https://vitals.tidyverse.org/
- chatlas package → https://posit-dev.github.io/chatlas/
- polars package → https://pola.rs/
- narwhals package → https://narwhals-dev.github.io/narwhals/
- pandas package → https://pandas.pydata.org/
- LM Studio → https://lmstudio.ai/
- Simon Couch’s blog → https://www.simonpcouch.com/
- Edgar’s dataset: TidyTuesday Animal Crossing Dataset (May 5, 2020) → https://github.com/rfordatascience/tidytuesday
- Libby’s dataset: Kaggle Tweets Dataset → https://www.kaggle.com/datasets/mmmarchetti/tweets-dataset
- Blog from Sara and Simon on evaluating LLMs → https://posit.co/blog/r-llm-evaluation-03/
- Data Science Lab YouTube playlist → https://www.youtube.com/watch?v=LDHGENv1NP4&list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&index=2
- AWS Bedrock → https://aws.amazon.com/bedrock/
- Anthropic → https://www.anthropic.com/
- Google Gemini → https://gemini.google.com/
- What is rubber duck debugging anyway?? → https://en.wikipedia.org/wiki/Rubber_duck_debugging
Timestamps
00:00 Introduction to Libby, Isabella, Edgar, and the mall package + ellmer package
07:14 “What’s the difference between using mall for these NLP tasks versus traditional or classical NLP?”
09:37 “Can mall be used with a local LLM?”
17:32 “What kind of laptop specs should I realistically have to make good use of these models?”
22:12 “Are you limited to three output options?”
22:55 “Can mall return the prediction probabilities?”
24:14 “What are a rule of thumb set of specs for a machine so local LLMs are practically feasible?”
24:47 “Would that be in the additional prompt area where you’re defining things?”
25:04 “You could use the vitals package to compare models, right?”
25:24 “Can we use LM Studio instead of Ollama?”
28:35 “How do you iterate and validate the model?”
36:39 “Why use paste if it is all text?”
37:31 “Are these recent tweets (from X) or older ones from actual Twitter?”
40:23 “Is there a playlist for the Data Science Labs on YouTube?”
46:11 “Does that mean that the Python version does not work with pandas?”
50:14 “Where is this data set from?”


Getting Started with LLM APIs in R - Sara Altman
Abstract: LLMs are transforming how we write code, build tools, and analyze data, but getting started with LLM APIs directly can feel daunting. This workshop introduces participants to programming with LLM APIs in R using ellmer, an open-source package that makes it easy to work with LLMs from R. We’ll cover the basics of calling LLMs from R, as well as system prompt design, tool calling, and building basic chatbots. No AI or machine learning background is required—just basic R familiarity. Participants will leave with example scripts they can adapt to their own projects.
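The basics of calling an LLM with ellmer look roughly like this (a minimal sketch assuming an Anthropic API key is set in your environment; the system prompt and question are illustrative):

```r
library(ellmer)

# Create a chat object with a system prompt; chat_openai(),
# chat_google_gemini(), etc. are drop-in alternatives.
chat <- chat_anthropic(
  system_prompt = "You are a terse assistant for R programmers."
)

# Send a prompt and print the model's streamed reply.
chat$chat("What does purrr::map_dbl() return?")

# Open a minimal browser-based chatbot backed by the same chat object.
live_browser(chat)
```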
Resources mentioned in the workshop:
- Workshop site: https://skaltman.github.io/r-pharma-llm/
- ellmer documentation: https://ellmer.tidyverse.org/
- shinychat documentation: https://posit-dev.github.io/shinychat/
Inspecting websites to find JSON data APIs | Marcos Huerta | Data Science Lab
On this call, Libby Heeren is joined by Marcos Huerta, a Data Science Manager at CarMax, as he walks us through the guts of websites looking for data we can play with. He shows us how to find hidden REST/JSON APIs using the web inspector in Safari/Firefox, and then how to get what’s necessary to pull the same data programmatically in Python or R.
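Once you have copied the request URL and headers out of the browser's Network tab, replicating the call from R might look like this sketch (the endpoint and headers here are hypothetical placeholders, not a real API):

```r
library(httr2)

# Hypothetical endpoint discovered in the browser's Network tab;
# substitute the real URL, query parameters, and headers you copied
# from the inspector.
resp <- request("https://example.com/api/v1/scores") |>
  req_url_query(gamePk = "777076") |>
  req_headers(
    `User-Agent` = "Mozilla/5.0",
    Accept = "application/json"
  ) |>
  req_perform()

# Parse the JSON body into ordinary R lists / data frames.
data <- resp_body_json(resp, simplifyVector = TRUE)
str(data)
```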
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen
Marcos’s urls: Website: https://marcoshuerta.com GitHub: https://github.com/astrowonk/
Resources from the hosts and from participants in the Discord chat:
- Postman: https://www.postman.com/
- Insomnia (open source alternative to Postman): https://insomnia.rest/
- Baseball Savant website Marcos is using: https://baseballsavant.mlb.com/gamefeed/?gamePk=777076
- Isabella Velasquez’s blog on using the {polite} R package to help scrape Wikipedia: https://ivelasq.rbind.io/blog/politely-scraping/
- Festivitas Mac app Marcos used to add the lights to his desktop: https://festivitas.app/
- Ted Laderas blog post on parsing JSON in R: https://laderast.github.io/intro_apis_json_cascadia/#/how-does-r-translate-json
- New rvest read_html_live() function: https://rvest.tidyverse.org/reference/read_html_live.html
- yyjsonr R package: https://github.com/coolbutuseless/yyjsonr
- tuber R package: https://github.com/gojiplus/tuber
- WikipediaR R package: https://www.quantargo.com/help/r/latest/packages/WikipediaR/1.1/WikipediaR-package
- rookiepy Python package: https://pypi.org/project/rookiepy/
Timestamps
00:00 Introduction
03:05 Web scraping vs. API calls
04:12 Server-side rendering vs. client-side JSON
06:12 Warning: Rate limits and business ethics (ahem)
08:39 Demo: Baseball Savant website
08:57 Using browser Developer Tools and the Network tab
12:15 “What is curl?”
13:30 Importing curl into Postman
16:03 Generating Python code from Postman
16:50 “Are there open source alternatives to Postman?”
17:50 Using the generated code in Python/Jupyter
22:28 R packages for JSON (jsonlite, yyjsonr)
25:09 Demo: Massachusetts Lottery website
28:17 Example: scripts Marcos automated with cron jobs
30:17 Handling logins and cookies with rookiepy
32:19 Demo: CNN Election Data
34:26 Inspecting ESPN’s website
36:58 “Can you scrape YouTube?”
38:19 Finding hidden JSON in CardsMania history
45:00 Benefits of API inspection over Beautiful Soup
46:59 New rvest function: read_html_live
50:40 Inspecting LinkedIn and finding GraphQL
53:58 Encouragement on handling API pagination
Open source development practices | Isabel Zimmerman & Davis Vaughan | Data Science Hangout
We were recently joined by Isabel Zimmerman and Davis Vaughan, Software Engineers at Posit, to chat about the life of an open source developer, strategies for navigating complex codebases, and how to leverage AI in data science workflows. Plus, NERDY BOOKS!
In this Hangout, we explore the differences between maintaining established ecosystems like the Tidyverse and building new tools like the Positron IDE. Davis and Isabel (and sometimes Libby) share practical advice for developers, such as the utility of AI for writing tests and “rubber ducking”, and their various approaches to writing accessible documentation that bridges the expert-novice gap.
Resources mentioned in the video and zoom chat:
- Positron IDE → https://posit.co/positron/
- Air (R formatter) → https://posit-dev.github.io/air/
- Python Packages Book (free) → https://py-pkgs.org/
- R Packages Book (free) → https://r-pkgs.org/
- DeepWiki (AI tool mentioned for docs) → https://deepwiki.com/tidyverse/vroom
If you didn’t join live, one great discussion you missed from the zoom chat was about Brandon Sanderson’s Cosmere books and the debate between starting with Mistborn vs. The Stormlight Archive. Are you a Cosmere fan?! Which book did you start with? (Libby started with Elantris years before picking up Mistborn Era 1 book 1, but she’d now recommend maybe starting with Warbreaker!)
Timestamps:
00:00 Introduction
04:41 “What does a day in the life of an open source dev look like?”
09:43 “What got you into building your own R packages?”
13:00 “Personal tips for working with code bases you’re not familiar with?”
16:35 “How much of what you build is in R/Python vs. lower-level languages?”
19:57 “Does Air work inside code chunks in Positron?”
20:12 “Changing the Python Quarto formatter in Positron without an extension”
22:56 “What do your side projects look like?”
26:40 “How do you approach writing documentation?”
30:55 “What interesting trends in data science are you noticing?”
33:38 “How do you leverage AI in your work?”
37:30 “What are the hexes on Davis’s back wall?”
38:50 “What career advice would you give to someone in a similar position?”
43:45 “How can I be more resilient when things go wrong?”
47:59 “Do you have keyboard preferences?”
49:25 “What is the best way to report bugs in packages?”
50:56 “Open source dev work vs. in-house dev work”
51:50 “Tips for getting started with Positron”


Advent of Code for R users | Emil Hvitfeldt | Data Science Lab
On this call, Libby Heeren is joined by Posit engineer Emil Hvitfeldt as he walks through Day 1 of Advent of Code 2025 using R. This is a super friendly, collaborative, and cheery intro to AoC! Don’t forget, you can do Advent of Code at any ole time of year.
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen, Emil Hvitfeldt
Emil’s socials and urls: website: https://emilhvitfeldt.com/ GitHub: https://github.com/emilhvitfeldt Bluesky: https://bsky.app/profile/emilhvitfeldt.bsky.social LinkedIn: https://www.linkedin.com/in/emilhvitfeldt/
Resources from the hosts and chat:
- Advent of Code: https://adventofcode.com/
- Install Positron: https://positron.posit.co/
- Eric Wastl, Advent of Code: Behind the Scenes: https://www.youtube.com/watch?v=_oNOTknRTSU
- AoC Subreddit: https://www.reddit.com/r/adventofcode/
- Kieran Healy shared a reddit post with an Advent of Code answer done in Minecraft: https://www.reddit.com/r/adventofcode/comments/1pbeyxx/2025_day_01_part_2_advent_of_code_in_minecraft/
- Emil’s Solutions: https://github.com/EmilHvitfeldt/rstats-adventofcode
- Emil’s helper package: https://github.com/EmilHvitfeldt/aocfuns
- purrr::accumulate() function: https://purrr.tidyverse.org/reference/accumulate.html
And, for anyone hangin’ in there at the end, Emil updated us on Discord that he figured out why his cumsum() didn’t work: he forgot to start the dial at 50! Once you fix that, it works to solve part 1 :)
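For the curious, the vectorized cumulative-sum approach discussed in the call can be sketched like this (variable names and the sample input are illustrative assumptions, not Emil's exact code):

```r
# Signed rotations parsed from the puzzle input (sample values for
# illustration; negative = left, positive = right).
moves <- c(-30, -20, 10, -10)

# Track the dial position after each instruction on a 0-99 circular dial.
# The bug mentioned above was omitting the starting position of 50.
positions <- (50 + cumsum(moves)) %% 100

# Part 1: count how many instructions leave the dial pointing at 0.
sum(positions == 0)  # two of these four sample instructions land on 0
```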
Timestamps
00:00 Introduction
01:01 Tour of the Advent of Code website
02:30 Dashboard overview and puzzle schedule
03:23 How to view and access previous years’ events
03:37 Structure of puzzles: Two parts and stars
04:40 Understanding the global leaderboard
05:08 “Does that ASCII art build itself?”
06:16 Setting up private leaderboards for friends
07:54 Starting Day 1: Story prompt and mechanics
09:30 Understanding unique puzzle inputs
10:51 Submission feedback and delay penalties
11:44 Safe dial logic: Left, Right, and circularity
12:50 Starting position and Part 1 success criteria
14:09 Setting up the project in Positron
16:26 Strategy for speed: Reading from the bottom up
18:49 Problem-solving strategies: Pen, paper, and visualization
19:22 Walking through the logic with a sample case
20:52 Coding Part 1: Data parsing and vectorization
23:17 Positron keyboard shortcuts for duplicating lines
24:40 Debugging the logic and handling negative numbers
26:03 Explaining the modulo operator (%%)
28:15 Managing large inputs of over 4,000 instructions
29:21 Submitting Part 1 and transitioning to Part 2
32:03 Part 2 challenge: Counting zero “clicks”
34:02 Brainstorming Part 2 code modifications
36:19 Checking important warnings for edge cases
37:00 Coding Part 2: Nested loops and incrementing counters
38:23 Hint: Modulo vs. integer division
40:40 Success with the Part 2 test case
42:30 Alternative method: Vectorized cumulative sums
45:29 “What’s the difference between % and %%?” (percent vs modulo)
46:50 Mathematical optimization to avoid inner loops

Practical AI for data science (Simon Couch)
Abstract: While most discourse about AI focuses on glamorous, ungrounded applications, data scientists spend most of their days tackling unglamorous problems in sensitive data. Integrated thoughtfully, LLMs are quite useful in practice for all sorts of everyday data science tasks, even when restricted to secure deployments that protect proprietary information. At Posit, our work on ellmer and related R packages has focused on enabling these practical uses. This talk will outline three practical AI use cases—structured data extraction, tool calling, and coding—and offer guidance on getting started with LLMs when your data and code are confidential.
Presented at the 2025 R/Pharma Conference Europe/US Track.
Resources mentioned in the presentation:
- {vitals}: Large Language Model Evaluations https://vitals.tidyverse.org/
- {mcptools}: Model Context Protocol for R https://posit-dev.github.io/mcptools/
- {btw}: A complete toolkit for connecting R and LLMs https://posit-dev.github.io/btw/
- {gander}: High-performance, low-friction Large Language Model chat for data scientists https://simonpcouch.github.io/gander/
- {chores}: A collection of large language model assistants https://simonpcouch.github.io/chores/
- {predictive}: A frontend for predictive modeling with tidymodels https://github.com/simonpcouch/predictive
- {kapa}: RAG-based search via the kapa.ai API https://github.com/simonpcouch/kapa
- Databot https://positron.posit.co/dat

Air: A blazingly fast R code formatter - Davis Vaughan, Lionel Henry
In Python, Rust, Go, and many other languages, code formatters are widely loved. They run on every save, on every pull request, and in git pre-commit hooks to ensure code consistently looks its best at all times.
In this talk, you’ll learn about Air, a new R code formatter. Air is extremely fast, capable of formatting individual files so quickly that you’ll question whether it’s even running, and of formatting entire projects in under a second. Air integrates directly with your favorite IDEs, like Positron, RStudio, and VS Code, and is available on the command line, making it easy to standardize on one tool even for teams using various IDEs.
Once you start using Air, you’ll never worry about code style ever again!
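The command-line usage is minimal; a sketch assuming the air binary is already installed and on your PATH (see the links below for installation):

```shell
# Format a single file in place.
air format R/model.R

# Format every R file in the current project.
air format .
```

Editor integrations (Positron, RStudio, VS Code) run the same formatter on save, so the CLI and the IDE produce identical output.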
https://www.tidyverse.org/blog/2025/02/air/
https://github.com/posit-dev/air


Supporting 100 Data Scientists with a Small Team | Mike Thomson | Data Science Hangout
We were recently joined by Mike Thomson, Data Science Manager at Flatiron Health, to chat about managing open source tools and maintaining R packages, creating reproducible reports for Word and Excel using Quarto, the “hub and spoke” support model for data scientists, and applying R and Posit tools in the Real World Evidence (RWE) oncology space.
In this Hangout, we explore creating reproducible outputs using Quarto for formats like Word and Excel. Flatiron Health uses Quarto because it allows the reproducible publication of analyses to multiple formats simultaneously (like HTML and a downloadable Word document) from the same source code. A specific challenge discussed was outputting formatted analytic tables to Excel, as this is not natively supported by Quarto. Erica Yim, from Mike’s team, detailed how they built an internal R function that uses the flexlsx package along with flextable to easily output pre-existing formatted tables from a Quarto document into an Excel template.
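The general flextable-to-Excel pattern can be sketched as follows. This is a minimal illustration of the flexlsx package's documented workflow, not Flatiron's internal function, and argument details may differ by package version:

```r
library(flextable)
library(openxlsx2)
library(flexlsx)  # provides wb_add_flextable()

# Build a formatted table once with flextable...
ft <- flextable(head(mtcars))

# ...then write that same formatted table into an Excel workbook,
# preserving the formatting rather than exporting raw cell values.
wb <- wb_workbook()
wb <- wb_add_worksheet(wb, "results")
wb <- wb_add_flextable(wb, "results", ft)
wb_save(wb, "results.xlsx")
```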
Resources mentioned in the video and zoom chat:
- flexlsx R package GitHub repository → https://github.com/pteridin/flexlsx
- dbplyr PR for Snowflake translations (contributed to by Flatiron Health) → https://github.com/tidyverse/dbplyr/pull/860
If you didn’t join live, one great discussion you missed from the zoom chat was about the pain points of exporting data from Quarto to Word or Excel, particularly concerning table formatting and styles. Attendees in the chat strongly highlighted the difficulty of managing table formatting, including issues with table cross-references, headers, and footers. They noted that dealing with styles often requires workarounds, such as creating flextables that match desired Word styles instead of relying on default table styles. Let us know below if you’d like to hear more about this topic!
Timestamps:
00:00 Introduction
02:23 “Can you talk about what Flatiron does and what your teams do?”
03:29 “Could you give us a few examples of the data types or collections that you might be working with?”
05:00 “Do you have longitudinal data?”
07:46 “Are you aware of any computer vision applications in the health care industry from your perspective?”
09:38 “Do you use mixed models or Bayesian MCMC?”
10:56 “How does your team use Quarto?”
16:59 “How do you convince stakeholders of the value of going open source (and handle security concerns)?”
22:56 “Do you allow people to have a certain amount of time to contribute back to open source?”
26:03 “I just want to understand a little bit about your support model for that group.”
29:57 “Do you have any tips for asynchronous working?”
31:02 “Are you like a Jira team or an Asana team for assigning tasks or tickets?”
32:10 “How many people on your platform team support Posit teams?”
34:24 “What does your team use for unstructured document analysis?”
36:24 “How important is domain knowledge in your recruitment?”
40:02 “Where do you store all of this stuff (data storage and databases)?”
42:04 “What is the approximate timeline from the time you do analysis to final deployment of results in the real world?”
44:31 “Is there a process for people getting things approved to use in your environment?”
47:39 “How do you handle the challenge of going back from Word to Quarto source code (after changes are tracked)?”
50:22 “What does a typical workday look like for you?”
51:47 “Is there a piece of career advice that has either really helped you, that you’ve really liked, that you try to give to other people?”
R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout
We were recently joined by Dave Gruenewald, Senior Director of Data Science at Centene, to chat about polyglot teams, data science best practices, right-sizing development efforts, and process automation.
In this Hangout, we explore working in a polyglot team and fostering interoperability (a word that Libby loves, but struggles to pronounce out loud). Dave Gruenewald emphasizes that teams should use the tools they are comfortable with, whether that’s R or Python. Strategies for collaboration across languages that Dave suggests include tools like Quarto, which can run R and Python code in the same report. Teams use data science checkpoints, saving outputs as platform-agnostic file types like Parquet so that they can be accessed from any language. REST APIs allow R processes to be called programmatically from Python (and vice versa), which can be a real game-changer. The newly released nanonext 1.7.0 was also highlighted as a promising development for improved interoperability.
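The Parquet checkpoint pattern can be sketched in a few lines. This is an illustrative example using the arrow package (the file path and data are placeholders, not Dave's actual pipeline):

```r
library(arrow)

# R side: checkpoint an intermediate result in a platform-agnostic format.
results <- mtcars  # stand-in for a real intermediate data frame
write_parquet(results, "checkpoints/step1_results.parquet")

# Any language can pick the checkpoint up without R involved, e.g. Python:
#   import pandas as pd
#   df = pd.read_parquet("checkpoints/step1_results.parquet")

# And R can read checkpoints written by Python just as easily:
step1 <- read_parquet("checkpoints/step1_results.parquet")
```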
Resources mentioned in the video and zoom chat:
- Posit Conf 2025 Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
- nanonext 1.7.0 Tidyverse Blog Post → https://www.tidyverse.org/blog/2025/09/nanonext-1-7-0/
If you didn’t join live, one great discussion you missed from the zoom chat was about pivoting away from academia, including leaving PhD programs. Many attendees shared their personal experiences of making the difficult decision to drop out of a PhD program. The community suggested alternative terms like “pivot,” “reallocating your resources,” or being a “refugee fleeing academia” instead of “drop out.” Dave Gruenewald shared that he himself left a PhD program but has “no regrets about that.” Did you leave a PhD program? You’re not alone!
Timestamps:
00:00 Introduction
02:21 “What types of data do your teams use?”
06:53 “Which of the three pillars you mentioned is your personal favorite to work on?”
09:26 “How do you avoid or divert scope creep?”
11:41 “How much of the project should be ‘planning’ before any code happens?”
13:53 “Do you feel like people are just hopping in and going, hey, LLM, make me a POC?”
14:28 “Do you give them what they say they want, or do you give them what they need?”
16:40 “I’m wondering what public data do you wish existed?”
18:48 “Why not Positron yet?”
20:43 “How do you unify as a team and make it so that I can always read everybody else’s code?”
23:10 “Could you talk a little bit about how R and Python work together?”
27:28 “How to start package development with a team who are very new to package development.”
33:01 “What’s your greatest regret career wise?”
35:53 “What about your biggest wins, specifically in your early career?”
39:40 “How would you recommend building a data science culture and community from scratch?”
41:49 “Would you set a specific timeline for EDA, exploratory analysis, to scope the project better?”
45:15 “How do you define fun projects, and how much time do you allocate for exploration in those?”
48:21 “Does your team use DVC or something similar for data version control?”
50:00 “Can you talk a bit more about your pivot from academia into data science?”
51:31 “Any advice on where to look for opportunities in data science after getting a master’s degree?”
Building the Future of Data Apps: LLMs Meet Shiny
GenAI in Pharma 2025 kicks off with Posit’s Phil Bowsher and Garrick Aden-Buie sharing a technical overview of how LLMs can integrate with Shiny applications and much more!
Abstract: When we think of LLMs (large language models), usually what comes to mind are general purpose chatbots like ChatGPT or code assistants like GitHub Copilot. But as useful as ChatGPT and Copilot are, LLMs have so much more to offer—if you know how to code. In this demo Garrick will explain LLM APIs from zero, and have you building and deploying custom LLM-empowered data workflows and apps in no time.
Resources mentioned in the session:
- GitHub Repository for session: https://github.com/gadenbuie/genAI-2025-llms-meet-shiny
- {mcptools} - Model Context Protocols servers and clients https://posit-dev.github.io/mcptools/
- {vitals} - Large language model evaluation for R https://vitals.tidyverse.org/
Purrrfectly parallel, purrrfectly distributed (Charlie Gao, Posit) | posit::conf(2025)
Purrrfectly parallel, purrrfectly distributed
Speaker(s): Charlie Gao
Abstract:
purrr is a powerful functional programming toolkit that has long been a cornerstone of the tidyverse. In 2025 it receives a modernization that lets you harness every computing core on your machine, dramatically speeding up map operations.
More excitingly, it opens the door to distributed computing. Through the mirai framework that purrr uses under the hood, this is embarrassingly simple: whether you work in a small business or a large one, any spare server on your network can now be put to good use in a few straightforward steps.
Let us show you how distributed computing is no longer the preserve of those with access to high-performance compute clusters.
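As a rough sketch of what this modernized purrr workflow looks like (assuming purrr >= 1.1.0, which introduced `in_parallel()`, and the mirai package; see the talk materials for the authoritative version):

```r
library(purrr)  # needs purrr >= 1.1.0
library(mirai)

daemons(4)  # launch four background R processes (local parallelism)

# in_parallel() sends each call to a daemon; pointing daemons() at
# remote servers distributes the same map across machines
squares <- map_dbl(1:8, in_parallel(\(x) x^2))

daemons(0)  # shut the daemons down when finished
```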
Materials - https://shikokuchuo-posit2025.share.connect.posit.cloud/ posit::conf(2025) Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

Translating R for Data Science into Portuguese: A Community-Led Initiative (Beatriz Milz, UFABC)
Translating R for Data Science into Portuguese: A Community-Led Initiative
Speaker(s): Beatriz Milz
Abstract:
How can open-source collaboration help make data science more accessible and expand Posit’s global impact? The book “R for Data Science” by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund is a key resource for learning R and the tidyverse. In a collaborative effort, volunteers from the R community translated the second edition into Brazilian Portuguese, making it freely available online. This talk explores the translation journey, the challenges of adapting technical content, and key lessons learned to support future translation teams.
Materials - https://beamilz.com/talks/en/2025-posit-conf/ posit::conf(2025) Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Using Quarto to Improve Formatting/Automate the Generation of Hundreds of Reports (Keaton Wilson)
Using Quarto to Improve Formatting and Automate the Generation of Hundreds of Reports
Speaker(s): Keaton Wilson
Abstract:
This presentation showcases how KS&R’s Decision Sciences and Innovation (DSI) team modernized a legacy reporting pipeline to automate and scale custom survey report generation. Using tidyverse and Quarto, the team produced hundreds of personalized PDFs weekly over three months. Hosted on GitHub, the project integrated version control and streamlined collaboration while documentation ensured easy onboarding and adaptability. Attendees will gain insights into automating report workflows, overcoming implementation challenges, integrating custom formatting and fostering collaboration using tidyverse, Quarto, and GitHub.
Materials - https://github.com/ksrinc/posit_conf_2025_quarto_automation posit::conf(2025) Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
New data science tools & old laptops on fire | Jenny Bryan | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
We were joined by Jenny Bryan, Senior Software Engineer at Posit, to chat about (setting laptops on fire,) adapting careers to embrace change and new technologies, behind-the-scenes technical advancements powering the R ecosystem with tools like Positron, demystifying project-based workflows, plus LLM integration and best practices in programming.
Listen to this episode to hear us chat about topics like this:
- the benefits and limitations of using Large Language Models (LLMs) in programming. Jenny shared her initial skepticism towards LLMs for coding in R, but her attitude changed significantly when applying LLMs to problems involving languages she was less familiar with, like Rust or TypeScript.
- adapting in your career to embrace change and new technologies. Jenny, who describes herself as being on a “third career”, transitioned from management consulting to a statistics professor, and then to a senior software engineer at Posit. She talks a bit about her career journey and how she’s embracing new stuff (ahem, TypeScript) so that she gets to keep doing cool stuff!
- the Positron IDE for R package development. She specifically praises Positron’s unique test explorer, reliable console, and integrated Data Explorer. For many, Positron offers out-of-the-box data science functionality, unlike other IDEs that require extensive customization.
- what new technologies like Ark, Air, and Positron mean for the long-term health of R. Jenny’s been working on lots of nerdy things behind the scenes at Posit and she talks all about how they’re great for developers, package builders, data scientists, and engineers alike.
Another tidbit from this hangout: Jenny gave some advice for those looking to branch into software engineering without formal training: try reading code from admired developers, inviting code reviews, and undertaking small, recreational package development projects to gain practical experience and confidence. She also advocates for adopting a project-oriented workflow (associated with her famous “laptop on fire” remark, of course) using tools like the here package for managing project paths.
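For readers new to the here package mentioned above, a minimal sketch of the project-oriented workflow (the file names are hypothetical):

```r
library(here)

# here() resolves paths from the project root (located via an .Rproj
# file, a .here file, or a .git directory), so the same script works
# regardless of the working directory it is launched from
data_path <- here("data", "raw", "survey.csv")  # hypothetical file

# e.g. dat <- readr::read_csv(data_path)
```

Because paths are anchored at the project root rather than hard-coded, the project can move between machines (or colleagues) without edits, which is the point of Jenny's "laptop on fire" advice.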
Resources mentioned in the video and zoom chat: Positron IDE → https://positron.posit.co/ Happy Git with R → https://happygitwithr.com/ Jenny Bryan’s “Project-oriented workflow” blog post → https://www.tidyverse.org/blog/2017/12/workflow-vs-script/ Air R code formatter → https://posit-dev.github.io/air/ The here() package → https://here.r-lib.org/ Posit Conf → https://posit.co/conference/ Tidy Dev Day 2025 → https://www.tidyverse.org/blog/2025/07/tdd-2025/ R Packages book → https://r-pkgs.org/
If you didn’t join live, you missed a ROARINGLY active chat. Let’s just say, if you’ve ever broken down in tears over a programming project, you’re not alone! Come join us live each week if you’d like to hang out in the chat with us!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps 00:00 Introduction 03:39 “Is that a Wooble on your desk?” (Spoiler, it’s a gnome!!) 06:23 “As a builder of data science tools, what are the tool features data scientists want most?” 08:43 “Have you experienced needing to adapt to change recently and how have you embraced it?” 13:46 “What is ‘setting laptops on fire’ about?” 13:50 “How did you decide to change your career a few times?” 21:23 “What are your thoughts on the ease of putting models into production in Python versus R and does it make sense to shift everybody to one language or the other?” 27:30 “How do you navigate the ‘I have a hammer so everything looks like a nail’ feeling when working with emerging tools like LLMs?” 33:24 “Do you have any general advice for those data scientists who find themselves wanting to branch out more into software engineering but don’t have formal training?” 39:39 “Why should I use Positron instead of Versus Code?” 47:57 “Can you speak to the value of developing an R package and how to clear the mental hurdle of it being a huge challenge?” 52:34 “What does your career trajectory look like and what is your advice for other people who are looking to grow their career but don’t know if they want to be an IC or a manager? Does being a manager mean you don’t get to write code anymore?”

Harnessing LLMs for Data Analysis | Led by Joe Cheng, CTO at Posit
When we think of LLMs (large language models), usually what comes to mind are general purpose chatbots like ChatGPT or code assistants like GitHub Copilot. But as useful as ChatGPT and Copilot are, LLMs have so much more to offer—if you know how to code. In this demo Joe Cheng will explain LLM APIs from zero, and have you building and deploying custom LLM-empowered data workflows and apps in no time.
Posit PBC hosts these Workflow Demos the last Wednesday of every month. To join us for future events, you can register here: https://posit.co/events/
Slides: https://jcheng5.github.io/workflow-demo/ GitHub repo: https://github.com/jcheng5/workflow-demo
Resources shared during the demo: Ellmer https://ellmer.tidyverse.org/ Chatlas https://posit-dev.github.io/chatlas/
Environment variable management: For R: https://docs.posit.co/ide/user/ide/guide/environments/r/managing-r.html#renviron For Python https://pypi.org/project/python-dotenv/
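As a small illustration of the environment-variable approach linked above (the variable name is illustrative; check your provider's and package's documentation for the expected name):

```r
# API keys belong in environment variables, not in scripts. In R the
# usual home is ~/.Renviron (one KEY=value per line, re-read when R
# restarts), e.g.:
#   ANTHROPIC_API_KEY=sk-...   <- variable name is illustrative
key <- Sys.getenv("ANTHROPIC_API_KEY", unset = "")
if (!nzchar(key)) message("No API key found; see the links above.")
```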
Shiny chatbot UI: For R, Shinychat https://posit-dev.github.io/shinychat/ For Python, ui.Chat https://shiny.posit.co/py/docs/genai-inspiration.html
Deployment Cloud hosting https://connect.posit.cloud On-premises (Enterprise) https://posit.co/products/enterprise/connect/ On-premises (Open source) https://posit.co/products/open-source/shiny-server/
Querychat Demo: https://jcheng.shinyapps.io/sidebot/ Package: https://github.com/posit-dev/querychat/
If you have specific follow-up questions about our professional products, you can schedule time to chat with our team: pos.it/llm-demo

Bringing data science to the construction industry | Blake Abbenante | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Blake Abbenante, Director of Analytics and Data Science at Suffolk Construction, to chat about his career journey in data science, implementing modern data practices in the construction industry, innovative applications of AI and data science in construction, and building a data-driven culture in a traditionally less tech-focused sector.
In this Hangout, we explore innovative applications of AI and data science in construction. Blake shared how Suffolk Construction is leveraging cutting-edge technologies like AI to revolutionize traditional processes. One focus is their GenAI scheduling tool, which aims to augment and speed up the design and planning phases of building projects. This tool has the potential to significantly reduce the time planners spend on creating schedules, moving from weeks to potentially minutes or hours for an 80% completion rate. Blake discussed the development and implementation of safety models that forecast risk on projects, enabling proactive measures to ensure safer construction sites by predicting which projects might require additional safety personnel based on historical data.
Resources mentioned in the video and zoom chat: The ellmer R package → https://ellmer.tidyverse.org/ The chatlas R package → https://github.com/posit-dev/chatlas Posit Blog Post on ellmer → https://posit.co/blog/announcing-ellmer/
If you didn’t join live, one great discussion you missed from the zoom chat was about the challenges of data collection and analysis when encountering pushback from those whose work is being analyzed, and strategies to build trust and demonstrate value. Let us know below if you’d like to hear more about this topic!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Shiny community, hackathons, and his AI mindset | Joe Cheng | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Joe Cheng, CTO at Posit, to chat about the Shiny contest, the use of AI in data science, and designing hackathons for learning new technologies. We were joined by several past and present Shiny contest winners who gave great advice on how to get started if you want to participate (and we really hope you do)!
In this Hangout, we explore the evolution of the Shiny contest since its inception, including what made the 2024 submissions unique and the ways the contest encourages community contribution and learning. Joe also shared about his personal journey from feeling skepticism about AI to seeing and embracing its potential. We got some amazing questions from the Hangout attendees! We hope you join us live next time to ask some of your own questions!
Resources mentioned in the video and zoom chat:
2024 Shiny Contest Winners → https://posit.co/blog/winners-of-the-2024-shiny-contest/
Joe’s AI Hackathon Slides → https://jcheng5.github.io/llm-quickstart/quickstart.html
Shiny Assistant → https://gallery.shinyapps.io/assistant/
Isabella’s blog post on prototyping with Shiny Assistant → https://posit.co/blog/ai-powered-shiny-app-prototyping/
Posit Conf Workshops → https://reg.rainfocus.com/flow/posit/positconf25/attendee-portal/page/sessioncatalog?tab.day=20250916&search.sessiontype=1675316728702001wr6r
Shiny Conference 2025 → https://www.shinyconf.com/
Call for Speakers Shiny Conf 2025 → https://sessionize.com/shiny-conf-2025/
Shiny Tableau → https://rstudio.github.io/shinytableau/
Echarts4r → https://echarts4r.john-coene.com
Elmer package on Github → https://github.com/tidyverse/ellmer
All the Shiny app links mentioned in the video and zoom chat: Eric Nantz 2021 Shiny Contest Submission → https://forum.posit.co/t/the-hotshots-racing-dashboard-shiny-contest-submission/104925 Eric Nantz’s R/Pharma conference keynote on AI → https://youtu.be/AfMa1CVUdXU?si=ThLsKFyonntxzBUF Eric Nantz’s Haunted Places app → https://youtu.be/vX09QGMuOfo?si=K5_uPfK5bcfZZ92l Umair Durrani’s Shiny Storytelling app → https://umair.shinyapps.io/storytimegcp/ Umair’s Blue Sky profile → https://bsky.app/profile/transport-talk.bsky.social Umair’s Shiny meetings project on Github → https://github.com/shiny-meetings/shiny-meetings Abby Stamm’s Shiny Accessibility app → https://github.com/ajstamm/shiny-a11y-app
If you didn’t join live, one great discussion you missed from the zoom chat was about everyone’s favorite interactive plotting tools. Someone asked whether Plotly was the best option, and lots of people said they loved ggiraph, echarts4r, ObservableJS, and others. What about you?! What’s your favorite interactive plotting library?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!

Wes McKinney & Hadley Wickham (on cross-language collaboration, Positron, career beginnings, & more)
Posit PBC hosted a special event with Wes McKinney (Pandas & Apache Arrow) and Hadley Wickham (rstats & tidyverse), where community members could ask questions, share their thoughts, and exchange insights about cross-language collaboration.
Here’s a preview into what came up in conversation:
- Cross-language collaboration between R and Python
- Positron, a new polyglot data science IDE
- Open source development, how Wes and Hadley got involved in open source and their experiences in building and maintaining open-source projects such as Pandas and the tidyverse.
- Documentation for R and Python, especially in the context of teams that use both languages (shoutout to Quarto!)
- The use of LLMs in data science
- The emergence of libraries like Polars and DuckDB
- Challenges of switching between the two languages
- Package development and maintenance for polyglot teams that have internal packages in both languages
- The future of data science
The chat was on fire for this conversation and we’ve gathered most of the links shared among the community below:
Documentation mentioned: Positron, next-generation data science IDE built by Posit: https://positron.posit.co/ Quarto tabset documentation: https://quarto.org/docs/output-formats/html-basics.html#tabset-groups
Packages / Extensions mentioned: Pins: https://pins.rstudio.com/ Vetiver: https://vetiver.posit.co Orbital: https://orbital.tidymodels.org Elmer: https://elmer.tidyverse.org Tabby Extension: https://quarto.thecoatlessprofessor.com/tabby/
Blog posts: AI chat apps with Shiny for Python: https://shiny.posit.co/blog/posts/shiny-python-chatstream/ Using an LLM to enhance a data dashboard written in Shiny: R Sidebot & Python Sidebot Marco Gorelli Data Science Hangout (polars): https://youtu.be/lhAc51QtTHk?feature=shared Emily Riederer’s blog post on Polars: https://www.emilyriederer.com/post/py-rgo-polars/ Jeffrey Sumner’s tabset example: https://rpy.ai/posts/visualizations%20with%20r%20and%20python/r_python_visualizations Emily Riederer’s blog post on Python and R ergonomics: https://www.emilyriederer.com/post/py-rgo/11 Sam Tyner’s blog post on Lessons from “Tidy Data”: https://medium.com/@sctyner90/10-lessons-from-tidy-data-on-its-10th-anniversary-dbe2195a82b7
Other: Hadley Wickham’s cocktails website: https://cocktails.hadley.nz Posit subscription management to find out about new tools, events, etc.: https://posit.co/about/subscription-management/
New to Posit? Posit builds enterprise solutions and open source tools for people who do data science with R and Python. (We are also the company formerly called RStudio) We’d love to have you join us for future community events!
Every Thursday from 12-1pm ET we host a Data Science Hangout with the community and invite you to join us! You can add that event to your calendar with this link: https://www.addevent.com/event/Qv9211919

Joe Cheng - Summer is Coming: AI for R, Shiny, and Pharma
Summer is Coming: AI for R, Shiny, and Pharma - Joe Cheng
Abstract: R users tend to be skeptical of modern AI models, given our weird insistence on answers being accurate, or at least supported by the data. But I believe the time has come—or maybe it’s a little late—for even the most AI-cynical among us to push past their discomfort and get informed about what these tools are truly capable of. And key to that is moving beyond using AI-enabled apps, and towards building our own scripts, packages, and apps that make judicious use of AI.
In this talk, I’ll tell you why I believe AI has more to offer the R community than just wrong answers from chat windows or mediocre code suggestions in our IDEs. I’ll also introduce brand-new tools we’re developing at Posit that put powerful AI tools within reach of every R user. And finally, I’ll show how adding some AI could make your next Shiny app dramatically more useful for your users.
Resources mentioned in the talk:
- Slides: https://jcheng5.github.io/pharma-ai-2024
- {elmer} Call LLM APIs from R: https://elmer.tidyverse.org/
- {shinychat} Chat UI component for Shiny for R https://github.com/jcheng5/shinychat
- R/Pharma GenAI Day Recordings: https://www.youtube.com/playlist?list=PLMtxz1fUYA5AYryl4t2mtqBngqWDrnMXJ
Presented at the 2024 R/Pharma Conference

817: The Positron IDE, Tidy NLP and MLOps — with Dr. @JuliaSilge
#PositronIDE #Tidyverse #MLOps
Dr. Julia Silge, Engineering Manager at Posit, joins @JonKrohnLearns to introduce the brand-new Positron IDE, perfect for exploratory data analysis and visualization. She also lays out her top picks for LLMs that boost coding efficiency and discusses when traditional NLP methods might be the smarter choice over LLMs. Plus, Julia highlights some must-know open-source libraries that make managing MLOps easier than ever. Tune in for insights that every data scientist, ML engineer, and developer will find useful.
This episode is brought to you by Gurobi (https://www.gurobi.com/personas/optimization-for-data-scientists/) , the Decision Intelligence Leader, and by ODSC (https://odsc.com/california) , the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
In this episode you will learn: • [00:00:00] Introduction • [00:03:23] Overview of Posit and Positron IDE • [00:08:33] How the needs of a data scientist differ from those of a software developer • [00:17:56] How to contribute to the open-source Positron • [00:34:52] MLOps and Vetiver: Tools for deploying and maintaining ML models • [00:48:34] Natural Language Processing (NLP) and the Tidyverse approach • [01:22:18] The role of AI and LLMs in data science education
Additional materials: https://www.superdatascience.com/817

Ask Hadley Anything
A unique opportunity to gain insights directly from a leading expert in open source data science and a driving force behind many popular R packages like ggplot2 and dplyr.
Links from the Q&A: gh-action webscraping demo: https://github.com/hadley/cran-deadlines tidyverse devday 2024: https://www.tidyverse.org/blog/2024/04/tdd-2024/
For the 3 questions on moving from SAS to R in Pharma: Posit and Atorus have partnered on a Posit Academy training: https://posit.co/blog/upskill-to-r-programming-with-posit-and-atorus-research/ And at least 3 pharma companies have shared resources to help people on the transition from statistical programming in SAS, to data science in R: Pfizer exercises: https://github.com/pfizer-opensource/pharma-hands-on-exercises Bayer SAS to R: https://bayer-group.github.io/sas2r/ Roche Coursera course: https://www.coursera.org/learn/making-data-science-work-for-clinical-reporting
Tom Mock @ Posit PBC | Data Science Hangout
We were recently joined by Tom Mock, Product Manager at Posit PBC, to chat about career growth, starting out in a sales role, TidyTuesday, and being so good they can’t ignore you.
Speaker Bio: Tom Mock is a Product Manager at Posit, overseeing the Posit Workbench and RStudio team. He fell in love with R and data science through his graduate research, using R and RStudio to wrangle, analyze, model, and visualize his data. He became passionate about growing the R community, and founded #TidyTuesday to help newcomers and seasoned vets improve their Tidyverse skills.
Links mentioned: TidyTuesday: https://github.com/rfordatascience/tidytuesday Table Contest: https://posit.co/blog/announcing-the-2024-table-contest/ Posit Conference: https://posit.co/conference/ Monthly Workflow Demos: https://www.addevent.com/event/Eg16505674 gt package: https://gt.rstudio.com/ So Good They Can’t Ignore You book recommendation: https://www.goodreads.com/book/show/13525945-so-good-they-can-t-ignore-you Community Builder Quarto Site: https://pos.it/community-builder
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co LinkedIn: https://www.linkedin.com/company/posit-software
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh
We’d love to have you join us in the conversation live!
Thanks for hanging out with us!
Hadley Wickham - R in Production
R in Production by Hadley Wickham
Visit https://rstats.ai for information on upcoming conferences.
Abstract: In this talk, we delve into the strategic deployment of R in production environments, guided by three core principles to elevate your work from individual exploration to scalable, collaborative data science. The essence of putting R into production lies not just in executing code but in crafting solutions that are robust, repeatable, and collaborative, guided by three key principles:
- Not just once: Successful data science projects are not one-offs, but will be run repeatedly for months or years. I’ll discuss some of the challenges of creating R scripts and applications that run repeatedly, handle new data seamlessly, and adapt to evolving analytical requirements without constant manual intervention. This principle ensures your analyses are enduring assets, not throwaway toys.
- Not just my computer: The transition from development on your laptop (usually Windows or macOS) to a production environment (usually Linux) introduces a number of challenges. Here, I’ll discuss some strategies for making R code portable, how you can minimise pain when something inevitably goes wrong, and a few unresolved auth challenges that we’re currently working on.
- Not just me: R is not just a tool for individual analysts but a platform for collaboration. I’ll cover some of the best practices for writing readable, understandable code, and how you might go about sharing that code with your colleagues. This principle underscores the importance of building R projects that are accessible, editable, and usable by others, fostering a culture of collaboration and knowledge sharing.
By adhering to these principles, we pave the way for R to be a powerful tool not just for individual analyses but as a cornerstone of enterprise-level data science solutions. Join me to explore how to harness the full potential of R in production, creating workflows that are robust, portable, and collaborative.
Bio: Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz.
Mastodon: https://fosstodon.org/@hadleywickham
Presented at the 2024 New York R Conference (May 17, 2024) Hosted by Lander Analytics (https://landeranalytics.com )

Wes McKinney - The Future Roadmap for the Composable Data Stack
The Future Roadmap for the Composable Data Stack by Wes McKinney
Visit https://rstats.ai for information on upcoming conferences.
Abstract: In this talk, I plan to review the progress we have made in the last 10 years developing composable, interoperable open standards for the data processing stack, from infrastructure projects such as Parquet and Arrow to user-facing interface libraries like Ibis for Python and the tidyverse for R. In discussing the current landscape of projects, I will dig into the different areas where more innovation and growth are needed, and where we would ideally like to end up in the coming years.
Bio: Wes McKinney is an open source software developer and entrepreneur focusing on data processing tools and systems. He created the Python pandas and Ibis projects, and co-created Apache Arrow. He is a Member of the Apache Software Foundation and also a project PMC member for Apache Parquet. He is currently a Principal Architect at Posit PBC and a co-founder of Voltron Data.
Twitter: https://twitter.com/wesmckinn
Presented at the 2024 New York R Conference (May 17, 2024) Hosted by Lander Analytics (https://landeranalytics.com )
Hadley Wickham on R vs Python
Learn about tidyverse, ggplot2, and the secret to a tech company’s longevity as Hadley Wickham joins @JonKrohnLearns in this episode. He talks about Posit’s rebrand, why tidyverse needs to be in every data scientist’s toolkit, and why getting your hands dirty with open-source projects can be so lucrative for your career.
Watch the full interview “779: The Tidyverse of Essential R Libraries and their Python Analogues — with Dr. Hadley Wickham” here: https://www.superdatascience.com/779

779: The Tidyverse of Essential R Libraries and their Python Analogues — with Dr. Hadley Wickham
#Tidyverse #RProgramming #RLibraries
Tidyverse, ggplot2, and the secret to a tech company’s longevity: Hadley Wickham talks to @JonKrohnLearns about Posit’s rebrand, Tidyverse and why it needs to be in every data scientist’s toolkit, and why getting your hands dirty with open-source projects can be so lucrative for your career.
This episode is brought to you by Intel and HPE Ezmeral Software (https://bit.ly/hpeintel) . Interested in sponsoring a SuperDataScience Podcast episode? Visit https://passionfroot.me/superdatascience for sponsorship information.
In this episode you will learn: • [00:00:00] Introduction • [00:02:55] All about the Tidyverse • [00:15:19] Hadley’s favorite R libraries • [00:28:39] The goal of Posit • [00:34:12] On bringing multiple programming languages together • [00:50:19] The principles for a long-lasting tech company • [00:53:34] How Hadley developed ggplot2 • [01:03:52] How to contribute to the open-source community
Additional materials: https://www.superdatascience.com/779

The Future Roadmap for the Composable Data Stack
Discover cutting-edge advancements in data processing stacks. Listen in as Wes McKinney dives into pivotal projects like Parquet and Arrow, alongside essential interface libraries like Ibis and tidyverse. Wes navigates through the current state of these projects, highlighting areas for further innovation and growth.
Sign up for our “No BS” Newsletter to get the latest technical data & AI content: https://hubs.li/Q02vz6xC0
ABOUT THE SPEAKER: Wes McKinney, Principal Architect, Posit PBC (co-founder of Voltron Data)
ABOUT DATA COUNCIL: Data Council brings together the brightest minds in data to share industry knowledge, technical architectures and best practices in building cutting edge data & AI systems and tools.
FIND US: Twitter: https://twitter.com/datacouncilai LinkedIn: https://www.linkedin.com/company/datacouncil-ai/ Website: https://www.datacouncil.ai/
dbtplyr: Bringing Column-Name Contracts from R to dbt - posit::conf(2023)
Presented by Emily Riederer
starts_with(language): Translating select helpers to dbt. Translating syntax between languages transports concepts across communities. We see a case study of adapting a column-naming workflow from dplyr to dbt’s data engineering toolkit.
dplyr’s select helpers exemplify how the tidyverse uses opinionated design to push users into the pit of success. The ability to efficiently operate on names incentivizes good naming patterns and creates efficiency in data wrangling and validation.
However, in a polyglot world, users may find they must leave the pit when comparable syntactic sugar is not accessible in other languages like Python and SQL.
In this talk, I will explain how dplyr’s select helpers inspired my approach to ‘column name contracts,’ how good naming systems can help supercharge data management with packages like {dplyr} and {pointblank}, and my experience building the {dbtplyr} package to port this functionality to dbt for building complex SQL-based data pipelines.
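For readers unfamiliar with the select helpers the talk builds on, here is a minimal dplyr sketch; the column-name prefixes are illustrative of the "column name contracts" idea, not taken from the talk itself:

```r
library(dplyr)

# Columns follow a naming contract: N_ = counts, IND_ = indicators,
# DT_ = dates, ID_ = identifiers (hypothetical convention)
df <- tibble::tibble(
  ID_user   = 1:3,
  DT_signup = as.Date("2023-01-01") + 0:2,
  N_visits  = c(5L, 2L, 9L),
  IND_churn = c(FALSE, TRUE, FALSE)
)

# Select helpers operate on the contract instead of hand-listing columns
counts <- df |> select(starts_with("N_"))
flags  <- df |> select(starts_with("IND_"))
```

Because the prefixes encode meaning, the same `starts_with()` calls keep working as new contract-following columns are added, which is the efficiency in wrangling and validation the abstract describes.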
Materials:
Presented at posit::conf(2023), Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1098
duckplyr: Tight Integration of duckdb with R and the tidyverse - posit::conf(2023)
Presented by Kirill Müller
The duckplyr R package combines the convenience of dplyr with the performance of DuckDB. Better than dbplyr: Data frame in, data frame out, fully compatible with dplyr.
duckdb is a new high-performance analytical database system that works great with R, Python, and other host systems. dplyr is the grammar of data manipulation in the tidyverse, tightly integrated with R, but it works best for small or medium-sized data. duckdb, by contrast, was designed with large data in mind, but currently you need to formulate your queries in SQL.
The new duckplyr package offers the best of both worlds. It transforms a dplyr pipe into a query object that duckdb can execute, using an optimized query plan. It is better than dbplyr because the interface is “data frames in, data frames out”, and no intermediate SQL code is generated.
The talk first presents our results, a bit of the mechanics, and an outlook for this ambitious project.
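A rough sketch of what "data frames in, data frames out" looks like in practice (hedged: the duckplyr API was still evolving around this release; as_duckplyr_df() is the constructor shown in the project repository):

```r
library(dplyr)

# Wrap an ordinary data frame; subsequent dplyr verbs build a duckdb
# query plan instead of executing eagerly in R
result <- mtcars |>
  duckplyr::as_duckplyr_df() |>
  filter(cyl == 4) |>
  summarise(mean_mpg = mean(mpg))

# Still a plain data frame on access: no SQL written, none shown
result
```

Compare dbplyr, where the same pipeline would require a database connection and would generate intermediate SQL.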
Materials: https://github.com/duckdblabs/duckplyr/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1100
How to Keep Your Data Science Meetup Sustainable - posit::conf(2023)
Presented by Ted Laderas
Many data science meetup organizers struggle with burnout. It can be daunting to plan a meetup schedule, especially with the added burden of work and life.
In this talk, I want to highlight some strategies for keeping your data science meetup sustainable. Specifically, I want to highlight the role of self-care in growing and sustaining your group, as well as low-key activities like a data scavenger hunt, watching videos together, styling plots together, and sharing useful tidyverse functions.
Making it easy for your members to contribute, and empowering them to do so, takes a lot of the burden off you as an organizer. You don’t need to reinvent the wheel for meetups or have famous guests for each one. Let’s start the conversation and make your meetup last.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: It takes a village: building and sustaining communities. Session Code: TALK-1129
Parameterized Quarto Reports Improve Understanding of Soil Health - posit::conf(2023)
Presented by Jadey Ryan
Learn how to use R and Quarto parameterized reporting in this four-step workflow to automate custom HTML and Word reports that are thoughtfully designed for audience interpretation and accessibility.
Soil health data are notoriously challenging to tidy and effectively communicate to farmers. We used functional programming with the tidyverse to reproducibly streamline data cleaning and summarization. To improve project outreach, we developed a Quarto project to dynamically create interactive HTML reports and printable PDFs. Custom to every farmer, reports include project goals, measured parameter descriptions, summary statistics, maps, tables, and graphs.
Our case study presents a workflow for data preparation and parameterized reporting, with best practices for effective data visualization, interpretation, and accessibility.
Talk materials: https://jadeyryan.com/talks/2023-09-25_posit_parameterized-quarto/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Elevating your reports. Session Code: TALK-1160
The ‘I’ in Team: Peer-to-Peer Best Practices for Growing Data Science Teams - posit::conf(2023)
Presented by Liz Roten
R users don’t always come in sets. Often, you may be the only R user in the cubicle block. But, one miraculous day, your manager finally fills the void and you welcome more folks onto your team. Suddenly, the little R system you created to suit your needs, like a custom package, code styling, and file organization, isn’t just for you.
Want to suddenly overhaul that one package you wrote two years ago? It probably won’t work when your colleagues try to update it.
Your new teammates are data.table fans, but you prefer the tidyverse. Do you need to refactor? Are style choices, like indentation, important when collaborating, or are you just being persnickety?
In this talk, you will learn how to bring new teammates on board and blend your respective styles without pulling your hair out.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Building effective data science teams. Session Code: TALK-1063
Visualizing Data Analysis Pipelines with Pandas Tutor and Tidy Data Tutor - posit::conf(2023)
Presented by Sean Kross
The data frame is a fundamental data structure for data scientists using Python and R. Pandas and the tidyverse are designed around building pipelines that transform data frames. However, within these pipelines it is not always clear how each operation changes the underlying data frame. To explain each step in a pipeline, data science instructors resort to hand-drawing diagrams that illustrate the semantics of operations such as filtering, sorting, and grouping.
In this talk, I will introduce Pandas Tutor and Tidy Data Tutor, step-by-step visual representation engines of data frame transformations. Both tools illustrate the row, column, and cell-wise relationships between an operation’s input and output data frames.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Teaching data science. Session Code: TALK-1096
webR 0.2: R Packages and Shiny for WebAssembly | George Stagg | Posit
WebR makes it possible to run R code in the browser without the need for an R server to execute the code: the R interpreter runs directly on the user’s machine. But just running R isn’t enough, you need the R packages you use every day too.
webR 0.2.0 makes many new packages available (10,324 packages - about 51% of CRAN!) and it’s now possible to run Shiny apps under webR, entirely client side.
George Stagg shares how to load packages with webR, learn which ones are available, and get started running Shiny apps in the web browser. There’s a demo webR Shiny app too!
00:15 Loading R packages with webR 01:50 Wasm system libraries available for use with webR 05:30 Tidyverse, tidymodels, geospatial data, and database packages available 08:00 Shiny and httpuv: running Shiny apps under webR 11:05 Example Shiny app running in the web browser 12:05 Links with where to learn more
Shiny webR demo app: https://shinylive.io/r/examples/
Website: https://docs.r-wasm.org/ webR REPL example: https://webr.r-wasm.org/latest/
Demo webR Shiny app in this video: https://shiny-standalone-webr-demo.netlify.app/ Source: https://github.com/georgestagg/shiny-standalone-webr-demo/
See the overview of what’s new in webR 0.2.0: https://youtu.be/Mpq9a6yMl_w
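Inside a webR session, installing one of those packages is a single call to the bundled webr support package (a sketch; this runs in the browser REPL, not desktop R, and dplyr is used only as an example):

```r
# Install a WebAssembly build of a package from the webR binary repository
webr::install("dplyr")

# Then load and use it exactly as in desktop R
library(dplyr)
mtcars |> filter(cyl == 4) |> head(2)
```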

Bite-sized tricks for machine learning with tidymodels | Posit
The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles. This video highlights a number of tidymodels features that could improve your modeling workflows.
0:03 Switching modeling engines is easy 0:21 Never lose your tuning results 0:36 Built-in visualizations for modeling objects 1:03 Grouped resampling 1:16 Case weights 1:32 Select variables based on role and type 2:00 Spatial resampling 2:16 Keep your tidymodels objects small
Learn more at https://www.tidymodels.org/
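The first trick ("switching modeling engines is easy") illustrates the core tidymodels design: the model specification is decoupled from the engine that fits it. A minimal sketch using parsnip:

```r
library(parsnip)

# One model specification, fit with the base-R "lm" engine
spec <- linear_reg() |> set_engine("lm")
fitted <- fit(spec, mpg ~ wt + hp, data = mtcars)

# Switching to, say, regularized regression is a one-line change
# to the spec, not a rewrite of the workflow:
# linear_reg(penalty = 0.1) |> set_engine("glmnet")
fitted
```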
How to train, evaluate, and deploy a machine learning workflow with tidymodels & Posit Team
Helpful resources: Github: https://github.com/simonpcouch/mutagen Follow-up Q&A Session: https://youtube.com/live/vwBVOBQfc_U If you want to book a call with our team to chat more about Posit products: pos.it/chat-with-us Don’t want to meet, but curious who else on your team is using Posit? pos.it/connect-us Blog post on tidymodels + Posit Connect: https://posit.co/blog/pharmaceutical-machine-learning-with-tidymodels-and-posit-connect/ Tidy Modeling with R book: https://www.tmwr.org/
Timestamps: 1:44 - Three steps for developing a machine learning model 3:35 - What is a machine learning model? 7:02 - Overview of machine learning with Posit Team 7:36 - Step 1: Understand and clean data 11:05 - Step 2: Train and evaluate models (why you might be interested in using tidymodels) 23:02 - Step 3: Deploying a machine learning model from Posit Workbench to Posit Connect 30:14 - Summary 31:21 - Helpful resources
Machine learning models are all around us, from Netflix movie recommendations to Zillow property value estimates to email spam filters.
As these models play an increasingly large role in our personal and professional lives, understanding and embracing them has never been more important; machine learning helps us make better, data-driven decisions.
The tidymodels framework is a powerful set of tools for building—and getting value out of—machine learning models with R.
Data scientists use tidymodels to:
- Gain access to a wide variety of machine learning methods
- Guard against common mistakes
- Easily deploy models through tidymodels’ integration with vetiver
Join Simon Couch from the tidyverse team on Wednesday, October 25th at 11am ET as he walks through an end-to-end machine learning workflow with Posit Team.
No registration is required to attend - simply add it to your calendar using this link: pos.it/team-demo

Hadley Wickham @ Posit | Giving benefit to people using what you build | Data Science Hangout
We were recently joined by Hadley Wickham, Chief Scientist at Posit PBC. Listen in to hear our chat about building tools (like the tidyverse) to make data science easier, faster, and more fun.
36:57 - While I’m bought into developing open source packages to help deliver better processes, any advice to those of us doing that development in getting their company bought in?
You have to give some benefit to the people using (what you’re building)
You’ve got to either remove pain or add pleasure in some way because if you can’t do that and you’re not someone’s direct supervisor, it’s hard to get people to change.
The way I think about the tidyverse is, how do we give people some sort of quick wins so they can be motivated to do the things that are slower where they’re gonna have to learn some new ideas or some new tools. You kind of build up some equity with that person.
They build trust that you’ve helped them in the past and now they’re willing to invest a little bit more time before they see the payoff. But in the early days, it’s all about delivering payoffs as quickly as possible.
And I think if you’re doing, like, you know “my company’s first R package” - the easy pain points are: make themes for your company corporate style guide, make a ggplot2 theme, make an R Markdown, a Quarto theme. Make a Shiny theme that people can just use to get, you know, something that’s reasonably close to whatever your corporate style guide dictates.
That just feels like an easy win for people because it makes them look good inside the corporation and because you’ve put in all the hard work, it’s like three seconds for them to type the right function name to get the right theme.
I think the other bit is making it easier to get access to data. Set up some wrappers around DBI connections to the most important data sources. Provide some conventions around authentication so that stuff just works so that they’re not struggling with “What packages do I need to install? What’s the password? Where’s the path I need?” Just give them some, like, a list of the top ten most common data sources and people will love you by and large.
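A minimal sketch of such a wrapper (every name here is hypothetical, purely to illustrate the pattern Hadley describes): one function per common data source, with connection details and authentication conventions baked in.

```r
# Hypothetical company-package wrapper around DBI: colleagues call one
# function instead of remembering host names, drivers, and credentials.
connect_sales_db <- function() {
  DBI::dbConnect(
    RPostgres::Postgres(),
    host     = "sales-db.internal.example.com",  # hypothetical host
    dbname   = "sales",
    user     = Sys.getenv("SALES_DB_USER"),      # credentials from env vars,
    password = Sys.getenv("SALES_DB_PASSWORD")   # never hard-coded
  )
}

# Usage (in a colleague's script):
# con <- connect_sales_db()
# dplyr::tbl(con, "orders")
```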
Follow-up question: Once you identify the things that you think would be useful for people - do you have a philosophy or a way in which you approach putting things together?
When you’re in an environment of scarcity when you’ve only got so much time that you can take out of your everyday job to invest in writing a package, it’s really tough to balance. Like, how do I add new stuff versus making sure the old stuff continues to work?
I think, again, some of it’s about building up trust. So, give people some wins so that when you inevitably break stuff, you’ve got some kind of cushion so people aren’t going to be really angry with you right away. They’re gonna be like, ok, well there’s a little bit of suffering now, but this person saved me so much time.
But yeah, it’s really hard. And particularly as you’re starting out, like, you’re going to make mistakes. That’s inevitable.
You’re going to do things that when you look back a year later, you’re like, why on earth did I do it that way? You’ll want to rip out the whole thing and rewrite it from scratch. And I think that if it feels horrible, you have to remember, that’s great. It means you’ve grown immensely as a programmer.
Certainly if you have my kind of mindset, you have to resist the temptation to rip things out and redo them as much as possible and just focus on making the next generation better rather than breaking what stuff people already have.
So I don’t have any great answers here, but I think you just have to think about those tensions of “how do I keep my forward velocity up while getting better as a programmer and evolving over time, but also thinking about how do you make the things you did a long time ago better?”
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co LinkedIn: https://www.linkedin.com/company/posit-software Twitter: https://twitter.com/posit_pbc
To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We’d love to see you!)
Come hangout with us!

Teaching the tidyverse in 2023 | Mine Çetinkaya-Rundel
Recommendations for teaching the tidyverse in 2023, summarizing package updates most relevant for teaching data science with the tidyverse, particularly to new learners.
00:00 Introduction 00:46 Using addins to switch between RStudio themes (See https://github.com/mine-cetinkaya-rundel/addmins for more info) 01:40 Native pipe 03:08 Nine core packages in tidyverse 2.0.0 07:15 Conflict resolution in the tidyverse 11:30 Improved and expanded *_join() functionality 22:05 Per operation grouping 27:41 Quality of life improvements to case_when() and if_else() 31:41 New syntax for separating columns 34:51 New argument for line geoms: linewidth 36:08 Wrap up
See more in the Teaching the tidyverse in 2023 blog post https://www.tidyverse.org/blog/2023/08/teach-tidyverse-23
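A few of the updates listed above, sketched as code (a minimal sketch assuming dplyr >= 1.1.0 and R >= 4.1; band_members and band_instruments are demo datasets shipped with dplyr):

```r
library(dplyr)

# Native pipe |> (base R >= 4.1) in place of magrittr's %>%
mtcars |> head(2)

# Per-operation grouping with .by: no group_by()/ungroup() pair needed
mtcars |>
  summarise(mean_mpg = mean(mpg), .by = cyl)

# join_by() for explicit join specifications
band_members |> left_join(band_instruments, by = join_by(name))

# case_when() now takes .default instead of the TRUE ~ ... idiom
mtcars |>
  mutate(size = case_when(cyl == 4 ~ "small",
                          cyl == 8 ~ "large",
                          .default = "medium"))
```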

What does deprecated mean? Package lifecycle and the process of deprecation.
An important part of the process of package lifecycle and package development is not just adding new functions. It’s equally important to remove functions.
Hadley Wickham shares about the package lifecycle process and what ‘deprecation’ means for functions.
See the full video about the purrr 1.0 release: https://youtu.be/EGAs7zuRutY
More about the package lifecycle stages: https://lifecycle.r-lib.org/articles/stages.html
Maintaining the house that tidyverse built: https://youtu.be/izFssYRsLZs

What does superseded mean? Package development lifecycle process and the meaning of superseded.
An important part of the process of package lifecycle and package development is not just adding new functions. It is equally important to remove functions.
Hadley Wickham shares about the package lifecycle process and what ‘superseded’ means for functions.
See the full video about the purrr 1.0 release: https://youtu.be/EGAs7zuRutY
More about the package lifecycle stages: https://lifecycle.r-lib.org/articles/stages.html
Maintaining the house that tidyverse built: https://youtu.be/izFssYRsLZs

posit::conf(2023) Workshop: Advanced tidymodels
Register now: http://pos.it/conf Instructor: Max Kuhn, Software Engineer, Posit. Workshop Duration: 1-Day Workshop
This workshop is for you if you: • have used tidymodels packages like recipes, rsample, and parsnip • are comfortable with tidyverse syntax (e.g. piping, mutates, pivoting) • have some experience with resampling and modeling (e.g., linear regression, random forests, etc.), but we don’t expect you to be an expert in these
In this workshop, you will learn more about model optimization using the tune and finetune packages, including racing and iterative methods. You’ll be able to do more sophisticated feature engineering with recipes. Time permitting, model ensembles via stacking will be introduced. This course is focused on the analysis of tabular data and does not include deep learning methods.
Participants who have completed the “Introduction to tidymodels” workshop will be well-prepared for this course. Participants who are new to tidymodels will benefit from taking the Introduction to tidymodels workshop before joining this one

posit::conf(2023) Workshop: Big Data with Arrow
Register now: http://pos.it/conf Instructors: Nic Crane and Stephanie Hazlitt. Workshop Duration: 1-Day Workshop
This course is for you if you: • want to learn how to work with tabular data that is too large to fit in memory using existing R and tidyverse syntax implemented in Arrow • want to learn about Parquet and other file formats that are powerful alternatives to CSV files • want to learn how to engineer your tabular data storage for more performant access and analysis with Apache Arrow
Data analysis pipelines with larger-than-memory data are becoming more and more commonplace. In this workshop you will learn how to use Apache Arrow, a multi-language toolbox for working with larger-than-memory tabular data, to create seamless “big” data analysis pipelines with R.
The workshop will focus on using the arrow R package, a mature R interface to Apache Arrow, to process larger-than-memory files and multi-file data sets using familiar dplyr syntax. You’ll learn to create and use interoperable data file formats like Parquet for efficient data storage and access, with data stored both on disk and in the cloud, and also how to exercise fine control over data types to avoid common large-data pipeline problems. This workshop will provide a foundation for using Arrow, giving you access to a powerful suite of tools for performant analysis of larger-than-memory data in R.
posit::conf(2023) Workshop: Causal Inference with R
Register now: http://pos.it/conf Instructors: Malcolm Barrett and Travis Gerke
This course is for you if you: • know how to fit a linear regression model in R • have a basic understanding of data manipulation and visualization using tidyverse tools • are interested in understanding the fundamentals behind how to move from estimating correlations to causal relationships
In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams, and causal modeling techniques such as propensity scores and inverse probability weighting.
In both data science and academic research, prediction modeling is often not enough; to answer many questions, we need to approach them causally. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know (the tidyverse, regression models, and more) to answer the questions that are important to your work.
posit::conf(2023) Workshop: From R User to R Programmer
Register now: http://pos.it/conf Instructors: Emma Rand and Ian Lyttle. Workshop Duration: 1-Day Workshop
This course is for you if you: • have experience equivalent to an introductory data science course using tidyverse • feel comfortable with the Whole game chapter of R for Data Science
This is a one-day, hands-on workshop for those who have embraced the tidyverse and want to improve their R programming skills and, especially, reduce the amount of duplication in their code. The two main ways to reduce duplication are creating functions and using iteration. We will use a tidyverse approach to cover function design and iteration with {purrr}.
• Master the art of writing functions that do one thing well, adhere to existing conventions and can be fluently combined together to solve more complex problems. • Learn how to perform the same action on many objects using code which is succinct and easy to read
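The two techniques named above, writing small functions and iterating with {purrr}, combine in a few lines; a minimal sketch using a built-in dataset:

```r
library(purrr)

# Duplication: the same action written out three times...
# round(mean(mtcars$mpg), 1); round(mean(mtcars$hp), 1); round(mean(mtcars$wt), 1)

# ...replaced by one small function that does one thing well
round_mean <- function(x, digits = 1) round(mean(x), digits)

# ...plus iteration over the columns of interest
map_dbl(mtcars[c("mpg", "hp", "wt")], round_mean)
#>   mpg    hp    wt
#>  20.1 146.7   3.2
```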
posit::conf(2023) Workshop: Fundamentals of Package Development
Register now: http://pos.it/conf Instructor: Andy Teucher. Workshop Duration: 1-Day Workshop
This workshop is for you if: • You have written several R scripts and find yourself wondering how to reuse or share the code you’ve written • You know how to write functions in R • You are looking for a way to take the next step in your R programming journey
We will be demonstrating some workflows using Git and GitHub. Knowledge of these tools is not required, and you will absolutely be able to complete the workshop without them, but some of the lessons will be more rewarding to you if you are prepared to try them out. If you are looking to get started with Git and GitHub, we recommend you register for the “What they forgot to teach you about R” workshop on Day 1, and join us for this workshop on Day 2.
We are often faced with the need to share our code with others, or find ourselves writing similar code over and over again across different projects. In R, the fundamental unit of reusable code is a package, containing helpful functions, documentation, and sometimes sample data. This workshop will teach you the fundamentals of package development in R, using tools and principles developed and used extensively by the tidyverse team - specifically the ‘devtools’ family of packages including usethis, testthat, and roxygen2. These packages and workflows help you focus on the contents of your package rather than the minutiae of package structure.
You will learn the structure of a package, how to organize your code, and workflows to help you develop your package iteratively. You will learn how to write good documentation so that users can learn how to use your package, and how to use automated testing to ensure it is functioning the way you expect it to, now and into the future. You will also learn how to check your package for common problems, and how to distribute your package for others to use.
This will be an interactive 1-day workshop, and we will be using the RStudio IDE to work through the materials, as it has been designed to work well with the development practices we will be featuring
posit::conf(2023) Workshop: Introduction to Data Science with R and Tidyverse
Register now: http://pos.it/conf Instructors: Posit Academy Instructors. Workshop Duration: 2-Day Workshop
This course is ideal for: • those new to R or the Tidyverse • anyone who has dabbled in R, but now wants a rigorous foundation in up-to-date data science best practices • SAS and Excel users looking to switch their workflows to R
This is not a standard workshop, but a six-week online apprenticeship that culminates in two in-person days at posit::conf(2023). Begins August 7th, 2023. No knowledge of R required. Visit posit.co/academy to learn more about this uniquely effective learning format.
Here, you will learn the foundations of R and the Tidyverse under the guidance of a Posit Academy mentor and in the company of a close group of fellow learners. You will be expected to complete a weekly curriculum of interactive tutorials, and to attend a weekly presentation meeting with your mentor and fellow students. Topics will include the basics of R, importing data, visualizing data with ggplot2, wrangling data with dplyr and tidyr, working with strings, factors, and date-times, modelling data with base R, and reporting reproducibly with quarto
posit::conf(2023) Workshop: Introduction to tidymodels
Register now: http://pos.it/conf Instructors: Hannah Frick, Simon Couch, Emil Hvitfeldt. Workshop Duration: 1-Day Workshop
This workshop is for you if you: • have intermediate R knowledge, experience with tidyverse packages, and either of the R pipes • can read data into R, transform and reshape data, and make a wide variety of graphs • have had some exposure to basic statistical concepts such as linear models, random forests, etc.
Intermediate or expert familiarity with modeling or machine learning is not required.
This workshop will teach you core tidymodels packages and their uses: data splitting/resampling with rsample, model fitting with parsnip, measuring model performance with yardstick, and basic pre-processing with recipes. Time permitting, you’ll be introduced to model optimization using the tune package. You’ll learn tidymodels syntax as well as the process of predictive modeling for tabular data



posit::conf(2023) Workshop: Steal like an Rtist: Creative Coding in R
Register now: http://pos.it/conf Instructors: Ijeamaka Anyene Fumagalli & Sharla Gelfand. Workshop Duration: 1-Day Workshop
This workshop is for you if you: • are comfortable with R and RStudio, experience with tidyverse and ggplot2 • are interested in applying data visualization skills more creatively, but may not know where to start or how to develop style/inspiration • are an artist interested in exploring code as another medium for creating their work
R is a tool for data analysis but also can be used for self-expression. This workshop will be an introduction to creative coding in R in order to make visual art. We will take an inspiration-first approach, using compelling pieces to discuss and learn the techniques that shape the work. This workshop takes guidance from its namesake, the book “Steal Like An Artist” by Austin Kleon - once we have identified and learned to recreate existing works, we will cover how to take this inspiration and transform, remix, or reinterpret it in the pursuit of developing our own work and artistic styles.
This workshop is hands-on and will cover color theory and manipulation, a reintroduction of the data frame as the foundation for creating art (instead of just for analyzing data!), using ggplot2 as an artistic canvas, creating basic and specialized shapes, tiling and pattern making, developing your own functions and using iteration. We will also discuss how to use controlled randomness to convert a standalone piece into a generative art system that can produce many distinct outputs. Creative coding may seem a world apart from data analysis, but we see a large overlap and intersection of the skills used in both, not to mention the creative muscles that are already used in data visualization
posit::conf(2023) Workshop: Teaching Data Science Masterclass
Register now: http://pos.it/conf Instructor: Dr. Mine Çetinkaya-Rundel. Workshop Duration: 1-Day Workshop
This course is for you if you: • you want to learn / discuss curriculum, pedagogy, and computing infrastructure design for teaching data science with R and RStudio using the tidyverse and Quarto • you are interested in setting up your class in Posit Cloud • you want to integrate version control with git into your teaching and learn about tools and best practices for running your course on GitHub
This masterclass is aimed primarily at participants teaching data science in an academic setting in semester-long courses, however much of the information and tooling we introduce is applicable for shorter teaching experiences like workshops and bootcamps as well. Basic knowledge of R is assumed and familiarity with the tidyverse and Git is preferred.
There has been significant innovation in introductory statistics and data science courses to equip students with the statistical, computing, and communication skills needed for modern data analysis. Success in data science and statistics is dependent on the development of both analytical and computational skills, and the demand for educators who are proficient at teaching both these skills is growing. The goal of this masterclass is to equip educators with concrete information on content, workflows, and infrastructure for painlessly introducing modern computation with R and RStudio within a data science curriculum. In a nutshell, the day you’ll spend in this workshop will save you endless hours of solo work designing and setting up your course.
Topics will cover teaching the tidyverse in 2023, highlighting updates to R for Data Science (2nd ed) and Data Science in a Box as well as present tooling options and workflows for reproducible authoring, computing infrastructure, version control, and collaboration.
The workshop will consist of four modules: • Teaching data science with the tidyverse and Quarto • Teaching data science with Git and GitHub • Organizing, publishing, and sharing of course materials • Computing infrastructure for teaching data science
Throughout each module we’ll shift between the student perspective and the instructor perspective. The activities and demos will be hands-on; attendees will also have the opportunity to exchange ideas and ask questions throughout the session.
In addition to gaining technical knowledge, participants will engage in discussion around the decisions that go into developing a data science curriculum and choosing workflows and infrastructure that best support the curriculum and allow for scalability. We will also discuss best practices for configuring and deploying classroom infrastructures to support these tools

posit::conf(2023) Workshop: Tidy time series and forecasting in R
Register now: http://pos.it/conf Instructor: Rob J Hyndman. Workshop Duration: 2-Day Workshop
This course is for you if you: • already use the tidyverse packages in R such as dplyr, tidyr, tibble and ggplot2 • need to analyze large collections of related time series • would like to learn how to use some tidy tools for time series analysis including visualization, decomposition and forecasting
It is common for organizations to collect huge amounts of data over time, and existing time series analysis tools are not always suitable to handle the scale, frequency and structure of the data collected. In this workshop, we will look at some packages and methods that have been developed to handle the analysis of large collections of time series.
On day 1, we will look at the tsibble data structure for flexibly managing collections of related time series, and cover data wrangling, data visualization, and exploratory data analysis. We will explore feature-based methods for examining time series data in high dimensions; a similar feature-based approach can be used to identify anomalous time series within a collection, or to cluster or classify time series. Primary packages for day 1 will be tsibble, lubridate and feasts (along with the tidyverse of course).
Day 2 will be about forecasting. We will look at some classical time series models and how they are automated in the fable package, and we will explore the creation of ensemble forecasts and hybrid forecasts. Best practices for evaluating forecast accuracy will also be covered. Finally, we will look at forecast reconciliation, allowing millions of time series to be forecast in a relatively short time while accounting for constraints on how the series are related
Hadley Wickham | {purrr} 1.0: A complete and consistent set of tools for functions and vectors
{purrr} has reached the 1.0 milestone, with new features like progress bars, improvements to the map family, and tools for list flattening and simplification.
0:00 Introduction 0:11 What is purrr? 00:32 What is functional programming? 03:08 Announcing purrr 1.0 03:58 Progress bars 05:18 Better error messages 07:18 New map function: map_vec() 09:58 New list_* functions 12:04 Flattening and simplification 17:40 Breaking Changes 22:34 How the tidyverse handles deprecation 24:41 An overview of functional programming 26:22 Closing, resources to help with deprecation, how to submit issues
See more in the {purrr} 1.0.0 release blog post! https://www.tidyverse.org/blog/2022/12/purrr-1-0-0/
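A quick taste of the 1.0 features mentioned in the timestamps (a minimal sketch):

```r
library(purrr)

# map_vec(): like map_chr()/map_dbl(), but infers the output type
map_vec(1:3, \(x) x * 2)
#> [1] 2 4 6

# list_flatten() removes exactly one level of list nesting
list_flatten(list(1, list(2, 3)))  # a list of 1, 2, 3

# Progress bars on long-running maps:
# map(long_list, slow_fn, .progress = TRUE)
```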

R-Ladies Rome (English) - What’s new in the tidyverse - Isabella Velasquez
Welcome to R-Ladies Rome Chapter!
What’s new in the tidyverse - Speaker: Isabella Velasquez
In this video, Isabella will tell you about what’s new in the tidyverse, a suite of packages that has revolutionized data wrangling, visualization, and analysis. The tidyverse has recently undergone changes and updates to make it even more user-friendly and powerful, including new packages, updates to existing ones, and improvements in performance and functionality. Some of the most notable updates include changes to package dependencies, performance improvements for specific functions such as group_by(), and updates to core packages such as ggplot2, readr, and dplyr.
You can find the latest news here: https://bit.ly/3z9BcMR To follow Isabella Velásquez: Twitter: twitter.com/ivelasq3 LinkedIn: linkedin.com/in/ivelasq/
Materials: GitHub repo: https://bit.ly/3LHVSmS Website: https://bit.ly/3M5gE03 The tidyverse blog: https://www.tidyverse.org/blog/
Alex Farach | Let’s start at the beginning - bits to character encoding in R | RStudio (2022)
Attendees will receive a broad overview of the encoding and decoding process in the human-to-computer loop, how bits are used, and the math that gets us to common bit values. A brief history of ASCII, Latin-1, and UTF-8 will be provided as well.
Attendees will also be exposed to how character encoding works in R and in the tidyverse.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/alexfarach/bits_to_character_in_R_RSTUDIO%20-%20Alex%20F.pdf
Session: Lightning Talks
Josiah Parry | Exploratory Spatial Data Analysis in the tidyverse | RStudio (2022)
R has come quite a long way to enable spatial analysis over the past few years. Packages such as sf have made spatial analysis and mapping easier for many. However, adoption of R for spatial statistics and econometrics has been limited. Many spatial analysts, researchers, and practitioners lean on Python libraries such as pysal.
In this talk I briefly discuss my journey through spatial analysis and introduce a new package, sfdep, which provides a tidy interface to spatial statistics, notably exploratory spatial data analysis. sfdep is an interface to the spdep package and implements other common exploratory spatial statistics.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/josiahparry/rstudio__conf(2022L)%20-%20Josiah%20Parry.pdf
Session: Lightning Talks
Welcome to Quarto Workshop! | Led by Tom Mock, RStudio
Welcome to Quarto 2-hour Workshop | Led by Tom Mock, RStudio
Content website: https://jthomasmock.github.io/quarto-2hr-webinar/ FULL Workshop Materials (this was from a 2-day workshop): rstd.io/get-started-quarto Other upcoming live events: rstd.io/community-events
Double-check: Are you on the latest version of RStudio i.e. v2022.07.1 or later?
Packages used: tidyverse, gt, gtExtras, reactable, ggiraph, here, quarto, rmarkdown, gtsummary, palmerpenguins, fs, skimr
️ Pre-built RStudio Cloud with workshop materials already installed: https://rstudio.cloud/content/4332583
For follow-up questions, please use: community.rstudio.com/tag/quarto
Timestamps: 7:16 - What is Quarto? 8:28 - How does R Markdown work? 9:40: Quarto, more than just knitr 13:56 - Quarto can support htmlwidgets in R and Jupyter widgets for Python/Julia 14:18 - Native support for Observable Javascript 19:28 - Quarto in your own workspace (Jupyter Lab, VSCode, RStudio) 20:26 - RStudio Visual Editor mode 23:30 - VS Code YAML 26:02 - Quarto for collaboration 26:55 - How do you publish Quarto? (Quarto Pub, GitHub Pages, RStudio Connect, Netlify) 28:44 - What about Data Science at Work? 29:59 - Formats baked into Quarto (basic formats, beamer, ppt, html slides, advanced layout, cross references, websites, blogs, books, interactivity) 32:13 - What to do with my existing .Rmd or .ipynb? 33:16 - Why Quarto, instead of R Markdown? 40:50 - Text Formatting 41:30 - Headings 41:51 - Code (also merging R and Python in one document) 43:29 - What about the CLI? 44:55 - Navigating in the terminal 57:56 - PART 2: Authoring Quarto 1:00:22 - Output options 1:04:46 - Quarto workflow 1:12:06 - Quarto YAML intelligence 1:13:20 - Divs and Spans 1:22:13 - Figure layout 1:34:40 - Code chunk options 1:41:00 - Quarto and R Markdown (converting R Markdown to Quarto)
This 2-hour virtual session is designed for those who have no or little prior experience with R Markdown and who want to learn Quarto.
Want to get started with Quarto?
- Install RStudio v2022.07.1 from https://www.rstudio.com/products/rstudio/download/#download - this will come with a working version of Quarto!
- Webinar materials/slides: https://jthomasmock.github.io/quarto-2hr-webinar/
- Workshop materials on RStudio Cloud: https://rstudio.cloud/content/4332583
What is Quarto?
Quarto is the next generation of R Markdown for publishing, including dynamic and static documents and multi-lingual programming language support. With Quarto you can create documents, books, presentations, blogs or other online resources.
Should I take this?
As with all the community meetups, everyone is welcome. This will be especially interesting to you if you have experience programming in R and want to learn how to take advantage of Quarto for literate data science programming in academia, science, and industry.
This workshop will be appropriate for attendees who answer yes to these questions:
Have you programmed in R and want to better encapsulate your code, documentation, and outputs in a cohesive “data product”?
Do you want to learn about the next generation of R Markdown for data science?
Do you want to have a better interactive experience when writing technical or scientific documents with literate programming?
For more info on Quarto: quarto.org
Posit Meetup | Jake Riley, Children’s Hospital of Philadelphia | Translating Facts to Insights
RStudio Healthcare Meetup:
Translating facts into insights at Children’s Hospital of Philadelphia Led by Jake Riley, data analyst at The Children’s Hospital of Philadelphia
Abstract: {headliner} is a new R package to add dynamic, insightful text to plots and reports. {headliner} generates useful talking points that users can string together using {glue} syntax. This makes it easy to write informative sentences without adding a lot of technical debt to a project. Learn how to get started with {headliner} and ways we have used it at The Children’s Hospital of Philadelphia.
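Since {headliner} builds its talking points on {glue} syntax, here is what that underlying templating looks like; this is plain {glue} with made-up values, not {headliner}’s own API:

```r
library(glue)

sales_now  <- 120
sales_last <- 100
delta <- sales_now - sales_last
trend <- if (delta >= 0) "up" else "down"

glue("Sales are {trend} {abs(delta)} units ({round(delta / sales_last * 100)}%) vs. last period")
#> Sales are up 20 units (20%) vs. last period
```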
Speaker Bio: Jake Riley is a data analyst at The Children’s Hospital of Philadelphia. He is the author of several R packages related to data visualization and automated exploratory analysis. You can find his published packages {simplecolors} and {shinyobjects} on CRAN, with more packages on the way.
Timestamps: 0:49 - Start of talk 1:25 - Dashboards focused on facts vs. insights 2:56 - What’s a good title for a chart? 5:09 - Intro to headliner package 7:41 - Using glue() under the hood 14:04 - Helpers for working with data frames: compare_conditions() 18:41 - Using ggtext 21:27 - Example using pixar_films 23:40 - How they’ve used it at CHOP 28:05 - Next steps for headliner package 29:32 - Start of Q&A session
Questions: 29:32 - Can you use any package you want in your organization? 31:13 - How do you load previous datasets to compare to current datasets? 32:48 - When you mentioned a front page on RStudio Connect (with the headlines), what is that? 33:25 - Is anyone using this for manuscripts at CHOP now? 36:24 - What has the adoption of R or Python been within the hospital analytics team? 37:28 - My manager is very leery of R because of technical depth. Any suggestions for convincing her of R’s value? 42:22 - How does CHOP use R for non-clinical analysis? 43:36 - How do you train new people to use R? 46:28 - How do you compare last week’s analysis to this week’s? 49:37 - Were there any major challenges in creating the hospital’s internal package?
Resources/links shared: Jake’s LinkedIn: https://www.linkedin.com/in/jake-riley-70736a3/ headliner package: https://github.com/rjake/headliner waldo package: https://www.tidyverse.org/blog/2020/10/waldo/ Examples of R in Life Science & Healthcare: https://www.rstudio.com/champion/life-science Chris Bumgardner’s talk on building an R-based analytic practice at Children’s Wisconsin: https://youtu.be/pHZ8dsc0PhY simplecolors package to generate hex codes using uniformly named colors: https://rjake.github.io/simplecolors/ R Packages book by Hadley Wickham & Jenny Bryan: https://r-pkgs.org/
Meetup Links: Future events: rstd.io/community-events-calendar If anyone’s interested in speaking at a future meetup, we’d love to hear from you too! rstd.io/meetup-speaker-form


Enabling Citizen Data Scientists at Dow Chemical with Posit Academy
Led by James Wade, Associate Research Scientist at Dow Chemical
Timestamps: 2:46 - Start of presentation 5:25 - Goal: “apply science and engineering technical expertise along with data science tooling to innovate in the materials science arena.” 6:36 - What does citizen data science mean? 8:05 - Data science as an interdisciplinary endeavor - looking to build a community of innovators 9:30 - Translating data to decisions 11:03 - Guidelines for success (data organizations, data access, data analysis, value preservation) 13:30 - Welcoming new users in an approachable, collaborative, and secure workspace with RStudio Team 14:25 - Making sure you can rapidly deploy your insights to others 16:25 - What is RStudio Academy? 20:55 - What do you need for academy? (Academy learners: 5-7 per cohort, cohort mentors from RStudio & your group, and a project - the closer to your work the better) 22:15 - Who is a good candidate? 23:55 - Who might not be the best candidate? 26:00 - What makes a good cohort? (similar work group, time zone, and skill level) 27:27 - Feedback (Are they still using the content they learned? 16 out of the 17 survey respondents were still writing code 6 months after) 31:42 - Community building (want to have a landing zone for people to continue to learn) 32:31 - RStudio Academy success story at Dow 35:30 - Start of Q&A portion
Questions: 36:00 - How do you help someone who knows coding would be useful but can’t motivate to take 5 steps back to take 10 steps forward? 37:55 - How can more advanced users participate in developing curriculums? 39:44 - Does Academy also teach good coding style and version control? 41:00 - If you’re trying to “sell” Academy to the individual who would fill the group mentor role, what level of commitment and bandwidth do they need to have? 42:13 - Is the type of data you work with relevant to the work you do at Dow? or random / set datasets regardless of which company you’re with? 43:00 - What other ways of teaching R have you tried (or considered) at Dow? How does Academy compare? 44:55 - What is the duration of RStudio Academy? 46:15 - Can you have multiple cohorts go through at the same time? What if we want to up-skill hundreds of people? 48:20 - How did you find out who might be interested and get the word out? 50:08 - Advertising that you help learners up-skill in coding seems like a good way to set your company apart from others, are you hiring? 51:25 - After the RStudio Academy 10 week training is the Academy team still available for questions, support or consult? 53:01 - Is Academy only for R? 53:38 - How can a data science student participate in RStudio Academy? (https://www.rstudio.com/conference/2022/workshops/intro-to-tidyverse/ ) 54:44 - How do you collaborate with others outside of Dow? 57:20 - How does RStudio Academy handle sensitive data? 1:00:20 - Do you have statistics on how many graduates are still using R?
Abstract: In chemistry and materials science research, data is messy, unstructured, and scattered. Solving this problem requires researchers to deeply embed within data generation and analysis workflows.
We are on a multi-year journey to equip scientists and engineers with guidance and tools to extract insights from their data. To this end, we have developed a set of 15 guidelines designed to move our organization toward a collaborative, reproducible work process in a dynamic data-diverse environment.
In this talk, I will share the lessons we learned on this journey through teaching, community building, and collaboration, with a particular focus on the integration of language-agnostic RStudio tools, products, and programs. I will especially be focusing on our experience with RStudio Academy.
Speaker Bio: James is a research scientist working in the chemicals manufacturing industry as part of a research and development team. James applies materials characterization and data science with a special interest in sustainable materials design to develop new capabilities for research. His current focus is on augmenting materials characterization innovations with statistical analysis, machine learning, and data visualization.
For more information on RStudio Academy: rstudio.com/academy Link to speak with RStudio: rstd.io/chat-with-rstudio
Data Science Hangout | Alice Walsh, Pathos | Improving an Interview Experience
We were joined by Alice Walsh, PhD, VP of Translational Research at Pathos. Alice works in drug development, where she is excited about the potential of computational research to yield breakthroughs for patients.
Loved that Alice also asked this question back to the audience:
How do I make an interview a good experience for a candidate? Or have you had any nightmares that’d be helpful to share?
A bunch of thoughts shared from the group: ⬢ I’ve had way more success not giving a formal technical interview, and instead making the technical portion more of a discussion where I’m not asking them to whiteboard anything; it’s just talking.
⬢ If I asked them, “how do you develop a Shiny app?”, I’d much rather someone tell me, “I’ve never developed a Shiny app in my life, but I use R Markdown every day.” That tells me a lot about their ability to actually jump in and learn something new, and about their transparency.
⬢ I’m much more interested in the process. How do people approach a problem and solve challenges that they encounter versus the specific project they worked on because they’re not going to work on that project ever again with me. It’s going to be a new project so they will need to learn something anyway.
⬢ I’ve had success hiring from meetups or hackathons. Seeing people here and the way they problem solve gives you a lot of insight about these individuals.
⬢ My company actually does do a technical interview and we give candidates a data set while they’re on site or in a Teams meeting. We give them an hour to see what sorts of insights they find with a few very specifically directed questions. What we’re often looking for is not someone to have perfect answers to those questions - it’s really about understanding how they looked at the data set, what other information they wanted, and what they wish they had more time to do. You get to see how people work through something and it’s okay if they don’t have a perfectly polished presentation.
⬢ I’ve had a nightmare interview that became a pop quiz on R stuff: “What are all the packages in the tidyverse?” (And at the time I didn’t use the tidyverse; I was a base R user.)
⬢ A nightmare one that sticks in my head was, please describe in detail the differences between Python 3 and Python 2.
⬢ I think, “this is something I would Google” is a valid answer sometimes because even if I don’t know this, I know where to find it and am really confident in my ability to Google this.
⬢ Honestly, if I ask somebody a question and they said this is something that I know I could find the answer to, that would be a perfect answer to me. Not knowing, but knowing where to find the information, is great.
⬢ I went on 3 interviews and they each had a technical part where for every single one of them the answer was: dynamic programming. They must have gone somewhere and decided that was the algorithm to ask about. I found that a bit ridiculous because it wasn’t relevant to what they were working on and it was off-putting. Now, when I interview people I try to make it more of a conversation around the data and what we might actually be doing.
⬢ I have an optional take home: “here’s a data set, take an hour and tell me something. Use whatever tools you want: Excel, R, Python, an abacus.” The key thing I want to see is a written output of what you did. I’m still on the fence, though, because I know a lot of people are anti-technical component. I’m still trying to figure out if it is really helping us make the best hiring decisions.
Also wanted to share a link to a job posting on Alice’s team! She has simplified to just one posting, but they have a couple of openings. They are hiring both more experienced scientists and folks transitioning from academic research with no drug development experience.
Where to find more?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Data Science Hangout site: rstudio.com/data-science-hangout ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout
Follow Us Here: Website: https://www.rstudio.com LinkedIn:https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/rstudio
Alan Carlson | Robust, modular dashboards that minimize tech debt | RStudio
Robust, modular dashboards that minimize tech debt Presented by Alan Carlson, Snap Finance
Abstract Dashboards can be complex but building them shouldn’t be! We’ve built a wrapper for developing production level dashboards that streamlines onboarding new developers and standardizes the initial infrastructure to mitigate tech debt. Now you and your team can spend more time developing insights and less time trying to spin up shiny code with {graveler}.
Speaker Bio As the Tech Lead for the BI (Business Intelligence) team, Alan’s primary focus at Snap is researching, creating, and maintaining methods that help the rest of Snap’s BI team in their work. From dashboards to visualizations to R code in general, he has built multiple packages and bookdowns that make BI work easier to learn and to use within the RStudio environment.
Helpful Links: Blog Post: https://www.rstudio.com/blog/make-robust-modular-dashboards-with-golem-and-graveler/ Graveler package: https://github.com/ghcarlalan/graveler Environment variables: https://docs.rstudio.com/connect/user/content-settings/#content-vars Git-backed publishing: https://docs.rstudio.com/connect/user/git-backed/ If you’d like to join events live: colorado.rstudio.com/rsc/community-events
Question about style guides: Tidyverse Style Guide: https://style.tidyverse.org/ Efficient R Programming book that Colin Gillespie wrote: https://csgillespie.github.io/efficientR/
Questions about RStudio Team: ⬢ RStudio Connect: https://www.rstudio.com/products/connect/ ⬢ Chat with RStudio about RStudio Team: rstd.io/chat-with-rstudio
Data Science Hangout | Mike Smith, Pfizer | Building an R Center of Excellence
We were joined by Mike Smith, Senior Director, Pfizer R&D UK Ltd at the Data Science Hangout - a weekly, free-to-join open conversation for the data science community. If you’d like to join us live, you can add it to your calendar here: rstd.io/datasciencehangout
Mike shared with us all that they are building up a Center of Excellence at Pfizer to help teams across the business build reproducible workflows and use analytics tools effectively & efficiently.
What led to the creation of the CoE within Pfizer and how could we do something similar?
Mike: ⬢ Last year before R/Pharma, we did a poll & found that 1,500+ colleagues had downloaded R. I wanted to service & build up that community to find out what other people are doing and share that. (2:45)
⬢ We’re a very decentralized disparate team, so there are subject matter experts (SMEs) throughout the organization. The Center of Excellence is focused on building connections between SMEs and helping the teams where there isn’t an SME available.
⬢ What we saw was that it’s hard to sometimes get an effective strategy across people in such a big company. We also saw that there were other places within the organization that wanted data science work but they didn’t have an R subject matter expert there. We want to be able to help them solve their problems and set them up with a proof of concept that they can tweak.
33:52 - OK, so how do you do this?
⬢ Find out how many people are using the tools and who you could help.
⬢ Be the translator between the business people who need solutions and the technical folks who are building things.
Communicate the value:
⬢ We may have a bunch of people trying to write the same function or access the same data. We could solve this problem once and then make that into a package and serve that out to everybody and streamline their workflow for the future.
⬢ There’s a benefit in being able to solve problems strategically. We’re trying to build the Lego pieces so that the next time we see a problem like this, we can reuse them. We can also offer this as a package, or via something that allows other people to solve that problem for themselves.
Talk to someone who has experience in this, other community builders
⬢ Doug Robinson helped start this at Pfizer because he had set up something similar at Novartis before. Talking with someone who has done this before is really helpful because they have the experience of knowing: who do we need to tell, what do we need to tell them, what is our purpose for being, and who do we have to speak to and convince. That has to be ready to go.
Find a champion in leadership:
⬢ We went to the head of Statistical programming and said we’d like to do something like this. Fortunately, she was 110% supportive here.
How did they phrase this CoE at Pfizer?
⬢ Check out this description from the job post: https://lnkd.in/g776nYVF
Resources shared: Ethan shared: I saw on RStudio blog the other day the {sassy} system for SAS programmer transitioning to R: https://sassy.r-sassy.org/index.html Tatsu shared: For folks that have RStudio Connect and Tableau, there’s now a supported integration https://www.rstudio.com/blog/dynamic-r-and-python-models-in-tableau-using-plumbertableau/ Tatsu shared the Working with IT section of the champion site: https://www.rstudio.com/champion/working-with-it Mike’s Bandcamp: https://mikeksmith.bandcamp.com/ R Consortium Pharma Working Groups: https://www.r-consortium.org/projects/isc-working-groups R in Pharma Conference: https://rinpharma.com/ Upcoming Pharma meetup with Merck: https://youtu.be/RBVqKi3FV30
Question about style guides: Jesus shared: Tidyverse Style Guide: https://style.tidyverse.org/ Jesus shared: One guide overall guide on better clean R code is the contributing.md of the ggplot2 package: https://github.com/tidyverse/ggplot2/blob/main/CONTRIBUTING.md Sam shared: Efficient R Programming book that Colin wrote: https://csgillespie.github.io/efficientR/
Where to find more?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Data Science Hangout site: rstudio.com/data-science-hangout ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout
Follow Us Here: Website: https://www.rstudio.com LinkedIn:https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/rstudio
Data Science Hangout | Joe Gibson, de Beaumont Foundation | Collaboration Across a Team
We were joined by Joe Gibson, Senior Project Director at de Beaumont Foundation.
At the hangout, there was a great conversation on building a code archive database/using snippets ️
38:16 - Sharing a bit of the conversation below:
Stephanie asked: How do you build a code archive database? I’ve worked places with Word documents and need something more user-friendly.
Joe: Each time that we create a work product, we log it in our database - it creates a new ID for that project, we add some basic information, and then we create a folder for it. We then store it in the folder structure we have on our network to make it easier for people to find things. In addition to having the library of all our code, we have some folders that hold handy code snippets.
Steve: We’re developing an internal package for getting data and doing common tasks in a more standard way, which is nice. Nothing too fancy, but it streamlines things.
Ethan: If you’re using GitHub, Gist is also a great way to share snippets of code. I have a snippet that sets up the header/documentation structure for a script. One of the first bits is library(tidyverse).
Mike asked: Has anyone developed R snippets and distributed them across a team? I don’t know if people are familiar with the snippets within RStudio but they are cool because you can use template frameworks and it jumps you to the next thing you need to tailor for your own situation. It’s essentially a function.
Tatsu shared: https://lnkd.in/gwWCB3T2
Javier: These are super helpful, Mike. I recently learned about them and was shocked. I have all kinds of snippets for myself and my team now.
Jordan: I’ve saved snippets inside a core package and have a function that updates your RStudio snippets. Saved a snippet update gist here: https://lnkd.in/gEcmNViN requires a snippets folder in your package/inst. You can have this .onLoad() if you’re feeling lucky
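For anyone who hasn’t seen them, RStudio snippets live in a plain-text file (Tools > Edit Code Snippets); each snippet body is tab-indented, and ${n:placeholder} fields are the tab stops Mike describes. A hypothetical header snippet along the lines of Ethan’s:

```
snippet header
	# Title:  ${1:title}
	# Author: ${2:author}
	# Date:   `r Sys.Date()`
	library(tidyverse)
```

Typing `header` in the editor and pressing Tab expands the template, jumping between the placeholders.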
Resources shared: ◘ Rachael shared the new champion site: https://lnkd.in/gaHt_8Br ◘ Jen & Michelle shared the National Syndromic Surveillance Program Community of Practice: https://lnkd.in/gHE3C94s ◘ Joe shared Harold’s GitHub for NSSP projects: https://lnkd.in/gFRsezTM ◘ Joe mentioned the Mozilla Foundation, which works to ensure the internet remains a public resource that is open and accessible to us all: https://lnkd.in/gbvdQXwH ◘ Rachael shared AI Inclusive: https://lnkd.in/gt8cQUUX ◘ Cris shared Fairlearn, a toolkit to improve fairness of AI systems: https://fairlearn.org/ ◘ Angela shared Openscapes, helping teams develop collaborative practices that are more reproducible, transparent, inclusive, and kind: https://lnkd.in/gs-6_-ZA
Where to find more?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Data Science Hangout site: rstudio.com/data-science-hangout ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout
Follow Us Here: Website: https://www.rstudio.com LinkedIn:https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/rstudio
Rich Iannone || {gt} Intendo Game Data Project Walkthrough || RStudio
00:00 Introduction 00:11 Setting up our environment 01:21 Importing data 01:56 Data preparation using the tidyverse 14:12 Basic gt table 16:25 Specifying row order with row_group_order() 17:20 Formatting currency with fmt_currency() 18:10 Formatting missing values with fmt_missing() 18:55 Creating row groups with tab_options() 19:50 Relabel column names with cols_label() 20:41 Creating tab spanners with tab_spanner() 23:00 Creating a table title and subtitle with tab_header() 24:40 Aligning table title and subtitle with opt_align_table_header() 25:16 Creating a stubhead label with tab_stubhead() 26:00 Format all table cell text using tab_style() 27:25 Automatically format data color based on value using data_color() 30:45 Creating Markdown-friendly source notes using tab_source_note() 32:45 Creating Markdown-friendly footnotes using tab_footnote() 39:28 Adjust table column width using cols_width() 40:55 Adjust cell padding using opt_horizontal_padding() and opt_vertical_padding() 42:22 Change row group headers using tab_style() 43:40 Convert all table text to small caps using opt_all_caps() 43:58 Change all table text font using opt_table_font() 44:28 Changing table, table heading, footnotes, and source notes background color using tab_options() 46:41 Add a table “cap” at the top and bottom using table.border.top.width() and table.border.bottom.width() 47:23 Use multiline formatting with footnotes using footnotes.multiline() 47:34 Change line style using table_body.hlines.style() 47:55 Change table title and subtitle font sizes using heading.title.font.size() and heading.subtitle.font.size() 48:11 Checking out our final table!
Code to recreate the table from the video: https://github.com/kierisi/rstudio_videos/blob/main/gt/rich-intendo-project-walkthrough/intendo-30032022.R
Learn more about the gt package here: https://gt.rstudio.com/
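A minimal starting point in the spirit of the walkthrough, using the exibble sample data that ships with gt (a sketch, not the video’s Intendo table):

```r
library(gt)
library(dplyr)

exibble |>
  select(char, currency) |>
  gt() |>
  tab_header(
    title    = "A minimal gt table",
    subtitle = "Built from gt's exibble sample data"
  ) |>
  fmt_currency(columns = currency, currency = "USD") |>
  cols_label(char = "Item", currency = "Amount")
```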
Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/
Content: Rich Iannone (@riannone) Motion design and editing: Jesse Mostipak (@kierisi) Music: Nu Fornacis by Blue Dot Sessions https://app.sessions.blue/browse/track/98983

George Mount | R for Excel Users - First Steps | RStudio Meetup
Abstract: Excel’s built-in programming language has served as an entry point to coding for many. If you’re a data analyst steeped in Excel, chances are you could also benefit from learning R for projects of increased scope and complexity.
This presentation serves as a hands-on introduction to R for Excel users:
How R differs from Excel as an open source software tool
How to translate common Excel concepts such as cells, ranges, and tables to R equivalents
Example use cases that you can take and apply to your own work
How to enhance Excel and Power BI with R

By the end of this presentation, you will have a clear path forward for building repeatable processes, compelling visualizations, and robust data analyses in R.
Speaker Bio: George Mount is the founder of Stringfest Analytics, a consulting firm specializing in analytics education and upskilling. He has worked with leading bootcamps, learning platforms and practice organizations to help individuals excel at analytics. George regularly blogs and speaks on data analysis, data education and workforce development. He is the author of Advancing into Analytics: From Excel to Python and R (O’Reilly).
Link to George’s white paper “Five things Excel users should know about R”: https://stringfestanalytics.com/five-things-r-excel/
Working group sign-up for those interested!
Within many organizations, Microsoft Excel is the preferred tool for working with data among non-data-analytics users. In order to build a data-driven organization, source data and analytical models must be accessible to all data users (technical and non-technical) within their preferred tool. Let’s rally the R community to welcome Excel users into our data-driven culture by building an Excel add-on to access data and models available within RStudio. If you’re interested in continuing this conversation and joining a working group, let us know! rstd.io/excel-r-community
Links shared at the meetup! George’s GitHub/ Presentation Resources: https://github.com/stringfestdata/rstudio-mar-2022
Packages? Where to find them & recommendations:
CRAN Task Views: https://cran.r-project.org/web/views/
Mark shared: for folks who primarily use excel to present formatted tables, the gt package is a great way to start doing this programmatically in R: https://gt.rstudio.com/
Ivan shared: In addition to regular Google, I’d recommend https://rseek.org/, given that the character ‘R’ is sometimes not search friendly :)
Jeff shared: Fpp2 is great for forecasting and time series analysis - https://otexts.com/fpp2/
Floris shared: https://otexts.com/fpp3/
Ivan shared: If you’re into tidyverse, there’s an equivalent for time-series: https://tidyverts.org/
George shared: https://dplyr.tidyverse.org/
Ryan shared: This can be a helpful package for dynamically editing tables, like in excel https://github.com/DillonHammill/DataEditR
Ryan shared: This is a great package for making and learning ggplot visualizations: https://cran.r-project.org/web/packages/esquisse/vignettes/get-started.html
Other resources: Monaly shared: There is an R help mailing list: r-help@r-project.org George shared: Helpful book/site on statistics: https://moderndive.com/ Ryan shared: Harvard has a good online source (free options) that has a number of classes, the following for stats: https://www.edx.org/professional-certificate/harvardx-data-science George shared: R for Data Science free book: https://r4ds.had.co.nz/ Fernando shared: Big Book of R: https://www.bigbookofr.com/index.html Floris shared: Advanced R book: https://adv-r.hadley.nz/ Pedro shared: The R for Data Science Slack channel is a great learning resource! r4ds.io/join (we just made a channel there called #chat-excel_to_r) Ivan shared: For teams who are deeply entrenched in Excel (like my old team), this tool may be useful - https://bert-toolkit.com/. It allows running R code in .xls, so you can learn R while doing .xls :)
Re: Glossary of terms: Ivan shared: inner_join() is like VLOOKUP in .xls. Dan shared: Here’s one cheat sheet (glossary of Excel to R) that I just found: https://paulvanderlaken.com/2018/07/31/transitioning-from-excel-to-r-dictionary-of-common-functions/
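To illustrate Ivan’s VLOOKUP analogy, a small dplyr sketch (the tables are made up):

```r
library(dplyr)

orders <- tibble(
  order_id   = 1:3,
  product_id = c("A", "B", "A")
)
prices <- tibble(
  product_id = c("A", "B"),
  price      = c(9.99, 4.50)
)

# Like VLOOKUP: pull price into orders by matching product_id
inner_join(orders, prices, by = "product_id")
```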
Extra Meetup Links Feedback: rstd.io/meetup-feedback Talk submission: rstd.io/meetup-speaker-form If you’d like to find out about upcoming events you can also add this calendar: rstd.io/community-events RStudio conference/submit a talk: https://www.rstudio.com/conference/ Recordings of all meetups: https://www.youtube.com/playlist?list=PL9HYL-VRX0oRKK9ByULWulAOO5jN70eXv
Building R packages with devtools and usethis | RStudio
Package building doesn’t have to be scary! The tidyverse team has made it easy to get started with RStudio and the devtools/usethis packages. This hour-long presentation will walk you through the basics of R package building, and hopefully leave you prepared to go out and build your own package!
Slides: https://colorado.rstudio.com/rsc/pkg-building/ Source Code: https://github.com/jthomasmock/pkg-building
devtools: https://devtools.r-lib.org/ usethis: https://usethis.r-lib.org/ R Packages book: https://r-pkgs.org/index.html
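The core devtools/usethis development loop described in the talk looks roughly like this. It is an interactive workflow sketch (the path is a placeholder), not a script to run unattended:

```r
library(usethis)
library(devtools)

create_package("~/path/to/mypkg")  # scaffold DESCRIPTION, R/, .Rproj, etc.
use_r("myfun")                     # create/open R/myfun.R for your function
load_all()                         # simulate installing and loading the package
use_test("myfun")                  # scaffold tests/testthat/test-myfun.R
test()                             # run the test suite
document()                         # build NAMESPACE and man/ via roxygen2
check()                            # R CMD check: the full package health check
```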
Jenny Bryan | Help me help you: creating reproducible examples | RStudio (2018)
What is a reprex? It’s a reproducible example. Making a great reprex is both an art and a science and this webinar will cover both aspects. A reprex makes a conversation about code more efficient and pleasant for all. This comes up whenever you ask someone for help, report a bug in software, or propose a new feature. The reprex package (https://reprex.tidyverse.org) makes it especially easy to prepare R code as a reprex, in order to share on sites such as https://community.rstudio.com, https://github.com, or https://stackoverflow.com. The habit of making little, rigorous, self-contained examples also has the great side effect of making you think more clearly about your programming problems.
Webinar materials: https://rstudio.com/resources/webinars/help-me-help-you-creating-reproducible-examples/
About Jenny: Jenny is a software engineer on the tidyverse team. She is a recovering biostatistician who takes special delight in eliminating the small agonies of data analysis. Jenny is known for smoothing the interfaces between R and spreadsheets, web APIs, and Git/GitHub. She’s been working in R/S for over 20 years and is a member of the R Foundation. She also serves in the leadership of rOpenSci and Forwards and is an adjunct professor at the University of British Columbia

Tom Mock | A Gentle Introduction to Tidy Statistics in R | RStudio (2019)
R is a fantastic language for statistical programming, but making the jump from point and click interfaces to code can be intimidating for individuals new to R. In this webinar I will gently cover how to get started quickly with the basics of research statistics in R, providing an emphasis on reading data into R, exploratory data analysis with the Tidyverse, statistical testing with ANOVAs, and finally producing a publication-ready plot in ggplot2.
Use the code presented instantly on RStudio Cloud!
RStudio Cloud: rstudio.cloud Webinar materials: https://rstudio.com/resources/webinars/a-gentle-introduction-to-tidy-statistics-in-r/
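The kind of tidy statistics the webinar covers can be sketched in a few lines of base R, using the built-in `iris` data as stand-in example data:

```r
# One-way ANOVA: does Sepal.Length differ by Species?
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)   # F-test for an overall Species effect
TukeyHSD(fit)  # pairwise comparisons with adjusted p-values

# A tidyverse complement (not run here): broom::tidy(fit) returns the
# ANOVA table as a tibble, ready to pass to ggplot2.
```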
About Thomas: Thomas is involved in the local and global data science community, serving as Outreach Coordinator for the Dallas R User Group, mentoring for the R for Data Science Online Learning Community, co-founding #TidyTuesday, attending various Data Science and R-related conferences/meetups, and participating in Startup Weekend Fort Worth as a data scientist/entrepreneur
Hadley Wickham | testthat 3.0.0 | RStudio (2020)
In this webinar, I’ll introduce some of the major changes coming in testthat 3.0.0. The biggest new idea in testthat 3.0.0 is the idea of an edition. You must deliberately choose to use the 3rd edition, which allows us to make breaking changes without breaking old packages. testthat 3e deprecates a number of older functions that we no longer believe are a good idea, and tweaks the behaviour of expect_equal() and expect_identical() to give considerably more informative output (using the new waldo package).
testthat 3e also introduces the idea of snapshot tests which record expected value in external files, rather than in code. This makes them particularly well suited to testing user output and complex objects. I’ll show off the main advantages of snapshot testing, and why it’s better than our previous approaches of verify_output() and expect_known_output().
Finally, I’ll go over a bunch of smaller quality-of-life improvements, including tweaks to test reporting and improvements to expect_error(), expect_warning() and expect_message().
Webinar materials: https://rstudio.com/resources/webinars/testthat-3/
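A minimal sketch of what opting into the 3rd edition looks like, assuming a package context (in DESCRIPTION you would set `Config/testthat/edition: 3`; interactively you can use `local_edition()`):

```r
library(testthat)
local_edition(3)  # opt this session into testthat 3e

test_that("failures show a field-by-field diff via waldo", {
  # In 3e, a mismatch here prints an informative waldo comparison
  expect_equal(c(a = 1, b = 2), c(a = 1, b = 2))
})

test_that("printed output is stable", {
  # Recorded in tests/testthat/_snaps/ on first run, compared thereafter
  expect_snapshot(print(head(iris)))
})
```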
About Hadley: Hadley Wickham is the Chief Scientist at RStudio, a member of the R Foundation, and Adjunct Professor at Stanford University and the University of Auckland. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. You may be familiar with his packages for data science (the tidyverse: including ggplot2, dplyr, tidyr, purrr, and readr) and principled software development (roxygen2, testthat, devtools, pkgdown). Much of the material for the course is drawn from two of his existing books, Advanced R and R Packages, but the course also includes a lot of new material that will eventually become a book called “Tidy tools”

Roche & Novartis: Effective Visualizations for Data Driven Decisions || Posit (2020)
Effective visual communication is a core task for all data scientists including statisticians, epidemiologists, machine learning experts, bioinformaticians, etc.
By using the right graphical principles, we can better understand data, highlight core insights and influence decisions toward appropriate actions. Without it, we can fool ourselves and others and pave the way to wrong conclusions and actions. While numerous solutions exist to analyze data, these often require many manual steps to convert them into visually convincing and meaningful reports. How do we put this in practice in an accurate, transparent and reproducible way?
In this webinar we will introduce an open collaborative effort, currently undertaken by Roche and Novartis, to develop solutions for effective visual communication with a focus on reporting medical and clinical data. The aim of the collaboration is to develop a user-friendly, fit for purpose, open source package to simplify the use of good graphical principles for effective visual communication of typical analyses of interventional and observational data encountered in clinical drug development. We will introduce the initial visR package design which easily integrates into a typical tidyverse workflow. The package provides guidance and meaningful default parameters covering all aspects from the design, implementation and review of statistical graphics.
Webinar materials: https://posit.co/resources/videos/effective-visualizations-for-data-driven-decisions/
About Charlotta: Charlotta is a computational biologist by training and works as a data scientist in the Personalized Healthcare department at Roche where she uses R to untap the wealth of information coming from healthcare data collected in real-world settings to support the development of new medicines.
About Diego: Diego is a data scientist specializing in applied machine learning at Roche Personalized Healthcare since March 2019. He has developed models to perform various tasks and analyze diverse data sources. Currently, his main applications of interest are in oncology and clinico-genomics.
About Mark: Mark is a methodologist supporting the clinical development and analytics department at Novartis. He has a focus on data visualization working on a number of internal and external initiatives to improve the reporting of clinical trials and observational studies.
About Marc: Marc is a biostatistics group head at Novartis. He is interested in advancing the methods and practice of clinical development, for instance through effective use of graphics. https://graphicsprinciples.github.io/
Matt Thomas & Mike Page | How the Tidyverse helped the British Red Cross respond to COVID | RStudio
Full title: Cognitive speed: How the Tidyverse helped the British Red Cross respond quickly to COVID-19
We will discuss the importance of cognitive speed, defined here as the rate at which an idea can be translated into code, and why the Tidyverse excels in this domain. We will demonstrate this idea in relation to a suite of tools we were required to rapidly develop at the British Red Cross in order to respond effectively to the COVID-19 pandemic. To do this, we will exhibit how elements of the unifying design principles outlined in the ‘tidyverse design guide - Tidyverse team’ relate to the notion of cognitive speed, giving specific examples for various design considerations. We believe this talk will encourage reflection on better design practices for future R developers, using the design principles of the tidyverse as the guiding beacon.
About Matt: Dr. Matt Thomas is Head of Strategic Insight and Foresight at the British Red Cross. Matt’s team aims to help the Red Cross become more anticipatory and proactive by producing insights and tools including the Vulnerability Index (https://britishredcrosssociety.github.io/covid-19-vulnerability/ ) and Resilience Index (https://britishredcross.shinyapps.io/resilience-index/) . He holds a PhD in Evolutionary Anthropology and, prior to joining the British Red Cross, was researching topics including reindeer herders in the Arctic, hunter-gatherers in the Philippines, and witches in China. Outside of work, Matt writes a column for an anthropology magazine (https://www.sapiens.org/column/machinations/ ) as well as fiction.
About Mike: Mike Page is a data scientist on the Strategic Insight and Foresight team at the British Red Cross. Here, he helps to develop a suite of open source tools and dashboards including the Vulnerability Index (https://britishredcrosssociety.github.io/covid-19-vulnerability/ ) and Resilience Index (https://britishredcross.shinyapps.io/resilience-index/) . Mike is also the author of several R packages including mortyr and newsrivr. In his spare time you can find him rock climbing around the Alps
Megan Beckett | Aesthetically automated figure production | RStudio
Automation, reproducibility, data driven. These are not normally concepts one would associate with the traditional publishing industry, where designers typically produce every artefact manually in proprietary software. And when you have thousands of figures to produce and update for a single textbook, this becomes an insurmountable task, meaning our textbooks quickly become outdated, especially in our rapidly advancing world.
With R and the tidyverse in our back pocket, we rose to the challenge to revolutionize this workflow. I will explain how we collaborated with a publishing group to develop a system to aesthetically automate the production of figures for a textbook including translations into several languages.
I think you’ll find this talk interesting as it shows how we applied tools that are familiar to us, but in an unconventional way to fundamentally transform a conventional process.
About Megan: Megan Beckett is a Data Scientist at Exegetic Analytics, where she consults, develops and leads several analytical projects across a wide range of fields and industries. “Scientifically creative; creatively scientific.” This aptly describes her philosophy and approach in her work and life. Megan helped co-found and organises the Cape Town R-Ladies chapter and is a co-organiser of the satRday events in South Africa. She loves to paint, with her most recent work exploring the biodiversity of southern Africa , and running is her passion, whether on the road or the trail
Shelmith Kariuki | rKenyaCensus Package | RStudio
The rKenyaCensus package contains the results of the 2019 Kenya Population Census. The census exercise was carried out in August 2019, and the results were released in February 2020. Kenya leveraged technology to capture data during cartographic mapping, enumeration and data transmission, making the 2019 Census the first paperless census to be conducted in Kenya. The data was published in four different PDF files (Volume 1 - Volume 4), which can be found on the Kenya National Bureau of Statistics website. The data in its current form was open and accessible, but not usable, so there was a need to convert it into a machine-readable format. This data can be used by the government, non-governmental organizations and any other entities for data-driven policy making and development. During the talk, I will explain the reasons behind the development of the package, take you through the steps I took during the process and finally showcase analysis of certain aspects of the data.
About Shelmith: Shelmith Kariuki is a Senior Data Analyst based in Nairobi, Kenya. She is an RStudio Certified Tidyverse trainer (https://education.rstudio.com/trainers/) , currently working as a Data Analytics consultant with UN DESA. She previously worked as a Research Manager at Geopoll, and as a Data Analyst at Busara Center for Behavioral Economics. She also worked as an assistant lecturer in various Kenyan universities, teaching units in Statistics and Actuarial Science. She has extensive experience in data analysis using R. She co-organizes a community of R users in Nairobi (https://www.linkedin.com/feed/hashtag/nairobir/ ) and in Africa (https://twitter.com/AfricaRUsers) . One of the missions of her community work is to make sure that there is an increased number of R adopters, in Africa. She is very passionate about training and using data analytics to drive development projects in Africa
Ahmadou Dicko | Humanitarian Data Science with R | RStudio
Humanitarian actors are increasingly using data to drive their decisions. Since the Haiti 2010 earthquake, the volume of data collected and used by humanitarians has been growing exponentially and organizations are now relying on data specialists to turn all this data into life-saving data products.
These data products are created by teams using proprietary point and click software. The process from the raw data to the final data product involves a lot of clicking, copying and pasting and is usually not reproducible.
Another approach to humanitarian data science is possible using R. In this talk, I will show how to seamlessly develop reproducible, reusable humanitarian data products using the tidyverse, rmarkdown and some domain-focused R packages.
About Ahmadou: Ahmadou Dicko is a statistics and data analysis officer at the United Nations High Commissioner for Refugees (UNHCR) where he uses statistics and data science to help safeguard the rights and well-being of refugees in West and Central Africa. He has an extensive experience in the use of statistics and data science in development and humanitarian projects. Ahmadou was the lead of the OCHA Center for Humanitarian Data team for West and Central Africa and has worked with several humanitarian and development organizations such as IFRC, FAO, IAEA, OCHA. Ahmadou is a RStudio trainer (https://education.rstudio.com/trainers/ ) and he is passionate about the R community. He is currently co-organizing the Dakar R User Group (https://www.meetup.com/DakaR-R-User-Group/ ) and co-leading the AfricaR initiative (https://africa-r.org/ )
Tracy Teal | Teaching R using inclusive pedagogy: Carpentries workshops lessons learned | RStudio
Talk from rstudio::conf(2019)
The Carpentries is an open, global community teaching researchers the skills to turn data into knowledge. Since 2012 we have taught 700+ R workshops & trained 1600+ volunteer instructors. Our workshops use evidence-based teaching, focus on foundational and relevant skills and create an inclusive environment. Teaching the Tidyverse allows learners to start working with data quickly, and keeps them motivated to begin and sustain their learning. Our assessments show that these approaches have been successful in attracting diverse learners, building confidence & increasing coding usage. Through our train-the-trainer model and open, collaborative lessons, this approach scales globally to reach more learners and further democratize data.
About Tracy Teal: Executive Director of The Carpentries (https://carpentries.org ) and co-founder of Data Carpentry (http://www.datacarpentry.org ), a non-profit organization that develops and runs workshops training researchers in effective data analysis and visualization to enable data-driven discovery. Manages projects, operations and finances. Leads lesson development and volunteer coordination and is responsible for strategic and business planning
Amelia McNamara | Working with categorical data in R without losing your mind | RStudio (2019)
Categorical data, called “factor” data in R, presents unique challenges in data wrangling. R users often look down at tools like Excel for automatically coercing variables to incorrect datatypes, but factor data in R can produce very similar issues. The stringsAsFactors=HELLNO movement and standard tidyverse defaults have moved us away from the use of factors, but they are sometimes still necessary for analysis. This talk will outline common problems arising from categorical variable transformations in R, and show strategies to avoid them, using both base R and the tidyverse (particularly, dplyr and forcats functions).
VIEW MATERIALS http://www.amelia.mn/WranglingCats.pdf
(related paper from the DSS collection) http://bitly.com/WranglingCats https://peerj.com/collections/50-practicaldatascistats/
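A classic pitfall from the talk's territory, sketched in base R (the forcats equivalent is noted in a comment):

```r
# A factor prints like its labels but stores integer level codes underneath.
f <- factor(c("10", "40", "20"))

as.numeric(f)               # level codes 1 3 2 -- almost never what you want
as.numeric(as.character(f)) # the actual values 10 40 20

# Reordering levels, base R vs the forcats equivalent:
factor(f, levels = c("40", "20", "10"))
# forcats::fct_relevel(f, "40", "20", "10")
```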
About the Author Amelia McNamara My work is focused on creating better tools for novices to use for data analysis. I have a theory about what the future of statistical programming should look like, and am working on next steps toward those tools. For more on that, see my dissertation. My research interests include statistics education, statistical computing, data visualization, and spatial statistics. At the moment, I am very interested in the effects of parameter choices on data analysis, particularly data visualizations. My collaborator Aran Lunzer and I have produced an interactive essay on histograms, and an initial foray into the effects of spatial aggregation. I talked more about spatial aggregation in my 2017 OpenVisConf talk, How Spatial Polygons Shape Our World
Jim Hester | It depends: A dialog about dependencies | RStudio (2019)
Software dependencies can often be a double-edged sword. On one hand, they let you take advantage of others’ work, giving your software marvelous new features and reducing bugs. On the other hand, they can change, causing your software to break unexpectedly and increasing your maintenance burden. These problems occur everywhere, in R scripts, R packages, Shiny applications and deployed ML pipelines. So when should you take a dependency and when should you avoid them? Well, it depends! This talk will show ways to weigh the pros and cons of a given dependency and provide tools for calculating the weights for your project. It will also provide strategies for dealing with dependency changes, and if needed, removing them. We will demonstrate these techniques with some real-life cases from packages in the tidyverse and r-lib.
VIEW MATERIALS https://speakerdeck.com/jimhester/it-depends
About the Author Jim Hester Jim is a software engineer at RStudio working with Hadley to build better tools for data science. He is the author of a number of R packages including lintr and covr, tools to provide code linting and test coverage for R
Jenny Bryan | Lazy evaluation | RStudio (2019)
The “tidy eval” framework is implemented in the rlang package and is rolling out in packages across the tidyverse and beyond. There is a lively conversation these days, as people come to terms with tidy eval and share their struggles and successes with the community. Why is this such a big deal? For starters, never before have so many people engaged with R’s lazy evaluation model and been encouraged and/or required to manipulate it. I’ll cover some background fundamentals that provide the rationale for tidy eval and that equip you to get the most from other talks.
VIEW MATERIALS https://github.com/jennybc/tidy-eval-context#readme
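The lazy evaluation fundamentals the talk builds on can be seen in plain base R:

```r
# Arguments are promises: captured unevaluated, forced only on first use.
f <- function(x) "x was never touched"
f(stop("boom"))        # no error -- the promise is never forced

# substitute() lets a function see the expression, not the value:
g <- function(x) deparse(substitute(x))
g(height + weight)     # "height + weight", even though neither variable exists

# Tidy eval builds on this model: dplyr verbs quote your expressions and
# evaluate them later, in the context of the data frame.
```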
About the Author Jenny Bryan Jenny is a recovering biostatistician who takes special delight in eliminating the small agonies of data analysis. She’s part of Hadley’s team, working on R packages and integrating them into fluid workflows. She’s been working in R/S for over 20 years, serves in the leadership of rOpenSci and Forwards, and is an Ordinary Member of the R Foundation. Jenny is an Associate Professor of Statistics (on leave) at the University of British Columbia, where she created the course STAT 545

Jesse Sadler | Learning and using the tidyverse for historical research | RStudio (2019)
My talk will discuss how R, the tidyverse, and the community around R helped me to learn to code and create my first R package. My positive experiences with the resources for learning R and the community itself led me to create a blog detailing my experiences with R as a way to pass along the knowledge that I gained. The next step was to develop my first package. The debkeepr package integrates non-decimal monetary systems of pounds, shillings, and pence into R, making it possible to accurately analyze and visualize historical account books. It is my hope that debkeepr can help bring to light crucial and interesting social interactions that are buried in economic manuscripts, making these stories accessible to a wider audience.
VIEW MATERIALS https://github.com/jessesadler/rstudioconf-2019-slides
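The non-decimal arithmetic debkeepr handles can be sketched with two hypothetical helpers (debkeepr's actual API differs; this just shows the carrying problem):

```r
# 1 pound = 20 shillings; 1 shilling = 12 pence.
lsd_to_pence <- function(l, s, d) l * 240 + s * 12 + d

pence_to_lsd <- function(p) {
  c(pounds    = p %/% 240,
    shillings = (p %% 240) %/% 12,
    pence     = p %% 12)
}

# Adding 5 pounds 10s. 6d. and 2 pounds 15s. 9d. carries through both bases:
pence_to_lsd(lsd_to_pence(5, 10, 6) + lsd_to_pence(2, 15, 9))
#> 8 pounds, 6 shillings, 3 pence
```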
About the Author Jesse Sadler I am an early modern historian interested in the social and familial basis of politics, religion, and trade. I received a Ph.D. in European History from UCLA in 2015 and have taught courses on cultural and intellectual history of early modern Europe and the Atlantic. My research investigates the familial basis of the early modern capitalism through archival research on two mercantile families from Antwerp at the end of the sixteenth and beginning of the seventeenth century. I am currently working on a manuscript that argues for the significance of sibling relationships and inheritance in the development of early modern trade. My manuscript places concepts such as patriarchy, emotion, exile, and friendship at the heart of the efficacy of long-distance trade networks and the growth of capitalism
Edzer Pebesma | Spatial data science in the Tidyverse | RStudio (2019)
Package sf (simple feature) and ggplot2::geom_sf have caused a fast uptake of tidy spatial data analysis by data scientists. Important spatial data science challenges are not handled by them, including raster and vector data cubes (e.g. socio-economic time series, satellite imagery, weather forecast or climate predictions data), and out-of-memory datasets. Powerful methods to analyse such datasets have been developed in packages stars (spatiotemporal tidy arrays) and tidync (tidy analysis of NetCDF files). This talk discusses how the simple feature and tidy data frameworks are extended to handle these challenging data types, and shows how R can be used for out-of-memory spatial and spatiotemporal datasets using tidy concepts.
VIEW MATERIALS https://edzer.github.io/rstudio_conf/2019/index.html
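The tidy spatial entry point the talk starts from looks like this; the North Carolina shapefile ships with sf, so this sketch assumes only that sf and ggplot2 are installed:

```r
library(sf)
library(ggplot2)

# An sf object is a data frame with one row per feature and a `geometry`
# list-column, so dplyr verbs and ggplot2 work on it directly.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

ggplot(nc) +
  geom_sf(aes(fill = AREA)) +
  labs(title = "North Carolina counties", fill = "Area")
```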
About the Author Edzer Pebesma I lead the spatio-temporal modelling laboratory at the institute for geoinformatics. I hold a PhD in geosciences, and am interested in spatial statistics, environmental modelling, geoinformatics and GI Science, semantic technology for spatial analysis, optimizing environmental monitoring, but also in e-Science and reproducible research. I am an ordinary member of the R foundation. I am one of the authors of Applied Spatial Data Analysis with R (second edition), am Co-Editor-in-Chief for the Journal of Statistical Software, and associate editor for Spatial Statistics. I believe that research is useful in particular when it helps solving real-world problems
Irene Steves | Teaching data science with puzzles | RStudio (2019)
Of the many coding puzzles on the web, few focus on the programming skills needed for handling untidy data. During my summer internship at RStudio, I worked with Jenny Bryan to develop a series of data science puzzles known as the “Tidies of March.” These puzzles isolate data wrangling tasks into bite-sized pieces to nurture core data science skills such as importing, reshaping, and summarizing data. We also provide access to puzzles and puzzle data directly in R through an accompanying Tidies of March package. I will show how this package models best practices for both data wrangling and project management.
VIEW MATERIALS https://github.com/isteves/ds-puzzles
About the Author Irene Steves This summer I was an intern at RStudio, where I worked with Jenny Bryan to develop a series of coding challenges to cultivate and reward the mastery of R and the tidyverse. I was previously a Data Science Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS), where I reviewed data submissions to a national repository for completion, clarity, and data management best practices. As a fellow, I also collaborated on a number of open science projects to improve access to Ecological Metadata Language (EML) and datasets in the DataONE network (see metajam, dataspice)

Reproducible Examples with the reprex package
Reproducible Examples and the reprex package.
https://speakerdeck.com/jennybc/reprex-reproducible-examples-with-r
Jump to: 0:08 Intro 0:40 Basic usage of reprex 3:35 Motivation, why use reprex? “Help me help you”
4:08 Define reprex? Three common ways to use the term:
- noun: a reproducible example
- the reprex package: a tool to build reprexes in R
- reprex::reprex(): a function in reprex to make a reprex
5:26 When should you use a reprex?
6:14 reprex installation and setup. How do you actually get reprex on your machine? 7:59 Advanced setup and discussion. 9:45 Please use advanced features responsibly.
11:02 Why does the reprex package exist? Anyone who has helped teach R or dealt with GitHub issues, Twitter, Stack Overflow & RStudio Community questions knows that helping people diagnose their coding problems can be hard. This tool comes from hard-won experience. Its aim is to help people ask well-formed questions and increase the chances of getting well-formed answers quickly.
12:52 philosophy behind reprex
- code that I can run
- code that I don’t need to run
- code that I can easily run
13:52 code that I can run.
17:25 Tips on writing good reprexs. Dos and don’ts.
18:52 How do I get my data into my reprex?
Getting small data and CSV type data into your reprex is easy.
You might think, “I have a big hairy data object and I can only show the problem by using it”, but that’s not always the case.
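The standard trick for shipping data inside a reprex is dput(), which prints an expression that recreates the object, so no files need to be attached:

```r
# dput() turns an object into code that rebuilds it exactly.
dput(head(iris, 2))

# For bigger objects, cut them down first -- a reprex rarely needs all rows
# or all columns:
mini <- head(iris[, c("Sepal.Length", "Species")], 4)
dput(mini)

# Your reader pastes the printed structure(...) call into their session
# and has the same object you do.
```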
21:02 code that I don’t need to run. reprex gives your reader the code and reveals the output produced by that code. For experienced coders, that might be enough to help you.
22:44 code that I can easily run. Don’t copy and paste from the R console. This is usually annoying for your reader. Worse than console copy-pasta is the screenshot. (Many people think screenshots of code are downright offensive.)
25:03 reprex_clean If you copy someone else’s reprex into your console, it may include their output, making your new reprex untidy. Here are tips for taking someone else’s reprex code and output and creating a clean reprex reply.
25:54 shock and awe More interesting features of the reprex package.
- 26:29 What about figures and plots in your reprex? So happy you asked about that. reprex will automatically upload your images to imgur.com.
- 28:23 Create a reprex by explicitly providing your code in the reprex call.
- 29:00 when you need your reprex to work in the current working directory.
- 30:45 Differently flavored markdown. Optimize your reprex markdown output for github, stack overflow, or the RStudio community.
- 30:31 Make your reprex create an R script, with your reprex outputs as comments. This is handy for pasting into an email or slack-type-app.
- 32:25 Rich text format, rtf output. (currently experimental feature as of this video)
- 33:06 Suppress the reprex ad at the bottom of your reprex.
- 33:19 Include session info.
- 33:54 Auto styling of your code. Good if you’re dealing with poorly formatted code.
- 34:25 Change your comments string.
- 34:32 Silence Tidyverse startup messages.
- 35:00 Capture a reprex that sends messages to standard output and standard error (e.g. package installation compilation messages).
36:13 Set up personal defaults for your reprex usage.
36:54 reprex RStudio addins; render reprex and reprex selection. These accelerate your use of reprex.
39:01 The human side of reproducible examples. How to ask questions in ways that are most likely to get answered. Sorry for the tough love, but this is important. Why are you always asked to give a reprex?
- Experts try to use reproducible examples to ensure their advice works.
- Making a good reprex is hard. But, you are asking them to solve a problem for you, so meet them halfway.
- Creating reprexes is good coding practice.
- Making a good reprex is often a good way to debug your issue in the embarrassment-free privacy of your own home.
- reprexes lead to discussions more likely to help people in the future.
44:34 Behind the scenes of reprex
44:44 Thanks to those who helped make reprex possible.
Questions and Answers
- 46:05 can reprex capture variables and objects in the current environment? (not yet, maybe in development)
- 47:25 does reprex actually check that the code is self contained? (self contained)
- 48:08 does readr::read_csv support the text argument? (yep, just read the help manual for readr)
Shiny Train-the-Trainer Workshop - rstudio::conf(2019L)
What is the 2-day Shiny Train-the-Trainer Workshop? That’s a great question, I’m glad you asked.
Register at https://rstd.io/conf Learn more at https://rstd.io/conf-agenda
Shiny Train-the-Trainer Certification Workshop - 2 Day
- Day 1 of the course will be co-taught by Mine Cetinkaya-Rundel and Garrett Grolemund, RStudio Data Scientists and Professional Educators.
- On Day 2, Mine will teach the Shiny track and Garrett will teach the Tidyverse track.
This two-day workshop will equip you to teach R effectively. We will draw on RStudio’s experience teaching R to recommend tips for designing, teaching, and supporting short R courses.
On Day 1 of the course, you will learn practical activities that you can use immediately to improve your presentation style, learning outcomes, and student engagement. You will leave the class with a cognitive model of learning that you can use to develop your own effective workshops or courses within your organization. The course will also cover how to use RStudio Cloud and its curriculum of tutorials to jump-start your own lessons.
On Day 2 of the course, participants will have the option to choose one of two tracks: Teaching the Tidyverse or Teaching Shiny.
- Teaching Shiny: Classroom examples will focus on teaching Shiny at the beginner and intermediate levels. The course materials will build on RStudio’s Mastering Shiny workshop as well as the upcoming book from the author of the Shiny package, Joe Cheng, and they will cover the entire lifecycle of a Shiny app: build → improve → share. Participants will receive the course materials for teaching Mastering Shiny. You should take this workshop if you work for a training partner and want to qualify as an RStudio Certified Shiny Instructor or if you are an advocate for R in your organization. You should be proficient in Shiny already and be prepared to submit examples of your work. Prior teaching experience is helpful, but not required. Please bring a laptop and a device that has video recording capabilities (such as a laptop or cell phone).
Instructors: Garrett Grolemund, Mine Çetinkaya-Rundel


Tidyverse Train-the-Trainer Certification Workshop - rstudio::conf(2019L)
What is the 2-day Tidyverse Train-the-Trainer Workshop? That’s a great question, I’m glad you asked.
Register at https://rstd.io/conf Learn more at https://rstd.io/conf-agenda
Tidyverse Train-the-Trainer Certification Workshop - 2 Days
- Day 1 of the course will be co-taught by Mine Cetinkaya-Rundel and Garrett Grolemund, RStudio Data Scientists and Professional Educators.
- On Day 2, Mine will teach the Shiny track and Garrett will teach the Tidyverse track.
This two-day workshop will equip you to teach R effectively. We will draw on RStudio’s experience teaching R to recommend tips for designing, teaching, and supporting short R courses.
On Day 1 of the course, you will learn practical activities that you can use immediately to improve your presentation style, learning outcomes, and student engagement. You will leave the class with a cognitive model of learning that you can use to develop your own effective workshops or courses within your organization. The course will also cover how to use RStudio Cloud and its curriculum of tutorials to jump-start your own lessons.
On Day 2 of the course, participants will have the option to choose one of two tracks: Teaching the Tidyverse or Teaching Shiny.
- Teaching the Tidyverse: Classroom examples will focus on how to teach students to do data analysis with the Tidyverse. We will use Master the Tidyverse, which is an award-winning two-day workshop developed by RStudio, as an example. Participants will receive the course materials for teaching Master the Tidyverse. You should take this workshop if you work for a training partner and want to qualify as an RStudio Certified Tidyverse Instructor or if you are an advocate for R in your organization. You should be proficient in the Tidyverse already and be prepared to submit examples of your work. Prior teaching experience is helpful, but not required. Please bring a laptop and a device that has video recording capabilities (such as a laptop or cell phone).
Instructors: Garrett Grolemund, Mine Çetinkaya-Rundel

Data Manipulation Tools: dplyr – Pt 3 Intro to the Grammar of Data Manipulation with R
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
dplyr docs: dplyr.tidyverse.org/reference/
- http://dplyr.tidyverse.org/reference/union.html
- http://dplyr.tidyverse.org/reference/intersect.html
- http://dplyr.tidyverse.org/reference/setdiff.html
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered; Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 tidyr: “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
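The verbs listed in the Pt. 3 chapter markers compose with the pipe into a single readable chain. A minimal sketch using the built-in mtcars dataset (the dataset, columns, and the `kpl` conversion are illustrative choices, not necessarily those used in the video):

```r
library(dplyr)

mtcars %>%
  select(mpg, cyl, hp) %>%          # keep only the columns we need
  filter(hp > 100) %>%              # keep rows with more than 100 horsepower
  mutate(kpl = mpg * 0.425) %>%     # add a km-per-litre column
  group_by(cyl) %>%                 # compute summaries per cylinder count
  summarise(mean_kpl = mean(kpl)) %>%
  arrange(desc(mean_kpl))           # most fuel-efficient groups first
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the pipe thread them together in reading order.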
Tidy Data and tidyr – Pt 2 Intro to Data Wrangling with R and the Tidyverse
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
http://tidyr.tidyverse.org/reference/
- http://tidyr.tidyverse.org/reference/gather.html
- http://tidyr.tidyverse.org/reference/spread.html
- http://tidyr.tidyverse.org/reference/unite.html
- http://tidyr.tidyverse.org/reference/separate.html
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered; Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 tidyr: “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
- 15:00 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered; Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 tidyr: “Tidy” Data introduced and motivated
- 08:15 tidyr::gather
- 12:38 tidyr::spread
- 15:30 tidyr::unite
- 15:30 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
- 15:00 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
New York Times, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, by Steve Lohr, Aug. 17, 2014: https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
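Two of the Pt. 1 topics, tibbles and the pipe operator, can be shown in a few lines. A minimal sketch (the toy data is invented for illustration; `%>%` comes from magrittr and is re-exported by dplyr):

```r
library(dplyr)

# A tibble is a modern data frame with predictable printing and subsetting
d <- tibble(x = 1:5)

# The pipe passes the left-hand value as the first argument of the next
# call, so a chain reads top-to-bottom instead of inside-out:
sqrt(sum(d$x))            # nested form
d$x %>% sum() %>% sqrt()  # piped form; same result
```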
Working with Two Datasets: Binds, Set Operations, and Joins – Pt 4 Intro to Data Manipulation
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
dplyr docs: dplyr.tidyverse.org/reference/
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered; Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 tidyr: “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
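The three families of two-table operations in the Pt. 4 chapter markers behave quite differently: binds stack data positionally, set operations treat rows as set elements, and joins match rows by key. A minimal sketch on two invented toy tables:

```r
library(dplyr)

a <- tibble(id = c(1, 2), x = c("a", "b"))
b <- tibble(id = c(2, 3), y = c("B", "C"))

bind_rows(a, a)              # stack rows; columns are matched by name
union(a, a)                  # set operation: distinct rows across both inputs

left_join(a, b, by = "id")   # all rows of a, plus matching columns of b
inner_join(a, b, by = "id")  # only ids present in both tables
full_join(a, b, by = "id")   # all ids from either table
```

The `by = "id"` argument names the key column; without it, the joins match on all shared column names.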
The Tidyverse and RStudio Connect | RStudio Webinar - 2017
This is a recording of an RStudio webinar. You can subscribe to receive invitations to future webinars at https://www.rstudio.com/resources/webinars/ . We try to host a couple each month with the goal of furthering the R community’s understanding of R and RStudio’s capabilities.
We are always interested in receiving feedback, so please don’t hesitate to comment or reach out with a personal message.



