tidyverse
Easily install and load packages from the tidyverse
The tidyverse package provides a single command to install and load a collection of R packages that share common data structures and design principles. It bundles core packages for data analysis workflows, including tools for visualization (ggplot2), manipulation (dplyr), tidying (tidyr), import (readr), and functional programming (purrr).
The package solves the problem of managing multiple dependencies by loading nine core packages at once and providing utilities to check for package conflicts and updates. It also installs additional packages for working with specific data types (dates, times, factors, strings) and importing data from various sources (Excel, SPSS, JSON, web APIs). The shared API design across all tidyverse packages means they work together seamlessly without requiring different syntax or data structure conversions.
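As a sketch of that workflow, the one-line install and load plus the bundled conflict and update utilities look like this (the package list reflects tidyverse 2.0's documented core set):

```r
# One command installs the whole collection:
# install.packages("tidyverse")

# One command attaches the nine core packages
# (ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, lubridate):
library(tidyverse)

# Utilities for managing the bundle:
tidyverse_conflicts()  # list functions masked by other attached packages
tidyverse_update()     # report which tidyverse packages are out of date

# Non-core packages are installed but must be attached explicitly,
# e.g. readxl for Excel import:
library(readxl)
```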
Resources featuring tidyverse
Coding vs. thinking programmatically | Samia Baig | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
This week’s guest was Samia Baig, Senior Data Scientist/Data Engineer at Johnson & Johnson Innovative Medicine!
Some topics covered in this week’s Hangout were transitioning from a background in pharmacy and public health to a data career in pharma, distinguishing the responsibilities of data scientists versus analytics engineers, strategies for making data pipelines more robust (and convincing your team that you NEED robust pipelines in the first place), and the value of joining open-source communities like Tidy Tuesday.
Resources mentioned in the video and chat:
- Posit Data Science Lab → https://pos.it/dslab
- Tidy Tuesday GitHub Repository → https://github.com/rfordatascience/tidytuesday
- {dbplyr} → https://dbplyr.tidyverse.org/
- The Missing Semester of Your CS Education → https://missing.csail.mit.edu/
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh The Lab: https://pos.it/dslab LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps
00:00 Introduction
03:22 “Do you feel like analytics engineer is a good descriptor for what you do?”
04:44 “How did you get into data from being on a pharmacist’s job path?”
11:55 “What was it like in the move that you made from public health to pharma?”
16:57 “What do you say are distinguishing factors between data science and engineering?”
20:16 “What are the most popular tools that you and your team use, in your job at J&J?”
24:00 “What do you use SQL in?”
27:40 “How would you go about convincing a team of the need for a more robust pipeline?”
31:10 “Can you define robust?”
33:31 “Do you happen to have any specific resources or strategies or examples that might help students or others with that mindset of thinking programmatically?”
37:06 “Are there any non data science skills that are very helpful in your either current or former job?”
40:23 “Is there any kind of community among data scientists across the whole company?”
45:44 “What are your biggest data challenges that you have?”
46:12 “If you had a magic wand, what problem would you solve in that area?”
49:52 “What is a piece of career advice that maybe you wish you could go back in time and give yourself?”
Data analysis with Posit AI-assistants | Sara Altman & Simon Couch | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Sara Altman, who walks through using Posit’s AI assistants to analyze data, including a sneak peek at Posit Assistant, and Simon Couch drops by to give us a demo of the reviewer package! Together, Sara and Simon author the Posit AI Newsletter, the best place to stay up to date on new tools and to find advice on being an informed, level-headed AI user.
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Sara Altman, Simon Couch
Sara’s Bluesky: https://bsky.app/profile/sara-altman.bsky.social Sara’s LinkedIn: https://www.linkedin.com/in/sarakaltman/ Sara’s GitHub: https://github.com/skaltman Posit AI Newsletter by Sara and Simon: https://posit.co/blog/?category=roundups
Resources from the hosts and chat:
- Positron IDE → https://positron.posit.co/
- Databot Extension → https://positron.posit.co/databot.html
- Getting started with Positron Assistant → https://positron.posit.co/assistant-getting-started.html
- Posit Assistant (Private Beta) → https://posit-ai-beta.share.connect.posit.cloud/
- Reviewer Package (by Simon Couch) → https://github.com/simonpcouch/reviewer
- ellmer Package → https://ellmer.tidyverse.org/
- chatlas Package → https://github.com/posit-dev/chatlas
- Read the Posit AI Newsletter → https://posit.co/blog/?category=roundups
- Sign up to get the Posit AI Newsletter → http://pos.it/ai-news
- Simon’s blog post about local LLMs not quite being ready for primetime → https://posit.co/blog/local-models-are-not-there-yet/
- Join the waitlist for Posit AI in RStudio → https://posit.co/products/ai/
- Posit AI Known Issues & FAQs → https://posit-ai-beta.share.connect.posit.cloud/#frequently-asked-questions-faqs
- Blog post from Simon and Sara about privacy and LLMs → https://posit.co/blog/trust-llm-tools/
- DS Lab YouTube playlist → https://youtube.com/playlist?list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&si=7tmU6EAJpO5S7GBh
Timestamps
00:00 Introduction
07:23 “Would you mind real quick just briefly explaining the differences between Positron Assistant and Databot?”
15:01 “Is there any way to configure reasoning efforts when signing in with GitHub Copilot?”
15:49 “Does Databot already support other providers beyond Claude?”
20:36 “What are the cases with monetary penalty in the console output?”
22:14 “Do you happen to know if the column names of the dataset are very, very messy?”
23:18 “Can you add skills to Databot?”
26:36 “This code isn’t being saved anywhere. So where does it go?”
27:38 “Is there a way to know what all the slash commands are?”
28:51 Requesting Databot to use the namespace operator
33:58 “Is there a way to search within that Databot pane?”
39:34 “Have you noticed any time differences with how quickly things run in RStudio versus Positron?”
40:33 “What happens if you open that URL that it mentions at the bottom in your browser?”
40:50 Clarifying the difference between Posit Assistant and Positron Assistant
43:18 “What is the typical token burn rate?”
53:31 “Is this on CRAN and working in both Positron and RStudio?”

The mall package: using LLMs with data frames in R & Python | Edgar Ruiz | Data Science Lab
On this call, Libby Heeren is joined by Edgar Ruiz as they walk through how mall works (with ellmer) in R, and then in Python. The mall package lets you use LLMs to process tabular data or vectors, letting you do things such as feeding it a column of reviews and asking mall to use an Anthropic model via ellmer to add a column of summaries or sentiments. Follow along with the code here: https://github.com/LibbyHeeren/mall-package-r
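That pattern looks roughly like the following sketch (the model name and sample data here are assumptions, not Edgar's exact code; it assumes Ollama is running locally):

```r
library(mall)

# Tell mall which model to use. This assumes a local Ollama install with a
# "llama3.2" model pulled; an ellmer chat object can be supplied instead to
# use a hosted provider such as Anthropic.
llm_use("ollama", "llama3.2", seed = 100)

reviews <- data.frame(
  review = c(
    "This works great, highly recommend!",
    "Broke after two days. Very disappointed."
  )
)

# Each llm_*() verb appends a new LLM-generated column to the data frame.
reviews |>
  llm_sentiment(review) |>
  llm_summarize(review, max_words = 5)
```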
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Edgar Ruiz
Edgar’s Bluesky: https://bsky.app/profile/theotheredgar.bsky.social Edgar’s LinkedIn: https://www.linkedin.com/in/edgararuiz/ Edgar’s GitHub: https://github.com/edgararuiz
Resources from the hosts and chat:
- Ollama → https://ollama.com/download
- Posit Data Science Lab → https://posit.co/dslab
- mall package → https://mlverse.github.io/mall/
- ellmer package → https://ellmer.tidyverse.org/
- Libby’s Positron theme (Catppuccin) → https://marketplace.visualstudio.com/items?itemName=Catppuccin.catppuccin-vsc
- GitHub repo with Libby and Edgar’s code → https://github.com/LibbyHeeren/mall-package-r
- LLM providers supported by ellmer → https://ellmer.tidyverse.org/index.html#providers
- vitals package → https://vitals.tidyverse.org/
- chatlas package → https://posit-dev.github.io/chatlas/
- polars package → https://pola.rs/
- narwhals package → https://narwhals-dev.github.io/narwhals/
- pandas package → https://pandas.pydata.org/
- LM Studio → https://lmstudio.ai/
- Simon Couch’s blog → https://www.simonpcouch.com/
- Edgar’s dataset: TidyTuesday Animal Crossing Dataset (May 5, 2020) → https://github.com/rfordatascience/tidytuesday
- Libby’s dataset: Kaggle Tweets Dataset → https://www.kaggle.com/datasets/mmmarchetti/tweets-dataset
- Blog from Sara and Simon on evaluating LLMs → https://posit.co/blog/r-llm-evaluation-03/
- Data Science Lab YouTube playlist → https://www.youtube.com/watch?v=LDHGENv1NP4&list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&index=2
- AWS Bedrock → https://aws.amazon.com/bedrock/
- Anthropic → https://www.anthropic.com/
- Google Gemini → https://gemini.google.com/
- What is rubber duck debugging anyway?? → https://en.wikipedia.org/wiki/Rubber_duck_debugging
Timestamps
00:00 Introduction to Libby, Isabella, Edgar, and the mall package + ellmer package
07:14 “What’s the difference between using mall for these NLP tasks versus traditional or classical NLP?”
09:37 “Can mall be used with a local LLM?”
17:32 “What kind of laptop specs should I realistically have to make good use of these models?”
22:12 “Are you limited to three output options?”
22:55 “Can mall return the prediction probabilities?”
24:14 “What are a rule of thumb set of specs for a machine so local LLMs are practically feasible?”
24:47 “Would that be in the additional prompt area where you’re defining things?”
25:04 “You could use the vitals package to compare models, right?”
25:24 “Can we use LM Studio instead of Ollama?”
28:35 “How do you iterate and validate the model?”
36:39 “Why use paste if it is all text?”
37:31 “Are these recent tweets (from X) or older ones from actual Twitter?”
40:23 “Is there a playlist for the Data Science Labs on YouTube?”
46:11 “Does that mean that the Python version does not work with pandas?”
50:14 “Where is this data set from?”


Getting Started with LLM APIs in R - Sara Altman
Abstract: LLMs are transforming how we write code, build tools, and analyze data, but getting started with LLM APIs directly can feel daunting. This workshop introduces participants to programming with LLM APIs in R using ellmer, an open-source package that makes it easy to work with LLMs from R. We’ll cover the basics of calling LLMs from R, as well as system prompt design, tool calling, and building basic chatbots. No AI or machine learning background is required—just basic R familiarity. Participants will leave with example scripts they can adapt to their own projects.
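The basics of calling an LLM with ellmer look roughly like this (a minimal sketch assuming an Anthropic API key is set in your environment; the system prompt and question are illustrative):

```r
library(ellmer)

# Create a chat object with a system prompt; chat_openai(),
# chat_google_gemini(), etc. are drop-in alternatives.
chat <- chat_anthropic(
  system_prompt = "You are a terse assistant for R programmers."
)

# Send a prompt and print the model's streamed reply.
chat$chat("What does purrr::map_dbl() return?")

# Open a minimal browser-based chatbot backed by the same chat object.
live_browser(chat)
```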
Resources mentioned in the workshop:
- Workshop site: https://skaltman.github.io/r-pharma-llm/
- ellmer documentation: https://ellmer.tidyverse.org/
- shinychat documentation: https://posit-dev.github.io/shinychat/
Inspecting websites to find JSON data APIs | Marcos Huerta | Data Science Lab
On this call, Libby Heeren is joined by Marcos Huerta, a Data Science Manager at CarMax, as he walks us through the guts of websites looking for data we can play with. He shows us how to find hidden REST/JSON APIs using the web inspector in Safari/Firefox, and then how to get what’s necessary to pull the same data programmatically in Python or R.
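Once you have copied the request URL and headers out of the browser's Network tab, replicating the call from R might look like this sketch (the endpoint and headers here are hypothetical placeholders, not a real API):

```r
library(httr2)

# Hypothetical endpoint discovered in the browser's Network tab;
# substitute the real URL, query parameters, and headers you copied
# from the inspector.
resp <- request("https://example.com/api/v1/scores") |>
  req_url_query(gamePk = "777076") |>
  req_headers(
    `User-Agent` = "Mozilla/5.0",
    Accept = "application/json"
  ) |>
  req_perform()

# Parse the JSON body into ordinary R lists / data frames.
data <- resp_body_json(resp, simplifyVector = TRUE)
str(data)
```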
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen
Marcos’s urls: Website: https://marcoshuerta.com GitHub: https://github.com/astrowonk/
Resources from the hosts and from participants in the Discord chat:
- Postman: https://www.postman.com/
- Insomnia (open source alternative to Postman): https://insomnia.rest/
- Baseball Savant website Marcos is using: https://baseballsavant.mlb.com/gamefeed/?gamePk=777076
- Isabella Velasquez’s blog on using the {polite} R package to help scrape Wikipedia: https://ivelasq.rbind.io/blog/politely-scraping/
- Festivitas Mac app Marcos used to add the lights to his desktop: https://festivitas.app/
- Ted Laderas blog post on parsing JSON in R: https://laderast.github.io/intro_apis_json_cascadia/#/how-does-r-translate-json
- New rvest read_html_live() function: https://rvest.tidyverse.org/reference/read_html_live.html
- yyjsonr R package: https://github.com/coolbutuseless/yyjsonr
- tuber R package: https://github.com/gojiplus/tuber
- WikipediaR R package: https://www.quantargo.com/help/r/latest/packages/WikipediaR/1.1/WikipediaR-package
- rookiepy Python package: https://pypi.org/project/rookiepy/
Timestamps
00:00 Introduction
03:05 Web scraping vs. API calls
04:12 Server-side rendering vs. client-side JSON
06:12 Warning: Rate limits and business ethics (ahem)
08:39 Demo: Baseball Savant website
08:57 Using browser Developer Tools and the Network tab
12:15 “What is curl?”
13:30 Importing curl into Postman
16:03 Generating Python code from Postman
16:50 “Are there open source alternatives to Postman?”
17:50 Using the generated code in Python/Jupyter
22:28 R packages for JSON (jsonlite, yyjsonr)
25:09 Demo: Massachusetts Lottery website
28:17 Example: scripts Marcos automated with cron jobs
30:17 Handling logins and cookies with rookiepy
32:19 Demo: CNN Election Data
34:26 Inspecting ESPN’s website
36:58 “Can you scrape YouTube?”
38:19 Finding hidden JSON in CardsMania history
45:00 Benefits of API inspection over Beautiful Soup
46:59 New rvest function: read_html_live
50:40 Inspecting LinkedIn and finding GraphQL
53:58 Encouragement on handling API pagination
Open source development practices | Isabel Zimmerman & Davis Vaughan | Data Science Hangout
We were recently joined by Isabel Zimmerman and Davis Vaughan, Software Engineers at Posit, to chat about the life of an open source developer, strategies for navigating complex codebases, and how to leverage AI in data science workflows. Plus, NERDY BOOKS!
In this Hangout, we explore the differences between maintaining established ecosystems like the Tidyverse and building new tools like the Positron IDE. Davis and Isabel (and sometimes Libby) share practical advice for developers, such as the utility of AI for writing tests and “rubber ducking”, and their various approaches to writing accessible documentation that bridges the expert-novice gap.
Resources mentioned in the video and zoom chat:
- Positron IDE → https://posit.co/positron/
- Air (R formatter) → https://posit-dev.github.io/air/
- Python Packages Book (free) → https://py-pkgs.org/
- R Packages Book (free) → https://r-pkgs.org/
- DeepWiki (AI tool mentioned for docs) → https://deepwiki.com/tidyverse/vroom
If you didn’t join live, one great discussion you missed from the zoom chat was about Brandon Sanderson’s Cosmere books and the debate between starting with Mistborn vs. The Stormlight Archive. Are you a Cosmere fan?! Which book did you start with? (Libby started with Elantris years before picking up Mistborn Era 1 book 1, but she’d now recommend maybe starting with Warbreaker!)
Timestamps:
00:00 Introduction
04:41 “What does a day in the life of an open source dev look like?”
09:43 “What got you into building your own R packages?”
13:00 “Personal tips for working with code bases you’re not familiar with?”
16:35 “How much of what you build is in R/Python vs. lower-level languages?”
19:57 “Does Air work inside code chunks in Positron?”
20:12 “Changing the Python Quarto formatter in Positron without an extension”
22:56 “What do your side projects look like?”
26:40 “How do you approach writing documentation?”
30:55 “What interesting trends in data science are you noticing?”
33:38 “How do you leverage AI in your work?”
37:30 “What are the hexes on Davis’s back wall?”
38:50 “What career advice would you give to someone in a similar position?”
43:45 “How can I be more resilient when things go wrong?”
47:59 “Do you have keyboard preferences?”
49:25 “What is the best way to report bugs in packages?”
50:56 “Open source dev work vs. in-house dev work”
51:50 “Tips for getting started with Positron”


Advent of Code for R users | Emil Hvitfeldt | Data Science Lab
On this call, Libby Heeren is joined by Posit engineer Emil Hvitfeldt as he walks through Day 1 of Advent of Code 2025 using R. This is a super friendly, collaborative, and cheery intro to AoC! Don’t forget, you can do Advent of Code at any ole time of year.
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen, Emil Hvitfeldt
Emil’s socials and urls: website: https://emilhvitfeldt.com/ GitHub: https://github.com/emilhvitfeldt Bluesky: https://bsky.app/profile/emilhvitfeldt.bsky.social LinkedIn: https://www.linkedin.com/in/emilhvitfeldt/
Resources from the hosts and chat:
- Advent of Code: https://adventofcode.com/
- Install Positron: https://positron.posit.co/
- Eric Wastl, Advent of Code: Behind the Scenes: https://www.youtube.com/watch?v=_oNOTknRTSU
- AoC Subreddit: https://www.reddit.com/r/adventofcode/
- Kieran Healy shared a reddit post with an Advent of Code answer done in Minecraft: https://www.reddit.com/r/adventofcode/comments/1pbeyxx/2025_day_01_part_2_advent_of_code_in_minecraft/
- Emil’s Solutions: https://github.com/EmilHvitfeldt/rstats-adventofcode
- Emil’s helper package: https://github.com/EmilHvitfeldt/aocfuns
- purrr::accumulate() function: https://purrr.tidyverse.org/reference/accumulate.html
And, for anyone hangin’ in there at the end, Emil updated us on Discord that he figured out why his cumsum() didn’t work: he forgot to start the dial at 50! Once you fix that, it works to solve part 1 :)
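For the curious, the vectorized cumulative-sum approach discussed in the call can be sketched like this (variable names and the sample input are illustrative assumptions, not Emil's exact code):

```r
# Signed rotations parsed from the puzzle input (sample values for
# illustration; negative = left, positive = right).
moves <- c(-30, -20, 10, -10)

# Track the dial position after each instruction on a 0-99 circular dial.
# The bug mentioned above was omitting the starting position of 50.
positions <- (50 + cumsum(moves)) %% 100

# Part 1: count how many instructions leave the dial pointing at 0.
sum(positions == 0)  # two of these four sample instructions land on 0
```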
Timestamps
00:00 Introduction
01:01 Tour of the Advent of Code website
02:30 Dashboard overview and puzzle schedule
03:23 How to view and access previous years’ events
03:37 Structure of puzzles: Two parts and stars
04:40 Understanding the global leaderboard
05:08 “Does that ASCII art build itself?”
06:16 Setting up private leaderboards for friends
07:54 Starting Day 1: Story prompt and mechanics
09:30 Understanding unique puzzle inputs
10:51 Submission feedback and delay penalties
11:44 Safe dial logic: Left, Right, and circularity
12:50 Starting position and Part 1 success criteria
14:09 Setting up the project in Positron
16:26 Strategy for speed: Reading from the bottom up
18:49 Problem-solving strategies: Pen, paper, and visualization
19:22 Walking through the logic with a sample case
20:52 Coding Part 1: Data parsing and vectorization
23:17 Positron keyboard shortcuts for duplicating lines
24:40 Debugging the logic and handling negative numbers
26:03 Explaining the modulo operator (%%)
28:15 Managing large inputs of over 4,000 instructions
29:21 Submitting Part 1 and transitioning to Part 2
32:03 Part 2 challenge: Counting zero “clicks”
34:02 Brainstorming Part 2 code modifications
36:19 Checking important warnings for edge cases
37:00 Coding Part 2: Nested loops and incrementing counters
38:23 Hint: Modulo vs. integer division
40:40 Success with the Part 2 test case
42:30 Alternative method: Vectorized cumulative sums
45:29 “What’s the difference between % and %%?” (percent vs modulo)
46:50 Mathematical optimization to avoid inner loops

Practical AI for data science (Simon Couch)
Abstract: While most discourse about AI focuses on glamorous, ungrounded applications, data scientists spend most of their days tackling unglamorous problems in sensitive data. Integrated thoughtfully, LLMs are quite useful in practice for all sorts of everyday data science tasks, even when restricted to secure deployments that protect proprietary information. At Posit, our work on ellmer and related R packages has focused on enabling these practical uses. This talk will outline three practical AI use cases—structured data extraction, tool calling, and coding—and offer guidance on getting started with LLMs when your data and code are confidential.
Presented at the 2025 R/Pharma Conference Europe/US Track.
Resources mentioned in the presentation:
- {vitals}: Large Language Model Evaluations https://vitals.tidyverse.org/
- {mcptools}: Model Context Protocol for R https://posit-dev.github.io/mcptools/
- {btw}: A complete toolkit for connecting R and LLMs https://posit-dev.github.io/btw/
- {gander}: High-performance, low-friction Large Language Model chat for data scientists https://simonpcouch.github.io/gander/
- {chores}: A collection of large language model assistants https://simonpcouch.github.io/chores/
- {predictive}: A frontend for predictive modeling with tidymodels https://github.com/simonpcouch/predictive
- {kapa}: RAG-based search via the kapa.ai API https://github.com/simonpcouch/kapa
- Databot https://positron.posit.co/dat

Air: A blazingly fast R code formatter - Davis Vaughan, Lionel Henry
In Python, Rust, Go, and many other languages, code formatters are widely loved. They run on every save, on every pull request, and in git pre-commit hooks to ensure code consistently looks its best at all times.
In this talk, you’ll learn about Air, a new R code formatter. Air is extremely fast, capable of formatting individual files so quickly that you’ll question whether it’s even running, and of formatting entire projects in under a second. Air integrates directly with your favorite IDEs, like Positron, RStudio, and VS Code, and is available on the command line, making it easy to standardize on one tool even for teams using various IDEs.
Once you start using Air, you’ll never worry about code style ever again!
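The command-line usage is minimal; a sketch assuming the air binary is already installed and on your PATH (see the links below for installation):

```shell
# Format a single file in place.
air format R/model.R

# Format every R file in the current project.
air format .
```

Editor integrations (Positron, RStudio, VS Code) run the same formatter on save, so the CLI and the IDE produce identical output.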
https://www.tidyverse.org/blog/2025/02/air/
https://github.com/posit-dev/air


Supporting 100 Data Scientists with a Small Team | Mike Thomson | Data Science Hangout
We were recently joined by Mike Thomson, Data Science Manager at Flatiron Health, to chat about managing open source tools and maintaining R packages, creating reproducible reports for Word and Excel using Quarto, the “hub and spoke” support model for data scientists, and applying R and Posit tools in the Real World Evidence (RWE) oncology space.
In this Hangout, we explore creating reproducible outputs using Quarto for formats like Word and Excel. Flatiron Health uses Quarto because it allows the reproducible publication of analyses to multiple formats simultaneously (like HTML and a downloadable Word document) from the same source code. A specific challenge discussed was outputting formatted analytic tables to Excel, as this is not natively supported by Quarto. Erica Yim, from Mike’s team, detailed how they built an internal R function that uses the flexlsx package along with flextable to easily output pre-existing formatted tables from a Quarto document into an Excel template.
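The general flextable-to-Excel pattern can be sketched as follows. This is a minimal illustration of the flexlsx package's documented workflow, not Flatiron's internal function, and argument details may differ by package version:

```r
library(flextable)
library(openxlsx2)
library(flexlsx)  # provides wb_add_flextable()

# Build a formatted table once with flextable...
ft <- flextable(head(mtcars))

# ...then write that same formatted table into an Excel workbook,
# preserving the formatting rather than exporting raw cell values.
wb <- wb_workbook()
wb <- wb_add_worksheet(wb, "results")
wb <- wb_add_flextable(wb, "results", ft)
wb_save(wb, "results.xlsx")
```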
Resources mentioned in the video and zoom chat:
- flexlsx R package GitHub repository → https://github.com/pteridin/flexlsx
- dbplyr PR for Snowflake translations (contributed to by Flatiron Health) → https://github.com/tidyverse/dbplyr/pull/860
If you didn’t join live, one great discussion you missed from the zoom chat was about the pain points of exporting data from Quarto to Word or Excel, particularly concerning table formatting and styles. Attendees in the chat strongly highlighted the difficulty of managing table formatting, including issues with table cross-references, headers, and footers. They noted that dealing with styles often requires workarounds, such as creating flextables that match desired Word styles instead of relying on default table styles. Let us know below if you’d like to hear more about this topic!
Timestamps:
00:00 Introduction
02:23 “Can you talk about what Flatiron does and what your teams do?”
03:29 “Could you give us a few examples of the data types or collections that you might be working with?”
05:00 “Do you have longitudinal data?”
07:46 “Are you aware of any computer vision applications in the health care industry from your perspective?”
09:38 “Do you use mixed models or Bayesian MCMC?”
10:56 “How does your team use Quarto?”
16:59 “How do you convince stakeholders of the value of going open source (and handle security concerns)?”
22:56 “Do you allow people to have a certain amount of time to contribute back to open source?”
26:03 “I just want to understand a little bit about your support model for that group.”
29:57 “Do you have any tips for asynchronous working?”
31:02 “Are you like a Jira team or an Asana team for assigning tasks or tickets?”
32:10 “How many people on your platform team support Posit teams?”
34:24 “What does your team use for unstructured document analysis?”
36:24 “How important is domain knowledge in your recruitment?”
40:02 “Where do you store all of this stuff (data storage and databases)?”
42:04 “What is the approximate timeline from the time you do analysis to final deployment of results in the real world?”
44:31 “Is there a process for people getting things approved to use in your environment?”
47:39 “How do you handle the challenge of going back from Word to Quarto source code (after changes are tracked)?”
50:22 “What does a typical workday look like for you?”
51:47 “Is there a piece of career advice that has either really helped you, that you’ve really liked, that you try to give to other people?”
R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout
We were recently joined by Dave Gruenewald, Senior Director of Data Science at Centene, to chat about polyglot teams, data science best practices, right-sizing development efforts, and process automation.
In this Hangout, we explore working in a polyglot team and fostering interoperability (a word that Libby loves, but struggles to pronounce out loud). Dave Gruenewald emphasizes that teams should use the tools they are comfortable with, whether that’s R or Python. Strategies for collaboration across languages that Dave suggests include tools like Quarto, which can run R and Python code in the same report. Teams use data science checkpoints, saving outputs as platform-agnostic file types like Parquet so that they can be accessed from any language. REST APIs allow R processes to be called programmatically from Python (and vice versa), which can be a real game-changer. The newly released nanonext 1.7.0 was also highlighted as a promising development for improved interoperability.
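The Parquet checkpoint pattern can be sketched in a few lines. This is an illustrative example using the arrow package (the file path and data are placeholders, not Dave's actual pipeline):

```r
library(arrow)

# R side: checkpoint an intermediate result in a platform-agnostic format.
results <- mtcars  # stand-in for a real intermediate data frame
write_parquet(results, "checkpoints/step1_results.parquet")

# Any language can pick the checkpoint up without R involved, e.g. Python:
#   import pandas as pd
#   df = pd.read_parquet("checkpoints/step1_results.parquet")

# And R can read checkpoints written by Python just as easily:
step1 <- read_parquet("checkpoints/step1_results.parquet")
```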
Resources mentioned in the video and zoom chat:
- Posit Conf 2025 Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
- nanonext 1.7.0 Tidyverse Blog Post → https://www.tidyverse.org/blog/2025/09/nanonext-1-7-0/
If you didn’t join live, one great discussion you missed from the zoom chat was about pivoting away from academia, including leaving PhD programs. Many attendees shared their personal experiences of making the difficult decision to drop out of a PhD program. The community suggested alternative terms like “pivot,” “reallocating your resources,” or being a “refugee fleeing academia” instead of “drop out.” Dave Gruenewald shared that he himself left a PhD program but has “no regrets about that.” Did you leave a PhD program? You’re not alone!
Timestamps:
00:00 Introduction
02:21 “What types of data do your teams use?”
06:53 “Which of the three pillars you mentioned is your personal favorite to work on?”
09:26 “How do you avoid or divert scope creep?”
11:41 “How much of the project should be ‘planning’ before any code happens?”
13:53 “Do you feel like people are just hopping in and going, hey, LLM, make me a POC?”
14:28 “Do you give them what they say they want, or do you give them what they need?”
16:40 “I’m wondering what public data do you wish existed?”
18:48 “Why not Positron yet?”
20:43 “How do you unify as a team and make it so that I can always read everybody else’s code?”
23:10 “Could you talk a little bit about how R and Python work together?”
27:28 “How to start package development with a team who are very new to package development.”
33:01 “What’s your greatest regret career wise?”
35:53 “What about your biggest wins, specifically in your early career?”
39:40 “How would you recommend building a data science culture and community from scratch?”
41:49 “Would you set a specific timeline for EDA, exploratory analysis, to scope the project better?”
45:15 “How do you define fun projects, and how much time do you allocate for exploration in those?”
48:21 “Does your team use DVC or something similar for data version control?”
50:00 “Can you talk a bit more about your pivot from academia into data science?”
51:31 “Any advice on where to look for opportunities in data science after getting a master’s degree?”
Building the Future of Data Apps: LLMs Meet Shiny
GenAI in Pharma 2025 kicks off with Posit’s Phil Bowsher and Garrick Aden-Buie sharing a technical overview of how LLMs can integrate with Shiny applications and much more!
Abstract: When we think of LLMs (large language models), usually what comes to mind are general purpose chatbots like ChatGPT or code assistants like GitHub Copilot. But as useful as ChatGPT and Copilot are, LLMs have so much more to offer—if you know how to code. In this demo Garrick will explain LLM APIs from zero, and have you building and deploying custom LLM-empowered data workflows and apps in no time.
Resources mentioned in the session:
- GitHub Repository for session: https://github.com/gadenbuie/genAI-2025-llms-meet-shiny
- {mcptools} - Model Context Protocols servers and clients https://posit-dev.github.io/mcptools/
- {vitals} - Large language model evaluation for R https://vitals.tidyverse.org/
Purrrfectly parallel, purrrfectly distributed (Charlie Gao, Posit) | posit::conf(2025)
Purrrfectly parallel, purrrfectly distributed
Speaker(s): Charlie Gao
Abstract:
purrr is a powerful functional programming toolkit that has long been a cornerstone of the tidyverse. In 2025 it receives a modernization that lets you harness every computing core on your machine, dramatically speeding up map operations.
More excitingly, it opens the door to distributed computing. Through the mirai framework that purrr uses under the hood, this is embarrassingly simple: whether you work in a small business or a large one, any spare server on your network can now be put to good use in a few straightforward steps.
Let us show you how distributed computing is no longer the preserve of those with access to high-performance compute clusters.
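As a rough sketch of what this modernized purrr workflow looks like (assuming purrr >= 1.1.0, which introduced `in_parallel()`, and the mirai package; see the talk materials for the authoritative version):

```r
library(purrr)  # needs purrr >= 1.1.0
library(mirai)

daemons(4)  # launch four background R processes (local parallelism)

# in_parallel() sends each call to a daemon; pointing daemons() at
# remote servers distributes the same map across machines
squares <- map_dbl(1:8, in_parallel(\(x) x^2))

daemons(0)  # shut the daemons down when finished
```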
Materials - https://shikokuchuo-posit2025.share.connect.posit.cloud/ posit::conf(2025) Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

Translating R for Data Science into Portuguese: A Community-Led Initiative (Beatriz Milz, UFABC)
Translating R for Data Science into Portuguese: A Community-Led Initiative
Speaker(s): Beatriz Milz
Abstract:
How can open-source collaboration help make data science more accessible and expand Posit’s global impact? The book “R for Data Science” by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund is a key resource for learning R and the tidyverse. In a collaborative effort, volunteers from the R community translated the second edition into Brazilian Portuguese, making it freely available online. This talk explores the translation journey, the challenges of adapting technical content, and key lessons learned to support future translation teams.
Materials - https://beamilz.com/talks/en/2025-posit-conf/ posit::conf(2025) Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Using Quarto to Improve Formatting/Automate the Generation of Hundreds of Reports (Keaton Wilson)
Using Quarto to Improve Formatting and Automate the Generation of Hundreds of Reports
Speaker(s): Keaton Wilson
Abstract:
This presentation showcases how KS&R’s Decision Sciences and Innovation (DSI) team modernized a legacy reporting pipeline to automate and scale custom survey report generation. Using tidyverse and Quarto, the team produced hundreds of personalized PDFs weekly over three months. Hosted on GitHub, the project integrated version control and streamlined collaboration while documentation ensured easy onboarding and adaptability. Attendees will gain insights into automating report workflows, overcoming implementation challenges, integrating custom formatting and fostering collaboration using tidyverse, Quarto, and GitHub.
Materials - https://github.com/ksrinc/posit_conf_2025_quarto_automation posit::conf(2025) Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
New data science tools & old laptops on fire | Jenny Bryan | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
We were joined by Jenny Bryan, Senior Software Engineer at Posit, to chat about (setting laptops on fire,) adapting careers to embrace change and new technologies, behind-the-scenes technical advancements powering the R ecosystem with tools like Positron, demystifying project-based workflows, plus LLM integration and best practices in programming.
Listen to this episode to hear us chat about topics like this:
- the benefits and limitations of using Large Language Models (LLMs) in programming. Jenny shared her initial skepticism towards LLMs for coding in R, but her attitude changed significantly when applying LLMs to problems involving languages she was less familiar with, like Rust or TypeScript.
- adapting in your career to embrace change and new technologies. Jenny, who describes herself as being on a “third career”, transitioned from management consulting to a statistics professor, and then to a senior software engineer at Posit. She talks a bit about her career journey and how she’s embracing new stuff (ahem, TypeScript) so that she gets to keep doing cool stuff!
- the Positron IDE for R package development. She specifically praises Positron’s unique test explorer, reliable console, and integrated Data Explorer. For many, Positron offers out-of-the-box data science functionality, unlike other IDEs that require extensive customization.
- what new technologies like Ark, Air, and Positron mean for the long-term health of R. Jenny’s been working on lots of nerdy things behind the scenes at Posit and she talks all about how they’re great for developers, package builders, data scientists, and engineers alike.
Another tidbit from this hangout: Jenny gave some advice for those looking to branch into software engineering without formal training: try reading code from admired developers, inviting code reviews, and undertaking small, recreational package development projects to gain practical experience and confidence. She also advocates for adopting a project-oriented workflow (associated with her famous “laptop on fire” remark, of course) using tools like the here package for managing project paths.
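For readers new to the here package mentioned above, a minimal sketch of the project-oriented workflow (the file names are hypothetical):

```r
library(here)

# here() resolves paths from the project root (located via an .Rproj
# file, a .here file, or a .git directory), so the same script works
# regardless of the working directory it is launched from
data_path <- here("data", "raw", "survey.csv")  # hypothetical file

# e.g. dat <- readr::read_csv(data_path)
```

Because paths are anchored at the project root rather than hard-coded, the project can move between machines (or colleagues) without edits, which is the point of Jenny's "laptop on fire" advice.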
Resources mentioned in the video and zoom chat: Positron IDE → https://positron.posit.co/ Happy Git with R → https://happygitwithr.com/ Jenny Bryan’s “Project-oriented workflow” blog post → https://www.tidyverse.org/blog/2017/12/workflow-vs-script/ Air R code formatter → https://posit-dev.github.io/air/ The here() package → https://here.r-lib.org/ Posit Conf → https://posit.co/conference/ Tidy Dev Day 2025 → https://www.tidyverse.org/blog/2025/07/tdd-2025/ R Packages book → https://r-pkgs.org/
If you didn’t join live, you missed a ROARINGLY active chat. Let’s just say, if you’ve ever broken down in tears over a programming project, you’re not alone! Come join us live each week if you’d like to hang out in the chat with us!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps 00:00 Introduction 03:39 “Is that a Wooble on your desk?” (Spoiler, it’s a gnome!!) 06:23 “As a builder of data science tools, what are the tool features data scientists want most?” 08:43 “Have you experienced needing to adapt to change recently and how have you embraced it?” 13:46 “What is ‘setting laptops on fire’ about?” 13:50 “How did you decide to change your career a few times?” 21:23 “What are your thoughts on the ease of putting models into production in Python versus R and does it make sense to shift everybody to one language or the other?” 27:30 “How do you navigate the ‘I have a hammer so everything looks like a nail’ feeling when working with emerging tools like LLMs?” 33:24 “Do you have any general advice for those data scientists who find themselves wanting to branch out more into software engineering but don’t have formal training?” 39:39 “Why should I use Positron instead of Versus Code?” 47:57 “Can you speak to the value of developing an R package and how to clear the mental hurdle of it being a huge challenge?” 52:34 “What does your career trajectory look like and what is your advice for other people who are looking to grow their career but don’t know if they want to be an IC or a manager? Does being a manager mean you don’t get to write code anymore?”

Harnessing LLMs for Data Analysis | Led by Joe Cheng, CTO at Posit
When we think of LLMs (large language models), usually what comes to mind are general purpose chatbots like ChatGPT or code assistants like GitHub Copilot. But as useful as ChatGPT and Copilot are, LLMs have so much more to offer—if you know how to code. In this demo Joe Cheng will explain LLM APIs from zero, and have you building and deploying custom LLM-empowered data workflows and apps in no time.
Posit PBC hosts these Workflow Demos the last Wednesday of every month. To join us for future events, you can register here: https://posit.co/events/
Slides: https://jcheng5.github.io/workflow-demo/ GitHub repo: https://github.com/jcheng5/workflow-demo
Resources shared during the demo: Ellmer https://ellmer.tidyverse.org/ Chatlas https://posit-dev.github.io/chatlas/
Environment variable management: For R: https://docs.posit.co/ide/user/ide/guide/environments/r/managing-r.html#renviron For Python https://pypi.org/project/python-dotenv/
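As a small illustration of the environment-variable approach linked above (the variable name is illustrative; check your provider's and package's documentation for the expected name):

```r
# API keys belong in environment variables, not in scripts. In R the
# usual home is ~/.Renviron (one KEY=value per line, re-read when R
# restarts), e.g.:
#   ANTHROPIC_API_KEY=sk-...   <- variable name is illustrative
key <- Sys.getenv("ANTHROPIC_API_KEY", unset = "")
if (!nzchar(key)) message("No API key found; see the links above.")
```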
Shiny chatbot UI: For R, Shinychat https://posit-dev.github.io/shinychat/ For Python, ui.Chat https://shiny.posit.co/py/docs/genai-inspiration.html
Deployment Cloud hosting https://connect.posit.cloud On-premises (Enterprise) https://posit.co/products/enterprise/connect/ On-premises (Open source) https://posit.co/products/open-source/shiny-server/
Querychat Demo: https://jcheng.shinyapps.io/sidebot/ Package: https://github.com/posit-dev/querychat/
If you have specific follow-up questions about our professional products, you can schedule time to chat with our team: pos.it/llm-demo

Bringing data science to the construction industry | Blake Abbenante | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Blake Abbenante, Director of Analytics and Data Science at Suffolk Construction, to chat about his career journey in data science, implementing modern data practices in the construction industry, innovative applications of AI and data science in construction, and building a data-driven culture in a traditionally less tech-focused sector.
In this Hangout, we explore innovative applications of AI and data science in construction. Blake shared how Suffolk Construction is leveraging cutting-edge technologies like AI to revolutionize traditional processes. One focus is their GenAI scheduling tool, which aims to augment and speed up the design and planning phases of building projects. This tool has the potential to significantly reduce the time planners spend on creating schedules, moving from weeks to potentially minutes or hours for an 80% completion rate. Blake discussed the development and implementation of safety models that forecast risk on projects, enabling proactive measures to ensure safer construction sites by predicting which projects might require additional safety personnel based on historical data.
Resources mentioned in the video and zoom chat: The ellmer R package → https://ellmer.tidyverse.org/ The chatlas R package → https://github.com/posit-dev/chatlas Posit Blog Post on ellmer → https://posit.co/blog/announcing-ellmer/
If you didn’t join live, one great discussion you missed from the zoom chat was about the challenges of data collection and analysis when encountering pushback from those whose work is being analyzed, and strategies to build trust and demonstrate value. Let us know below if you’d like to hear more about this topic!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Shiny community, hackathons, and his AI mindset | Joe Cheng | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Joe Cheng, CTO at Posit, to chat about the Shiny contest, the use of AI in data science, and designing hackathons for learning new technologies. We were joined by several past and present Shiny contest winners who gave great advice on how to get started if you want to participate (and we really hope you do)!
In this Hangout, we explore the evolution of the Shiny contest since its inception, including what made the 2024 submissions unique and the ways the contest encourages community contribution and learning. Joe also shared about his personal journey from feeling skepticism about AI to seeing and embracing its potential. We got some amazing questions from the Hangout attendees! We hope you join us live next time to ask some of your own questions!
Resources mentioned in the video and zoom chat:
2024 Shiny Contest Winners → https://posit.co/blog/winners-of-the-2024-shiny-contest/
Joe’s AI Hackathon Slides → https://jcheng5.github.io/llm-quickstart/quickstart.html
Shiny Assistant → https://gallery.shinyapps.io/assistant/
Isabella’s blog post on prototyping with Shiny Assistant → https://posit.co/blog/ai-powered-shiny-app-prototyping/
Posit Conf Workshops → https://reg.rainfocus.com/flow/posit/positconf25/attendee-portal/page/sessioncatalog?tab.day=20250916&search.sessiontype=1675316728702001wr6r
Shiny Conference 2025 → https://www.shinyconf.com/
Call for Speakers Shiny Conf 2025 → https://sessionize.com/shiny-conf-2025/
Shiny Tableau → https://rstudio.github.io/shinytableau/
Echarts4r → https://echarts4r.john-coene.com
Elmer package on Github → https://github.com/tidyverse/ellmer
All the Shiny app links mentioned in the video and zoom chat: Eric Nantz 2021 Shiny Contest Submission → https://forum.posit.co/t/the-hotshots-racing-dashboard-shiny-contest-submission/104925 Eric Nantz’s R/Pharma conference keynote on AI → https://youtu.be/AfMa1CVUdXU?si=ThLsKFyonntxzBUF Eric Nantz’s Haunted Places app → https://youtu.be/vX09QGMuOfo?si=K5_uPfK5bcfZZ92l Umair Durrani’s Shiny Storytelling app → https://umair.shinyapps.io/storytimegcp/ Umair’s Blue Sky profile → https://bsky.app/profile/transport-talk.bsky.social Umair’s Shiny meetings project on Github → https://github.com/shiny-meetings/shiny-meetings Abby Stamm’s Shiny Accessibility app → https://github.com/ajstamm/shiny-a11y-app
If you didn’t join live, one great discussion you missed from the zoom chat was about everyone’s favorite interactive plotting tools. Someone asked whether Plotly was the best option, and lots of people said they loved ggiraph, echarts4r, ObservableJS, and others. What about you?! What’s your favorite interactive plotting library?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!

Wes McKinney & Hadley Wickham (on cross-language collaboration, Positron, career beginnings, & more)
Posit PBC hosted a special event with Wes McKinney (Pandas & Apache Arrow) and Hadley Wickham (rstats & tidyverse), where community members could ask questions, share their thoughts, and exchange insights about cross-language collaboration.
Here’s a preview into what came up in conversation:
- Cross-language collaboration between R and Python
- Positron, a new polyglot data science IDE
- Open source development, how Wes and Hadley got involved in open source and their experiences in building and maintaining open-source projects such as Pandas and the tidyverse.
- Documentation for R and Python, especially in the context of teams that use both languages (shoutout to Quarto!)
- The use of LLMs in data science
- The emergence of libraries like Polars and DuckDB
- Challenges of switching between the two languages
- Package development and maintenance for polyglot teams that have internal packages in both languages
- The future of data science
The chat was on fire for this conversation and we’ve gathered most of the links shared among the community below:
Documentation mentioned: Positron, next-generation data science IDE built by Posit: https://positron.posit.co/ Quarto tabset documentation: https://quarto.org/docs/output-formats/html-basics.html#tabset-groups
Packages / Extensions mentioned: Pins: https://pins.rstudio.com/ Vetiver: https://vetiver.posit.co Orbital: https://orbital.tidymodels.org Elmer: https://elmer.tidyverse.org Tabby Extension: https://quarto.thecoatlessprofessor.com/tabby/
Blog posts: AI chat apps with Shiny for Python: https://shiny.posit.co/blog/posts/shiny-python-chatstream/ Using an LLM to enhance a data dashboard written in Shiny: R Sidebot & Python Sidebot Marco Gorelli Data Science Hangout (polars): https://youtu.be/lhAc51QtTHk?feature=shared Emily Riederer’s blog post on Polars: https://www.emilyriederer.com/post/py-rgo-polars/ Jeffrey Sumner’s tabset example: https://rpy.ai/posts/visualizations%20with%20r%20and%20python/r_python_visualizations Emily Riederer’s blog post on Python and R ergonomics: https://www.emilyriederer.com/post/py-rgo/11 Sam Tyner’s blog post on Lessons from “Tidy Data”: https://medium.com/@sctyner90/10-lessons-from-tidy-data-on-its-10th-anniversary-dbe2195a82b7
Other: Hadley Wickham’s cocktails website: https://cocktails.hadley.nz Posit subscription management to find out about new tools, events, etc.: https://posit.co/about/subscription-management/
New to Posit? Posit builds enterprise solutions and open source tools for people who do data science with R and Python. (We are also the company formerly called RStudio) We’d love to have you join us for future community events!
Every Thursday from 12-1pm ET we host a Data Science Hangout with the community and invite you to join us! You can add that event to your calendar with this link: https://www.addevent.com/event/Qv9211919

Joe Cheng - Summer is Coming: AI for R, Shiny, and Pharma
Summer is Coming: AI for R, Shiny, and Pharma - Joe Cheng
Abstract: R users tend to be skeptical of modern AI models, given our weird insistence on answers being accurate, or at least supported by the data. But I believe the time has come—or maybe it’s a little late—for even the most AI-cynical among us to push past their discomfort and get informed about what these tools are truly capable of. And key to that is moving beyond using AI-enabled apps, and towards building our own scripts, packages, and apps that make judicious use of AI.
In this talk, I’ll tell you why I believe AI has more to offer the R community than just wrong answers from chat windows or mediocre code suggestions in our IDEs. I’ll also introduce brand-new tools we’re developing at Posit that put powerful AI tools within reach of every R user. And finally, I’ll show how adding some AI could make your next Shiny app dramatically more useful for your users.
Resources mentioned in the talk:
- Slides: https://jcheng5.github.io/pharma-ai-2024
- {elmer} Call LLM APIs from R: https://elmer.tidyverse.org/
- {shinychat} Chat UI component for Shiny for R https://github.com/jcheng5/shinychat
- R/Pharma GenAI Day Recordings: https://www.youtube.com/playlist?list=PLMtxz1fUYA5AYryl4t2mtqBngqWDrnMXJ
Presented at the 2024 R/Pharma Conference

817: The Positron IDE, Tidy NLP and MLOps — with Dr. @JuliaSilge
#PositronIDE #Tidyverse #MLOps
Dr. Julia Silge, Engineering Manager at Posit, joins @JonKrohnLearns to introduce the brand-new Positron IDE, perfect for exploratory data analysis and visualization. She also lays out her top picks for LLMs that boost coding efficiency and discusses when traditional NLP methods might be the smarter choice over LLMs. Plus, Julia highlights some must-know open-source libraries that make managing MLOps easier than ever. Tune in for insights that every data scientist, ML engineer, and developer will find useful.
This episode is brought to you by Gurobi (https://www.gurobi.com/personas/optimization-for-data-scientists/) , the Decision Intelligence Leader, and by ODSC (https://odsc.com/california) , the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
In this episode you will learn: • [00:00:00] Introduction • [00:03:23] Overview of Posit and Positron IDE • [00:08:33] How the needs of a data scientist differ from those of a software developer • [00:17:56] How to contribute to the open-source Positron • [00:34:52] MLOps and Vetiver: Tools for deploying and maintaining ML models • [00:48:34] Natural Language Processing (NLP) and the Tidyverse approach • [01:22:18] The role of AI and LLMs in data science education
Additional materials: https://www.superdatascience.com/817

Ask Hadley Anything
A unique opportunity to gain insights directly from a leading expert in open source data science and a driving force behind many popular R packages like ggplot2 and dplyr.
Links from the Q&A: gh-action webscraping demo: https://github.com/hadley/cran-deadlines tidyverse devday 2024: https://www.tidyverse.org/blog/2024/04/tdd-2024/
For the 3 questions on moving from SAS to R in Pharma: Posit and Atorus have partnered on a Posit Academy training: https://posit.co/blog/upskill-to-r-programming-with-posit-and-atorus-research/ And at least 3 pharma companies have shared resources to help people on the transition from statistical programming in SAS, to data science in R: Pfizer exercises: https://github.com/pfizer-opensource/pharma-hands-on-exercises Bayer SAS to R: https://bayer-group.github.io/sas2r/ Roche Coursera course: https://www.coursera.org/learn/making-data-science-work-for-clinical-reporting
Tom Mock @ Posit PBC | Data Science Hangout
We were recently joined by Tom Mock, Product Manager at Posit PBC, to chat about career growth, starting out in a sales role, TidyTuesday, and being so good they can’t ignore you.
Speaker Bio: Tom Mock is a Product Manager at Posit, overseeing the Posit Workbench and RStudio team. He fell in love with R and data science through his graduate research, using R and RStudio to wrangle, analyze, model, and visualize his data. He became passionate about growing the R community, and founded #TidyTuesday to help newcomers and seasoned vets improve their Tidyverse skills.
Links mentioned: TidyTuesday: https://github.com/rfordatascience/tidytuesday Table Contest: https://posit.co/blog/announcing-the-2024-table-contest/ Posit Conference: https://posit.co/conference/ Monthly Workflow Demos: https://www.addevent.com/event/Eg16505674 gt package: https://gt.rstudio.com/ So Good They Can’t Ignore You book recommendation: https://www.goodreads.com/book/show/13525945-so-good-they-can-t-ignore-you Community Builder Quarto Site: https://pos.it/community-builder
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co LinkedIn: https://www.linkedin.com/company/posit-software
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh
We’d love to have you join us in the conversation live!
Thanks for hanging out with us!
Hadley Wickham - R in Production
R in Production by Hadley Wickham
Visit https://rstats.ai for information on upcoming conferences.
Abstract: In this talk, we delve into the strategic deployment of R in production environments, guided by three core principles to elevate your work from individual exploration to scalable, collaborative data science. The essence of putting R into production lies not just in executing code but in crafting solutions that are robust, repeatable, and collaborative, guided by three key principles:
- Not just once: Successful data science projects are not one-offs, but will be run repeatedly for months or years. I’ll discuss some of the challenges of creating R scripts and applications that run repeatedly, handle new data seamlessly, and adapt to evolving analytical requirements without constant manual intervention. This principle ensures your analyses are enduring assets, not throwaway toys.
- Not just my computer: The transition from development on your laptop (usually Windows or macOS) to a production environment (usually Linux) introduces a number of challenges. Here, I’ll discuss some strategies for making R code portable, how you can minimise pain when something inevitably goes wrong, and a few unresolved auth challenges that we’re currently working on.
- Not just me: R is not just a tool for individual analysts but a platform for collaboration. I’ll cover some of the best practices for writing readable, understandable code, and how you might go about sharing that code with your colleagues. This principle underscores the importance of building R projects that are accessible, editable, and usable by others, fostering a culture of collaboration and knowledge sharing.
By adhering to these principles, we pave the way for R to be a powerful tool not just for individual analyses but as a cornerstone of enterprise-level data science solutions. Join me to explore how to harness the full potential of R in production, creating workflows that are robust, portable, and collaborative.
Bio: Hadley is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz.
Mastodon: https://fosstodon.org/@hadleywickham
Presented at the 2024 New York R Conference (May 17, 2024) Hosted by Lander Analytics (https://landeranalytics.com )

Wes McKinney - The Future Roadmap for the Composable Data Stack
The Future Roadmap for the Composable Data Stack by Wes McKinney
Visit https://rstats.ai for information on upcoming conferences.
Abstract: In this talk, I plan to review the progress we have made in the last 10 years developing composable, interoperable open standards for the data processing stack, from infrastructure projects such as Parquet and Arrow to user-facing interface libraries like Ibis for Python and the tidyverse for R. In discussing the current landscape of projects, I will dig into the different areas where more innovation and growth are needed, and where we would ideally like to end up in the coming years.
Bio: Wes McKinney is an open source software developer and entrepreneur focusing on data processing tools and systems. He created the Python pandas and Ibis projects, and co-created Apache Arrow. He is a Member of the Apache Software Foundation and also a project PMC member for Apache Parquet. He is currently a Principal Architect at Posit PBC and a co-founder of Voltron Data.
Twitter: https://twitter.com/wesmckinn
Presented at the 2024 New York R Conference (May 17, 2024) Hosted by Lander Analytics (https://landeranalytics.com )
Hadley Wickham on R vs Python
Learn about tidyverse, ggplot2, and the secret to a tech company’s longevity as Hadley Wickham joins @JonKrohnLearns in this episode. He talks about Posit’s rebrand, why tidyverse needs to be in every data scientist’s toolkit, and why getting your hands dirty with open-source projects can be so lucrative for your career.
Watch the full interview “779: The Tidyverse of Essential R Libraries and their Python Analogues — with Dr. Hadley Wickham” here: https://www.superdatascience.com/779

779: The Tidyverse of Essential R Libraries and their Python Analogues — with Dr. Hadley Wickham
#Tidyverse #RProgramming #RLibraries
Tidyverse, ggplot2, and the secret to a tech company’s longevity: Hadley Wickham talks to @JonKrohnLearns about Posit’s rebrand, Tidyverse and why it needs to be in every data scientist’s toolkit, and why getting your hands dirty with open-source projects can be so lucrative for your career.
This episode is brought to you by Intel and HPE Ezmeral Software (https://bit.ly/hpeintel) . Interested in sponsoring a SuperDataScience Podcast episode? Visit https://passionfroot.me/superdatascience for sponsorship information.
In this episode you will learn: • [00:00:00] Introduction • [00:02:55] All about the Tidyverse • [00:15:19] Hadley’s favorite R libraries • [00:28:39] The goal of Posit • [00:34:12] On bringing multiple programming languages together • [00:50:19] The principles for a long-lasting tech company • [00:53:34] How Hadley developed ggplot2 • [01:03:52] How to contribute to the open-source community
Additional materials: https://www.superdatascience.com/779

The Future Roadmap for the Composable Data Stack
Discover cutting-edge advancements in data processing stacks. Listen in as Wes McKinney dives into pivotal projects like Parquet and Arrow, alongside essential interface libraries like Ibis and tidyverse. Wes navigates through the current state of these projects, highlighting areas for further innovation and growth.
Sign up for our “No BS” Newsletter to get the latest technical data & AI content: https://hubs.li/Q02vz6xC0
ABOUT THE SPEAKER: Wes McKinney, Principal Architect, Posit PBC (co-founder of Voltron Data)
ABOUT DATA COUNCIL: Data Council brings together the brightest minds in data to share industry knowledge, technical architectures and best practices in building cutting edge data & AI systems and tools.
FIND US: Twitter: https://twitter.com/datacouncilai LinkedIn: https://www.linkedin.com/company/datacouncil-ai/ Website: https://www.datacouncil.ai/
dbtplyr: Bringing Column-Name Contracts from R to dbt - posit::conf(2023)
Presented by Emily Riederer
starts_with(language): Translating select helpers to dbt. Translating syntax between languages transports concepts across communities. We see a case study of adapting a column-naming workflow from dplyr to dbt’s data engineering toolkit.
dplyr’s select helpers exemplify how the tidyverse uses opinionated design to push users into the pit of success. The ability to efficiently operate on names incentivizes good naming patterns and creates efficiency in data wrangling and validation.
However, in a polyglot world, users may find they must leave the pit when comparable syntactic sugar is not accessible in other languages like Python and SQL.
In this talk, I will explain how dplyr’s select helpers inspired my approach to ‘column name contracts,’ how good naming systems can help supercharge data management with packages like {dplyr} and {pointblank}, and my experience building the {dbtplyr} package to port this functionality to dbt for building complex SQL-based data pipelines.
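For readers unfamiliar with the select helpers the talk builds on, here is a minimal dplyr sketch; the column-name prefixes are illustrative of the "column name contracts" idea, not taken from the talk itself:

```r
library(dplyr)

# Columns follow a naming contract: N_ = counts, IND_ = indicators,
# DT_ = dates, ID_ = identifiers (hypothetical convention)
df <- tibble::tibble(
  ID_user   = 1:3,
  DT_signup = as.Date("2023-01-01") + 0:2,
  N_visits  = c(5L, 2L, 9L),
  IND_churn = c(FALSE, TRUE, FALSE)
)

# Select helpers operate on the contract instead of hand-listing columns
counts <- df |> select(starts_with("N_"))
flags  <- df |> select(starts_with("IND_"))
```

Because the prefixes encode meaning, the same `starts_with()` calls keep working as new contract-following columns are added, which is the efficiency in wrangling and validation the abstract describes.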
Materials:
Presented at posit::conf(2023), Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1098
duckplyr: Tight Integration of duckdb with R and the tidyverse - posit::conf(2023)
Presented by Kirill Müller
The duckplyr R package combines the convenience of dplyr with the performance of DuckDB. Better than dbplyr: Data frame in, data frame out, fully compatible with dplyr.
duckdb is a new high-performance analytical database system that works great with R, Python, and other host systems. dplyr is the grammar of data manipulation in the tidyverse, tightly integrated with R, but it works best for small or medium-sized data. duckdb, by contrast, was designed with large data in mind, but currently you need to formulate your queries in SQL.
The new duckplyr package offers the best of both worlds. It transforms a dplyr pipe into a query object that duckdb can execute, using an optimized query plan. It is better than dbplyr because the interface is “data frames in, data frames out”, and no intermediate SQL code is generated.
The talk first presents our results, a bit of the mechanics, and an outlook for this ambitious project.
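A rough sketch of what "data frames in, data frames out" looks like in practice (hedged: the duckplyr API was still evolving around this release; as_duckplyr_df() is the constructor shown in the project repository):

```r
library(dplyr)

# Wrap an ordinary data frame; subsequent dplyr verbs build a duckdb
# query plan instead of executing eagerly in R
result <- mtcars |>
  duckplyr::as_duckplyr_df() |>
  filter(cyl == 4) |>
  summarise(mean_mpg = mean(mpg))

# Still a plain data frame on access: no SQL written, none shown
result
```

Compare dbplyr, where the same pipeline would require a database connection and would generate intermediate SQL.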
Materials: https://github.com/duckdblabs/duckplyr/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1100
How to Keep Your Data Science Meetup Sustainable - posit::conf(2023)
Presented by Ted Laderas
Many data science meetup organizers struggle with burnout. It can be daunting to plan a meetup schedule, especially with the added burden of work and life.
In this talk, I want to highlight some strategies for keeping your data science meetup sustainable. Specifically, I want to highlight the role of self-care in growing and sustaining your group, as well as low-key activities like a data scavenger hunt, watching videos together, styling plots together, and sharing useful tidyverse functions.
Making it easy for your members to contribute, and empowering them to do so, takes a lot of the burden off you as an organizer. You don’t need to reinvent the wheel for meetups or have famous guests for each one. Let’s start the conversation and make your meetup last.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: It takes a village: building and sustaining communities. Session Code: TALK-1129
Parameterized Quarto Reports Improve Understanding of Soil Health - posit::conf(2023)
Presented by Jadey Ryan
Learn how to use R and Quarto parameterized reporting in this four-step workflow to automate custom HTML and Word reports that are thoughtfully designed for audience interpretation and accessibility.
Soil health data are notoriously challenging to tidy and effectively communicate to farmers. We used functional programming with the tidyverse to reproducibly streamline data cleaning and summarization. To improve project outreach, we developed a Quarto project to dynamically create interactive HTML reports and printable PDFs. Custom to every farmer, reports include project goals, measured parameter descriptions, summary statistics, maps, tables, and graphs.
Our case study presents a workflow for data preparation and parameterized reporting, with best practices for effective data visualization, interpretation, and accessibility.
Talk materials: https://jadeyryan.com/talks/2023-09-25_posit_parameterized-quarto/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Elevating your reports. Session Code: TALK-1160
The ‘I’ in Team: Peer-to-Peer Best Practices for Growing Data Science Teams - posit::conf(2023)
Presented by Liz Roten
R users don’t always come in sets. Often, you may be the only R user in the cubicle block. But, one miraculous day, your manager finally fills the void and you welcome more folks onto your team. Suddenly, the little R system you created to suit your needs, like a custom package, code styling, and file organization, isn’t just for you.
Want to suddenly overhaul that one package you wrote two years ago? It probably won’t work when your colleagues try to update it.
Your new teammates are data.table fans, but you prefer the tidyverse. Do you need to refactor? Are style choices, like indentation, important when collaborating, or are you just being persnickety?
In this talk, you will learn how to bring new teammates on board and blend your respective styles without pulling your hair out.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Building effective data science teams. Session Code: TALK-1063
Visualizing Data Analysis Pipelines with Pandas Tutor and Tidy Data Tutor - posit::conf(2023)
Presented by Sean Kross
The data frame is a fundamental data structure for data scientists using Python and R. Pandas and the tidyverse are designed around building pipelines that transform data frames. However, within these pipelines it is not always clear how each operation changes the underlying data frame. To explain each step in a pipeline, data science instructors resort to hand-drawing diagrams that illustrate the semantics of operations such as filtering, sorting, and grouping.
In this talk, I will introduce Pandas Tutor and Tidy Data Tutor, step-by-step visual representation engines of data frame transformations. Both tools illustrate the row, column, and cell-wise relationships between an operation’s input and output data frames.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Teaching data science. Session Code: TALK-1096
webR 0.2: R Packages and Shiny for WebAssembly | George Stagg | Posit
WebR makes it possible to run R code in the browser without the need for an R server to execute the code: the R interpreter runs directly on the user’s machine. But just running R isn’t enough, you need the R packages you use every day too.
webR 0.2.0 makes many new packages available (10,324 packages - about 51% of CRAN!) and it’s now possible to run Shiny apps under webR, entirely client side.
George Stagg shares how to load packages with webR, learn which ones are available, and get started running Shiny apps in the web browser. There’s a demo webR Shiny app too!
00:15 Loading R packages with webR 01:50 Wasm system libraries available for use with webR 05:30 Tidyverse, tidymodels, geospatial data, and database packages available 08:00 Shiny and httpuv: running Shiny apps under webR 11:05 Example Shiny app running in the web browser 12:05 Links with where to learn more
Shiny webR demo app: https://shinylive.io/r/examples/
Website: https://docs.r-wasm.org/ webR REPL example: https://webr.r-wasm.org/latest/
Demo webR Shiny app in this video: https://shiny-standalone-webr-demo.netlify.app/ Source: https://github.com/georgestagg/shiny-standalone-webr-demo/
See the overview of what’s new in webR 0.2.0: https://youtu.be/Mpq9a6yMl_w
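Inside a webR session, installing one of those packages is a single call to the bundled webr support package (a sketch; this runs in the browser REPL, not desktop R, and dplyr is used only as an example):

```r
# Install a WebAssembly build of a package from the webR binary repository
webr::install("dplyr")

# Then load and use it exactly as in desktop R
library(dplyr)
mtcars |> filter(cyl == 4) |> head(2)
```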

Bite-sized tricks for machine learning with tidymodels | Posit
The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles. This video highlights a number of tidymodels features that could improve your modeling workflows.
0:03 Switching modeling engines is easy 0:21 Never lose your tuning results 0:36 Built-in visualizations for modeling objects 1:03 Grouped resampling 1:16 Case weights 1:32 Select variables based on role and type 2:00 Spatial resampling 2:16 Keep your tidymodels objects small
Learn more at https://www.tidymodels.org/
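The first trick ("switching modeling engines is easy") illustrates the core tidymodels design: the model specification is decoupled from the engine that fits it. A minimal sketch using parsnip:

```r
library(parsnip)

# One model specification, fit with the base-R "lm" engine
spec <- linear_reg() |> set_engine("lm")
fitted <- fit(spec, mpg ~ wt + hp, data = mtcars)

# Switching to, say, regularized regression is a one-line change
# to the spec, not a rewrite of the workflow:
# linear_reg(penalty = 0.1) |> set_engine("glmnet")
fitted
```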
How to train, evaluate, and deploy a machine learning workflow with tidymodels & Posit Team
Helpful resources: Github: https://github.com/simonpcouch/mutagen Follow-up Q&A Session: https://youtube.com/live/vwBVOBQfc_U If you want to book a call with our team to chat more about Posit products: pos.it/chat-with-us Don’t want to meet, but curious who else on your team is using Posit? pos.it/connect-us Blog post on tidymodels + Posit Connect: https://posit.co/blog/pharmaceutical-machine-learning-with-tidymodels-and-posit-connect/ Tidy Modeling with R book: https://www.tmwr.org/
Timestamps: 1:44 - Three steps for developing a machine learning model 3:35 - What is a machine learning model? 7:02 - Overview of machine learning with Posit Team 7:36 - Step 1: Understand and clean data 11:05 - Step 2: Train and evaluate models (why you might be interested in using tidymodels) 23:02 - Step 3: Deploying a machine learning model from Posit Workbench to Posit Connect 30:14 - Summary 31:21 - Helpful resources
Machine learning models are all around us, from Netflix movie recommendations to Zillow property value estimates to email spam filters.
As these models play an increasingly large role in our personal and professional lives, understanding and embracing them has never been more important; machine learning helps us make better, data-driven decisions.
The tidymodels framework is a powerful set of tools for building—and getting value out of—machine learning models with R.
Data scientists use tidymodels to:
- Gain access to a wide variety of machine learning methods
- Guard against common mistakes
- Easily deploy models through tidymodels’ integration with vetiver
Join Simon Couch from the tidyverse team on Wednesday, October 25th at 11am ET as he walks through an end-to-end machine learning workflow with Posit Team.
No registration is required to attend - simply add it to your calendar using this link: pos.it/team-demo

Hadley Wickham @ Posit | Giving benefit to people using what you build | Data Science Hangout
We were recently joined by Hadley Wickham, Chief Scientist at Posit PBC. Listen in to hear our chat about building tools (like the tidyverse) to make data science easier, faster, and more fun.
36:57 - While I’m bought into developing open source packages to help deliver better processes, any advice to those of us doing that development in getting their company bought in?
You have to give some benefit to the people using (what you’re building)
You’ve got to either remove pain or add pleasure in some way because if you can’t do that and you’re not someone’s direct supervisor, it’s hard to get people to change.
The way I think about the tidyverse is, how do we give people some sort of quick wins so they can be motivated to do the things that are slower where they’re gonna have to learn some new ideas or some new tools. You kind of build up some equity with that person.
They build trust that you’ve helped them in the past and now they’re willing to invest a little bit more time before they see the payoff. But in the early days, it’s all about delivering payoffs as quickly as possible.
And I think if you’re doing, like, you know “my company’s first R package” - the easy pain points are: make themes for your company corporate style guide, make a ggplot2 theme, make an R Markdown, a Quarto theme. Make a Shiny theme that people can just use to get, you know, something that’s reasonably close to whatever your corporate style guide dictates.
That just feels like an easy win for people because it makes them look good inside the corporation and because you’ve put in all the hard work, it’s like three seconds for them to type the right function name to get the right theme.
I think the other bit is making it easier to get access to data. Set up some wrappers around DBI connections to the most important data sources. Provide some conventions around authentication so that stuff just works so that they’re not struggling with “What packages do I need to install? What’s the password? Where’s the path I need?” Just give them some, like, a list of the top ten most common data sources and people will love you by and large.
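A minimal sketch of such a wrapper (every name here is hypothetical, purely to illustrate the pattern Hadley describes): one function per common data source, with connection details and authentication conventions baked in.

```r
# Hypothetical company-package wrapper around DBI: colleagues call one
# function instead of remembering host names, drivers, and credentials.
connect_sales_db <- function() {
  DBI::dbConnect(
    RPostgres::Postgres(),
    host     = "sales-db.internal.example.com",  # hypothetical host
    dbname   = "sales",
    user     = Sys.getenv("SALES_DB_USER"),      # credentials from env vars,
    password = Sys.getenv("SALES_DB_PASSWORD")   # never hard-coded
  )
}

# Usage (in a colleague's script):
# con <- connect_sales_db()
# dplyr::tbl(con, "orders")
```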
Follow-up question: Once you identify the things that you think would be useful for people - do you have a philosophy or a way in which you approach putting things together?
When you’re in an environment of scarcity when you’ve only got so much time that you can take out of your everyday job to invest in writing a package, it’s really tough to balance. Like, how do I add new stuff versus making sure the old stuff continues to work?
I think, again, some of it’s about building up trust. So, give people some wins so that when you inevitably break stuff, you’ve got some kind of cushion so people aren’t going to be really angry with you right away. They’re gonna be like, ok, well there’s a little bit of suffering now, but this person saved me so much time.
But yeah, it’s really hard. And particularly as you’re starting out, like, you’re going to make mistakes. That’s inevitable.
You’re going to do things that when you look back a year later, you’re like, why on earth did I do it that way? You’ll want to rip out the whole thing and rewrite it from scratch. And I think that if it feels horrible, you have to remember, that’s great. It means you’ve grown immensely as a programmer.
Certainly if you have my kind of mindset, you have to resist the temptation to rip things out and redo them as much as possible and just focus on making the next generation better rather than breaking what stuff people already have.
So I don’t have any great answers here, but I think you just have to think about those tensions of “how do I keep my forward velocity up while getting better as a programmer and evolving over time, but also thinking about how do you make the things you did a long time ago better?”
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co LinkedIn: https://www.linkedin.com/company/posit-software Twitter: https://twitter.com/posit_pbc
To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We’d love to see you!)
Come hangout with us!

Teaching the tidyverse in 2023 | Mine Çetinkaya-Rundel
Recommendations for teaching the tidyverse in 2023, summarizing package updates most relevant for teaching data science with the tidyverse, particularly to new learners.
00:00 Introduction 00:46 Using addins to switch between RStudio themes (See https://github.com/mine-cetinkaya-rundel/addmins for more info) 01:40 Native pipe 03:08 Nine core packages in tidyverse 2.0.0 07:15 Conflict resolution in the tidyverse 11:30 Improved and expanded *_join() functionality 22:05 Per operation grouping 27:41 Quality of life improvements to case_when() and if_else() 31:41 New syntax for separating columns 34:51 New argument for line geoms: linewidth 36:08 Wrap up
See more in the Teaching the tidyverse in 2023 blog post https://www.tidyverse.org/blog/2023/08/teach-tidyverse-23
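A few of the updates listed above, sketched as code (a minimal sketch assuming dplyr >= 1.1.0 and R >= 4.1; band_members and band_instruments are demo datasets shipped with dplyr):

```r
library(dplyr)

# Native pipe |> (base R >= 4.1) in place of magrittr's %>%
mtcars |> head(2)

# Per-operation grouping with .by: no group_by()/ungroup() pair needed
mtcars |>
  summarise(mean_mpg = mean(mpg), .by = cyl)

# join_by() for explicit join specifications
band_members |> left_join(band_instruments, by = join_by(name))

# case_when() now takes .default instead of the TRUE ~ ... idiom
mtcars |>
  mutate(size = case_when(cyl == 4 ~ "small",
                          cyl == 8 ~ "large",
                          .default = "medium"))
```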

What does deprecated mean? Package lifecycle and the process of deprecation.
An important part of the process of package lifecycle and package development is not just adding new functions. It’s equally important to remove functions.
Hadley Wickham shares about the package lifecycle process and what ‘deprecation’ means for functions.
See the full video about the purrr 1.0 release: https://youtu.be/EGAs7zuRutY
More about the package lifecycle stages: https://lifecycle.r-lib.org/articles/stages.html
Maintaining the house that tidyverse built: https://youtu.be/izFssYRsLZs

What does superseded mean? Package development lifecycle process and the meaning of superseded.
An important part of the process of package lifecycle and package development is not just adding new functions. It is equally important to remove functions.
Hadley Wickham shares about the package lifecycle process and what ‘superseded’ means for functions.
See the full video about the purrr 1.0 release: https://youtu.be/EGAs7zuRutY
More about the package lifecycle stages: https://lifecycle.r-lib.org/articles/stages.html
Maintaining the house that tidyverse built: https://youtu.be/izFssYRsLZs

posit::conf(2023) Workshop: Advanced tidymodels
Register now: http://pos.it/conf Instructor: Max Kuhn, Software Engineer, Posit. Workshop Duration: 1-Day Workshop
This workshop is for you if you: • have used tidymodels packages like recipes, rsample, and parsnip • are comfortable with tidyverse syntax (e.g. piping, mutates, pivoting) • have some experience with resampling and modeling (e.g., linear regression, random forests, etc.), but we don’t expect you to be an expert in these
In this workshop, you will learn more about model optimization using the tune and finetune packages, including racing and iterative methods. You’ll be able to do more sophisticated feature engineering with recipes. Time permitting, model ensembles via stacking will be introduced. This course is focused on the analysis of tabular data and does not include deep learning methods.
Participants who have completed the “Introduction to tidymodels” workshop will be well-prepared for this course. Participants who are new to tidymodels will benefit from taking the Introduction to tidymodels workshop before joining this one

posit::conf(2023) Workshop: Big Data with Arrow
Register now: http://pos.it/conf Instructors: Nic Crane and Stephanie Hazlitt. Workshop Duration: 1-Day Workshop
This course is for you if you: • want to learn how to work with tabular data that is too large to fit in memory using existing R and tidyverse syntax implemented in Arrow • want to learn about Parquet and other file formats that are powerful alternatives to CSV files • want to learn how to engineer your tabular data storage for more performant access and analysis with Apache Arrow
Data analysis pipelines with larger-than-memory data are becoming more and more commonplace. In this workshop you will learn how to use Apache Arrow, a multi-language toolbox for working with larger-than-memory tabular data, to create seamless “big” data analysis pipelines with R.
The workshop will focus on using the arrow R package, a mature R interface to Apache Arrow, to process larger-than-memory files and multi-file data sets using familiar dplyr syntax. You’ll learn to create and use interoperable data file formats like Parquet for efficient data storage and access, with data stored both on disk and in the cloud, and also how to exercise fine control over data types to avoid common large-data pipeline problems. This workshop will provide a foundation for using Arrow, giving you access to a powerful suite of tools for performant analysis of larger-than-memory data in R.
posit::conf(2023) Workshop: Causal Inference with R
Register now: http://pos.it/conf Instructors: Malcolm Barrett and Travis Gerke
This course is for you if you: • know how to fit a linear regression model in R • have a basic understanding of data manipulation and visualization using tidyverse tools • are interested in understanding the fundamentals behind how to move from estimating correlations to causal relationships
In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams, and causal modeling techniques such as propensity scores and inverse probability weighting.
In both data science and academic research, prediction modeling is often not enough; to answer many questions, we need to approach them causally. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know (the tidyverse, regression models, and more) to answer the questions that are important to your work.
posit::conf(2023) Workshop: From R User to R Programmer
Register now: http://pos.it/conf Instructors: Emma Rand and Ian Lyttle. Workshop Duration: 1-Day Workshop
This course is for you if you: • have experience equivalent to an introductory data science course using tidyverse • feel comfortable with the Whole game chapter of R for Data Science
This is a one-day, hands-on workshop for those who have embraced the tidyverse and want to improve their R programming skills and, especially, reduce the amount of duplication in their code. The two main ways to reduce duplication are creating functions and using iteration. We will use a tidyverse approach to cover function design and iteration with {purrr}.
• Master the art of writing functions that do one thing well, adhere to existing conventions and can be fluently combined together to solve more complex problems. • Learn how to perform the same action on many objects using code which is succinct and easy to read
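The two techniques named above, writing small functions and iterating with {purrr}, combine in a few lines; a minimal sketch using a built-in dataset:

```r
library(purrr)

# Duplication: the same action written out three times...
# round(mean(mtcars$mpg), 1); round(mean(mtcars$hp), 1); round(mean(mtcars$wt), 1)

# ...replaced by one small function that does one thing well
round_mean <- function(x, digits = 1) round(mean(x), digits)

# ...plus iteration over the columns of interest
map_dbl(mtcars[c("mpg", "hp", "wt")], round_mean)
#>   mpg    hp    wt
#>  20.1 146.7   3.2
```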
posit::conf(2023) Workshop: Fundamentals of Package Development
Register now: http://pos.it/conf Instructor: Andy Teucher. Workshop Duration: 1-Day Workshop
This workshop is for you if: • You have written several R scripts and find yourself wondering how to reuse or share the code you’ve written • You know how to write functions in R • You are looking for a way to take the next step in your R programming journey
We will be demonstrating some workflows using Git and GitHub. Knowledge of these tools is not required, and you will absolutely be able to complete the workshop without them, but some of the lessons will be more rewarding to you if you are prepared to try them out. If you are looking to get started with Git and GitHub, we recommend you register for the “What they forgot to teach you about R” workshop on Day 1, and join us for this workshop on Day 2.
We are often faced with the need to share our code with others, or find ourselves writing similar code over and over again across different projects. In R, the fundamental unit of reusable code is a package, containing helpful functions, documentation, and sometimes sample data. This workshop will teach you the fundamentals of package development in R, using tools and principles developed and used extensively by the tidyverse team - specifically the ‘devtools’ family of packages including usethis, testthat, and roxygen2. These packages and workflows help you focus on the contents of your package rather than the minutiae of package structure.
You will learn the structure of a package, how to organize your code, and workflows to help you develop your package iteratively. You will learn how to write good documentation so that users can learn how to use your package, and how to use automated testing to ensure it is functioning the way you expect it to, now and into the future. You will also learn how to check your package for common problems, and how to distribute your package for others to use.
This will be an interactive 1-day workshop, and we will be using the RStudio IDE to work through the materials, as it has been designed to work well with the development practices we will be featuring
posit::conf(2023) Workshop: Introduction to Data Science with R and Tidyverse
Register now: http://pos.it/conf Instructors: Posit Academy Instructors. Workshop Duration: 2-Day Workshop
This course is ideal for: • those new to R or the Tidyverse • anyone who has dabbled in R, but now wants a rigorous foundation in up-to-date data science best practices • SAS and Excel users looking to switch their workflows to R
This is not a standard workshop, but a six-week online apprenticeship that culminates in two in-person days at posit::conf(2023). Begins August 7th, 2023. No knowledge of R required. Visit posit.co/academy to learn more about this uniquely effective learning format.
Here, you will learn the foundations of R and the Tidyverse under the guidance of a Posit Academy mentor and in the company of a close group of fellow learners. You will be expected to complete a weekly curriculum of interactive tutorials, and to attend a weekly presentation meeting with your mentor and fellow students. Topics will include the basics of R, importing data, visualizing data with ggplot2, wrangling data with dplyr and tidyr, working with strings, factors, and date-times, modelling data with base R, and reporting reproducibly with quarto
posit::conf(2023) Workshop: Introduction to tidymodels
Register now: http://pos.it/conf Instructors: Hannah Frick, Simon Couch, Emil Hvitfeldt. Workshop Duration: 1-Day Workshop
This workshop is for you if you: • have intermediate R knowledge, experience with tidyverse packages, and either of the R pipes • can read data into R, transform and reshape data, and make a wide variety of graphs • have had some exposure to basic statistical concepts such as linear models, random forests, etc.
Intermediate or expert familiarity with modeling or machine learning is not required.
This workshop will teach you core tidymodels packages and their uses: data splitting/resampling with rsample, model fitting with parsnip, measuring model performance with yardstick, and basic pre-processing with recipes. Time permitting, you’ll be introduced to model optimization using the tune package. You’ll learn tidymodels syntax as well as the process of predictive modeling for tabular data



posit::conf(2023) Workshop: Steal like an Rtist: Creative Coding in R
Register now: http://pos.it/conf Instructors: Ijeamaka Anyene Fumagalli & Sharla Gelfand. Workshop Duration: 1-Day Workshop
This workshop is for you if you: • are comfortable with R and RStudio, experience with tidyverse and ggplot2 • are interested in applying data visualization skills more creatively, but may not know where to start or how to develop style/inspiration • are an artist interested in exploring code as another medium for creating their work
R is a tool for data analysis but also can be used for self-expression. This workshop will be an introduction to creative coding in R in order to make visual art. We will take an inspiration-first approach, using compelling pieces to discuss and learn the techniques that shape the work. This workshop takes guidance from its namesake, the book “Steal Like An Artist” by Austin Kleon - once we have identified and learned to recreate existing works, we will cover how to take this inspiration and transform, remix, or reinterpret it in the pursuit of developing our own work and artistic styles.
This workshop is hands-on and will cover color theory and manipulation, a reintroduction of the data frame as the foundation for creating art (instead of just for analyzing data!), using ggplot2 as an artistic canvas, creating basic and specialized shapes, tiling and pattern making, developing your own functions and using iteration. We will also discuss how to use controlled randomness to convert a standalone piece into a generative art system that can produce many distinct outputs. Creative coding may seem a world apart from data analysis, but we see a large overlap and intersection of the skills used in both, not to mention the creative muscles that are already used in data visualization
posit::conf(2023) Workshop: Teaching Data Science Masterclass
Register now: http://pos.it/conf Instructor: Dr. Mine Çetinkaya-Rundel. Workshop Duration: 1-Day Workshop
This course is for you if you: • you want to learn / discuss curriculum, pedagogy, and computing infrastructure design for teaching data science with R and RStudio using the tidyverse and Quarto • you are interested in setting up your class in Posit Cloud • you want to integrate version control with git into your teaching and learn about tools and best practices for running your course on GitHub
This masterclass is aimed primarily at participants teaching data science in an academic setting in semester-long courses, however much of the information and tooling we introduce is applicable for shorter teaching experiences like workshops and bootcamps as well. Basic knowledge of R is assumed and familiarity with the tidyverse and Git is preferred.
There has been significant innovation in introductory statistics and data science courses to equip students with the statistical, computing, and communication skills needed for modern data analysis. Success in data science and statistics is dependent on the development of both analytical and computational skills, and the demand for educators who are proficient at teaching both these skills is growing. The goal of this masterclass is to equip educators with concrete information on content, workflows, and infrastructure for painlessly introducing modern computation with R and RStudio within a data science curriculum. In a nutshell, the day you’ll spend in this workshop will save you endless hours of solo work designing and setting up your course.
Topics will cover teaching the tidyverse in 2023, highlighting updates to R for Data Science (2nd ed) and Data Science in a Box as well as present tooling options and workflows for reproducible authoring, computing infrastructure, version control, and collaboration.
The workshop will consist of four modules: • Teaching data science with the tidyverse and Quarto • Teaching data science with Git and GitHub • Organizing, publishing, and sharing of course materials • Computing infrastructure for teaching data science
Throughout each module we’ll shift between the student perspective and the instructor perspective. The activities and demos will be hands-on; attendees will also have the opportunity to exchange ideas and ask questions throughout the session.
In addition to gaining technical knowledge, participants will engage in discussion around the decisions that go into developing a data science curriculum and choosing workflows and infrastructure that best support the curriculum and allow for scalability. We will also discuss best practices for configuring and deploying classroom infrastructures to support these tools

posit::conf(2023) Workshop: Tidy time series and forecasting in R
Register now: http://pos.it/conf Instructor: Rob J Hyndman. Workshop Duration: 2-Day Workshop
This course is for you if you: • already use the tidyverse packages in R such as dplyr, tidyr, tibble and ggplot2 • need to analyze large collections of related time series • would like to learn how to use some tidy tools for time series analysis including visualization, decomposition and forecasting
It is common for organizations to collect huge amounts of data over time, and existing time series analysis tools are not always suitable to handle the scale, frequency and structure of the data collected. In this workshop, we will look at some packages and methods that have been developed to handle the analysis of large collections of time series.
On day 1, we will look at the tsibble data structure for flexibly managing collections of related time series, and cover data wrangling, data visualization, and exploratory data analysis. We will explore feature-based methods for examining time series data in high dimensions; a similar feature-based approach can be used to identify anomalous time series within a collection, or to cluster or classify time series. Primary packages for day 1 will be tsibble, lubridate and feasts (along with the tidyverse of course).
Day 2 will be about forecasting. We will look at some classical time series models and how they are automated in the fable package, and we will explore the creation of ensemble forecasts and hybrid forecasts. Best practices for evaluating forecast accuracy will also be covered. Finally, we will look at forecast reconciliation, allowing millions of time series to be forecast in a relatively short time while accounting for constraints on how the series are related
Hadley Wickham | {purrr} 1.0: A complete and consistent set of tools for functions and vectors
{purrr} has reached the 1.0 milestone, with new features like progress bars, improvements to the map family, and tools for list flattening and simplification.
0:00 Introduction 0:11 What is purrr? 00:32 What is functional programming? 03:08 Announcing purrr 1.0 03:58 Progress bars 05:18 Better error messages 07:18 New map function: map_vec() 09:58 New list_* functions 12:04 Flattening and simplification 17:40 Breaking Changes 22:34 How the tidyverse handles deprecation 24:41 An overview of functional programming 26:22 Closing, resources to help with deprecation, how to submit issues
See more in the {purrr} 1.0.0 release blog post! https://www.tidyverse.org/blog/2022/12/purrr-1-0-0/
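A quick taste of the 1.0 features mentioned in the timestamps (a minimal sketch):

```r
library(purrr)

# map_vec(): like map_chr()/map_dbl(), but infers the output type
map_vec(1:3, \(x) x * 2)
#> [1] 2 4 6

# list_flatten() removes exactly one level of list nesting
list_flatten(list(1, list(2, 3)))  # a list of 1, 2, 3

# Progress bars on long-running maps:
# map(long_list, slow_fn, .progress = TRUE)
```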

R-Ladies Rome (English) - What’s new in the tidyverse - Isabella Velasquez
Welcome to R-Ladies Rome Chapter!
What’s new in the tidyverse - Speaker: Isabella Velasquez
In this video, Isabella will tell you about what’s new in the tidyverse, a suite of packages that has revolutionized data wrangling, visualization, and analysis. The tidyverse has recently undergone changes and updates to make it even more user-friendly and powerful, including new packages, updates to existing ones, and improvements in performance and functionality. Some of the most notable updates include changes to package dependencies, performance improvements for specific functions such as group_by(), and updates to core packages such as ggplot2, readr, and dplyr.
You can find the latest news here: https://bit.ly/3z9BcMR To follow Isabella Velásquez: Twitter: twitter.com/ivelasq3 LinkedIn: linkedin.com/in/ivelasq/
Materials: GitHub repo: https://bit.ly/3LHVSmS Website: https://bit.ly/3M5gE03 The tidyverse blog: https://www.tidyverse.org/blog/
Alex Farach | Let’s start at the beginning - bits to character encoding in R | RStudio (2022)
Attendees will receive a broad overview of the encoding and decoding process in the human-to-computer loop, how bits are used, and the math that gets us to common bit values. A brief history of ASCII, Latin-1, and UTF-8 will be provided as well.
Attendees will also be exposed to how character encoding works in R and in the tidyverse.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/alexfarach/bits_to_character_in_R_RSTUDIO%20-%20Alex%20F.pdf
Session: Lightning Talks
Josiah Parry | Exploratory Spatial Data Analysis in the tidyverse | RStudio (2022)
R has come quite a long way to enable spatial analysis over the past few years. Packages such as sf have made spatial analysis and mapping easier for many. However, adoption of R for spatial statistics and econometrics has been limited. Many spatial analysts, researchers, and practitioners lean on Python libraries such as pysal.
In this talk I briefly discuss my journey through spatial analysis and introduce a new package, sfdep, which provides a tidy interface to spatial statistics, notably exploratory spatial data analysis. sfdep is an interface to the spdep package and implements other common exploratory spatial statistics.
Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/josiahparry/rstudio__conf(2022L)%20-%20Josiah%20Parry.pdf
Session: Lightning Talks
Welcome to Quarto Workshop! | Led by Tom Mock, RStudio
Welcome to Quarto 2-hour Workshop | Led by Tom Mock, RStudio
Content website: https://jthomasmock.github.io/quarto-2hr-webinar/ FULL Workshop Materials (this was from a 2-day workshop): rstd.io/get-started-quarto Other upcoming live events: rstd.io/community-events
Double-check: Are you on the latest version of RStudio i.e. v2022.07.1 or later?
Packages used: tidyverse, gt, gtExtras, reactable, ggiraph, here, quarto, rmarkdown, gtsummary, palmerpenguins, fs, skimr
️ Pre-built RStudio Cloud with workshop materials already installed: https://rstudio.cloud/content/4332583
For follow-up questions, please use: community.rstudio.com/tag/quarto
Timestamps: 7:16 - What is Quarto? 8:28 - How does R Markdown work? 9:40: Quarto, more than just knitr 13:56 - Quarto can support htmlwidgets in R and Jupyter widgets for Python/Julia 14:18 - Native support for Observable Javascript 19:28 - Quarto in your own workspace (Jupyter Lab, VSCode, RStudio) 20:26 - RStudio Visual Editor mode 23:30 - VS Code YAML 26:02 - Quarto for collaboration 26:55 - How do you publish Quarto? (Quarto Pub, GitHub Pages, RStudio Connect, Netlify) 28:44 - What about Data Science at Work? 29:59 - Formats baked into Quarto (basic formats, beamer, ppt, html slides, advanced layout, cross references, websites, blogs, books, interactivity) 32:13 - What to do with my existing .Rmd or .ipynb? 33:16 - Why Quarto, instead of R Markdown? 40:50 - Text Formatting 41:30 - Headings 41:51 - Code (also merging R and Python in one document) 43:29 - What about the CLI? 44:55 - Navigating in the terminal 57:56 - PART 2: Authoring Quarto 1:00:22 - Output options 1:04:46 - Quarto workflow 1:12:06 - Quarto YAML intelligence 1:13:20 - Divs and Spans 1:22:13 - Figure layout 1:34:40 - Code chunk options 1:41:00 - Quarto and R Markdown (converting R Markdown to Quarto)
This 2-hour virtual session is designed for those who have no or little prior experience with R Markdown and who want to learn Quarto.
Want to get started with Quarto?
- Install RStudio v2022.07.1 from https://www.rstudio.com/products/rstudio/download/#download - this will come with a working version of Quarto!
- Webinar materials/slides: https://jthomasmock.github.io/quarto-2hr-webinar/
- Workshop materials on RStudio Cloud: https://rstudio.cloud/content/4332583
What is Quarto?
Quarto is the next generation of R Markdown for publishing, including dynamic and static documents and multi-lingual programming language support. With Quarto you can create documents, books, presentations, blogs or other online resources.
Should I take this?
As with all the community meetups, everyone is welcome. This will be especially interesting to you if you have experience programming in R and want to learn how to take advantage of Quarto for literate data science programming in academia, science, and industry.
This workshop will be appropriate for attendees who answer yes to these questions:
Have you programmed in R and want to better encapsulate your code, documentation, and outputs in a cohesive “data product”?
Do you want to learn about the next generation of R Markdown for data science?
Do you want to have a better interactive experience when writing technical or scientific documents with literate programming?
For more info on Quarto: quarto.org
Posit Meetup | Jake Riley, Children’s Hospital of Philadelphia | Translating Facts to Insights
RStudio Healthcare Meetup:
Translating facts into insights at Children’s Hospital of Philadelphia Led by Jake Riley, data analyst at The Children’s Hospital of Philadelphia
Abstract: {headliner} is a new R package to add dynamic, insightful text to plots and reports. {headliner} generates useful talking points that users can string together using {glue} syntax. This makes it easy to write informative sentences without adding a lot of technical debt to a project. Learn how to get started with {headliner} and ways we have used it at The Children’s Hospital of Philadelphia.
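Since {headliner} builds its talking points on {glue} syntax, here is what that underlying templating looks like; this is plain {glue} with made-up values, not {headliner}’s own API:

```r
library(glue)

sales_now  <- 120
sales_last <- 100
delta <- sales_now - sales_last
trend <- if (delta >= 0) "up" else "down"

glue("Sales are {trend} {abs(delta)} units ({round(delta / sales_last * 100)}%) vs. last period")
#> Sales are up 20 units (20%) vs. last period
```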
Speaker Bio: Jake Riley is a data analyst at The Children’s Hospital of Philadelphia. He is the author of several R packages related to data visualization and automated exploratory analysis. You can find his published packages {simplecolors} and {shinyobjects} on CRAN, with more packages on the way.
Timestamps: 0:49 - Start of talk 1:25 - Dashboards focused on facts vs. insights 2:56 - What’s a good title for a chart? 5:09 - Intro to headliner package 7:41 - Using glue() under the hood 14:04 - Helpers for working with data frames: compare_conditions() 18:41 - Using ggtext 21:27 - Example using pixar_films 23:40 - How they’ve used it at CHOP 28:05 - Next steps for headliner package 29:32 - Start of Q&A session
Questions: 29:32 - Can you use any package you want in your organization? 31:13 - How do you load previous datasets to compare to current datasets? 32:48 - When you mentioned a front page on RStudio Connect (with the headlines), what is that? 33:25 - Is anyone using this for manuscripts at CHOP now? 36:24 - What has the adoption of R or Python been within the hospital analytics team? 37:28 - My manager is very leery of R because of technical depth. Any suggestions for convincing her of R’s value? 42:22 - How does CHOP use R for non-clinical analysis? 43:36 - How do you train new people to use R? 46:28 - How do you compare last week’s analysis to this week’s? 49:37 - Were there any major challenges in creating the hospital’s internal package?
Resources/links shared: Jake’s LinkedIn: https://www.linkedin.com/in/jake-riley-70736a3/ headliner package: https://github.com/rjake/headliner waldo package: https://www.tidyverse.org/blog/2020/10/waldo/ Examples of R in Life Science & Healthcare: https://www.rstudio.com/champion/life-science Chris Bumgardner’s talk on building an R-based analytic practice at Children’s Wisconsin: https://youtu.be/pHZ8dsc0PhY simplecolors package to generate hex codes using uniformly named colors: https://rjake.github.io/simplecolors/ R Packages book by Hadley Wickham & Jenny Bryan: https://r-pkgs.org/
Meetup Links: Future events: rstd.io/community-events-calendar If anyone’s interested in speaking at a future meetup, we’d love to hear from you too! rstd.io/meetup-speaker-form


Enabling Citizen Data Scientists at Dow Chemical with Posit Academy
Led by James Wade, Associate Research Scientist at Dow Chemical
Timestamps: 2:46 - Start of presentation 5:25 - Goal: “apply science and engineering technical expertise along with data science tooling to innovate in the materials science arena.” 6:36 - What does citizen data science mean? 8:05 - Data science as an interdisciplinary endeavor - looking to build a community of innovators 9:30 - Translating data to decisions 11:03 - Guidelines for success (data organizations, data access, data analysis, value preservation) 13:30 - Welcoming new users in an approachable, collaborative, and secure workspace with RStudio Team 14:25 - Making sure you can rapidly deploy your insights to others 16:25 - What is RStudio Academy? 20:55 - What do you need for academy? (Academy learners: 5-7 per cohort, cohort mentors from RStudio & your group, and a project - the closer to your work the better) 22:15 - Who is a good candidate? 23:55 - Who might not be the best candidate? 26:00 - What makes a good cohort? (similar work group, time zone, and skill level) 27:27 - Feedback (Are they still using the content they learned? 16 out of the 17 survey respondents were still writing code 6 months after) 31:42 - Community building (want to have a landing zone for people to continue to learn) 32:31 - RStudio Academy success story at Dow 35:30 - Start of Q&A portion
Questions: 36:00 - How do you help someone who knows coding would be useful but can’t motivate to take 5 steps back to take 10 steps forward? 37:55 - How can more advanced users participate in developing curriculums? 39:44 - Does Academy also teach good coding style and version control? 41:00 - If you’re trying to “sell” Academy to the individual who would fill the group mentor role, what level of commitment and bandwidth do they need to have? 42:13 - Is the type of data you work with relevant to the work you do at Dow? or random / set datasets regardless of which company you’re with? 43:00 - What other ways of teaching R have you tried (or considered) at Dow? How does Academy compare? 44:55 - What is the duration of RStudio Academy? 46:15 - Can you have multiple cohorts go through at the same time? What if we want to up-skill hundreds of people? 48:20 - How did you find out who might be interested and get the word out? 50:08 - Advertising that you help learners up-skill in coding seems like a good way to set your company apart from others, are you hiring? 51:25 - After the RStudio Academy 10 week training is the Academy team still available for questions, support or consult? 53:01 - Is Academy only for R? 53:38 - How can a data science student participate in RStudio Academy? (https://www.rstudio.com/conference/2022/workshops/intro-to-tidyverse/ ) 54:44 - How do you collaborate with others outside of Dow? 57:20 - How does RStudio Academy handle sensitive data? 1:00:20 - Do you have statistics on how many graduates are still using R?
Abstract: In chemistry and materials science research, data is messy, unstructured, and scattered. Solving this problem requires researchers to deeply embed within data generation and analysis workflows.
We are on a multi-year journey to equip scientists and engineers with guidance and tools to extract insights from their data. To this end, we have developed a set of 15 guidelines designed to move our organization toward a collaborative, reproducible work process in a dynamic data-diverse environment.
In this talk, I will share the lessons we learned on this journey through teaching, community building, and collaboration, with a particular focus on the integration of language-agnostic RStudio tools, products, and programs. I will especially be focusing on our experience with RStudio Academy.
Speaker Bio: James is a research scientist working in the chemicals manufacturing industry as part of a research and development team. James applies materials characterization and data science with a special interest in sustainable materials design to develop new capabilities for research. His current focus is on augmenting materials characterization innovations with statistical analysis, machine learning, and data visualization.
For more information on RStudio Academy: rstudio.com/academy Link to speak with RStudio: rstd.io/chat-with-rstudio
Data Science Hangout | Alice Walsh, Pathos | Improving an Interview Experience
We were joined by Alice Walsh, PhD, VP of Translational Research at Pathos. Alice works in drug development, where she is excited about the potential of computational research to yield breakthroughs for patients.
Loved that Alice also asked this question back to the audience:
How do I make an interview a good experience for a candidate? Or have you had any nightmares that’d be helpful to share?
A bunch of thoughts shared from the group: ⬢ I’ve had way more success not giving a formal technical interview, and instead making the technical portion more of a discussion where I’m not asking them to whiteboard anything; it’s just talking.
⬢ If I asked them, “how do you develop a Shiny app?”, I’d much rather someone tell me, “I’ve never developed a Shiny app in my life, but I use R Markdown every day.” That tells me a lot about their ability to actually jump in and learn something new, and about their transparency.
⬢ I’m much more interested in the process. How do people approach a problem and solve challenges that they encounter versus the specific project they worked on because they’re not going to work on that project ever again with me. It’s going to be a new project so they will need to learn something anyway.
⬢ I’ve had success hiring from meetups or hackathons. Seeing people here and the way they problem solve gives you a lot of insight about these individuals.
⬢ My company actually does do a technical interview and we give candidates a data set while they’re on site or in a Teams meeting. We give them an hour to see what sorts of insights they find with a few very specifically directed questions. What we’re often looking for is not someone to have perfect answers to those questions - it’s really about understanding how they looked at the data set, what other information they wanted, and what they wish they had more time to do. You get to see how people work through something and it’s okay if they don’t have a perfectly polished presentation.
⬢ I’ve had a nightmare interview that became a pop quiz on R stuff: “What are all the packages in the tidyverse?” (And at the time I didn’t use the tidyverse; I was a base R user.)
⬢ A nightmare one that sticks in my head was, please describe in detail the differences between Python 3 and Python 2.
⬢ I think, “this is something I would Google” is a valid answer sometimes because even if I don’t know this, I know where to find it and am really confident in my ability to Google this.
⬢ Honestly, if I ask somebody a question and they said this is something that I know I could find the answer to, that would be a perfect answer to me. Not knowing, but knowing where to find the information, is great.
⬢ I went on 3 interviews and they each had a technical part where for every single one of them the answer was: dynamic programming. They must have gone somewhere and decided that was the algorithm to ask about. I found that a bit ridiculous because it wasn’t relevant to what they were working on and it was off-putting. Now, when I interview people I try to make it more of a conversation around the data and what we might actually be doing.
⬢ I have an optional take home: “here’s a data set, take an hour and tell me something. Use whatever tools you want: Excel, R, Python, an abacus.” The key thing I want to see is a written output of what you did. I’m still on the fence, though, because I know a lot of people are anti-technical component. I’m still trying to figure out if it is really helping us make the best hiring decisions.
Also wanted to share a link to a job posting on Alice’s team! She has simplified to just one posting, but they have a couple of openings. They are hiring both more experienced scientists and folks transitioning from academic research with no drug development experience.
Where to find more?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Data Science Hangout site: rstudio.com/data-science-hangout ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout
Follow Us Here: Website: https://www.rstudio.com LinkedIn:https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/rstudio
Alan Carlson | Robust, modular dashboards that minimize tech debt | RStudio
Robust, modular dashboards that minimize tech debt Presented by Alan Carlson, Snap Finance
Abstract Dashboards can be complex but building them shouldn’t be! We’ve built a wrapper for developing production level dashboards that streamlines onboarding new developers and standardizes the initial infrastructure to mitigate tech debt. Now you and your team can spend more time developing insights and less time trying to spin up shiny code with {graveler}.
Speaker Bio As the Tech Lead for the BI (Business Intelligence) team, Alan’s primary focus at Snap is researching, creating, and maintaining methods that help the rest of Snap’s BI team in their work. From dashboards to visualizations to R code in general, he has built multiple packages and bookdowns that make BI work easier to learn and to use within the RStudio environment.
Helpful Links: Blog Post: https://www.rstudio.com/blog/make-robust-modular-dashboards-with-golem-and-graveler/ Graveler package: https://github.com/ghcarlalan/graveler Environment variables: https://docs.rstudio.com/connect/user/content-settings/#content-vars Git-backed publishing: https://docs.rstudio.com/connect/user/git-backed/ If you’d like to join events live: colorado.rstudio.com/rsc/community-events
Question about style guides: Tidyverse Style Guide: https://style.tidyverse.org/ Efficient R Programming book that Colin Gillespie wrote: https://csgillespie.github.io/efficientR/
Questions about RStudio Team: ⬢ RStudio Connect: https://www.rstudio.com/products/connect/ ⬢ Chat with RStudio about RStudio Team: rstd.io/chat-with-rstudio
Data Science Hangout | Mike Smith, Pfizer | Building an R Center of Excellence
We were joined by Mike Smith, Senior Director, Pfizer R&D UK Ltd at the Data Science Hangout - a weekly, free-to-join open conversation for the data science community. If you’d like to join us live, you can add it to your calendar here: rstd.io/datasciencehangout
Mike shared with us all that they are building up a Center of Excellence at Pfizer to help teams across the business build reproducible workflows and use analytics tools effectively & efficiently.
What led to the creation of the CoE within Pfizer and how could we do something similar?
Mike: ⬢ Last year before R/Pharma, we did a poll & found that 1,500+ colleagues had downloaded R. I wanted to service & build up that community to find out what other people are doing and share that. (2:45)
⬢ We’re a very decentralized disparate team, so there are subject matter experts (SMEs) throughout the organization. The Center of Excellence is focused on building connections between SMEs and helping the teams where there isn’t an SME available.
⬢ What we saw was that it’s hard to sometimes get an effective strategy across people in such a big company. We also saw that there were other places within the organization that wanted data science work but they didn’t have an R subject matter expert there. We want to be able to help them solve their problems and set them up with a proof of concept that they can tweak.
33:52 - OK, so how do you do this?
⬢ Find out how many people are using the tools and who you could help.
⬢ Be the translator between the business people who need solutions and the technical folks who are building things.
Communicate the value:
⬢ We may have a bunch of people trying to write the same function or access the same data. We could solve this problem once and then make that into a package and serve that out to everybody and streamline their workflow for the future.
⬢ There’s a benefit in being able to solve problems strategically. We’re trying to build the Lego pieces so that the next time we see a problem like this, we can reuse them. We can also offer this as a package, or via something that allows other people to solve that problem for themselves.
Talk to someone who has experience in this, other community builders
⬢ Doug Robinson helped start this at Pfizer because he had set up something similar at Novartis before. Talking with someone who has done this before is really helpful because they have the experience of knowing: who do we need to tell, what do we need to tell them, what is our purpose for being, and who do we have to speak to and convince. That has to be ready to go.
Find a champion in leadership:
⬢ We went to the head of Statistical programming and said we’d like to do something like this. Fortunately, she was 110% supportive here.
How did they phrase this CoE at Pfizer?
⬢ Check out this description from the job post: https://lnkd.in/g776nYVF
Resources shared: Ethan shared: I saw on RStudio blog the other day the {sassy} system for SAS programmer transitioning to R: https://sassy.r-sassy.org/index.html Tatsu shared: For folks that have RStudio Connect and Tableau, there’s now a supported integration https://www.rstudio.com/blog/dynamic-r-and-python-models-in-tableau-using-plumbertableau/ Tatsu shared the Working with IT section of the champion site: https://www.rstudio.com/champion/working-with-it Mike’s Bandcamp: https://mikeksmith.bandcamp.com/ R Consortium Pharma Working Groups: https://www.r-consortium.org/projects/isc-working-groups R in Pharma Conference: https://rinpharma.com/ Upcoming Pharma meetup with Merck: https://youtu.be/RBVqKi3FV30
Question about style guides: Jesus shared: Tidyverse Style Guide: https://style.tidyverse.org/ Jesus shared: One guide overall guide on better clean R code is the contributing.md of the ggplot2 package: https://github.com/tidyverse/ggplot2/blob/main/CONTRIBUTING.md Sam shared: Efficient R Programming book that Colin wrote: https://csgillespie.github.io/efficientR/
Where to find more?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Data Science Hangout site: rstudio.com/data-science-hangout ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout
Follow Us Here: Website: https://www.rstudio.com LinkedIn:https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/rstudio
Data Science Hangout | Joe Gibson, de Beaumont Foundation | Collaboration Across a Team
We were joined by Joe Gibson, Senior Project Director at de Beaumont Foundation.
At the hangout, there was a great conversation on building a code archive database/using snippets ️
38:16 - Sharing a bit of the conversation below:
Stephanie asked: How do you build a code archive database? I’ve worked places with Word documents and need something more user-friendly.
Joe: Each time that we create a work product, we log it in our database - it creates a new ID for that project, we add some basic information, and then we create a folder for it. We then store it in the folder structure we have on our network to make it easier for people to find things. In addition to having the library of all our code, we have some folders that hold handy code snippets.
Steve: We’re developing an internal package for getting data and doing common tasks in a more standard way, which is nice. Nothing too fancy, but it streamlines things.
Ethan: If you’re using GitHub, Gist is also a great way to share snippets of code. I have a snippet that sets up the header/documentation structure for a script. One of the first bits is library(tidyverse).
Mike asked: Has anyone developed R snippets and distributed them across a team? I don’t know if people are familiar with the snippets within RStudio but they are cool because you can use template frameworks and it jumps you to the next thing you need to tailor for your own situation. It’s essentially a function.
Tatsu shared: https://lnkd.in/gwWCB3T2
Javier: These are super helpful, Mike. I recently learned about them and was shocked. I have all kinds of snippets for myself and my team now.
Jordan: I’ve saved snippets inside a core package and have a function that updates your RStudio snippets. Saved a snippet update gist here: https://lnkd.in/gEcmNViN requires a snippets folder in your package/inst. You can have this .onLoad() if you’re feeling lucky
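For anyone who hasn’t seen them, RStudio snippets live in a plain-text file (Tools > Edit Code Snippets); each snippet body is tab-indented, and ${n:placeholder} fields are the tab stops Mike describes. A hypothetical header snippet along the lines of Ethan’s:

```
snippet header
	# Title:  ${1:title}
	# Author: ${2:author}
	# Date:   `r Sys.Date()`
	library(tidyverse)
```

Typing `header` in the editor and pressing Tab expands the template, jumping between the placeholders.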
Resources shared: ◘ Rachael shared the new champion site: https://lnkd.in/gaHt_8Br ◘ Jen & Michelle shared the National Syndromic Surveillance Program Community of Practice: https://lnkd.in/gHE3C94s ◘ Joe shared Harold’s GitHub for NSSP projects: https://lnkd.in/gFRsezTM ◘ Joe mentioned the Mozilla Foundation, which works to ensure the internet remains a public resource that is open and accessible to us all: https://lnkd.in/gbvdQXwH ◘ Rachael shared AI Inclusive: https://lnkd.in/gt8cQUUX ◘ Cris shared Fairlearn, a toolkit to improve fairness of AI systems: https://fairlearn.org/ ◘ Angela shared Openscapes, helping teams develop collaborative practices that are more reproducible, transparent, inclusive, and kind: https://lnkd.in/gs-6_-ZA
Where to find more?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Data Science Hangout site: rstudio.com/data-science-hangout ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout
Follow Us Here: Website: https://www.rstudio.com LinkedIn:https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/rstudio
Rich Iannone || {gt} Intendo Game Data Project Walkthrough || RStudio
00:00 Introduction 00:11 Setting up our environment 01:21 Importing data 01:56 Data preparation using the tidyverse 14:12 Basic gt table 16:25 Specifying row order with row_group_order() 17:20 Formatting currency with fmt_currency() 18:10 Formatting missing values with fmt_missing() 18:55 Creating row groups with tab_options() 19:50 Relabel column names with cols_label() 20:41 Creating tab spanners with tab_spanner() 23:00 Creating a table title and subtitle with tab_header() 24:40 Aligning table title and subtitle with opt_align_table_header() 25:16 Creating a stubhead label with tab_stubhead() 26:00 Format all table cell text using tab_style() 27:25 Automatically format data color based on value using data_color() 30:45 Creating Markdown-friendly source notes using tab_source_note() 32:45 Creating Markdown-friendly footnotes using tab_footnote() 39:28 Adjust table column width using cols_width() 40:55 Adjust cell padding using opt_horizontal_padding() and opt_vertical_padding() 42:22 Change row group headers using tab_style() 43:40 Convert all table text to small caps using opt_all_caps() 43:58 Change all table text font using opt_table_font() 44:28 Changing table, table heading, footnotes, and source notes background color using tab_options() 46:41 Add a table “cap” at the top and bottom using table.border.top.width() and table.border.bottom.width() 47:23 Use multiline formatting with footnotes using footnotes.multiline() 47:34 Change line style using table_body.hlines.style() 47:55 Change table title and subtitle font sizes using heading.title.font.size() and heading.subtitle.font.size() 48:11 Checking out our final table!
Code to recreate the table from the video: https://github.com/kierisi/rstudio_videos/blob/main/gt/rich-intendo-project-walkthrough/intendo-30032022.R
Learn more about the gt package here: https://gt.rstudio.com/
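A minimal starting point in the spirit of the walkthrough, using the exibble sample data that ships with gt (a sketch, not the video’s Intendo table):

```r
library(gt)
library(dplyr)

exibble |>
  select(char, currency) |>
  gt() |>
  tab_header(
    title    = "A minimal gt table",
    subtitle = "Built from gt's exibble sample data"
  ) |>
  fmt_currency(columns = currency, currency = "USD") |>
  cols_label(char = "Item", currency = "Amount")
```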
Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/
Content: Rich Iannone (@riannone) Motion design and editing: Jesse Mostipak (@kierisi) Music: Nu Fornacis by Blue Dot Sessions https://app.sessions.blue/browse/track/98983

George Mount | R for Excel Users - First Steps | RStudio Meetup
Abstract: Excel’s built-in programming language has served as an entry point to coding for many. If you’re a data analyst steeped in Excel, chances are you could also benefit from learning R for projects of increased scope and complexity.
This presentation serves as a hands-on introduction to R for Excel users:
How R differs from Excel as an open source software tool
How to translate common Excel concepts such as cells, ranges, and tables to R equivalents
Example use cases that you can take and apply to your own work
How to enhance Excel and Power BI with R

By the end of this presentation, you will have a clear path forward for building repeatable processes, compelling visualizations, and robust data analyses in R.
Speaker Bio: George Mount is the founder of Stringfest Analytics, a consulting firm specializing in analytics education and upskilling. He has worked with leading bootcamps, learning platforms and practice organizations to help individuals excel at analytics. George regularly blogs and speaks on data analysis, data education and workforce development. He is the author of Advancing into Analytics: From Excel to Python and R (O’Reilly).
Link to George’s white paper “Five things Excel users should know about R”: https://stringfestanalytics.com/five-things-r-excel/
Working group sign-up for those interested!
Within many organizations, Microsoft Excel is the preferred tool for working with data among non-data-analytics users. In order to build a data-driven organization, source data and analytical models must be accessible to all data users (technical and non-technical) within their preferred tool. Let’s rally the R community to welcome Excel users into our data-driven culture by building an Excel add-on to access data and models available within RStudio. If you’re interested in continuing this conversation and joining a working group, let us know! rstd.io/excel-r-community
Links shared at the meetup! George’s GitHub/ Presentation Resources: https://github.com/stringfestdata/rstudio-mar-2022
Packages? Where to find them & recommendations:
CRAN Task Views: https://cran.r-project.org/web/views/
Mark shared: for folks who primarily use excel to present formatted tables, the gt package is a great way to start doing this programmatically in R: https://gt.rstudio.com/
Ivan shared: In addition to regular Google, I’d recommend https://rseek.org/, given that the character ‘R’ is sometimes not search friendly :)
Jeff shared: Fpp2 is great for forecasting and time series analysis - https://otexts.com/fpp2/
Floris shared: https://otexts.com/fpp3/
Ivan shared: If you’re into tidyverse, there’s an equivalent for time-series: https://tidyverts.org/
George shared: https://dplyr.tidyverse.org/
Ryan shared: This can be a helpful package for dynamically editing tables, like in excel https://github.com/DillonHammill/DataEditR
Ryan shared: This is a great package for making and learning ggplot visualizations: https://cran.r-project.org/web/packages/esquisse/vignettes/get-started.html
Other resources: Monaly shared: There is an R help mailing list: r-help@r-project.org George shared: Helpful book/site on statistics: https://moderndive.com/ Ryan shared: Harvard has a good online source (free options) that has a number of classes, the following for stats: https://www.edx.org/professional-certificate/harvardx-data-science George shared: R for Data Science free book: https://r4ds.had.co.nz/ Fernando shared: Big Book of R: https://www.bigbookofr.com/index.html Floris shared: Advanced R book: https://adv-r.hadley.nz/ Pedro shared: The R for Data Science Slack channel is a great learning resource! r4ds.io/join (we just made a channel there called #chat-excel_to_r) Ivan shared: For teams who are deeply entrenched in Excel (like my old team), this tool may be useful - https://bert-toolkit.com/. It allows running R code in .xls, so you can learn R while doing .xls :)
Re: Glossary of terms: Ivan shared: inner_join() is like VLOOKUP in .xls. Dan shared: Here’s one cheat sheet (glossary of Excel to R) that I just found: https://paulvanderlaken.com/2018/07/31/transitioning-from-excel-to-r-dictionary-of-common-functions/
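To illustrate Ivan’s VLOOKUP analogy, a small dplyr sketch (the tables are made up):

```r
library(dplyr)

orders <- tibble(
  order_id   = 1:3,
  product_id = c("A", "B", "A")
)
prices <- tibble(
  product_id = c("A", "B"),
  price      = c(9.99, 4.50)
)

# Like VLOOKUP: pull price into orders by matching product_id
inner_join(orders, prices, by = "product_id")
```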
Extra Meetup Links Feedback: rstd.io/meetup-feedback Talk submission: rstd.io/meetup-speaker-form If you’d like to find out about upcoming events you can also add this calendar: rstd.io/community-events RStudio conference/submit a talk: https://www.rstudio.com/conference/ Recordings of all meetups: https://www.youtube.com/playlist?list=PL9HYL-VRX0oRKK9ByULWulAOO5jN70eXv
Building R packages with devtools and usethis | RStudio
Package building doesn’t have to be scary! The tidyverse team has made it easy to get started with RStudio and the devtools/usethis packages. This hour-long presentation will walk you through the basics of R package building, and hopefully leave you prepared to go out and build your own package!
Slides: https://colorado.rstudio.com/rsc/pkg-building/ Source Code: https://github.com/jthomasmock/pkg-building
devtools: https://devtools.r-lib.org/ usethis: https://usethis.r-lib.org/ R Packages book: https://r-pkgs.org/index.html
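The core devtools/usethis development loop described in the talk looks roughly like this. It is an interactive workflow sketch (the path is a placeholder), not a script to run unattended:

```r
library(usethis)
library(devtools)

create_package("~/path/to/mypkg")  # scaffold DESCRIPTION, R/, .Rproj, etc.
use_r("myfun")                     # create/open R/myfun.R for your function
load_all()                         # simulate installing and loading the package
use_test("myfun")                  # scaffold tests/testthat/test-myfun.R
test()                             # run the test suite
document()                         # build NAMESPACE and man/ via roxygen2
check()                            # R CMD check: the full package health check
```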
Jenny Bryan | Help me help you: creating reproducible examples | RStudio (2018)
What is a reprex? It’s a reproducible example. Making a great reprex is both an art and a science and this webinar will cover both aspects. A reprex makes a conversation about code more efficient and pleasant for all. This comes up whenever you ask someone for help, report a bug in software, or propose a new feature. The reprex package (https://reprex.tidyverse.org) makes it especially easy to prepare R code as a reprex, in order to share on sites such as https://community.rstudio.com, https://github.com, or https://stackoverflow.com. The habit of making little, rigorous, self-contained examples also has the great side effect of making you think more clearly about your programming problems.
Webinar materials: https://rstudio.com/resources/webinars/help-me-help-you-creating-reproducible-examples/
About Jenny: Jenny is a software engineer on the tidyverse team. She is a recovering biostatistician who takes special delight in eliminating the small agonies of data analysis. Jenny is known for smoothing the interfaces between R and spreadsheets, web APIs, and Git/GitHub. She’s been working in R/S for over 20 years and is a member of the R Foundation. She also serves in the leadership of rOpenSci and Forwards and is an adjunct professor at the University of British Columbia

Tom Mock | A Gentle Introduction to Tidy Statistics in R | RStudio (2019)
R is a fantastic language for statistical programming, but making the jump from point and click interfaces to code can be intimidating for individuals new to R. In this webinar I will gently cover how to get started quickly with the basics of research statistics in R, providing an emphasis on reading data into R, exploratory data analysis with the Tidyverse, statistical testing with ANOVAs, and finally producing a publication-ready plot in ggplot2.
Use the code presented instantly on RStudio Cloud!
RStudio Cloud: rstudio.cloud Webinar materials: https://rstudio.com/resources/webinars/a-gentle-introduction-to-tidy-statistics-in-r/
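The kind of tidy statistics the webinar covers can be sketched in a few lines of base R, using the built-in `iris` data as stand-in example data:

```r
# One-way ANOVA: does Sepal.Length differ by Species?
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)   # F-test for an overall Species effect
TukeyHSD(fit)  # pairwise comparisons with adjusted p-values

# A tidyverse complement (not run here): broom::tidy(fit) returns the
# ANOVA table as a tibble, ready to pass to ggplot2.
```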
About Thomas: Thomas is involved in the local and global data science community, serving as Outreach Coordinator for the Dallas R User Group, mentoring for the R for Data Science Online Learning Community, co-founding #TidyTuesday, attending various Data Science and R-related conferences/meetups, and participating in Startup Weekend Fort Worth as a data scientist/entrepreneur
Hadley Wickham | testthat 3.0.0 | RStudio (2020)
In this webinar, I’ll introduce some of the major changes coming in testthat 3.0.0. The biggest new idea in testthat 3.0.0 is the idea of an edition. You must deliberately choose to use the 3rd edition, which allows us to make breaking changes without breaking old packages. testthat 3e deprecates a number of older functions that we no longer believe are a good idea, and tweaks the behaviour of expect_equal() and expect_identical() to give considerably more informative output (using the new waldo package).
testthat 3e also introduces the idea of snapshot tests which record expected value in external files, rather than in code. This makes them particularly well suited to testing user output and complex objects. I’ll show off the main advantages of snapshot testing, and why it’s better than our previous approaches of verify_output() and expect_known_output().
Finally, I’ll go over a bunch of smaller quality-of-life improvements, including tweaks to test reporting and improvements to expect_error(), expect_warning() and expect_message().
Webinar materials: https://rstudio.com/resources/webinars/testthat-3/
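A minimal sketch of what opting into the 3rd edition looks like, assuming a package context (in DESCRIPTION you would set `Config/testthat/edition: 3`; interactively you can use `local_edition()`):

```r
library(testthat)
local_edition(3)  # opt this session into testthat 3e

test_that("failures show a field-by-field diff via waldo", {
  # In 3e, a mismatch here prints an informative waldo comparison
  expect_equal(c(a = 1, b = 2), c(a = 1, b = 2))
})

test_that("printed output is stable", {
  # Recorded in tests/testthat/_snaps/ on first run, compared thereafter
  expect_snapshot(print(head(iris)))
})
```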
About Hadley: Hadley Wickham is the Chief Scientist at RStudio, a member of the R Foundation, and Adjunct Professor at Stanford University and the University of Auckland. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. You may be familiar with his packages for data science (the tidyverse: including ggplot2, dplyr, tidyr, purrr, and readr) and principled software development (roxygen2, testthat, devtools, pkgdown). Much of the material for the course is drawn from two of his existing books, Advanced R and R Packages, but the course also includes a lot of new material that will eventually become a book called “Tidy tools”

Roche & Novartis: Effective Visualizations for Data Driven Decisions || Posit (2020)
Effective visual communication is a core task for all data scientists including statisticians, epidemiologists, machine learning experts, bioinformaticians, etc.
By using the right graphical principles, we can better understand data, highlight core insights and influence decisions toward appropriate actions. Without it, we can fool ourselves and others and pave the way to wrong conclusions and actions. While numerous solutions exist to analyze data, these often require many manual steps to convert them into visually convincing and meaningful reports. How do we put this in practice in an accurate, transparent and reproducible way?
In this webinar we will introduce an open collaborative effort, currently undertaken by Roche and Novartis, to develop solutions for effective visual communication with a focus on reporting medical and clinical data. The aim of the collaboration is to develop a user-friendly, fit for purpose, open source package to simplify the use of good graphical principles for effective visual communication of typical analyses of interventional and observational data encountered in clinical drug development. We will introduce the initial visR package design which easily integrates into a typical tidyverse workflow. The package provides guidance and meaningful default parameters covering all aspects from the design, implementation and review of statistical graphics.
Webinar materials: https://posit.co/resources/videos/effective-visualizations-for-data-driven-decisions/
About Charlotta: Charlotta is a computational biologist by training and works as a data scientist in the Personalized Healthcare department at Roche where she uses R to untap the wealth of information coming from healthcare data collected in real-world settings to support the development of new medicines.
About Diego: Diego is a data scientist specializing in applied machine learning at Roche Personalized Healthcare since March 2019. He has developed models to perform various tasks and analyze diverse data sources. Currently, his main applications of interest are in oncology and clinico-genomics.
About Mark: Mark is a methodologist supporting the clinical development and analytics department at Novartis. He has a focus on data visualization working on a number of internal and external initiatives to improve the reporting of clinical trials and observational studies.
About Marc: Marc is a biostatistics group head at Novartis. He is interested in advancing the methods and practice of clinical development, for instance through effective use of graphics. https://graphicsprinciples.github.io/
Matt Thomas & Mike Page | How the Tidyverse helped the British Red Cross respond to COVID | RStudio
Full title: Cognitive speed: How the Tidyverse helped the British Red Cross respond quickly to COVID-19
We will discuss the importance of cognitive speed, defined here as the rate at which an idea can be translated into code, and why the Tidyverse excels in this domain. We will demonstrate this idea in relation to a suite of tools we were required to rapidly develop at the British Red Cross in order to respond effectively to the COVID-19 pandemic. To do this, we will exhibit how elements of the unifying design principles outlined in the ‘tidyverse design guide - Tidyverse team’ relate to the notion of cognitive speed, giving specific examples for various design considerations. We believe this talk will encourage reflection on better design practices for future R developers, using the design principles of the tidyverse as the guiding beacon.
About Matt: Dr. Matt Thomas is Head of Strategic Insight and Foresight at the British Red Cross. Matt’s team aims to help the Red Cross become more anticipatory and proactive by producing insights and tools including the Vulnerability Index (https://britishredcrosssociety.github.io/covid-19-vulnerability/ ) and Resilience Index (https://britishredcross.shinyapps.io/resilience-index/) . He holds a PhD in Evolutionary Anthropology and, prior to joining the British Red Cross, was researching topics including reindeer herders in the Arctic, hunter-gatherers in the Philippines, and witches in China. Outside of work, Matt writes a column for an anthropology magazine (https://www.sapiens.org/column/machinations/ ) as well as fiction.
About Mike: Mike Page is a data scientist on the Strategic Insight and Foresight team at the British Red Cross. Here, he helps to develop a suite of open source tools and dashboards including the Vulnerability Index (https://britishredcrosssociety.github.io/covid-19-vulnerability/ ) and Resilience Index (https://britishredcross.shinyapps.io/resilience-index/) . Mike is also the author of several R packages including mortyr and newsrivr. In his spare time you can find him rock climbing around the Alps
Megan Beckett | Aesthetically automated figure production | RStudio
Automation, reproducibility, data driven. These are not normally concepts one would associate with the traditional publishing industry, where designers typically produce every artefact manually in proprietary software. And when you have thousands of figures to produce and update for a single textbook, this becomes an insurmountable task, meaning our textbooks quickly become outdated, especially in our rapidly advancing world.
With R and the tidyverse in our back pocket, we rose to the challenge to revolutionize this workflow. I will explain how we collaborated with a publishing group to develop a system to aesthetically automate the production of figures for a textbook including translations into several languages.
I think you’ll find this talk interesting as it shows how we applied tools that are familiar to us, but in an unconventional way to fundamentally transform a conventional process.
About Megan: Megan Beckett is a Data Scientist at Exegetic Analytics, where she consults, develops and leads several analytical projects across a wide range of fields and industries. “Scientifically creative; creatively scientific.” This aptly describes her philosophy and approach in her work and life. Megan helped co-found and organises the Cape Town R-Ladies chapter and is a co-organiser of the satRday events in South Africa. She loves to paint, with her most recent work exploring the biodiversity of southern Africa , and running is her passion, whether on the road or the trail
Shelmith Kariuki | rKenyaCensus Package | RStudio
The rKenyaCensus package contains the results of the 2019 Kenya Population Census. The census exercise was carried out in August 2019, and the results were released in February 2020. Kenya leveraged technology to capture data during cartographic mapping, enumeration and data transmission, making the 2019 Census the first paperless census to be conducted in Kenya. The data was published in four different PDF files (Volume 1 - Volume 4), which can be found on the Kenya National Bureau of Statistics website. The data in its current form was open and accessible, but not usable, so there was a need to convert it into a machine-readable format. This data can be used by the government, non-governmental organizations and any other entities for data-driven policy making and development. During the talk, I will explain the reasons behind the development of the package, take you through the steps I took during the process and finally showcase analysis of certain aspects of the data.
About Shelmith: Shelmith Kariuki is a Senior Data Analyst based in Nairobi, Kenya. She is an RStudio Certified Tidyverse trainer (https://education.rstudio.com/trainers/) , currently working as a Data Analytics consultant with UN DESA. She previously worked as a Research Manager at Geopoll, and as a Data Analyst at Busara Center for Behavioral Economics. She also worked as an assistant lecturer in various Kenyan universities, teaching units in Statistics and Actuarial Science. She has extensive experience in data analysis using R. She co-organizes a community of R users in Nairobi (https://www.linkedin.com/feed/hashtag/nairobir/ ) and in Africa (https://twitter.com/AfricaRUsers) . One of the missions of her community work is to make sure that there is an increased number of R adopters, in Africa. She is very passionate about training and using data analytics to drive development projects in Africa
Ahmadou Dicko | Humanitarian Data Science with R | RStudio
Humanitarian actors are increasingly using data to drive their decisions. Since the Haiti 2010 earthquake, the volume of data collected and used by humanitarians has been growing exponentially and organizations are now relying on data specialists to turn all this data into life-saving data products.
These data products are created by teams using proprietary point and click software. The process from the raw data to the final data product involves a lot of clicking, copying and pasting and is usually not reproducible.
Another approach to humanitarian data science is possible using R. In this talk, I will show how to seamlessly develop reproducible, reusable humanitarian data products using the tidyverse, rmarkdown and some domain-focused R packages.
About Ahmadou: Ahmadou Dicko is a statistics and data analysis officer at the United Nations High Commissioner for Refugees (UNHCR) where he uses statistics and data science to help safeguard the rights and well-being of refugees in West and Central Africa. He has an extensive experience in the use of statistics and data science in development and humanitarian projects. Ahmadou was the lead of the OCHA Center for Humanitarian Data team for West and Central Africa and has worked with several humanitarian and development organizations such as IFRC, FAO, IAEA, OCHA. Ahmadou is a RStudio trainer (https://education.rstudio.com/trainers/ ) and he is passionate about the R community. He is currently co-organizing the Dakar R User Group (https://www.meetup.com/DakaR-R-User-Group/ ) and co-leading the AfricaR initiative (https://africa-r.org/ )
Tracy Teal | Teaching R using inclusive pedagogy: Carpentries workshops lessons learned | RStudio
Talk from rstudio::conf(2019)
The Carpentries is an open, global community teaching researchers the skills to turn data into knowledge. Since 2012 we have taught 700+ R workshops & trained 1600+ volunteer instructors. Our workshops use evidence-based teaching, focus on foundational and relevant skills and create an inclusive environment. Teaching the Tidyverse allows learners to start working with data quickly, and keeps them motivated to begin and sustain their learning. Our assessments show that these approaches have been successful in attracting diverse learners, building confidence & increasing coding usage. Through our train-the-trainer model and open, collaborative lessons, this approach scales globally to reach more learners and further democratize data.
About Tracy Teal: Executive Director of The Carpentries (https://carpentries.org ) and co-founder of Data Carpentry (http://www.datacarpentry.org ), a non-profit organization that develops and runs workshops training researchers in effective data analysis and visualization to enable data-driven discovery. Manages projects, operations and finances. Leads lesson development and volunteer coordination and is responsible for strategic and business planning
Amelia McNamara | Working with categorical data in R without losing your mind | RStudio (2019)
Categorical data, called “factor” data in R, presents unique challenges in data wrangling. R users often look down at tools like Excel for automatically coercing variables to incorrect datatypes, but factor data in R can produce very similar issues. The stringsAsFactors=HELLNO movement and standard tidyverse defaults have moved us away from the use of factors, but they are sometimes still necessary for analysis. This talk will outline common problems arising from categorical variable transformations in R, and show strategies to avoid them, using both base R and the tidyverse (particularly, dplyr and forcats functions).
VIEW MATERIALS http://www.amelia.mn/WranglingCats.pdf
(related paper from the DSS collection) http://bitly.com/WranglingCats https://peerj.com/collections/50-practicaldatascistats/
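A classic pitfall from the talk's territory, sketched in base R (the forcats equivalent is noted in a comment):

```r
# A factor prints like its labels but stores integer level codes underneath.
f <- factor(c("10", "40", "20"))

as.numeric(f)               # level codes 1 3 2 -- almost never what you want
as.numeric(as.character(f)) # the actual values 10 40 20

# Reordering levels, base R vs the forcats equivalent:
factor(f, levels = c("40", "20", "10"))
# forcats::fct_relevel(f, "40", "20", "10")
```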
About the Author Amelia McNamara My work is focused on creating better tools for novices to use for data analysis. I have a theory about what the future of statistical programming should look like, and am working on next steps toward those tools. For more on that, see my dissertation. My research interests include statistics education, statistical computing, data visualization, and spatial statistics. At the moment, I am very interested in the effects of parameter choices on data analysis, particularly data visualizations. My collaborator Aran Lunzer and I have produced an interactive essay on histograms, and an initial foray into the effects of spatial aggregation. I talked more about spatial aggregation in my 2017 OpenVisConf talk, How Spatial Polygons Shape Our World
Jim Hester | It depends: A dialog about dependencies | RStudio (2019)
Software dependencies can often be a double-edged sword. On one hand, they let you take advantage of others’ work, giving your software marvelous new features and reducing bugs. On the other hand, they can change, causing your software to break unexpectedly and increasing your maintenance burden. These problems occur everywhere, in R scripts, R packages, Shiny applications and deployed ML pipelines. So when should you take a dependency and when should you avoid them? Well, it depends! This talk will show ways to weigh the pros and cons of a given dependency and provide tools for calculating the weights for your project. It will also provide strategies for dealing with dependency changes, and if needed, removing them. We will demonstrate these techniques with some real-life cases from packages in the tidyverse and r-lib.
VIEW MATERIALS https://speakerdeck.com/jimhester/it-depends
About the Author Jim Hester Jim is a software engineer at RStudio working with Hadley to build better tools for data science. He is the author of a number of R packages including lintr and covr, tools to provide code linting and test coverage for R
Jenny Bryan | Lazy evaluation | RStudio (2019)
The “tidy eval” framework is implemented in the rlang package and is rolling out in packages across the tidyverse and beyond. There is a lively conversation these days, as people come to terms with tidy eval and share their struggles and successes with the community. Why is this such a big deal? For starters, never before have so many people engaged with R’s lazy evaluation model and been encouraged and/or required to manipulate it. I’ll cover some background fundamentals that provide the rationale for tidy eval and that equip you to get the most from other talks.
VIEW MATERIALS https://github.com/jennybc/tidy-eval-context#readme
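The lazy evaluation fundamentals the talk builds on can be seen in plain base R:

```r
# Arguments are promises: captured unevaluated, forced only on first use.
f <- function(x) "x was never touched"
f(stop("boom"))        # no error -- the promise is never forced

# substitute() lets a function see the expression, not the value:
g <- function(x) deparse(substitute(x))
g(height + weight)     # "height + weight", even though neither variable exists

# Tidy eval builds on this model: dplyr verbs quote your expressions and
# evaluate them later, in the context of the data frame.
```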
About the Author Jenny Bryan Jenny is a recovering biostatistician who takes special delight in eliminating the small agonies of data analysis. She’s part of Hadley’s team, working on R packages and integrating them into fluid workflows. She’s been working in R/S for over 20 years, serves in the leadership of rOpenSci and Forwards, and is an Ordinary Member of the R Foundation. Jenny is an Associate Professor of Statistics (on leave) at the University of British Columbia, where she created the course STAT 545

Jesse Sadler | Learning and using the tidyverse for historical research | RStudio (2019)
My talk will discuss how R, the tidyverse, and the community around R helped me to learn to code and create my first R package. My positive experiences with the resources for learning R and the community itself led me to create a blog detailing my experiences with R as a way to pass along the knowledge that I gained. The next step was to develop my first package. The debkeepr package integrates non-decimal monetary systems of pounds, shillings, and pence into R, making it possible to accurately analyze and visualize historical account books. It is my hope that debkeepr can help bring to light crucial and interesting social interactions that are buried in economic manuscripts, making these stories accessible to a wider audience.
VIEW MATERIALS https://github.com/jessesadler/rstudioconf-2019-slides
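The non-decimal arithmetic debkeepr handles can be sketched with two hypothetical helpers (debkeepr's actual API differs; this just shows the carrying problem):

```r
# 1 pound = 20 shillings; 1 shilling = 12 pence.
lsd_to_pence <- function(l, s, d) l * 240 + s * 12 + d

pence_to_lsd <- function(p) {
  c(pounds    = p %/% 240,
    shillings = (p %% 240) %/% 12,
    pence     = p %% 12)
}

# Adding 5 pounds 10s. 6d. and 2 pounds 15s. 9d. carries through both bases:
pence_to_lsd(lsd_to_pence(5, 10, 6) + lsd_to_pence(2, 15, 9))
#> 8 pounds, 6 shillings, 3 pence
```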
About the Author Jesse Sadler I am an early modern historian interested in the social and familial basis of politics, religion, and trade. I received a Ph.D. in European History from UCLA in 2015 and have taught courses on cultural and intellectual history of early modern Europe and the Atlantic. My research investigates the familial basis of the early modern capitalism through archival research on two mercantile families from Antwerp at the end of the sixteenth and beginning of the seventeenth century. I am currently working on a manuscript that argues for the significance of sibling relationships and inheritance in the development of early modern trade. My manuscript places concepts such as patriarchy, emotion, exile, and friendship at the heart of the efficacy of long-distance trade networks and the growth of capitalism
Edzer Pebesma | Spatial data science in the Tidyverse | RStudio (2019)
Package sf (simple feature) and ggplot2::geom_sf have caused a fast uptake of tidy spatial data analysis by data scientists. Important spatial data science challenges are not handled by them, including raster and vector data cubes (e.g. socio-economic time series, satellite imagery, weather forecast or climate predictions data), and out-of-memory datasets. Powerful methods to analyse such datasets have been developed in packages stars (spatiotemporal tidy arrays) and tidync (tidy analysis of NetCDF files). This talk discusses how the simple feature and tidy data frameworks are extended to handle these challenging data types, and shows how R can be used for out-of-memory spatial and spatiotemporal datasets using tidy concepts.
VIEW MATERIALS https://edzer.github.io/rstudio_conf/2019/index.html
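The tidy spatial entry point the talk starts from looks like this; the North Carolina shapefile ships with sf, so this sketch assumes only that sf and ggplot2 are installed:

```r
library(sf)
library(ggplot2)

# An sf object is a data frame with one row per feature and a `geometry`
# list-column, so dplyr verbs and ggplot2 work on it directly.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

ggplot(nc) +
  geom_sf(aes(fill = AREA)) +
  labs(title = "North Carolina counties", fill = "Area")
```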
About the Author Edzer Pebesma I lead the spatio-temporal modelling laboratory at the institute for geoinformatics. I hold a PhD in geosciences, and am interested in spatial statistics, environmental modelling, geoinformatics and GI Science, semantic technology for spatial analysis, optimizing environmental monitoring, but also in e-Science and reproducible research. I am an ordinary member of the R foundation. I am one of the authors of Applied Spatial Data Analysis with R (second edition), am Co-Editor-in-Chief for the Journal of Statistical Software, and associate editor for Spatial Statistics. I believe that research is useful in particular when it helps solving real-world problems
Irene Steves | Teaching data science with puzzles | RStudio (2019)
Of the many coding puzzles on the web, few focus on the programming skills needed for handling untidy data. During my summer internship at RStudio, I worked with Jenny Bryan to develop a series of data science puzzles known as the “Tidies of March.” These puzzles isolate data wrangling tasks into bite-sized pieces to nurture core data science skills such as importing, reshaping, and summarizing data. We also provide access to puzzles and puzzle data directly in R through an accompanying Tidies of March package. I will show how this package models best practices for both data wrangling and project management.
VIEW MATERIALS https://github.com/isteves/ds-puzzles
About the Author Irene Steves This summer I was an intern at RStudio, where I worked with Jenny Bryan to develop a series of coding challenges to cultivate and reward the mastery of R and the tidyverse. I was previously a Data Science Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS), where I reviewed data submissions to a national repository for completion, clarity, and data management best practices. As a fellow, I also collaborated on a number of open science projects to improve access to Ecological Metadata Language (EML) and datasets in the DataONE network (see metajam, dataspice)

Reproducible Examples with the reprex package
Reproducible Examples and the reprex package.
https://speakerdeck.com/jennybc/reprex-reproducible-examples-with-r
Jump to: 0:08 Intro 0:40 Basic usage of reprex 3:35 Motivation, why use reprex? “Help me help you”
4:08 Define reprex? Three common ways to use the term:
- noun: a reproducible example
- the reprex package: a tool to build reprexes in R
- reprex::reprex(): a function in reprex to make a reprex
5:26 When should you use a reprex?
6:14 reprex installation and setup. How do you actually get reprex on your machine? 7:59 Advanced setup and discussion. 9:45 Please use advanced features responsibly.
11:02 Why does the reprex package exist? Anyone who has helped teach R or dealt with GitHub issues, Twitter, Stack Overflow & RStudio Community questions knows that helping people diagnose their coding problems can be hard. This tool comes from hard-won experience. Its aim is to help people ask well-formed questions and increase the chances of getting well-formed answers quickly.
12:52 philosophy behind reprex
- code that I can run
- code that I don’t need to run
- code that I can easily run
13:52 code that I can run.
17:25 Tips on writing good reprexs. Dos and don’ts.
18:52 How do I get my data into my reprex?
Getting small data and CSV type data into your reprex is easy.
You might think, “I have a big hairy data object and I can only show the problem by using it”, but that’s not always the case.
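The standard trick for shipping data inside a reprex is dput(), which prints an expression that recreates the object, so no files need to be attached:

```r
# dput() turns an object into code that rebuilds it exactly.
dput(head(iris, 2))

# For bigger objects, cut them down first -- a reprex rarely needs all rows
# or all columns:
mini <- head(iris[, c("Sepal.Length", "Species")], 4)
dput(mini)

# Your reader pastes the printed structure(...) call into their session
# and has the same object you do.
```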
21:02 code that I don’t need to run. reprex gives your reader the code and reveals the output produced by that code. For experienced coders, that might be enough to help you.
22:44 code that I can easily run. Don’t copy and paste from the R console. This is usually annoying for your reader. Worse than console copy-pasta is the screenshot. (Many people think screenshots of code are downright offensive.)
25:03 reprex_clean If you copy someone else’s reprex into your console, it may include their output, making your new reprex untidy. Here are tips for taking someone else’s reprex code and output and creating a clean reprex reply.
25:54 shock and awe More interesting features of the reprex package.
- 26:29 What about figures and plots in your reprex? So happy you asked about that. reprex will automatically upload your images to imgur.com.
- 28:23 Create a reprex by explicitly providing your code in the reprex call.
- 29:00 when you need your reprex to work in the current working directory.
- 30:45 Differently flavored markdown. Optimize your reprex markdown output for github, stack overflow, or the RStudio community.
- 30:31 Make your reprex create an R script, with your reprex outputs as comments. This is handy for pasting into an email or slack-type-app.
- 32:25 Rich text format, rtf output. (currently experimental feature as of this video)
- 33:06 Suppress the reprex ad at the bottom of your reprex.
- 33:19 Include session info.
- 33:54 Auto styling of your code. Good if you’re dealing with poorly formatted code.
- 34:25 Change your comments string.
- 34:32 Silence Tidyverse startup messages.
- 35:00 Capture a reprex that sends messages to standard output and standard error (e.g. package installation compilation messages).
36:13 Set up personal defaults for your reprex usage.
36:54 reprex RStudio addins; render reprex and reprex selection. These accelerate your use of reprex.
39:01 The human side of reproducible examples. How to ask questions in ways that are most likely to get answered. Sorry for the tough love, but this is important. Why are you always asked to give a reprex?
- Experts try to use reproducible examples to ensure their advice works.
- Making a good reprex is hard. But, you are asking them to solve a problem for you, so meet them halfway.
- Creating reprexes is good coding practice.
- Making a good reprex is often a good way to debug your issue in the embarrassment-free privacy of your own home.
- reprexes lead to discussions more likely to help people in the future.
44:34 Behind the scenes of reprex
44:44 Thanks to those who helped make reprex possible.
Questions and Answers
- 46:05 can reprex capture variables and objects in the current environment? (not yet, maybe in development)
- 47:25 does reprex actually check that the code is self contained? (self contained)
- 48:08 does readr::read_csv support the text argument? (yep, just read the help manual for readr)
Shiny Train-the-Trainer Workshop - rstudio::conf(2019L)
What is the 2-day Shiny Train-the-Trainer Workshop? That’s a great question, I’m glad you asked.
Register at https://rstd.io/conf Learn more at https://rstd.io/conf-agenda
Shiny Train-the-Trainer Certification Workshop - 2 Day
- Day 1 of the course will be co-taught by Mine Cetinkaya-Rundel and Garrett Grolemund, RStudio Data Scientists and Professional Educators.
- On Day 2, Mine will teach the Shiny track and Garrett will teach the Tidyverse track.
This two-day workshop will equip you to teach R effectively. We will draw on RStudio’s experience teaching R to recommend tips for designing, teaching, and supporting short R courses.
On Day 1 of the course, you will learn practical activities that you can use immediately to improve your presentation style, learning outcomes, and student engagement. You will leave the class with a cognitive model of learning that you can use to develop your own effective workshops or courses within your organization. The course will also cover how to use RStudio Cloud and its curriculum of tutorials to jump-start your own lessons.
On Day 2 of the course, participants will have the option to choose one of two tracks: Teaching the Tidyverse or Teaching Shiny.
- Teaching Shiny: Classroom examples will focus on teaching Shiny at the beginner and intermediate levels. The course materials will build on RStudio’s Mastering Shiny workshop as well as the upcoming book from the author of the Shiny package, Joe Cheng, and they will cover the entire lifecycle of a Shiny app: build → improve → share. Participants will receive the course materials for teaching Mastering Shiny. You should take this workshop if you work for a training partner and want to qualify as an RStudio Certified Shiny Instructor or if you are an advocate for R in your organization. You should be proficient in Shiny already and be prepared to submit examples of your work. Prior teaching experience is helpful, but not required. Please bring a laptop and a device that has video recording capabilities (such as a laptop or cell phone).
Instructors: Garrett Grolemund, Mine Çetinkaya-Rundel


Tidyverse Train-the-Trainer Certification Workshop - rstudio::conf(2019L)
What is the 2-day Tidyverse Train-the-Trainer Workshop? That’s a great question, I’m glad you asked.
Register at https://rstd.io/conf Learn more at https://rstd.io/conf-agenda
Tidyverse Train-the-Trainer Certification Workshop - 2 Days
- Day 1 of the course will be co-taught by Mine Cetinkaya-Rundel and Garrett Grolemund, RStudio Data Scientists and Professional Educators.
- On Day 2, Mine will teach the Shiny track and Garrett will teach the Tidyverse track.
This two-day workshop will equip you to teach R effectively. We will draw on RStudio’s experience teaching R to recommend tips for designing, teaching, and supporting short R courses.
On Day 1 of the course, you will learn practical activities that you can use immediately to improve your presentation style, learning outcomes, and student engagement. You will leave the class with a cognitive model of learning that you can use to develop your own effective workshops or courses within your organization. The course will also cover how to use RStudio Cloud and its curriculum of tutorials to jump-start your own lessons.
On Day 2 of the course, participants will have the option to choose one of two tracks: Teaching the Tidyverse or Teaching Shiny.
- Teaching the Tidyverse: Classroom examples will focus on how to teach students to do data analysis with the Tidyverse. We will use Master the Tidyverse, which is an award-winning two-day workshop developed by RStudio, as an example. Participants will receive the course materials for teaching Master the Tidyverse. You should take this workshop if you work for a training partner and want to qualify as an RStudio Certified Tidyverse Instructor or if you are an advocate for R in your organization. You should be proficient in the Tidyverse already and be prepared to submit examples of your work. Prior teaching experience is helpful, but not required. Please bring a laptop and a device that has video recording capabilities (such as a laptop or cell phone).
Instructors: Garrett Grolemund, Mine Çetinkaya-Rundel

Data Manipulation Tools: dplyr – Pt 3 Intro to the Grammar of Data Manipulation with R
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
dplyr docs: dplyr.tidyverse.org/reference/
- http://dplyr.tidyverse.org/reference/union.html
- http://dplyr.tidyverse.org/reference/intersect.html
- http://dplyr.tidyverse.org/reference/setdiff.html
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered; Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 tidyr: “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
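The verbs listed in the Pt. 3 chapter markers compose with the pipe into a single readable chain. A minimal sketch using the built-in mtcars dataset (the dataset, columns, and the `kpl` conversion are illustrative choices, not necessarily those used in the video):

```r
library(dplyr)

mtcars %>%
  select(mpg, cyl, hp) %>%          # keep only the columns we need
  filter(hp > 100) %>%              # keep rows with more than 100 horsepower
  mutate(kpl = mpg * 0.425) %>%     # add a km-per-litre column
  group_by(cyl) %>%                 # compute summaries per cylinder count
  summarise(mean_kpl = mean(kpl)) %>%
  arrange(desc(mean_kpl))           # most fuel-efficient groups first
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the pipe thread them together in reading order.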
Tidy Data and tidyr – Pt 2 Intro to Data Wrangling with R and the Tidyverse
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
http://tidyr.tidyverse.org/reference/
- http://tidyr.tidyverse.org/reference/gather.html
- http://tidyr.tidyverse.org/reference/spread.html
- http://tidyr.tidyverse.org/reference/unite.html
- http://tidyr.tidyverse.org/reference/separate.html
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered; Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 tidyr: “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
- 15:00 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered; Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 tidyr: “Tidy” Data introduced and motivated
- 08:15 tidyr::gather
- 12:38 tidyr::spread
- 15:30 tidyr::unite
- 15:30 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
- 15:00 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
New York Times, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, by Steve Lohr, Aug. 17, 2014: https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
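Two of the Pt. 1 topics, tibbles and the pipe operator, can be shown in a few lines. A minimal sketch (the toy data is invented for illustration; `%>%` comes from magrittr and is re-exported by dplyr):

```r
library(dplyr)

# A tibble is a modern data frame with predictable printing and subsetting
d <- tibble(x = 1:5)

# The pipe passes the left-hand value as the first argument of the next
# call, so a chain reads top-to-bottom instead of inside-out:
sqrt(sum(d$x))            # nested form
d$x %>% sum() %>% sqrt()  # piped form; same result
```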
Working with Two Datasets: Binds, Set Operations, and Joins – Pt 4 Intro to Data Manipulation
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
dplyr docs: dplyr.tidyverse.org/reference/
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered; Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 tidyr: “Tidy” Data introduced and motivated
- 08:10 tidyr::gather
- 12:30 tidyr::spread
- 15:23 tidyr::unite
- 15:23 tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 dplyr::select
- 03:40 dplyr::filter
- 05:05 dplyr::mutate
- 07:05 dplyr::summarise
- 08:30 dplyr::arrange
- 09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
- 11:45 dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00:42 dplyr::bind_cols
- 01:27 dplyr::bind_rows
- 01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
- 02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
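The three families of two-table operations in the Pt. 4 chapter markers behave quite differently: binds stack data positionally, set operations treat rows as set elements, and joins match rows by key. A minimal sketch on two invented toy tables:

```r
library(dplyr)

a <- tibble(id = c(1, 2), x = c("a", "b"))
b <- tibble(id = c(2, 3), y = c("B", "C"))

bind_rows(a, a)              # stack rows; columns are matched by name
union(a, a)                  # set operation: distinct rows across both inputs

left_join(a, b, by = "id")   # all rows of a, plus matching columns of b
inner_join(a, b, by = "id")  # only ids present in both tables
full_join(a, b, by = "id")   # all ids from either table
```

The `by = "id"` argument names the key column; without it, the joins match on all shared column names.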
The Tidyverse and RStudio Connect | RStudio Webinar - 2017
This is a recording of an RStudio webinar. You can subscribe to receive invitations to future webinars at https://www.rstudio.com/resources/webinars/ . We try to host a couple each month with the goal of furthering the R community’s understanding of R and RStudio’s capabilities.
We are always interested in receiving feedback, so please don’t hesitate to comment or reach out with a personal message.



