tidyverse.org

Coding vs. thinking programmatically | Samia Baig | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!

This week’s guest was Samia Baig, Senior Data Scientist/Data Engineer at Johnson & Johnson Innovative Medicine!

Some topics covered in this week’s Hangout were transitioning from a background in pharmacy and public health to a data career in pharma, distinguishing the responsibilities of data scientists versus analytics engineers, strategies for making data pipelines more robust (and convincing your team that you NEED robust pipelines in the first place), and the value of joining open-source communities like Tidy Tuesday.

Resources mentioned in the video and chat:
Posit Data Science Lab → https://pos.it/dslab
Tidy Tuesday GitHub Repository → https://github.com/rfordatascience/tidytuesday
{dbplyr} → https://dbplyr.tidyverse.org/
The Missing Semester of Your CS Education → https://missing.csail.mit.edu/

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
The Lab: https://pos.it/dslab
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Timestamps
00:00 Introduction
03:22 “Do you feel like analytics engineer is a good descriptor for what you do?”
04:44 “How did you get into data from being on a pharmacist’s job path?”
11:55 “What was it like in the move that you made from public health to pharma?”
16:57 “What do you say are distinguishing factors between data science and engineering?”
20:16 “What are the most popular tools that you and your team use, in your job at J&J?”
24:00 “What do you use SQL in?”
27:40 “How would you go about convincing a team of the need for a more robust pipeline?”
31:10 “Can you define robust?”
33:31 “Do you happen to have any specific resources or strategies or examples that might help students or others with that mindset of thinking programmatically?”
37:06 “Are there any non data science skills that are very helpful in your either current or former job?”
40:23 “Is there any kind of community among data scientists across the whole company?”
45:44 “What are your biggest data challenges that you have?”
46:12 “If you had a magic wand, what problem would you solve in that area?”
49:52 “What is a piece of career advice that maybe you wish you could go back in time and give yourself?”

Data analysis with Posit AI-assistants | Sara Altman & Simon Couch | Data Science Lab

The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!

The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.

On this call, Libby Heeren is joined by Sara Altman who walks through using Posit’s AI assistants to analyze data, including a sneak peek at Posit Assistant, and Simon Couch drops by to give us a demo of the reviewer package! Together, Sara and Simon author the Posit AI Newsletter, the best place to stay up-to-date with all the cool tools and advice on staying an informed and level-headed AI user.

Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Sara Altman, Simon Couch

Sara’s Bluesky: https://bsky.app/profile/sara-altman.bsky.social
Sara’s LinkedIn: https://www.linkedin.com/in/sarakaltman/
Sara’s GitHub: https://github.com/skaltman
Posit AI Newsletter by Sara and Simon: https://posit.co/blog/?category=roundups

Resources from the hosts and chat:

Positron IDE → https://positron.posit.co/
Databot Extension → https://positron.posit.co/databot.html
Getting started with Positron Assistant → https://positron.posit.co/assistant-getting-started.html
Posit Assistant (Private Beta) → https://posit-ai-beta.share.connect.posit.cloud/
Reviewer Package (by Simon Couch) → https://github.com/simonpcouch/reviewer
ellmer Package → https://ellmer.tidyverse.org/
chatlas Package → https://github.com/posit-dev/chatlas
Read the Posit AI Newsletter → https://posit.co/blog/?category=roundups
Sign up to get the Posit AI Newsletter → http://pos.it/ai-news
Simon’s blog post about local LLMs not quite being ready for primetime → https://posit.co/blog/local-models-are-not-there-yet/
Join the waitlist for Posit AI in RStudio → https://posit.co/products/ai/
Posit AI Known Issues & FAQs → https://posit-ai-beta.share.connect.posit.cloud/#frequently-asked-questions-faqs
Blog post from Simon and Sara about Privacy and LLMs → https://posit.co/blog/trust-llm-tools/
DS Lab YouTube playlist → https://youtube.com/playlist?list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&si=7tmU6EAJpO5S7GBh

Thanks for learning with us!

Timestamps
00:00 Introduction
07:23 “Would you mind real quick just briefly explaining the differences between Positron Assistant and Databot?”
15:01 “Is there any way to configure reasoning efforts when signing in with GitHub Copilot?”
15:49 “Does Databot already support other providers beyond Cloud?”
20:36 “What is the cases with monetary penalty in the console output?”
22:14 “Do you happen to know if the column names of the dataset are very, very messy?”
23:18 “Can you add skills to Databot?”
26:36 “This code isn’t being saved anywhere. So where does it go?”
27:38 “Is there a way to know what all the slash commands are?”
28:51 Requesting Databot to use the namespace operator
33:58 “Is there a way to search within that Databot pane?”
39:34 “Have you noticed any time differences with how quickly things run in RStudio versus Positron?”
40:33 “What happens if you open that URL that it mentions at the bottom in your browser?”
40:50 Clarifying the difference between Posit Assistant and Positron Assistant
43:18 “What is the typical token burn rate?”
53:31 “Is this on CRAN and working in both Positron and RStudio?”

Simon Couch

The mall package: using LLMs with data frames in R & Python | Edgar Ruiz | Data Science Lab

The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!

The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.

On this call, Libby Heeren is joined by Edgar Ruiz as they walk through how mall works (with ellmer) in R, and then in Python. The mall package lets you use LLMs to process tabular data or vectors, so you can do things such as feeding it a column of reviews and asking mall to use an Anthropic model via ellmer to add a column of summaries or sentiments. Follow along with the code here: https://github.com/LibbyHeeren/mall-package-r

Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Edgar Ruiz

Edgar’s Bluesky: https://bsky.app/profile/theotheredgar.bsky.social
Edgar’s LinkedIn: https://www.linkedin.com/in/edgararuiz/
Edgar’s GitHub: https://github.com/edgararuiz

Resources from the hosts and chat:

Ollama → https://ollama.com/download
Posit Data Science Lab → https://posit.co/dslab
mall package → https://mlverse.github.io/mall/
ellmer package → https://ellmer.tidyverse.org/
Libby’s Positron theme (Catppuccin) → https://marketplace.visualstudio.com/items?itemName=Catppuccin.catppuccin-vsc
GitHub repo with Libby and Edgar’s code → https://github.com/LibbyHeeren/mall-package-r
LLM providers supported by ellmer → https://ellmer.tidyverse.org/index.html#providers
vitals package → https://vitals.tidyverse.org/
chatlas package → https://posit-dev.github.io/chatlas/
polars package → https://pola.rs/
narwhals package → https://narwhals-dev.github.io/narwhals/
pandas package → https://pandas.pydata.org/
LM Studio → https://lmstudio.ai/
Simon Couch’s blog → https://www.simonpcouch.com/
Edgar’s dataset: TidyTuesday Animal Crossing Dataset (May 5, 2020) → https://github.com/rfordatascience/tidytuesday
Libby’s dataset: Kaggle Tweets Dataset → https://www.kaggle.com/datasets/mmmarchetti/tweets-dataset
Blog from Sara and Simon on evaluating LLMs → https://posit.co/blog/r-llm-evaluation-03/
Data Science Lab YouTube playlist → https://www.youtube.com/watch?v=LDHGENv1NP4&list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&index=2
AWS Bedrock → https://aws.amazon.com/bedrock/
Anthropic → https://www.anthropic.com/
Google Gemini → https://gemini.google.com/
What is rubber duck debugging anyway?? → https://en.wikipedia.org/wiki/Rubber_duck_debugging

Thanks for learning with us!

Timestamps
00:00 Introduction to Libby, Isabella, Edgar, and the mall package + ellmer package
07:14 “What’s the difference between using mall for these NLP tasks versus traditional or classical NLP?”
09:37 “Can mall be used with a local LLM?”
17:32 “What kind of laptop specs should I realistically have to make good use of these models?”
22:12 “Are you limited to three output options?”
22:55 “Can mall return the prediction probabilities?”
24:14 “What are a rule of thumb set of specs for a machine so local LLMs are practically feasible?”
24:47 “Would that be in the additional prompt area where you’re defining things?”
25:04 “You could use the vitals package to compare models, right?”
25:24 “Can we use LM Studio instead of Ollama?”
28:35 “How do you iterate and validate the model?”
36:39 “Why use paste if it is all text?”
37:31 “Are these recent tweets (from X) or older ones from actual Twitter?”
40:23 “Is there a playlist for the Data Science Labs on YouTube?”
46:11 “Does that mean that the python version does not work with pandas?”
50:14 “Where is this data set from?”

Edgar Ruiz, Simon Couch

Advent of Code for R users | Emil Hvitfeldt | Data Science Lab

The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!

The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.

On this call, Libby Heeren is joined by Posit engineer Emil Hvitfeldt as he walks through Day 1 of Advent of Code 2025 using R. This is a super friendly, collaborative, and cheery intro to AoC! Don’t forget, you can do Advent of Code at any ole time of year.

Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen, Emil Hvitfeldt

Emil’s socials and URLs:
Website: https://emilhvitfeldt.com/
GitHub: https://github.com/emilhvitfeldt
Bluesky: https://bsky.app/profile/emilhvitfeldt.bsky.social
LinkedIn: https://www.linkedin.com/in/emilhvitfeldt/

Resources from the hosts and chat:

Advent of Code: https://adventofcode.com/
Install Positron: https://positron.posit.co/
Eric Wastl, Advent of Code: Behind the Scenes: https://www.youtube.com/watch?v=_oNOTknRTSU
AoC Subreddit: https://www.reddit.com/r/adventofcode/
Kieran Healy shared a reddit post with an Advent of Code answer done in Minecraft: https://www.reddit.com/r/adventofcode/comments/1pbeyxx/2025_day_01_part_2_advent_of_code_in_minecraft/
Emil’s Solutions: https://github.com/EmilHvitfeldt/rstats-adventofcode
Emil’s helper package: https://github.com/EmilHvitfeldt/aocfuns
purrr::accumulate() function: https://purrr.tidyverse.org/reference/accumulate.html

And, for anyone hangin’ in there at the end, Emil updated us on Discord that he figured out why his cumsum() didn’t work: he forgot to start the dial at 50! Once you fix that, it works to solve part 1 :)
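
The cumsum() fix can be illustrated with a small sketch, written here in Python rather than R (itertools.accumulate plays the role of cumsum(), and Python's % wraps negative numbers the same way R's %% does). This is not Emil's code. The dial size of 100, the "L68"/"R48" instruction format, and the "count how often the dial rests on 0" criterion are assumptions made for illustration; only the starting position of 50 comes from the note above.

```python
from itertools import accumulate

DIAL_SIZE = 100  # assumption: the dial has positions 0..99
START = 50       # from the description: the dial starts at 50


def parse(instruction):
    """Turn 'L68' into -68 and 'R48' into +48 (instruction format is assumed)."""
    sign = -1 if instruction[0] == "L" else 1
    return sign * int(instruction[1:])


def positions(instructions, start=START):
    """Dial position after each rotation, from a running sum seeded with `start`."""
    steps = [parse(item) for item in instructions]
    totals = list(accumulate(steps, initial=start))[1:]  # drop the seed itself
    return [total % DIAL_SIZE for total in totals]       # wrap around the dial


def count_zeros(instructions, start=START):
    """Assumed Part 1 criterion: how many rotations leave the dial on 0."""
    return sum(position == 0 for position in positions(instructions, start))


demo = ["L50", "R100", "L1", "R1"]
print(positions(demo))           # seeded at 50
print(positions(demo, start=0))  # the bug: running sum that starts at 0
```

Seeding the running sum with 0 instead of 50 shifts every position by the same offset, so the code runs without error but never lines up with the expected zeros, which is consistent with the bug described above.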

Thanks for learning with us!

Timestamps
00:00 Introduction
01:01 Tour of the Advent of Code website
02:30 Dashboard overview and puzzle schedule
03:23 How to view and access previous years’ events
03:37 Structure of puzzles: Two parts and stars
04:40 Understanding the global leaderboard
05:08 “Does that ASCII art build itself?”
06:16 Setting up private leaderboards for friends
07:54 Starting Day 1: Story prompt and mechanics
09:30 Understanding unique puzzle inputs
10:51 Submission feedback and delay penalties
11:44 Safe dial logic: Left, Right, and circularity
12:50 Starting position and Part 1 success criteria
14:09 Setting up the project in Positron
16:26 Strategy for speed: Reading from the bottom up
18:49 Problem-solving strategies: Pen, paper, and visualization
19:22 Walking through the logic with a sample case
20:52 Coding Part 1: Data parsing and vectorization
23:17 Positron keyboard shortcuts for duplicating lines
24:40 Debugging the logic and handling negative numbers
26:03 Explaining the Modulo operator (%%)
28:15 Managing large inputs of over 4,000 instructions
29:21 Submitting Part 1 and transitioning to Part 2
32:03 Part 2 challenge: Counting zero “clicks”
34:02 Brainstorming Part 2 code modifications
36:19 Checking important warnings for edge cases
37:00 Coding Part 2: Nested loops and incrementing counters
38:23 Hint: Modulo vs. integer division
40:40 Success with the Part 2 test case
42:30 Alternative method: Vectorized cumulative sums
45:29 “What’s the difference between % and %%?” (percent vs modulo)
46:50 Mathematical optimization to avoid inner loops
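
The "modulo vs. integer division" hint and the inner-loop optimization from the last few timestamps can be sketched as follows. This is a hedged illustration in Python, not the code from the call: it assumes a dial with positions 0..99 and assumes that Part 2 counts every single click that lands on 0 during a rotation. Python's // floors toward negative infinity, as R's %/% does, so the same arithmetic should carry over.

```python
DIAL_SIZE = 100  # assumption: the dial has positions 0..99


def zero_clicks_loop(position, step):
    """Count clicks that land on 0 by moving one click at a time (inner loop)."""
    direction = 1 if step > 0 else -1
    count = 0
    for _ in range(abs(step)):
        position = (position + direction) % DIAL_SIZE
        if position == 0:
            count += 1
    return count


def zero_clicks_fast(position, step):
    """Same count from integer division alone, for a position in 0..DIAL_SIZE-1."""
    if step >= 0:
        # Multiples of DIAL_SIZE in the range position+1 .. position+step
        return (position + step) // DIAL_SIZE
    # Multiples of DIAL_SIZE in the range position+step .. position-1
    return (position - 1) // DIAL_SIZE - (position + step - 1) // DIAL_SIZE


# The two versions agree for every start position and a range of rotations.
for start in range(DIAL_SIZE):
    for step in range(-250, 251):
        assert zero_clicks_loop(start, step) == zero_clicks_fast(start, step)
```

Modulo answers "where does the dial end up", while integer division answers "how many full wraps happened on the way", which is why the loop can be replaced once both are available.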

Emil Hvitfeldt

R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Dave Gruenewald, Senior Director of Data Science at Centene, to chat about polyglot teams, data science best practices, right-sizing development efforts, and process automation.

In this Hangout, we explore working in a polyglot team and fostering interoperability (a word that Libby loves, but struggles to pronounce out loud). Dave Gruenewald emphasizes that teams should use the tools they are comfortable with, whether that’s R or Python. Some strategies for collaboration across languages that Dave suggests include tools like Quarto to seamlessly run R and Python code in the same report. Teams utilize data science checkpoints, saving outputs as platform-agnostic file types like Parquet so that they can be accessed by any language. The use of REST APIs allows R processes to be accessed programmatically by Python (and vice versa), which can be a real game-changer. The newly released nanonext package was also highlighted as a promising development for improved interoperability.
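
The REST API idea can be sketched from the Python side. This is a minimal illustration, not the setup Dave's team uses: the /predict route and the JSON payload are hypothetical, and the stand-in server is written in Python only so the example runs on its own. In practice the server would be an R process exposed over HTTP (for example with the plumber package), and only the client function would live in Python.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class StandInHandler(BaseHTTPRequestHandler):
    """Stand-in for an R process behind a REST API (hypothetical /predict route)."""

    def do_GET(self):
        if self.path == "/predict":
            body = json.dumps({"prediction": [0.25, 0.75]}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def call_predict(base_url):
    """Python client: call the endpoint and decode the JSON response."""
    opener = build_opener(ProxyHandler({}))  # skip any configured proxy for localhost
    with opener.open(f"{base_url}/predict", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Start the stand-in server on a free port, call it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), StandInHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        return call_predict(f"http://{host}:{port}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Because the exchange is plain JSON over HTTP, neither side needs to know which language the other is written in, which is the interoperability point made above.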

Resources mentioned in the video and zoom chat:
Posit Conf 2025 Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
nanonext 1.7.0 Tidyverse Blog Post → https://www.tidyverse.org/blog/2025/09/nanonext-1-7-0/

If you didn’t join live, one great discussion you missed from the zoom chat was about pivoting away from academia, including leaving PhD programs. Many attendees shared their personal experiences of making the difficult decision to drop out of a PhD program. The community suggested alternative terms like “pivot,” “reallocating your resources,” or being a “refugee fleeing academia” instead of “drop out.” Dave Gruenewald shared that he himself left a PhD program but has “no regrets about that.” Did you leave a PhD program? You’re not alone!

Thanks for hanging out with us!

Timestamps:
00:00 Introduction
02:21 “What types of data do your teams use?”
06:53 “Which of the three pillars you mentioned is your personal favorite to work on?”
09:26 “How do you avoid or divert scope creep?”
11:41 “How much of the project should be “planning” before any code happens?”
13:53 “Do you feel like people are just hopping in and going, hey, LLM, make me a POC?”
14:28 “Do you give them what they say they want, or do you give them what they need?”
16:40 “I’m wondering what public data do you wish existed?”
18:48 “Why not Positron yet?”
20:43 “How do you unify as a team and make it so that I can always read everybody else’s code?”
23:10 “Could you talk a little bit about how R and Python work together?”
27:28 “How to start package development with a team who are very new to package development.”
33:01 “What’s your greatest regret career wise?”
35:53 “What about your biggest wins, specifically in your early career?”
39:40 “How would you recommend building a data science culture and community from scratch?”
41:49 “Would you set a specific timeline for EDA, exploratory analysis, to scope the project better?”
45:15 “How do you define fun projects, and how much time do you allocate for exploration in those?”
48:21 “Does your team use DVC or something similar for data version control?”
50:00 “Can you talk a bit more about your pivot from academia into data science?”
51:31 “Any advice on where to look for opportunities in data science after getting a masters degree?”

New data science tools & old laptops on fire | Jenny Bryan | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!

We were joined by Jenny Bryan, Senior Software Engineer at Posit, to chat about (setting laptops on fire), adapting careers to embrace change and new technologies, behind-the-scenes technical advancements powering the R ecosystem with tools like Positron, demystifying project-based workflows, plus LLM integration and best practices in programming.

Listen to this episode to hear us chat about topics like this:

the benefits and limitations of using Large Language Models (LLMs) in programming. Jenny shared her initial skepticism towards LLMs for coding in R, but her attitude changed significantly when applying LLMs to problems involving languages she was less familiar with, like Rust or TypeScript.
adapting in your career to embrace change and new technologies. Jenny, who describes herself as being on a “third career”, transitioned from management consulting to being a statistics professor, and then to being a senior software engineer at Posit. She talks a bit about her career journey and how she’s embracing new stuff (ahem, TypeScript) so that she gets to keep doing cool stuff!
Positron IDE for R package development. She specifically praises Positron’s unique test explorer and reliable console, and its integrated Data Explorer. For many, Positron offers out-of-the-box data science functionality, unlike other IDEs that require extensive customization.
what new technologies like Ark, Air, and Positron mean for the long-term health of R. Jenny’s been working on lots of nerdy things behind the scenes at Posit and she talks all about how they’re great for developers, package builders, data scientists, and engineers alike.

Another tidbit from this hangout: Jenny gave some advice for those looking to branch into software engineering without formal training: try reading code from admired developers, inviting code reviews, and undertaking small, recreational package development projects to gain practical experience and confidence. She also advocates for adopting a project-oriented workflow (associated with her famous “laptop on fire” remark, of course) using tools like the here package for managing project paths.

Resources mentioned in the video and zoom chat:
Positron IDE → https://positron.posit.co/
Happy Git with R → https://happygitwithr.com/
Jenny Bryan’s “Project-oriented workflow” blog post → https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
Air R code formatter → https://posit-dev.github.io/air/
The here() package → https://here.r-lib.org/
Posit Conf → https://posit.co/conference/
Tidy Dev Day 2025 → https://www.tidyverse.org/blog/2025/07/tdd-2025/
R Packages book → https://r-pkgs.org/

If you didn’t join live, you missed a ROARINGLY active chat. Let’s just say, if you’ve ever broken down in tears over a programming project, you’re not alone! Come join us live each week if you’d like to hang out in the chat with us!

Thanks for hanging out with us!

Timestamps
00:00 Introduction
03:39 “Is that a Wooble on your desk?” (Spoiler, it’s a gnome!!)
06:23 “As a builder of data science tools, what are the tool features data scientists want most?”
08:43 “Have you experienced needing to adapt to change recently and how have you embraced it?”
13:46 “What is ‘setting laptops on fire’ about?”
13:50 “How did you decide to change your career a few times?”
21:23 “What are your thoughts on the ease of putting models into production in Python versus R and does it make sense to shift everybody to one language or the other?”
27:30 “How do you navigate the ‘I have a hammer so everything looks like a nail’ feeling when working with emerging tools like LLMs?”
33:24 “Do you have any general advice for those data scientists who find themselves wanting to branch out more into software engineering but don’t have formal training?”
39:39 “Why should I use Positron instead of VS Code?”
47:57 “Can you speak to the value of developing an R package and how to clear the mental hurdle of it being a huge challenge?”
52:34 “What does your career trajectory look like and what is your advice for other people who are looking to grow their career but don’t know if they want to be an IC or a manager? Does being a manager mean you don’t get to write code anymore?”

Jenny Bryan

Bringing data science to the construction industry | Blake Abbenante | Data Science Hangout

To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Blake Abbenante, Director of Analytics and Data Science at Suffolk Construction, to chat about his career journey in data science, implementing modern data practices in the construction industry, innovative applications of AI and data science in construction, and building a data-driven culture in a traditionally less tech-focused sector.

In this Hangout, we explore innovative applications of AI and data science in construction. Blake shared how Suffolk Construction is leveraging cutting-edge technologies like AI to revolutionize traditional processes. One focus is their GenAI scheduling tool, which aims to augment and speed up the design and planning phases of building projects. This tool has the potential to significantly reduce the time planners spend on creating schedules, moving from weeks to potentially minutes or hours for an 80% completion rate. Blake discussed the development and implementation of safety models that forecast risk on projects, enabling proactive measures to ensure safer construction sites by predicting which projects might require additional safety personnel based on historical data.

Resources mentioned in the video and zoom chat:
The ellmer R package → https://ellmer.tidyverse.org/
The chatlas Python package → https://github.com/posit-dev/chatlas
Posit Blog Post on ellmer → https://posit.co/blog/announcing-ellmer/

If you didn’t join live, one great discussion you missed from the zoom chat was about the challenges of data collection and analysis when encountering pushback from those whose work is being analyzed, and strategies to build trust and demonstrate value. Let us know below if you’d like to hear more about this topic!

Thanks for hanging out with us!

Teaching the tidyverse in 2023 | Mine Çetinkaya-Rundel

Recommendations for teaching the tidyverse in 2023, summarizing package updates most relevant for teaching data science with the tidyverse, particularly to new learners.

00:00 Introduction
00:46 Using addins to switch between RStudio themes (See https://github.com/mine-cetinkaya-rundel/addmins for more info)
01:40 Native pipe
03:08 Nine core packages in tidyverse 2.0.0
07:15 Conflict resolution in the tidyverse
11:30 Improved and expanded *_join() functionality
22:05 Per operation grouping
27:41 Quality of life improvements to case_when() and if_else()
31:41 New syntax for separating columns
34:51 New argument for line geoms: linewidth
36:08 Wrap up

See more in the Teaching the tidyverse in 2023 blog post https://www.tidyverse.org/blog/2023/08/teach-tidyverse-23

Mine Çetinkaya-Rundel

Jenny Bryan | Help me help you: creating reproducible examples | RStudio (2018)

What is a reprex? It’s a reproducible example. Making a great reprex is both an art and a science and this webinar will cover both aspects. A reprex makes a conversation about code more efficient and pleasant for all. This comes up whenever you ask someone for help, report a bug in software, or propose a new feature. The reprex package (https://reprex.tidyverse.org) makes it especially easy to prepare R code as a reprex, in order to share on sites such as https://community.rstudio.com, https://github.com, or https://stackoverflow.com. The habit of making little, rigorous, self-contained examples also has the great side effect of making you think more clearly about your programming problems.

Webinar materials: https://rstudio.com/resources/webinars/help-me-help-you-creating-reproducible-examples/

About Jenny: Jenny is a software engineer on the tidyverse team. She is a recovering biostatistician who takes special delight in eliminating the small agonies of data analysis. Jenny is known for smoothing the interfaces between R and spreadsheets, web APIs, and Git/GitHub. She’s been working in R/S for over 20 years and is a member of the R Foundation. She also serves in the leadership of rOpenSci and Forwards and is an adjunct professor at the University of British Columbia.

Jenny Bryan

Data Manipulation Tools: dplyr – Pt 3 Intro to the Grammar of Data Manipulation with R

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

dplyr docs: dplyr.tidyverse.org/reference/

Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw

/01:44 Intro and what’s covered Ground Rules
/02:40 What’s a tibble
/04:50 Use View
/05:25 The Pipe operator:
/07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM

/00:48 Goal 1 Making your data suitable for R
/01:40 tidyr “Tidy” Data introduced and motivated
/08:10 tidyr::gather
/12:30 tidyr::spread
/15:23 tidyr::unite
/15:23 tidyr::separate

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

00:40 setup
02:00 dplyr::select
03:40 dplyr::filter
05:05 dplyr::mutate
07:05 dplyr::summarise
08:30 dplyr::arrange
09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
11:45 dplyr::group_by

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together

/00:42 dplyr::bind_cols
/01:27 dplyr::bind_rows
/01:42 Set operations dplyr::union, dplyr::intersect, dplyr::setdiff
/02:15 joining data dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join

Cheatsheets: https://www.rstudio.com/resources/cheatsheets/

Documentation: tidyr docs: tidyr.tidyverse.org/reference/

tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

Tidy Data and tidyr – Pt 2 Intro to Data Wrangling with R and the Tidyverse

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

http://tidyr.tidyverse.org/reference/

Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw

/01:44 Intro and what’s covered Ground Rules
/02:40 What’s a tibble
/04:50 Use View
/05:25 The Pipe operator:
/07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM

00:48 Goal 1 Making your data suitable for R
01:40 tidyr “Tidy” Data introduced and motivated
08:10 tidyr::gather
12:30 tidyr::spread
15:23 tidyr::unite
15:23 tidyr::separate

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

/00:40 setup
/02:00 dplyr::select
/03:40 dplyr::filter
/05:05 dplyr::mutate
/07:05 dplyr::summarise
/08:30 dplyr::arrange
/09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
/11:45 dplyr::group_by
/15:00 dplyr::group_by

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together

/00:42 dplyr::bind_cols
/01:27 dplyr::bind_rows
/01:42 Set operations dplyr::union, dplyr::intersect, dplyr::setdiff
/02:15 joining data dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join

Cheatsheets: https://www.rstudio.com/resources/cheatsheets/

Documentation: tidyr docs: tidyr.tidyverse.org/reference/

tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw

01:44 Intro and what’s covered Ground Rules
02:40 What’s a tibble
04:50 Use View
05:25 The Pipe operator:
07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM

00:48 Goal 1 Making your data suitable for R
01:40 tidyr “Tidy” Data introduced and motivated
08:15 tidyr::gather
12:38 tidyr::spread
15:30 tidyr::unite
15:30 tidyr::separate

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

00:40 setup
02:00 dplyr::select
03:40 dplyr::filter
05:05 dplyr::mutate
07:05 dplyr::summarise
08:30 dplyr::arrange
09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
11:45 dplyr::group_by
15:00 dplyr::group_by

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together

00:42 dplyr::bind_cols
01:27 dplyr::bind_rows
01:42 Set operations dplyr::union, dplyr::intersect, dplyr::setdiff
02:15 joining data dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join

Cheatsheets: https://www.rstudio.com/resources/cheatsheets/

Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

New York Times, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, by Steve Lohr, Aug. 17, 2014 https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html

Working with Two Datasets: Binds, Set Operations, and Joins – Pt 4 Intro to Data Manipulation

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

dplyr docs: dplyr.tidyverse.org/reference/

Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw

01:44 Intro and what’s covered Ground Rules
02:40 What’s a tibble
04:50 Use View
05:25 The Pipe operator:
07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM

00:48 Goal 1 Making your data suitable for R
01:40 tidyr “Tidy” Data introduced and motivated
08:10 tidyr::gather
12:30 tidyr::spread
15:23 tidyr::unite
15:23 tidyr::separate

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

00:40 setup
02:00 dplyr::select
03:40 dplyr::filter
05:05 dplyr::mutate
07:05 dplyr::summarise
08:30 dplyr::arrange
09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
11:45 dplyr::group_by

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together

00:42 dplyr::bind_cols
01:27 dplyr::bind_rows
01:42 Set operations dplyr::union, dplyr::intersect, dplyr::setdiff
02:15 joining data dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join

Cheatsheets: https://www.rstudio.com/resources/cheatsheets/

Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

tidyverse.org

Contributors

Hadley Wickham

Max Kuhn

Jenny Bryan

Lionel Henry

Davis Vaughan

Thomas Lin Pedersen

Simon Couch

Julia Silge

Emil Hvitfeldt

Mine Çetinkaya-Rundel

Gábor Csárdi

Hannah Frick

Teun Van den Brand

George Stagg

Charlie Gao

Tomasz Kalinowski

Edgar Ruiz

Garrick Aden-Buie

Resources featuring tidyverse.org

Coding vs. thinking programmatically | Samia Baig | Data Science Hangout

Data analysis with Posit AI-assistants | Sara Altman & Simon Couch | Data Science Lab

The mall package: using LLMs with data frames in R & Python | Edgar Ruiz | Data Science Lab

Getting Started with LLM APIs in R

Inspecting websites to find JSON data APIs | Marcos Huerta | Data Science Lab

Advent of Code for R users | Emil Hvitfeldt | Data Science Lab

Simon Couch - Practical AI for data science

Air: A blazingly fast R code formatter - Davis Vaughan, Lionel Henry

R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout

Building the Future of Data Apps: LLMs Meet Shiny

New data science tools & old laptops on fire | Jenny Bryan | Data Science Hangout

Harnessing LLMs for Data Analysis | Led by Joe Cheng, CTO at Posit

Bringing data science to the construction industry | Blake Abbenante | Data Science Hangout

Wes McKinney & Hadley Wickham (on cross-language collaboration, Positron, career beginnings, & more)

Joe Cheng - Summer is Coming: AI for R, Shiny, and Pharma

Ask Hadley Anything

Teaching the tidyverse in 2023 | Mine Çetinkaya-Rundel

Hadley Wickham | {purrr} 1.0: A complete and consistent set of tools for functions and vectors

R-Ladies Rome (English) - What’s new in the tidyverse - Isabella Velasquez

Posit Meetup | Jake Riley, Children’s Hospital of Philadelphia | Translating Facts to Insights

Alan Carlson | Robust, modular dashboards that minimize tech debt | RStudio

Data Science Hangout | Mike Smith, Pfizer | Building an R Center of Excellence

George Mount | R for Excel Users - First Steps | RStudio Meetup

Jenny Bryan | Help me help you: creating reproducible examples | RStudio (2018)

Data Manipulation Tools: dplyr – Pt 3 Intro to the Grammar of Data Manipulation with R

Tidy Data and tidyr – Pt 2 Intro to Data Wrangling with R and the Tidyverse

What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction

Working with Two Datasets: Binds, Set Operations, and Joins – Pt 4 Intro to Data Manipulation