tidyverse.org
Source of tidyverse.org
The tidyverse.org repository contains the source code for the tidyverse website, built using hugodown (not blogdown) and Hugo. It allows contributors to create and publish blog posts and event announcements through a two-step rendering process: hugodown converts R Markdown files to Markdown, then Hugo generates the final HTML.
The repository provides a structured workflow for content creation with helper functions like hugodown::use_tidy_post() for blog posts and automatic live previews via Netlify for every pull request. The hugodown approach cleanly separates R Markdown rendering from site generation, meaning .Rmd files are only rendered when explicitly knitted rather than automatically rebuilt. Contributors can fix small issues directly through pull requests and are encouraged to open issues for larger changes.
Contributors#
Resources featuring tidyverse.org#
Coding vs. thinking programmatically | Samia Baig | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
This week’s guest was Samia Baig, Senior Data Scientist/Data Engineer at Johnson & Johnson Innovative Medicine!
Some topics covered in this week’s Hangout were transitioning from a background in pharmacy and public health to a data career in pharma, distinguishing the responsibilities of data scientists versus analytics engineers, strategies for making data pipelines more robust (and convincing your team that you NEED robust pipelines in the first place), and the value of joining open-source communities like Tidy Tuesday.
Resources mentioned in the video and chat: Posit Data Science Lab → https://pos.it/dslab Tidy Tuesday GitHub Repository → https://github.com/rfordatascience/tidytuesday {dbplyr} → https://dbplyr.tidyverse.org/ The Missing Semester of Your CS Education → https://missing.csail.mit.edu/
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh The Lab: https://pos.it/dslab LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps 00:00 Introduction 03:22 “Do you feel like analytics engineer is a good descriptor for what you do?” 04:44 “How did you get into data from being on a pharmacist’s job path?” 11:55 “What was it like in the move that you made from public health to pharma?” 16:57 “What do you say are distinguishing factors between data science and engineering?” 20:16 “What are the most popular tools that you and your team use, in your job at J&J?” 24:00 “What do you use SQL in?” 27:40 “How would you go about convincing a team of the need for a more robust pipeline?” 31:10 “Can you define robust?” 33:31 “Do you happen to have any specific resources or strategies or examples that might help students or others with that mindset of thinking programmatically?” 37:06 “Are there any non data science skills that are very helpful in your either current or former job?” 40:23 “Is there any kind of community among data scientists across the whole company?” 45:44 “What are your biggest data challenges that you have?” 46:12 “If you had a magic wand, what problem would you solve in that area?” 49:52 “What is a piece of career advice that maybe you wish you could go back in time and give yourself?”
Data analysis with Posit AI-assistants | Sara Altman & Simon Couch | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on lives calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Sara Altman who walks through using Posit’s AI assistants to analyze data, including a sneak peek at Posit Assistant, and Simon Couch drops by to give us a demo of the reviewer package! Together, Sara and Simon author the Posit AI Newsletter, the best place to stay up-to-date with all the cool tools and advice on staying an informed and level-headed AI user.
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Sara Altman, Simon Couch
Sara’s Bluesky: https://bsky.app/profile/sara-altman.bsky.social Sara’s LinkedIn: https://www.linkedin.com/in/sarakaltman/ Sara’s GitHub: https://github.com/skaltman Posit AI Newsletter by Sara and Simon: https://posit.co/blog/?category=roundups
Resources from the hosts and chat:
Positron IDE → https://positron.posit.co/ Databot Extension → https://positron.posit.co/databot.html Getting started with Positron Assistant → https://positron.posit.co/assistant-getting-started.html Posit Assistant (Private Beta) → https://posit-ai-beta.share.connect.posit.cloud/ Reviewer Package (by Simon Couch) → https://github.com/simonpcouch/reviewer ellmer Package → https://elmer.tidyverse.org/ chatlas Package → https://github.com/posit-dev/chatlas Read the Posit AI Newsletter → https://posit.co/blog/?category=roundups Sign up to get the Posit AI Newsletter → http://pos.it/ai-news Simon’s blog post about local LLMs not quite being ready for primetime → https://posit.co/blog/local-models-are-not-there-yet/ Join the waitlist for Posit AI in RStudio → https://posit.co/products/ai/ Posit AI Known Issues & FAQs → https://posit-ai-beta.share.connect.posit.cloud/#frequently-asked-questions-faqs Blog post from Simon and Sara about Privacy and LLMs → https://posit.co/blog/trust-llm-tools/ DS Lab YouTube playlist → https://youtube.com/playlist?list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&si=7tmU6EAJpO5S7GBh
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for learning with us!
Timestamps 00:00 Introduction 07:23 “Would you mind real quick just briefly explaining the differences between Positron Assistant and Databot?” 15:01 “Is there any way to configure reasoning efforts when signing in with GitHub Copilot?” 15:49 “Does DataBot already support other providers beyond Cloud?” 20:36 “What is the cases with monetary penalty in the console output?” 22:14 “Do you happen to know if the column names of the dataset are very, very messy?” 23:18 “Can you add skills to DataBot?” 26:36 “This code isn’t being saved anywhere. So where does it go?” 27:38 “There a way to know what all the slash commands are?” 28:51 Requesting Databot to use the namespace operator 33:58 “Is there a way to search within that Databot pane?” 39:34 “Have you noticed any time differences with how quickly things run-in RStudio versus Positron?” 40:33 “What happens if you open that URL that it mentions at the bottom in your browser?” 40:50 Clarifying the difference between Posit Assistant and Positron Assistant 43:18 “What is the typical token burn rate?” 53:31 “Is this on CRAN and working in both Positron and RStudio?”

The mall package: using LLMs with data frames in R & Python | Edgar Ruiz | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on lives calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Edgar Ruiz as they walk through how mall works (with ellmer) in R, and then python. The mall package lets you use LLMs to process tabular or vectors of data, letting you do things such as feeding it a column of reviews and asking mall to use an anthropic model via ellmer to add a column of summaries or sentiments. Follow along with the code here: https://github.com/LibbyHeeren/mall-package-r
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Edgar Ruiz
Edgar’s Bluesky: https://bsky.app/profile/theotheredgar.bsky.social Edgar’s LinkedIn: https://www.linkedin.com/in/edgararuiz/ Edgar’s GitHub: https://github.com/edgararuiz
Resources from the hosts and chat:
Ollama → https://ollama.com/download Posit Data Science Lab → https://posit.co/dslab mall package → https://mlverse.github.io/mall/ ellmer package → https://elmer.tidyverse.org/ Libby’s Positron theme (Catppuccin) → https://marketplace.visualstudio.com/items?itemName=Catppuccin.catppuccin-vsc GitHub repo with Libby and Edgar’s code → https://github.com/LibbyHeeren/mall-package-r LLM providers supported by ellmer → https://ellmer.tidyverse.org/index.html#providers vitals package → https://vitals.tidyverse.org/ chatlas package → https://posit-dev.github.io/chatlas/ polars package → https://pola.rs/ narwhals package → https://narwhals-dev.github.io/narwhals/ pandas package → https://pandas.pydata.org/ LM Studio → https://lmstudio.ai/ Simon Couch’s blog → https://www.simonpcouch.com/ Edgar’s dataset: TidyTuesday Animal Crossing Dataset (May 5, 2020) → https://github.com/rfordatascience/tidytuesday Libby’s dataset: Kaggle Tweets Dataset → https://www.kaggle.com/datasets/mmmarchetti/tweets-dataset Blog from Sara and Simon on evaluating LLMs → https://posit.co/blog/r-llm-evaluation-03/ Data Science Lab YouTube playlist → https://www.youtube.com/watch?v=LDHGENv1NP4&list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&index=2 AWS Bedrock → https://aws.amazon.com/bedrock/ Anthropic → https://www.anthropic.com/ Google Gemini → https://gemini.google.com/ What is rubber duck debugging anyway?? → https://en.wikipedia.org/wiki/Rubber_duck_debugging
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for learning with us!
Timestamps 00:00 Introduction to Libby, Isabella, Edgar, and the mall package + ellmer package 07:14 “What’s the difference between using mall for these NLP tasks versus traditional or classical NLP?” 09:37 “Can mall be used with a local LLM?” 17:32 “What kind of laptop specs should I realistically have to make good use of these models?” 22:12 “Are you limited to three output options?” 22:55 “Can mall return the prediction probabilities?” 24:14 “What are a rule of thumb set of specs for a machine so local LLMs are practically feasible?” 24:47 “Would that be in the additional prompt area where you’re defining things?” 25:04 “You could use the vitals package to compare models, right?” 25:24 “Can we use LM Studio instead of Ollama?” 28:35 “How do you iterate and validate the model?” 36:39 “Why use paste if it is all text?” 37:31 “Are these recent tweets (from X) or older ones from actual Twitter?” 40:23 “Is there a playlist for the Data Science Labs on YouTube?” 46:11 “Does that mean that the python version does not work with pandas?” 50:14 “Where is this data set from?”


Getting Started with LLM APIs in R
Getting Started with LLM APIs in R - Sara Altman
Abstract: LLMs are transforming how we write code, build tools, and analyze data, but getting started with directly working with LLM APIs can feel daunting. This workshop will introduce participants to programming with LLM APIs in R using ellmer, an open-source package that makes it easy to work with LLMs from R. We’ll cover the basics of calling LLMs from R, as well as system prompt design, tool calling, and building basic chatbots. No AI or machine learning background is required—just basic R familiarity. Participants will leave with example scripts they can adapt to their own projects.
Resources mentioned in the workshop:
- Workshop site: https://skaltman.github.io/r-pharma-llm/
- ellmer documentation: https://ellmer.tidyverse.org/
- shinychat documentation: https://posit-dev.github.io/shinychat/
Inspecting websites to find JSON data APIs | Marcos Huerta | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on lives calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Marcos Huerta, a Data Science Manager at Carmax, as he walks us through the guts of websites looking for data we can play with. He shows us how to find hidden REST/JSON APIs by using the web inspector in Safari/Firefox and then how to get what’s necessary to pull the same data programmatically in python or R.
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen
Marcos’s urls: Website: https://marcoshuerta.com GitHub: https://github.com/astrowonk/
Resources from the hosts and from participants in the Discord chat:
Postman: https://www.postman.com/ Insomnia (open source alternative to Postman): https://insomnia.rest/ Baseball Savant website Marcos is using: https://baseballsavant.mlb.com/gamefeed/?gamePk=777076 Isabella Velasquez’s blog on using {polite} R package to help scrape Wikipedia: https://ivelasq.rbind.io/blog/politely-scraping/ Festivas Mac app Marcos used to add the lights to his desktop: https://festivitas.app/ Ted Laderas blog post on parsing JSON in R: https://laderast.github.io/intro_apis_json_cascadia/#/how-does-r-translate-json New rvest read_html_live() function: https://rvest.tidyverse.org/reference/read_html_live.html yyjsonr R package: https://github.com/coolbutuseless/yyjsonr tuber R package: https://github.com/gojiplus/tuber WikipediaR R package: https://www.quantargo.com/help/r/latest/packages/WikipediaR/1.1/WikipediaR-package rookiepy python package: https://pypi.org/project/rookiepy/
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for learning with us!
Timestamps 00:00 Introduction 03:05 Web scraping vs. API calls 04:12 Server-side rendering vs. client-side JSON 06:12 Warning: Rate limits and business ethics (ahem) 08:39 Demo: Baseball Savant website 08:57 Using browser Developer Tools and the Network tab 12:15 “What is curl?” 13:30 Importing curl into Postman 16:03 Generating Python code from Postman 16:50 “Are there open source alternatives to Postman?” 17:50 Using the generated code in Python/Jupyter 22:28 R packages for JSON (jsonlite, yyjsonr) 25:09 Demo: Massachusetts Lottery website 28:17 Example: scripts Marcos automated with Cron jobs 30:17 Handling logins and cookies with RookiePie 32:19 Demo: CNN Election Data 34:26 Inspecting ESPN’s website 36:58 “Can you scrape YouTube?” 38:19 Finding hidden JSON in CardsMania history 45:00 Benefits of API inspection over Beautiful Soup 46:59 New rvest function: read_html_live 50:40 Inspecting LinkedIn and finding GraphQL 53:58 Encouragement on handling API pagination
Advent of Code for R users | Emil Hvitfeldt | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on lives calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Posit engineer Emil Hvitfeldt as he walks through Day 1 of Advent of Code 2026 using R. This is a super friendly, collaborative, and cheery intro to AoC! Don’t forget, you can do Advent of Code at any ole time of year
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen, Emil Hvitfeldt
Emil’s socials and urls: website: https://emilhvitfeldt.com/ GitHub: https://github.com/emilhvitfeldt Bluesky: https://bsky.app/profile/emilhvitfeldt.bsky.social LinkedIn: https://www.linkedin.com/in/emilhvitfeldt/
Resources from the hosts and chat:
Advent of Code: https://adventofcode.com/ Install Positron: https://positron.posit.co/ Eric Wastl, Advent of Code: Behind the Scenes: https://www.youtube.com/watch?v=_oNOTknRTSU AoC Subreddit: https://www.reddit.com/r/adventofcode/ Kieran Healy shared a reddit post with an Advent of Code answer done in Minecraft: https://www.reddit.com/r/adventofcode/comments/1pbeyxx/2025_day_01_part_2_advent_of_code_in_minecraft/ Emil’s Solutions: https://github.com/EmilHvitfeldt/rstats-adventofcode Emil’s helper package: https://github.com/EmilHvitfeldt/aocfuns purrr::accumulate() function: https://purrr.tidyverse.org/reference/accumulate.html
And, for anyone hangin’ in there at the end, Emil updated us on Discord that he figured out why his cumsum() didn’t work: he forgot to start the dial at 50! Once you fix that, it works to solve part 1 :)
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for learning with us!
Timestamps 00:00 Introduction 01:01 Tour of the Advent of Code website 02:30 Dashboard overview and puzzle schedule 03:23 How to view and access previous years’ events 03:37 Structure of puzzles: Two parts and stars 04:40 Understanding the global leaderboard 05:08 “Does that ASCII art build itself? 06:16 Setting up private leaderboards for friend 07:54 Starting Day 1: Story prompt and mechanics 09:30 Understanding unique puzzle inputs 10:51 Submission feedback and delay penalties 11:44 Safe dial logic: Left, Right, and circularity 12:50 Starting position and Part 1 success criteria 14:09 Setting up the project in Positron 16:26 Strategy for speed: Reading from the bottom up 18:49 Problem-solving strategies: Pen, paper, and visualization 19:22 Walking through the logic with a sample case 20:52 Coding Part 1: Data parsing and vectorization 23:17 Positron keyboard shortcuts for duplicating lines 24:40 Debugging the logic and handling negative numbers 26:03 Explaining the Modulo operator (%%) 28:15 Managing large inputs of over 4,000 instructions 29:21 Submitting Part 1 and transitioning to Part 2 32:03 Part 2 challenge: Counting zero “clicks” 34:02 Brainstorming Part 2 code modifications 36:19 Checking important warnings for edge cases 37:00 Coding Part 2: Nested loops and incrementing counters 38:23 Hint: Modulo vs. integer division 40:40 Success with the Part 2 test case 42:30 Alternative method: Vectorized cumulative sums 45:29 “What’s the difference between % and %%?” (percent vs modulo) 46:50 Mathematical optimization to avoid inner loops

Simon Couch - Practical AI for data science
Practical AI for data science (Simon Couch)
Abstract: While most discourse about AI focuses on glamorous, ungrounded applications, data scientists spend most of their days tackling unglamorous problems in sensitive data. Integrated thoughtfully, LLMs are quite useful in practice for all sorts of everyday data science tasks, even when restricted to secure deployments that protect proprietary information. At Posit, our work on ellmer and related R packages has focused on enabling these practical uses. This talk will outline three practical AI use-cases—structured data extraction, tool calling, and coding—and offer guidance on getting started with LLMs when your data and code is confidential.
Presented at the 2025 R/Pharma Conference Europe/US Track.
Resources mentioned in the presentation:
- {vitals}: Large Language Model Evaluations https://vitals.tidyverse.org/
- {mcptools}: Model Context Protocol for R https://posit-dev.github.io/mcptools/
- {btw}: A complete toolkit for connecting R and LLMs https://posit-dev.github.io/btw/
- {gander}: High-performance, low-friction Large Language Model chat for data scientists https://simonpcouch.github.io/gander/
- {chores}: A collection of large language model assistants https://simonpcouch.github.io/chores/
- {predictive}: A frontend for predictive modeling with tidymodels https://github.com/simonpcouch/predictive
- {kapa}: RAG-based search via the kapa.ai API https://github.com/simonpcouch/kapa
- Databot https://positron.posit.co/dat

Air: A blazingly fast R code formatter - Davis Vaughan, Lionel Henry
In Python, Rust, Go, and many other languages, code formatters are widely loved. They run on every save, on every pull request, and in git pre-commit hooks to ensure code consistently looks its best at all times.
In this talk, you’ll learn about Air, a new R code formatter. Air is extremely fast, capable of formatting individual files so fast that you’ll question if its even running, and of formatting entire projects in under a second. Air integrates directly with your favorite IDEs, like Positron, RStudio, and VS Code, and is available on the command line, making it easy to standardize on one tool even for teams using various IDEs.
Once you start using Air, you’ll never worry about code style ever again!
https://www.tidyverse.org/blog/2025/02/air/ https://github.com/posit-dev/air


R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Dave Gruenewald, Senior Director of Data Science at Centene, to chat about polyglot teams, data science best practices, right-sizing development efforts, and process automation.
In this Hangout, we explore working in a polyglot team and fostering interoperability (a word that Libby loves, but struggles to pronounce out loud). Dave Gruenewald emphasizes that teams should use the tools they are comfortable with, whether that’s R or Python. Some strategies for collaboration across languages that Dave suggests include tools like Quarto to seamlessly run R and Python code in the same report. Teams utilize data science checkpoints, saving outputs as platform-agnostic file types like Parquet so that they can be accessed by any language. The use of REST APIs allows R processes to be accessed programmatically by Python (and vice versa), which can be a real game-changer. The newly released nanonext package was also highlighted as a promising development for improved interoperability.
Resources mentioned in the video and zoom chat: Posit Conf 2025 Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/ nanonext 1.7.0 Tidyverse Blog Post → https://www.tidyverse.org/blog/2025/09/nanonext-1-7-0/
If you didn’t join live, one great discussion you missed from the zoom chat was about pivoting away from academia, including leaving PhD programs. Many attendees shared their personal experiences of making the difficult decision to drop out of a PhD program. The community suggested alternative terms like “pivot,” “reallocating your resources,” or being a “refugee fleeing academia” instead of “drop out.” Dave Gruenewald shared that he himself left a PhD program but has “no regrets about that.” Did you leave a PhD program? You’re not alone!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps: 00:00 Introduction 02:21 “What types of data do your teams use?” 06:53 “Which of the three pillars you mentioned is your personal favorite to work on?” 09:26 “How do you avoid or divert scope creep?” 11:41 “How much of the project should be “planning” before any code happens?” 13:53 “Do you feel like people are just hopping in and going, hey, LLM, make me a POC?” 14:28 “Do you give them what they say they want, or do you give them what they need?” 16:40 “I’m wondering what public data do you wish existed?” 18:48 “Why not Positron yet?” 20:43 “How do you unify as a team and make it so that I can always read everybody else’s code?” 23:10 “Could you talk a little bit about how R and Python work together?” 27:28 “How to start package development with a team who are very new to package development.” 33:01 “What’s your greatest regret career wise?” 35:53 “What about your biggest wins, specifically in your early career?” 39:40 “How would you recommend building a data science culture and community from scratch?” 41:49 “Would you set a specific timeline for EDA, exploratory analysis, to scope the project better?” 45:15 “How do you define fun projects, and how much time do you allocate for exploration in those?” 48:21 “Does your team use DVC or something similar for data version control?” 50:00 “Can you talk a bit more about your pivot from academia into data science?” 51:31 “Any advice on where to look for opportunities in data science after getting a masters degree?”
Building the Future of Data Apps: LLMs Meet Shiny
GenAI in Pharma 2025 kicks off with Posit’s Phil Bowsher and Garrick Aiden-Buie sharing a technical overview of how LLMs can integrate with Shiny applications and much more!
Abstract: When we think of LLMs (large language models), usually what comes to mind are general purpose chatbots like ChatGPT or code assistants like GitHub Copilot. But as useful as ChatGPT and Copilot are, LLMs have so much more to offer—if you know how to code. In this demo Garrick will explain LLM APIs from zero, and have you building and deploying custom LLM-empowered data workflows and apps in no time.
Resources mentioned in the session:
- GitHub Repository for session: https://github.com/gadenbuie/genAI-2025-llms-meet-shiny
- {mcptools} - Model Context Protocols servers and clients https://posit-dev.github.io/mcptools/
- {vitals} - Large language model evaluation for R https://vitals.tidyverse.org/
New data science tools & old laptops on fire | Jenny Bryan | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
We were joined by Jenny Bryan, Senior Software Engineer at Posit, to chat about (setting laptops on fire,) adapting careers to embrace change and new technologies, behind-the-scenes technical advancements powering the R ecosystem with tools like Positron, demystifying project-based workflows, plus LLM integration and best practices in programming.
Listen to this episode to hear us chat about topics like this:
-
the benefits and limitations of using Large Language Models (LLMs) in programming. Jenny shared her initial skepticism towards LLMs for coding in R, but her attitude changed significantly when applying LLMs to problems involving languages she was less familiar with, like Rust or TypeScript.
-
adapting in your career to embrace change and new technologies. Jenny, who describes herself as being on a “third career”, transitioned from management consulting to a statistics professor, and then to a senior software engineer at Posit. She talks a bit about her career journey and how she’s embracing new stuff (ahem, Typescript) so that she gets to keep doing cool stuff!
-
Positron IDE for R package development. She specifically praises Positron’s unique test explorer and reliable console, and its integrated Data Explorer. For many, Positron offers out-of-the-box data science functionality, unlike other IDEs that require extensive customization.
-
what new technologies like Ark, Air, and Positron mean for the longterm health of R. Jenny’s been working on lots of nerdy things behind the scenes at Posit and she talks all about how they’re great for developers, package builders, data scientists, and engineers alike.
Another tidbit from this hangout: Jenny gave some advice for those looking to branch into software engineering without formal training: try reading code from admired developers, inviting code reviews, and undertaking small, recreational package development projects to gain practical experience and confidence. She also advocates for adopting a project-oriented workflow (associated with her famous “laptop on fire” remark, of course) using tools like the here package for managing project paths.
Resources mentioned in the video and zoom chat: Positron IDE → https://positron.posit.co/ Happy Git with R → https://happygitwithr.com/ Jenny Bryan’s “Project-oriented workflow” blog post → https://www.tidyverse.org/blog/2017/12/workflow-vs-script/ Air R code formatter → https://posit-dev.github.io/air/ The here() package → https://here.r-lib.org/ Posit Conf → https://posit.co/conference/ Tidy Dev Day 2025 → https://www.tidyverse.org/blog/2025/07/tdd-2025/ R Packages book → https://r-pkgs.org/
If you didn’t join live, you missed a ROARINGLY active chat. Let’s just say, if you’ve ever broken down in tears over a programming project, you’re not alone! Come join us live each week if you’d like to hang out in the chat with us!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps 00:00 Introduction 03:39 “Is that a Wooble on your desk?” (Spoiler, it’s a gnome!!) 06:23 “As a builder of data science tools, what are the tool features data scientists want most?” 08:43 “Have you experienced needing to adapt to change recently and how have you embraced it?” 13:46 “What is ‘setting laptops on fire’ about?” 13:50 “How did you decide to change your career a few times?” 21:23 “What are your thoughts on the ease of putting models into production in Python versus R and does it make sense to shift everybody to one language or the other?” 27:30 “How do you navigate the ‘I have a hammer so everything looks like a nail’ feeling when working with emerging tools like LLMs?” 33:24 “Do you have any general advice for those data scientists who find themselves wanting to branch out more into software engineering but don’t have formal training?” 39:39 “Why should I use Positron instead of Versus Code?” 47:57 “Can you speak to the value of developing an R package and how to clear the mental hurdle of it being a huge challenge?” 52:34 “What does your career trajectory look like and what is your advice for other people who are looking to grow their career but don’t know if they want to be an IC or a manager? Does being a manager mean you don’t get to write code anymore?”

Harnessing LLMs for Data Analysis | Led by Joe Cheng, CTO at Posit
When we think of LLMs (large language models), usually what comes to mind are general purpose chatbots like ChatGPT or code assistants like GitHub Copilot. But as useful as ChatGPT and Copilot are, LLMs have so much more to offer—if you know how to code. In this demo Joe Cheng will explain LLM APIs from zero, and have you building and deploying custom LLM-empowered data workflows and apps in no time.
Posit PBC hosts these Workflow Demos the last Wednesday of every month. To join us for future events, you can register here: https://posit.co/events/
Slides: https://jcheng5.github.io/workflow-demo/ GitHub repo: https://github.com/jcheng5/workflow-demo
Resources shared during the demo: Ellmer https://ellmer.tidyverse.org/ Chatlas https://posit-dev.github.io/chatlas/
Environment variable management: For R: https://docs.posit.co/ide/user/ide/guide/environments/r/managing-r.html#renviron For Python https://pypi.org/project/python-dotenv/
Shiny chatbot UI: For R, Shinychat https://posit-dev.github.io/shinychat/ For Python, ui.Chat https://shiny.posit.co/py/docs/genai-inspiration.html
Deployment Cloud hosting https://connect.posit.cloud On-premises (Enterprise) https://posit.co/products/enterprise/connect/ On-premises (Open source) https://posit.co/products/open-source/shiny-server/
Querychat Demo: https://jcheng.shinyapps.io/sidebot/ Package: https://github.com/posit-dev/querychat/
If you have specific follow-up questions about our professional products, you can schedule time to chat with our team: pos.it/llm-demo

Bringing data science to the construction industry | Blake Abbenante | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Blake Abbenante, Director of Analytics and Data Science at Suffolk Construction, to chat about his career journey in data science, implementing modern data practices in the construction industry, innovative applications of AI and data science in construction, and building a data-driven culture in a traditionally less tech-focused sector.
In this Hangout, we explore innovative applications of AI and data science in construction. Blake shared how Suffolk Construction is leveraging cutting-edge technologies like AI to revolutionize traditional processes. One focus is their GenAI scheduling tool, which aims to augment and speed up the design and planning phases of building projects. This tool has the potential to significantly reduce the time planners spend on creating schedules, moving from weeks to potentially minutes or hours for an 80% completion rate. Blake discussed the development and implementation of safety models that forecast risk on projects, enabling proactive measures to ensure safer construction sites by predicting which projects might require additional safety personnel based on historical data.
Resources mentioned in the video and zoom chat: The ellmer R package → https://ellmer.tidyverse.org/ The chatlas R package → https://github.com/posit-dev/chatlas Posit Blog Post on ellmer → https://posit.co/blog/announcing-ellmer/
If you didn’t join live, one great discussion you missed from the zoom chat was about the challenges of data collection and analysis when encountering pushback from those whose work is being analyzed, and strategies to build trust and demonstrate value. Let us know below if you’d like to hear more about this topic!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Wes McKinney & Hadley Wickham (on cross-language collaboration, Positron, career beginnings, & more)
We hosted a special event hosted by Posit PBC with Wes McKinney (Pandas & Apache Arrow) and Hadley Wickham (rstats & tidyverse) to ask questions, share your thoughts, and exchange insights about cross-language collaboration with fellow data community members.
Here’s a preview into what came up in conversation:
- Cross-language collaboration between R and Python
- Positron, a new polyglot data science IDE
- Open source development, how Wes and Hadley got involved in open source and their experiences in building and maintaining open-source projects such as Pandas and the tidyverse.
- Documentation for R and Python, especially in the context of teams that use both languages (shoutout to Quarto!)
- The use of LLMs in data science
- The emergence of libraries like Polars and DuckDB
- Challenges of switching between the two languages
- Package development and maintenance for polyglot teams that have internal packages in both languages
- The future of data science
The chat was on fire for this conversation and we’ve gathered most of the links shared among the community below:
Documentation mentioned: Positron, next-generation data science IDE built by Posit: https://positron.posit.co/ Quarto tabset documentation: https://quarto.org/docs/output-formats/html-basics.html#tabset-groups
Packages / Extensions mentioned: Pins: https://pins.rstudio.com/ Vetiver: https://vetiver.posit.co Orbital: https://orbital.tidymodels.org Elmer: https://elmer.tidyverse.org Tabby Extension: https://quarto.thecoatlessprofessor.com/tabby/
Blog posts: AI chat apps with Shiny for Python: https://shiny.posit.co/blog/posts/shiny-python-chatstream/ Using an LLM to enhance a data dashboard written in Shiny: R Sidebot & Python Sidebot Marco Gorelli Data Science Hangout (polars): https://youtu.be/lhAc51QtTHk?feature=shared Emily Riederer’s blog post on Polars: https://www.emilyriederer.com/post/py-rgo-polars/ Jeffrey Sumner’s tabset example: https://rpy.ai/posts/visualizations%20with%20r%20and%20python/r_python_visualizations Emily Riederer’s blog post on Python and R ergonomics: https://www.emilyriederer.com/post/py-rgo/11 Sam Tyner’s blog post on Lessons from “Tidy Data”: https://medium.com/@sctyner90/10-lessons-from-tidy-data-on-its-10th-anniversary-dbe2195a82b7
Other: Hadley Wickham’s cocktails website: https://cocktails.hadley.nz 5 Posit subscription management to find out about new tools, events, etc.: https://posit.co/about/subscription-management/
New to Posit? Posit builds enterprise solutions and open source tools for people who do data science with R and Python. (We are also the company formerly called RStudio) We’d love to have you join us for future community events!
Every Thursday from 12-1pm ET we host a Data Science Hangout with the community and invite you to join us! You can add that event to your calendar with this link: https://www.addevent.com/event/Qv9211919

Joe Cheng - Summer is Coming: AI for R, Shiny, and Pharma
Summer is Coming: AI for R, Shiny, and Pharma - Joe Cheng
Abstract: R users tend to be skeptical of modern AI models, given our weird insistence on answers being accurate, or at least supported by the data. But I believe the time has come—or maybe it’s a little late—for even the most AI-cynical among us to push past their discomfort and get informed about what these tools are truly capable of. And key to that is moving beyond using AI-enabled apps, and towards building our own scripts, packages, and apps that make judicious use of AI.
In this talk, I’ll tell you why I believe AI has more to offer the R community than just wrong answers from chat windows or mediocre code suggestions in our IDEs. I’ll also introduce brand-new tools we’re developing at Posit that put powerful AI tools within reach of every R user. And finally, I’ll show how adding some AI could make your next Shiny app dramatically more useful for your users.
Resources mentioned in the talk:
- Slides: https://jcheng5.github.io/pharma-ai-2024
- {elmer} Call LLM APIs from R: https://elmer.tidyverse.org/
- {shinychat} Chat UI component for Shiny for R https://github.com/jcheng5/shinychat
- R/Pharma GenAI Day Recordings: https://www.youtube.com/playlist?list=PLMtxz1fUYA5AYryl4t2mtqBngqWDrnMXJ
Presented at the 2024 R/Pharma Conference

Ask Hadley Anything
A unique opportunity to gain insights directly from a leading expert in open source data science and a driving force behind many popular R packages like ggplot2 and dplyr.
Links from the Q&A: gh-action webscraping demo: https://github.com/hadley/cran-deadlines tidyverse devday 2024: https://www.tidyverse.org/blog/2024/04/tdd-2024/
For the 3 questions on moving from SAS to R in Pharma: Posit and Atorus have partnered on a Posit Academy training: https://posit.co/blog/upskill-to-r-programming-with-posit-and-atorus-research/ And at least 3 pharma companies have shared resources to help people on the transition from statistical programming in SAS, to data science in R: Pfizer exercises: https://github.com/pfizer-opensource/pharma-hands-on-exercises Bayer SAS to R: https://bayer-group.github.io/sas2r/ Roche Coursera course: https://www.coursera.org/learn/making-data-science-work-for-clinical-reporting
Teaching the tidyverse in 2023 | Mine Çetinkaya-Rundel
Recommendations for teaching the tidyverse in 2023, summarizing package updates most relevant for teaching data science with the tidyverse, particularly to new learners.
00:00 Introduction 00:46 Using addins to switch between RStudio themes (See https://github.com/mine-cetinkaya-rundel/addmins for more info) 01:40 Native pipe 03:08 Nine core packages in tidyverse 2.0.0 07:15 Conflict resolution in the tidyverse 11:30 Improved and expanded *_join() functionality 22:05 Per operation grouping 27:41 Quality of life improvements to case_when() and if_else() 31:41 New syntax for separating columns 34:51 New argument for line geoms: linewidth 36:08 Wrap up
See more in the Teaching the tidyverse in 2023 blog post https://www.tidyverse.org/blog/2023/08/teach-tidyverse-23

Hadley Wickham | {purrr} 1.0: A complete and consistent set of tools for functions and vectors
{purrr} has reached the 1.0 milestone, with new features like progress bars, improvements to the map family, and tools for list flattening and simplification.
0:00 Introduction 0:11 What is purrr? 00:32 What is functional programming? 03:08 Announcing purrr 1.0 03:58 Progress bars 05:18 Better error messages 07:18 New map function: map_vec() 09:58 New list_* functions 12:04 Flattening and simplification 17:40 Breaking Changes 22:34 How the tidyverse handles deprecation 24:41 An overview of functional programming 26:22 Closing, resources to help with deprecation, how to submit issues
See more in the {purrr} 1.0.0 release blog post! https://www.tidyverse.org/blog/2023/03/tidyverse-2-0-0/

R-Ladies Rome (English) - What’s new in the tidyverse - Isabella Velasquez
Welcome to R-Ladies Rome Chapter!
What’s new in the tidyverse - Speaker: Isabella Velasquez
In this video, Isabella will tell you about What’s new in the tidyverse, a suite of packages that’s revolutionized data wrangling, visualization, and analysis. Recently, Tidyverse has undergone some changes and updates to make it even more user-friendly and powerful. The changes to Tidyverse include new packages, updates to existing ones, and improvements in performance and functionality. Some of the most notable updates include enhancements to package dependencies, performance improvements for specific functions such as group_by(), and the addition of new packages such as ggplot2, readr and dplyr.
You can find the latest news here: https://bit.ly/3z9BcMR To follow Isabella Velásquez: Twitter: twitter.com/ivelasq3 LinkedIn: linkedin.com/in/ivelasq/
Materials: GitHub repo: https://bit.ly/3LHVSmS Website: https://bit.ly/3M5gE03 The tidyverse blog: https://www.tidyverse.org/blog/
Posit Meetup | Jake Riley, Children’s Hospital of Philadelphia | Translating Facts to Insights
RStudio Healthcare Meetup:
Translating facts into insights at Children’s Hospital of Philadelphia Led by Jake Riley, data analyst at The Children’s Hospital of Philadelphia
Abstract: {headliner} is a new R package to add dynamic, insightful text to plots and reports. {headliner} generates useful talking points that users can string together using {glue} syntax. This makes it easy to write an informative sentences without adding a lot of technical debt to a project. Learn how to get started with {headliner} and ways we have used it at The Children’s Hospital of Philadelphia.
Speaker Bio: Jake Riley is a data analyst at The Children’s Hospital of Philadelphia. He is the author of several R packages related to data visualization and automated exploratory analysis. You can find his published work [simplecolors] and [shinyobjects] on CRAN with more packages on the way.
Timestamps: 0:49 - Start of talk 1:25 - Dashboards focused on facts vs. insights 2:56 - What’s a good title for a chart? 5:09 - Intro to headliner package 7:41- using glue() under the hood 14:04 - helpers for working with data frames: compare_conditions() 18:41 - using ggtext 21:27 - example using pixar_films 23:40 - how they’ve used it at CHOP 28:05 - Next steps for headliner package 29:32 - Start of Q&A session
Questions: 29:32 - Can you use any package you want in your organization? 31:13 - How do you load previous datasets to compare to current datasets? 32:48 - When you mentioned a front page on RStudio Connect (with the headlines), what is that? 33:25 - Is anyone using this for manuscripts at CHOP now? 36:24 - What has the adoption of R or Python been within the hospital analytics team? 37:28 - My manager is very leery of R because of technical depth. Any suggestions for convincing her of R’s value? 42:22 - How does CHOP use R for non-clinical analysis? 43:36 - How do you train new people to use R? 46:28 - How do you compare last week’s analysis to this week’s? 49:37 - Were there any major challenges in creating the hospital’s internal package?
Resources/links shared: Jake’s LinkedIn: https://www.linkedin.com/in/jake-riley-70736a3/ headliner package: https://github.com/rjake/headliner waldo package: https://www.tidyverse.org/blog/2020/10/waldo/ Examples of R in Life Science & Healthcare: https://www.rstudio.com/champion/life-science Chris Bumgardner’s talk on building an R-based analytic practice at Children’s Wisconsin: https://youtu.be/pHZ8dsc0PhY simplecolors package to generate hex codes using uniformly named colors: https://rjake.github.io/simplecolors/ R Packages book by Hadley Wickham & Jenny Bryan: https://r-pkgs.org/
Meetup Links: Future events: rstd.io/community-events-calendar If anyone’s interested in speaking at a future meetup, we’d love to hear from you too! rstd.io/meetup-speaker-form


Alan Carlson | Robust, modular dashboards that minimize tech debt | RStudio
Robust, modular dashboards that minimize tech debt Presented by Alan Carlson, Snap Finance
Abstract Dashboards can be complex but building them shouldn’t be! We’ve built a wrapper for developing production level dashboards that streamlines onboarding new developers and standardizes the initial infrastructure to mitigate tech debt. Now you and your team can spend more time developing insights and less time trying to spin up shiny code with {graveler}.
Speaker Bio As the Tech Lead for the BI (Business Intelligence) team, Alan’s primary focus at Snap is researching, creating, and maintaining methods that help the rest of Snap’s BI Team in their work. From dashboards to visualizations to R code in general, he has built multiple packages and bookdowns that make BI easier to train and to use within the RStudio environment.
Helpful Links: Blog Post: https://www.rstudio.com/blog/make-robust-modular-dashboards-with-golem-and-graveler/ Graveler package: https://github.com/ghcarlalan/graveler Environment variables: https://docs.rstudio.com/connect/user/content-settings/#content-vars Git-backed publishing: https://docs.rstudio.com/connect/user/git-backed/ If you’d like to join events live: colorado.rstudio.com/rsc/community-events
Question about style guides: Tidyverse Style Guide: https://style.tidyverse.org/ Efficient R Programming book that Colin Gillespie wrote: https://csgillespie.github.io/efficientR/
Questions about RStudio Team: ⬢ RStudio Connect: https://www.rstudio.com/products/connect/ ⬢ Chat with RStudio about RStudio Team: rstd.io/chat-with-rstudio
Data Science Hangout | Mike Smith, Pfizer | Building an R Center of Excellence
We were joined by Mike Smith, Senior Director, Pfizer R&D UK Ltd at the Data Science Hangout - a weekly, free-to-join open conversation for the data science community. If you’d like to join us live, you can add it to your calendar here: rstd.io/datasciencehangout
Mike shared with us all that they are building up a Center of Excellence at Pfizer to help teams across the business build reproducible workflows and use analytics tools effectively & efficiently.
What led to the creation of the CoE within Pfizer and how could we do something similar?
Mike: ⬢ Last year before R/Pharma, we did a poll & found that 1,500+ colleagues had downloaded R. I wanted to service & build up that community to find out what other people are doing and share that. (2:45)
⬢ We’re a very decentralized disparate team, so there are subject matter experts (SMEs) throughout the organization. The Center of Excellence is focused on building connections between SMEs and helping the teams where there isn’t an SME available.
⬢ What we saw was that it’s hard to sometimes get an effective strategy across people in such a big company. We also saw that there were other places within the organization that wanted data science work but they didn’t have an R subject matter expert there. We want to be able to help them solve their problems and set them up with a proof of concept that they can tweak.
33:52 -
Ok so how to do this?
⬢ Find out how many people are using the tools and who you could help.
⬢ Be that translator role between the business people who need solutions with the technical side - folks who are building things.
Communicate the value:
⬢ We may have a bunch of people trying to write the same function or access the same data. We could solve this problem once and then make that into a package and serve that out to everybody and streamline their workflow for the future.
⬢ There’s a benefit in being able to solve problems strategically. We’re trying to build the lego pieces so that the next time we see a problem like this, we can use that. We can also offer this as a package or via something that allows other people to solve that problem for themselves.
Talk to someone who has experience in this, other community builders
⬢ Doug Robinson helped start this at Pfizer because he had set-up something like this at Novartis before as well. Talking with someone who has done this before is really helpful because they have the experience of : who do we need to tell, what do we need to tell them, what’s our purpose for being, who do you have to speak to and convince. That has to be ready to go.
Find a champion in leadership:
⬢ We went to the head of Statistical programming and said we’d like to do something like this. Fortunately, she was 110% supportive here.
How did they phrase this CoE at Pfizer?
⬢ Check out this description from the job post: https://lnkd.in/g776nYVF
Resources shared: Ethan shared: I saw on RStudio blog the other day the {sassy} system for SAS programmer transitioning to R: https://sassy.r-sassy.org/index.html Tatsu shared: For folks that have RStudio Connect and Tableau, there’s now a supported integration https://www.rstudio.com/blog/dynamic-r-and-python-models-in-tableau-using-plumbertableau/ Tatsu shared the Working with IT section of the champion site: https://www.rstudio.com/champion/working-with-it Mike’s Bandcamp: https://mikeksmith.bandcamp.com/ R Consortium Pharma Working Groups: https://www.r-consortium.org/projects/isc-working-groups R in Pharma Conference: https://rinpharma.com/ Upcoming Pharma meetup with Merck: https://youtu.be/RBVqKi3FV30
Question about style guides: Jesus shared: Tidyverse Style Guide: https://style.tidyverse.org/ Jesus shared: One guide overall guide on better clean R code is the contributing.md of the ggplot2 package: https://github.com/tidyverse/ggplot2/blob/main/CONTRIBUTING.md Sam shared: Efficient R Programming book that Colin wrote: https://csgillespie.github.io/efficientR/
Where to find more?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Data Science Hangout site: rstudio.com/data-science-hangout ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout
Follow Us Here: Website: https://www.rstudio.com LinkedIn:https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/rstudio
George Mount | R for Excel Users - First Steps | RStudio Meetup
Abstract: Excel’s built-in programming language has served as an entry point to coding for many. If you’re a data analyst steeped in Excel, chances are you could also benefit from learning R for projects of increased scope and complexity.
This presentation serves as a hands-on introduction to R for Excel users:
How R differs from Excel as an open source software tool How to translate common Excel concepts such as cells, ranges, and tables to R equivalents Example use cases that you can take and apply to your own work How to enhance Excel and Power BI with R By the end of this presentation, you will have a clear path forward for building repeatable processes, compelling visualizations, and robust data analyses in R.
Speaker Bio: George Mount is the founder of Stringfest Analytics, a consulting firm specializing in analytics education and upskilling. He has worked with leading bootcamps, learning platforms and practice organizations to help individuals excel at analytics. George regularly blogs and speaks on data analysis, data education and workforce development. He is the author of Advancing into Analytics: From Excel to Python and R (O’Reilly).
Link to George’s white paper “Five things Excel users should know about R”https://stringfestanalytics.com/five-things-r-excel/
Working group sign-up for those interested!
Within many organizations Microsoft Excel is a preferred tool for working with data for non data analytics users. In order to build a data driven organization, source data and analytical models must be accessible to all data users (technical and non-technical) within their preferred tool. Let’s rally the R community to welcome Excel users into our data driven culture by building an Excel add-on to access data and models available within RStudio. If you’re interested in continuing this conversation and joining a working group, let us know! rstd.io/excel-r-community
Links shared at the meetup! George’s GitHub/ Presentation Resources: https://github.com/stringfestdata/rstudio-mar-2022
Packages? Where to find them & recommendations:
CRAN Task Views: https://cran.r-project.org/web/views/
Mark shared: for folks who primarily use excel to present formatted tables, the gt package is a great way to start doing this programmatically in R: https://gt.rstudio.com/
Ivan shared: In addition to regular Google, I’d recommend https://rseek.org/
, given that the character ‘R’ is sometimes not search friendly :)
Jeff shared: Fpp2 is great for forecasting and time series analysis - https://otexts.com/fpp2/
Floris shared: https://otexts.com/fpp3/
Ivan shared: If you’re into tidyverse, there’s an equivalent for time-series: https://tidyverts.org/
George shared: https://dplyr.tidyverse.org/
Ryan shared: This can be a helpful package for dynamically editing tables, like in excel https://github.com/DillonHammill/DataEditR
Ryan shared: This is a great package for making and learning ggplot visualizations: https://cran.r-project.org/web/packages/esquisse/vignettes/get-started.html
Other resources: Monaly shared: There is a R help group: r-help@r-project.org George shared: Helpful book/site on statistics: https://moderndive.com/ Ryan shared:Harvard has a good online source (free options) that has a number of classes, the following for stats: https://www.edx.org/professional-certificate/harvardx-data-science George shared: R for Data Science free book: https://r4ds.had.co.nz/ Fernando shared: big book of R https://www.bigbookofr.com/index.html Floris shared: Advanced R Book: https://adv-r.hadley.nz/ Pedro shared: The R for Data Science Slack channel is a great learning resource! r4ds.io/join (we just made a channel there called #chat-excel_to_r Ivan shared: For teams who are deeply entrenched in Excel (like my old team), this tool may be useful - https://bert-toolkit.com/ . It allows running R code in .xls, so you can learn R while doing .xls :)
Re: Glossary of terms: Ivan shared: inner_join() is like VLOOKUP in .xls. Dan shared: Here’s one cheat sheet (glossary of Excel to R) that I just found; https://paulvanderlaken.com/2018/07/31/transitioning-from-excel-to-r-dictionary-of-common-functions/
Extra Meetup Links Feedback: rstd.io/meetup-feedback Talk submission: rstd.io/meetup-speaker-form If you’d like to find out about upcoming events you can also add this calendar: rstd.io/community-events RStudio conference/submit a talk: https://www.rstudio.com/conference/ Recordings of all meetups: https://www.youtube.com/playlist?list=PL9HYL-VRX0oRKK9ByULWulAOO5jN70eXv
Jenny Bryan | Help me help you: creating reproducible examples | RStudio (2018)
What is a reprex? It’s a reproducible example. Making a great reprex is both an art and a science and this webinar will cover both aspects. A reprex makes a conversation about code more efficient and pleasant for all. This comes up whenever you ask someone for help, report a bug in software, or propose a new feature. The reprex package (https://reprex.Tidyverse.org ) makes it especially easy to prepare R code as a reprex, in order to share on sites such as https://community.rstudio.com , https://github.com , or https://stackoverflow.com . The habit of making little, rigorous, self-contained examples also has the great side effect of making you think more clearly about your programming problems.
Webinar materials: https://rstudio.com/resources/webinars/help-me-help-you-creating-reproducible-examples/
About Jenny: Jenny is a software engineer on the tidyverse team. She is a recovering biostatistician who takes special delight in eliminating the small agonies of data analysis. Jenny is known for smoothing the interfaces between R and spreadsheets, web APIs, and Git/GitHub. She’s been working in R/S for over 20 years and is a member of the R Foundation. She also serves in the leadership of rOpenSci and Forwards and is an adjunct professor at the University of British Columbia

Data Manipulation Tools: dplyr – Pt 3 Intro to the Grammar of Data Manipulation with R
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
dplyr docs: dplyr.tidyverse.org/reference/
- http://dplyr.tidyverse.org/reference/union.html
- http://dplyr.tidyverse.org/reference/intersect.html
- http://dplyr.tidyverse.org/reference/set_diff.htm
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- /01:44 Intro and what’s covered Ground Rules
- /02:40 What’s a tibble
- /04:50 Use View
- /05:25 The Pipe operator:
- /07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- /00:48 Goal 1 Making your data suitable for R
- /01:40
tidyr“Tidy” Data introduced and motivated - /08:10
tidyr::gather - /12:30
tidyr::spread - /15:23
tidyr::unite - /15:23
tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00.40 setup
- 02:00
dplyr::select - 03:40
dplyr::filter - 05:05
dplyr::mutate - 07:05
dplyr::summarise - 08:30
dplyr::arrange - 09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- 11:45
dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- /00.42
dplyr::bind_cols - /01:27
dplyr::bind_rows - /01:42 Set operations
dplyr::union,dplyr::intersect,dplyr::set_diff - /02:15 joining data
dplyr::left_join,dplyr::inner_join,dplyr::right_join,dplyr::full_join,
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyrvignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.htmldplyrdocs: http://dplyr.tidyverse.org/reference/dplyrone-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.htmldplyrtwo-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
Tidy Data and tidyr – Pt 2 Intro to Data Wrangling with R and the Tidyverse
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
http://tidyr.tidyverse.org/reference/
- http://tidyr.tidyverse.org/reference/gather
- http://tidyr.tidyverse.org/reference/spread
- http://tidyr.tidyverse.org/reference/unite
- http://tidyr.tidyverse.org/reference/separate
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- /01:44 Intro and what’s covered Ground Rules
- /02:40 What’s a tibble
- /04:50 Use View
- /05:25 The Pipe operator:
- /07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1 Making your data suitable for R
- 01:40
tidyr“Tidy” Data introduced and motivated - 08:10
tidyr::gather - 12:30
tidyr::spread - 15:23
tidyr::unite - 15:23
tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00.40 setup
- /02:00
dplyr::select - /03:40
dplyr::filter - /05:05
dplyr::mutate - /07:05
dplyr::summarise - /08:30
dplyr::arrange - /09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- /11:45
dplyr::group_by - /15:00
dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- /00.42
dplyr::bind_cols - /01:27
dplyr::bind_rows - /01:42 Set operations
dplyr::union,dplyr::intersect,dplyr::set_diff - /02:15 joining data
dplyr::left_join,dplyr::inner_join,dplyr::right_join,dplyr::full_join,
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyrvignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.htmldplyrdocs: http://dplyr.tidyverse.org/reference/dplyrone-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.htmldplyrtwo-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The Pipe operator:
- 07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- /00:48 Goal 1 Making your data suitable for R
- /01:40
tidyr“Tidy” Data introduced and motivated - /08:15
tidyr::gather - /12:38
tidyr::spread - /15:30
tidyr::unite - /15:30
tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- 00.40 setup
- /02:00
dplyr::select - /03:40
dplyr::filter - /05:05
dplyr::mutate - /07:05
dplyr::summarise - /08:30
dplyr::arrange - /09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- /11:45
dplyr::group_by - /15:00
dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- /00.42
dplyr::bind_cols - /01:27
dplyr::bind_rows - /01:42 Set operations
dplyr::union,dplyr::intersect,dplyr::set_diff - /02:15 joining data
dplyr::left_join,dplyr::inner_join,dplyr::right_join,dplyr::full_join,
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyrvignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.htmldplyrdocs: http://dplyr.tidyverse.org/reference/dplyrone-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.htmldplyrtwo-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
New York Times “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, By STEVE LOHRAUG. 17, 2014 https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
Working with Two Datasets: Binds, Set Operations, and Joins – Pt 4 Intro to Data Manipulation
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
dplyr docs: dplyr.tidyverse.org/reference/
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- /01:44 Intro and what’s covered Ground Rules:
- /02:40 What’s a tibble
- /04:50 Use View
- /05:25 The Pipe operator:
- /07:20 What do I mean by data wrangling?
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- /00:48 Goal 1 Making your data suitable for R
- /01:40
tidyr“Tidy” Data introduced and motivated - /08:10
tidyr::gather - /12:30
tidyr::spread - /15:23
tidyr::unite - /15:23
tidyr::separate
Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U
- /00.40 setup
- /02:00
dplyr::select - /03:40
dplyr::filter - /05:05
dplyr::mutate - /07:05
dplyr::summarise - /08:30
dplyr::arrange - /09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- /11:45
dplyr::group_by
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together
- 00.42
dplyr::bind_cols - 01:27
dplyr::bind_rows - 01:42 Set operations
dplyr::union,dplyr::intersect,dplyr::set_diff - 02:15 joining data -
dplyr::left_join,dplyr::inner_join, -dplyr::right_join,dplyr::full_join,
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyrvignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.htmldplyrdocs: http://dplyr.tidyverse.org/reference/dplyrone-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.htmldplyrtwo-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html