rstudio

Regression models still rule in P&C Insurance | Jim Weiss | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!

This week’s guest was Jim Weiss, Chief Risk Officer, commercial and executive at Crum and Forster!

Some topics covered in this week’s Hangout were the use of regression modeling (have you ever heard of the Tweedie distribution?) in property and casualty insurance, handling changing model results and communicating them to business stakeholders, using GenAI and “co-opetition” to identify and prevent fraudulent claims, and identifying and managing bias or confounding effects in pricing models.

Resources mentioned in the video and chat: The Once and Future C&F → https://www.cfins.com/the-once-and-future-cf-landing/ Tweedie distribution → https://en.wikipedia.org/wiki/Tweedie_distribution Statistical Rethinking Lectures Playlist → https://www.youtube.com/playlist?list=PLDcUM9US4XdNOlqSyhe38US8mFgmqzI14 Considerations for Managing Potential Bias in Pricing Models → https://eforum.casact.org/article/91188-considerations-for-managing-potential-bias-in-pricing-models Pointblank for data validation (Python) → https://posit-dev.github.io/pointblank/ Pointblank (R) → https://rstudio.github.io/pointblank/

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh The Lab: https://pos.it/dslab LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Timestamps 00:00 Introduction 02:16 “Who is Crum and Forster? What do you do? What types of problems does your team help them solve?” 04:14 “What kind of data types would you see in your day to day or your team would see in your day to day? And what is an example of a problem that you feel like you’ve solved lately or that you’ve been working on lately?” 10:14 “What types of regression do you use?” 13:35 “In your insurance modeling career, what is the most unusual or unexpected variable that has contributed to one of your models?” 18:48 “How do you handle it when you see that the results are significantly different across models?” 21:45 “Are you Team Bayesian or Team Frequentist when it comes to your statistics?” 27:13 “My health care organization is interested in identifying fraudulent claims. Currently, they’re looking at Excel spreadsheets, same time, different person. Do you have any advice on a better way to guide them to automation?” 30:51 “What software do you use to do your job?” 35:40 “Do you ever use instrumental variables in your models?” 47:47 “Do you have any career advice for us? Is there something you wish that you had known when you were first entering the industry?” 50:38 “How much do you and your team use AI to help you along?” 52:56 “How does your team span expertise in such varied fields?”

Data analysis with Posit AI-assistants | Sara Altman & Simon Couch | Data Science Lab

The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!

The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.

On this call, Libby Heeren is joined by Sara Altman who walks through using Posit’s AI assistants to analyze data, including a sneak peek at Posit Assistant, and Simon Couch drops by to give us a demo of the reviewer package! Together, Sara and Simon author the Posit AI Newsletter, the best place to stay up-to-date with all the cool tools and advice on staying an informed and level-headed AI user.

Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Sara Altman, Simon Couch

Sara’s Bluesky: https://bsky.app/profile/sara-altman.bsky.social Sara’s LinkedIn: https://www.linkedin.com/in/sarakaltman/ Sara’s GitHub: https://github.com/skaltman Posit AI Newsletter by Sara and Simon: https://posit.co/blog/?category=roundups

Resources from the hosts and chat:

Positron IDE → https://positron.posit.co/ Databot Extension → https://positron.posit.co/databot.html Getting started with Positron Assistant → https://positron.posit.co/assistant-getting-started.html Posit Assistant (Private Beta) → https://posit-ai-beta.share.connect.posit.cloud/ Reviewer Package (by Simon Couch) → https://github.com/simonpcouch/reviewer ellmer Package → https://elmer.tidyverse.org/ chatlas Package → https://github.com/posit-dev/chatlas Read the Posit AI Newsletter → https://posit.co/blog/?category=roundups Sign up to get the Posit AI Newsletter → http://pos.it/ai-news Simon’s blog post about local LLMs not quite being ready for primetime → https://posit.co/blog/local-models-are-not-there-yet/ Join the waitlist for Posit AI in RStudio → https://posit.co/products/ai/ Posit AI Known Issues & FAQs → https://posit-ai-beta.share.connect.posit.cloud/#frequently-asked-questions-faqs Blog post from Simon and Sara about Privacy and LLMs → https://posit.co/blog/trust-llm-tools/ DS Lab YouTube playlist → https://youtube.com/playlist?list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&si=7tmU6EAJpO5S7GBh

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co

Thanks for learning with us!

Timestamps 00:00 Introduction 07:23 “Would you mind real quick just briefly explaining the differences between Positron Assistant and Databot?” 15:01 “Is there any way to configure reasoning efforts when signing in with GitHub Copilot?” 15:49 “Does Databot already support other providers beyond Claude?” 20:36 “What is the cases with monetary penalty in the console output?” 22:14 “Do you happen to know if the column names of the dataset are very, very messy?” 23:18 “Can you add skills to Databot?” 26:36 “This code isn’t being saved anywhere. So where does it go?” 27:38 “Is there a way to know what all the slash commands are?” 28:51 Requesting Databot to use the namespace operator 33:58 “Is there a way to search within that Databot pane?” 39:34 “Have you noticed any time differences with how quickly things run in RStudio versus Positron?” 40:33 “What happens if you open that URL that it mentions at the bottom in your browser?” 40:50 Clarifying the difference between Posit Assistant and Positron Assistant 43:18 “What is the typical token burn rate?” 53:31 “Is this on CRAN and working in both Positron and RStudio?”

Simon Couch

Positron workflows that make life easier | Andrew Heiss | Data Science Lab

The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!

The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.

On this call, Libby Heeren is joined by Andrew Heiss as he demonstrates some tips and tricks about his personal workflow and tools that he actually uses to make life easier in Positron. This is the ultimate list of data life hacks to make your workflow soooo much nicer. Check out Andrew’s blog post here to follow along with the tools he mentions: https://andhs.co/dsl

Hosting crew from Posit: Libby Heeren, Isabella Velasquez

Andrew’s Website: https://www.andrewheiss.com/ Andrew’s Bluesky: https://bsky.app/profile/andrew.heiss.phd Andrew’s LinkedIn: https://www.linkedin.com/in/andrewheiss/

Resources from the hosts and chat:

Andrew’s blog post containing links to all of the tools he mentions: https://www.andrewheiss.com/blog/2026/01/13/dsl-positron-workflow/ Open VSX Registry: https://open-vsx.org/ DataPasta: https://milesmcbain.github.io/datapasta/ Pastum (like datapasta for Positron): https://open-vsx.org/extension/atsyplenkov/pastum Positron Project docs: https://positron.posit.co/migrate-rstudio-rproj.html Garrick’s data science extension bundle package: https://github.com/gadenbuie/positron-plus-1-e Emil’s keyboard shortcut blog post: https://emilhvitfeldt.com/post/positron-key-bindings/ Native Tabs for Mac: https://lucasprag.com/posts/underrated-vscode-feature-native-tabs/ Andrew’s posit::conf(2025) Talk: https://youtu.be/UCloM4GcfVY Arc browser that Andrew is using: https://arc.net/ Andrew’s YAML headers he sets up using espanso: https://github.com/andrewheiss/espanso/blob/52da6c43c6d1ebaf3231770b1b66971d1dfb374a/match/markdown-pandoc-quarto.yml#L118

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co

Thanks for learning with us!

Timestamps: 00:00 Introduction 01:50 Switching to Positron full-time 03:19 Extensions in Positron 04:44 How to evaluate if an extension is safe 07:05 Air extension (auto-formatting) 08:21 Better Comments extension 10:15 Moving the Activity Bar 12:20 Pastum extension (DataPasta equivalent) 14:26 Rainbow CSV extension 15:34 Spell Right extension 17:34 Managing projects in Positron vs RStudio 20:18 “Do you know if there are extensions… that will conditionally format cells?” 20:40 “Do you explicitly add a dot here file?” 21:44 Project Manager extension 25:34 “How did you discover all of these?” 26:38 “How is GitHub integrated into Positron?” 29:10 Peacock extension 31:16 The Connections pane 36:38 “When I change the Peacock color, it’s changing colors for everything.” 37:59 “Does he use any DuckDB extensions?” 39:05 Raycast 43:35 Raycast scripts 44:30 NotebookLM 45:31 “Is there a hack to manage a repo that is both a project and an R package?” 48:00 Espanso 53:15 “Is Raycast a replacement for Spotlight and Bartender?” 54:00 “Is there an easy way to see all of the shortcuts?”

Seriously, have you tried this AI stuff? I’ve never written code like this in my life.

Unearned confidence

#datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyter #cursor #windsurf #positshorts

Positron rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

How to deploy Shiny apps in 2026 | Alex Chisholm | Data Science Lab

The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!

The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.

On this call, Libby Heeren is joined by Posit product manager Alex Chisholm as he walks through the evolution of Shiny app deployment over the years, how to deploy Shiny apps in the modern era, and peeks into Posit’s roadmap for future development. Do you call it “deployment” or “publishing” when it comes to Shiny apps? 🤔

This is a super friendly and conversational space, and being there live in the Discord chat can’t be beat!! We hope you get to join us sometime soon.

Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen, Alex Chisholm

Alex Chisholm’s LinkedIn: http://www.linkedin.com/in/chisholm1

Resources from the hosts and chat:

Posit Connect Cloud for deploying Shiny apps in the modern era: https://connect.posit.cloud/ Install Positron: https://positron.posit.co/ Simon Couch’s blog post on local LLMs not being good enough yet: https://www.simonpcouch.com/blog/2025-12-04-local-agents/ Blue-Green Shiny App Deployments using Posit Connect posit::conf(2025) talk by Ryszard Szymański: https://youtu.be/QEEGLWj0nas Digital Ocean: https://www.digitalocean.com/ Ollama local LLM: https://ollama.com/ py-sidebot app template: https://shiny.posit.co/py/templates/sidebot/ querychat app template: https://shiny.posit.co/py/templates/querychat/ Dan Chen mentioned Render in the chat as an alternative to Digital Ocean: https://render.com/ Alex Chisholm’s AB testing GitHub repo example: https://github.com/alex-chisholm/shiny-r-abtesting Edward in the chat shared a GitHub repo for using GitHub actions to execute remote SSH commands: https://github.com/appleboy/ssh-action Abu in the chat shared blue-green vs. canary deployments: https://octopus.com/devops/software-deployments/blue-green-vs-canary-deployments/ Frank in the chat mentioned Simon’s blog on using local LLMs with the chores package: https://www.simonpcouch.com/blog/2025-12-10-chores-0-3-0/

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co

Thanks for learning with us!

Timestamps 00:00 Introduction 03:03 Meaningful applications and value creation 05:31 The evolution of Shinyapps.io and Posit Connect 08:12 DigitalOcean and Droplets 09:36 DigitalOcean vs. commercial cloud providers 11:48 Comparisons: DigitalOcean, Azure, and AWS 14:47 Replicating local environments with Docker 16:51 The open-source Shiny Server 18:20 Use case: University of Illinois CITL 20:02 Key considerations for deployment decisions 21:53 GitHub Actions and version control 23:31 Addressing single points of failure and maintainability 24:38 Posit Connect Cloud features and portfolio 26:01 Beyond Shiny: Quarto, Streamlit, and Dash 27:07 Handling secrets and database credentials 28:56 Custom vanity links vs. UUIDs 30:04 Blue-Green deployment strategies 31:55 “Is it easy to set up a developer workflow?” 34:46 Guardrails for AI powered apps and token usage 37:32 Small language models and Ollama 38:29 Sidebot AI demo and LLM integration 39:41 Understanding manifest.json and dependencies 45:00 Automatic publish on GitHub push 46:51 The future of Shinyapps.io and migration 48:33 “Did you just build a custom agent for that specific dashboard?” 51:43 Publishing from RStudio IDE to Connect Cloud 54:16 Preview: Inspecting website APIs for data harvesting

Simon Couch

Integrating Shiny with Epic EHR | Matt Maloney | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Matt Maloney, Director of Applied AI and Data Science at City of Hope, to chat about applying data science to cancer care operations, integrating open source data science tools like Shiny with Electronic Health Records (like Epic), and the evolving governance of generative AI in healthcare.

In this Hangout, we explore the technical and operational strategies behind integrating custom data science applications directly into clinical workflows. Matt discusses how his team moves beyond standalone tools by embedding Shiny apps and other solutions into Epic, allowing medical coders and providers to access predictions and summaries without leaving their primary software environment-of-choice. He also mentions the “build vs. buy” decision-making process as vendors release their own AI solutions, emphasizing the importance of validating external models against their specific patient population.

Resources mentioned in the video and zoom chat: City of Hope → https://www.cityofhope.org Unity Health Toronto Customer Story → https://posit.co/about/customer-stories/unity-health-toronto/ pointblank (Data Validation package) → https://rstudio.github.io/pointblank/

If you didn’t join live, one great discussion you missed from the zoom chat was about where data science teams sit within community members’ organizations and whether they like it or not, specifically the pros and cons of being housed within IT versus embedded inside business units. Participants debated access to infrastructure versus proximity to business stakeholders, with several sharing their own experiences of shifting between these departments (or between companies with different structures). Let us know below if you’d like to hear more about this topic!

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh The Lab: https://pos.it/dslab LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Timestamps: 00:00 Introduction 03:37 “What does the data science function at City of Hope help with?” 08:52 “Tell us a little bit more about how you’re integrating Posit with Epic” 16:08 “How do you handle the needs of privacy with the push to adopt AI?” 18:40 “How do you manage to stay abreast of technical advancements?” 22:45 “At what point do you hand off your data work to the software engineering team?” 27:23 “How much has development that involves LLMs and generative AI taken hold?” 30:38 “Does your team evaluate a lot of the things that Epic might be throwing your way?” 34:41 “How does Epic pass an encounter number or a patient ID to Posit Connect?” 35:57 “How does your team handle these nuanced pieces of clinical information?” 40:29 “Do the administrators appreciate the time that it takes to do things?” 44:22 “What happens in the academic division?” 46:10 “Do you have a piece of career advice for us?”

Exploring Positron settings | Isabel Zimmerman & Davis Vaughan | Data Science Lab

The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!

The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.

On this call, Libby Heeren is joined by Posit engineers Isabel Zimmerman and Davis Vaughan as they share some of their favorite settings in Positron, a super customizable data science IDE. Come laugh with us as we can’t seem to figure out that VS Code calls rainbow parentheses “bracket pair colorization.”

Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen, Isabel Zimmerman, Davis Vaughan

Resources from the hosts and chat: Install Positron: https://positron.posit.co/ Positron docs on keyboard shortcuts: https://positron.posit.co/keyboard-shortcuts.html Nathan Jeffery’s “click to open a .RDS file” keybinding: https://nathan-jeffery.netlify.app/blog/2025-08-26-read-rds-positron/ Positron R pipe setting (paste in browser and it’ll open in Positron): positron://settings/positron.r.pipe One of Dan Chen’s faves, the native tab feature in VSCode + Positron: https://lucasprag.com/posts/underrated-vscode-feature-native-tabs/ The list of RStudio keybindings that you get when you turn on RStudio keybindings in Positron: https://positron.posit.co/migrate-rstudio-keybindings.html Indent rainbow extension: https://open-vsx.org/extension/oderwat/indent-rainbow Rainbow brackets setting (paste in browser and it’ll open in Positron): positron://settings/editor.bracketPairColorization.enabled Setting hierarchy (User vs Workspace settings) in Positron: https://code.visualstudio.com/docs/configure/settings#_settings-precedence Rainbow CSV extension (not by Posit): https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv Positron +1e, an extension pack for dev and data science, by Garrick Aden-Buie: https://open-vsx.org/extension/grrrck/positron-plus-1-e Publishing from VS Code or Positron: https://docs.posit.co/connect/user/publishing-positron-vscode/ Posit Connect Cloud plans: https://connect.posit.cloud/plans Enter Folder extension that Libby mentions: https://open-vsx.org/extension/xiangda/enter-folder Catppuccin themes (shared by Rory Lawless, and now some of Libby’s favorites!): https://open-vsx.org/extension/Catppuccin/catppuccin-vsc

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co

Thanks for learning with us!

Timestamps 00:00 Introduction 00:42 Guest Introductions: Isabel and Davis 02:41 Positron Settings overview 04:11 How to enable “Format on Save” 04:34 “How do I open settings in JSON or UI?” 05:10 Auto Save on focus change 08:26 Enabling RStudio key bindings 09:28 “Why doesn’t the cursor move with code edits?” 12:18 User vs. Workspace settings 14:34 Creating and using Profiles 16:13 “Can I use the magrittr pipe with Control+Shift+M?” 17:23 Searching and managing keyboard shortcuts 19:42 Creating custom code snippets 21:31 The Indent Rainbow extension 24:04 Enabling rainbow parentheses/brackets 25:08 Managing Python and R interpreters 26:32 Rearranging and hiding UI panes 28:04 Rainbow CSV and favorite extensions 29:26 Using the Enter Folder extension 31:05 Understanding the setting hierarchy 32:48 Adding symbols to Quick Open search 36:00 “Is there a way to shift focus using keyboard shortcuts?” 38:04 Modifying keybindings JSON for specific languages 39:20 “How do you find trustworthy extensions?” 43:11 “How can I publish to shinyapps.io from Positron?” 44:03 Deploying with Posit Publisher and Connect Cloud 48:32 Customizing themes with RainGlow extension 50:36 “Is there an Import Data Set wizard in Positron?” 53:01 Conclusion and community resources

Davis Vaughan, Garrick Aden-Buie, Isabel Zimmerman

Lessons from a Broad & Varied Data Science Career | Arcenis Rojas | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Arcenis Rojas, Data Scientist at Indeed, to chat about econometrics, public vs private sector data science, navigating a varied career trajectory, AI integration in the hiring sphere, and making friends at conferences.

In this Hangout, Arcenis talked about how his career journey has been broad rather than narrowly specialized. He shared that this breadth of experience has given him confidence that he can quickly figure out any dataset. He feels it also taught him how to communicate effectively about data to people at different levels and across various domains. He also shared his tech stack at Indeed, including RStudio, Positron, AWS, Snowflake, Quarto for reporting, Shiny for apps, and Posit Connect for deploying them.

An attendee asked about the impacts of AI on the job search space, and Arcenis shared the AI at Work Report (linked below) from the Indeed Hiring Lab. He says, based on research, generative AI is expected to assist many people but only replace small segments of the workforce in the coming 5-10 years, and that entry-level knowledge work is predicted to be the most highly impacted area.

Resources mentioned in the video and zoom chat: Indeed Hiring Lab: AI at Work Report 2025 → https://www.hiringlab.org/2025/09/23/ai-at-work-report-2025-how-genai-is-rewiring-the-dna-of-jobs/ To Explain or to Predict? (Galit Shmueli, 2010) → https://arxiv.org/abs/1101.0891 Announcing the 2025 table and plotnine contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/

If you didn’t join live, one great discussion you missed from the zoom chat was about the wide variety of data types data scientists work with. Attendees shared that their data included genomics, finance/trading, environmental/natural resources, e-commerce products, and medical/clinical data. What kind of data types do you work with?

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Timestamps 00:00 Introduction 06:16 “What do you like to do for fun?” 08:51 “What are the unique aspects of financial and economic data science?” 15:07 “What are econometrics?” 16:02 “Is the difference that hard sciences stats is trying to explain what happened where econometrics might be what might happen in the future?” 19:39 “Suggestions for making data friends and going to a conference alone.” 23:26 “Do you see any misconceptions about the job market online, specifically the ATS thing?” 29:52 “How has your varied career trajectory been an advantage or a challenge in data science?” 34:08 “How is the recent hype wave of AI integration manifesting in the hiring sphere?” 40:08 “What are the tools that you use in your job for reporting?” 41:42 “How do you know when it is time to pivot and leave your role because your skills are stagnating?” 45:56 “How would you persuade leadership to use R or Python?” 49:32 “Did you find yourself always trying to use more complex models when simpler ones would serve the audience better?”

Top AI Powered Coding Assistants

which have you tried? #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyter #cursor #windsurf #positshorts

Deploy a Streamlit web app with me

deploy all the things! #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyternotebook #positshorts

10 tips for data science beginners

Do you have any to add? #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyternotebook #positshorts

Let’s try Positron Assistant for the first time together

lil ai buddy #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyter #positshorts

Data Exploration 101

Don’t skimp #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyternotebook #positshorts

Positron 101: Make your DS life easier

Personally, I think the Plots pane is #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyter #positshorts

Set up your first coding project in less than 10 minutes

What are you waiting for #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyter #positshorts

I wrote this talk with an LLM - Hadley Wickham (useR! 2025 Keynote 1)

Presented by: Hadley Wickham (Posit PBC)

In this keynote, I’ll explore the evolving relationship between data scientists, statisticians, and large language models through a unique experiment: this entire talk was created in collaboration with an LLM. From outline to slides, from code examples to key insights, I’ll share the practical realities of using AI as a thought partner in the R ecosystem.

Drawing on my experience developing tidyverse packages and teaching data science, I’ll demonstrate how LLMs can augment (rather than replace) the R user’s workflow. We’ll examine specific examples where AI assistance shines—rapid prototyping, documentation generation, and creative ideation—alongside areas where human expertise remains irreplaceable.

Most importantly, I’ll reflect on what this experiment reveals about the future of our community: How might AI change the way we teach R? What new skills should we prioritize? And how can we ensure that the tools we build remain accessible and empowering for all users?

Join me for this meta-exploration of AI’s role in our work, with honest reflections on both the promise and limitations of these new collaborators in our statistical computing journey.

This abstract was generated by Claude Sonnet 3.7 and lightly edited by me. I used the prompt: I am Hadley Wickham, chief scientist at RStudio/Posit and I’ve been invited to give a keynote on AI at the useR conference. Please write a talk abstract for a talk entitled ‘I wrote this talk with an LLM’

Hadley Wickham

Get started with data - load data into your IDE

Finding the data is the hardest part, prove me wrong #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyternotebook #positshorts

What is an IDE & which one should you use as a beginner

Did you have a first? #datascience #datasciencetok #python #swe #datavisualization #dataanalytics #codinglife #vscode #ide #rstudio #positron #pycharm #jupyter #positshorts


Posit Conf 2025 Keynote Previews | Kieran Healy & Jonathan McPherson | Data Science Hangout

To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you! Thursdays at 12PM US Eastern

We were recently joined by upcoming Posit Conf 2025 keynote speakers Kieran Healy, Professor of Sociology at Duke University, and Jonathan McPherson, Software Architect at Posit PBC, to chat about how and why open-source IDEs like RStudio and Positron get made, how to do data visualization for discovery and explanation, what their keynotes are going to be about, and what’s next for Posit’s IDE development, including AI integration.

In this Hangout, Kieran talked about trustworthy data visualization. He highlighted that while data visualization is a powerful way to condense and present information, often creating compelling and authoritative artifacts, phrases like “visual storytelling” can be problematic if they encourage presenting a predetermined narrative not fully supported by data. He emphasized that the trustworthiness of visualizations does not come solely from the techniques used or the software, but from a “web of social processes and individual commitments” that cannot be easily automated.

Jonathan talked about the future of Positron and its relationship with RStudio, addressing whether Positron is intended to replace RStudio. He clarified that the long-term goal for Positron is to make it the best Integrated Development Environment (IDE) for working with data in any language. He explained that Positron is built with an extensibility layer, allowing anyone to write plugins for new languages or capabilities, making it a robust and evolving data science workbench. It does not have all of RStudio’s features and makes different design trade-offs. RStudio, having evolved over decades, is highly optimized for specific R-based workflows and remains the best at what it does for those use cases.

Resources mentioned in the video and zoom chat: Posit Conference 2025 Registration → https://posit.co/conference/ Kieran Healy’s Website → https://kieranhealy.org Kieran Healy’s book “The Ordinal Society” → https://theordinalsociety.com/ Kieran Healy’s book “Data Visualization: A Practical Introduction” → https://socviz.co/ Jonathan McPherson’s LinkedIn → https://www.linkedin.com/in/jonathanmcpherson Joe Cheng’s AI Talk on Harnessing LLMs for Data Analysis → https://youtu.be/owDd1CJ17uQ?feature=shared TidyTuesday GitHub → https://github.com/rfordatascience/tidytuesday Positron IDE → https://positron.posit.co/ Will R Chase’s talk on making clear plots → https://www.youtube.com/watch?v=h5cTacaWE6I

If you didn’t join live, one great discussion you missed from the zoom chat was about the ongoing debate and practical tips for moving from presenting tables of numbers to visualizations. Community members shared various strategies, including using color-mapped tables as an intermediate step, providing both tables and visuals, and ensuring accessibility and interpretability for diverse audiences. Are you team tables or team graphs?


Joe Cheng

Data Science at the Command Line and Polars | Jeroen Janssens | Data Science Hangout

To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Jeroen Janssens, Senior Developer Relations Engineer at Posit, to chat about his career journey from machine learning to developer relations, the advantages of using the command line for data science, his books “Data Science at the Command Line” and “Python Polars”, and advice for aspiring DevRel professionals.

In this Hangout, we explore the benefits of working on the command line versus not. Jeroen explained that while the initial command line interface might seem stark, it offers a very different and powerful way to interact with your computer. The Unix command line is ubiquitous across various systems, from Raspberry Pis to supercomputers. Its strength lies in the ability to connect tools together through standard output and input, allowing for quick and iterative solutions by combining specialized tools. This fosters an interactive nature with a short feedback loop and provides closer interaction with the file system, making ad hoc data exploration efficient.
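The idea of connecting small tools through standard output and input can be pictured with a minimal Python sketch. This is not from the Hangout: the function names and the sample log lines are hypothetical, and each function only loosely mirrors the shell tool named in its docstring.

```python
from collections import Counter
from typing import Iterable, Iterator


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only the lines that contain `needle` (like `grep`)."""
    for line in lines:
        if needle in line:
            yield line


def first_field(lines: Iterable[str], sep: str = ",") -> Iterator[str]:
    """Emit the first delimited field of each line (like `cut -d, -f1`)."""
    for line in lines:
        yield line.split(sep)[0]


def count_values(values: Iterable[str]) -> list[tuple[str, int]]:
    """Count occurrences, most frequent first (like `sort | uniq -c | sort -rn`)."""
    return Counter(values).most_common()


# Hypothetical log lines standing in for text arriving on standard input.
log = [
    "alice,login,ok",
    "bob,login,failed",
    "alice,upload,ok",
    "carol,login,ok",
    "alice,login,ok",
]

# Each stage consumes the previous stage's output, like `a | b | c` in a shell.
logins_per_user = count_values(first_field(grep(log, "login")))
print(logins_per_user)
```

Because every stage only reads a stream of lines and emits another, stages can be swapped or added one at a time, which is the short feedback loop described above.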

Resources mentioned in the video and zoom chat: Jeroen’s LinkedIn → https://www.linkedin.com/in/jeroenjanssens/ Data Science at the Command Line → https://jeroenjanssens.com/dsatcl/ Python Polars: The Definitive Guide → https://polarsguide.com/ Plotnine → https://plotnine.org/ Winner of the 2024 plotnine Plotting Contest → https://posit.co/blog/winner-of-the-2024-plotnine-plotting-contest/ Talk about plotnine → https://www.youtube.com/watch?v=xdD8r84sqYY R for Data Science → https://r4ds.had.co.nz/ Jeroen’s plotnine translation of R for Data Science → https://jeroenjanssens.com/plotnine/ froggeR package → https://azimuth-project.tech/froggeR/ Reticulate → https://rstudio.github.io/reticulate/ Install Windows Subsystem for Linux (WSL) → https://learn.microsoft.com/en-us/windows/wsl/install UTM for macOS (Virtualization) → https://mac.getutm.app fish shell → https://fishshell.com/ Quartodoc → https://github.com/machow/quartodoc Focusmate (Accountability Partner Tool) → https://www.focusmate.com/ Surface Area of Luck → https://modelthinkers.com/mental-model/surface-area-of-luck CRAN R Extensions Manual → https://cran.r-project.org/doc/manuals/r-release/R-exts.html

If you didn’t join live, one great thing you missed from the zoom chat was people sharing their varied experiences with the command line, with many admitting they primarily use it for basic navigation or only when necessary, and some sharing helpful tools and tips for those less familiar. Let us know below if you’d like to hear more about this topic!


Jeroen Janssens

Company-branded reports, apps, and dashboards made easier with brand.yml & Posit

You will learn: How to apply consistent company branding across reports, dashboards, and apps

Key Links:

GitHub Repo for Example: https://github.com/skaltman/brand-yml-demo
brand.yml documentation: https://posit-dev.github.io/brand-yml/
Follow-along blog post: https://posit.co/blog/unified-branding-across-posit-tools-with-brand-yml/
Q&A after the Demo: https://youtube.com/live/kuEbRfmm4G4?feature=share

Additional Resources Mentioned in Q&A:

Quarto specific page on brand: https://quarto.org/docs/authoring/brand.html
Typography: https://posit-dev.github.io/brand-yml/brand/typography.html
brand.yml + pkgdown: https://github.com/rstudio/bslib/tree/main/pkgdown
LLM brand.yml prompt: https://posit-dev.github.io/brand-yml/articles/llm-brand-yml-prompt/
Inspiration/gallery: https://posit-dev.github.io/brand-yml/inspiration/

Why we think this is important: Consistent company branding in your reports and apps (with your logo, colors, and fonts) can help make your work look more professional, but it is often tricky to get right.

Common challenges we’ve heard from the community:

Excessive manual effort: Applying colors, fonts, and logos across reports, apps, and dashboards takes time and is prone to errors.
Difficult to update: When brand guidelines change, it’s difficult to update all products consistently.
Team consistency: Ensuring all contributors follow branding guidelines is challenging to manage.

How to join future events: We host workflow demos the last Wednesday of every month. You can add them to your calendar with this link: https://www.addevent.com/event/Eg16505674

Full playlist of workflow demo recordings: https://www.youtube.com/playlist?list=PL9HYL-VRX0oRsUB5AgNMQuKuHPpNDLBVt

Have suggestions? Comment below.

Thank you for joining us!

Shiny community, hackathons, and his AI mindset | Joe Cheng | Data Science Hangout

To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Joe Cheng, CTO at Posit, to chat about the Shiny contest, the use of AI in data science, and designing hackathons for learning new technologies. We were joined by several past and present Shiny contest winners who gave great advice on how to get started if you want to participate (and we really hope you do)!

In this Hangout, we explore the evolution of the Shiny contest since its inception, including what made the 2024 submissions unique and the ways the contest encourages community contribution and learning. Joe also shared his personal journey from skepticism about AI to seeing and embracing its potential. We got some amazing questions from the Hangout attendees! We hope you join us live next time to ask some of your own questions.

Resources mentioned in the video and zoom chat:
2024 Shiny Contest Winners → https://posit.co/blog/winners-of-the-2024-shiny-contest/
Joe’s AI Hackathon Slides → https://jcheng5.github.io/llm-quickstart/quickstart.html
Shiny Assistant → https://gallery.shinyapps.io/assistant/
Isabella’s blog post on prototyping with Shiny Assistant → https://posit.co/blog/ai-powered-shiny-app-prototyping/
Posit Conf Workshops → https://reg.rainfocus.com/flow/posit/positconf25/attendee-portal/page/sessioncatalog?tab.day=20250916&search.sessiontype=1675316728702001wr6r
Shiny Conference 2025 → https://www.shinyconf.com/
Call for Speakers Shiny Conf 2025 → https://sessionize.com/shiny-conf-2025/
Shiny Tableau → https://rstudio.github.io/shinytableau/
Echarts4r → https://echarts4r.john-coene.com
ellmer package on GitHub → https://github.com/tidyverse/ellmer

All the Shiny app links mentioned in the video and zoom chat:
Eric Nantz 2021 Shiny Contest Submission → https://forum.posit.co/t/the-hotshots-racing-dashboard-shiny-contest-submission/104925
Eric Nantz’s R/Pharma conference keynote on AI → https://youtu.be/AfMa1CVUdXU?si=ThLsKFyonntxzBUF
Eric Nantz’s Haunted Places app → https://youtu.be/vX09QGMuOfo?si=K5_uPfK5bcfZZ92l
Umair Durrani’s Shiny Storytelling app → https://umair.shinyapps.io/storytimegcp/
Umair’s Bluesky profile → https://bsky.app/profile/transport-talk.bsky.social
Umair’s Shiny meetings project on GitHub → https://github.com/shiny-meetings/shiny-meetings
Abby Stamm’s Shiny Accessibility app → https://github.com/ajstamm/shiny-a11y-app

If you didn’t join live, one great discussion you missed from the zoom chat was about everyone’s favorite interactive plotting tools. Someone asked whether Plotly was the best option, and lots of people said they loved ggiraph, echarts4r, ObservableJS, and others. What about you?! What’s your favorite interactive plotting library?


Joe Cheng

Sharpening your axe and the BAU trap | Steph Locke | Data Science Hangout

To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Steph Locke, Digital and App Innovation Leader at Microsoft, to chat about how to persuade your manager to give you more time for professional development, why “business as usual” (BAU) work can choke development, and how killing projects is actually a good thing.

In this Hangout, we explore the importance of investing time in skill development and how this can lead to long-term gains in efficiency and quality. Steph shares advice on how to talk to managers about the value of “sharpening your axe,” and why it is more efficient to train in order to do things well initially than to spend time on continuous maintenance of subpar work.

Is a bunch of “business as usual” work bogging down the potential output of development teams? This is where Steph’s concept of investing in high quality work up-front comes in. She talks about the dangers of rushing to release products that aren’t built with high quality and low ongoing maintenance in mind: “…when [people] don’t necessarily have time to invest in doing things in a way that’s going to have high quality and low maintainability requirements and is easy to extend and create new things, when people aren’t doing that, anything they ship is going to then cost them more time to do something to look after whatever they’ve shipped. That then gives them less time to ship the next thing.” If this resonates with you, give Steph a follow on LinkedIn and make sure you read her article.

Resources mentioned in the video and zoom chat: Steph’s article on how BAU chokes development → https://www.linkedin.com/pulse/developer-velocity-choked-bau-stephanie-locke-fsiqe Posit’s documentation on using copilot in the RStudio IDE → https://docs.posit.co/ide/user/ide/guide/tools/copilot.html A YouTube video about using Shiny apps for data entry into a backend database → https://www.youtube.com/watch?v=zDJc8sXh2qw

If you didn’t join live, one great discussion you missed from the zoom chat was around the discrepancies in pay and responsibilities between individual contributor (IC) and management tracks in tech companies, and the perception that management roles often have better compensation and advancement opportunities, even when the work may not be more valuable than IC contributions. Let us know below if you’d like to hear more about this topic! Is there a need for more non-people-leader technical leadership roles for highly skilled individual contributors?


Quarto Websites 2: Add pages and navigation | Charlotte Wickham | Posit

Now you’ve got a homepage, you’ll likely want to add some other pages. In this video, learn how to add pages to your website, and help people find them, by adding them to your website navigation.

In this video:
1:00 Add a page to your website
2:54 Your file structure determines your URL structure
5:49 Add a link to your page in navigation
7:50 Customize navigation item text and icon
9:12 Control where items appear in the navigation bar
10:16 Navigation bar options
11:11 Switch to side navigation
12:22 Other types of navigation
16:30 Wrap Up
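The point that file structure determines URL structure can be illustrated with a small Python sketch. It assumes Quarto's default website behavior, where each source file renders to an HTML file at the same relative path in the output directory. The helper `url_path_for` and the example file names are hypothetical and not part of Quarto.

```python
from pathlib import PurePosixPath


def url_path_for(source_file: str) -> str:
    """Return the site-relative URL path a source file would render to."""
    html = PurePosixPath(source_file).with_suffix(".html")
    # index.html is what a web server serves for the bare directory URL.
    if html.name == "index.html":
        parent = html.parent.as_posix()
        return "/" if parent == "." else f"/{parent}/"
    return f"/{html.as_posix()}"


# Hypothetical project layout: a homepage, an About page, and a projects folder.
sources = ["index.qmd", "about.qmd", "projects/index.qmd", "projects/bikes.qmd"]
urls = {src: url_path_for(src) for src in sources}

for src, url in urls.items():
    print(f"{src:22} -> {url}")
```

Under these assumptions, moving a file into a folder changes its URL, so it is worth settling on a folder layout before sharing links.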

Links: List of icons you can use in navigation items: https://icons.getbootstrap.com/ Top navigation bar options: https://quarto.org/docs/websites/website-navigation.html#top-navigation Quarto website navigation: https://quarto.org/docs/websites/website-navigation.html

Code: Starter source code: https://github.com/cwickham/quarto-website-video/tree/v0.1 Final source code: https://github.com/cwickham/quarto-website-video/tree/v0.2

For more in-depth coverage and slides check out: https://posit-conf-2024.github.io/quarto-websites/

Do you need a professional website to showcase your work? If you’ve used Quarto to produce a document, you’ve already got the technical skills to create a Quarto website. In this video series, you’ll learn everything else you need to build a website and customize its appearance.

This video series is for you if you:

Have used Quarto to generate documents (e.g. HTML, PDF, MS Word etc.)
Are comfortable editing plain text documents (e.g. .qmd) in your IDE (e.g. RStudio, Visual Studio Code etc.)
Want to walk away with your own personal website

Taught by: Charlotte Wickham (https://www.cwick.co.nz/ ) Emil Hvitfeldt (https://emilhvitfeldt.com/ )

Videos in this series:

1: Build your homepage [https://youtu.be/l7r24gTEkEY]
2: Add pages and navigation [https://youtu.be/k65E-8PXZmA]
3: Customize appearance with CSS/SCSS [https://youtu.be/pAN2Hiq0XGs]
4: Add lists of content with listings [https://youtu.be/bv_Cw-3HI1Y]

Charlotte Wickham, Emil Hvitfeldt

Quarto rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Quarto Websites 4: Add lists of content with listings | Charlotte Wickham | Posit

Adding a listing page to your website is a great way to showcase your projects, talks, publications or blog posts. In this video you’ll learn how to create a listing page in Quarto and see two ways to populate it with content: Quarto documents, or a yaml file.

In this video:
0:50 Use a listing to add a blog
3:36 Listing options
5:47 Why use a listing?
7:22 Use a YAML file to populate a project portfolio
9:50 Customize the display of a listing
12:10 Advanced customization of listings
13:42 Remove pages

Links: Listings: https://quarto.org/docs/websites/website-listings.html Andrew Heiss’ teaching listing: https://www.andrewheiss.com/teaching/

Code: Starter source code: https://github.com/cwickham/quarto-website-video/tree/v0.3 Final source code: https://github.com/cwickham/quarto-website-video/tree/v0.4

For more in-depth coverage and slides check out: https://posit-conf-2024.github.io/quarto-websites/


Charlotte Wickham, Emil Hvitfeldt


Quarto Dashboards 3: Theming and Styling | Mine Çetinkaya-Rundel | Posit

Theming and styling Quarto dashboards built with R and/or Python.

Before watching this video, you might want to watch Parts 1 & 2.

This video takes you through

0:00 - Theming (including Bootswatch themes, light/dark mode, customizing themes with SCSS)
3:55 - Styling
4:55 - Live coding demo

Slides can be found at https://mine.quarto.pub/quarto-dashboards/3-theming-styling and the starter documents for the accompanying exercises at https://github.com/mine-cetinkaya-rundel/olympicdash .

Materials for all parts of the videos can be accessed at https://mine.quarto.pub/quarto-dashboards .

You already analyze and summarize your data in computational notebooks with R and/or Python. What’s next? You can share your insights, or let others draw their own conclusions, in eye-catching Quarto Dashboards that are straightforward to author, design, and deploy, regardless of the language of your data processing, visualization, analysis, etc. With Quarto Dashboards, you can create elegant and production-ready dashboards using a variety of components, including static graphics (ggplot2, Matplotlib, Seaborn, etc.), interactive widgets (Plotly, Leaflet, Jupyter Widgets, htmlwidgets, etc.), tabular data, value boxes, text annotations, and more. Additionally, with intelligent resizing of components, your Quarto Dashboards look great on devices of all sizes. And importantly, you can author Quarto Dashboards without leaving the comfort of your “home”: plain text markdown in any text editor (VS Code, RStudio, Neovim, etc.) or any notebook editor (JupyterLab, etc.).

This workshop will walk you through building increasingly complex dashboards using various layout options and deploying them as static web pages (with no special server required) as well as with a Shiny Server on the backend for enhanced interactivity.

This course is for you if you:

do data analysis in computational notebooks
share your results with your audience in static or interactive dashboards
want to improve the design, user interface, and experience of your dashboards

Mine Çetinkaya-Rundel

ggplot2 leaflet Quarto rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Simon Couch: Fair machine learning

Simon Couch | Fair machine learning | Cascadia R Conf 2024 | Regular talk, 10:25-10:40

In recent years, high-profile analyses have called attention to many contexts where the use of machine learning deepened inequities in our communities. A machine learning model resulted in wealthy homeowners being taxed at a significantly lower rate than poorer homeowners; a model used in criminal sentencing disproportionately predicted black defendants would commit a crime in the future compared to white defendants; a recruiting and hiring model penalized feminine-coded words—like the names of historically women’s colleges—when evaluating résumés. In late 2022, a group of Posit employees across teams, roles, and technical backgrounds formed a reading group to engage with literature on machine learning fairness, a research field that aims to define what it means for a statistical model to act unfairly and take measures to address that unfairness. We then designed functionality and resources to help data scientists measure and critique the ways in which the machine learning models they’ve built might disparately impact people affected by that model. This talk will introduce the research field of machine learning fairness and demonstrate a fairness-oriented analysis of a model with tidymodels, a framework for machine learning in R.
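One simple way to measure whether a model treats groups differently is to compare how often it predicts the positive class for each group, a quantity often called demographic parity. The plain-Python sketch below illustrates that single metric only. It is not the tidymodels workflow demonstrated in the talk, and the predictions and group labels are made up.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group-level positive rates."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Made-up predictions (1 = flagged by the model) for two hypothetical groups.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(preds, group)
gap = demographic_parity_difference(preds, group)
print(rates, gap)
```

A gap of zero would mean both groups are flagged at the same rate. Which fairness definition is appropriate depends on the context, which is part of what the research field debates.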

Pronouns: he/him
Chicago, IL

Simon Couch is a software engineer at Posit PBC (formerly RStudio) where he works on open source statistical software. With an academic background in statistics and sociology, Simon believes that principled tooling has a profound impact on our ability to think rigorously about data. He authors and maintains a number of R packages and blogs about the process at simonpcouch.com.

Simon Couch

How to automatically detect data changes for your Shiny Calendar app (ft: Jira, pins, Posit Connect)

Do you manage constantly changing data and need your Shiny app to automatically update?

On August 28th at 11 am ET, Isabella Velásquez demonstrated a streamlined workflow for handling frequently updated datasets in Shiny. You’ll see how to simplify your process for keeping dynamic data current and how to reflect those changes in your app or dashboard.

Github repo to follow along or make it your own! https://github.com/posit-marketing/shiny-calendar

Timestamps:
1:03 - Introduction of the project (end goal: calendar that integrates with Jira to track and visualize a schedule for managing deadlines of content)
2:26 - Pulling data from an API in Python or R
2:56 - Introduction to pins (and scheduling automatic refreshes of it in Posit Connect)
4:30 - Introduction to Shiny for both Python and R (its power lies in reactivity)
5:10 - Enter pin_reactive_read() function
6:12 - Introduction to Posit Team
6:37 - Opening a new session within Posit Workbench and overview of code needed to create the calendar [Github repo: https://github.com/posit-marketing/shiny-calendar]
12:07 - toastui package used for Calendar (ex: adding colors to labels)
12:47 - Writing clean data to Posit Connect board
13:16 - Rendered Quarto doc for pulling Jira data from the board
14:00 - Deploying Quarto to Posit Connect (using push button deployment) and scheduling to run
16:54 - Using the data just pinned in the Shiny app
21:17 - Overview of Shiny Content Calendar application
23:04 - Creating an issue in Jira board and adjusting schedule in Posit Connect to show new item in Shiny calendar.
24:00 - pin_reactive_read automatically detects change and shows it in the Shiny app

During this workflow demo, you will learn:

How {pins} stores and retrieves ever-changing data with ease
How to use pin_reactive_read() in Shiny to automatically trigger updates when your data changes
How Posit Connect can be set up to rerun your {pin} on a schedule, ensuring your app is updated without disruption
How to deploy an always-up-to-date app for seamless sharing with stakeholders
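The change-detection idea in the list above can be pictured as polling: check a cheap fingerprint of the stored data on a timer and reload only when it differs. The standard-library Python sketch below illustrates that pattern with a temporary local file standing in for a pin. `ChangeWatcher` and the sample issue records are hypothetical, and this is not how pins or `pin_reactive_read()` are actually implemented.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class ChangeWatcher:
    """Reload a JSON file only when its content hash changes."""

    def __init__(self, path: Path):
        self.path = path
        self._last_hash = None
        self.data = None

    def poll(self) -> bool:
        """Check the file once; return True if new data was loaded."""
        raw = self.path.read_bytes()
        digest = hashlib.sha256(raw).hexdigest()
        if digest == self._last_hash:
            return False  # unchanged, keep the cached data
        self._last_hash = digest
        self.data = json.loads(raw)
        return True


# A temporary file stands in for a pin stored on a board.
with tempfile.TemporaryDirectory() as tmp:
    pin = Path(tmp) / "calendar.json"
    pin.write_text(json.dumps([{"issue": "DOC-1", "due": "2024-09-01"}]))

    watcher = ChangeWatcher(pin)
    first = watcher.poll()   # initial load
    second = watcher.poll()  # nothing changed, no reload

    # Simulate a scheduled job writing a new version of the data.
    pin.write_text(json.dumps([
        {"issue": "DOC-1", "due": "2024-09-01"},
        {"issue": "DOC-2", "due": "2024-09-15"},
    ]))
    third = watcher.poll()   # change detected, data reloaded
    n_items = len(watcher.data)

print(first, second, third, n_items)
```

In an app, `poll()` would run on a timer, and anything displaying `watcher.data` would refresh only when it returns True.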

Other helpful links: pin_reactive_read: https://pins.rstudio.com/reference/pin_reactive_read.html Basic reactivity in Mastering Shiny: https://mastering-shiny.org/basic-reactivity.html#reactive-programming Understanding reactivity on the Shiny site: https://shiny.posit.co/r/articles/build/understanding-reactivity/ Github repo: https://github.com/posit-marketing/shiny-calendar Shiny Calendar: https://pub.demo.posit.team/public/shiny-calendar/ Q&A Recording

If you like these workflow demos, you can join us monthly! They happen the last Wednesday of every month at 11 am ET. Add it to your calendar here: https://pos.it/team-demo

Isabella Velásquez

How to build business reports with Quarto

How do you create the report look and feel that your leadership team expects?

Christophe Dervieux at Posit joined us on Wednesday, March 27th to share how to style Quarto docs and send scheduled email updates to required stakeholders.

Helpful resources:
Getting started with Quarto: https://quarto.org/docs/get-started
User guide: https://quarto.org/docs/guide
Github repo with this example: https://github.com/quarto-examples/quarto-business-report
Q&A Recording: https://youtube.com/live/bqk75igHo8M?feature=share
If you’re interested in learning more about Posit Connect, pos.it/chat-with-us

Timestamps:
02:00 - What is Quarto?
02:40 - How does Quarto work? (.md, .qmd or .ipynb as source files)
03:45 - How to get started with Quarto if you’re new to it?
04:51 - Using Quarto from within RStudio
05:00 - Using Quarto within VSCode with extension & Jupyter Lab extension
05:37 - Visual Editor for Quarto
07:22 - Customer Tracker Report in RStudio IDE (using source code: https://github.com/quarto-examples/quarto-business-report )
10:39 - Making Quarto report downloadable as Excel doc (adding download button)
11:37 - Adding a table of contents to your Quarto report
12:23 - Spread Quarto graphics across page so that they go into margin
13:10 - Customizing theme in Quarto (Bootstrap 5) https://quarto.org/docs/output-formats/html-themes.html
14:45 - Increasing font size in Quarto report
17:10 - Customizing theme rules
21:16 - Publishing Quarto report to Posit Connect
22:35 - Scheduling Quarto report to automatically run
23:35 - Preview of default / non-customized email
23:58 - Customizing your Quarto email
26:52 - Customized email preview that Posit Connect can send
27:56 - Setting access controls for Quarto report on Connect and when you want emails to send

Resources shared in Q&A session: Community discussion for ongoing Quarto questions: https://forum.posit.co/tag/quarto Quarto document language: https://quarto.org/docs/authoring/language.html babelquarto (for multilingual project, book, or website): https://docs.ropensci.org/babelquarto/ Quarto Manuscripts: https://quarto.org/docs/manuscripts/ Managing Execution in Quarto: https://quarto.org/docs/projects/code-execution.html Quarto Extensions: https://quarto.org/docs/extensions/ Project Profiles in Quarto: https://quarto.org/docs/projects/profiles.html Custom branding deeper dive: https://www.youtube.com/watch?v=V82BBU9ldcM Quarto Parameters: https://quarto.org/docs/computations/parameters.html Lua Development: https://quarto.org/docs/extensions/lua.html Quarto CLI Discussions on Github: https://github.com/quarto-dev/quarto-cli/discussions Data Science Hangout every Thursday at 12 ET: https://posit.co/data-science-hangout/ Get connected with others at your org using Posit: pos.it/connect-us

There is no need to register; join us here on YouTube at the time above or you can add to your calendar using the link below:

pos.it/team-demo

We host these Workflow Demos on the last Wednesday of every month, so you can use the link above to add the recurring event as well. If you ever have ideas for topics or questions about them, you can comment below in YouTube!

Christophe Dervieux

GitHub Copilot on Posit Cloud

Speed up your coding projects in the RStudio IDE on Posit Cloud with GitHub Copilot, an AI coding assistant.

Learn more in our blog post: https://posit.co/blog/github-copilot-on-posit-cloud/

Posit Cloud: https://posit.cloud/ GitHub Copilot: https://github.com/features/copilot RStudio User Guide: https://docs.posit.co/ide/user/ide/guide/tools/copilot.html

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

J.J. Allaire - Keynote: Dashboards with Jupyter and Quarto | PyData NYC 2023

www.pydata.org

https://drive.google.com/file/d/1O_ed6OKEXZBIzKn6yyF9W7f6NaNV-L3J/view?usp=drive_link

Keynote by JJ Allaire

J.J. is the Founder and CEO of Posit (which you might only know by its previous name, RStudio). J.J. is now leading the Quarto project, a Jupyter-based scientific and technical publishing system. In this talk, J.J. will introduce Quarto Dashboards, an easy way to create production quality dashboards from Jupyter Notebooks. J.J. will also more broadly discuss Posit’s recent work in the open source PyData ecosystem along with plans for significantly expanding that work in the future.

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome! 00:10 Help us add time stamps or captions to this video! See the description for details.

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps

JJ Allaire

The River is our Relative | The Story of the Penobscot Nation

The Penobscot Nation has been around for millennia.

On Indian Island, a river gives the Penobscot people life.

“The tribe views the river like its own highway. We look at her as a relative, a sister, someone that gives life. Sea Lamprey, Sturgeon, Striper Bass.”

Jan Paul works for the Penobscot Nation in the Department of Natural Resources and Water Quality.

She’s using open-source technology, like R and the RStudio IDE, to keep the river healthy, so in turn, it can keep her people healthy.

This is Jan’s inspiring story, featured for the first time at posit::conf(2023).

Credits:

Creative Director: Jason Restivo & Scout Studios
Art Director & Illustrator: I-Nu Yeh
Animator: Theera (Jay) Keeree
Content Producer & Writer: Shannon McGarvey
Sound Mixer: Caleb Theimer

How to keep data up-to-date with 6 pins workflows (aka avoid data-final.csv & data-final-final.csv)

Ever chase a CSV through a series of emails or had to decide between data-final.csv and data-final-final.csv?

Pins (both for R & Python) is a package that a bunch of people at the Data Science Hangout wish they knew about earlier. It allows you to publish and share objects (data, models, etc.) across projects and with your colleagues.

Pins package (R) - https://pins.rstudio.com/
Pins package (Python) - https://pypi.org/project/pins/

Timestamps: 1:15 - Posit Team Overview 2:18 - Introduction to pins (scenarios where you might want to consider using pins) 4:42 - Installing pins 6:24 - Workflow #1: Pinning an R Object to Posit Connect (from RStudio) 10:23 - Workflow #2: Pinning a Python Object to Posit Connect (from JupyterLab) 15:19 - Workflow #3: Reading in a Python pin in an R Session 16:07 - Workflow #4: Reading an R pin into a Python session 17:50 - Workflow #5: Pin versioning 21:50 - Workflow #6: Automating the pin writing process (through job scheduling on Connect)

Helpful resources: Q&A for this session on August 30th: https://youtube.com/live/8hc9ck1ZNLE Blog post on pinning an R dataset to Posit Connect: https://posit.co/blog/pins-posit-connect/

Many people find this useful for:

Scheduling reports that need to be updated with the newest data each week
Reusing data across multiple projects or content (Shiny app, Jupyter Notebook, Quarto doc, etc.)

We host these end-to-end workflow demos on the last Wednesday of every month. No registration is required to attend - simply add it to your calendar using this link: pos.it/team-demo

If you ever have ideas for topics or questions about them, please let us know in the comments.

RStudio IDE | Background and Posit Workbench Jobs

Background and Workbench jobs are a great feature for any long-running scripts you may have, such as a machine learning training step. They also allow you to run multiple jobs simultaneously.

The main difference between background jobs and Workbench jobs is that background jobs run as a child process within the same session.

Workbench jobs run as a new session, either on the local server or off-host if you’re running in a multi-server or clustered environment.
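
As a rough analogy for the "child process" model described above, here is a minimal Python sketch using only the standard library. It is not how RStudio implements background jobs; `JOB_CODE` and `start_background_job` are hypothetical names invented for this illustration. It shows a parent process launching two jobs at once and staying free until it chooses to collect their output.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running script (e.g. a model training step).
JOB_CODE = "import time; time.sleep(0.2); print('job finished')"


def start_background_job(code: str) -> subprocess.Popen:
    """Launch `code` in a child Python process and return immediately."""
    return subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        text=True,
    )


# Start two jobs at once; the parent is not blocked while they run.
jobs = [start_background_job(JOB_CODE) for _ in range(2)]

# Later, wait for each child and collect its output and exit status.
results = [job.communicate()[0].strip() for job in jobs]
exit_codes = [job.returncode for job in jobs]
```

Because the children belong to the parent process, they share its machine and lifetime, which mirrors the trade-off with jobs that start as a separate session.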

Teaching the tidyverse in 2023 | Mine Çetinkaya-Rundel

Recommendations for teaching the tidyverse in 2023, summarizing package updates most relevant for teaching data science with the tidyverse, particularly to new learners.

00:00 Introduction 00:46 Using addins to switch between RStudio themes (See https://github.com/mine-cetinkaya-rundel/addmins for more info) 01:40 Native pipe 03:08 Nine core packages in tidyverse 2.0.0 07:15 Conflict resolution in the tidyverse 11:30 Improved and expanded *_join() functionality 22:05 Per operation grouping 27:41 Quality of life improvements to case_when() and if_else() 31:41 New syntax for separating columns 34:51 New argument for line geoms: linewidth 36:08 Wrap up

See more in the Teaching the tidyverse in 2023 blog post https://www.tidyverse.org/blog/2023/08/teach-tidyverse-23

Mine Çetinkaya-Rundel

[84] Reproducible Publications with Python and Quarto (Thomas Mock)

Join our Meetup group: https://www.meetup.com/data-umbrella

Tom Mock: Reproducible Publications with Python and Quarto

Resources

slides: https://thomasmock.quarto.pub/python-umbrella/#/

Full transcript

https://blog.dataumbrella.org/quarto-blog

About the Event

Quarto is an open-source scientific and technical publishing system that builds on standard markdown with features essential for scientific communication. The system has support for reproducible embedded computations, equations, citations, crossrefs, figure panels, callouts, advanced layouts, and more. In this talk we’ll explore the use of Quarto with Python, describing both integration with IPython/Jupyter and the Quarto VS Code extension. Users can author Jupyter notebooks or documents as plain text markdowns with code in Python, R, Julia or Observable. Quarto includes the ability to publish high-quality articles, reports, presentations, websites, blogs, and books in HTML, PDF, MS Word, ePub, Reveal.js and more.
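
To make the description above concrete, here is a minimal sketch of what a plain-text `.qmd` source looks like (YAML header, markdown, and an executable Python chunk). The script only writes the file and assembles a `quarto render` command; it does not run Quarto. The file name `report.qmd` and the helpers `minimal_qmd` and `render_command` are hypothetical, chosen for this example.

```python
import tempfile
from pathlib import Path

# Build the chunk fence at runtime so this example nests cleanly in any document.
FENCE = "`" * 3


def minimal_qmd(title: str) -> str:
    """Return the text of a small Quarto document with one Python chunk."""
    lines = [
        "---",                 # YAML front matter: metadata and output format
        f'title: "{title}"',
        "format: html",
        "jupyter: python3",
        "---",
        "",
        "## A computed result",  # ordinary markdown
        "",
        FENCE + "{python}",    # executable code chunk
        "#| echo: false",      # chunk option: hide the code, keep the output
        "print(1 + 1)",
        FENCE,
        "",
    ]
    return "\n".join(lines)


def render_command(path: Path, to: str = "html") -> list[str]:
    """Assemble (but do not run) the CLI call that renders the document."""
    return ["quarto", "render", str(path), "--to", to]


out_dir = Path(tempfile.mkdtemp())
qmd_path = out_dir / "report.qmd"
qmd_path.write_text(minimal_qmd("Hypothetical report"), encoding="utf-8")
cmd = render_command(qmd_path, to="html")
```

With the Quarto CLI installed, passing `cmd` to `subprocess.run` would render the file; changing the `to` argument (for example to `docx`) targets a different output format from the same source.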

Timestamps

00:00 Data Umbrella introduction 03:41 Introduce the speaker, Thomas Mock 04:14 Thomas begins 05:14 RStudio is now Posit 05:55 What is Quarto? 07:13 Origins of Quarto 08:31 Goal: Computation Document 09:09 Goal: Scientific Markdown 10:03 Goal: Single Source Publishing 10:33 Simple example of what Quarto looks like (YAML, Markup, Markdown, code chunks) 12:29 Simple example: multi-format (output formats: html, pdf, docx, epub, pptx, revealjs) 13:16 List of what is possible with Quarto 14:02 So, what is Quarto: quarto is a language-agnostic command line interface (CLI) 15:27 Basic Quarto workflow 16:43 Difference between “render” and “preview” 17:16 IPython 18:43 Stored/frozen computation and reproducibility 20:36 A *.qmd is a plain text file 21:28 Quarto doesn’t have to be plain text 22:12 Rendering pipeline 22:57 What to do with my existing .ipynb? 24:23 Comfort of your own workspace: JupyterLab, Visual Studio Code, 25:00 Auto-completion in RStudio + VSCode 26:01 Quarto Extensions and Visual / Live Editor 27:19 Quarto, unified document layout 29:54 Quarto, unified syntax across Markdown and code 31:11 Built-in vs Custom 33:01 Extending Quarto with Extensions 33:51 Interactivity, Jupyter Widgets (with plots, matplotlib, etc) 34:15 Interactivity, Observable 35:01 Interactivity, on the fly Observable “widgets” 36:24 Parameters - one source, many outputs 37:36 Rendering with parameters 38:27 Quarto Publish 38:57 Quarto, crafted with love and care (the team) 39:30 Quarto Resources (installation) 39:44 Quarto resources: video tutorials 40:13 Q: Can Quarto documents be shared like Overleaf docs and can users import article templates for specific journals into Quarto? 41:39 new! Manuscript option to bundle an entire project together (bundle can be shipped to a journal) 42:48 Q: Is Quarto git friendly? 43:28 Q: Has Quarto already been used in published scientific work? 44:14 publishing books with Quarto 44:22 Q: Any general suggestions for outputting to docx (Word)? 
45:20 Q: Any tips on how Quarto can help conda users? 46:14 Q: Can you use GitHub Actions with Quarto? 47:18 Q: Can you have individual environments for each blog post? 49:50 Download CLI (command line interface) for Quarto 51:10 Example Gallery 51:44 nbdev project 53:14 Quarto blog, Shinylive extension 55:12 Q: How can I use Quarto to write scientific papers?

About the Speaker: Tom Mock

Twitter: https://twitter.com/thomas_mock
GitHub: https://github.com/jthomasmock

#python #quarto #rstats

How to schedule a Quarto document on Posit Connect

Episode 3: Scheduling a Quarto Doc (with custom branding) on Posit Connect
Led by: Ryan Johnson, Data Science Advisor

Live Q&A recording: https://youtu.be/JUgChPCa3vs

Follow-up links:

Posit Team: https://posit.co/products/enterprise/team/
Talk to us directly: https://posit.co/schedule-a-call/?booking_calendar__c=RST_YT_Demo
Follow-along blog post: https://posit.co/blog/scheduling-a-quarto-doc-on-posit-connect/
Source code for example: https://github.com/ryjohnson09/quarto-job-scheduling
Posit Team demo resources: pos.it/demo-resources

Timestamps: 1:45 - What is Posit Team? 3:31 - The data we are analyzing: R package download data from within Posit Package Manager (via experimental API) 4:44 - What is Quarto? (we are creating two Quarto docs today) 8:06 - Create a new session in Posit Workbench 8:56 - Create a new project within the RStudio IDE 10:13 - Create the first Quarto document (ETL: extract, transform, load workflow) 13:55 - Publish the first Quarto doc to Posit Connect 16:50 - Take the package download results and pin it to Posit Connect (overview of pins) 18:30 - Schedule the first Quarto doc to run every day at 7am on Posit Connect 20:49 - Create the second Quarto document (report for stakeholders) 23:41 - View the first “boring report” 24:54 - Using a custom Posit format for the report 27:39 - First look of the themed report without modifications 29:00 - Adding Posit themed colors to the gt table 31:35 - Apply code chunk options to hide code from output 32:44 - Publish the second Quarto document to Posit Connect 34:28 - View finished custom branded Quarto document 34:44 - Define specific users who have access to the Quarto doc on Posit Connect 35:10 - Schedule the second Quarto doc to read in the pinned data from the first Quarto doc 36:30 - Example emailed report from the scheduled Quarto report

On the last Wednesday of every month, we host a Posit Team demo and Q&A session that is open to all. You can add the event to your own calendar using this link: pos.it/team-demo

Who are these monthly demos for? Everyone is welcome to join us - regardless of industry, background, or experience!

We will discuss topics that will speak to:

Data scientists and administrators who are new to Posit Team or are looking to grow their understanding of our toolchain,
Teams searching for a new analytic platform built to support open-source data science,
And, those that are just curious about Posit Team!

What you can expect from the monthly Posit Team demo:

During the session, we will walk through an end-to-end data science workflow and demo the core functionality of Posit Team while highlighting some of our latest features!

While each session’s content will vary slightly, here are a few core topics we will address each month:

Open Source Analytics: The future of data science is open source. We’ll discuss methods for leveraging open-source tools and packages in a secure and scalable way!
Deployment: How to share the amazing data science assets your Team has built, including web applications, machine learning models, APIs, and more!
Data Access: Data comes in various forms and is stored in various ways. We’ll discuss best practices for accessing, reading, and writing data!
Job Scheduling: Do you have recurring data science jobs? We’ll show you how to automate these processes using Posit Connect.

What is Posit Team?

Posit Team is a bundle of our popular professional software (Posit Workbench, Posit Connect, and Posit Package Manager) for developing data science projects, publishing data products, and managing packages.

Registration is not required. The event will be streamed through YouTube Premiere.

How to deploy a Shiny application using clinical trial data to Posit Connect

Episode 2: Publishing a Shiny application in R to Posit Connect - Using Clinical Trial Data
Led by: Ryan Johnson, Data Science Advisor

Follow-up links:

Posit Team: https://posit.co/products/enterprise/team/
Talk to us directly: https://posit.co/schedule-a-call/?booking_calendar__c=RST_YT_Demo
Follow-along blog post: https://posit.co/blog/publishing-a-shiny-app-in-r-with-clinical-trial-data-to-posit-connect/
Source code for example: https://github.com/ryjohnson09/adam_analysis
Posit Team demo resources: pos.it/demo-resources

Timestamps: 1:35 - High-level overview of Posit Team 3:30 - Overview of clinical trial data used 5:31 - Opening up RStudio session on Posit Workbench 7:51 - Creating a new directory in RStudio 9:16 - Upload the ADaM dataset to Posit Workbench 10:17 - Using packages from a validated repository on Posit Package Manager 12:37 - Install packages for your Shiny application 13:49 - Pasting the code for the Shiny application (https://github.com/ryjohnson09/adam_analysis ) 16:16 - Publishing your Shiny application to Posit Connect 18:36 - Changing access controls to published Shiny application 20:25 - Using renv to record your R environment

Quarto for Academics | Mine Çetinkaya-Rundel

This video highlights some of Quarto’s features that are especially useful for academics, as educators and as researchers.

00:00 Introduction 01:06 Linking to documentation from code with code-link 02:27 Informative YAML errors and YAML completion 03:41 Creating Quarto slides with revealjs 04:20 PDF export of HTML slides 05:03 Annotating slides with chalkboard 06:40 Advancing slides for your audience with multiplex 07:38 Revealing code in slides with echo 08:16 Highlighting code with code-line-numbers 09:37 Customizing output location with output-location 10:42 Showing code chunk fences with echo: fenced 12:03 Code annotation 14:11 Authoring manuscripts with Quarto templates 16:49 Inserting citations from Zotero or from a DOI with the RStudio Visual Editor 19:59 Wrap up and learning resources

Mine Çetinkaya-Rundel

Get started with Quarto | Mine Çetinkaya-Rundel

This video walks you through creating documents, presentations, and websites and publishing with Quarto. The video features authoring Quarto documents with executable R code chunks using the RStudio Visual Editor (https://quarto.org/docs/visual-editor/).

00:00 Introduction 00:34 Authoring a document with Quarto 01:13 Using the RStudio visual editor 04:13 Code chunks and chunk options 06:31 Inserting cross references to figures and tables (https://quarto.org/docs/authoring/cross-references.html) 08:56 Adding a citation from a DOI (https://quarto.org/docs/visual-editor/technical.html#citations) 10:10 Seamlessly switching between output formats 10:58 Creating Quarto presentations (https://quarto.org/docs/presentations/) 14:36 Customizing the output location of code in presentations (https://quarto.org/docs/presentations/revealjs/#output-location) 16:09 Creating a website from scratch (https://quarto.org/docs/websites/) 19:19 Creating multi-format documents (https://quarto.org/docs/output-formats/html-multi-format.html) 20:22 Publishing the website to QuartoPub (https://quarto.org/docs/publishing/quarto-pub.html)

Mine Çetinkaya-Rundel

[79] Create a Python Web App Using Shiny (Gordon Shotwell)

Join our Meetup group for more events! https://www.meetup.com/data-umbrella

About the Event

Shiny makes it easy to build interactive web applications with the power of Python’s data and scientific stack. If you want to develop a python web application you usually need to choose between simple, limited frameworks like Streamlit and more extensible frameworks like Dash. This can cause a lot of problems if you get started with a simple framework but then discover that you need to refactor your application to accommodate the next user request. Shiny for Python differs from other frameworks because it has tremendous range. You can build a small application in a few minutes with the confidence that the framework can handle much more complex problems. In this workshop we will go through the core limitations of Streamlit, and build a Shiny app which avoids those limitations.

Timestamps

00:00 Welcome 00:23 Reshama introduces Data Umbrella 03:45 Reshama introduces Gordon Shotwell 04:21 Gordon Shotwell begins 04:29 The motivation to develop Shiny for Python 06:05 The main strength of both the R and Python library 06:56 What Gordon Shotwell will build during his presentation 07:25 Shiny documentation website 08:01 QuickStart for R users showing differences between the R and Python libraries
08:44 All the function reference in Shiny 09:08 Demo starts 09:50 Virtual environment
10:36 How to start shiny app in the terminal 11:15 Install shiny extension in VS Code which makes it easier to preview the web app 11:36 How the output function works on the preview app to execute 12:22 Penguin dataset description for the demo 12:45 Modules/submodules shiny app is built on 13:04 How to add a sidebar layout (sidebar, panel sidebar and panel main) 13:43 How to read in the data and the output functions 14:31 How to define some server logic 14:59 The conventional shiny rule 16:30 Use of slider input 17:50 Where the reactive magic comes in 19:30 Important note on what can really slow down your shiny app 20:14 Importance of Python data copy method when using external dataset 21:01 Important note to avoid dependency inside the render function 21:30 Q&A 29:35 Adding a plot to the output: The UI side 30:12 Adding a plot to the output: The render side 32:16 The core principle of reactivity in which you do not want to repeat yourself 33:26 Reactive calculation concept which allows you to store intermediate values in one place 37:24 Q&A 38:53 Reactive calculations and rendering functions 39:30 Side-effects or user effect. Another class of interactions 41:18 How to tell reactive effect what it should respond to or what events to watch before executing
41:53 How to update the data filter in the side-effect function 42:22 The second important pattern for shiny 43:00 One of the important things to pay attention to once you start learning/using shiny 44:45 Series of Q&A until the end of the video. Some response includes live demo 01:01:03 Gordon Shotwell ends his presentation 01:01:17 Reshama closes the session

About the Speaker

Gordon Shotwell is a Software Engineer at Posit. He’s been using Shiny to solve business problems for the past ten years.

LinkedIn: https://www.linkedin.com/in/gshotwell/

Key Links

Transcript: https://github.com/data-umbrella/event-transcripts/blob/main/2023/78-gordon-shiny.md
Meetup Event: https://www.meetup.com/data-umbrella/events/292848290/
Video: https://youtu.be/pXidQWYY14w

#python #deployment

posit::conf(2023) Workshop: Advanced Quarto with R + RStudio

Register now: http://pos.it/conf
Instructor: Andrew Bray
Workshop Duration: 1-Day Workshop

This course is for you if you:
• have a basic knowledge of how to use the RStudio IDE
• have experience working with single R Markdown and/or Quarto files
• are excited to author multi-document projects like books, websites, and blogs

Participants who are new to computational documents will benefit from taking Intro to Quarto with R and RStudio: Documents and Presentations before joining this workshop.

This workshop will prepare you to author a rich array of documents in Quarto, the next generation of R Markdown. Quarto is an open-source scientific and technical publishing system that offers multilingual programming language support to create dynamic and static documents, books, presentations, blogs, and other online resources.

The focus for this workshop will be on projects that weave together multiple documents and allow you to write books and build websites. You will also learn various ways to deploy and publish your Quarto projects on the web.

posit::conf(2023) Workshop: Fundamentals of Package Development

Register now: http://pos.it/conf
Instructor: Andy Teucher
Workshop Duration: 1-Day Workshop

This workshop is for you if:
• You have written several R scripts and find yourself wondering how to reuse or share the code you’ve written
• You know how to write functions in R
• You are looking for a way to take the next step in your R programming journey

We will be demonstrating some workflows using Git and GitHub. Knowledge of these tools is not required, and you will absolutely be able to complete the workshop without them, but some of the lessons will be more rewarding to you if you are prepared to try them out. If you are looking to get started with Git and GitHub, we recommend you register for the “What they forgot to teach you about R” workshop on Day 1, and join us for this workshop on Day 2.

We are often faced with the need to share our code with others, or find ourselves writing similar code over and over again across different projects. In R, the fundamental unit of reusable code is a package, containing helpful functions, documentation, and sometimes sample data. This workshop will teach you the fundamentals of package development in R, using tools and principles developed and used extensively by the tidyverse team - specifically the ‘devtools’ family of packages including usethis, testthat, and roxygen2. These packages and workflows help you focus on the contents of your package rather than the minutiae of package structure.

You will learn the structure of a package, how to organize your code, and workflows to help you develop your package iteratively. You will learn how to write good documentation so that users can learn how to use your package, and how to use automated testing to ensure it is functioning the way you expect it to, now and into the future. You will also learn how to check your package for common problems, and how to distribute your package for others to use.

This will be an interactive 1-day workshop, and we will be using the RStudio IDE to work through the materials, as it has been designed to work well with the development practices we will be featuring.

posit::conf(2023) Workshop: Introduction to Quarto with R + RStudio

Register now: http://pos.it/conf
Instructor: Andrew Bray
Workshop Duration: 1-Day Workshop

This course is for you if you:
• have a basic knowledge of how to use the RStudio IDE
• have some familiarity with markdown, or
• are excited to author flexible single documents like technical reports and slide presentations

Seasoned users of R Markdown will get more out of the Advanced Quarto with R and RStudio: Projects, Websites, Books, and More workshop, which is focused on projects, a distinct strength of Quarto in authoring work that spans multiple documents.

This workshop will prepare you to author a rich array of documents in Quarto, the next generation of R Markdown. Quarto is an open-source scientific and technical publishing system that offers multilingual programming language support to create dynamic and static documents, books, presentations, blogs, and other online resources.

The focus for this workshop will be on single documents. You will learn to create static documents, to add interactivity to them with Shiny and htmlwidgets, or steer them in the direction of sophisticated scientific documents. In the afternoon you’ll take the same authoring approaches to create slide presentations in various formats such as reveal.js, beamer, and pptx.

posit::conf(2023) Workshop: Steal like an Rtist: Creative Coding in R

Register now: http://pos.it/conf
Instructors: Ijeamaka Anyene Fumagalli & Sharla Gelfand
Workshop Duration: 1-Day Workshop

This workshop is for you if you:
• are comfortable with R and RStudio and have experience with the tidyverse and ggplot2
• are interested in applying data visualization skills more creatively, but may not know where to start or how to develop style/inspiration
• are an artist interested in exploring code as another medium for creating your work

R is a tool for data analysis but also can be used for self-expression. This workshop will be an introduction to creative coding in R in order to make visual art. We will take an inspiration-first approach, using compelling pieces to discuss and learn the techniques that shape the work. This workshop takes guidance from its namesake, the book “Steal Like An Artist” by Austin Kleon - once we have identified and learned to recreate existing works, we will cover how to take this inspiration and transform, remix, or reinterpret it in the pursuit of developing our own work and artistic styles.

This workshop is hands-on and will cover color theory and manipulation, a reintroduction of the data frame as the foundation for creating art (instead of just for analyzing data!), using ggplot2 as an artistic canvas, creating basic and specialized shapes, tiling and pattern making, developing your own functions and using iteration. We will also discuss how to use controlled randomness to convert a standalone piece into a generative art system that can produce many distinct outputs. Creative coding may seem a world apart from data analysis, but we see a large overlap and intersection of the skills used in both, not to mention the creative muscles that are already used in data visualization.

posit::conf(2023) Workshop: Teaching Data Science Masterclass

Register now: http://pos.it/conf
Instructor: Dr. Mine Çetinkaya-Rundel
Workshop Duration: 1-Day Workshop

This course is for you if:
• you want to learn about and discuss curriculum, pedagogy, and computing infrastructure design for teaching data science with R and RStudio using the tidyverse and Quarto
• you are interested in setting up your class in Posit Cloud
• you want to integrate version control with git into your teaching and learn about tools and best practices for running your course on GitHub

This masterclass is aimed primarily at participants teaching data science in an academic setting in semester-long courses; however, much of the information and tooling we introduce is applicable to shorter teaching experiences like workshops and bootcamps as well. Basic knowledge of R is assumed, and familiarity with the tidyverse and Git is preferred.

There has been significant innovation in introductory statistics and data science courses to equip students with the statistical, computing, and communication skills needed for modern data analysis. Success in data science and statistics is dependent on the development of both analytical and computational skills, and the demand for educators who are proficient at teaching both these skills is growing. The goal of this masterclass is to equip educators with concrete information on content, workflows, and infrastructure for painlessly introducing modern computation with R and RStudio within a data science curriculum. In a nutshell, the day you’ll spend in this workshop will save you endless hours of solo work designing and setting up your course.

Topics will cover teaching the tidyverse in 2023, highlighting updates to R for Data Science (2nd ed) and Data Science in a Box, as well as presenting tooling options and workflows for reproducible authoring, computing infrastructure, version control, and collaboration.

The workshop will comprise four modules: • Teaching data science with the tidyverse and Quarto • Teaching data science with Git and GitHub • Organizing, publishing, and sharing of course materials • Computing infrastructure for teaching data science

Throughout each module we’ll shift between the student perspective and the instructor perspective. The activities and demos will be hands-on; attendees will also have the opportunity to exchange ideas and ask questions throughout the session.

In addition to gaining technical knowledge, participants will engage in discussion around the decisions that go into developing a data science curriculum and choosing workflows and infrastructure that best support the curriculum and allow for scalability. We will also discuss best practices for configuring and deploying classroom infrastructures to support these tools.

Mine Çetinkaya-Rundel

Quarto rstudio tidyverse

posit::conf(2023) Workshop: What They Forgot to Teach You About R

Register now: http://pos.it/conf Instructors: Shannon Pileggi and David Aja Workshop Duration: 1-Day Workshop

This course is for you if you answer yes to these questions: • Have you been using R for a while and feel there might be better ways to organize your R life, but don’t know what they are? • Do you want to put programming on pause and learn about actionable programming-adjacent workflows for streamlining analysis in R? • Are you willing to feel a bit of (git) pain to leverage the benefits of version control for collaboration and time travel?

This 1-day What They Forgot (WTF) To Teach You About R workshop is for experienced R and RStudio users who want to (re)design their R lifestyle via project-oriented workflows and version control for data science (Git/GitHub). At the conclusion of the workshop, you will have strategies for organizing data science projects and workflows, employing robust file paths, constructing human- and machine-readable file names, and facilitating collaboration with yourself or others via version control.
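As a rough illustration of what "robust file paths" and "human- and machine-readable file names" can mean in practice, here is a small sketch. It is in Python rather than R, and the naming scheme (ISO date, zero-padded step number, hyphenated slug) and the helper names `slugify` and `output_path` are assumptions made for this example, not the workshop's prescribed convention.

```python
import re
from datetime import date
from pathlib import Path

def slugify(text):
    """Lowercase the text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def output_path(project_root, created, step, description, ext="csv"):
    """Build a path relative to the project root instead of hard-coding an absolute one."""
    # The ISO 8601 date and zero-padded step make names sort correctly;
    # underscores separate fields, hyphens separate words within a field.
    name = f"{created.isoformat()}_{step:02d}_{slugify(description)}.{ext}"
    return Path(project_root) / "output" / name

path = output_path("my-project", date(2023, 9, 17), 2, "Cleaned survey data")
print(path.as_posix())  # my-project/output/2023-09-17_02_cleaned-survey-data.csv
```

Because the fields are delimited consistently, a script can recover the date, step, and description by splitting the file name on underscores, while a person can still read it at a glance.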


Launch different development environments and manage cluster options with Posit Workbench

Posit Workbench: https://posit.co/products/enterprise/workbench/

Data scientists should be able to use the language and development environment they prefer.

Jupyter Notebook, JupyterLab, VS Code, and RStudio are all available development environments within Posit Workbench.

Workbench is also exceptional for managing compute resources. Use Kubernetes and Slurm, and adjust the CPU and memory to match the job you’re trying to run.


Barret Schloerke: Lessons Learned Testing 2500+ Shiny Apps Every Day

About the Talk: The Shiny team tests 2500+ different combinations of Shiny Applications, R versions, and operating systems to verify no feature regressions occur within the bleeding edge of the Shiny-verse. Over the past few years, we have learned a few lessons to keep our tests robust and honest. There is a non-zero chance that a test failure is not your fault. When working with CI systems, external influences such as installation failures or slower computing power can prevent tests from executing properly. A picture is worth a thousand tests, but visual testing is known to have a non-zero false-positive rate. To address this, we have utilized threshold-based image comparisons for flaky tests. We have found that it is best to test the minimal amount of content where possible. Changes in app package dependencies can produce unintended changes in expected outputs. These hard-learned lessons have reduced unreliable test results, allowing the Shiny team to quickly respond to feature regressions.
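The idea of a threshold-based image comparison can be sketched as follows. This is not the Shiny team's implementation; it is a simplified Python illustration that treats an image as a grid of grayscale values, and the function names and default thresholds are hypothetical. Instead of failing on any pixel difference, the test tolerates small differences and fails only when too large a share of pixels has changed.

```python
def fraction_changed(expected, actual, pixel_tolerance=0):
    """Share of pixels whose grayscale values differ by more than pixel_tolerance."""
    if len(expected) != len(actual) or any(
        len(row_e) != len(row_a) for row_e, row_a in zip(expected, actual)
    ):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        raise ValueError("images must not be empty")
    changed = sum(
        1
        for row_e, row_a in zip(expected, actual)
        for e, a in zip(row_e, row_a)
        if abs(e - a) > pixel_tolerance
    )
    return changed / total

def images_match(expected, actual, threshold=0.1, pixel_tolerance=0):
    """Pass when at most `threshold` of the pixels changed."""
    return fraction_changed(expected, actual, pixel_tolerance) <= threshold

baseline = [[0] * 5, [255] * 5, [0] * 5, [255] * 5]  # 4 x 5 "screenshot"
rerun = [row[:] for row in baseline]
rerun[0][0] = 3                                      # one pixel of rendering noise
broken = [row[:] for row in baseline]
broken[1] = [0] * 5                                  # a whole row changed

noise_ok = images_match(baseline, rerun)             # 1 of 20 pixels changed
regression_caught = not images_match(baseline, broken)  # 5 of 20 pixels changed
```

The threshold trades sensitivity for stability: a looser threshold reduces false positives from rendering noise but can also hide small genuine regressions.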

Speaker’s bio: Dr. Barret Schloerke is a Software Engineer at Posit. He currently develops and maintains many R packages in the Shiny ecosystem at RStudio including shiny, reactlog, plumber, learnr, leaflet, and shinyloadtest. Dr. Schloerke received his PhD in Statistics from Purdue University under the direction of Dr. Ryan Hafen and Dr. William Cleveland, specializing in Large Data Visualization.

Barret Schloerke, Shiny Team

Python in Posit Workbench | Launch in Your Native Development Environment

Posit Workbench makes it easier than ever to code in Python. Launch the development environment you prefer, including VS Code, Jupyter, and the RStudio IDE.

Learn more at Posit.co/workbench

Professional data scientists using both R and Python love Posit Workbench because of its enterprise-friendly benefits, including collaboration, centralized management, security, and commercial support.


Keynote: Hadley Wickham - Embracing multi-lingual data science | PyData Global 2022

www.pydata.org

RStudio recently changed its name to Posit to reflect the fact that we’re already a company that does more than just R. Come along to this talk to hear a few of the reasons that we love R, and to learn about some of the open source tools we’re working on for Python.

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome! 00:10 Help us add time stamps or captions to this video! See the description for details.

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps

Hadley Wickham

Posit Workbench | Build Data Products in R & Python Using Jupyter, VSCode, and RStudio IDE

More computing power. More possibilities. Less hassle.

Posit Workbench is the preferred development environment for R and Python developers.

Code in R. Code in Python. Develop in Jupyter, VSCode, and of course, the RStudio IDE.

Build all of the data products you can imagine: Reports, Dashboards, Applications, and APIs, with more computing power and no additional strain on IT.

For more information: Posit.co/products/enterprise/workbench


Hadley Wickham: Thank you, from Open Source at Posit

The thing that’s made RStudio (now Posit) so amazing is the community. We love and value our open source community, and we’re so appreciative of everyone who has helped make Posit what it is today.

And our community, is… YOU.

You live and work on every continent.

You use R for school or work or fun; answering questions that are important to you.

You create and share thousands of resources to help others.

You contribute to code and documentation on hundreds of open source projects.

You’re involved in the community, sharing with each other in organizations and user groups around the world.

And, of course, you write millions of lines of code.

We thank you, and wish you all the best in the coming year!

Hadley Wickham


R at AstraZeneca: upskilling our workforce through education, experience, and exposure

We were joined on November 29th at 12PM EST by Gabriella Rustici & Guillaume Desachy, who shared their experience about the R journey AstraZeneca is currently on.

Resources: ⬢ R @ AZ: Building a Community in the Pharmaceutical Industry Blog Post: https://www.rstudio.com/blog/building-a-community-in-the-pharmaceutical-industry/ ⬢ R in Pharma YouTube videos: https://www.youtube.com/c/RinPharma ⬢ Posit Pharma Site: https://posit.co/solutions/pharma/

Timestamps: 4:53 - Start of session 5:41 - Paradigm shift in the pharmaceutical industry (many people are multilingual) 6:40 - Profile of R users at AstraZeneca (varied across data science, clinicians, medical director) 8:28 - Meet the R&D Learning & Development Team 10:03 - 3E Framework: Education, Exposure, Experience 12:11 - Bridging the science community and data science audience 13:11 - We all learn differently (solutions that suit different needs & styles) 17:31 - Index of learning (self-led index, synchronicity index) 12:20 - Experiential Learning 20:47 - The community of R users at AstraZeneca 21:05 - The early days (April 2021) 21:52 - azTidyTuesday: a playground to hone data viz skills 24:15 - internal R conference 25:38 - R function of the month 27:04 - Lunch & LeaRn 28:20 - R @ AZ 10:1 29:44 - Communication expanded from internal social media to R @ AZ Monthly Newsletter 31:57 - Workshops with Posit 32:11 - AZRHotdesk - come with your questions and someone will help you solve it 35:02 - Wish list for 2023 38:38 - Start of Q&A section

Abstract: The use of R continues to become more and more important at AstraZeneca. It is a true paradigm shift that we have embarked on! This shift has required upskilling our workforce to make them proficient R users.

To do so, we are leveraging the 3Es of learning: education, experience and exposure.

Learn more about their team’s Data Science Educational Program and how the team at AstraZeneca has built their own strong community of R users - where learning takes place through experience and exposure.

Speaker bios: Gabriella is Data Science Learning Senior Director in AstraZeneca’s R&D Data Science & AI, where she is responsible for developing a strategy for, and creating a centralised approach to, data science learning for R&D. Gabriella completed her PhD at the Wellcome Sanger Institute and previously ran bioinformatics training programs at the University of Cambridge and the European Bioinformatics Institute, in the UK. She is passionate about designing, implementing and evaluating effective and scalable solutions to educate scientists and data science practitioners at all career stages.

Guillaume is passionate about helping bring new medicines to patients by leveraging the power of statistics and precision medicine. Since October 2020, he has been doing so at AstraZeneca, where he works as a Statistical Science Director. In addition, since March 2022, he has been leading a team of 15 collaborators focusing on building the community of R users at AstraZeneca, called R @ AZ.

During the event you can ask questions anonymously through slido here as well: rstd.io/meetup-questions


Please note the recording of this session will be shared at the same YouTube Live link

Data Science in People Analytics | Led by Elizabeth Esarove, AT&T

People are the face, heart, and hands of a company. In people analytics, we analyze data to reveal actionable insights that provide evidence for decisions regarding employees, work, and business objectives. This talk will cover the use of data science for people analytics projects such as workforce planning, improving employee engagement, and retaining talent.

Speaker bio: Elizabeth Esarove is a data scientist in People Analytics at AT&T. In her role, Elizabeth is part of a larger team focused on embedding data and analytics into the root of decision-making and transforming insights into actionable solutions that improve employee outcomes and drive business value.

Timestamps: *Q&A timestamps listed further below 3:42 - Start of session 5:14 - What is People Analytics 6:26 - Opportunities for Data Science in People Analytics 7:10 - Using Predictive Models to Reduce Attrition 11:10 - Segmenting Your Population 18:55 - Communicating with Leaders 20:11 - Time Series Forecasting for Workforce Changes 24:41 - Analyzing Employee Survey Comments

Helpful Resources Below: *more follow-up to come with a Q&A blog post in the works

People Analytics Books Mentioned today: Handbook of Regression Modeling in People Analytics: with examples in R, Python and Julia by Keith McNulty https://lnkd.in/eBFgniFG Excellence in People Analytics: How to Use Workforce Data to Create Business Value by Jonathan Ferrar and David Green https://a.co/d/bJrMRuW

People analytics books shared in a previous data science hangout: Predictive HR Analytics: Mastering the HR Metric: https://a.co/d/5Hx05mw Inclusalytics - How Diversity, Equity and Inclusion Leaders Use Data to Drive Their Work: https://lnkd.in/g48tdrMu

Other links shared by Liz: Time series models: Forecasting: Principles and Practice by Rob Hyndman and George Athanasopoulos https://otexts.com/fpp3/ Text analytics: Text Mining with R by Julia Silge & David Robinson https://lnkd.in/emawveZd

Additional resources shared: R Gov Conference: https://lnkd.in/ePfN7jru (David Meza is presenting on the RStudio (Posit) Ecosystem as a Critical Part of NASA Analytics Capabilities) People analytics for getting to the moon | Data Science Hangout with David Meza, NASA: https://lnkd.in/eDirbgCF For LATAM and Spanish Speaking people, Sergio Garcia Mora shared the R4HR community which has developed lots of free access content: https://data-4hr.com/ John Kelly IV shared the Human Resources Science LinkedIn Group: https://lnkd.in/eEMpYAfk Adrian M. Pérez shared the People Analytics Handbook: https://lnkd.in/ecsWy-dA Data Science Hangout: pos.it/dsh All upcoming #Posit community events: pos.it/community-events

Q&A Timestamps: *the following timestamps are approximate. 16:00 - What are the most important people analytics KPIs @ AT&T? Can you share how your team/HR acts on these predictions (for optimal policy) both experimentally and ethically? Do you implement new policies in smaller groups? 23:00 - How have you validated the predictive models? Looking backwards, how precise were they? 25:00 - Do you work with your HRBPs to segment your population? 25:00 - What languages are you using to build your predictive models? 31:00 - Do you include demographic information (gender, race, age) in your models? 31:00 - Are your surveys anonymous? 32:00 - How would you get the ROI from HR attrition modeling? 34:00 - Are most data scientists from a Psychometrics background? 35:00 - Is there a kind of “critical mass” to apply People Analytics? (just for big companies?) 36:00 - Looking at positive / negative comments, do you quote verbatim comments in your reports? (e.g. “here is one of the very positive / very negative comments we received”) 37:00 - Do you use something like Snowflake to store and model your data? And do you deploy these models automatically or manually update them? 38:00 - R user here. How do you balance between people-ops focused analytics tools from outside vendors (often very expensive, but helpful) with custom in-house analytics (often time-consuming)? 41:00 - How much of your work is driven by HR leadership, by HR business leaders, or by the HR analytics team pushing modeling and insights to those groups? 42:00 - What was your journey into learning data science and getting into people analytics? 44:00 - Do you have a role in educating business units, to improve their questions, etc.? 45:00 - What is the HR tech stack at AT&T? Does your team have a data engineer solely for people data since they’re more sensitive? 47:00 - How do you present your results (an application, report, PowerPoint), and how important is it to learn other languages (JavaScript, CSS, SQL)?
If you were to start a people analytics team in a company (+1000), how do you start? 50:00 - Do you use an internal tool for surveys? Do you use thresholds to maintain anonymity? 53:00 - Does AT&T have remote workers? If so, does people analytics segment on remote vs hybrid vs on-site?

Julia Silge

RStudio is now Posit!

Our name changed. Our mission stayed the same.

Everyone in the R community has contributed to the success of open source and the journey was made richer because of supporters like you. You have challenged and inspired us to create products that enable data scientists to generate knowledge and use that knowledge to create a lasting impact in their communities and organizations.

Outside of the powerful tools we have created together, we are proud to be part of such a passionate, supportive, and diverse community of users from around the world.

Over the last few years, we ruminated on how we can better enable and support the changing needs of our customers and community. The result is our rebrand to Posit.

We are excited at the chance to bring what we all love so much about the R community to everyone.

Finally, our new website posit.co is now live. Please check out our new home and let us know what you think!


Open Source Chat - {gt} with Rich Iannone

Join Rich Iannone, maintainer of the {gt} package, as he takes questions from the community about the latest in {gt} v0.7.0, and building great looking data display tables with R.

Key Resources: ⬡ Get started with {gt} - https://gt.rstudio.com

Reach out: 38:48 - How do I ask Rich about {gt}, feature requests, bug reports, how to solve a problem via {gt}? Rich and the {gt} team would love to hear from you. ⬡ Feature requests & bug reports with GitHub Issues, https://github.com/rstudio/gt/issues ⬡ GitHub Discussions, https://github.com/rstudio/gt/discussions ⬡ Ask the community a question, https://community.rstudio.com/tag/gt ⬡ Follow {gt} on Twitter, feel free to reach out and ask questions, https://twitter.com/gt_package

Timestamps Rich Iannone Introduction. 03:52 - Why {gt}? - What does {gt} bring to the table? Why so much effort into static, data display tables? 05:50 - Why open source? Why is {gt} open source and why have you dedicated your career to develop open source software? 08:30 - {gt} v0.7.0, Tell us about those new vector formatting functions in {gt}. Why did you include them? Could you show us some examples? {gt}’s vector formatting functions help you customize the styling, look and feel of your values. Converting the output values R gives you, and making them look exactly the way you want them to can be tricky. A lot of work was put into {gt} to give nice value formatting options. You can now access all these outside of a gt table; e.g. in text, in a plot, etc. 22:35 - Could you provide an example or two with the new styling function called opt_stylize()? What kinds of tables can you make with that? Can you extend that with your own tweaks? 28:15 - Can you make your own themes and share them? “How do I create my own custom theme for my table? A theme I can share with the rest of my organization?” 31:58 - What is the distinction between tab_options and the opt_* functions? Why would a function be in opt_* and not tab_options? 34:00 - sub_values() function, to find and replace certain values in your table. 36:50 - What is the current support for latex in {gt} at the moment? “Personally, I much prefer HTML, but for scientific publications, we are asked to provide a LaTeX file.” 42:50 - “In my work, I often produce A4 output in PDF, mainly with ggplot2 content. It would be nice to be able to combine ggplot + gt tables in a similar way {patchwork} works. Having the plot and the table next to it is very useful sometimes.” 44:30 - Interactive Tables with {gt}? 47:45 - “Any plans to make applying of same style to several columns easier? Unless I’m mistaken, the locations argument of tab_style requires one to specify an individual column. 
See here: https://gt.rstudio.com/reference/tab_style.html#examples.” Yes, supply a vector of columns or use tidyselect functions. 49:15 - “Excel output with {gt}? Would be a huge improvement. I often have to produce tabular output that can be easily reused. Usually it means Excel tables. So far I have mainly done this with Python and openpyxl or PyWin32 (through COM). A simple solution in R would be great.” 50:20 - Support for additional output formats with {gt}? Excel, PowerPoint, etc.? 50:25 - {pointblank}, a package to methodically validate your data, whether in the form of data frames or as database tables: https://rich-iannone.github.io/pointblank/ . Check out the workshop materials at https://github.com/rich-iannone/pointblank-workshop 55:50 - “Are there ways to have grouped rows? I mean when repeated rows have same characters can we merge them to one?” 58:00 - “Is there an ability to add ‘battleship coordinates’ (e.g. column letters & row numbers) to a gt object? This is a standard for table across my org and I’ve been trying to figure out how to implement it.” 59:59 - “Do you have suggestions or examples of building out & applying corporate formatting to gt tables (e.g. adding a company logo, company colors, etc.)?” 01:04:30 - “With PDF/LaTeX output for wide tables, it does not shrink the table.”

Rich Iannone

ggplot2 gt pointblank rstudio tidyselect

rstudio::conf(2022) All things R & RStudio

July 2022 in Washington DC


What’s New in {gt} 0.7.0?

gt 0.7.0 was just released. Rich Iannone, maintainer of gt, dives into the 7 new features added.

For more details, ⬢ Read the blog post on gt 0.7 https://www.rstudio.com/blog/all-new-things-in-gt-0-7-0/ . ⬢ Learn more about gt at https://gt.rstudio.com/ . ⬢ Follow the gt twitter account, https://twitter.com/gt_package .

00:07 The new Word table output format, .docx output. 00:34 A whole new family of vector formatting functions (vec_fmt_*()) has been added. 01:03 Table presets/themes styling with the new opt_stylize() function. 01:50 The new tab_stub_indent() for superfine control over row label indentation (in the stub) 02:26 The new fmt_duration() function for formatting of time duration values. 03:32 An upgraded gtsave() that uses {webshot2}, .png output looks better. 04:14 Accessibility enhancements for HTML table outputs

Rich Iannone

gt rstudio webshot2

RStudio Pro Product Lightning Series Meetup ⚡️

Recording from our October 11th meetup: a lightning series with our RStudio Product Managers to hear what’s new, ask questions, and provide feedback.

Lightning talks: 3:16 - Sharing Internal Packages with RStudio Package Manager 18:07 - Running RStudio workloads in the Cloud with Amazon SageMaker 40:02 - Content execution in Kubernetes with RStudio Connect

Resources and links shared during the meetup: A Package Manager demo tutorial on GitHub: https://github.com/rstudio/package-manager-demo Remote API Quickstart: https://docs.rstudio.com/rspm/admin/getting-started/configuration/#quickstart-remote-cli Differences between RStudio Workbench and RStudio Workbench on SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/rstudio.html#rstudio-differences RStudio Workbench release notes: https://www.rstudio.com/products/rstudio/release-notes/ Remote Content Execution with RStudio Connect and Kubernetes Conference talk: https://www.rstudio.com/conference/2022/talks/remote-content-execution-rstudio-connect/

Product Links: Package Manager, control and distribute packages throughout your organization: https://www.rstudio.com/products/package-manager/ Workbench, premier development environment for data science professionals: https://www.rstudio.com/products/workbench/ Connect, easily share your insights: https://www.rstudio.com/products/connect/

Timestamps: 22:43 - Demo of RStudio Workbench on Amazon SageMaker

Sharing Internal Packages with RStudio Package Manager | Presented by Joe Roberts

Many know that RSPM can be used to mirror public packages from CRAN or PyPI, but it can also be used to share your private, internally developed packages. We’ll explore the latest features to make internal packages easier to deploy within your organization.

Running RStudio workloads in the Cloud with Amazon SageMaker | Presented by James Blair

RStudio Workbench on SageMaker enables users to “right-size” their environment for any given analysis. We’ll showcase how this flexibility enables users to effectively meet the workload demands of various analyses.

Content execution in Kubernetes with RStudio Connect | Presented by Kelly O’Briant

New and interesting ways to configure RStudio Connect: A quick introduction to off-host content execution in Kubernetes

We’re looking forward to learning from you and hearing your feedback as well!

Meetup recordings are always shared here: https://www.youtube.com/playlist?list=PL9HYL-VRX0oRKK9ByULWulAOO5jN70eXv

MLOps with vetiver in Python and R | Led by Julia Silge & Isabel Zimmerman

Many data scientists understand what goes into training a machine learning model, but creating a strategy to deploy and maintain that model can be daunting. In this meetup, learn what MLOps is, what principles can be used to create a practical MLOps strategy, and what kinds of tasks and components are involved. See how to get started with vetiver, a framework for MLOps tasks in R and Python that provides fluent tooling to version, deploy, and monitor your models.

Blog Post with Q&A: https://www.rstudio.com/blog/vetiver-answering-your-questions/

For folks interested in seeing what data artifacts look like on Connect, we have these for R: ⬢ Versioned model object: https://colorado.rstudio.com/rsc/seattle-housing-pin/ ⬢ Deployed API: https://colorado.rstudio.com/rsc/seattle-housing/ ⬢ Monitoring dashboard: https://colorado.rstudio.com/rsc/seattle-housing-dashboard/ ⬢ Create a custom yardstick metric: https://juliasilge.com/blog/nyc-airbnb/ ⬢ End point used in the demo: https://colorado.rstudio.com/rsc/scooby

Our team’s reading list (mentioned in the meetup)

Books: ⬢ Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/

Articles: ⬢ “Machine Learning Operations (MLOps): Overview, Definition, and Architecture” by Kreuzberger et al: https://arxiv.org/abs/2205.02302 ⬢ “From Concept Drift to Model Degradation: An Overview on Performance-Aware Drift Detectors” by Bayram et al: https://arxiv.org/abs/2203.11070 ⬢ “Towards Observability for Production Machine Learning Pipelines” by Shankar et al: https://arxiv.org/pdf/2108.13557.pdf ⬢ “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction” by Breck et al: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf

Web content: ⬢ How ML Breaks: A Decade of Outages for One Large ML Pipeline by Papasian and Underwood: https://www.youtube.com/watch?v=hBMHohkRgAA ⬢ MLOps Principles by INNOQ: https://ml-ops.org/content/mlops-principles ⬢ Google’s Practitioners Guide to MLOps by Salama et al: https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf ⬢ Gently Down the Stream by Mitch Seymour: https://www.gentlydownthe.stream/

Speaker bios: Julia Silge is a software engineer at RStudio focusing on open source MLOps tools, as well as an author and international keynote speaker. Julia loves making beautiful charts, Jane Austen, and her two cats.

Isabel Zimmerman is also a software engineer on the open source team at RStudio, where she works on building MLOps frameworks. When she’s not geeking out over new data science techniques, she can be found hanging out with her dog or watching Marvel movies.

Isabel Zimmerman, Julia Silge

Data Science Hangout | Tiger Tang, CARFAX | Quantifying the Hours Saved

We were joined by Tiger Tang, Manager, Data Science at CARFAX. Tiger (Chongtai) Tang is dedicated to building the Data Science team specializing in NLP and forecasting. He is a big fan of Shiny and he has a passion for the Data Science community.

We recommend checking out Tiger’s 2022 RStudio Conference Talk as well, “Saving 1000 hours with RStudio”: https://www.rstudio.com/conference/2022/talks/saving-1000-hours-rstudio/

How do you sell RStudio in your workplace? Build a work automation process

How do you build an automation process? 1️⃣ Look at a typical report & identify all the manual steps 2️⃣ These manual steps can usually be tied into 3 portions: getting the data, wrangling/analysis/visualization, and communication 3️⃣ You can replace these portions with R code, with the help of various R packages

What are the three types of automation? 1️⃣ Attended automation - reports that still need human involvement: use R code in RStudio 2️⃣ Unattended automation - don’t need human input, but need to happen at the same time: use RStudio + RStudio Connect 3️⃣ Hybrid - combination of the previous 2, human input will come from the stakeholder & they can kick off processes to get answers: use Shiny + RStudio Connect

Ok, time to sell it to decision makers

What are the benefits? Reproducibility, less human error, and cost benefit (hours saved)

Why weren’t they interested when you shared these?

The benefits listed above are great for selling to an R user who is concerned about the day to day workflow, but not decision makers who are more concerned about ROI.

Update your strategy for highlighting the benefits: 1️⃣ Cost benefit (hours saved): if we go through with this, we might be able to save 1,000 hours per year 2️⃣ Less human error 3️⃣ Reproducibility (as a free add-on, like travel insurance)

They’re still the same benefits, just in a slightly different order. Start with one that does not require too much context to understand.

What do you need to do the actual automation?

Understand the current process and document it. This includes understanding: ⬢ the business reasons ⬢ the occurrence (daily, weekly, ad-hoc) ⬢ how you will communicate ⬢ how often to update the process so it will not become obsolete ⬢ you are not always the original report owner, so you need to know when to stop and call for additional help.

From this you will understand the complexity, impact, and stability of each process, which can also help you decide which project to start with.

Top 3 recommendations for automation: 1️⃣ Always start with components: For example, if you have a process that involves SQL, Excel, and Outlook, code each one individually, because the same team will need them again and you can reuse the code.

2️⃣ Test, Test, Test: Capture all the scenarios possible.

3️⃣ Be practical and stay on target: Not everything needs to be fully automated. It’s not about building something cool with R but building something impactful with R.
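
The "start with components" and "test, test, test" recommendations can be sketched roughly as follows. The talk is about R; this sketch uses Python's standard library only so that it stays self-contained. The `sales` table, its columns, the function names, and the email address are all hypothetical stand-ins, not anything from the CARFAX process.

```python
import sqlite3
from email.message import EmailMessage


def get_data(conn):
    """Component 1: get the data (query a hypothetical 'sales' table)."""
    return conn.execute("SELECT region, amount FROM sales").fetchall()


def summarize(rows):
    """Component 2: wrangle/analyze (total amount per region)."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def build_report_email(totals, to_addr):
    """Component 3: communicate (compose the message; sending is left out)."""
    msg = EmailMessage()
    msg["Subject"] = "Weekly sales summary"
    msg["To"] = to_addr
    lines = [f"{region}: {total}" for region, total in sorted(totals.items())]
    msg.set_content("\n".join(lines))
    return msg


# Each component can be exercised on its own with small, known inputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 10), ("West", 5), ("East", 7)],
)
totals = summarize(get_data(conn))
email = build_report_email(totals, "stakeholder@example.com")
```

Because each step is its own function, the SQL piece or the email piece can be reused in another report without dragging the rest along.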

Structure for you:

1️⃣ Identify tasks in your workplace 2️⃣ Build a proposal with the benefits that matter to decision makers in your workplace 3️⃣ Build a requirement doc that identifies the right task to start with 4️⃣ Code by component and do plenty of tests, while staying on target 5️⃣ Share the progress from time to time

Where to find more?

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Data Science Hangout site: rstudio.com/data-science-hangout ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout

Follow Us Here: Website: https://www.rstudio.com LinkedIn: https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/rstudio

RStudio Sports Analytics Meetup: NFL Big Data Bowl 2022 Winners discuss the Math behind the Path

Led by Robyn Ritchie, Brendan Kumagai, Ryker Moreau, Elijah Cavan

Helpful Links: Kaggle: https://www.kaggle.com/code/robynritchie/punt-returns-using-the-math-to-find-the-path Github: https://github.com/ritchi12/punt_returns_using_the_math_to_find_the_path Optimal Path Generator Shiny app: https://rstd.io/meetup-shiny-app Related blog post, Tips for Getting Started With the NFL Big Data Bowl: https://www.rstudio.com/blog/tips-for-getting-started-with-the-nfl-big-data-bowl/

Timestamps: 3:02 - Start of presentation 4:07 - Meet the team 5:25 - Quick overview of 2022 NFL Big Data Bowl subject 7:00 - Finding the best path to the end zone 9:12 - Penalized Expected Arrival Time (PEAT) 13:10 - Take the PEAT to find the safest route (A-star algorithm) 15:11 - Comparing the optimal path to the observed path 18:30 - Key R functions for this project 23:37 - Optimal Path Generator Shiny app demo 33:38 - Comparing optimal paths (risk/reward) 35:44 - “Aiming for the side won’t provide” 37:54 - The precision of a good decision 38:38 - How do you have a good submission for the big data bowl 40:40 - Elements of a good big data bowl submission

Talk Abstract: Simon Fraser University has been a force in the NFL’s Big Data Bowl for years. Focusing on punt returns, this team used tracking data to develop an algorithm to find the optimal path for a successful return, quantify player formations on the field and predict yards remaining on any frame of the return. This year’s winning team will discuss their entry, the R code behind the optimal path and advice for this year’s entrants.
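
For readers unfamiliar with the A-star algorithm mentioned in the timestamps, here is a minimal sketch of A* search on a small grid. It is not the team's code (theirs is in R and linked above). The grid, the per-cell penalty values, and the function name `a_star` are hypothetical, with the penalty standing in only loosely for a risk measure such as PEAT.

```python
import heapq


def a_star(penalty, start, goal):
    """Find a lowest-cost 4-connected path on a grid with A* search.

    penalty[r][c] >= 0 is an extra cost for entering cell (r, c);
    each step also costs 1. Returns (total_cost, path) or None.
    """
    rows, cols = len(penalty), len(penalty[0])

    def heuristic(cell):
        # Manhattan distance never overestimates because every step costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_cost = {start: 0}
    came_from = {}
    frontier = [(heuristic(start), 0, start)]
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return cost, path[::-1]
        if cost > best_cost[cell]:
            continue  # stale queue entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + 1 + penalty[nr][nc]
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = cell
                    heapq.heappush(
                        frontier,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None


# Hypothetical 3x3 field: the middle column is "risky" except in the last row.
penalty = [
    [0, 5, 0],
    [0, 5, 0],
    [0, 0, 0],
]
cost, path = a_star(penalty, start=(0, 0), goal=(0, 2))
```

In this toy field the search prefers the longer route around the risky cells (cost 6) over the direct two-step route through a penalized cell (cost 7), which is the kind of risk/reward trade-off the talk discusses.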

Speaker Bio: Robyn, Brendan, Ryker and Elijah are all graduate students in statistics at Simon Fraser University in Vancouver, Canada. They each have a passion for sports analytics and enjoy applying their stats knowledge to the game.

Q&A Timestamps: *rough estimate :) 15:00 - Do you think being in a general Statistics graduate program was beneficial to your work? As opposed to more specific like “Data Science” or “Sports Analytics”? 19:00 - Did you calculate PEAT at just a single point in time (at catch), or at multiple points during the return? 30:00 - Is it possible to share the code for future reference? 43:00 - What were the biggest challenges when going through the data? 45:00 - In terms of all your data inputs, do you see this extending to performance data like expected speed, fatigue etc., perhaps from practice etc.? 46:00 - At any point during the project, did you feel like you had hit a cul-de-sac or impasse? 48:20 - How do you get started working with player tracking data? 56:00 - Sounds like you were running the code locally? Were there any specific roadblocks to offloading that to the cloud, or was that just not really necessary?

Upcoming community events: rstd.io/community-events Feedback & Suggestions: rstd.io/meetup-feedback

Posit Meetup | Afshin Mashadi-Hossein, Bristol Myers Squibb | Framework for Data Collaboration

Led by Afshin Mashadi-Hossein, Sr Principal Scientist at Bristol Myers Squibb

Github link: https://github.com/amashadihossein/daapr Other pharma use cases: rstudio.com/champion/life-science RStudio for Clinical Reporting: rstudio.com/solutions/pharma Chat with RStudio: rstd.io/chat-with-rstudio

Abstract: For data science teams, data preparation takes substantial investment of time, data science expertise and subject matter proficiency. However, as the name implies, data preparation is typically viewed merely as a means to an end, encouraging creation of expensive but often single-use and fragile elements in data analysis workflows.

Rather than seeing data preparation as an obstacle to be removed, we propose a framework that recognizes the time and expertise invested in data preparation and seeks to maximize the value that can be derived from it.

Viewing analysis-ready data as a multi-purpose, modularly built product that should lend itself to collaborative development and maintenance, the framework of Data-as-a-Product (DaaP) aims to remove barriers to version tracking and collaborative data development and maintenance. Specifically, the framework, which is entirely implemented in R, enables joint code and data versioning based on git, standardizes metadata capture, tracks R packages used, and encourages best practices such as adherence to functional programming and use of data testing.

Collectively, the patterns established by the DaaP framework can help data science teams transition from developing expensive, single-use “wrangled” datasets to building maintainable, version-controlled, and extendable data products that could serve as reliable components of their data analyses workflows.
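
To make the idea of versioning data together with its metadata a little more concrete, here is a minimal sketch. It does not use the daapr package or its API (the framework itself is implemented in R and builds on git); the function name `snapshot` and the metadata fields are assumptions chosen for illustration. The sketch derives a version identifier from a dataset's content, so identical data always maps to the same version and any change produces a new one.

```python
import hashlib
import json


def snapshot(records, description):
    """Return metadata for a dataset, keyed by a hash of its content."""
    # Serialize deterministically so the same data always hashes the same way.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "version": hashlib.sha256(payload).hexdigest()[:12],
        "n_records": len(records),
        "description": description,
    }


# Hypothetical analysis-ready records.
raw = [{"id": 1, "dose": 10}, {"id": 2, "dose": 20}]
v1 = snapshot(raw, "initial analysis-ready data")
v1_again = snapshot(list(raw), "initial analysis-ready data")
v2 = snapshot(raw + [{"id": 3, "dose": 15}], "added subject 3")
```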

Bio: Afshin is a data scientist who is passionate about putting engineering and computational tools to work to realize the potential of biomedical data in service to human health.

Intro to Functional Data Analysis - Part 2 | Matthew Malloure, Dow Chemical

RStudio Meetup: Functional Data Analysis (Part 2) Led by Matthew Malloure, Dow Chemical

Link to slides: https://github.com/MatthewMalloure/RStudioMeetup_FDA

Intro to Functional Data Analysis (Part 1) https://youtu.be/nA9fVOCD8yM

Timestamps: 3:00 - Start of presentation 7:50 - Recap, what is functional data analysis (FDA)? 12:00 - Why do we need FDA? 16:40 - Initial step in FDA applications - smoothing 22:21 - What made you first interested in FDA? 25:38 - What is your decision framework for which basis function to choose? 27:50 - Screening additives compared to control 41:27 - Specific FDA problems can pop up at work and I’m not sure which analysis is right. Are there recommended resources for selecting approaches? 44:40 - Can you define “functional” as it is used in the FDA context? 49:30 - Functional regression 58:50 - Alternative modeling approaches

Abstract: The primary purpose of this presentation is to continue the introduction of functional data analysis methods that was kicked off during the March RStudio Energy Meetup by Santiago Rodriguez. During that session’s Q&A, two specific methods were frequently mentioned: Functional Principal Components Analysis (FPCA) and Functional Regression. In this talk, both methods are introduced in more detail, applied to a simulated example from the chemical industry, and compared to their univariate/multivariate analogues. Though no code will be shown in the presentation, commented R code used to produce all data, analyses, and figures will be provided.

Speaker Bio: Matt is an associate research data scientist supporting new product development within the Packaging and Specialty Plastics business at The Dow Chemical Company. His specialty areas include functional data analysis, Bayesian hypothesis testing, computational statistics, and experimental design. Prior to joining Dow he earned a BS in Statistics and MS in Biostatistics at Grand Valley State University and a PhD in Statistics from Texas A&M University.

Welcome to Quarto Workshop! | Led by Tom Mock, RStudio

Welcome to Quarto 2-hour Workshop | Led by Tom Mock, RStudio

Content website: https://jthomasmock.github.io/quarto-2hr-webinar/ FULL Workshop Materials (this was from a 2-day workshop): rstd.io/get-started-quarto Other upcoming live events: rstd.io/community-events

Double-check: Are you on a recent version of RStudio, i.e., v2022.07.1 or later?

Packages used: tidyverse, gt, gtExtras, reactable, ggiraph, here, quarto, rmarkdown, gtsummary, palmerpenguins, fs, skimr

Pre-built RStudio Cloud with workshop materials already installed: https://rstudio.cloud/content/4332583

For follow-up questions, please use: community.rstudio.com/tag/quarto

Timestamps: 7:16 - What is Quarto? 8:28 - How does R Markdown work? 9:40 - Quarto, more than just knitr 13:56 - Quarto can support htmlwidgets in R and Jupyter widgets for Python/Julia 14:18 - Native support for Observable Javascript 19:28 - Quarto in your own workspace (Jupyter Lab, VSCode, RStudio) 20:26 - RStudio Visual Editor mode 23:30 - VS Code YAML 26:02 - Quarto for collaboration 26:55 - How do you publish Quarto? (Quarto Pub, GitHub Pages, RStudio Connect, Netlify) 28:44 - What about Data Science at Work? 29:59 - Formats baked into Quarto (basic formats, beamer, ppt, html slides, advanced layout, cross references, websites, blogs, books, interactivity) 32:13 - What to do with my existing .Rmd or .ipynb? 33:16 - Why Quarto, instead of R Markdown? 40:50 - Text Formatting 41:30 - Headings 41:51 - Code (also merging R and Python in one document) 43:29 - What about the CLI? 44:55 - Navigating in the terminal 57:56 - PART 2: Authoring Quarto 1:00:22 - Output options 1:04:46 - Quarto workflow 1:12:06 - Quarto YAML intelligence 1:13:20 - Divs and Spans 1:22:13 - Figure layout 1:34:40 - Code chunk options 1:41:00 - Quarto and R Markdown (converting R Markdown to Quarto)

This 2-hour virtual session is designed for those who have no or little prior experience with R Markdown and who want to learn Quarto.

Want to get started with Quarto?

Install RStudio v2022.07.1 from https://www.rstudio.com/products/rstudio/download/#download - this will come with a working version of Quarto!
Webinar materials/slides: https://jthomasmock.github.io/quarto-2hr-webinar/
Workshop materials on RStudio Cloud: https://rstudio.cloud/content/4332583

What is Quarto?

Quarto is the next generation of R Markdown for publishing, including dynamic and static documents and support for multiple programming languages. With Quarto you can create documents, books, presentations, blogs, or other online resources.

Should I take this?

As with all the community meetups, everyone is welcome. This will be especially interesting to you if you have experience programming in R and want to learn how to take advantage of Quarto for literate data science programming in academia, science, and industry.

This workshop will be appropriate for attendees who answer yes to these questions:

Have you programmed in R and want to better encapsulate your code, documentation, and outputs in a cohesive “data product”? Do you want to learn about the next generation of R Markdown for data science? Do you want to have a better interactive experience when writing technical or scientific documents with literate programming?

For more info on Quarto: quarto.org

A Beginner’s Guide to Shiny for Python || Winston Chang || Posit

Shiny makes it easy to build interactive web applications with the power of Python’s data and scientific stack.

Learn more about Shiny for Python: https://shiny.rstudio.com/py/ Check out our interactive Shiny for Python examples: https://shinylive.io/py/examples/

Content: Winston Chang (@winston_chang) Producer: Jesse Mostipak (@kierisi) Editing and Motion Design: Tony Pelleriti (@TonyPelleriti)

Winston Chang


An Interview with Winston Chang: Building a Wordle App with Shiny for Python || RStudio

Shiny makes it easy to build interactive web applications with the power of Python’s data and scientific stack.

Learn more about Shiny for Python: https://shiny.rstudio.com/py/ Check out our interactive Shiny for Python examples: https://shinylive.io/py/examples/

Content: Winston Chang (@winston_chang) + Jesse Mostipak (@kierisi) Producer: Jesse Mostipak (@kierisi) Editing and Motion Design: Tony Pelleriti (@TonyPelleriti)

Winston Chang


Data visualization and plotting with Shiny for Python || Carson Sievert || RStudio

Shiny makes it easy to build interactive web applications with the power of Python’s data and scientific stack.

Learn more about Shiny for Python: https://shiny.rstudio.com/py/ Check out our interactive Shiny for Python examples: https://shinylive.io/py/examples/

Content: Carson Sievert (@cpsievert) Producer: Jesse Mostipak (@kierisi) Editing and Motion Design: Tony Pelleriti (@TonyPelleriti)

Carson Sievert


Getting Started with {shinytest2} Part 2 || Exporting values || RStudio

00:00 Introduction 00:29 Exporting reactives 03:28 Using exportTestValues()

Part 1 - Getting started: https://youtu.be/SS1Na3c8lhk Part 3 - Using shiny.testmode: https://youtu.be/xDxa_mDwN04

Manually testing Shiny applications is often laborious, inconsistent, and doesn’t scale well. Whether you are developing new features, fixing bug(s), or simply upgrading dependencies on a serious app where mistakes have real consequences, it is critical to know when regressions are introduced. shinytest2 provides a streamlined toolkit for unit testing Shiny applications and seamlessly integrates with the popular testthat framework for unit testing R code.

shinytest2 uses chromote to render applications in a headless Chrome browser. chromote allows for a live preview, better debugging tools, and/or simply using modern JavaScript/CSS.

Simply recording your actions as code and extending them to test the more particular aspects of your application will result in fewer bugs and more confidence in future Shiny application development.

Read up on shinytest2 here: https://rstudio.github.io/shinytest2/

Learn more about Shiny here: https://shiny.rstudio.com/

Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/

Content: Barret Schloerke (@schloerke) Motion design and editing: Jesse Mostipak (@kierisi)

Theme song: Brad PKL by Blue Dot Sessions (https://app.sessions.blue/browse/track/113507)

Barret Schloerke


Getting Started with {shinytest2} Part 3 || Using shiny.testmode in {shinytest2} || RStudio

00:00 Introduction 00:15 Testing production apps

Part 1 - Getting started: https://youtu.be/SS1Na3c8lhk Part 2 - Exporting values: https://youtu.be/7KLv6HdIxvU

Manually testing Shiny applications is often laborious, inconsistent, and doesn’t scale well. Whether you are developing new features, fixing bug(s), or simply upgrading dependencies on a serious app where mistakes have real consequences, it is critical to know when regressions are introduced. shinytest2 provides a streamlined toolkit for unit testing Shiny applications and seamlessly integrates with the popular testthat framework for unit testing R code.

shinytest2 uses chromote to render applications in a headless Chrome browser. chromote allows for a live preview, better debugging tools, and/or simply using modern JavaScript/CSS.

Simply recording your actions as code and extending them to test the more particular aspects of your application will result in fewer bugs and more confidence in future Shiny application development.

Read up on shinytest2 here: https://rstudio.github.io/shinytest2/

Learn more about Shiny here: https://shiny.rstudio.com/

Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/

Content: Barret Schloerke (@schloerke) Motion design and editing: Jesse Mostipak (@kierisi)

Theme song: Brad PKL by Blue Dot Sessions (https://app.sessions.blue/browse/track/113507)

Barret Schloerke


Getting Started with {shinytest2} Part I || Example + basics || RStudio

00:00 Introduction 00:48 Overview of the demo Shiny app 03:00 Running record_test() 04:44 Results from record_test() 07:18 A note on .png files created during testing 08:52 Debugging with shinytest2 09:32 Using app$view() to open a visual representation of a headless browser

Part 2 - Exporting values: https://youtu.be/7KLv6HdIxvU Part 3 - Using shiny.testmode: https://youtu.be/xDxa_mDwN04

Manually testing Shiny applications is often laborious, inconsistent, and doesn’t scale well. Whether you are developing new features, fixing bug(s), or simply upgrading dependencies on a serious app where mistakes have real consequences, it is critical to know when regressions are introduced. shinytest2 provides a streamlined toolkit for unit testing Shiny applications and seamlessly integrates with the popular testthat framework for unit testing R code.

shinytest2 uses chromote to render applications in a headless Chrome browser. chromote allows for a live preview, better debugging tools, and/or simply using modern JavaScript/CSS.

Simply recording your actions as code and extending them to test the more particular aspects of your application will result in fewer bugs and more confidence in future Shiny application development.

Read up on shinytest2 here: https://rstudio.github.io/shinytest2/

Learn more about Shiny here: https://shiny.rstudio.com/

Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/

Content: Barret Schloerke (@schloerke) Motion design and editing: Jesse Mostipak (@kierisi)

Theme song: Brad PKL by Blue Dot Sessions (https://app.sessions.blue/browse/track/113507)

Barret Schloerke


Hello, World! A Quick Tour of Shiny for Python || Carson Sievert || Posit

Shiny makes it easy to build interactive web applications with the power of Python’s data and scientific stack.

Learn more about Shiny for Python: https://shiny.rstudio.com/py/ Check out our interactive Shiny for Python examples: https://shinylive.io/py/examples/

Content: Carson Sievert (@cpsievert) Producer: Jesse Mostipak (@kierisi) Editing and Motion Design: Tony Pelleriti (@TonyPelleriti)

Carson Sievert


Wrangling data for a Shiny app in Python || Michael Chow || Posit

Shiny makes it easy to build interactive web applications with the power of Python’s data and scientific stack.

Learn more about Shiny for Python: https://shiny.rstudio.com/py/ Check out our interactive Shiny for Python examples: https://shinylive.io/py/examples/

Content: Michael Chow (@chowthedog) Producer: Jesse Mostipak (@kierisi) Editing and Motion Design: Tony Pelleriti (@TonyPelleriti)

Michael Chow


{gt} Table Battles || Eurovision || RStudio

00:00 Introduction 00:07 Jesse’s gt table, with a focus on flag emoji and interactivity via a Shiny app 09:50 Rich’s gt table, with a focus on CSS and embedded animations

Code: https://github.com/kierisi/rstudio_videos/tree/main/gt/table-battles

Learn more about the gt package here: https://gt.rstudio.com/

Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/

Content: Rich Iannone (@riannone) & Jesse Mostipak (@kierisi) Motion Design & editing: Jesse Mostipak Music: Gemeni City by Blue Dot Sessions https://app.sessions.blue/browse/track/113567

Rich Iannone


Veerle van Leemput | Analytic Health | Optimizing Shiny for enterprise-grade apps

Can you use Shiny in production? A: Yes, you definitely can.

Link to slides: https://github.com/RStudioEnterpriseMeetup/Presentations/blob/main/VeerlevanLeemput-OptimizingShiny-20220525.pdf

Packages mentioned: ⬢ shiny: https://shiny.rstudio.com/ ⬢ pins: https://pins.rstudio.com/ ⬢ plumber: https://www.rplumber.io/ ⬢ blastula: https://github.com/rstudio/blastula ⬢ callr: https://github.com/r-lib/callr ⬢ shinyloadtest: https://rstudio.github.io/shinyloadtest/ ⬢ shinycannon: https://github.com/rstudio/shinycannon ⬢ shinytest2: https://rstudio.github.io/shinytest2/ ⬢ feather: https://github.com/wesm/feather ⬢ shinipsum: https://github.com/ThinkR-open/shinipsum ⬢ bs4Dash: https://rinterface.github.io/bs4Dash/index.html

Timestamps: 2:44 - Start of presentation 5:41 - What qualifies as an enterprise-grade app? 10:46 - UI first / user experience / prototyping 13:20 - Separating code into separate scripts and creating code that’s easy to test 17:15 - Golem 19:28 - Functionize your code 20:50 - Rhino package, framework for developing enterprise-grade apps at speed 22:33 - Infrastructure, how do you bring this to your users? (lots of ways to do this. They do this with R, pins, plumber, rmd, blastula, and Posit Connect on Azure) 31:17 - Optimizing Shiny (process configuration, cache, callr, API, feather) 47:35 - Testing your app (shinyloadtest and shinycannon) 50:23 - Testing for outcomes (shinytest2) 52:15 - Monitor app performance & usage (blastula, shinycannon, usage metrics with Shiny app)

Questions: 57:38 - What’s the benefit of using pins rather than pulling the data from your database? 59:30 - Are there package license considerations you had to think about when monetizing shiny applications? 1:00:45 - Do you use promises to scale the application? (they use callr) 1:01:49 - For beginners, golem or rhino? 1:02:50 - The myth is that only Python can be used for production apps, what made you choose to use R? 1:05:12 - Is feather strictly better than using JSON? 1:06:38 - Where do you see the line between BI (business intelligence) and Shiny for your applications? 1:08:36 - Any tips for enterprise-grade UI development? Making beautiful apps (bs4Dash app) 1:10:25 - Have you found an upper limit for users? 1:12:19 - Any tips for more dynamic data? (optimizing database helps here) 1:13:50 - Where do you install shinycannon? (on our development Linux server) 1:15:00 - Can you share other resources or examples of code? (Slides here with resources: https://github.com/RStudioEnterpriseMeetup/Presentations/blob/main/VeerlevanLeemput-OptimizingShiny-20220525.pdf)

For upcoming events: rstd.io/community-events-calendar Info on Posit Connect: https://www.rstudio.com/products/connect/ To chat with Posit: rstd.io/chat-with-rstudio

{gt} Table Battles || Crosswords || RStudio

00:00 Introduction 00:34 Rich’s gt table, with a focus on creating audio within a table 07:28 Jesse’s gt table, with a focus on sentiment analysis

Learn more about the gt package here: https://gt.rstudio.com/

Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/

Content: Rich Iannone (@riannone) & Jesse Mostipak (@kierisi) Motion Design & editing: Jesse Mostipak Music: Nu Fornacis by Blue Dot Sessions https://app.sessions.blue/browse/track/98983

Rich Iannone


Katie Masiello || Build a Codenames app using {pins} and Shiny! || RStudio

00:00 Introduction 00:05 Project outline 03:56 Create a codename generator (using RMarkdown) 09:35 Publish to RStudio Connect 10:38 Create a Shiny app 18:15 A little bit of troubleshooting 18:18 Ta-da!

Learn more about the pins package here: https://pins.rstudio.com/ Learn more about Shiny here: https://shiny.rstudio.com/ And learn more about RStudio Connect here: https://www.rstudio.com/products/connect/

Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/

Content: Katie Masiello (@katieontheridge) Animation, motion design, and editing: Jesse Mostipak (@kierisi)

Theme song: Contrarian by Blue Dot Sessions (https://app.sessions.blue/browse/track/64281 )

rmarkdown rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Pins Codenames Katie Masiello

Posit Pharma Meetup: R for Clinical Study Reports & Submission | Yilong Zhang

Presenter: Yilong Zhang, PhD

During the meetup you can ask questions here: rstd.io/pharma-meetup-questions (anonymously or with your name)

Abstract: The use of open-source R is evolving in drug discovery, research, and development for study design, data analysis, visualization, and report generation in the pharmaceutical industry. It is critical to be able to produce tables, listings, and figures (TLFs) and submit results to regulatory agencies using R. We developed R packages (r2rtf, pkglite) and a reference book (https://r4csr.org/ ) to simplify the workflow for an organization to complete those tasks. Based on the proposed workflow, the first pilot project has been successfully submitted to the FDA by the R Consortium R Submissions Working Group (https://rconsortium.github.io/submissions-wg/).

Bio: Yilong Zhang is a biostatistician at Meta (and was previously at Merck). He has worked with a group of statisticians and programmers to demonstrate the capability of using R for regulatory work. His other research interests include statistical methods in study design, missing data, and survival analysis, and he has published more than 25 papers in peer-reviewed journals. Before joining Merck, he earned a Ph.D. in Biostatistics at New York University.

Helpful links:
Use cases & insights from pharmaceutical industry leaders: https://www.rstudio.com/champion/life-science
R/Pharma Conference: https://rinpharma.com/
R Validation Hub: https://www.pharmar.org/
Link to speak with RStudio: rstd.io/chat-with-rstudio
Link to slides: rstd.io/pharma-meetup-slides

Data Science Hangout | Michael Chow, Posit | Exploring Team Structure w/ Data Scientists & Engineers

We were joined by Michael Chow, Data Scientist and Software Engineer at RStudio. Michael also previously led a team at the California Integrated Travel Project.

On this week’s hangout there were a lot of thoughts shared on structuring a data science team from both Michael and the broader group:

⬢ Jacqueline Nolis also shared her thoughts on this in an earlier Data Science Hangout: each structure has its virtues, but she ended up sold on the decentralized model, where data scientists are embedded in teams: https://youtu.be/CcPE29bYGVo?t=325

⬢ Michael agreed that data scientists and analysts should be sitting with the teams that they’re pushing out reports for. Otherwise, I would be trying to send people into those teams to figure out their priorities.

⬢ A data scientist should work with a Project Manager or whoever’s leading the team to push up metrics but also help change the roadmap.

⬢ It leaves a tricky question of where data engineers should be and how they should interact with the team. Today data engineers are often doing more tooling empowerment, so it can be okay to have them a bit more centralized and connect to the data scientists to enforce best practices or enable new pieces for them.

⬢ I think a nice model is for data scientists/analysts to live in the teams and data engineers to be like spokes of a wheel, where the data scientists connect with them and work closely to enforce best practices and enable new important things.

⬢ Tatsu shared that in thinking of the structure, it’s also important to find your translators and to use the power of feedback. Reach out to those people to start to put that feedback into action.

⬢ George shared that insurance companies have come from a really traditional landscape where they have lots of actuaries working on lots of Excel spreadsheets, and there can be a lack of knowledge sharing and tool sharing. This is where the data science element comes in. To me, within the organization, you need to have this team, which is a mini-spoke if you will, because they are central to the actuarial team. If they are too far removed and they’re back with the IT team, you end up with the old problems because they may not get the business concept communicated back. It’s all about getting enough skills so they can get stuff done, especially proofs of concept. Maybe after that you can take a step back and then start to look at the centralized model again.

⬢ A central team can help converge to what they see as best practice, but if you’re pushing out something new or exploring a new line of work or area, it can be important to set the data engineer there to actually do whatever they need to. Make sure that the converging doesn’t stifle creativity or prevent a team from doing the right thing.

⬢ Manny jumped in to share the perspective of data science sitting with IT. Data science is a new field for their company (in real estate), and there’s an open question of where data science falls. The IT team is fantastic and very structured, while data science is fluid, creative, and unstructured at the moment, so you have to look at where it actually should fall.

Please note that some of the points above are summarized and are not 100% actual quotes.

Resources shared:

⬢ Tatsu shared in the chat a few projects that Michael is working on: vetiver: https://vetiver.tidymodels.org/articles/vetiver.html, siuba: https://github.com/machow/siuba

⬢ Libby shared a helpful tip on creating a 2-minute YouTube video to go with a cover letter, to get the attention of a hiring manager

⬢ Javier shared an example Shiny app used in an interview: https://javierorraca.shinyapps.io/Bloomreach_Shiny_App/

⬢ Michael mentioned David Robinson’s screencasts: https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ

⬢ Michael mentioned an article on “What data scientists really do according to 35 data scientists”: https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists

⬢ Rachael shared a blog post link where Jacqueline Nolis talked about team structure as well: https://www.rstudio.com/blog/building-effective-data-science-team-answering-your-questions/#Structure

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout ► View the Data Science Hangout site here: rstudio.com/data-science-hangout

Follow Us Here: Website: https://www.rstudio.com LinkedIn: https://www.linkedin.com/company/rstudio-pbc Twitter: https://twitter.com/rstudio

Michael Chow

{gt} Table Battles || Digital Publications || RStudio

00:00 Introduction 00:32 Jesse’s gt table, with a focus on changing background cell color 07:11 Rich’s gt table, which uses three different tables to create a fixed-size scrollable gt table

You can find the code for each table here: https://github.com/kierisi/rstudio_videos/tree/main/gt/table-battles/01_round-01_digital-publications

Learn more about the gt package here: https://gt.rstudio.com/

Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/

Content: Rich Iannone (@riannone) & Jesse Mostipak (@kierisi) Motion Design & editing: Jesse Mostipak Music: Nu Fornacis by Blue Dot Sessions https://app.sessions.blue/browse/track/98983

Rich Iannone

gt rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Rich Iannone Jesse Mostipak TidyTuesday Table Battle

Brad Lindblad | Professional Financial Reports with {rmarkdown} | Posit

GitHub: https://github.com/bradlindblad/pro_reports_talk

Abstract: With finance there will always be a need for reports, and as long as there’s a need for reports, there will be R users who want to create them as lazily as possible.

R Markdown lets us create incredibly customized and branded reports that can run automatically each month or day or whatever, and it all starts with the wonderful parameterizing features of R Markdown.

In this lightning talk, we will work through a practical example of creating an income statement for a group of theoretical office branches. You will learn how to make a parameterized R Markdown report, organize your R Markdown files and even create a custom cover letter, all in R.
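The parameterization idea described above can be sketched in a language-neutral way: one template, one set of parameters per branch, one rendered report per parameter set. The talk itself does this with R Markdown's parameter support; the sketch below is only an illustration in Python using the standard library's `string.Template`, and the template text, branch names, and figures are all hypothetical rather than taken from the talk.

```python
from string import Template

# Hypothetical report template; each $placeholder plays the role of a report parameter.
REPORT_TEMPLATE = Template(
    "Income statement: $branch\n"
    "Revenue: $revenue\n"
    "Expenses: $expenses\n"
    "Net income: $net\n"
)

# Made-up figures for illustration only.
BRANCHES = {
    "Fargo": {"revenue": 1200, "expenses": 900},
    "Bismarck": {"revenue": 800, "expenses": 950},
}


def render_report(branch, figures):
    """Fill the template for one branch, deriving net income from the inputs."""
    net = figures["revenue"] - figures["expenses"]
    return REPORT_TEMPLATE.substitute(
        branch=branch,
        revenue=figures["revenue"],
        expenses=figures["expenses"],
        net=net,
    )


def render_all(branches):
    """Render one report per branch, keyed by branch name."""
    return {name: render_report(name, figs) for name, figs in branches.items()}


if __name__ == "__main__":
    for text in render_all(BRANCHES).values():
        print(text)
```

Keeping the template separate from the per-branch parameters means a new branch only needs a new entry in the parameter table, not a new report.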

Bio: Brad Lindblad is a data scientist located in Fargo, North Dakota. He is the author of the tidyUSDA and schrute R packages, and specializes in geospatial data science and risk modeling. Brad is a frequent contributor to data science publications and loves creating new R users.

This is a meetup recording from December 2020. For more information on how to join meetups live: rstd.io/community-events

Links shared in the chat:
Brad’s material/slides: https://github.com/bradlindblad/pro_reports_talk
For anyone who’s new to R Markdown, this is a great reference guide and overview: https://bookdown.org/yihui/rmarkdown/
Pagedown package: https://github.com/rstudio/pagedown
ETL example: https://solutions.rstudio.com/r/apps/twitter-etl/
More information on RStudio Connect: https://www.rstudio.com/products/connect/
To chat with RStudio about Connect: rstd.io/chat-with-rstudio

Daniel Petzold || RStudio Team: Building and Sharing Jupyter Notebooks || RStudio

Learn more about RStudio Team here. https://www.rstudio.com/products/team/

Find the code for this example here. https://github.com/danielpetzold/space-tracker Read our blog post here. https://www.rstudio.com/blog/build-and-share-jupyter-notebooks-on-rstudio-team/

Timecodes 0:00 - Intro 0:07 - Build Jupyter Notebooks to analyze and visualize data 2:47 - Publish directly from RStudio Workbench to your content hub 5:13 - Share With Your Stakeholders on RStudio Connect

Jupyter Notebooks are interactive documents for code, outputs, and text. However, they’re often stuck in data scientists’ local computing environments. Collaborating can be difficult and sharing can be tedious. To live up to their fullest potential, data science teams need a way to scale their development securely and efficiently — while providing stakeholders easy access to their output and visualizations.

RStudio Team, made up of RStudio Workbench, RStudio Connect, and RStudio Package Manager, brings everything together to help data scientists create, reproduce, and share insights from their Jupyter Notebooks.

Let’s dive into a real-life example by exploring data from NASA’s Center for Near-Earth Objects (NEOs). Daniel Petzold walks us through his data analysis and reporting. Want to explore the report yourself? Check out the published report on RStudio Connect here. https://colorado.rstudio.com/rsc/space-tracker/space_tracker.html

On RStudio Workbench, you have a choice of editors: the RStudio IDE, JupyterLab, Jupyter Notebook, or VS Code. Choose your preference. From here, you can explore your dataset, embed HTML directly in your document, create visualizations, and more.

Once you’ve run your analyses and created insightful visualizations, you want to be able to share them with your team. RStudio Workbench allows you to publish to RStudio Connect, the content platform from RStudio.

You have multiple options: push-button deployment from Jupyter Notebook or using terminal commands from JupyterLab.

It’s not enough to publish your work. Once on RStudio Connect, you can share with end-users. Make your analysis accessible to specific users or more generally with different authentication measures. In addition, you can schedule the document to run at a certain time and send out an email with refreshed data.

Click the links below to learn more about these offerings.

RStudio Workbench: https://www.rstudio.com/products/workbench/

RStudio Connect: https://www.rstudio.com/products/connect/

RStudio Team

rstudio Rstudio Data Science Machine Learning Python Stats Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tidyr Github Data Wrangling Tidy Data Odbc Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Jupyter JupyterLab Jupyter Notebooks Rstudio Workbench RStudio Connect

Capacity Planning for Microsoft Azure Data Centers | Using R & RStudio Connect

Capacity Planning for Microsoft Azure Data Centers | An Explainable Data Science Workflow using R & RStudio Connect | Presented by Paul Chang

2:12 - Start of presentation 47:43 - Start of Q&A session

Thank you for watching! Here are a few helpful links:

Link to Paul’s slides: https://lnkd.in/gh-hGScE
More information on RStudio Connect: https://www.rstudio.com/products/connect/
How to open an Azure account: https://azure.microsoft.com/en-us/
Getting started with SAML authentication on RStudio Connect: https://support.rstudio.com/hc/en-us/articles/360022321494-Getting-Started-with-SAML-in-RStudio-Connect
pins package: https://pins.rstudio.com/
plumber package: https://www.rplumber.io/
Upcoming events: rstd.io/community-events
Chat with our team to start an RStudio Connect evaluation: rstd.io/chat-with-rstudio

Abstract: The Long Range Capacity Planning team at Microsoft is responsible for producing plans for expanding Microsoft Azure Data Centers around the world. These are multi-billion dollar plans that enable the full suite of IaaS and PaaS cloud offerings for our customers, over a 5+ year time horizon. In this talk, we will present the data science software stack that we have built using RStudio Connect and Azure, for producing these data center capacity plans. We will discuss how RStudio Connect has empowered our data scientists to connect more directly with internal stakeholders and decision makers, and how RStudio Connect has enabled us to streamline our data science and business processes.

Speaker Bio: Paul Chang, Senior Data & Applied Scientist, Microsoft

Paul Chang is the Systems Architect of the Long Range Capacity Planning team for Microsoft Azure Data Centers. He received his Applied Math PhD from Simon Fraser University and has worked in a variety of fields including Applied Functional Analysis, Hydrogen Fuel Cell modeling, and A.I. Applications in Vehicular Traffic Engineering. He was also a software engineer in SQL Azure for a couple of years.

Thank you for joining us!

If you ever have suggestions or general feedback, please let us know! Here’s an anonymous google form: rstd.io/meetup-feedback
We’d love to hear from you too! Here’s a talk submission form as well: rstd.io/meetup-speaker-form
If you’d like to learn more about RStudio Connect: https://www.rstudio.com/products/connect/
If you’re just starting to advocate for data science in general or RStudio tools: rstudio.com/champion

Isabella Velásquez | Building a Blog with R | RStudio

Building a Blog With R Presented by: Isabella Velásquez

Here are a bunch of resources Isabella shared ⤵️

Slides from the presentation: https://lnkd.in/gqGFmHMf Internal Blog Example: https://lnkd.in/gaFPxN5F Other resources from the talk: https://lnkd.in/gjXxeMaa

Distill Resources: 1️⃣ Distill for R Markdown: https://lnkd.in/gWsEBXfN 2️⃣ Building a blog with distill by Tom Mock: https://lnkd.in/gQiE8PC2 3️⃣ (Re-)introducing Distill for R Markdown: https://lnkd.in/gzidDpV2 4️⃣ The distillery: https://lnkd.in/gwDAg_7G 5️⃣ Postcards package: https://lnkd.in/geT6uB9t

Blogdown Resources:

1️⃣ blogdown: Creating Websites with R Markdown: https://lnkd.in/gGQ-fCWw 2️⃣ Hugo Themes: https://themes.gohugo.io/ 3️⃣ Hugo Apéro: https://lnkd.in/g8U9tfvq 4️⃣ A Blogdown New Post Workflow with Github and Netlify: https://lnkd.in/gYNwsKTm

The R programming language is known for its applications to data science, but one of its best assets is the inviting community. Folks from around the world share their lessons learned, best practices, and code to support and inspire others. One tool that helps contribute to the thriving community is the blog.

A blog is a wonderful opportunity to record your data stories, gain exposure for your expertise, and support others in their R journey. Thanks to the advancement of tools like R Markdown, you can quickly get up and running with a blog and focus on customization and style.

In this talk, we will discuss possible reasons for creating a blog, the pros and cons of a blog, and how to decide on topics. We will then explore tools for creating your blog that make it easy to showcase your R skills, such as blogdown and distill.

At RStudio, we are always looking for stories of how you are using R for your work, community, or for fun. If this talk inspires you to start writing, we would love for you to contribute to the RStudio blog: https://www.rstudio.com/blog/

Speaker Bio: Isabella Velásquez: Isabella is a content strategist, author, and active member of the R community. Currently, she works at RStudio as a Sr. Product Marketing Manager with the goal of driving engagement around all the awesome things happening at RStudio. In her previous role, she conducted data analysis and research, developed infrastructure to support use of data, and created resources to engage technical and non-technical audiences. She channels these experiences to illuminate what is possible with great products

Isabella Velásquez

RStudio Connect | Cut down on the grunt work. Deliver insights more effectively with RStudio

You built a great data science product. Something your stakeholders want and have been asking for.

But operationalizing that data product is hard.

How do you deliver it to interested stakeholders in the format that they want with the specific insights that they need? And what if you need to deliver your data products to multiple stakeholders, all with slightly different requirements?

How do you make sure that the data product is always up to date? And when the data is updated, how will your stakeholders know?

This is why we built RStudio Connect. RStudio Connect is the best way for data science teams to curate and share data products with their organizations. It comes with all of the bells and whistles that professional data science teams typically need to turn their analyses into value.

In this live event, we’ll do a deep dive into one of RStudio Connect’s most powerful features: scheduling and distributing data science products.

This feature enables data science teams to take all of their data science products and insights and deliver them to everyone that needs them to make decisions. Always on time, in the format that they want, and as efficiently as possible.

Specifically: we will show how a data scientist can deploy an R Markdown or Jupyter Notebook to RStudio Connect, then schedule the data product to update and get delivered to interested stakeholders. All without the traditional work associated with updating data pipelines and updating reports.

Put simply: we think RStudio gives data scientists, data engineers, and business users superpowers, and we can’t wait to show it off to you

R Markdown Advanced Tips to Become a Better Data Scientist & RStudio Connect | With Tom Mock

R Markdown is an incredible tool for being a more effective data scientist. It lets you share insights in ways that delight end users.

In this presentation, Tom Mock will teach you some advanced tips that will let you get the most out of R Markdown. Additionally, RStudio Connect will be highlighted, specifically how it works wonderfully with tools like R Markdown.

Please provide feedback: https://docs.google.com/forms/d/e/1FAIpQLSdOwz3yJluPR2fEqE0hBt92NtKZzzNACR8KJhHUt9rhFj3HqA/viewform?usp=sf_link

More resources if you’re interested: https://docs.google.com/document/d/1VKGs1G9GcQcv4pCYFbK68_LDh72ODiZsIxXLN0z-zD4/edit

04:15 Literate Programming
09:00 - RStudio Visual Editor Demo
15:44 - R and Python in same document via {reticulate}
18:10 - Q&A: Options for collaborative editing (version control, shared drive etc.)
19:30 - Q&A: Multi-pane support in RStudio
20:46 Data Product (reports, presentations, dashboards, websites etc.)
24:15 - Distill article
26:27 - Xaringan presentation (add three dashes, ---, for a new slide)
28:58 - Flexdashboard (with shiny)
30:30 - Crosstalk (talk between different html widgets instead of {shiny} server)
35:03 - Q&A: Jobs panel – parallelise render jobs in background
36:50 - Q&A: various data product packages, formats
39:35 Control Document (modularise data science tasks, control code flow)
39:58 - Knit with Parameters (YAML params: option)
41:20 - Reference named chunks from .R files (knitr::read_chunk())
43:00 - Child Documents (reuse content, conditional inclusion, {blastula} email)
47:07 Templating (don’t repeat yourself)
47:38 - rmarkdown::render() with params, looping through different param combinations
49:30 - Loop templates within a single document
50:40 - 04-templating/ live code demo
54:37 - {whisker} vs {glue} – {{logic-less}} vs {logic templating}
55:30 - {whisker} for generating markdown files that you can continue editing
57:49 RMarkdown + RStudio Connect
1:00:41 Follow-up Reading and resources
1:04:49 Q&A - {shiny} apps, {webshot2} for screenshots of html, reading in multiple .R files, best practice for producing MS Office files, {blastula}

blastula crosstalk flexdashboard reticulate rmarkdown rstudio Shiny webshot2 Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

The Importance of Understanding Your Business Users | Data Science Hangout Highlights

RStudio is joined by Tori Oblad, Data Officer at WaFd Bank, to discuss how data scientists can become leaders within their organizations.

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.rstudio.com LinkedIn: https://www.linkedin.com/company/rstudio-pbc Twitter: https://twitter.com/rstudio

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Measuring the Impact of Data Science | Data Science Hangout Highlights

RStudio is joined by Frank Corrigan, Director of Decision Intelligence, to discuss how data scientists can become leaders within their organizations.

Watch the full recording: https://www.youtube.com/watch?v=KBs4b3Q2n8Y

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.rstudio.com LinkedIn: https://www.linkedin.com/company/rstudio-pbc Twitter: https://twitter.com/rstudio

Make Sure You Communicate Value | Data Science Hangout Highlights

RStudio is joined by Tori Oblad, Data Officer at WaFd Bank, to discuss how data scientists can become leaders within their organizations.

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.rstudio.com LinkedIn: https://www.linkedin.com/company/rstudio-pbc Twitter: https://twitter.com/rstudio

Teaching and learning with RStudio Cloud | RStudio

Learn about RStudio Cloud and most recent developments, particularly with respect to teaching with it.

Slides are posted at https://rstd.io/tl-rscloud .

ABOUT RSTUDIO CLOUD: RStudio Cloud is a lightweight, cloud-based solution that allows anyone to do, share, teach and learn data science online.

Analyze your data using the RStudio IDE, directly from your browser. Share projects with your team, class, workshop or the world. Teach data science with R to your students or colleagues. Learn data science in an instructor-led environment or with interactive tutorials.

There is nothing to configure and no dedicated hardware, installation or annual purchase contract required. Individual users, instructors and students only need a browser to do, share, teach and learn data science.

We will always offer a free plan for casual, individual use, and we now offer paid premium plans for professionals, instructors, researchers, and organizations.

RSTUDIO CLOUD RESOURCES: RStudio Cloud https://rstudio.cloud RStudio Cloud Pricing plans https://rstudio.cloud/plans/instructor RStudio Cloud guide https://rstudio.cloud/learn/guide {rscloud} https://github.com/rstudio/rscloud

ABOUT RSTUDIO: RStudio’s mission is to create free and open-source software for data science, scientific research, and technical communication to enhance the production and consumption of knowledge by everyone, regardless of economic means, and to facilitate collaboration and reproducible research, both of which are critical to the integrity and efficacy of work across industries.

RStudio also produces RStudio Team, a modular platform of commercial software products that give organizations the confidence to adopt R, Python and other open-source data science software at scale, along with online services to make it easier to learn and use them over the web.

Together, RStudio’s open-source software and commercial software form a virtuous cycle: the adoption of open-source data science software at scale in organizations creates demand for RStudio’s commercial software; and the revenue from commercial software, in turn, enables deeper investment in the open-source software that benefits everyone. Check out www.rstudio.com

Follow us on Twitter: https://twitter.com/rstudio

Facebook: https://www.facebook.com/rstudiopbc/

And LinkedIn: https://www.linkedin.com/company/rstudio-pbc/

RStudio Team

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Cloud RStudio Cloud Teaching Remote Teaching Online Education

Data Scientists vs. Business Analysts | Data Science Hangout Highlights

RStudio is joined by Frank Corrigan, Director of Decision Intelligence, to discuss how data scientists can become leaders within their organizations.

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.rstudio.com LinkedIn: https://www.linkedin.com/company/rstudio-pbc Twitter: https://twitter.com/rstudio

RStudio Cloud | {rscloud} Package | Instructor View

You can access RStudio Cloud’s API to manage space members programmatically using the rscloud package.

You will need to create client credentials to use the package. To do so, click on your icon/name in the header to reveal the User panel, then click on Credentials. This will take you to the Credentials page of RStudio User Settings, where you can create and manage your client credentials.

{rscloud} package repo: https://github.com/rstudio/rscloud

VIDEO CREDITS: Monitor icon made by xnimrodx from flaticon.com Cloud icon made by Freepik from flaticon.com Tiny Putty Music from Blue Dot Sessions: https://app.sessions.blue/browse/track/52046

RStudio Team

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Rscloud RStudio Cloud Api Teaching Cloud

Oliver Bridges - Smart DCC | Energy Meetup | RStudio

Navigating the Analytics Journey at Smart DCC Presentation by Oliver Bridges

Abstract: This talk will highlight the journey we have been (and still are) on that took us from outsourcing our analytics, to running some basic SQL and using graphical-based visualization tools, to open-source RStudio, and eventually to RStudio Team.

Our evolution has been shaped by the huge volumes of data and the demands from our customers, which are what really led us to RStudio. While none of us are formally data scientists (we are data engineers), our whole team has been on this journey together and we’re continually learning as we go.

By creating our own packages, APIs, pins, and Shiny dashboards, we have built workflows that allow us to deal with tens of millions of records of aggregated data. As you start (or continue) on your own journey, I’d love to help share what we’ve learned along the way.

Bio: Oliver Bridges is the Head of Data Science and Analytics at Smart DCC, designing and now managing a team of 15. Ollie has 15 years of experience in telecommunications and 10 years in smart metering. He has worked all around the world, including Cape Town, Copenhagen, Sydney, Amsterdam, Brussels, Dublin, and the UK. Ollie lives in Cornwall, UK, and has worked remotely since the start of the pandemic, but before that worked in Manchester (about 300 miles from his house). In his spare time, he loves sport and coding!

Smart DCC’s mission is to help digitise Britain’s energy network, make a critical contribution to the effort to achieve net-zero greenhouse gas emissions, and improve the nation’s connectivity.

RStudio Team

How to Improve Your Communication Skills | Data Science Hangout Highlights

RStudio is joined by Elaine McVey, VP of Data Science at The Looma Project, to discuss how data scientists can become leaders within their organizations.

Watch the full recording here: https://www.youtube.com/watch?v=IkqItgPSPro&feature=youtu.be

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here: Website: https://www.rstudio.com LinkedIn: https://www.linkedin.com/company/rstudio-pbc Twitter: https://twitter.com/rstudio

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

How to Communicate Value | Data Science Hangout Highlights

RStudio is joined by Jonathan Regenstein, Head of Data and Quantamental Research at Truist Securities, to discuss how data scientists can become leaders within their organizations.

Watch the full recording here: https://www.youtube.com/watch?v=pNTENrov020

Kaija Gahm | greenT (Shiny Contest) | RStudio

Kaija Gahm: greenT Shiny app

greenT is a Shiny app I built to explore grapheme-color synesthesia, a neurological phenomenon where letters/numbers (graphemes) are associated with colors in my mind. Whenever friends found out about my synesthesia, they would want to know what color their names were in my head, and it got to be challenging to explain. When I found myself manually coloring squares in a spreadsheet to demonstrate what I saw in my mind’s eye, I decided to build an app instead. The app lets you either set your own colors or see what mine look like. Then you can enter text, and it will display as a series of rectangles according to the colors you’ve selected. I also added a feature that mimics a Google form to collect survey responses from other people with synesthesia, so they can submit their color data if they want to.
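The text-to-rectangles idea described above can be sketched in a few lines. This is only an illustrative Python sketch, not the greenT app's code (which is a Shiny app): the `PALETTE` values, the `DEFAULT_COLOR` fallback, and the `text_to_colors` helper are all hypothetical names and made-up colors.

```python
# Hypothetical grapheme-to-color palette. These hex values are invented for
# illustration and are NOT the author's actual synesthesia colors.
PALETTE = {"a": "#d62728", "b": "#1f77b4", "c": "#ffdd57"}
DEFAULT_COLOR = "#cccccc"  # assumed fallback for graphemes with no color set


def text_to_colors(text, palette=PALETTE, default=DEFAULT_COLOR):
    """Return one hex color per non-whitespace character in `text`.

    Each color would correspond to one rectangle in the display.
    """
    return [palette.get(ch.lower(), default) for ch in text if not ch.isspace()]


# One color per grapheme; unknown graphemes fall back to the default.
colors = text_to_colors("Cab x")
print(colors)
```

A user-defined palette would simply be passed in place of `PALETTE`, which mirrors the app's choice between setting your own colors and viewing the author's.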

Bio: My name is Kaija (rhymes with Maya), and I’m an incoming PhD student in the ecology/evolutionary biology department at UCLA, where I’ll be studying movement ecology and behavior of vultures using social network analysis. I’ve been using R for about six years, and for the past year I’ve been working as a data manager for an ecosystem science nonprofit. I’m on the board of SORTEE, an organization focused on openness and reproducibility in ecology and evolutionary biology. I didn’t come from a computer science background, so I’m passionate about talking to people about incorporating R into their work in accessible and friendly ways.

When I’m not at my computer, you can usually find me outdoors (hiking or birdwatching) or spending entire days in the kitchen making up recipes. I tweet and blog, mostly about R.

How to Speak to Executives | Data Science Hangout Highlights

RStudio is joined by Elaine McVey, VP of Data Science at The Looma Project, to discuss how data scientists can become leaders within their organizations.

Watch the full recording here: https://www.youtube.com/watch?v=IkqItgPSPro

Yu-Hung Chang, Phillip Lear & Brendan Scully | R in Manufacturing & Consumer Products | RStudio

Presentations from:
Pratt and Whitney - Brendan Scully
Kellogg Company - Phillip Lear
AGCO - Dr. Yu-Hung Chang

Abstract #1 In this talk, Brendan will showcase two of the tools developed at Pratt using R and hosted on RStudio Connect: a forecasting model for repair shop performance and a real-time dashboard for work management on the repair shop floor. In addition to these tools, he will talk about how to approach some common development and implementation challenges for decision analytics tools at large companies like Pratt.

Bio: Brendan Scully is the Decision Analytics Strategy Lead for Pratt and Whitney’s aftermarket operations business. In this role, he is focused on how decision analytics tools can be used to improve the maintenance, repair, and overhaul of aircraft engines. Before this, Brendan helped other companies make the most of their data as an actuarial consultant at Milliman and Deloitte, and as a data scientist at the African Leadership University.

Abstract #2 Phillip will discuss his organization’s journey in transitioning a set of mixed analytic technologies to using a standardized analytic content platform. This talk will offer real life lessons learned in overcoming organizational barriers, getting stakeholders on board, and running a successful POC.

Bio: Phillip is a Principal Data Scientist at the Kellogg Company working with global sales, marketing and ecommerce teams. He also teaches R to economics students at Grand Valley State University and sits on the Advisory Board for the Economics department. He lives in Kalamazoo (yes there really is a Kalamazoo!) with his wife Jamie and their two corgis, Penny and Winston.

Abstract #3 Dr. Yu-Hung Chang is an Advanced Analytics Specialist at AGCO. She will showcase an interactive app developed at AGCO using R and hosted on RStudio Connect, which provides comprehensive insights for global manufacturing quality analysis. The real-time dashboard provides not only failure-rate forecasts at different part levels, but also a variety of charts that break down cost and claims from different angles. In this talk, Yu-Hung will share the challenges of developing analytics tools and how to establish common development practices in the agricultural manufacturing world.

Bio: Dr. Yu-Hung Chang is an Advanced Analytics Specialist working for AGCO’s Global Field Quality group. Her multi-disciplinary background combines technical fundamentals with leading-edge research and big data applications, using the most advanced technologies. She has authored numerous articles, published in inter-disciplinary journals, covering topics from Statistics and Computational Physics to Aeronautics and Automotive.

Meghan Hall & Mitch Tanney | R in Sports Analytics | RStudio

R in spoRts analytics! Presentations by Mitch Tanney & Meghan Hall

Moving the Needle Toward Organizational Success (0:00 - 29:35)
Extending R Markdown: from websites, slides, to PDFs, and more! (32:00 - 53:00)

Talk #1 Moving the Needle Toward Organizational Success (0:00 - 29:35)

What You’ll Learn: Data analysis can and should influence decisions on the field, court, ice or pitch in the same way it impacts decisions in the boardroom. While the physical landscapes may vary, the objective to maximize the likelihood of successful organizational outcomes does not. Practical applications of data-driven, decision-making processes will serve as the framework for introducing attendees to the field of sports analytics.

About the Speaker: Prior to joining RStudio in Customer Success, Mitch chased a professional football playing career and later founded and led analytics departments for the Chicago Bears and Denver Broncos. He was a member of the Super Bowl 50 World Championship team in Denver. Mitch graduated summa cum laude as a double major in Mathematics and Spanish from Monmouth College and completed an MBA with distinction from the University of Iowa.

Talk #2 Extending R Markdown: from websites, slides, to PDFs, and more! (32:00 - 53:00)

About the Speaker: Meghan Hall is a data scientist at Zelus Analytics and is currently teaching a course on data visualization at Carnegie Mellon University. Meghan contributes to the public sports analytics community by helping beginners learn R as well as by writing and presenting on various aspects of hockey analysis as a member of Hockey-Graphs

Katherine Kopp | COVID vaccine distribution Shiny app walkthrough (mock data) | RStudio

Learn more:

Data Driven West Virginia: https://business.wvu.edu/research-outreach/data-driven-wv

DDWV PPE forecasting: https://wvutoday.wvu.edu/stories/2020/04/27/wvu-business-experts-partner-with-the-national-guard-to-forecast-ppe-needs

DDWV inventory management system: https://wvutoday.wvu.edu/stories/2021/03/22/a-different-kind-of-science-wvu-chambers-college-data-scientists-propel-west-virginia-s-acclaimed-vaccine-strategy-with-digital-inventory-management-system

West Virginia National Guard: https://www.wv.ng.mil/

Shiny: https://shiny.rstudio.com/

West Virginia leading nation at start of vaccine rollout: https://www.vox.com/first-person/2021/3/4/22313540/covid-19-vaccine-west-virginia

To understand just how hard it is to get vaccines to the population, it helps to understand where it can go wrong. This starts with how vaccines are packed into containers.

To fill up a container, Pfizer places 195 vials into a tray, and up to 5 trays into a single container. Moderna puts 10 vials into a small box, and then combines a minimum of 10 small boxes into a single container. In most states Pfizer and Moderna ship directly to the organization that will be administering the vaccine to the population. This could be a hospital, a pharmacy, or any place where trained professionals will be putting shots into arms.

But what happens when a pharmacy receives a full container from Pfizer, 975 vials, but only needs 600?

West Virginia has removed this complication by shipping directly to five hubs strategically located throughout the state. Within each of these hubs, containers of vaccine vials are broken down into smaller components and then either picked up or shipped directly to the hospital, pharmacy, or organization that will be administering the vaccine.
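The container arithmetic above can be checked with a short sketch. The pack sizes (195 vials per tray, up to 5 trays, 10 vials per box, at least 10 boxes) and the 600-vial pharmacy need come from the description; the constant names and the `surplus_vials` helper are hypothetical, chosen only for illustration.

```python
# Pack sizes as stated in the description above.
PFIZER_VIALS_PER_TRAY = 195
PFIZER_MAX_TRAYS_PER_CONTAINER = 5
MODERNA_VIALS_PER_BOX = 10
MODERNA_MIN_BOXES_PER_CONTAINER = 10

# A full Pfizer container and the smallest Moderna container, in vials.
pfizer_full_container = PFIZER_VIALS_PER_TRAY * PFIZER_MAX_TRAYS_PER_CONTAINER
moderna_min_container = MODERNA_VIALS_PER_BOX * MODERNA_MIN_BOXES_PER_CONTAINER


def surplus_vials(shipped, needed):
    """Vials left over when a site receives more than it needs (never negative)."""
    return max(shipped - needed, 0)


# The pharmacy in the example receives a full Pfizer container but needs 600 vials.
pharmacy_surplus = surplus_vials(pfizer_full_container, 600)

print(pfizer_full_container)   # 975
print(moderna_min_container)   # 100
print(pharmacy_surplus)        # 375
```

The 375 surplus vials are the kind of mismatch that breaking containers down at a hub is meant to absorb.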

These hubs are managed by the Joint Interagency Task Force (JIATF), a team of teams composed of public, private, and governmental organizations as well as the National Guard. The Joint Interagency Task Force is responsible for drawing up a weekly distribution plan for each hub, in alignment with CDC allocations, and matching vaccine supply with demand.

By using a statewide system managed by a central organization, there’s a level of agility and fluidity that allows each hub to adjust to a variety of changes in order to maximize the number of vaccines that are being administered to the population each week.

rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Shiny RMarkdown Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Rstats Open Source OSS Reticulate Data Driven West Virginia Covid Covid Vaccine Vaccine Distribution Katherine Kopp Brad Price Major Ryan Coss

Managing COVID vaccine distribution in West Virginia | RStudio

With a little help from open source software

Learn more: Data Driven West Virginia: https://business.wvu.edu/research-outreach/data-driven-wv

DDWV PPE forecasting: https://wvutoday.wvu.edu/stories/2020/04/27/wvu-business-experts-partner-with-the-national-guard-to-forecast-ppe-needs

DDWV inventory management system: https://wvutoday.wvu.edu/stories/2021/03/22/a-different-kind-of-science-wvu-chambers-college-data-scientists-propel-west-virginia-s-acclaimed-vaccine-strategy-with-digital-inventory-management-system

West Virginia National Guard: https://www.wv.ng.mil/

Shiny: https://shiny.rstudio.com/

West Virginia leading nation at start of vaccine rollout: https://www.vox.com/first-person/2021/3/4/22313540/covid-19-vaccine-west-virginia

In the United States, approximately 2.5 million doses of COVID vaccines are being delivered each day, and how these doses go from the manufacturer to a shot in someone’s arm varies by state, often with mixed results.

But early on in the vaccine distribution process, one state led the pack in terms of using the majority of vaccine doses it had been allotted. That state? West Virginia.

Part of what has made West Virginia successful is the creation of an inventory management system using Shiny, an open source framework for building interactive web applications. The system was built by Data Driven West Virginia, part of the John Chambers College of Business and Economics at West Virginia University, in collaboration with the West Virginia Army National Guard.

Using Shiny has provided visibility into each component of the vaccine supply chain, leading to the creation of distribution plans that are able to quickly and efficiently match supply with demand, getting vaccines to the right people in the right location at the right time.

rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Plumber Blogdown Gt Tidymodels Statistics Debugging Rstats Open Source OSS Data Driven West Virginia Katherine Kopp Brad Price Major Ryan Coss Covid Covid Vaccine Vaccine Distribution

Tom Mock | RStudio Connect in Production

https://rstudio.com/resources/webinars/rstudio-connect-in-production/

In part 2 of this 3-part series, Tom covers the following:

Communicating results can be the most challenging part of Data Science: many insights never leave the laptops where they are discovered. In this webinar, we will show you how to use RStudio Connect to deploy your results in a production environment. You’ll learn how to automate publishing, schedule updates, and provide consumers with self-service access to your work. RStudio Connect is a revolutionary new way to host executable Data Science content.

About Tom: Thomas is involved in the local and global data science community, serving as Outreach Coordinator for the Dallas R User Group, as a mentor for the R for Data Science Online Learning Community, and as co-founder of #TidyTuesday. He attends various Data Science and R-related conferences and meetups, and participated in Startup Weekend Fort Worth as a data scientist/entrepreneur.

rstudio webinars Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Kelly O’Briant | Interactivity in Production | RStudio (2019)

https://rstudio.com/resources/webinars/interactivity-in-production/

In part 3 of this 3-part series, Kelly covers the following:

Interactive products take your data science to a new level, but they require new coding decisions. This webinar will give you clear guidelines on when and how to add interactivity to your work. Here you’ll learn: when to use off-the-shelf interactive products like parameterized R Markdown and htmlwidgets, when to create bespoke interactivity with Shiny, how to make your Shiny apps as fast as possible, how to support interactivity in production, and much more.

About Kelly: Kelly is a Solutions Engineer for RStudio and an organizer of the Washington DC chapter of R-Ladies Global, an R users group for lady-folk and friends.

rstudio Shiny webinars Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Garrett Grolemund | Reproducibility in Production | RStudio (2019)

https://rstudio.com/resources/webinars/reproducibility-in-production/

In part 1 of this 3 part series, Garrett covers the following:

Computational documents offer limitless opportunities for your business. With them, your consumers can rerun your report with new parameters, apply your analysis to new data, or schedule future, automatic updates to your work—all with the click of a button. This is the first in a three part webinar series that will describe this new form of reproducibility. Here, we begin by showing you how to write executable R Markdown documents for a production environment.
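The idea of rerunning one report with new parameters or new data can be illustrated with a minimal sketch. The webinar itself covers R Markdown; the Python below is only a language-neutral illustration under stated assumptions, and the `REPORT` template, the `render_report` function, and the sample `records` are all hypothetical.

```python
from statistics import mean
from string import Template

# Hypothetical report template: the $-placeholders are filled in at render time.
REPORT = Template("Report for $region: $n records, mean value $avg")


def render_report(region, records):
    """Render the same report for whichever region and data are passed in."""
    values = [r["value"] for r in records if r["region"] == region]
    avg = f"{mean(values):.1f}" if values else "n/a"
    return REPORT.substitute(region=region, n=len(values), avg=avg)


# Made-up sample data, used only to show that the report reruns with new inputs.
records = [
    {"region": "north", "value": 10.0},
    {"region": "north", "value": 20.0},
    {"region": "south", "value": 5.0},
]

print(render_report("north", records))  # Report for north: 2 records, mean value 15.0
print(render_report("south", records))  # Report for south: 1 records, mean value 5.0
```

Because the analysis lives in one function and the inputs are passed in, the same report can be regenerated on a schedule or for a different parameter value without editing the document.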

About Garrett: Garrett is the author of Hands-On Programming with R and co-author of R for Data Science and R Markdown: The Definitive Guide. He is a Data Scientist at RStudio and holds a Ph.D. in Statistics, but specializes in teaching. He’s taught people how to use R at over 50 government agencies, small businesses, and multi-billion dollar global companies; and he’s designed RStudio’s training materials for R, Shiny, R Markdown and more. Garrett wrote the popular lubridate package for dates and times in R and creates the RStudio cheat sheets

lubridate rstudio Shiny webinars Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Jenny Bryan | Help me help you: creating reproducible examples | RStudio (2018)

What is a reprex? It’s a reproducible example. Making a great reprex is both an art and a science and this webinar will cover both aspects. A reprex makes a conversation about code more efficient and pleasant for all. This comes up whenever you ask someone for help, report a bug in software, or propose a new feature. The reprex package (https://reprex.tidyverse.org) makes it especially easy to prepare R code as a reprex, in order to share on sites such as https://community.rstudio.com, https://github.com, or https://stackoverflow.com. The habit of making little, rigorous, self-contained examples also has the great side effect of making you think more clearly about your programming problems.

Webinar materials: https://rstudio.com/resources/webinars/help-me-help-you-creating-reproducible-examples/

About Jenny: Jenny is a software engineer on the tidyverse team. She is a recovering biostatistician who takes special delight in eliminating the small agonies of data analysis. Jenny is known for smoothing the interfaces between R and spreadsheets, web APIs, and Git/GitHub. She’s been working in R/S for over 20 years and is a member of the R Foundation. She also serves in the leadership of rOpenSci and Forwards and is an adjunct professor at the University of British Columbia

Jenny Bryan

reprex rstudio tidyverse tidyverse.org webinars Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Reprex

Nathan Stephens | Make PowerPoint Presentations with R Markdown | RStudio (2018)

Data scientists use R Markdown documents to create reproducible code that can be rendered in a variety of output types. Some of the most common output types include HTML, Word, and PDF, but new improvements make it possible to create PowerPoint presentations as well. PowerPoint presentations are still common currency for sharing insights in most organizations today. This webinar demonstrates how to create feature-rich PowerPoint presentations from R Markdown and how to use these presentations to share insights, visualizations, Shiny apps, and more.

About Nathan: Nathan has a background in analytic solutions and consulting. He has experience building data science teams, architecting analytic infrastructure, and delivering innovative data products. He is a long-time user of R.

rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Nathan Stephens

Mike Garcia | R in Pharma: Intro to Shiny | Posit

Slides: https://garciamikep.github.io/rstudioglobal-2021-shiny-slides/slides.html#1

From rstudio::global(2021) Pharma X-Sessions, sponsored by ProCogia: in this introduction to Shiny app development, we begin with a quick review of visualization with {ggplot2} and then cover core concepts in app structure and reactive programming. After building several Shiny apps of increasing complexity, we wrap up with a demonstration of how to include your Shiny app in a dashboard using the {flexdashboard} package.

About Mike Garcia: Mike is a Data Science Consultant with ProCogia, with a background in Biostatistics and experience in clinical trial design and public health research. If not geeking out on data with a cup of coffee and spreading his passion for R, he’s probably out enjoying the outdoors.

Learn more about the rstudio::global(2021) X-Sessions: https://blog.rstudio.com/2021/01/11/x-sessions-at-rstudio-global/

To hear more about how other major pharmaceutical companies are transitioning to open source data science you can watch talks from this year’s R in Pharma conference: https://www.youtube.com/@RinPharma/playlists

At Posit, we have a dedicated Pharma team to help organizations migrate and utilize open source for drug development. To learn more about our support for life sciences, please see our dedicated Pharma page where you can book a call with our team (https://posit.co/solutions/pharma).

flexdashboard ggplot2 rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Mike Garcia Pharma How To

Andy Nicholls & Michael Rimler | Using R to Drive Agility in Clinical Reporting | RStudio (2020)

The R language is used extensively throughout the pharmaceutical industry. But its use within the tightly regulated clinical reporting workflows has remained limited. GSK Biostatistics has embarked upon a journey to embed R as a primary statistical analysis tool for clinical reporting.

Enabling R within a global department of over 600 Statisticians, Programmers and Data Scientists is challenging! It requires planning, patience, and a strong foundation that enables consistency across the enterprise. We invite you to learn more about how we achieved this at GSK.

You’ll learn about our tidyverse-centric training program, a future-ready Working Area for R Programming (WARP) environment, and a leading-edge R for Clinical Reporting (R4CR) initiative. The goal: help embed R in everyday clinical reporting output.

About Andy: Andy Nicholls has a long history with the R language and in Data Science, authoring the book ‘R in 24 Hours’. He is currently Head of Statistical Data Sciences within GSK’s Biostatistics department. One of his team’s main objectives is to embed the R language within Biostatistics: developing training materials, overseeing various adoption initiatives, and provisioning a world-class environment for R. He is also the lead for the cross-industry R Validation Hub initiative that aims to support the use of R for regulatory work.

About Michael: Michael Rimler is a clinical programmer in GSK Biostatistics and passionate about influencing the evolving role of open source technologies and data science capabilities in clinical data analytics. He is involved with numerous internal initiatives aimed at moving the organization in this direction, including leading the effort to fully integrate R into the clinical reporting process. Michael is a co-lead of the Open Source Technologies in Clinical Research PHUSE working group project, has chaired a PHUSE US Single Day Event on Data Visualization, and will serve as a co-chair for the 2021 PHUSE US Connect.

rstudio tidyverse Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Forcats Rstats Open Source OSS Reticulate Clinical Reporting GSK Pharma Andy Nicholls Michael Rimler

A quick tour of RStudio 1.4 | RStudio

HD version here: https://youtu.be/oCR_LB3H73M

0:00 Introduction
0:20 R Markdown Visual Editor
0:46 Insert citations in R Markdown
1:09 Python support in Environment pane
2:05 Python environment selection
2:25 Rainbow parentheses
2:43 Monospace font support
2:54 Support for multiple source columns
3:10 Command palette
3:27 Customize data and configuration storage (users and servers)
3:55 RStudio Pro edition features
4:08 Authenticate RStudio Server Pro using SAML
4:25 Project sharing with Launcher
4:48 Request a GPU with SLURM
5:00 Run Visual Studio Code sessions (beta)

What’s new with RStudio 1.4:

A visual markdown editor that provides improved productivity for composing longer-form articles and analyses with R Markdown.

New Python capabilities, including display of Python objects in the Environment pane, viewing of Python data frames, and tools for configuring Python versions and conda/virtual environments.

The ability to add source columns to the IDE workspace for side-by-side text editing.

A new command palette (accessible via Ctrl+Shift+P) that provides easy keyboard access to all RStudio commands, add-ins, and options.

Support for rainbow parentheses in the source editor (enabled via Options, then Code, then Display).

New citation support that allows you to include document citations from your document bibliography, personal or group libraries, and several other sources.

Integration with a host of new RStudio Server Pro features including project sharing when using Launcher, Microsoft Visual Studio Code support (currently in beta), SAML authentication, and local launcher load-balancing.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Visual Editor

Alison Hill & Desirée De Leon | How to Get Your Materials Online With R Markdown | RStudio (2020)

Full title: Sharing on Short Notice: How to Get Your Teaching Materials Online With R Markdown

Educators create a lot of files for teaching (slides, exercises, solutions, assignments, data, figures) that all ultimately need to be shared with other people. Having a link for sharing your teaching materials can save you time and pain, but it is hard to get started if you’ve never shared your resources online before. In this webinar, we’ll give a tour of the R Markdown ecosystem for educators that you can start to use right away. We’ll show how it can help you make your teaching more shareable, reproducible, and resilient.

About Alison: I studied psychology and quantitative methods, receiving my Ph.D. from Vanderbilt University (2008). For eight years, I was a professor and scientist at Oregon Health & Science University, where my research was funded by the National Institutes of Health, the Oregon Clinical and Translational Research Institute, and Autism Speaks. I have written numerous scientific journal articles and book chapters on autism and neurodevelopmental disorders. I have developed and delivered workshops, graduate-level courses, and curricula based on teaching R, the tidyverse, and literate programming. You can follow my current work for RStudio Education on GitHub.

About Desirée: I am a neuroscience PhD student at Emory University and also a former summer intern at RStudio. This past summer, I worked with Alison Hill to develop a handbook filled with practical advice and resources for educators who teach with R and RStudio. I enjoy spending my time on collaborative projects that involve coding, teaching, and illustration.

rstudio tidyverse Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Alison Hill Desiree De Leon Reproducibility

Enabling Remote Data Science Teams | RStudio (2020)

Whatever happens in the coming months, remote work is here to stay. The goal of this webinar is to provide data scientists and data science team leaders with the knowledge and tools to succeed as a distributed team. Some of the topics we will cover include:

Setting up a remote and collaborative R environment
Version control and Scrum
Using Shiny and RStudio Connect to share apps within and across teams
How to improve the UI and appearance of Shiny dashboards
How to scale Shiny dashboards to hundreds of users
How to build and grow a remote data science team

About Alex: Alex is a Solutions Engineer at RStudio, where he helps organizations succeed using R and RStudio products. Before coming to RStudio, Alex was a data scientist and worked on economic policy research, political campaigns, and federal consulting.

About Olga: Olga is experienced in production applications of analytical solutions, especially for FMCG companies. Recently she developed a price elasticity model for Unilever.

About Damian: Damian is one of the four co-founders of Appsilon. Before founding Appsilon he worked at Accenture, UBS, Microsoft and Domino Data Lab.

About Pedro: Pedro has nearly a decade of experience combining frontend and backend technologies, and is an expert on augmenting R Shiny dashboards with CSS and JavaScript.

rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Forcats Rstats Open Source OSS Reticulate Version Control Scrum RStudio Connect Dashboards Remote Work

Alex Gold | Managing Packages for Open Source Data Science | RStudio

What You’ll Learn: With over 15,000 R packages on CRAN, over 230,000 on PyPI, and more arriving every day, the task of managing a package environment for data science in R and Python can be daunting. In this webinar, you will learn about the most current strategies and tooling for creating and maintaining a reproducible package environment. Whether you’re an individual data scientist or the administrator of an entire RStudio Team cluster, you’ll better understand how you can enhance your ability to work easily and safely with open-source data science packages.

About Alex: Alex is a Solutions Engineer at RStudio, where he helps organizations succeed using R and RStudio products. Before coming to RStudio, Alex was a data scientist and worked on economic policy research, political campaigns, and federal consulting.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Alex Gold

Kevin Bolger & Oisin Bates | Architecting RStudio Products in the Cloud | RStudio (2020)

Part of RStudio’s Data Science in the Cloud Webinar Series.

About Kevin: Kevin’s passion for data science was cemented after he finished his education at the University of Limerick, Ireland. Focusing primarily on data analytics and modelling, he went on to spend the first years of his career working at a biopharmaceutical company, where he led the data team on multiple products. Since moving to Seattle with his Washington-native wife, Kevin has spent his spare time enjoying the beautiful PNW and playing ‘hurling’, an ancient Gaelic field sport, with the Seattle Gaels. He now leads the Data Science team at ProCogia as the Director of Data Solutions, where he works with clients from Biotech to Telecom.

About Oisin: Oisin is a data science and AWS Cloud Architect for ProCogia.

rstudio Rstudio Data Science Machine Learning Python Stats Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Forcats Rstats Open Source OSS Reticulate Kevin Bolger Oisin Bates Cloud Cloud-Based Data Science Data Science Cloud

Lou Bajuk & Sean Lopp | RStudio: A Single Home for R & Python | RStudio

Many Data Science teams today are bilingual, leveraging both R and Python in their work. While both languages are tremendously powerful, teams frequently struggle to use them together. We’ve heard from our customers how even experienced data scientists familiar with both languages often struggle to combine them without painful context switching and manual translations. Data Science leaders and business stakeholders find it difficult to make key data science content easily discoverable and available for decision-making, and IT Admins and DevOps engineers grapple with how to efficiently support these teams.

In this webinar, you will learn how RStudio helps Data Science teams tackle all these challenges, and make the Love Story between R and Python a happier one:

Easily combine R and Python in a single Data Science project
Leverage a single infrastructure to launch and manage Jupyter Notebooks and JupyterLab environments, as well as the RStudio IDE
Organize and share Jupyter Notebooks alongside your work in R and your mixed R and Python projects

This webinar will show examples of all these capabilities, and discuss the benefits of leveraging R and Python.

About Lou: Lou is a passionate advocate for data science software, and has had many years of experience in a variety of leadership roles in large and small software companies, including product marketing, product management, engineering and customer success. In his spare time, his interests include books, cycling, science advocacy, great food and theater.

About Sean: Sean has a degree in mathematics and statistics and worked as an analyst at the National Renewable Energy Lab before making the switch to customer success at RStudio. In his spare time he skis and mountain bikes and is a proud Colorado native.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Lou Bajuk Sean Lopp Webinar

Alex Gold | Deploying End-To-End Data Science with Shiny, Plumber, and Pins | RStudio

It’s easier than ever to craft a complete R-centric data science pipeline thanks to packages like Shiny, Plumber, and Pins. In this talk, you’ll learn how to use R to bring your modeling and visualization work into production. You’ll walk away with recipes, tips, and tricks to deploy data, models, and apps to ensure your work is as impactful as possible.

About Alex: Alex is a Solutions Engineer at RStudio, where he helps organizations succeed using R and RStudio products. Before coming to RStudio, Alex was a data scientist and worked on economic policy research, political campaigns, and federal consulting.

plumber rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Alex Gold

Fernanda Viegas & Martin Wattenberg | Data, visualization, and designing AI | RStudio (2020)

Recent progress in machine learning has raised a series of urgent questions: How can we train and debug deep learning models? How can we understand what is going on inside a neural network? And, perhaps most important, how can we design systems that serve people best? We’ll show a series of examples from the People+AI Research (PAIR) initiative at Google, ranging from data visualizations for researchers, to tools for medical practitioners, to guidelines for designers, that illustrate how thinking carefully about data can lead to better tools and more effective design, and can help humans and AI work together.

About Fernanda: Fernanda Viégas and Martin Wattenberg co-lead Google’s PAIR (People+AI Research) initiative, part of Google Brain. Their work in machine learning focuses on transparency and interpretability, as part of a broad agenda to improve human/AI interaction. They are well known for their contributions to social and collaborative visualization, and the systems they’ve created are used daily by millions of people. Viégas and Wattenberg are also known for visualization-based artwork, which has been exhibited in venues such as the Museum of Modern Art in New York, London Institute of Contemporary Arts and the Whitney Museum of American Art. Their artwork has influenced contemporary design practice: for instance, the techniques in their wind map are now used by many major media companies to display the weather.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate AI Google Deep Learning

Panel: Career Advice for Data Scientists | RStudio (2020)

Featuring: Gabriela de Queiroz, David Keyes, Sydeaka Watson, and Jen Hecht

This panel will be focused on how you build a career around R! Our panelists are all passionate about R and have each taken a different path to build a career around that passion. We’ll be touching on topics like the different stages of career growth and how you find a community you can go to for support. If you’re just getting started in your career, looking to make a transition, or interested in learning some great career-building skills, be sure to join us!

About Gabriela de Queiroz: Gabriela de Queiroz is a Sr. Engineering & Data Science Manager at IBM where she manages and leads a team of developers working on Data & AI Open Source projects. Her team contributes to projects such as TensorFlow, PyTorch, Apache Arrow, Apache Spark, and several other open source projects inside IBM. She works to democratize AI by building tools and launching new open source projects. She is passionate about making data science available to everybody and is actively involved with several organizations to foster an inclusive community. She is the founder of AI Inclusive, a global organization that is helping increase the representation and participation of gender minorities in Artificial Intelligence. She is also the founder of R-Ladies, a worldwide organization for promoting diversity in the R community with more than 180 chapters in 50+ countries.

About David Keyes: David Keyes is the founder of R for the Rest of Us, which teaches people and individuals to use R through online courses, custom trainings, and public workshops. With stops as an elementary school teacher, PhD social scientist, and program evaluator, David brings his unconventional career trajectory to his current role, helping those who don’t think of themselves as typical R users embrace the power of R.

About Sydeaka Watson: Dr. Sydeaka Watson is a native of New Orleans, Louisiana and currently lives in Dallas, Texas. She is Founder and Owner of Korelasi Data Insights, LLC and a Senior Data Scientist at Elicit Insights, LLC. In these roles, Sydeaka uses predictive analytics and visual tools to draw insights from diverse datasets. Sydeaka earned a Ph.D. in Statistics from Baylor University and has several years of teaching experience. In her 5 years as Research Assistant Professor in The University of Chicago Biostatistics Laboratory, she consulted with over 110 biomedical research teams in The University of Chicago Medical Center, specializing in statistical analysis and experimental design for clinical research studies. Her current research interests include applications of (1) image recognition for computer vision and (2) data science for social justice. Sydeaka currently serves as Organizer of the R-Ladies Dallas Chapter. She also volunteers in the Dallas chapters of Girls Who Code and Black Girls Code.

About Jen Hecht: Jen Hecht is the VP of People Operations at RStudio. She was first introduced to R in 2013, as a non-programmer seeking better ways to manage analytical projects - a quest which was aided both by the RStudio toolchain and the welcoming support of R-Ladies, R meetup groups, and other wonderful open resources. Ever since, she has been captivated by open data science tools and the communities that build them. Before joining RStudio in 2018, Jen held HR and People Analytics roles in a variety of industries, including financial services, biotech, and used record shops. Outside of work, Jen loves books and music of all kinds, and is a novice fly caster.

rstudio tensorflow Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Data Science Career Career

Podcast | Not So Standard Deviations Episode 100 | RStudio (2020)

Featuring: Hilary Parker & Roger Peng

In episode 100 of Not So Standard Deviations, the first ever episode prepared in advance, Hilary and Roger discuss creativity, its role in data science, and how it can be fostered through conversation. Also, follow up on coffee and oat milk.

About Hilary: Hilary Parker is a Data Scientist on the styling recommendations team at Stitch Fix, a personal styling service that uses a combination of human stylists and algorithmic recommendations to help people find what they love. At Stitch Fix, she focuses on what sorts of data to collect from clients in order to optimize clothing recommendations, as well as building out prototypes of algorithms or entirely new products based on new data sources. She is also a co-founder of the Not So Standard Deviations podcast, a bi-weekly data science podcast with Roger Peng that has over half a million downloads. Their topics of discussion include the R ecosystem, recent developments in the data science and statistics field, reproducibility and the “how” of how data scientists and statisticians work. Hilary recently authored the paper Opinionated Analysis Development based on discussions from the podcast. Prior to her career in the tech field, Hilary received her PhD in Biostatistics from Johns Hopkins School of Public Health. She lives at the San Francisco Zen Center with her partner, a Soto Zen Priest. In her free time, she enjoys exploring her home of 2 years, San Francisco.

About Roger: Roger D. Peng is a Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health where his research focuses on the development of statistical methods for addressing environmental health problems. He is the author of the popular book R Programming for Data Science and nine other books on data science and statistics. He is also the co-creator of the Johns Hopkins Data Science Specialization, the Simply Statistics blog where he writes about statistics for the public, the Not So Standard Deviations podcast with Hilary Parker, and The Effort Report podcast with Elizabeth Matsui. Roger is a Fellow of the American Statistical Association and is the recipient of the Mortimer Spiegelman Award from the American Public Health Association, which honors a statistician who has made outstanding contributions to public health. He can be found on Twitter and GitHub at @rdpeng

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Hilary Parker Roger Peng

Alan Feder | Categorical Embeddings: New Ways to Simplify Complex Data | RStudio

When building a predictive model in R, many of the functions (such as lm(), glm(), randomForest, xgboost, or neural networks in keras) require that all input variables are numeric. If your data has categorical variables, you may have to choose between ignoring some of your data and creating too many new columns.

Categorical embeddings are a relatively new technique, built on methods popularized in Natural Language Processing, that helps models solve this problem and can help you understand more about the categories themselves.

While there are a number of online tutorials on how to use Keras (usually in Python) to create these embeddings, this talk will use embed::step_embed(), an extension of the recipes package, to create the embeddings.
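To make the "too many new columns" trade-off concrete, here is a minimal sketch in plain Python. It is not the talk's code and does not use embed::step_embed(). The ZIP-like column, the embedding width of 4, and the helper names are all hypothetical, and the embedding vectors are random placeholders, whereas a real embedding would be learned while fitting a model to an outcome.

```python
import random


def one_hot_encode(values):
    """Map each value to a 0/1 indicator row with one column per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def embed(values, dim, seed=0):
    """Map each value to a dense vector of length `dim` via a lookup table.

    Assumption: the vectors are random placeholders. In practice they are
    learned as weights of a model trained against an outcome variable.
    """
    rng = random.Random(seed)
    table = {
        cat: [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for cat in sorted(set(values))
    }
    return table, [table[value] for value in values]


# Hypothetical high-cardinality column: 500 distinct ZIP-like labels, 2000 rows.
zips = [f"zip_{i % 500}" for i in range(2000)]

categories, one_hot_rows = one_hot_encode(zips)
table, embedded_rows = embed(zips, dim=4)

print("one-hot columns:  ", len(one_hot_rows[0]))
print("embedding columns:", len(embedded_rows[0]))
```

The one-hot version needs one column per category, while the embedded version needs only as many columns as the chosen embedding width, whatever the number of categories.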

About Alan: Alan Feder is a Principal Data Scientist at Invesco, where he uses as much R as possible to solve problems and build products throughout the company. Previously, he worked as a data scientist at AIG and an actuary at Swiss Re. He studied statistics and mathematics at Columbia University. He is unreasonably excited to spread the word about categorical embeddings. Alan lives in New York City with his wife, Ashira, and two children, Matan and Sarit.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Alan Feder

Alex Cookson | The Power of Great Datasets | RStudio

There are a few classic datasets, like mtcars, nycflights, or Titanic passengers. They’re okay, but they leave something to be desired for folks learning R: they’re kind of boring.

There’s a big difference between “Okay Datasets” and “Great Datasets”. Great Datasets prompt you to exclaim, “That’s so cool!” They get your blood pumping and mind racing with questions you want answered. They give tremendous motivation to answer those questions. And in answering those questions, you’ll probably learn some R.

I want you to curate Great Datasets. You’ll contribute to the richness of our community, you’ll learn some R yourself, and you’ll feel fantastic when someone finds your Great Dataset and exclaims, “That’s so cool!”

About Alex: Alex Cookson helps the Customer Intelligence team at the Royal Canadian Mint make the most of their data. When he’s not working on A/B testing, recommendation engines, or exploratory data analysis at the Mint, he can be found participating in Tidy Tuesday or thinking up cool datasets to explore. And when he’s not doing that, he’s probably cycling around Toronto or doting on his two cats, Tom Tom and Ruby.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Package Development Alex Cookson

Athanasia M. Mowinckel | Make a package - Make some friends | RStudio

In 2017, I had never exposed my code to anyone other than a select few, and I was terrified. I had some functions made from a colleague’s script that I thought might be useful for others, and dared myself to make a package and push it to GitHub.

Instead of the dreaded ridiculing of poor code and development, people embraced the package and helped us make it better. Within just a couple of days, pull requests came from others to help us improve the code, implement tests, and improve documentation. I learned so much just by looking through the PRs and seeing how others worked.

Rather than make me shy away from development, the R neuro community’s positive feedback has helped me find a new interest and joy in developing tools.

About Athanasia: Athanasia M. Mowinckel is a staff scientist at the Center for Lifespan Changes in Brain and Cognition, at the University of Oslo. She has a background in cognitive psychology, and uses R for almost everything. She goes by the nickname “Mo” (closer to ‘Mou’ than ‘Moe’), and is a member of the R-Ladies Global team.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Athanasia M. Mowinckel Package Development

Carson Sievert | Custom theming in Shiny & R Markdown with bslib & thematic | RStudio

Custom theming in Shiny and R Markdown often requires writing styling rules in both CSS and R. In particular, styles for HTML content (e.g., actionButton(), tabsetPanel(), titlePanel(), etc.) derive from Bootstrap CSS, so customization is traditionally done by overwriting that CSS, which is difficult to do 100% correctly. The {bslib} package helps solve this problem by making it easy to customize (any version of) Bootstrap CSS defaults from R. However, this only solves part of the problem since CSS doesn’t necessarily affect output(s) rendered by R, such as plotOutput(). The {thematic} package helps solve this problem by providing auto theming of plotOutput()s (based on CSS) as well as a simple interface for styling any R graphic for any output format.

About Carson: Carson is a software engineer at RStudio working on R packages such as shiny, shinymeta, and plotly. His book “Interactive data visualization with R, plotly, and shiny”, published by CRC Press, is also freely available online at plotly-r.com.

bslib rstudio Shiny shinymeta thematic Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Carson Sievert Bslib Thematic

Chelsea Parlett-Pelleriti | Hands-on ways to remotely teach data science are invaluable | RStudio

Full title: With more learning occurring virtually or in hybrid mode, hands-on ways to remotely teach data science are invaluable

With more learning occurring virtually or in hybrid mode, hands-on ways to remotely teach DS are invaluable. Guided simulation exercises in R allow learners to explore concepts deeply, on their own time, and with others. They can also experiment with the simulations, try out edge cases, and challenge their assumptions, leading to more fruitful discussions. The comparison between coefficient estimates in regular, LASSO, and ridge regression, or how PCA performs when data are related, are great examples of concepts where guided simulations can encourage learners to build intuitive knowledge. This talk explores how to use simulation exercises in R to help learners explore DS concepts and provides examples.
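As an illustration of the kind of guided simulation described above, here is a small sketch in plain Python rather than R. It is not taken from the talk. It uses a single predictor with no intercept so that all three estimators have closed forms, and the true coefficient, sample size, and penalty values are arbitrary assumptions for learners to vary.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def simulate(true_beta=0.5, n=50, penalty=10.0, seed=1):
    """Simulate y = true_beta * x + noise and return (ols, ridge, lasso) estimates.

    Objectives for a single coefficient b with no intercept:
      OLS:   0.5 * sum((y - x*b)^2)
      ridge: 0.5 * sum((y - x*b)^2) + 0.5 * penalty * b^2
      LASSO: 0.5 * sum((y - x*b)^2) + penalty * |b|
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    ols = sxy / sxx
    ridge = sxy / (sxx + penalty)
    lasso = soft_threshold(sxy, penalty) / sxx
    return ols, ridge, lasso


# Learners can vary the penalty and watch how each estimate responds.
for penalty in (0.0, 5.0, 20.0, 1000.0):
    ols, ridge, lasso = simulate(penalty=penalty)
    print(f"penalty={penalty:7.1f}  ols={ols:.3f}  ridge={ridge:.3f}  lasso={lasso:.3f}")
```

Running the loop lets learners see the qualitative difference for themselves: ridge shrinks the estimate smoothly toward zero, while LASSO subtracts a fixed amount and can set the estimate to exactly zero once the penalty is large enough.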

About Chelsea: Chelsea Parlett-Pelleriti is a PhD Candidate and full-time instructional faculty teaching Data Science at Chapman University. Her research centers around how we can use statistics and machine learning to improve the way we analyze behavioral data. In her free time, you can find Chelsea on Twitter making stats memes or stats TikToks. She also writes about statistics, machine learning, and using R for various blogs.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Chelsea Parlett-Pelleriti

Colin Rundel | parsermd - parsing R Markdown for fun and profit | RStudio

parsermd is a new R package for parsing and programmatically interacting with R Markdown (Rmd) documents. This package implements a formal grammar for Rmd documents in C++ using Boost’s Spirit X3 library and provides additional user-facing functions for the resulting abstract syntax tree. In this talk we will provide background on the structure and grammar of Rmd documents as well as discuss the ways in which the parsing of these documents enables a variety of automatable tasks. Specifically, we will focus on demonstrating how these tools can be used to provide automated feedback on student submissions in a statistical programming course.
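To give a feel for what parsing an Rmd document into nodes enables, here is a rough sketch in plain Python. It does not use parsermd or reproduce its grammar or API: it is a simple line-based splitter, and the function names, the node layout, and the required chunk labels are all hypothetical. It ignores the YAML header and inline code.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in a fenced block
CHUNK_OPEN = re.compile(r"^" + re.escape(FENCE) + r"\{(\w+)\s*([^}]*)\}\s*$")
CHUNK_CLOSE = re.compile(r"^" + re.escape(FENCE) + r"\s*$")


def split_rmd(text):
    """Split Rmd text into a flat list of markdown and code-chunk nodes."""
    nodes, buffer, header = [], [], None
    for line in text.splitlines():
        if header is None:
            match = CHUNK_OPEN.match(line)
            if match:
                if buffer:
                    nodes.append({"type": "markdown", "lines": buffer})
                options = [o.strip() for o in match.group(2).split(",") if o.strip()]
                header = {"engine": match.group(1), "options": options}
                buffer = []
            else:
                buffer.append(line)
        elif CHUNK_CLOSE.match(line):
            nodes.append({"type": "chunk", "header": header, "lines": buffer})
            buffer, header = [], None
        else:
            buffer.append(line)
    if buffer and header is None:
        nodes.append({"type": "markdown", "lines": buffer})
    return nodes


def chunk_labels(nodes):
    """Return the first header option of each chunk, treated here as its label."""
    return [
        node["header"]["options"][0]
        for node in nodes
        if node["type"] == "chunk" and node["header"]["options"]
    ]


# Hypothetical student submission.
sample = "\n".join([
    "# Homework 1",
    "",
    FENCE + "{r setup, echo=FALSE}",
    "library(dplyr)",
    FENCE,
    "",
    "Some text.",
    "",
    FENCE + "{r answer}",
    "mean(x)",
    FENCE,
])

nodes = split_rmd(sample)
required = {"setup", "answer", "plot"}  # hypothetical rubric
missing = sorted(required - set(chunk_labels(nodes)))
print("missing chunks:", missing)
```

A check like this (does the submission contain the expected chunks?) is one example of the automated feedback the talk describes, although a formal grammar handles many cases that a line-based splitter like this one does not.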

About Colin: Colin is a lecturer in Statistics and Data Science at the University of Edinburgh. He has been teaching statistics and data science courses, with a focus on computing and spatial modeling, for the last 8 years.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Colin Rundel

Danielle Oberdier | How Content Makes the Data Go ‘Round | RStudio

What makes a successful data science community thrive across industries? A recent Aflac WorkForces Report showed that professionals who are engaged in a community within their industry are 70% more likely to be satisfied with their work. I believe anyone can and should create content about data. In this talk, I will direct your attention towards 1) the ways that content creation can lead to heightened data science opportunities, 2) how to know which types of content media (podcasts, blogs, video) are right for you, and 3) how to leverage social media and networking connections to make your content reach the right audiences. I hope to inspire listeners to create their own content and online brands as resources for fellow R community members.

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Danielle Oberdier

Danielle Smalls-Perkins & Dorris Scott | Reflections on creating the Mi-R community | RStudio

Full title: Your R is My R too: Reflections on creating the Mi-R community

While the R community has made strides in increasing the representation and participation for women and users from underrepresented regions, there are still members of the R community that have expressed desires for a more inclusive space in addition to these strides. In addition, there are unique challenges that underrepresented R users experience in their respective workspaces or academic environments. In late February of 2020, Danielle Smalls-Perkins and Dorris Scott created Mi-R (Minorities in R) as a result of their various experiences both in and outside the R community. The purpose of this talk is to reflect on the challenges, highlights, and future directions of the first six months since the creation of Mi-R.

About Danielle: Danielle Smalls-Perkins co-founded MiR Community with the hope that the R community would continue to encourage the inclusion and recognition of contributions made from R users of diverse backgrounds. She loves to use R for understanding and storytelling. Danielle currently works as a Senior Strategist in Google’s Trust and Safety Team. She advocates for model fairness, interpretability, and reducing harmful outcomes of algorithmic decision-making on vulnerable populations.

About Dorris: Dorris Scott is the GIS Librarian and Social Science Data Curator at Washington University – St. Louis, where she provides consultation on projects that use geospatial data along with providing training in various GIS software, programming applications of geospatial data, and data management. She also serves as a liaison between Washington University Libraries and social science departments assisting faculty with their data needs such as data management and data curation. Dorris received her PhD in Geography from the University of Georgia, with a specialization in GIS applications for public health

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Dorris Scott Danielle Smalls-Perkins

Dean Marchiori | A retrospective on a year of commercial data science projects in R | RStudio

Full title: How reproducible am I? A retrospective on a year of commercial data science projects in R

Reproducibility is a critical aspect of science that enables trust and communication. In R, many tools exist to bring the best practices of reproducibility into the hands of data scientists. However, outside of a research setting, how does reproducibility hold up in commercial data science projects? In this talk I take an honest retrospective of my own commercial R projects in the last year. I look at the various types of analyses completed, and which workflows were selected and why. Through this process we can learn how workflow choices may help in the short term but hinder in the long term. More importantly, what can be done to strike the balance between progress and perfection when doing data science in the wild?

About Dean: Dean Marchiori is a Statistician based in Sydney, Australia. He currently works with Endeavour Energy as a Senior Data Scientist modelling bushfire and vegetation risk on the electricity network. Dean’s career started in finance as an equities trader before moving into advanced analytics, where he has worked with some of Australia’s largest organisations. His professional interests are in geospatial analysis, time series modelling and the R programming language. In 2019, Dean was named one of the top 10 analytics leaders in Australia by IAPA. He is also recognised as an Accredited Statistician with the Statistical Society of Australia. He holds a Bachelor of Science in Mathematics (awarded with University Medal) and a Masters degree in Applied Finance. Outside of work Dean enjoys bodysurfing, running and spending time with his wife and two boys

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Dean Marchiori

Garrick Aden-Buie | xaringan Playground: Using xaringan to learn web development | RStudio

xaringan is a quirky package that extends R Markdown to create beautiful web-based HTML slides. Some of xaringan’s quirks come from the JavaScript library it uses, remarkjs, and some from the unusual naming scheme xaringan uses for its functions. But under this quirky exterior lies a powerful tool for learning and practicing web development, especially when combined with infinite_moon_reader() for immediate feedback. In this talk I’ll cover some basic web concepts that illustrate how fun and rewarding it can be to learn HTML, CSS and JavaScript while building awesome slides in R Markdown.

About Garrick: The workshop will be led by friend of RStudio, Garrick Aden-Buie, R developer and educator and Data Scientist at Moffitt Cancer Center, where he uses Shiny to enable and accelerate cancer research

Garrick Aden-Buie

rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Garrick Aden-Buie Xaringan HTML

John Helveston | Using formr to create R-powered surveys with individualized feedback | RStudio

This talk demonstrates how the formr study framework extends the power and flexibility of R to surveys. Using R and RMarkdown code, researchers and teachers can use the formr platform to generate both simple surveys and complex studies with individualized feedback. The platform is built on a web-based application programming interface for R via OpenCPU, enabling complex features such as automated email and text message reminders, adaptive testing, graphical and interactive feedback, and integration with external data sources. In this talk, I introduce some of the formr basics and showcase two examples of how I have used it, including making conjoint surveys with randomized images and timed, randomized quizzes for my students.

About John: John Paul Helveston is an Assistant Professor in the Engineering Management and Systems Engineering Department at the George Washington University. He studies technological change, with a particular interest in accelerating the transition to environmentally sustainable and energy-saving technologies. His research centers around how consumer preferences, market dynamics, and policy affect the emergence of critical technologies, such as electric vehicles and solar energy. He is an expert on China’s rapidly emerging electric vehicle industry as well as the critical relationship between the US and China in developing and mass producing low carbon energy technologies. He applies an interdisciplinary approach to research, with expertise in discrete choice modeling and conjoint analysis as well as interview-based case studies. He has conducted extensive fieldwork in China, collaborating with colleagues at Tsinghua university, Beijing Normal University, and China’s State Information Center on past projects. He is a fluent speaker of Mandarin Chinese and also an award-winning swing dancer. John holds a Ph.D. and M.S. in Engineering and Public Policy from Carnegie Mellon University and a B.S. in Engineering Science and Mechanics from Virginia Tech

rmarkdown rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate John Helveston Formr

Lucy D’Agostino McGowan | Designing Randomized Studies using Shiny | RStudio

This talk will walk through building a self-contained randomized study using Shiny and learnr modules. We will discuss building informed consent, the randomization process, demographic surveys, and R-based studies into a single online framework to allow users to seamlessly enroll and participate in randomized studies via a single URL. The talk will include both practical recommendations as well as technical code snippets.
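The randomization step can be sketched in a few lines. The example below is in Python rather than Shiny, and it assumes permuted-block randomization, which the abstract does not specify; the function name block_randomize and the arm names are hypothetical.

```python
import random


def block_randomize(n_participants, arms=("control", "treatment"), seed=None):
    """Assign participants to arms in shuffled blocks so group sizes stay balanced."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    assignments = []
    while len(assignments) < n_participants:
        block = list(arms)   # each block contains every arm exactly once
        rng.shuffle(block)   # order within the block is random
        assignments.extend(block)
    return assignments[:n_participants]


allocation = block_randomize(10, seed=2021)
print(allocation.count("control"), allocation.count("treatment"))
```

In an enrollment app, the next unused entry of such a list would be handed to each participant after they give consent, so the arm sizes never differ by more than one block.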

About Lucy: Lucy D’Agostino McGowan is an assistant professor of statistics at Wake Forest University, where she leads the WFU Data Science Lab. Her research focuses on causal inference, human-data interaction, and statistical communication

learnr rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Lucy D'Agostino McGowan

Michael Chirico | Making .pot-ery with R: Translations in R Packages | RStudio

The R community is globally distributed and R itself is available with messages in 14 languages. Adding translations for non-native English-speaking users of your package can ease their experience and empower them to build better things with less frustration (though please note that “object of type ‘closure’ is not subsettable” is equally inscrutable in all human languages).

In this talk, I will cover translations in R packages – how to implement them, why to do so, and how to maintain them. This will summarize and extend learnings based on our experience adding Mandarin translations to data.table and culminating in the potools package.

About Michael: Michael Chirico is a data scientist working on compute memory efficiency at Google. Before that he worked at Grab in Singapore and earlier got his PhD in Economics at the University of Pennsylvania. He is passionate about making tools to empower others who work with data (most of this energy is directed towards data.table) and loves learning languages (at various middling levels of proficiency in Japanese, Spanish, and Mandarin, with goals to learn Cantonese, Hokkien, Vietnamese and Bahasa)

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Michael Chirico

Nicholas Pylypiw | Racial Equity Dashboard: Unpacking Systemic Inequity | RStudio

At Cape Fear Collective, we’re redefining what a town square looks like in our community, serving as a place where all people, organizations, and ideas can come together to effect real, lasting, and systemic change. By merging cutting edge data science with an emphasis on equity and the lived experience of our most marginalized communities, Cape Fear Collective supports Southeastern North Carolina’s front line organizations in combating poverty, racism, poor health and education outcomes, and socio-economic disparities. This talk is about how we bring that model to life through our Racial Equity Dashboard, from data sourcing, to modeling and, ultimately, action.

About Nick: Nick is the Director of Data Science at Cape Fear Collective, a nonprofit that supports Southeastern North Carolina’s front line organizations in combating poverty, racism, poor health and education outcomes, and socio-economic disparities. Prior to CFC, he honed his data science and consulting skills in the marketing analytics space, transforming the way Fortune 500+ companies (Lowe’s, Southwest Airlines, P&G, and many others) think about their customer strategy and value proposition. He lives in Raleigh, North Carolina

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Nicholas Pylypiw Dashboard

Pamela E. Pairo | An easy and friendly way to build your multilingual website | RStudio

Having a personal website is a great way to share our experiences with other people, and it also allows us to improve our communication skills and expand our networks. Moreover, if the website is multilingual, its reach extends considerably by facilitating the exchange of ideas. I will give the key steps, some tips, and important considerations to bear in mind when creating a multilingual website using Blogdown, Hugo, and Netlify. Although a multilingual website demands more effort, R enables us to build a website easily and keep it updated. I aim to help and encourage others to build their own websites to promote the exchange of experiences among people with different native languages.

About Pamela: Pamela E. Pairo holds a Ph.D. in Biological Sciences from the University of Buenos Aires, with expertise in community ecology. One of her research lines focuses on analyzing the impact of human activities on the diversity and composition of biological communities, with particular interest in arthropods. In addition, she is interested in studying the spatio-temporal patterns of dengue disease in Argentina. She is also a teaching assistant in statistics at the Argentine University of Enterprise (UADE)

blogdown rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Pamela E. Pairo

Richard Vogg | Examples of simulated datasets that bring value to a data-driven company | RStudio

Full title: How I became a Data Composer – examples of simulated datasets that bring value to a data-driven company

How can I get the buy-in from business partners to use more advanced techniques? What can I do to make a data project involving several teams more efficient? And how can I train analysts who do not (yet) have access to sensitive data? A good data composer is skilled at creating suitable data quickly and efficiently. R has many functions and packages that help with simulating independent variables and composing those in a meaningful way. In this talk, I will share how I started creating data and how this skill helped me with solving some of the issues described above. Showing a few examples – of small, medium-sized, and large data composition – I want to encourage attendees to simulate data and enrich their data skillset.
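As a rough illustration of what "composing" data can look like, the sketch below simulates independent variables and then derives a dependent one from them. It is written in Python rather than R, and every detail (the function simulate_customers, the column names, the distributions and coefficients) is invented for the example rather than taken from the talk.

```python
import random


def simulate_customers(n, seed=42):
    """Create n fake customer records with a plausible age-balance relationship."""
    rng = random.Random(seed)  # seeded so the dataset can be regenerated exactly
    rows = []
    for customer_id in range(n):
        # Independent variables, drawn separately.
        age = rng.randint(18, 80)
        segment = rng.choice(["retail", "premium"])
        # Dependent variable: composed from the others plus noise, floored at zero.
        base = 500 if segment == "retail" else 2000
        balance = max(0.0, rng.gauss(base + 20 * (age - 18), 300))
        rows.append({"id": customer_id, "age": age,
                     "segment": segment, "balance": round(balance, 2)})
    return rows


for row in simulate_customers(3):
    print(row)
```

Because no real customer appears in the output, a table like this can be shared with analysts who do not yet have access to sensitive data, while still behaving enough like the real thing to practice on.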

About Richard: Richard Vogg studied mathematics at TU Kaiserslautern, Germany, where he focused on statistics and obtained a Master’s degree. In recent years he worked as a Senior Business Analyst at Evalueserve in Chile, analyzing data for a major US bank. At the end of 2020, he moved back to Germany. Richard is a fan of applied statistics and storytelling with data. Outside of R, he enjoys playing the ukulele, trumpet, and didgeridoo

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Richard Vogg Simulated Data

Simon Couch | tidymodels/stacks: A Grammar for Stacked Ensemble Modeling | RStudio

Full title: tidymodels/stacks, Or, In Preparation for Pesto: A Grammar for Stacked Ensemble Modeling

Through a community survey conducted over the summer, the RStudio tidymodels team learned that users felt the #1 priority for future development in the tidymodels package ecosystem should be ensembling, a statistical modeling technique involving the synthesis of multiple learning algorithms to improve predictive performance. This December, we were delighted to announce the initial release of stacks, a package for tidymodels-aligned ensembling. A particularly statistically-involved pesto recipe will help us get a sense for how the package works and how it advances the tidymodels package ecosystem as a whole.

About Simon: Simon Couch is an R developer and statistics student at Reed College, where he is entering the final semester of his undergraduate degree. He co-authors and maintains R packages including broom, infer, and stacks, leads trainings and workshops as an RStudio-certified tidyverse trainer, and researches in algorithmic data privacy. He interned on the RStudio tidymodels team in summer 2020, and is currently applying to doctoral programs in statistics

Simon Couch

infer rstudio tidymodels tidyverse Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Simon Couch Stacks

Wolfram King | Lifelong Learning with R Weekly | RStudio

R Weekly is a weekly newsletter with many great R blog posts, tutorials, and other formats of resources.

R Weekly wants to keep track of these great things in the R community and make it more accessible to everyone.

This is a warm and welcoming place. The team welcomes everyone who wants to contribute to the R community.

In this talk I will cover these 6 topics:

How to use the R Weekly website
Why I created R Weekly
How to Contribute to R Weekly
How to release a new post
How to join the team
Learning from building the community

About Wolfram: Wolfram King is the founder of the R Weekly project. He is an active member of the R community and has several popular R open-source projects on GitHub

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Wolfram King

ZJ | Easy larger-than-RAM data manipulation with {disk.frame} | RStudio

Learn how to handle 100GBs of data with ease using {disk.frame} - the larger-than-RAM-data manipulation package.

R loads data in its entirety into RAM. However, RAM is a precious resource and often runs out. That’s why most R users will have run into the “cannot allocate vector of size xxB” error at some point.

However, the need to handle larger-than-RAM data doesn’t go away just because RAM isn’t large enough. So many useRs turn to big data tools like Spark for the task. In this talk, I will make the case that {disk.frame} is sufficient and often preferable for manipulating larger-than-RAM data that fit on disk. I will show how you can apply familiar {dplyr}-verbs to manipulate larger-than-RAM data with {disk.frame}.
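The general idea behind larger-than-RAM manipulation is to process the data in chunks that each fit in memory and then combine the partial results. The sketch below shows that pattern in Python with only the standard library; it is not {disk.frame}'s API or storage format, and the function chunked_group_sum and its parameters are assumptions made for illustration.

```python
import csv
from collections import defaultdict
from itertools import islice


def chunked_group_sum(path, key, value, chunk_size=10_000):
    """Sum `value` per `key` in a CSV file while holding one chunk of rows at a time."""
    totals = defaultdict(float)
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_size))  # at most chunk_size rows in memory
            if not chunk:
                break
            # Aggregate within the chunk first...
            partial = defaultdict(float)
            for row in chunk:
                partial[row[key]] += float(row[value])
            # ...then fold the chunk's partial result into the running totals.
            for group, subtotal in partial.items():
                totals[group] += subtotal
    return dict(totals)
```

A call such as chunked_group_sum("sales.csv", "region", "amount") (file and column names hypothetical) never needs more than chunk_size rows in memory, which is why a group-wise sum can work on files far larger than RAM.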

About ZJ: ZJ is a machine learning developer based in Melbourne, Australia. He regularly contributes to open source projects. He has more than 10 years of experience in banking before joining the tech sector. In his free time, he enjoys playing Go/Baduk/Weiqi

dplyr rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate ZJ Disk.frame

Kara Woo | Always look on the bright side of plots | RStudio

Everyone who creates visualizations in R is bound to make mistakes that prevent their plots from looking as they should. Sometimes, these mistakes create beautiful “accidental aRt”, though other times they’re just plain frustrating. Either way, however, there’s something to be learned. This talk will draw on years of watching both the ggplot2 issue tracker and the @accidental__aRt twitter account to highlight some common plot foibles and explain what they can teach us about how ggplot2 works.

About Kara: Kara Woo is a research scientist in data curation at Sage Bionetworks, where she builds tools to help researchers document and share their data. Kara is a core developer of ggplot2 and collects data visualizations gone beautifully wrong on a blog called accidental aRt

ggplot2 rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Kara Woo Accidental Art

Kate Hertweck | R training and documentation for different levels of expertise | RStudio

Full title: Making the jump from learning to applying: R training and documentation for different levels of expertise

How does someone make the leap from learning R to actively applying R in professional work? At what point (if ever!) do we get to call ourselves “experts in R”?

This talk explores what differentiates novice, practitioner, and expert R programmers, and how transitions between these stages occur. I’ll discuss the type of support required for R users to move from one level of expertise to the next, and how different types of training and documentation can support R users at each level.

Understanding variable levels of education among R practitioners supports our own professional work, from collaborative coding to package development, and helps build a bigger, more inclusive R community.

About Kate: Kate Hertweck is the bioinformatics training manager at Fred Hutchinson Cancer Research Center, where they develop and teach courses on reproducible computational methods as a part of fredhutch.io. Kate’s graduate training at University of Missouri in genomic evolution of plants was followed by a postdoctoral fellowship at the National Evolutionary Synthesis Center (NESCent) at Duke University, where they fell in love with R and began working exclusively in computational biology. Kate then spent four years as an assistant professor teaching bioinformatics, genomics, and plant taxonomy before transitioning to biomedical research training. Kate has been involved in The Carpentries, a non-profit organization that teaches reproducible computational methods, since 2014, serving as a leader in community governance since 2016. When not being an overenthusiastic instructor, Kate likes to spend their time doing fiber arts (knitting, crochet) and enjoying all things science fiction

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Kate Hertweck

Malcolm Barrett | You’re Already Ready: Zen and the Art of R Package Development | RStudio

R packages make it easier to write robust, reproducible code, and modern tools in R development like usethis make it easy to work with packages. When you write R packages, you also unlock a whole ecosystem of tools that will make it easier to test, document, and share your code. Despite these benefits, many believe package development is too advanced for them or that they have nothing to offer. A fundamental belief in Zen is that you are already complete, that you already have everything you need. I’ll talk about why your project is already an R package, why you’re already an R package developer, and why you already have the skills to walk the path of development.

About Malcolm: Malcolm Barrett is a Clinical Research Data Scientist at Teladoc Health, an epidemiologist, and an R developer. He is also an organizer for the Los Angeles R Users Group. Malcolm is the author of several R packages, including ggdag and precisely. Previously, he was an intern at RStudio and spent two years of service in AmeriCorps. In 2013 and 2014, while serving in AmeriCorps, Malcolm lived in the Zen Center of New York City, where he is still a student

rstudio usethis Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Malcolm Barrett Package Development

Marcus Adams | Putting a GMP Shiny App into Production | RStudio

Full title: Not The App We Deserve. The App We Need: Putting a GMP Shiny App into Production

In February 2020, the Digital Proactive Process Analytics (DPPA) group within Merck’s manufacturing division officially launched a Shiny app to automate the creation of Continuous Process Verification (CPV) reports into production. That’s right – the almighty, mysterious, coveted production. From a technical perspective, the app is nothing particularly special (other than getting LaTeX successfully installed to support the use of R Markdown). Users enter a few parameters and out pops a PDF with a series of statistical analyses of a product’s quality testing data. The R blogosphere is filled with examples of similar Shiny apps.

What mattered was the app was in production, and furthermore it was approved for GMP use. This meant these reports could be submitted to the FDA and other regulatory agencies. This meant the data could be used to support product release decisions. This meant Merck’s engineers were about to save thousands of hours per year in compiling data, generating charts, and calculating summary statistics. This was the app manufacturing sites needed.

Most of the work in getting this app into production was not implementing the top-level features. Sorry, no discussion of fancy statistical process control methods here. Instead this talk will discuss some of the many things the development team (none of whom came from a software development background) needed to learn in order to create a robust, secure, and maintainable production application.

About Marcus: Marcus Adams is an Associate Director, Engineering at the biopharmaceutical company Merck. He earned his BEng and MS in Chemical Engineering from the University of Delaware and Villanova University, respectively. His more than a decade of experience at Merck spans the biopharmaceutical spectrum and includes experience in pre-clinical PK/PD modeling, product commercialization, in-line technology support, procurement, and vaccine distribution technology development. Currently, he works as a part of the Digital Proactive Process Analytics team, leveraging Merck’s Big Data Platform in the development of manufacturing information data models, report automation tools, and integrated-systems analysis applications. His professional interests include effective digital visualization, reproducible research/analysis, and convincing his coworkers of the diverse, flourishing world beyond Microsoft Excel

rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Marcus Adams GMP

Max Kuhn | What’s new in tidymodels? | RStudio

tidymodels is a collection of packages for modeling using a tidy interface. In the last year there have been numerous improvements and extensions. This talk gives an overview of additional tuning methods, new extension packages for models and recipes, and other features.

About Max: Max Kuhn is a software engineer at RStudio. He is currently working on improving R’s modeling capabilities. He was a Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He applied models in the pharmaceutical and diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics. Max is the author of numerous R packages for techniques in machine learning and reproducible research and is an Associate Editor for the Journal of Statistical Software. He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the Ziegel award from the American Statistical Association, recognizing the best book reviewed in Technometrics in 2015. Their latest book, Feature Engineering and Selection, was published in 2019

Max Kuhn

rstudio tidymodels Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Max Kuhn

Maya Gans | Starting an R Book Club: Cooking Up Friendships in Isolation | RStudio

Amidst a global pandemic there’s been one consistency in my life: every Tuesday a group of friends meets to discuss Hadley Wickham’s Advanced R. I crowdsourced interest using the R4DS Slack and the results were magical: a group of incredibly curious and generous people motivated to learn and teach one another emerged. The meetings evolved from a group of strangers giving timid presentations to a safe space where we share and improve upon personal applications. The one club has grown to three regional cohorts and become a model for discussing other books too. This talk will go over the structure of our meetings in hopes of empowering others to start their own book clubs, showcasing a different way people can create and engage in communities.

About Maya: I am a mycologist turned data scientist. I love statistics, data visualization, and all things JavaScript. I am currently an intern at RStudio designing a visual block-based programming language. I create music-related infographics for JamBase.com. When I’m not coding, I’m climbing tall mountains

Hadley Wickham

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Maya Gans Book Club Advanced R

Michael Chow | Bringing the Tidyverse to Python with Siuba | RStudio

Last January I left my job to spend a year developing siuba, a python port of dplyr. At its core, this decision was driven by a decade of watching python and R users produce similar analyses, but in very different ways.

In this talk, I’ll discuss 3 ways siuba enables R users to transfer their hard-earned programming knowledge to python: (1) leveraging the power of dplyr syntax, (2) options to generate SQL code, and (3) working with the plotnine plotting library.

Looking back, I’ll consider two critical pieces that have helped me develop siuba: using it to livecode TidyTuesday analyses, and building an interactive tutorial for absolute beginners.

About Michael: Michael Chow is a data scientist and learning researcher. He serves as a co-director at Code for Philly. In past lives, he worked on adaptive assessment tools in ed tech, and received a PhD in cognitive psychology from Princeton University

Michael Chow

dplyr plotnine rstudio tidyverse Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Siuba Michael Chow SQL

Mine Çetinkaya-Rundel | Feedback at scale | RStudio

As enrollments in statistics and data science courses grow and as these courses become more computational, educators are faced with an interesting challenge – providing timely and meaningful feedback, particularly with online delivery of courses. The simplest solution is using assignments that are easier to auto-grade, e.g. multiple-choice questions, simplistic coding exercises, but it is impossible to assess mastery of the entire data science cycle using only these types of exercises. In this talk I will discuss writing effective learnr exercises, providing useful and motivating feedback with gradethis, distributing them at scale online and as an R package, and collecting student data for formative assessment with learnrhash.

About Mine: Mine Çetinkaya-Rundel is Professional Educator and Data Scientist at RStudio as well as Senior Lecturer in the School of Mathematics at University of Edinburgh (on leave from Department of Statistical Science at Duke University). Mine’s work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education as well as pedagogical approaches for enhancing retention of women and under-represented minorities in STEM. Mine works on integrating computation into the undergraduate statistics curriculum, using reproducible research methodologies and analysis of real and complex datasets. She also organizes ASA DataFest and works on the OpenIntro project. She is also the creator and maintainer of datasciencebox.org and she teaches the popular Statistics with R MOOC on Coursera

Mine Çetinkaya-Rundel

gradethis learnr rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Mine Çetinkaya-Rundel Feedback Learnr Learnrhash

Neal Richardson | Bigger Data With Ease Using Apache Arrow | RStudio

The Apache Arrow project enables data scientists using R, Python, and other languages to work with large datasets efficiently and with interactive speed. Arrow is so fast at some workflows that it seems to defy reality–or at least the limits of R’s capabilities. This talk examines the unique characteristics of the Arrow project that enable it to redefine what is possible in R. The talk also highlights some of the latest developments in the arrow R package, including how you can query and manipulate multi-file datasets, and it presents strategies for speeding up workflows by up to 100x.

About Neal: Currently Director of Engineering at Ursa Labs / RStudio. Previously led product and engineering at Crunch.io. Ph.D. in Political Science from the University of California, Berkeley

Neal Richardson

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Forcats Rstats Open Source OSS Reticulate Neal Richardson Apache Apache Arrow Big Data

Rika Gorn | From Zero to Hero: Best practices for setting up Rstudio Team in the Cloud | RStudio

Learn best practices for setting up the entire RStudio Team infrastructure (Server Pro, Connect, and Package Manager) from the perspective of a data scientist and for a data science audience, especially those who have never worked with servers, AWS, or bash. This talk will also be applicable to data scientists looking to start on an engineering project outside of RStudio.

I started out as a complete novice, and throughout my learning experience I noticed a distinct lack of resources for non-engineers. This talk will focus on best practices for AWS architecture and cloud formation, key security issues such as SSL and HTTPS, server configurations, deployment errors, and most importantly resources that are understandable for data scientists just getting into the data engineering or devops space.

About Rika: Rika Gorn is the Manager of Business Intelligence at Spring Health - a mental healthcare tech start-up that provides comprehensive mental healthcare benefits. Previously, she worked on quality assurance for a mobile mental health team at Coordinated Behavioral Care, data analytics at Covenant House International, strategic management and evaluation at TCC Group, and program analysis at the Vera Institute of Justice. Rika received her Bachelors in Political Science from Hunter College and her Masters in Public Administration at the NYU Wagner School of Public Service. Rika is also a proud board member of R-Ladies NYC

RStudio Team

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Devops Rika Gorn

Riva Quiroga | Learning to program in R with a “communicative approach” | RStudio

Full title: How to do things with words: learning to program in R with a “communicative approach”

Textbooks for learning a new language always start the same: you learn to say hello, to introduce yourself, and some simple and useful sentences to communicate with others. In language teaching, this is called a “communicative approach”, and is based on the idea that learning a language successfully comes through having to communicate real meaning to real people. This is what I expected to find when I first tried to learn R seven years ago. Sadly, I got stuck in resources that started with definitions of abstract concepts and no real examples of how to say things with data. In this talk I will discuss the benefits of adopting a communicative approach and how to implement it when teaching/learning R, writing documentation, and writing code that will be read by other human beings.

About Riva: I like to organize R related things, like meetups (RLadies Santiago & RLadies Valparaíso), conferences (LatinR, satRday Santiago), book translations (R4DS in Spanish), and data projects (#DatosDeMiercoles). I am an editor at The Programming Historian, and I am currently pursuing a PhD in Linguistics

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Riva Quiroga

Sean Lopp | R & Python: Going Steady | RStudio

While there has been a lot of excitement about the R and Python love story, there are still misconceptions that individuals, teams, or organizations must choose between R and Python. This talk will explain why this false choice exists, debunk the myths that cause teams to be stuck with only one tool, and clarify how data scientists can use both languages to be more effective. We will explore this love story’s blossoming relationship by looking at updates to RStudio’s packages and products that make it easier to develop and collaborate in R and Python. This talk is for individuals who want to uncover the benefits of multilingual data science, IT professionals who are skeptical their life can get better by supporting more languages, and data science managers interested in enabling their teams instead of forcing their data superheroes to be subservient to particular tools.

About Sean: Sean has a degree in mathematics and statistics and worked as an analyst at the National Renewable Energy Lab before making the switch to customer success at RStudio. In his spare time he skis and mountain bikes and is a proud Colorado native

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Sean Lopp

Shirbi Ish-Shalom | Using R to Up Your Experimentation Game | RStudio

Have you ever cut an A/B test short? Maybe because of traffic constraints, your antsy boss, or early successful results. In reality, cutting your test short can be catastrophic, making your business decision no better than a coin flip. Learn some R-driven tips & tricks to get meaningful results quickly with a statistically rigorous methodology called sequential testing, an A/B testing enhancement my team employs at Intuit.

Key Takeaways:

What is sequential testing and how to use it.
How to learn (and fail!) quickly by taking big metric swings
How I used R to share my learnings & make them useful for anyone (even non-data scientists!) at my company
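
The risk of cutting a test short can be illustrated with a small simulation. The sketch below is not the sequential testing method from the talk and is not Intuit’s code; it is a minimal Python illustration, under assumed settings (unit-variance normal metrics, 20 evenly spaced looks, a 1.96 cutoff), of how repeatedly checking an A/A test and stopping at the first “significant” look inflates the false positive rate compared with a single test at the planned sample size. All function names are hypothetical.

```python
import math
import random


def z_statistic(sum_a, sum_b, n):
    """Two-sample z statistic for equal-size arms with known unit variance."""
    return (sum_a / n - sum_b / n) / math.sqrt(2.0 / n)


def simulate_aa_test(rng, n_per_arm=400, n_looks=20, z_crit=1.96):
    """Run one A/A test (no true difference between arms).

    Returns (rejected_when_peeking, rejected_at_fixed_horizon).
    """
    look_every = n_per_arm // n_looks
    sum_a = sum_b = 0.0
    z = 0.0
    rejected_when_peeking = False
    for i in range(1, n_per_arm + 1):
        sum_a += rng.gauss(0.0, 1.0)
        sum_b += rng.gauss(0.0, 1.0)
        if i % look_every == 0:
            # An interim look: an impatient analyst would stop here
            # if the naive test looks significant.
            z = z_statistic(sum_a, sum_b, i)
            if abs(z) > z_crit:
                rejected_when_peeking = True
    # The last look coincides with the planned sample size.
    rejected_at_fixed_horizon = abs(z) > z_crit
    return rejected_when_peeking, rejected_at_fixed_horizon


def false_positive_rates(n_sims=500, seed=42):
    """Estimate false positive rates with and without peeking."""
    rng = random.Random(seed)
    peeking = fixed = 0
    for _ in range(n_sims):
        p, f = simulate_aa_test(rng)
        peeking += p
        fixed += f
    return peeking / n_sims, fixed / n_sims


if __name__ == "__main__":
    peek_rate, fixed_rate = false_positive_rates()
    print(f"false positives with peeking: {peek_rate:.3f}")
    print(f"false positives at fixed horizon: {fixed_rate:.3f}")
```

Because the final look is one of the interim looks, the peeking rate can never be lower than the fixed-horizon rate. Sequential testing procedures adjust the decision boundaries so that repeated looks keep the overall error rate under control.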

About Shirbi: Shirbi Ish-Shalom is a human person

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Shirbi Ish-Shalom

Sophie Beiers | Trial and Error in Data Viz at the ACLU | RStudio

Visualizing data the “right” way requires many considerations — the topic, the quality of your data, your audience, your time frame, and the various channels of (sometimes conflicting) feedback you received. In this presentation, I’ll introduce some reflections on these considerations and ways I’ve incorporated feedback (or not) into my work as Data Journalist at the ACLU. Lastly, I’ll present some of the sillier trials and errors I’ve made that were arguably necessary to my process in creating effective data visualizations using R.

About Sophie: Sophie Beiers works on the ACLU’s Analytics team as a Data Journalist where she analyzes and visualizes data for their lawyers’ legal arguments and for external advocacy pieces. Prior to her time at the ACLU, she received her master’s degree in Quantitative Methods in Social Sciences at Columbia University where she also TA’d at the Lede Program for Data Journalism. Before NYC, she kicked off her career in analytics in San Francisco at the education nonprofit “YouthTruth.” Sophie is a Bay Area native but currently lives in Portland, OR and enjoys running, hiking, and making pottery in her free time

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Sophie Beiers

Vicki Boykis | Your public garden | RStudio

Vicki will discuss how, as people who can write code and analyze data, we have a lot of input and power over what our digital and work worlds look like, and therefore can act as agents of change and repair.

About Vicki: Vicki Boykis is a machine learning engineer at Automattic, the company behind Wordpress.com. She works mostly in Python, R, Spark, and SQL, and really enjoys building end-to-end data products. Outside of work she publishes the Normcore Tech newsletter (https://vicki.substack.com) and blogs at https://veekaybee.github.io/. In her “spare time”, she blogs, reads, and writes terrible joke tweets about data

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Vicki Boykis

Yanina Bellini Saibene | On programming, teaching, and building tutorials with learnr | RStudio

Full title: On programming, teaching, and building interactive tutorials with learnr

Teaching R is part of my activities as a community organizer, an RStudio Certified Instructor, a conference chair, and a researcher. Since 2019, I use the learnr package to generate interactive tutorials to teach R synchronously and asynchronously. The addition of the Tutorials panel in RStudio IDE and the need for virtual classes made the use of this package even more interesting. In this talk, I will tell you how to generate interactive tutorials, how to add pedagogical tools to them, what other packages you can use with {learnr} and show multilingual examples.

About Yanina: Yanina Bellini Saibene is a researcher at INTA (National Institute of Agricultural Technology) dedicated to applying data science to the agricultural sector and a professor in several regional specializations about Digital Agriculture and Data Analysis. Yanina is formally trained as a Licentiate in Information Systems with a Master’s degree in Data Mining and Knowledge Management. She is an active member of the R Community as an R-Ladies organizer and part of the R-Ladies Global Team. She is also a co-founder and co-chair of LatinR and part of the organizing team of useR! 2020 and useR! 2021 (co-chair). She is also co-founder of MetaDocencia, an open, free, volunteer-led, not-for-profit educational organization, and part of the teams that translate educational and technical material to Spanish

learnr rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Yanina Bellini Saibene Learnr

Aaron Jacobs | Introducing xrprof: A New Way to Profile R | RStudio (2021)

Tracking down performance issues in R code usually means using R’s built-in Rprof() profiler or one of the packages built around it. But the changing nature of the R community (towards more deployed applications) makes local profiling workflows frustrating, which is why I have written a new profiler: xrprof.

xrprof is compatible with existing R tools, but unlike them it can be used to profile R code that is already running – in fact, it is designed to be safe to point at R code running “in production”. xrprof also works seamlessly when R is run inside Docker, and can even be run in complex environments like Kubernetes clusters.

Taking inspiration from the {jointprof} package, xrprof can also show function calls at the C/C++ level alongside those from R. This can be immensely useful for diagnosing problems in packages that make heavy use of compiled code.

About Aaron: Aaron Jacobs is a Senior Data Scientist on the R&D team at Crescendo, a technology company in the sports betting space with a large internal R ecosystem. Prior to Crescendo he worked in Canadian public policy research. Aaron has a strong interest in the engineering side of data science and the emerging use of R “in production”. He is the author of several CRAN and GitHub packages, as well as xrprof – a new R profiling tool

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate

Allison Horst | R Art Lessons | RStudio

Art can be a welcoming bridge for learners and users to engage with and learn tools and skills in R. As RStudio’s first Artist-in-Residence, my goal has been to make the R landscape more welcoming for a broader community of users through engaging, didactic artwork. In this R, art, and heart-filled talk, I’ll share the motivation behind my R artwork and some lessons learned over the past year as Artist-in-Residence, including:

Learning to embrace cute and credible artwork
Art to help students engage with, learn and remember R skills
Art for community building and support

I hope this talk inspires viewers to use, create and share more artwork, so that together we can make the R landscape feel even brighter.

About Allison: Allison Horst, PhD, teaches data analysis, statistics, and presentation skills at the Bren School of Environmental Science and Management (UC Santa Barbara). In addition to courses, she leads interdepartmental R/RStudio workshops for incoming graduate students, created and teaches an online R-refresher workshop for alumni, and is a co-founder of R-Ladies Santa Barbara. In 2018 she earned the student-selected Distinguished Teaching Award at the Bren School, and in 2019 was awarded the campus-wide Distinguished Teaching Award by the UCSB Academic Senate. For her graduate research, Allison studied toxicity of engineered nanoparticles in environmental microorganisms. She is also a landscape painter, illustrator and designer

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Allison Horst

Andrew Tran | The Opioid Files: Turning big pharmacy data over to the public | RStudio (2021)

Through a community survey conducted over the summer, the RStudio tidymodels team learned that users felt the #1 priority for future development in the tidymodels package ecosystem should be ensembling, a statistical modeling technique involving the synthesis of multiple learning algorithms to improve predictive performance. This December, we were delighted to announce the initial release of stacks, a package for tidymodels-aligned ensembling. A particularly statistically-involved pesto recipe will help us get a sense for how the package works and how it advances the tidymodels package ecosystem as a whole.

About Andrew: Andrew is a data reporter on the rapid-response investigative team at The Washington Post who has analyzed how covid-19 has disproportionately impacted certain communities, the spread of opioids across the country, and the rise of right-wing violence. He shared in winning the Pulitzer Prize for Investigative Reporting in 2018. He’s an advocate for open data and reproducibility in journalism

rstudio tidymodels Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Andrew Tran Wapo

Barret Schloerke | plumber + future: Async Web APIs | RStudio

plumber is an R package that allows users to create web APIs by decorating R functions using roxygen2-like comments. In the latest release, asynchronous code (using future or promises) may be inserted at any stage of a plumber route execution, enabling parallel processing using multiple workers. In this talk, I will go through how you can set up your own asynchronous plumber API to leverage your full computing potential.
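
The idea of handing slow route work to a pool of workers can be sketched outside of R as well. The example below is not plumber code and does not use the future or promises packages; it is a minimal Python analogue built on the standard library’s `concurrent.futures`, with a hypothetical `slow_computation` standing in for a slow route body. It uses threads to keep the sketch short, and the worker count and delay are assumptions chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_computation(x, delay=0.2):
    """Stand-in for a slow route body (hypothetical): sleeps, then squares x."""
    time.sleep(delay)
    return x * x


def handle_requests(inputs, workers=4, delay=0.2):
    """Submit each request to a worker pool and collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() returns immediately with a future, so all requests
        # are in flight before any result is awaited.
        futures = [pool.submit(slow_computation, x, delay) for x in inputs]
        return [f.result() for f in futures]


if __name__ == "__main__":
    start = time.perf_counter()
    results = handle_requests([1, 2, 3, 4])
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
```

With four workers, the four simulated requests overlap instead of running one after another, which is the benefit the asynchronous approach aims for when several slow requests arrive at once.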

About Barret: I specialize in Large Data Visualization where I utilize the interactivity of a web browser, the fast iterations of the R programming language, and large data storage capacity of Hadoop

Barret Schloerke

plumber roxygen2 rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Api

Emily Riederer | oRganization | RStudio

Many case studies demonstrate the benefits of organizations developing internal R packages. But how do you move your organization from individual internal packages to a coherent internal ecosystem?

This talk applies the jobs-to-be-done framework to consider the different roles that internal tools can play, from unblocking IT challenges to democratizing tribal knowledge. Beyond technical functionality, we will explore design principles and practice that make internal packages good teammates and consider how these deviate from open-source standards.

Finally, we will consider how to exploit the unique challenges and opportunities of developing within an organization to make packages that collaborate well – both with other packages and their human teammates.

About Emily: Emily Riederer is an Analytics Manager at Capital One. Emily leads a team that is focused on building internal analytical tools and data products, including a suite of R packages and Shiny apps, and cultivating an innersource community of practice for analysts. Emily is an active member of the R community. In 2019, she co-organized satRday Chicago and the Chicago R unconference. You can find her {projmgr} R package on CRAN and her blog at emilyriederer.netlify.com. Previously, Emily earned degrees in Mathematics and Statistics at UNC Chapel Hill and worked as a research assistant in emergency department simulation and optimization

rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate

Eric Cronstrom | How we made the switch: a case study on automating a complex report | RStudio

The Center for Charter Schools at Central Michigan University produces annual reports for about 60 schools. The reporting process used to be a cumbersome blend of many technologies: SQL, Excel, InDesign, and VBScript, all culminating in a nice-looking, branded report for each school. Two years ago, staff turnover allowed the data team to rethink the process. Having had experience in R from graduate work, the team at the Center decided to go all in on RStudio and R Markdown for report production, a mere one month before the reports were due to be published.

This talk will be a case study of how we as an organization adopted RStudio tools to streamline a cumbersome process to fantastic results.

About Eric: Eric is responsible for administering a wide range of day to day program functions associated with the performance and accountability of CMU partner schools. He ensures that the data the Center utilizes to evaluate school performance is accurate and stored within a sound data infrastructure. He also leverages his wide range of technical skills to lead the development and production of reports and respond to questions regarding school performance and demographic context. Prior to joining the Center, Eric was a database administrator for Central Michigan University Libraries. He has also served as a lecturer at Central Michigan University teaching courses in web design, database design and programming. He earned a master’s degree in information systems management and a bachelor’s degree in computer science from Central Michigan University. He is also pursuing an additional master’s degree in applied statistics at Central Michigan University

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate K12 Charter Schools Education

Grant Fleming | Fairness and Data Science: Failures, Factors, and Futures | RStudio

In recent years, numerous highly publicized failures in data science have made evident that biases or issues of fairness in training data can sneak into, and be magnified by, our models, leading to harmful, incorrect predictions being made once the models are deployed into the real world. But what actually constitutes an unfair or biased model, and how can we diagnose and address these issues within our own work? In this talk, I will present a framework for better understanding how issues of fairness overlap with data science as well as how we can improve our modeling pipelines to make them more interpretable, reproducible, and fair to the groups that they are intended to serve. We will explore this new framework together through an analysis of ProPublica’s COMPAS recidivism dataset using the tidymodels, drake, and iml packages.

About Grant: Grant Fleming is a Data Scientist at Elder Research, co-author of the Wiley book Responsible Data Science (2021), and contributor to the O’Reilly book 97 Things About Ethics Everyone in Data Science Should Know. His professional focus is on machine learning for social science applications, model explainability, and building tools for reproducible data science. Previously, Grant was a research contractor for USAID

rstudio tidymodels Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Ethics Propublica Tech

Hadley Wickham | Maintaining the house the tidyverse built | RStudio

Hadley will talk about how the tidyverse has evolved since its creation (just five years ago!). You’ll learn about our greatest successes, learn from our biggest failures, and get some hints of what’s coming down the pipeline for the future.

About Hadley: Hadley Wickham is the Chief Scientist at RStudio, a member of the R Foundation, and Adjunct Professor at Stanford University and the University of Auckland. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. You may be familiar with his packages for data science (the tidyverse: including ggplot2, dplyr, tidyr, purrr, and readr) and principled software development (roxygen2, testthat, devtools, pkgdown). Much of the material for the course is drawn from two of his existing books, Advanced R and R Packages, but the course also includes a lot of new material that will eventually become a book called “Tidy tools”

Hadley Wickham

devtools dplyr ggplot2 pkgdown purrr readr roxygen2 rstudio testthat tidyr tidyverse Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Hadley Wickham

Irene Steves | The dynamic duo: SQL & R | RStudio

There’s a point in every data wrangler’s career at which their full dataset can no longer fit into just CSV files, and the journey to database-world begins. I reached this point about two years ago, when I transitioned from ecological research to the world of eCommerce fraud prevention. My calls to read_csv became scarcer as I came to rely more and more on databases. In this talk, I’ll demonstrate how I use R and SQL to access database tables, and how I incorporate both into my daily workflow, aided by features in RStudio IDE. I’ll also discuss our company’s “riskiconn” package for handling database connections and queries, which includes customizations to simplify day-to-day data querying.

About Irene: Irene holds an M.Sc. in Ecology and a B.A. in Integrative Biology, through which she first discovered R and data science. Her interest in data led her to the Arctic Data Center at the University of California Santa Barbara, a summer internship at RStudio, and ultimately to the Research & Data Science department at Riskified, where she now explores the complex patterns of fraud in eCommerce. In her free time, she studies Hebrew through podcasts and dubbed kids’ movies

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate SQL EDA

Javier Luraschi | Using pins with Python and JavaScript | RStudio

Last year, pins got released as a brand new R package to pin, discover and cache remote resources for R users. This package has matured to support many use cases; from caching remote URLs, and easily sharing datasets with other R users, to building automated pipelines.

However, in order to truly collaborate in multi-disciplinary data-driven teams, one needs to consider how to collaborate beyond R. How can we share resources with designers and machine learning experts who happen to use different programming languages like Python and JavaScript?

This talk will introduce the pinsjs project, a cross-language community project which has the goal of bringing pins to the broader open source community to enable rich workflows across larger data-driven teams.

About Javier: Javier is the author of “Mastering Spark with R”, pins, sparklyr, mlflow and torch. He holds a double degree in Math and Software Engineering and has decades of industry experience with a focus on data analysis. Javier is currently working on a project of his own, and previously worked at RStudio, Microsoft Research and SAP

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Javier Luraschi Javascript Pins

Jeroen Ooms | Monitoring health and impact of open-source projects | RStudio

At rOpenSci, we have come to realize that in order to help researchers get the most out of R, we need better tooling to monitor the quality, health, and impact of R packages. This applies both to our internal projects and to other packages in the R ecosystem. But what exactly makes a good R package?

In this talk we discuss various aspects of open-source software that are not always immediately obvious, and that you may want to consider when depending on an R package. We identify several categories of indicators you could look for, ranging from the role in the dependency network, to expectations around maintenance and participation.

Finally we introduce an ambitious new rOpenSci project called R-universe: an open platform, where we will experiment with showing metrics and other background information about packages, that may reveal something about the health and the impact of the project, and also facilitate discovery of other software.

About Jeroen: Postdoc hacker for @ropensci at UC Berkeley

Jeroen Ooms

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Jeroen Ooms Ropensci

John Burn-Murdoch | Reporting on and visualising the pandemic | RStudio

John will discuss the lessons he’s learned reporting on and visualising the pandemic, including the world of difference between making charts for a technical audience and making charts for a mass audience. You’ll learn from his experience navigating the highly personal and political context within which people consume and evaluate graphics and data, and how that can help us better design and communicate with visualisations.

About John: John Burn-Murdoch is the Financial Times’ senior data visualisation journalist, and creator of the FT’s coronavirus trajectory tracker charts. He has been leading the FT’s data-driven coverage of the pandemic, exploring its impacts on health, the economy and wider society. When pandemics are not happening, he also uses data and graphics to tell stories on topics including politics, economics, climate change and sport, and is a visiting lecturer at the London School of Economics

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Forcats Rstats Open Source OSS Reticulate John Burn-Murdoch FT Financial Times Covid Covid-19 Pandemic Journalism

JooYoung Seo | Accessible Data Science Beyond Visual Models | RStudio

Full title: Accessible Data Science Beyond Visual Models: Non-Visual Interactions with R and RStudio Packages

Data science is full of vision-dominant practices, and most data scientists rely heavily on visual models.

However, data science itself should require insight and computational thinking beyond what is just seen by eyes.

JooYoung Seo, who is a blind data scientist and who worked on RStudio’s accessibility projects over the summer of 2020, will talk about his experience with some non-visual techniques to interact with data.

If you would like to know more about various ways of making data science accessible via R, and new accessibility features introduced in RStudio IDE and Shiny, his demonstration without sight will be thought-provoking.

About JooYoung: JooYoung Seo is a Ph.D. candidate in the Learning, Design, and Technology program at the Pennsylvania State University, and an internationally certified accessibility professional whose research and development focuses on accessible computing for all. As an RStudio double-certified data science instructor (i.e., Tidyverse + Shiny) who is blind, he is committed to making the data science ecosystem more accessible to people with and without dis/abilities using R. To this end, he has been actively contributing to R open-source projects including Shiny, RMarkdown, bookdown, and distill for accessibility, and interned on the RStudio IDE and Shiny team as an accessibility engineer in summer 2020

Shiny Team

bookdown rmarkdown rstudio Shiny tidyverse Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate JooYoung Seo Accessibility

Heather & Jacqueline Nolis | Push straight to prod: API development with R and Tensorflow | RStudio

Talk from rstudio::conf(2019)

When tasked with creating the first customer-facing machine learning model at T-Mobile, we were faced with a conundrum. We had been told time and time again that to deploy machine learning models in production you had to use Python, but our very best data scientists were fluent in building neural networks in R with Keras and TensorFlow. Determined to avoid double work, we decided to use R in production for our machine learning models. After months of work, wrangling our containers to meet cloud security compliance, and conforming to DevOps standards, we succeeded in creating a containerized API solution using the keras and plumber R packages and Docker. Today R is actively powering tools that our customers directly interact with and we have open sourced our methods. In this talk, we’ll walk through how to deploy R models as container-based APIs, the struggles and triumphs we’ve had using R in production, and how you can design your teams to optimize for this sort of innovation.

About Heather Nolis: Heather Nolis is a founding member of the AI @ T-Mobile team, focusing on the conversion of cutting-edge analyses to real-time, scalable data-driven products. She began her career in neuroscience, but once she realized how heavily that field relied on software built by other people, she pivoted, deciding to make software herself. You can find her @heatherklus on Twitter, where she speaks about diversity in technology, the ethical implications of data, and cats.

About Jacqueline Nolis: Dr. Jacqueline Nolis is a co-founder of Nolis, LLC, a data science consulting firm. She has over a decade of experience using data to help companies including DSW, Union Bank, Microsoft, and Airbnb. She has a PhD from Arizona State University where her research focused on electric vehicle route optimization. For fun she likes to use machine learning for humor

plumber rstudio tensorflow Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Tensorflow Tmobile

JD Long | Empathy in action: Building community of practice for analytics in a global corp | RStudio

Talk from rstudio::conf(2019)

JD Long, Vice President of Risk Management & Data Philosophy at the global reinsurer Renaissance Re, will share his experience with creating a “Community of Practice” for analytics inside of a global corporation. The theme of “empathy” will be recurring as he discusses how he worked to create a supportive learning environment focused on helping analysts “kick ass” regardless of their tool set. This means creating a community that’s supportive of Excel, SQL, Python, and, of course, R.

About JD Long: I build models. And according to George E. P. Box, my models are wrong. My skill is understanding when and where my models are useful. I’m an experienced risk and data scientist with a background in insurance, reinsurance, market risk, and stochastic modeling. I’m the guy who can build a Monte Carlo model, help parallelize the model to run on Amazon’s cloud services and then stand in front of a general audience and put the work in context where everyone understands. My super power is thinking probabilistically, understanding risk, and communicating clearly. I have a history of forming bridges between IT and business teams

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate JD Long

Jesse Mostipak | R4DS online learning community: self-taught data science & DEI | RStudio (2019)

The first iteration of the R4DS Online Learning Community was created as an online space for learners and mentors to gather and work through the “R for Data Science” text in a collaborative and supportive environment. The creation of this group was inspired by my own success in transitioning to a career in data science coupled with the resources that I wanted to see in the R programming space. This talk will go through the learnings of creating an online learning space focused on R programming for data science, and how future iterations of similar groups can more proactively center on bringing about diversity, equity, and inclusion to data science spaces.

About Jesse Mostipak: Molecular biologist turned public school teacher who eventually fell in love with non-profit data science. Harvard SDP, Datanaut, and perpetual #rstats noob

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Jesse Mostipak

Kelly Nicole Bodwin | Intro stats with R: Easing the transition to software for beginners | RStudio

Talk from rstudio::conf(2019)

In this talk, we will present our approach to incorporating R and RStudio into a 10-week introductory statistics course for non-majors at Cal Poly. Our primary contribution will be to share a series of Shiny Apps, created to ease students with no statistical or coding background into the philosophy of using programming tools to explore data. Our program was recently used in 3 sections of 35 students each this Fall, during which students were surveyed regularly for their reactions to the approach. We will demonstrate our new tools, discuss our successes and failures, share student-generated output, and summarize the results of our Fall survey.

Kelly teaches at Cal Poly in San Luis Obispo, California, where she takes great joy in forcing statistical thought and R skills upon unsuspecting undergraduates. Her research is in clustering methods, digital humanities, and R tools for education. In her free time, she hikes and camps, plays board games, and tweets too much about R

rstudio Shiny Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Kelly Bodwin Beginner

Mary Rudis | How R and Posit are revolutionizing Stats Education in Community Colleges | Posit

Talk from rstudio::conf(2019)

There is no doubt that Posit has had an impact on how introductory statistics is taught in colleges today. When we consider the sheer dominance that giants like Texas Instruments, IBM, and Pearson Publishing have had in academic curriculum development, it is remarkable that tools like R and Python have been able to gain a foothold. Projects like DataCamp, ModernDive.com, “Introductory Statistics with Randomization and Simulation” courtesy of openintro.org, Wickham’s “R for Data Science” and Peng’s “R Programming for Data Science” are great resources for the student who already has some fundamental math or statistical background and has become comfortable around computing and applications-driven computational exercises. But many of us know that Data Science cannot simply be relegated to the privileged few that stumble into it by virtue of circumstance. My passion, and the purpose of my talk, is to provide educators with a digestible guidebook that would be appropriate for introduction to statistical concepts in high school, college, and under-resourced schools looking for ways to increase diversity in STEM. Organized in small, adaptable activities designed to be the amuse-esprit enticing both the timid and the skeptical to the proverbial banquet table that is Posit, this exploration into the world of statistics education should be of interest to a wide audience. My hope is to increase data literacy in real world context, with primary emphasis on descriptive statistics and distributions.

About Mary Rudis: After graduating from Lehigh University in Bethlehem, PA with a degree in Mathematics, Mary began as a high school mathematics and computer science teacher, developing technology infrastructure for a small, private high school in Pennsylvania. Throughout her career, she brings innovative approaches and enjoys the role of trailblazer. Mary’s most recent accomplishments as math department chair included developing mathematics curriculum and coordinating engineering, bioengineering and data science degrees at Great Bay Community College in Portsmouth, NH. Mary’s primary interests are learning and instruction, developing data science curricula for two-year colleges and 4-year liberal arts colleges, and working with area high school students in STEM at the University of NH Tech Camp each summer

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Mary Rudis Education

Mike K Smith | Using rmarkdown and parameterised reports | RStudio

My brain is lazy, shallow and easily distracted. Learn how I use notebooks to keep my present-self organised, my future-self up to speed with what I was thinking months ago, and also how I use parameterised reports to share results for both quantitative and non-quantitative audiences across multiple endpoints. I can update and render a variety of outputs from a single markdown notebook or report. I’ll show you how I organise my work using the Tidyverse, use child documents with parameterisation and also how this is served out to my colleagues via RStudio Connect.

About Mike K Smith: I have 25 years’ experience of working in the Pharmaceutical industry (Pfizer), with more than 15 years working on modelling and simulation projects. I am a keen advocate of smarter drug development with a particular interest in Bayesian methods, dose-response, reproducible research and knowledge management. My particular expertise is in the use of simulation methodology to predict drug outcomes, find efficient trial designs, assess decision criteria and evaluate analysis methodologies. My current role at Pfizer is as a specialist in computation and modeling solutions, evaluating and deploying new tools and training colleagues. I am an RStudio certified tidyverse trainer

rmarkdown rstudio tidyverse Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Forcats Rstats Open Source OSS Reticulate Mike Smith

Jake Thompson | Branding and Packaging Reports with R Markdown | RStudio (2020)

The creation of research reports and manuscripts is a critical aspect of the work conducted by organizations and individual researchers. Most often, this process involves copying and pasting output from many different analyses into a separate document. Especially in organizations that produce annual reports for repeated analyses, this process can also involve applying incremental updates to annual reports. It is important to ensure that all relevant tables, figures, and numbers within the text are updated appropriately. Done manually, these processes are often error prone and inefficient. R Markdown is ideally suited to support these tasks. With R Markdown, users are able to conduct analyses directly in the document or read in output from a separate analysis pipeline. Tables, figures, and in-line results can then be dynamically populated and automatically numbered to ensure that everything is correctly updated when new data is provided. Additionally, the appearance of documents rendered with R Markdown can be customized to meet specific branding and formatting requirements of organizations and journals. In this presentation, we will present one implementation of customized R Markdown reports used for Accessible Teaching, Learning, and Assessment Systems (ATLAS) at the University of Kansas. A publicly available R package, ratlas, provides both Microsoft Word and LaTeX templates for different types of projects at ATLAS with their own unique formatting requirements. We will discuss how to create brand-specific templates, as well as how to incorporate the templates into an R package that can be used to unify report creation across an organization. We will also describe other components of branding reports beyond R Markdown templates, including customized ggplot2 themes, which can also be wrapped into the R package. Finally, we will share lessons learned from incorporating the R package workflow into an existing reporting pipeline.
https://rstudio.com/resources/rstudioconf-2020/branding-and-packaging-reports-with-r-markdown/

ggplot2 rstudio Rstudio::conf(2020) Jake Thompson Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Max Kuhn | Total Tidy Tuning Techniques | RStudio (2020)

Many models have structural parameters that cannot be directly estimated from the data. These tuning parameters can have a significant effect on model performance and require some mechanism for finding reasonable values. The tune and workflows packages enable tidymodels users to optimize these parameters using a variety of efficient grid search methods as well as with iterative search techniques (such as Bayesian optimization).

Max Kuhn

rstudio tidymodels Rstudio::conf(2020) Max Kuhn Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Alicia Schep | Auto-magic Package Development | RStudio (2020)

Vega-lite is a high-level grammar of interactive graphics implemented in Javascript; it renders interactive visualizations in the browser based on a JSON specification. In Python and Javascript, the Altair and vega-lite-api packages have demonstrated how the development of APIs to build Vega-Lite graphics can be partially automated based on the Vega-Lite JSON schema, which describes the required format for a Vega-Lite JSON specification. This talk will describe the development of the ‘vlbuildr’ package for building Vega-Lite specifications in R and the ‘vlmetabuildr’ package for building the ‘vlbuildr’ package. The ‘vlbuildr’ package seeks to provide a pipe-friendly, “R-like” functional interface for building up simple to complex specifications for Vega-Lite graphics, which can in turn be rendered as an HtmlWidget by the ‘vegawidget’ R package. Building such an API in a fully automated way from the Vega-Lite schema presents considerable challenges, so the approach taken here was to rely on partial automation. Human judgement dictates the basic contours of the API, such as what groups of functions to include and how various types of building blocks will go together. The part that is automated is filling in many details such as the different variants of a group of functions, the exact parameters needed for each function, and the documentation of those parameters – the parts that would be extremely tedious to port over!

rstudio Rstudio::conf(2020) Alicia Schep Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Dewey Dunnington | Best practices for programming with ggplot2 | RStudio (2020)

The ggplot2 package is widely acknowledged as a powerful, dynamic, and easy-to-learn graphics framework when used in an interactive environment

ggplot2 rstudio Rstudio::conf(2020) Dewey Dunnington Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Eduardo Ariño de la Rubia | Value in Data Science Beyond Models in Production | RStudio (2020)

ML in production is one of the most obvious ways that data science organizations create value in business. However, these models are at the very end of a long story of how quantitative research changes and enhances organizations. In this talk I will discuss how I have found DS organizations to be truly transformative outside of ML in the loop. Bio: Eduardo Ariño de la Rubia is a DS manager and educator. He loves R and RStudio. He has a Masters in Negotiation, Conflict Resolution and Peacebuilding, which is probably the most useful training he could have received

rstudio Rstudio::conf(2020) Eduardo Ariño De La Rubia Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Desiree De Leon | Of Teacups, Giraffes, and R Markdown | RStudio (2020)

How do you make your R Markdown lessons feel friendly for learners you’ll never meet? How do you make it engaging so they sit and stay a while? How do you make it memorable so they come back to visit again? In this talk, I’ll share lessons learned from my experience of making a series of online statistics modules (co-authored by Hasse Walum) that feel accessible and fun, housed entirely in an R Markdown site, complete with a whimsical, illustrated narrative about teacup giraffes. I’ll show how adding good characters with your audience in mind, good design, and good play helped me make the most of HTML output. To help you get started, I’ll share resources that Alison Hill and I have developed, including a series of cookbooks and out-of-the-box templates, so that you will have a leg up on applying these ideas to R Markdown collections of your own

rstudio Rstudio::conf(2020) Desiree De Leon Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Dr. Travis Gerke | UnicoRns are real | RStudio (2020)

Common advice from experienced data scientists to job-seekers is to avoid job postings that describe a “data science unicorn”: someone who has experience performing an unrealistically large array of technical and business-related job duties. Seeking a unicorn is viewed as a potential indicator that the company fails to understand their data science needs, and that new hires will not be poised for success due to lacking support and resources [Robinson & Nolis, 2019]. The R language, particularly when used with RStudio products, has evolved to enable production-level activities in the areas of data wrangling, reporting/dashboarding, database/software engineering, machine learning, and web application development. It is increasingly plausible that a data scientist will be able to efficiently perform a wide variety of job functions with experience only in a single language (R). Indeed, even entry level R users may tread into “unicorn” territory. Current standards for data scientist job descriptions and salaries do not accommodate this nuance, leaving both job-seekers and hiring managers unable to distinguish job requirements which should be read as warning signs from listings which are idyllic matches for the modern R unicorn. In this talk, we present data aggregated from several large compensation analytics companies which summarize current benchmarks for data science job descriptions and corresponding salary ranges. We then suggest job description language to target modern R users, considering both job duty compatibility and job post findability. These descriptions are presented with likely salary range pairings. Attention is given to deviations from traditional degree requirements, years of experience, and demands for multiple programming language literacy which may lack relevance for the R unicorn. Our overarching goal is to provide job description templates which encourage optimal matchmaking between R job seekers and organizations in need of their talents

rstudio Rstudio::conf(2020) Dr. Travis Gerke Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Ellis Hughes | Approaches to Assay Processing Package Validation | RStudio (2020)

In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention) and the lessons learned while creating packages as a team. Housed within Fred Hutch, SCHARP is an instrumental partner in the research and clinical trials surrounding HIV prevention and vaccine development. Part of SCHARP’s work involves analyzing experimental biomarkers and endpoints which change as the experimental question, analysis methods, antigens measured, and assays evolve. Maintaining a validated code base that is rigid in its output format, but flexible enough to cater to a variety of inputs with minimal custom coding has proven to be important for reproducibility and scalability. SCHARP has developed several key steps in the creation, validation, and documentation of R packages that take advantage of R’s packaging functionality. First, the programming team works with leadership to define specifications and lay out a roadmap of the package at the functional level. Next, statistical programmers work together to develop the package, taking advantage of the rich R ecosystem of packages for development such as roxygen2, devtools, usethis, and testthat. Once the code has been developed, the package is validated to ensure it passes all specifications using a combination of testthat and rmarkdown. Finally, the package is made available for use across the team on live data. These procedures set up a framework for validating assay processing packages that furthers the ability of Fred Hutch to provide world-class support for our clinical trials

devtools rmarkdown roxygen2 rstudio testthat usethis Rstudio::conf(2020) Ellis Hughes Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Namita Nandakumar | R + Tidyverse in Sports | RStudio (2020)

This talk will use a case study, most likely in hockey, to showcase the many ways in which R and the Tidyverse can be used to analyze sports data as well as the unique priorities and considerations that are involved in applying statistical tools to sports problems

rstudio tidyverse Rstudio::conf(2020) Namita Nandakumar Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate Sports

Andrew Mangano | Growth Hacking - Product Analytics at Scale using R and RStudio | RStudio (2020)

Salesforce is not only a cloud software solution out of the box, but also a highly customizable platform that can be modified for a wide range of use cases. In addition to complexity, customer trust is our #1 company value and customer data privacy is abstracted from everyone outside of the customer. Product and Growth Analytics is an emerging field separate from business analytics and data science and focuses on building software products that improve user retention and engagement. Companies like Facebook and AirBnB have robust data science teams focused on product analytics. At Salesforce however, given the scale, customization, and privacy values, product data science is not so straightforward. Utilizing R and RStudio tools for collaboration and reproducible analytics, the Data Intelligence team is able to solve complex problems at enterprise scale. This talk will preview anonymized predictive and growth analytics work while also highlighting how we work and collaborate across platforms and languages (Python via reticulate)

reticulate rstudio Rstudio::conf(2020) Andrew Mangano Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Braulio Cuandon & Ana Alyeska Santos | Using R to Create Reproducible Engineering Test Reports

Talk from rstudio::conf(2020)

Engineers at Biosense Webster, a Johnson and Johnson medical device company that specializes in diagnosing and treating cardiac arrhythmias, write multiple test reports to comply with FDA regulatory standards. These intricate reports require 36 hours of an engineer’s time on average, constraining the engineers from completing investigations and studies in a timely manner. Writing scripts in R that create reproducible reports can significantly reduce the time spent by an engineer creating these reports, allowing them to do a much more thorough investigation with a larger scope. Through Shiny, engineers could conveniently have their parameters and recorded data processed and stored in a database by accessing a web link and filling out the required information within a user-friendly interface. Upon the generation of the report, accurate and properly formatted test reports, compliant with both the company and FDA regulatory standards, are produced through Rmarkdown and knitr, knitting all the outputs with complete data analysis tools such as normality plots and process capability measurements to a Word document that follows company required headers, footers, and headings. The reproducible report creation shown in this report can be extended to other types of test reports and protocols. The pilot phase that has been conducted has shown that complete report production has been decreased from 36 hours to an hour

rmarkdown rstudio Shiny Rstudio::conf Rstudio::conf(2020) Braulio Cuandon Ana Alyeska Santos Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Shiny RMarkdown CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Colin Gillespie | How to win an AI Hackathon, without using AI | RStudio (2020)

Anyone reading a newspaper or listening to the news is led to believe that AI is the solution to all problems. From self-driving cars to detecting disease to catching fraud, there doesn’t seem to be a situation that AI can’t tackle. Once “big data” is thrown into the mix, the AI solution is all but certain. But is AI always needed? Over the last eighteen months, Jumping Rivers has entered (and won) four Hackathons. All Hackathons were characterised by “big data” and the need to improve prediction. All Hackathons were won without using AI (or any sort of machine learning). This talk will focus on one particular competition around reducing leakage at Northumbrian Water. Using a combination of R, Shiny, and Tidyverse (and a few other tricks), we were able to demonstrate within the short Hackathon time frame that clear presentation of data to the front line engineers was more likely to reduce leakage than simply providing vague estimates of a potential future leak

rstudio Shiny tidyverse Rstudio::conf(2020) Colin Gillespie Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Danielle Navarro | Toward a grammar of psychological experiments | RStudio (2020)

Why does a psychological scientist learn a programming language? While motivations are many and varied, the two most prominent are data analysis and data collection. The R programming language is well placed to address the first need, but there are fewer options for programming behavioural experiments within the R ecosystem. The simplest experimental designs can be recast as surveys, for which there are many options, but studies in cognitive psychology, psychophysics or developmental psychology typically require more flexibility. In this talk I outline the design principles behind xprmntr, an R package that provides wrappers to a JavaScript library (jsPsych) for constructing web based psychology experiments and uses the plumber package to call server side R code as needed. In doing so, I discuss limitations to the current implementation and what a “grammar of experiments” might look like

plumber rstudio Rstudio::conf(2020) Danielle Navarro Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Evangeline Reynolds | Flipbooks | RStudio (2020)

Good examples facilitate accomplishing new or unpracticed tasks in a programmatic workflow. Tools for communicating examples have improved in recent years. Especially embraced are tools that show code and its resultant output immediately thereafter — the case of Jupyter notebooks and Rmarkdown documents. But creators using these tools often must choose between big-picture or narrow-focus demonstration; creators tend to either demo a complete code pipeline that accomplishes a realistic task or instead demonstrate a minimal example which makes clear the behavior of a particular function, but how it might be used in a larger project isn’t clear. Flipbooks help address this problem, allowing the creator to present a full demonstration which accomplishes a real task, and gives the viewer the opportunity to focus on unfamiliar steps. A set of flipbook building functions parse code in a data manipulation or visualization pipeline and then build it back up incrementally. Aligned superimposition of new code and output atop previous code and output makes it easy to identify how each code change triggers changes in output. The presentation will guide attendees in creating their own Flipbooks (with Xaringan slides) or mini Flipbooks (gif output)

rmarkdown rstudio Rstudio::conf(2020) Evangeline Reynolds Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Kevin Ushey | renv: Project Environments for R | RStudio (2020)

The renv package helps you create reproducible environments for your R projects. With renv, you can make your R projects more:

Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa.
Portable: Easily transport your projects from one computer to another, even across different platforms. renv makes it easy to install the packages your project depends on.
Reproducible: renv records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.

In this presentation, I’ll introduce renv and some of its main workflows
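A minimal sketch of what the main renv workflow might look like, using the `renv::init()`, `renv::snapshot()`, and `renv::restore()` functions. The wrapper `renv_workflow()` is a hypothetical helper introduced only so the calls are not executed on load, since they modify the project library.

```r
# Hypothetical wrapper (an assumption for illustration): groups the three
# typical renv steps. In practice each step is run separately, at a
# different point in a project's life.
renv_workflow <- function(project = ".") {
  # 1. Isolated: give the project its own private package library
  renv::init(project = project)

  # 2. Reproducible: record the exact package versions in renv.lock
  renv::snapshot(project = project)

  # 3. Portable: on another machine, reinstall the versions in renv.lock
  renv::restore(project = project)
}
```

Typically `init()` is called once, `snapshot()` after installing or updating packages, and `restore()` after cloning the project elsewhere.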

renv rstudio Rstudio::conf(2020) Kevin Ushey Renv Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Ian Lyttle | Small Team, Big Value: Using R to Design Visualizations | RStudio (2020)

Many R users can feel isolated due to the prevalence of Python or Tableau at their institutions. This talk will focus on how we use R to develop reference implementations of visualizations (using ggplot2), and to develop corporate-themed color maps (using the colorspace package) to bring value to the entire institution. Color maps can be translated into a variety of formats, for Tableau, Qlik Sense, d3, etc., and deployed independently from R. For visualizations, our goal is to translate ggplot2 objects to Vega-Lite specifications, using a package we are developing: ggvega. Vega-Lite visualizations are web-native, and are rendered independently from R. Specifications can be designed to be extensible to new data, allowing them to serve as templates, to be deployed and updated for use outside of R. Of course, despite isolation within an institution, our work with the larger R open-source communities provides a foundation on which to build; in fact, we have a lot of company and are having a lot of fun

ggplot2 rstudio Rstudio::conf(2020) Ian Lyttle Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Mine Çetinkaya-Rundel | Making the Shiny Contest | RStudio (2020)

In January 2019 RStudio launched the first-ever Shiny contest to recognize outstanding Shiny applications and to share them with the community. We received 136 submissions for the contest and reviewing them was incredibly inspiring and humbling. In this talk, we shine a spotlight on the backstage: the inspiration behind the contest, the process of evaluation, what we learned about Shiny developers and how we can better support them, and what we learned about running contests and how we hope to improve the Shiny Contest experience. We also highlight some of the winning apps as well as the newly revamped Shiny Gallery, which features many noteworthy contest submissions. Finally, we introduce the new process for submitting your apps to the Shiny Gallery and, of course, to Shiny Contest 2020! https://rstudio.com/resources/rstudioconf-2020/making-the-shiny-contest/

Mine Çetinkaya-Rundel

rstudio Shiny Rstudio::conf(2020) Mine Çetinkaya-Rundel Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Thomas Lin Pedersen | Extending your ability to extend ggplot2 | RStudio (2020)

The ggplot2 package continues to be one of the most used frameworks for producing graphics in R. While being extremely flexible, the package itself can be constrained by the different types of graphic elements and statistic transformations available. Instead of continuing to add new features, the development in recent years has focused on making ggplot2 extensible by other packages, thus distributing development and maintenance. Despite the best of intentions, ggplot2 can feel daunting to extend, due to unusual idiosyncrasies, a foreign object system, and a partly obscured rendering model. This talk intends to remove the mystery of extending ggplot2, by describing the basic ways that it can be extended and showcasing a couple of simple extensions that can be built with very little code. Lastly, it will include discussions of some best practices and gotchas that may come in handy when you start out
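To give a sense of how small an extension can be, here is a sketch of a new stat built with ggplot2's `ggproto()` object system. It follows the convex-hull example from ggplot2's own extension documentation and is not necessarily one of the extensions shown in the talk.

```r
library(ggplot2)

# A new Stat: reduce each group to the points on its convex hull
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)

# The user-facing layer function that wires the Stat to a geom
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Usage: outline the point cloud of the built-in mpg data
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")
```

Printing `p` draws the scatter plot with its hull outline; the extension itself is just the `compute_group()` function plus boilerplate.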

Thomas Lin Pedersen

ggplot2 rstudio Rstudio::conf(2020) Thomas Lin Pedersen Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Henrik Bengtsson | Future: Simple Async, Parallel & Distributed Processing in R | RStudio (2020)

Future is a minimal and unifying framework for asynchronous, parallel, and distributed computing in R. It is designed for robustness, consistency, scalability, extendability, and adoptability - all in the spirit of “developer writes code once, user runs it anywhere”. It is being used in production for high-performance computing and asynchronous UX, among other things. In this talk, I will discuss common feature requests, recent progress we have made, and what is in the pipeline
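The “write once, run anywhere” idea can be sketched with the future package's core functions `plan()`, `future()`, and `value()`. This is a minimal illustration under the assumption that two local background R sessions are an acceptable backend; the computation itself is made up.

```r
library(future)

# The user chooses the backend; here, two background R sessions.
# Swapping this line for plan(sequential) leaves the code below unchanged.
plan(multisession, workers = 2)

# The developer's code: create a future, then collect its value.
f <- future({
  Sys.sleep(0.1)   # stand-in for slow work
  sum(1:10)
})
result <- value(f)  # blocks until the future has resolved

# Return to sequential processing and shut down the workers
plan(sequential)
```

Only the `plan()` call knows where the code runs, which is what lets the same script move between a laptop and a cluster.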

rstudio Rstudio::conf(2020) Henrik Bengtsson Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Tyson Barrett | List-columns in data.table | RStudio (2020)

The use of list-columns in data frames and tibbles is well documented (e.g. Bryan, 2018), providing a cognitively efficient way to organize results of complex data (e.g. several statistical models, groupings of text, data summaries, or even graphics) with corresponding data. For example, one can store student information within classrooms, player information within teams, or analyses within groups. This allows the data to be of variable sizes without overly complicating or adding redundancies to the structure of the data. In turn, this can make analysis of the data more reliable. Because of its efficiency and speed, being able to use data.table to work with list-columns would be beneficial in many data contexts (e.g. to reduce memory usage in large data sets). Herein, I demonstrate how one can create list-columns in a data table using the by argument in data.table and purrr::map(). I compare the behavior of the data.table approaches to the dplyr::group_nest() function and tidyr::unnest(), two of the several powerful Tidyverse nesting and unnesting functions. Results using bench::mark() show the speed and efficiency of using data.table to work with list-columns
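A small sketch of the nesting idea using the `by` argument in data.table. The player data are invented for illustration, and base `vapply()` stands in for `purrr::map()` to keep the example dependency-free; this is not the talk's own code.

```r
library(data.table)

# Made-up example data: players within teams
players <- data.table(
  team   = c("a", "a", "b"),
  player = c("p1", "p2", "p3"),
  score  = c(1, 2, 3)
)

# Nest: one row per team, with that team's rows stored in a list-column
nested <- players[, .(data = list(.SD)), by = team]

# Work on each nested table (purrr::map_int(data, nrow) would also work)
nested[, n_players := vapply(data, nrow, integer(1))]

# Unnest: expand the list-column back into regular rows
flat <- nested[, data[[1]], by = team]
```

Here `nested` has two rows whose `data` column holds a data.table each, and `flat` recovers the original three rows.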

dplyr purrr rstudio tidyr tidyverse Rstudio::conf(2020) Tyson Barrett Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Dr. Carson Sievert | Reproducible Shiny apps with shinymeta | RStudio (2020)

Shiny makes it easy to take domain logic from an existing R script and wrap some reactive logic around it to produce an interactive webpage where others can quickly explore different variables, parameter values, models/algorithms, etc. Although the interactivity is great for many reasons, once an interesting result is found, it’s more difficult to prove the correctness of the result since: (1) the result can only be (easily) reproduced via the Shiny app and (2) the relevant domain logic which produced the result is obscured by Shiny’s reactive logic. The R package shinymeta provides tools for capturing and exporting domain logic for execution outside of a Shiny runtime (so that others can reproduce Shiny-based result(s) from a new R session)

Carson Sievert

rstudio Shiny shinymeta Rstudio::conf(2020) Dr. Carson Sievert Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Aymen Waqar | Building an iPad dashboard using plumber & RStudio Connect in Pharma | RStudio (2020)

As companies are becoming aware of the need to embrace data-driven solutions, R has gained huge momentum over recent years. Getting the insights to users has become a very important part of a data scientist’s work. While our world has advanced, there is a need to build not only web applications, but also applications on mobile that are available offline. We would like to share with you how within months we have gone from nothing to a production-ready application that handles 500 concurrent users in healthcare. There are plenty of challenges to solve including restricted environments, internal processes and user availability. We will show you how to overcome them and iterate fast, navigating through complex infrastructure and integrating with proxy architecture to serve applications to end users in a compliant manner. With RStudio Connect and Plumber you can deploy a scalable REST API that can feed insights to your users. This allows you to go one step further and implement native applications for tablets and smartphones. With the right tools, mindset and priorities you can achieve personal success by introducing a digital transformation within your organization, starting with something as small as converting a business critical Excel file that is slow, difficult to edit and maintain, to a robust application. Step by step your organization will evolve and become empowered by your insights, uncovering even more untapped potential
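As a rough sketch of the REST API idea, plumber lets an R function be exposed as an HTTP endpoint. The `/insights` route, the handler, and its payload below are hypothetical and are not taken from the application described in the talk.

```r
library(plumber)

# Hypothetical handler: returns a small, made-up summary as a list,
# which plumber serializes to JSON by default.
insights_handler <- function() {
  list(metric = "example", value = 42)
}

# Build the API programmatically and register a GET endpoint
api <- pr()
api <- pr_get(api, "/insights", insights_handler)

# To serve locally one could call pr_run(api, port = 8000); it is not
# called here because it blocks the R session.
```

A tablet or phone client would then request `/insights` over HTTP and render the JSON response natively.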

plumber rstudio Rstudio::conf(2020) Aymen Waqar Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Ben Joaquin | Data Science in Meatspace | RStudio (2020)

“The Data Science community is dominated by folks doing amazing work with data that starts in and never leaves cyberspace. This talk is about best practices and playbooks for doing data science that involves meatspace (the opposite of cyberspace) and why R is such a great language for working with data that originated in the physical world. While the concrete examples in this talk will mostly come from the manufacturing space, where I have the most experience, I believe the themes are relevant to many meatspace workflows. We’ll talk through effective playbooks that can help you navigate common tasks throughout the life-cycle of a project. We’ll also weave in how R’s glorious package ecosystem, including Tidyverse, can be combined with other languages like python, and with enterprise products like RStudio Connect to great effect. Specifically, we’ll discuss practices in these areas:

best practices for data collection in meatspace
the importance of quantifying measurement system error
collecting the correct data for training computer vision models
the rarely discussed cost of maintaining models in production”

rstudio tidyverse Rstudio::conf(2020) Ben Joaquin Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Karl Feinauer | Using Jupyter with RStudio Server Pro | RStudio (2020)

This talk is for R admins who want to learn how to set up Jupyter notebooks on RStudio Server Pro. We’ll cover prerequisites, basic configuration, best practices for management, Jupyter Lab, and more

rstudio Rstudio::conf(2020) Karl Feinauer Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

David Smith | MLOps for R with Azure Machine Learning | RStudio (2020)

David Smith | January 31, 2020

Azure Machine Learning service (Azure ML) is Microsoft’s cloud-based machine learning platform that enables data scientists and their teams to carry out end-to-end machine learning workflows at scale. With Azure ML’s new open-source R SDK and R capabilities, you can take advantage of the platform’s enterprise-grade features to train, tune, manage and deploy R-based machine learning models and applications. In this talk, the attendees will learn how to:

• Carry out ML workflows using the authoring experience of their choice, from no-code to code-first options that include Azure ML’s drag-and-drop visual interface for defining workflows and RStudio Server on the Data Science Instance, a hosted VM workstation, for using the Azure ML R SDK from the RStudio browser-based interface.
• Use the Azure ML R SDK to manage cloud resources and train, hyperparameter tune, and log and visualize metrics for their models at scale on Azure compute.
• Build ML Pipelines in R for defining and orchestrating reusable and reproducible ML workflows.
• Deploy, manage, and monitor their R ML models and applications as web services on Azure Container Instance and Azure Kubernetes Service, with an emphasis on robust DevOps and CI/CD for orchestrating and streamlining their end-to-end data science development lifecycle

rstudio Rstudio::conf(2020) David Smith Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Neal Richardson | Accelerating Analytics with Apache Arrow | RStudio (2020)

The Apache Arrow project is a cross-language development platform for in-memory data designed to improve system performance, memory use, and interoperability. This talk presents recent developments in the ‘arrow’ package, which provides an R interface to the Arrow C++ library. We’ll cover the goals of the broader Arrow project, how to get started with the ‘arrow’ package in R, some general concepts for working with data efficiently in Arrow, and a brief overview of upcoming features

Neal Richardson

rstudio Rstudio::conf(2020) Neal Richardson Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Rstats Open Source OSS Reticulate Apache Arrow

Ryan Timpe | Learning R with humorous side projects | RStudio (2020)

What should you name a new dinosaur discovery, according to neural networks? Which season of The Golden Girls should you watch when playing a drinking game? How can you build a LEGO set for the lowest price? R is constantly evolving, so as users, we’re constantly learning. Over the past few years, I’ve found that working on side projects is great for hands-on learning - and for me, the more absurd the project, the better. Side projects provide a safe, low-stakes environment to learn new packages and methodologies before using them in work or in production. Sharing those projects can help publicize the package and increase its accessibility, benefiting both the original author and future users. In this talk, I’ll share my experiences with side projects for learning state-of-the-art data science tools and growing as an R user, including how one project helped me land my dream job

rstudio Rstudio::conf(2020) Ryan Timpe Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Bryan Lewis | Parallel computing with R using foreach, future, and other packages | RStudio (2020)

Steve Weston’s foreach package defines a simple but powerful framework for map/reduce and list-comprehension-style parallel computation in R. One of its great innovations is the ability to support many interchangeable back-end computing systems so that the same R code can run sequentially, in parallel on your laptop, or across a supercomputer. Recent packages like the future package define elegant new programming approaches that can use the foreach framework to run across a wide variety of parallel computing systems. This talk introduces the basics of the foreach and future packages with examples using a variety of back-end systems including MPI, Redis and R’s default parallel package clusters
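The interchangeable back-end idea can be sketched with foreach alone. The example below registers the built-in sequential backend via `registerDoSEQ()`; the assumption is that a parallel backend package (for example one built on MPI, Redis, or the parallel package) would be registered in its place without changing the loop.

```r
library(foreach)

# Register a backend. The sequential one ships with foreach; a parallel
# backend package would be registered here instead.
registerDoSEQ()

# The loop itself is backend-agnostic: map over 1:5, then reduce with c()
squares <- foreach(i = 1:5, .combine = c) %dopar% {
  i^2
}
```

Only the registration line changes when moving from a laptop to a cluster; the `foreach()` call stays the same.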

rstudio Rstudio::conf(2020) Bryan Lewis Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Javier Luraschi | Updates on Spark, MLflow, and the broader ML ecosystem | RStudio (2020)

Originally posted at https://rstudio.com/resources/rstudioconf-2020/updates-on-spark-mlflow-and-the-broader-ml-ecosystem/

rstudio Rstudio::conf(2020) Javier Luraschi Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Dr. Sydeaka Watson | Neural Networks for Longitudinal Data Analysis | RStudio (2020)

Longitudinal data (or panel data) arise when observations are recorded on the same individuals at multiple points in time. For example, a longitudinal baseball study might track individual player characteristics (team affiliation, age, height, weight, etc.) and outcomes (batting average, stolen bases, runs, strikeouts, etc.) over multiple seasons, where the number of seasons could vary across players. Neural network frameworks such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) can flexibly accommodate this data structure while preserving and exploiting temporal relationships. In this presentation, we highlight the use of neural networks for longitudinal data analysis with tensorflow and keras in R

rstudio tensorflow Rstudio::conf(2020) Dr. Sydeaka Watson Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Gergely Daroczi | Getting things logged | RStudio (2020)

One of the greatest strengths of R is the ease and speed of developing a prototype (be it a report or dashboard, a statistical model or rule-based automation to solve a business problem, etc.), but deploying to production is not a broadly discussed topic despite its importance. This hands-on talk focuses on best practices and actual R packages to help transform the prototypes developed by business analysts and data scientists into production jobs running in a secured and monitored environment that is easy to maintain – discussing the importance of logging, securing credentials, effective helper functions to connect to databases, open-source and SaaS job schedulers, dockerizing the run environment and scaling infrastructure

rstudio Rstudio::conf(2020) Gergely Daroczi Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Rob Hyndman | How Rmarkdown changed my life | RStudio (2020)

Over the last few years, Rmarkdown seems to have taken over my life, or at least my written communication. These days I use Rmarkdown to maintain my website, write my blog, write textbooks, write academic papers, prepare slides for talks, keep my CV up-to-date, help my students write theses, prepare university policy documents, write letters, prepare exams, write reports for clients, and more. I haven’t quite got to the point of using it for shopping lists, but perhaps that’s my next Rmarkdown template. I will reflect on the journey in getting to this point, what I’ve lost and what I’ve gained. I will also speculate on what might be next in the Rmarkdownification of my life

rmarkdown rstudio Rstudio::conf(2020) Rob Hyndman Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Frank Harrell | R for Graphical Clinical Trial Reporting | RStudio (2020)

For clinical trials a good deal of effort goes into producing both final trial reports and interim reports for data monitoring committees, and experience has shown that reviewers much prefer graphical to tabular reports. Interactive graphical reports go a step further and allow the most important information to be presented by default, while inviting the reviewer to drill down to see other details. The drill-down capability, implemented by hover text using the R plotly package, allows one to almost entirely dispense with tables because the hover text can contain the part of a table that pertains to the reviewer’s current focal point in the graphical display, among other things. Also, there are major efficiency gains by having a high-level language for producing common elements of reports related to accrual, exclusions, descriptive statistics, adverse events, time to event, and longitudinal data. This talk will overview the hreport package, which relies on R, RMarkdown, knitr, plotly, Hmisc, and HTML5. RStudio is an ideal report development environment for using these tools

rmarkdown rstudio Rstudio::conf(2020) Frank Harrell Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Therese Anders | Peer review in data science courses | RStudio (2020)

Peer review enables instructors of large data science classes to provide substantive feedback to students beyond what is feasible with standard code review via automated grading and continuous integration. It facilitates peer learning, which is shown in the literature to have positive learning outcomes, and can reduce the burden of grading by course staff. The ghclass package provides a suite of functions to manage courses via GitHub repositories. The package has recently been supplemented with the functionality to implement peer review. Developed during my 2019 summer internship with RStudio in collaboration with my mentor Mine Çetinkaya-Rundel, the peer review functions in ghclass interface with the GitHub API to create review repositories, move files between authors and reviewers, submit feedback, and collect grades. In this presentation, I will give a demonstration of the peer review functions in ghclass. A set of six functions allows instructors to 1) create a random review roster, 2) set up the review repository infrastructure within a GitHub organization, 3) move assignments from authors to reviewers, 4) collect grades, 5) return the feedback, and 6) obtain a rating of the review from the authors. I reflect on the pedagogy of implementing peer review in introductory data science classes and talk about lessons learned from a real-world test run of the package in the Fall semester 2019 at the University of Edinburgh, conducted by Mine Çetinkaya-Rundel. The presentation highlights ghclass as an R command-line based, open source, low profile, and powerful solution to enable peer review in classes ranging in size from two to approximately 400 students. A 5-minute presentation in our Lightning Talks series

Mine Çetinkaya-Rundel

rstudio Rstudio::conf(2020) Therese Anders Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Dr. Tyler Morgan-Wall | 3D ggplots with rayshader | RStudio (2020)

Learn how a single line of code can transform your data visualizations into stunning 3D using the rayshader package. In this talk, I will show how you can use rayshader to create beautiful 3D figures and animations to help promote your research and analyses to the public. Find out how to use principles of cinematography to take users on a 3D tour of your data, scripted entirely within R. Leaving the 3D pie charts in the pantry at home, I will discuss how to build interpretable, engaging, and informative plots using all three dimensions

rstudio Rstudio::conf(2020) Dr. Tyler Morgan-Wall Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Rstats Open Source OSS Reticulate

Katie Masiello | Professional Case Studies | RStudio (2020)

The path to becoming a world-class, data-driven organization is daunting. The challenges you will likely face along the way can be thorny, and in some cases, seem outright impossible to overcome. How do you get teams that traditionally butt heads, such as IT and data science, to complement each other and work in unison? How can you efficiently scale the scope and reach of your data products as requirements change? Your time should be spent doing truly valuable work instead of updating charts and reports. How do you prevent the support structure behind your platform from toppling like a house of cards? Despite these challenges, we think that the end result is worth it: an organization that is equipped to make important decisions, with confidence, using data analysis that comes from a sustainable environment. We see this outcome every day

rstudio Rstudio::conf(2020) Katie Masiello Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Miriah Meyer | Effective Visualizations | RStudio (2020)

Originally posted to https://rstudio.com/resources/rstudioconf-2020/effective-visualizations/

rstudio Rstudio::conf(2020) Miriah Meyer Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Daniel Falbel | What’s new in TensorFlow for R | RStudio (2020)

TensorFlow is the most popular open-source platform for machine learning and its ecosystem is evolving incredibly fast. In this talk we will explore what’s new in TensorFlow 2.0 as well as how to build data pre-processing pipelines using the tfdatasets package and how to use pre-trained models with tfhub

Daniel Falbel

rstudio tensorflow tfdatasets Rstudio::conf(2020) Daniel Falbel Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Lionel Henry | Interactivity and Programming in the Tidyverse | RStudio (2020)

In Tidyverse grammars such as dplyr you can refer to the columns in your data frames as if they were objects in the workspace. This syntax is optimised for interactivity and is a great fit for data analysis, but it makes it harder to write functions and reuse code. In this talk we present some advances in the tidy eval framework that make it easier to program around Tidyverse pipelines without having to learn a lot of theory

Lionel Henry

dplyr rstudio tidyverse Rstudio::conf(2020) Lionel Henry Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Colin Fay | Production-grade Shiny Apps with golem | RStudio (2020)

Shiny is an amazing tool when it comes to creating web applications with R. Almost anybody can build a small Shiny App in a matter of minutes, provided they have a basic knowledge of R. As of today, we can safely say that it has become the de facto tool for web applications in the R world. Building a proof-of-concept application is easy, but things change when the application becomes larger and more complex, and especially when it comes to sending that app to production. Until recently there hasn’t been any real framework for building and deploying production-grade Shiny Apps. This is where ‘golem’ comes into play: offering Shiny developers an opinionated framework for creating production-ready Shiny Applications. With ‘golem’, Shiny developers now have a toolkit for making stable, easy-to-maintain, and production-robust web applications with R. ‘golem’ has been developed to abstract away the most common engineering tasks (for example, module creation, addition of external CSS or JavaScript files, …), so you can focus on what matters: building the application. And once your application is ready to be deployed, ‘golem’ guides you through testing, and brings you tools for deploying to common platforms. In this talk, Colin and Vincent will present the ‘golem’ package, first talking about the “why ‘golem’?”, then presenting the general philosophy behind this framework, and helping you get started building your first Shiny App with ‘golem’

rstudio Shiny Rstudio::conf(2020) Colin Fay Golem Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Jared Lander | R: Then and Now | RStudio (2020)

R has changed a lot since the meetup was founded 10 years ago. Back then we were using base graphics (or lattice) and the apply family of functions and we didn’t have pipes. At the time there were an impressive 1800 packages on CRAN; now there are over 15,000, extending R’s reach far beyond its traditional domain of statistics and machine learning into publishing, website building and video generation. The community has grown and changed dramatically during that time, with the New York meetup alone going from 25 to over 10,000 members. During this talk we go through a then-and-now of R code and community to palpably see how everything has changed

rstudio Rstudio::conf(2020) Jared Lander Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

William Chase | The Glamour of Graphics | RStudio (2020)

I see a lot of ugly charts. This is to be expected as I work with a lot of academics and data scientists, neither of whom have been trained in how to design attractive charts. I myself produced many ugly charts during my years as a research scientist, when the design process basically came down to random tweaking until things “looked good”. If only I could go back and tell young inexperienced me that there was a better way. In this talk, I will present that better way–a series of design principles that can take any chart from drab to fab. Rather than applying these techniques willy nilly, I will show how they form a layered “Glamour of Graphics” that is structured and can be easily applied to any chart. This Glamour of Graphics has some simple implementations in ggplot, where we will replace geoms, aesthetics, and scales with typography, color, and layout. Finally, I will discuss why looks matter when it comes to charts, and how by following the Glamour of Graphics you can design charts that are more persuasive and more accurately perceived

rstudio Rstudio::conf(2020) William Chase Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Riva Quiroga | The development of “datos” package for the R4DS Spanish translation | RStudio (2020)

Originally posted at https://rstudio.com/resources/rstudioconf-2020/the-development-of-datos-package-for-the-r4ds-spanish-translation/

rstudio Rstudio::conf(2020) Riva Quiroga Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Nima Safaian | Rpanda trading simulation - from an idea to a multi-user shiny app | RStudio (2020)

The idea of rpanda commodities trading simulation was many years in the making. As energy trading professionals working in the industry, we had developed insights around how to make risk/reward market calls, and what skills make someone an exceptional commodities trader. Traders are one of the most expensive seats in terms of monetizing value from the assets. We developed rpanda as a simulated environment which replicates closely how real-life physical commodities trading works in order to assist talent development and selection, in both academia and enterprise. My co-founder and I did not know how to design production-ready software, but we had always used R/Shiny for market analysis in our corporate jobs. Rather than hiring expensive app developers, we decided to do it ourselves. We used the RStudio development stack, such as RStudio Connect, and open source tools like plumber to turn our idea into a production-ready app that is used by University of Alberta classes. In this presentation, we share our journey, technical challenges, and how we overcame them

plumber rstudio Shiny Rstudio::conf(2020) Nima Safaian Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Carl Howe & Greg Wilson | Data Science Education in 2022 | RStudio (2020)

More people are learning data science every day, and there are more ways for them to learn than ever before. To understand where we are and where we might be going, this talk looks at what data science education could look like two years from now: far enough away that we can dream, but close enough that we can only dream a little. We explore the balance between automated and collaborative learning, different ways to deliver different kinds of lessons to different kinds of people, and ways in which our tools and practices could improve

rstudio Rstudio::conf(2020) Carl Howe Greg Wilson Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Jeff Leek | Data science education as a public health intervention in E. Baltimore | RStudio (2020)

Originally posted at https://rstudio.com/resources/rstudioconf-2020/data-science-education-as-an-economic-and-public-health-intervention-in-east-baltimore/

rstudio Rstudio::conf(2020) Jeff Leek Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Lauren Chadwick | Meet You Where You R | RStudio (2020)

At RStudio, we wake up and go to bed thinking about the positive impact that open source work and data science have had and can have on the world. To maximize this impact, we find three areas of investment absolutely critical to ensure our open source community keeps up with the world’s changes and outlives us all: 1. Find ways to make R more approachable. 2. Enable teams of all types & sizes (educational, professional, etc.) to be able to leverage the work they’re doing in R, and effortlessly communicate that work to others. 3. Extend the language so our open-source community can continue to be at the forefront of innovation, no matter their preference of tool or language

rstudio Rstudio::conf(2020) Lauren Chadwick Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Jesse Sadler | vctrs: Creating custom vector classes with the vctrs package | RStudio (2020)

The base R types of vectors enable the representation of an amazingly wide array of data types. There is so much you can do with R. However, there may be times when your data does not fit into one of the base types and/or you want to add metadata to vectors. vctrs is a developer-focused package that provides a clear path for creating your own S3-vector class, while ensuring that the classes you build integrate into user expectations for how vectors work in R. This presentation will discuss the why and how of using vctrs through the example of debkeepr, a package for integrating historical non-decimal currencies such as pounds, shillings, and pence into R. The presentation will provide a step-by-step process for developing various types of vectors and thinking through the design process of how vectors of different classes should work together
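To make the non-decimal currency example concrete, here is a minimal Python sketch of the arithmetic such a vector class has to get right. It is not the debkeepr or vctrs API: the `Lsd` class and its method names are hypothetical, and it assumes the common bases of 12 pence per shilling and 20 shillings per pound.

```python
from dataclasses import dataclass

# Assumed bases for this illustration: 12 pence = 1 shilling, 20 shillings = 1 pound.
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20


@dataclass(frozen=True)
class Lsd:
    """Hypothetical pounds/shillings/pence value (not the debkeepr API)."""
    pounds: int
    shillings: int
    pence: int

    def to_pence(self) -> int:
        # Collapse the three units into a single count of pence.
        total_shillings = self.pounds * SHILLINGS_PER_POUND + self.shillings
        return total_shillings * PENCE_PER_SHILLING + self.pence

    @classmethod
    def from_pence(cls, total: int) -> "Lsd":
        # Carry pence into shillings, then shillings into pounds.
        shillings, pence = divmod(total, PENCE_PER_SHILLING)
        pounds, shillings = divmod(shillings, SHILLINGS_PER_POUND)
        return cls(pounds, shillings, pence)

    def __add__(self, other: "Lsd") -> "Lsd":
        # Add in the smallest unit so carries are handled in one place.
        return Lsd.from_pence(self.to_pence() + other.to_pence())


total = Lsd(1, 15, 9) + Lsd(0, 7, 6)
print(total)
```

Doing the addition in pence and normalizing afterwards keeps the carry logic in a single function, which is the kind of invariant a custom vector class is meant to enforce.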

rstudio vctrs Rstudio::conf(2020) Jesse Sadler Vctrs Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Amanda Gadrow | Lessons about R I learned from my cat | RStudio (2020)

Forming good development habits for R projects is pretty straightforward if you follow the lessons I’ve learned from my cat, whose advice includes “be lazy”, “keep your claws sharp”, and “land on your feet”. Attendees of this talk will learn how to make life easier on colleagues and their future selves by using simple software engineering best practices to build their current projects. Each point will come with cat photos and code samples, the two best parts of the Internet!

rstudio Rstudio::conf(2020) Amanda Gadrow Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Athos Damiani | Sound annotation with Shiny and wavesurfer | RStudio (2020)

We have observed huge improvements in machine learning tools, but the main effort has gone into helping with the steps that come after a dataset is annotated. We still struggle to build a trustworthy pipeline for making these annotations. The wavesurfer package brings R users the ability to annotate audio files with ease and reliability, using the friendly user interface of Shiny to make this hard and laborious part of the project more joyful and efficient. A 5-minute presentation in our Lightning Talks series

rstudio Shiny Rstudio::conf(2020) Athos Damiani Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Caroline Ledbetter | Rproject templates to automate and standardize your workflow | RStudio (2020)

Many teams and organizations have tasks and structures that are standard across projects. Lack of consistency and documentation can lead to lost productivity when team members join collaborations or previous work is consulted by your future self. Setting up folder structures can be particularly tedious. This talk will demonstrate using RStudio project templates as part of an organizational package to automatically set up file structures, establish git repositories and add standardized readme files. It will also show how including report templates for Rmarkdown files can lead to more consistent and professional reports. Project info can optionally be stored so that it can be added automatically to the top of reports and included in snippets for code file headers. Creating standard, easy to implement documentation and procedures can be particularly effective in encouraging skeptical collaborators to use git and Rmarkdown. Organizational packages can also be a great place to house functions that are specific and common to an organization’s needs. The talk will showcase this functionality using the CIDAtools package that we developed. While the CIDAtools package was developed to address issues that sometimes arise from the less structured environment of academia, the tools presented can be equally useful in an industry setting. A 5-minute presentation in our Lightning Talks series

rmarkdown rstudio Rstudio::conf(2020) Caroline Ledbetter Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Colin Rundel | `livecode`: broadcast your live coding sessions from and to RStudio | RStudio (2020)

In this talk we will demonstrate livecode, a new R package for broadcasting code for live code demonstrations. This package implements a simple webserver (using httpuv) to dynamically publish the content of a code file (i.e. .R or .Rmd) as you edit it live. This enables your students to have near real-time access to your code as you write it. The broadcast file can be viewed with any web browser but the package is specifically designed to be used within RStudio, leveraging its built-in viewer. This gives students direct access to the shared code within the IDE, allowing direct copying into their own source files and/or the console and thereby improving their ability to interact and experiment with your code. A 5-minute presentation in our Lightning Talks series

httpuv rstudio Rstudio::conf(2020) Colin Rundel Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Davis Vaughn | Sliding Windows and Calendars | RStudio (2020)

A number of R packages exist to make computing moving averages on a single numeric series straightforward. But generally “real” life is much messier than that! Try computing a moving average over a twenty-day sliding window when you have a time series with missing data. Oh! By the way, you should also skip over weekends when looking back twenty days. And you know that random holiday that your company celebrates that no one else does? Skip over that too. These are hard but realistic problems, and until now there has been a lack of tools necessary to solve them. In this talk, I’ll present two packages designed to tackle these issues, slide and almanac. slide is a package designed to perform arbitrary sliding window calculations. The simplest example of this would be a moving average. What makes slide unique is its support for sliding relative to an index, such as a date vector, which allows you to correctly compute the boundaries of that twenty-day window. almanac is a package for creating custom business calendars, and then adjusting dates relative to them. Inspired by lubridate, almanac allows you to shift dates by a set number of “business” days while respecting the weekends and holidays defined by a user-specified calendar. For example, shifting a Friday forward by 1 business day would land on a Monday, unless that Monday happened to be a holiday, in which case the next business day would actually be Tuesday. Together, slide and almanac provide the tooling necessary to solve the problem mentioned earlier. Additionally, because slide works with any arbitrary function, we can use the same procedure to compute rolling regressions, cumulative sums, and any other sliding computation. A 5-minute presentation in our Lightning Talks series
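The two ideas in this abstract, sliding relative to a date index and shifting by business days, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the slide or almanac API: the function names `shift_business_days` and `sliding_mean` are hypothetical, weekends are assumed to be Saturday and Sunday, and the holiday set is supplied by the caller.

```python
from datetime import date, timedelta


def is_business_day(d, holidays):
    # Assumption: Monday-Friday are working days unless listed as a holiday.
    return d.weekday() < 5 and d not in holidays


def shift_business_days(start, n, holidays=frozenset()):
    """Move forward n business days (n >= 0), skipping weekends and holidays."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if is_business_day(current, holidays):
            remaining -= 1
    return current


def sliding_mean(index, values, lookback_days):
    """Mean of the values whose dates fall in [d - lookback_days, d] for each date d.

    The window is defined by the date index, not by row position, so gaps
    in the series shrink the window instead of silently widening it.
    """
    result = []
    for d in index:
        window_start = d - timedelta(days=lookback_days)
        window = [v for i, v in zip(index, values) if window_start <= i <= d]
        result.append(sum(window) / len(window))
    return result


friday = date(2020, 1, 31)
monday = date(2020, 2, 3)
print(shift_business_days(friday, 1))                      # the following Monday
print(shift_business_days(friday, 1, holidays={monday}))   # Tuesday, since Monday is a holiday

dates = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 5)]
print(sliding_mean(dates, [1.0, 3.0, 5.0], lookback_days=2))
```

In the last call, the observation on 5 January has no neighbours within two days, so its window contains only itself; a purely positional two-row window would have averaged it with 2 January.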

lubridate rstudio Rstudio::conf(2020) Davis Vaughn Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Dr. Amelia McNamara | Making a tidy dress | RStudio (2020)

After at least a year of dreaming about it, I finally produced the #rstats / #Tidyverse dress of my dreams. This involved designing fabric, getting it custom printed, making a pattern from an existing garment, and sewing the dress. I learned a lot of useful lessons during this project, including "do unit tests" (make a practice dress) and "document your work" (get your BFF to take pictures of you)

rstudio tidyverse Rstudio::conf(2020) Dr. Amelia McNamara Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Hunter Glanz | The Five Principles of Data Science Education | RStudio (2020)

In this talk, I will outline a unified philosophy of data science education, and provide tips and tools for implementing these principles in the classroom using R and RStudio. Although data science as a professional discipline is well-established, its pedagogy is still in a period of growth. Even within a single university, multiple data science courses may be offered across different departments leading to inevitable redundancy of efforts amidst rich domain-specific innovations. My experience as an instructor in many such courses has led me to five principles that transcend domain, context, and choice of language: reproducibility, communication, version control, practical application, and data ethics. For each of these full-stack themes, I will share examples of how to leverage tools in R and RStudio to enhance learning. A 5-minute presentation in our Lightning Talks series

rstudio Rstudio::conf(2020) Hunter Glanz Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Javier Luraschi | Datasets in Reproducible Research with ‘pins’ | RStudio (2020)

Open source code is an essential piece in making science reproducible. Tools like ‘rmarkdown’ and GitHub facilitate running and sharing outcomes with colleagues and with the scientific community at large. However, it is less clear what tools should be used to retrieve, store and share datasets; while it is possible to make datasets part of your workflows today, it is usually hard and we are often left with manually sharing or downloading links to datasets. Not only that, but it’s also hard to share or discover datasets. In this talk we will introduce, for the first time, the ‘pins’ package: a package designed to pin, discover and share resources. You can use ‘pins’ to simplify your data science workflows by easily fetching resources from GitHub, Kaggle, CRAN and RStudio Connect. We will present a ‘pin’ as a generic resource that can contain tabular datasets like CSVs, unstructured data like JSON files, image archives as ZIP files and so on. This talk will be highly interactive, showing you how to get started by installing ‘pins’ from CRAN, retrieve and cache resources, and share and discover useful and fun data resources to improve and enhance your day-to-day workflows

rmarkdown rstudio Rstudio::conf(2020) Javier Luraschi Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Jon Harmon | Learning by Teaching: Mentoring at the R4DS Online Learning Community | RStudio (2020)

I host a weekly R Office Hour on the R4DS Online Learning Community Slack. By doing so, I have learned more about R than I ever would have thought. Here I'll present concrete examples of how R users can participate in the R community to expand their skills. R users of all skill levels can develop their skills by helping one another learn. Committing to help people with their coding challenges leads to the exploration of answers in areas you might otherwise not examine. A 5-minute presentation in our Lightning Talks series

rstudio Rstudio::conf(2020) Jon Harmon R4DS Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Katherine Simeon | Every voice matters: An analysis of @WeAreRLadies | RStudio (2020)

As a rotating curation, @WeAreRLadies is a Twitter account that has a different curator (i.e., tweeter) each week with a mission to highlight female and minority genders and their work in R. So far, curators have tweeted from 18 different countries and represent a variety of domains and levels of R expertise, ranging from R novices to those developing their own packages. With 45 R-Ladies curators to date, the account has become a popular R-related Twitter resource, gaining more than 13,000 followers in the past year and hundreds of interactions each week. This talk will present a text analysis and reflection on over a year of Twitter text data from @WeAreRLadies. As the founder and maintainer of this account, I witness firsthand the bidirectional relationship between one’s learning journey and their use of R. In this talk, I will attempt to quantify this through a text analysis that explores how one’s experiences learning and using R relate to how they talk (or tweet) about it. By analyzing tweet text as well as other metrics provided by Twitter (e.g., number of likes, replies, and clicks), I will showcase different ways curators have engaged with the R Twitter community and explore how account engagement has changed as the number of curators and followers continues to grow. I will also discuss how curators’ different areas of expertise have resulted in tweets and discussions that both demonstrate the variety of tools available in R and spotlight unifying ideas and best practices in R programming. Finally, I will reflect on lessons learned and future directions for @WeAreRLadies, as well as its contribution to the R-Ladies Global initiative. Overall, this talk will discuss how diverse perspectives of @WeAreRLadies curators have enriched the conversations in the R Twitter community by validating different learning journeys and by promoting and amplifying underrepresented voices. A 5-minute presentation in our Lightning Talks series

rstudio Rstudio::conf(2020) Katherine Simeon Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Kelly Bodwin | Course Material Creation in the R Ecosystem | RStudio (2020)

In this talk, I will introduce a suite of three packages designed to aid course material creation in R: {demoR} for displaying code in knitted R Markdown with custom highlighting and formatting; {shindig} for shortcuts to creating simple educational Shiny apps; and {curricular} for easy creation of syllabi, homework exercises, exams, etc. Together, we will explore how these new tools - in conjunction with other existing resources - have been used to create a clean and consistent ecosystem for my R-based Introductory Statistics course. I will share some metrics on student outcomes, as well as my own experiences with the advantages and challenges in building the course. A 5-minute presentation in our Lightning Talks series

rstudio Shiny Rstudio::conf(2020) Kelly Bodwin Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Mike K Smith | Learn to teach, for goodness sake | RStudio (2020)

Even though I’ve completed 4 marathons, you certainly shouldn’t come to me for a training plan on how to achieve your goals for any race you’re about to run. So why do we often turn to “experienced R users” to help us learn R or train an organization? The RStudio certified trainers have been taught modern, evidence-based teaching practices which they use in planning training sessions in order to help delegates achieve THEIR learning goals effectively in a given time-frame. My talk will illustrate some of these teaching concepts and how, by becoming a certified trainer, you can help others learn about R more effectively. A 5-minute presentation in our Lightning Talks series

rstudio Rstudio::conf(2020) Mike K Smith Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Jenny Bryan | Object of type ‘closure’ is not subsettable | RStudio (2020)

Your first “object of type ‘closure’ is not subsettable” error message is a big milestone for an R user. Congratulations, if there was any lingering doubt, you now know that you are officially programming! Programming involves considerably more troubleshooting and debugging than many of us expected (or signed up for). The ability to solve your own problems is an incredibly powerful stealth skill that is worth cultivating with intention. This talk will help you nurture your inner problem solver, covering both general debugging methods and specific ways to implement them in the R ecosystem

Jenny Bryan

rstudio Jenny Bryan Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Emily Riederer | RMarkdown Driven Development | RStudio (2020)

RMarkdown enables analysts to engage with code interactively, embrace literate programming, and rapidly produce a wide variety of high-quality data products such as documents, emails, dashboards, and websites. However, RMarkdown is less commonly explored and celebrated for the important role it can play in helping R users grow into developers. In this talk, I will provide an overview of RMarkdown Driven Development: a workflow for converting one-off analysis into a well-engineered and well-designed R package with deep empathy for user needs. We will explore how the methodical incorporation of good coding practices such as modularization and testing naturally evolves a single-file RMarkdown into an R project or package. Along the way, we will discuss big-picture questions like “optimal stopping” (why some data products are better left as single files or projects) and concrete details such as the {here} and {testthat} packages which can provide step-change improvements to project sustainability

rmarkdown rstudio testthat Emily Riederer Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Claus Wilke | Spruce up your ggplot2 visualizations with formatted text | RStudio (2020)

The ggtext package provides various functions to add formatted text to ggplot2 figures, both in the form of plot or axis labels and in the form of text labels or text boxes inside the plot panel. Text formatting can be achieved through a small subset of markdown, HTML, and CSS directives. Features currently supported include italics, bold, super- and sub-script, as well as changing font size, font family, and color. Basic support for adding images to formatted text is also available

ggplot2 rstudio Claus Wilke Rstudio::conf(2020) Ggtext Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Jim Hester | Azure Pipelines and GitHub Actions | RStudio (2020)

Open source R packages on GitHub often take advantage of continuous integration services to automatically check their packages for errors. This is very useful for catching things quickly, as well as increasing confidence in proposed changes, as Pull Requests can be checked before they are merged. Travis-CI and Appveyor are the most popular current methods. However, newer services, Azure Pipelines and GitHub Actions, show promise for being more powerful and simpler to configure and debug. I will discuss these services and demonstrate some of their capabilities and how to configure them for your own use in packages and reports

rstudio Jim Hester Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Larry Fenn | Journalism with RStudio, R, and the tidyverse | RStudio (2020)

The Associated Press data team uses R and the Tidyverse as its main tools for data processing and analysis. In this talk, some of the technology behind the published stories will be showcased:
- Using dbplyr to work off a hosted database containing 380 million opioid records to identify “pill mills”.
- Using open-sourced AP style templates for R Markdown and ggplot to quickly produce graphics and reports off breaking news.
- Using R Markdown and htmlwidgets to give reporters and editors interactive reports to identify reporting leads

dbplyr rstudio tidyverse Larry Fenn Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

George Kastrinakis | Building a data science pipeline for the FT with RStudio Connect | RStudio

Talk from rstudio::conf(2020)

We have recently implemented a new Data Science workflow and pipeline, using RStudio Connect and Google Cloud Services. This has vastly decreased our pipeline complexity, allowing us to bring our models and products into scheduled production more quickly. In addition, our workflow, working closely together as a team on all projects on a regular two-week sprint cycle, has increased the range of projects we have been able to take on and complete. To detail some of the key lessons we’ve learned (and some of the difficulties!), we’ll walk you through one of our recent sprints, where we productionalised the generation of a suite of behavioural and demographic features so that they can be more easily plugged in to a range of models and used across the business by the FT’s platform and product teams

rstudio George Kastrinakis Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Sharla Gelfand | Don’t repeat yourself, talk to yourself! Reporting with R | RStudio (2020)

If you’re responsible for analyses that need updating or repeating on a semi-regular basis, you might find yourself doing the same work over and over again. The principle of “don’t repeat yourself” from software engineering motivates us to use functions and packages, the core of repetition in the R universe. For analyses, it can be difficult to know how to use this principle and move beyond “copying and pasting scripts and changing the data file and the object names and updating the dates and results in RMarkdown”, especially when there’s some element of human intervention required, whether it be for validating assumptions or cleaning artisanal data. This talk will focus on those next steps, showcasing opportunities to stop repeating yourself and instead anticipate the needs of and communicate effectively with your future self (or the next person with your job!) using project-oriented workflows, clever interactivity, templated analyses, functions, and yes, your own packages

rmarkdown rstudio Sharla Gelfand Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Ian Cook | Bridging the Gap between SQL and R | RStudio (2020)

Ian Cook | January 31, 2020

Like it or not, SQL is the closest thing we have to a universal language for working with structured data. Celebrating its 50th birthday in 2020, SQL today integrates with thousands of applications and has millions of users worldwide. Data analysts using SQL represent a large audience of potential R users motivated to expand their data science skills. But learning R can be frustrating for SQL users. One major frustration is the inability to directly query R data frames with SQL SELECT statements. Eager to use R for tasks that are not possible with SQL (like data visualization and machine learning), these users are dismayed to find that they must first learn an unfamiliar syntax for data manipulation. The popularity of the sqldf package (which automatically exports an R data frame into an embedded database, then runs a SQL query on it) demonstrates this frustration. But now there is a way to directly query an R data frame without moving the data out of R. In this talk, I introduce tidyquery, a new R package that runs SQL queries directly on R data frames. tidyquery is powered by dplyr and by queryparser, a new pure-R, no-dependency SQL query parser
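
The export-then-query approach that the abstract attributes to sqldf can be sketched in a few lines. This is only an illustration in Python with the standard-library sqlite3 module, not the sqldf or tidyquery implementation; the helper `query_rows`, the table name `flights`, and its columns are hypothetical.

```python
import sqlite3

def query_rows(rows, columns, table, sql):
    """Copy in-memory rows into an embedded SQLite database, then run SQL on them."""
    conn = sqlite3.connect(":memory:")
    try:
        col_list = ", ".join(columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f"CREATE TABLE {table} ({col_list})")
        # The export step: every row is copied out of the host data structure.
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

# Hypothetical data standing in for a data frame.
columns = ["carrier", "delay"]
rows = [("AA", 10), ("AA", 30), ("UA", 5)]

result = query_rows(
    rows, columns, "flights",
    "SELECT carrier, AVG(delay) AS mean_delay "
    "FROM flights GROUP BY carrier ORDER BY carrier",
)
print(result)
```

The copy into the embedded database is the overhead in this pattern; the abstract describes tidyquery as avoiding it by querying the data frame without moving the data out of R.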

dplyr rstudio Ian Cook Rstudio::conf(2020) SQL Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Winston Chang | Asynchronous programming in R | RStudio (2020)

Writing regular R code is straightforward: you tell R to do something, it does it, and then it returns control back to you. This is called synchronous programming. However, if you use R to coordinate threads, processes, or network communication, the regular model may be unable to do what you want, or it may only be able to do it with a significant performance penalty. In this talk I'll explain how asynchronous programming with the later package can handle these kinds of programming problems. I'll also show how to provide a synchronous interface for asynchronous code, so that users will have a simple, familiar way to use your code. Materials - github repo
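
The two ideas in this abstract, overlapping waits and a synchronous interface over asynchronous code, can be sketched as follows. This is a minimal illustration using Python's standard-library asyncio as a stand-in; it is not the API of the later package, and the `fetch` task with its delays is hypothetical.

```python
import asyncio

async def fetch(name, delay):
    """Hypothetical task: pretend to wait on the network, then return a label."""
    await asyncio.sleep(delay)  # yields control so other tasks can run meanwhile
    return f"{name} done"

async def fetch_all():
    # Both waits overlap, so total time is roughly the longest delay, not the sum.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1))

def fetch_all_sync():
    """Synchronous wrapper: callers get a plain return value and never see async syntax."""
    return asyncio.run(fetch_all())

results = fetch_all_sync()
print(results)
```

Callers of `fetch_all_sync` use it like any ordinary function, which is the kind of simple, familiar interface the abstract describes.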

Winston Chang

rstudio Winston Chang Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Nick Strayer | Stochastic Block Models with R: Statistically rigorous clustering | RStudio (2020)

Nick Strayer | January 31, 2020

Often a machine learning research project starts with brainstorming, continues to one-off scripts while an idea forms, and finally, a package is written to disseminate the product. In this talk, I will share my experience rethinking this process by spreading the package writing across the whole process. While there are cognitive overheads involved with setting up a package framework, I will argue that these overheads can serve as a scaffolding for not only good code but robust research practices. The result of this experiment is the SBMR package: a native R package written to fit and investigate the results of Bipartite Stochastic Block Models that forms the backbone of my PhD dissertation. By going over the ups and downs of this process, I hope to leave the audience with inspiration for moving the package writing process closer to the start of their projects and melding research and code more closely to improve both

Nick Strayer

rstudio Nick Strayer Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

J.J. Allaire | Open Source Software for Data Science | RStudio (2020)

J.J. Allaire | February 1, 2020

Open-source software is fundamentally necessary to ensure that the tools of data science are broadly accessible, and to provide a reliable and trustworthy foundation for reproducible research. This talk will delve into why open source software is so important and discuss the role of corporations as stewards of open source software. I’ll also talk about how RStudio is structured and organized to pursue its mission of creating open source software for data science.

About the speaker: J.J. Allaire is a software engineer and entrepreneur who has created a wide variety of products including ColdFusion, Open Live Writer, Lose It!, and RStudio

JJ Allaire

rstudio J.J. Allaire Rstudio::conf(2020) Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Kelly O’Briant | Configuration management tools for the R admin | RStudio (2019)

This talk will feature an introduction to configuration management tools for the Analytic Administrator. An analytic admin is someone who is invested in continually improving analytic infrastructure, advocates for best practices in data product deployments, and acts to adopt DataOps philosophies in their organization. One of the biggest challenges for an analytic admin can be figuring out how to help IT groups develop core competencies around the management of R tooling. When IT groups are unfamiliar with R, they might lean heavily on the analytic admin for guidance or resist adoption entirely. Data science teams that rely on delivering results through integrated R-based solutions can get blocked when they lack the full support of IT. I’ll present a roadmap for how analytic admins can create custom teaching tools for introducing the R toolchain. Using these strategies, a dash of creativity, and a little bit of empathy, I hope you can get the IT buy-in you’ll need to make R a fully legitimate part of your organization.

About the Author: Kelly O’Briant. Kelly is a Solutions Engineer for RStudio and also an organizer of the Washington DC chapter of R-Ladies Global. It’s an R users group for lady-folk and friends

rstudio Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Kelly O'briant

Jeffrey Arnold | Solving R for data science | RStudio (2019)

While teaching a course using “R for Data Science”, I wrote a complete set of solutions to its exercises and posted them on GitHub. Then other people started finding them. And now I’m here. In this talk, I’ll discuss why I did it, and what I learned from the process, both what I learned about the Tidyverse itself, and what I learned from teaching it.

About Jeffrey: I was formerly an Assistant Professor of Political Science at the University of Washington and a Core Faculty Member in the Center for Statistics and the Social Sciences, and an Instructor of Political Science and QuanTM Pre-Doctoral Fellow at Emory University. I received my Ph.D. in political science from the University of Rochester. Prior to graduate studies, I was a Research Associate/Economist in the Money and Payments Studies research group at the Federal Reserve Bank of New York

rstudio tidyverse Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Jeffrey Arnold

Wes McKinney | Ursa Labs and Apache Arrow in 2019 | RStudio (2019)

Learn more about what’s happening at Ursa Labs at https://wesmckinney.com/archives.html

rstudio Wes McKinney Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Tyler Morgan-Wall | 3D mapping, plotting, and printing with rayshader | RStudio (2019)

Long form discussion: https://www.tylermw.com/3d-printing-rayshader/

rstudio Tyler Morgan-Wall Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Panel | Growth & change of careers, organizations & responsibility in data science | RStudio (2019)

Hosted by Eduardo Arino de la Rubia, Instagram

With: Hilary Parker, Data Scientist, Stitch Fix; Karthik Ram, Data Science Fellow, UC Berkeley; Angela Bassa, Director of Data Science, iRobot; Tracy Teal, Executive Director, Carpentries

About the Author: Eduardo Arino de la Rubia. Technologist and Data Scientist driven to create software that people use and find useful and pleasant. From programming through architecture, from green field to maintenance, software is interesting technologically, socially, and intellectually. I enjoy contributing to the process, either through leadership or individual effort, of creating software that is deployed joyfully and is as bug free as possible

rstudio Eduardo Arino De La Rubia Data Scientist Director of Data Science Hilary Parker Karthik Ram Angela Bassa Tracy Teal Rstudio Data Science Machine Learning Python Stats Data Visualization Data Viz Ggplot Technology Coding RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Karthik Ram | A guide to modern reproducible data science with R | RStudio (2019)

Resources: https://github.com/karthik/rstudio2019

rstudio Karthik Ram Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Ian Fellows | Don’t let long running tasks hang users: introducing IPC for Shiny | RStudio (2019)

Long running tasks in Shiny are not cancelable and typically lock the user interface while running. This talk introduces the ipc package, which helps you build dynamic applications when non-trivial computations are involved. ipc allows your Shiny Async workers to be cancelable and communicate intermediate results/progress back to the user interface.
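
The pattern described here, a worker that can be cancelled and that reports intermediate progress back to the caller, can be sketched in general terms. This is a minimal illustration with Python's standard-library threading and queue modules; it is not the ipc package's API, and `long_task`, `drain`, and the message tuples are hypothetical.

```python
import queue
import threading

def long_task(n_steps, cancel, progress):
    """Hypothetical worker: checks for cancellation and reports progress each step."""
    total = 0
    for step in range(1, n_steps + 1):
        if cancel.is_set():
            # Stop early and report how many steps were completed.
            progress.put(("cancelled", step - 1))
            return
        total += step
        progress.put(("progress", step))
    progress.put(("done", total))

def drain(q):
    """Collect every message currently waiting in the queue."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items

# Run to completion in a background thread, then read the progress messages.
cancel = threading.Event()
progress = queue.Queue()
worker = threading.Thread(target=long_task, args=(5, cancel, progress))
worker.start()
worker.join()
messages = drain(progress)

# Request cancellation before the task starts: it stops at the first check.
cancel_early = threading.Event()
cancel_early.set()
early_progress = queue.Queue()
long_task(3, cancel_early, early_progress)
cancelled_messages = drain(early_progress)

print(messages[-1], cancelled_messages)
```

In a real user interface the caller would poll the queue while the worker runs rather than waiting for `join()`, so that progress can be shown and a cancel button can set the event.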

About the Author: Ian Fellows. Dr. Ian Fellows is a professional statistician with research interests ranging over many sub-disciplines of statistics. His work in statistical visualization won the prestigious John Chambers Award in 2011, and in 2007-2008 his Texas Hold’em AI programs were ranked second in the world. Applied data analysis is his passion. He is accustomed to providing insightful analysis and operationalizing these analyses in enterprise systems using a variety of programming languages and tools including R, Python, Java, MapReduce and Spark

rstudio Shiny Ian Fellows Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Alan Dipert | Integrating React.js and Shiny | RStudio (2019)

React.js is a thriving JavaScript library that eases encapsulating and sharing sophisticated component libraries. The React.js ecosystem is filled with components for doing everything from color selection (react-color) to animation (react-spring). While it’s always been technically possible to integrate React.js components with Shiny applications, it hasn’t always been particularly obvious how. To make it easier, we augmented the excellent reactR package with functions specifically designed to make it easier to create new htmlwidgets, inputs, and outputs based on React.js components. In this talk, I will further motivate and demonstrate these new tools and do my best to empower the audience to try them out.

About the Author: Alan Dipert. Alan is a software engineer on the Shiny team at RStudio. In the past, he’s helped build web applications, reporting pipelines, and many things between. When he’s not working, Alan likes to spend his time reading or being with his family

Shiny Team

rstudio Shiny Alan Dipert Reactjs React.js Javascript Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Kevin Kuo | Introducing mlflow | RStudio (2019)

We introduce the R API for MLflow, which is an open source platform for managing the machine learning lifecycle. We demonstrate each component of the platform–Tracking, Projects, and Models–and describe how they can be leveraged in practical data science workflows.

About the Author Kevin Kuo Kevin is a software engineer working on open source packages for big data analytics and machine learning. He has held data science positions in a variety of industries and was a credentialed actuary. He likes mixing cocktails and studying about wine

rstudio Kevin Kuo MLFlow Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Api

Caitlin Hudon | Learning from eight years of data science mistakes | RStudio (2019)

Over the past eight years of doing data science, I’ve made plenty of mistakes, and I’d love to share them with you – including what I’ve learned and what I’d do differently with some hindsight. This talk will cover mistakes made during analyses (including communication when delivering results), team and infrastructure mistakes, plus some advice for incoming data scientists.

About the Author Caitlin Hudon My name is Caitlin Hudon and I am lead data scientist at OnlineMedEd, a startup in Austin. I have about eight years of experience doing data science-y things in a variety of industries including IoT, marketing, higher education, non-profits, and start-ups. I am also the co-founder of R-Ladies Austin, founder of the ‘ALL the Ladies in Tech’ quarterly happy hour here in ATX, and a member of the Fall 2017 NASA Datanaut class. Outside of data, I love tacos (especially trading taco spots in Austin), traveling (I’ve been to all 50 states and am working on the continents), the Cubs (including Will Ferrell’s Harry Caray impressions), and live music

rstudio Caitlin Hudon Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Gabor Csardi | pkgman A fresh approach to package installation | RStudio (2019)

The main goal of pkgman is to make package installation faster and more reliable. This allows new, simpler and safer workflows, such as separate package libraries for projects. In this talk, we will show the features that make pkgman fast, convenient and reliable.

Features that make pkgman fast:
* Concurrency: pkgman performs all downloads, package builds and installations concurrently by default.
* Metadata and package cache: pkgman caches all metadata and all downloaded and locally built packages.
* Laziness: pkgman only downloads and installs packages if needed.

Features that make pkgman convenient:
* BioC and GitHub packages are supported seamlessly.
* Informative UI: pkgman can lay out the installation/update plan for the user to confirm. It returns data about downloads, builds, installations, etc.

Features that make pkgman reliable:
* Dependency solver: pkgman makes sure that you end up in a consistent, working state of dependencies.
* Private library: pkgman’s own dependencies do not affect your regular package library, and vice versa. pkgman does not load any packages from your regular library.

About the Author Gábor Csárdi Gábor is a software engineer at RStudio, working in Hadley’s team on R infrastructure packages

Gábor Csárdi

rstudio Gabor Csardi Pkgman Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Brooke Watson | R at the ACLU Joining tables to reunite families | RStudio (2019)

Last year, over 2500 immigrant children were separated from their family while in government custody. Information about their status is scattered across several government agencies, and throughout the national class-action lawsuit “Ms. L vs ICE,” the Analytics team of the ACLU has been using R to join, deduplicate, validate, and analyze it. Using specifics of this case, this talk will address common challenges arising from human-generated data in spreadsheets. With generalizable examples, I will discuss data tidying, standardization, deduplication, and validation using the tidyverse, janitor, assertthat, and other packages. Finally, I will share best practices for requesting useful data from non-quantitative subject matter experts.

About the Author Brooke Watson I am a Data Scientist at the ACLU, where I use code and statistics to support civil rights litigation and advocacy. Previously, I worked in public health and disease research, most recently as a Research Scientist with the EcoHealth Alliance. I completed my Master’s degree in epidemiology from the London School of Hygiene and Tropical Medicine and swam for Tennessee’s Lady Vols as an undergrad

rstudio tidyverse Brooke Watson ACLU Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Mel Gregory | RStudio Cloud for education | RStudio (2019)

RStudio Cloud aims to take the friction out of doing data science with R, allowing students (and instructors) to skip over installation, setup, and IT challenges and go straight to the good stuff. In this talk, you will learn about RStudio Cloud, how it can streamline the learning process, and how to use it to facilitate teaching classes, workshops, and other learning situations. Additionally, we will highlight some best practices for using RStudio Cloud in an educational setting, and talk about other learning resources available.

About the Author Mel Gregory Mel is a software engineer and has been for a while. She likes simplicity

rstudio Mel Gregory Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Darby Hadley | RStudio Job Launcher Changing where we run R stuff | RStudio (2019)

RStudio Job Launcher provides the ability to start processes within batch processing systems and container orchestration platforms. In this talk, we will explore what is possible when you have the ability to launch containerized R sessions including scaling, isolating, and customizing environments. We will review examples of launching ad-hoc jobs as well as dockerized R sessions in Kubernetes using the Job Launcher.

About the Author Darby Hadley Darby is a QA engineer for multiple teams at RStudio. He has a passion for improving products, creating efficient processes, and helping people. Before joining RStudio he worked primarily in the video game industry

rstudio Darby Hadley Job Launcher Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Tonya Filz | The resilient R champion | RStudio (2019)

Merriam-Webster defines resilience as the ability to recover from or adjust easily to misfortune or change. As a Customer Success Representative who works alongside data scientists using RStudio’s toolchain, I’ve had a front row seat to the challenges faced by data scientists as they aim to promote the use of RStudio’s toolchain in their organization. This talk will focus on effective strategies that have been used to overcome some of the most difficult organizational barriers that are faced by data scientists using R. Specific topics will include funding barriers, IT support, server space, the “open source mentality”, and political pressures within organizations.

About the Author Tonya Filz Helping data scientists and data analysts succeed by leveraging RStudio to clean, visualize, analyze, and communicate conclusions to key business leaders

rstudio Tonya Filz Customer Success Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats

Felienne | Explicit Direct Instruction in Programming Education | RStudio (2019)

In education, there is and has always been debate about how to teach. One of these debates centers around the role of the teacher: should their role be minimal, allowing students to find and classify knowledge independently, or should the teacher be in charge of what happens in the classroom, explaining to students all they need to know? These forms of teaching have many names, but the most common ones are exploratory learning and direct instruction respectively. While the debate is not settled, more and more evidence is presented by researchers that explicit direct instruction is more effective than exploratory learning in teaching language, mathematics, and science. These findings raise the question of whether that might be true for programming education too. This is especially of interest since programming education is deeply rooted in the constructionist philosophy, leading many programmers to follow exploratory learning methods, often without being aware of it. This talk outlines the history of programming education and additional beliefs in programming that lead to the prevalence of exploratory forms of teaching. We also explain the didactic principles of direct instruction, explore them in the context of programming, and hypothesize what it might look like for programming.

rstudio Felienne Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS

Yihui Xie | pagedown Creating beautiful PDFs with R Markdown and CSS | RStudio (2019)

The traditional way to beautiful PDFs is often through LaTeX or Word, but have you ever thought of printing a web page to PDF? Web technologies (HTML/CSS/JavaScript) are becoming more and more amazing. It is entirely possible to create high-quality PDFs through Google Chrome or Chromium now. Web pages are usually single-page documents, but they can be paginated thanks to the JavaScript library Paged.js, so that you can have elements like headers, footers, and page margins for printing. In this talk, we introduce a new R package, pagedown (https://github.com/rstudio/pagedown), to create PDF documents based on R Markdown and Paged.js. Applications of pagedown include, but are not limited to, books, articles, posters, resumes, letters, and business cards. With the power of CSS and JavaScript, you can typeset your documents with amazing elegance (e.g., a single line of CSS, “tr:nth-child(even) { background: #eee; }”, will give you a striped table, and “border-radius: 50%;” gives you a circular element) and power (e.g., HTML Widgets).

VIEW MATERIALS https://bit.ly/pagedown

pagedown rstudio Yihui Xie Pagedown Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS

Matt Dancho | Using R, the Tidyverse, H2O, and Shiny to reduce employee attrition | RStudio (2019)

An organization that loses 200 high-performing employees per year has a lost productivity cost of about $15M/year. This cost is massive, yet many organizations don’t know it exists. It doesn’t show up on a financial statement. Therefore, it goes unnoticed. This presentation showcases how several open source tools integrate to form a solution to the employee attrition problem. Specifically: (1) How the tidyverse enables problem identification through visualization. (2) How recipes + H2O can be combined to explain key relationships to attrition and predict employee attrition. (3) How Shiny can be used to create a powerful dashboard that empowers business leaders to make data-driven decisions across the organization

rstudio Shiny tidyverse Matt Dancho Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS

Hadley Wickham | vctrs Tools for making size and type consistent functions | RStudio (2019)

vctrs is a new package that provides tools (cognitive and computational) to ensure that functions behave consistently with respect to inputs of varying length and type. The end goal of vctrs is to be invisible to the end user of the tidyverse (simply enabling their predictions about function outputs to be more correct), but will help developers write functions that “just work”
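As a rough illustration of what "size and type consistent" can mean in practice, the sketch below contrasts base R's `c()` with `vctrs::vec_c()` and shows the recycling rules through `vctrs::vec_recycle_common()`. It assumes the vctrs package is installed; the input values are made up for illustration and do not come from the talk.

```r
library(vctrs)

# Base R silently coerces mixed inputs to character.
base_result <- c(1, "a")

# vctrs combines compatible types (integer + double gives double) ...
combined <- vec_c(1L, 2.5)

# ... but refuses to guess a common type for double and character.
type_error <- tryCatch(vec_c(1, "a"), error = function(e) e)

# Size rules: length-1 inputs are recycled, any other mismatch is an error.
recycled <- vec_recycle_common(1:3, 10)
size_error <- tryCatch(vec_recycle_common(1:3, 1:2), error = function(e) e)
```

The errors are captured with `tryCatch()` only so the script runs to the end; in interactive use they would surface immediately, which is the point of the stricter rules.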

Hadley Wickham

rstudio tidyverse vctrs Hadley Wickham Vctrs Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS

Tareef Kawaf | Welcome and the Posit Vision | Posit (2019)

From Posit President, TAREEF KAWAF

About Tareef: Tareef Kawaf is a software startup executive and current president of RStudio, Inc., a Massachusetts-based company that develops both open-source and commercial software for the R statistical programming language. Prior to joining RStudio, Mr. Kawaf served as senior vice president of engineering and operations at Brightcove, Inc. Over 8 years he helped Brightcove build and operate the second-largest online video platform, helping it grow from 0 to 92M in revenue and complete its initial public offering (IPO). Mr. Kawaf jointly holds a patent for the “Method and System for Dynamic Pricing,” issued in 2001, which is a core component of Oracle’s ATG Commerce solutions and helps retailers define sophisticated rules for couponing, discounting, and personalized commerce. Mr. Kawaf received his B.S. degree in Computer Science with a minor in Mathematics from the University of Massachusetts Amherst in 1994. He and his family currently reside outside of Boston, MA.

rstudio Tareef Kawaf Rstudio Data Science Machine Learning Python Stats Tidyverse Data Visualization Data Viz Ggplot Technology Coding Connect Server Pro Shiny RMarkdown Package Manager CRAN Interoperability Serious Data Science Dplyr Forcats Ggplot2 Tibble Readr Stringr Tidyr Purrr Github Data Wrangling Tidy Data Odbc Rayshader Plumber Blogdown Gt Lazy Evaluation Tidymodels Statistics Debugging Programming Education Rstats Open Source OSS Reticulate

Shiny in Production: Data Products at Scale Workshop - rstudio::conf(2019L)

What is the 2-day Shiny in Production Workshop? That’s a great question, I’m glad you asked.

Shiny in Production | Data Products at Scale Workshop - 2 Days

This two-day workshop will teach you the best practices that make production-quality Shiny application performance possible. Does 30,000 simultaneous users for a Shiny application sound challenging? Shiny is used all over the world to deliver interactive, visual data products from data science teams to internal and external audiences at scale. The course will not spend time on writing applications, but will focus on the open source and professional ecosystem around shiny that includes performance optimization, testing, and production deployment. This workshop relies on RStudio professional products for deployment.

You should take this workshop if you are comfortable writing Shiny apps but are afraid of what will happen when people start looking at them.

The workshop is taught by Sean Lopp, RStudio Solutions Engineer. Sean works with RStudio product teams and helps enterprise customers to realize the value of R and Shiny.

Speakers: Sean Lopp and a who’s-who of RStudio Engineers, R-Admins, and Shiny Developers

Advanced R Markdown 2-day Workshop - rstudio::conf(2019L)

What is the 2-day Advanced R Markdown Workshop? That’s a great question, I’m glad you asked.

This is a two-day hands-on workshop based on the book “R Markdown: The Definitive Guide”.

This workshop is designed for those who want to take their R Markdown skills to the next level. We’ll talk about many low-level details in the rmarkdown package and the whole R Markdown ecosystem. The two goals of this workshop are: 1) learn how to fully customize R Markdown output (HTML, LaTeX/PDF, Word, and PowerPoint); and 2) learn more about existing R Markdown extensions in the ecosystem, such as flexdashboard, bookdown, blogdown, pkgdown, xaringan, rticles, and learnr. We will also talk about how to use or develop new language engines (languages that are not R), how to develop HTML widgets, and integrate Shiny with R Markdown.

You should take this workshop if you have experience programming in R and want to learn how to take advantage of the amazing breadth and depth of R Markdown. You’ll get the most from it if you enjoy learning how R Markdown works under the hood (which will involve reading some source code), and are seriously interested in hacking (playing) with HTML, JavaScript, CSS, LaTeX, and command-line tools. We will give minimal tutorials on these languages and tools in the workshop, but it may be easier for you to keep pace with the instructor if you already know them before.

This workshop is led by Yihui Xie, Software Engineer and Data Scientist at RStudio. Yihui is the main author of the open-source knitr package for reproducible research and dynamic report generation. He has also created and co-authored several other R packages, including rmarkdown, bookdown, blogdown, xaringan, tinytex, DT, shiny, and leaflet. He has authored and co-authored four books: Dynamic Documents with R and knitr, bookdown: Authoring Books and Technical Documents with R Markdown, blogdown: Creating Websites with R Markdown, and R Markdown: The Definitive Guide.

Speakers: Yihui Xie and Alison Hill with guest lecturers Jiena Gu and Hao Zhu

Reproducible Examples with the reprex package

Reproducible Examples and the reprex package. https://speakerdeck.com/jennybc/reprex-reproducible-examples-with-r

Jump to:
0:08 Intro
0:40 Basic usage of reprex
3:35 Motivation, why use reprex? “Help me help you”

4:08 Define reprex? Three common ways to use the term.

noun, a reproducible example
the reprex package, a tool to build R reprexes
reprex::reprex(), a function in reprex to make a reprex.

5:26 When should you use a reprex?

6:14 reprex installation and setup. How do you actually get reprex on your machine?
7:59 Advanced setup and discussion.
9:45 Please use advanced features responsibly.

11:02 Why does the reprex package exist? Anyone who has helped teach R or dealt with GitHub issues, Twitter, Stack Overflow & RStudio Community questions knows that helping people diagnose their coding problems can be hard. This tool comes from hard-won experience. Its aim is to help people ask well-formed questions and increase the chances of getting well-formed answers quickly.

12:52 philosophy behind reprex

code that I can run
code that I don’t need to run
code that I can easily run

13:52 code that I can run.

17:25 Tips on writing good reprexes. Dos and don’ts.

18:52 How do I get my data into my reprex? Getting small data and CSV type data into your reprex is easy.
You may think, “I have a big hairy data object and I can only show my problem by using it”, but that’s not always the case.
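Two common ways to make small data self-contained can be sketched with base R alone. This is only an illustration, not necessarily the approach shown in the video: `small` and `csv_text` are hypothetical names, and the built-in `mtcars` data stands in for your own object.

```r
# Option 1: dput() prints R code that rebuilds an object.
small <- head(mtcars[, c("mpg", "cyl")], 3)
dput(small)

# In a reprex you would paste the dput() output; here we round-trip
# the same text through the parser to check that it rebuilds the object.
rebuilt <- eval(parse(text = paste(deparse(small), collapse = "\n")))

# Option 2: inline CSV text, read through base R's `text` argument.
csv_text <- "name,score
a,1
b,2"
from_csv <- read.csv(text = csv_text)
```

Either way, the reader can run the example without access to your files.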

21:02 code that I don’t need to run. reprex gives your reader the code and reveals the output being produced by that code. For experienced coders, that might be enough to help you.

22:44 code that I can easily run. Don’t copy and paste from the R console. This is usually annoying for your reader. Worse than console copy-pasta is the screenshot. (Many people think screenshots of code are downright offensive.)

25:03 reprex_clean. If you copy someone else’s reprex into your console, it may include their output, making your new reprex untidy. Here are tips for taking someone else’s reprex code and output and creating a clean reprex reply.

25:54 shock and awe. More interesting features of the reprex package.

26:29 What about figures and plots in your reprex? So happy you asked about that. reprex will automatically upload your images to imgur.com.
28:23 Create a reprex by explicitly providing your code in the reprex call.
29:00 when you need your reprex to work in the current working directory.
30:45 Differently flavored markdown. Optimize your reprex markdown output for GitHub, Stack Overflow, or the RStudio Community.
30:31 Make your reprex create an R script, with your reprex outputs as comments. This is handy for pasting into an email or slack-type-app.
32:25 Rich text format, rtf output. (an experimental feature as of this video)
33:06 Suppress the reprex ad at the bottom of your reprex.
33:19 Include session info.
33:54 Auto styling of your code. Good if you’re dealing with poorly formatted code.
34:25 Change your comments string.
34:32 Silence Tidyverse startup messages.
35:00 Capture a reprex that sends messages to standard output and standard error (e.g. package installation compilation messages).

36:13 Set up personal defaults for your reprex usage.

36:54 reprex RStudio addins; render reprex and reprex selection. These accelerate your use of reprex.

39:01 The human side of reproducible examples. How to ask questions in ways that are most likely to get answered. Sorry for the tough love, but this is important. Why are you always asked to give a reprex?

Experts try to use reproducible examples to ensure their advice works.
Making a good reprex is hard. But, you are asking them to solve a problem for you, so meet them halfway.
Creating reprexes is good coding practice.
Making a good reprex is often a good way to debug your issue in the embarrassment-free privacy of your own home.
reprexes lead to discussions more likely to help people in the future.

44:34 Behind the scenes of reprex

44:44 Thanks to those who helped make reprex possible.

Questions and Answers

46:05 can reprex capture variables and objects in the current environment? (not yet, maybe in development)
47:25 does reprex actually check that the code is self contained? (self contained)
48:08 does readr::read_csv support the text argument? (yep, just read the help manual for readr)

Shiny Train-the-Trainer Workshop - rstudio::conf(2019L)

What is the 2-day Shiny Train-the-Trainer Workshop? That’s a great question, I’m glad you asked.

Register at https://rstd.io/conf Learn more at https://rstd.io/conf-agenda

Shiny Train-the-Trainer Certification Workshop - 2 Day

Day 1 of the course will be co-taught by Mine Cetinkaya-Rundel and Garrett Grolemund, RStudio Data Scientists and Professional Educators.
On Day 2, Mine will teach the Shiny track and Garrett will teach the Tidyverse track.

This two-day workshop will equip you to teach R effectively. We will draw on RStudio’s experience teaching R to recommend tips for designing, teaching, and supporting short R courses.

On Day 1 of the course, you will learn practical activities that you can use immediately to improve your presentation style, learning outcomes, and student engagement. You will leave the class with a cognitive model of learning that you can use to develop your own effective workshops or courses within your organization. The course will also cover how to use RStudio Cloud and its curriculum of tutorials to jump-start your own lessons.

On Day 2 of the course, participants will have the option to choose one of two tracks: Teaching the Tidyverse or Teaching Shiny.

Teaching Shiny: Classroom examples will focus on teaching Shiny at the beginner and intermediate levels. The course materials will build on RStudio’s Mastering Shiny workshop as well as the upcoming book from the author of the Shiny package, Joe Cheng, and they will cover the entire lifecycle of a Shiny app: build → improve → share. Participants will receive the course materials for teaching Mastering Shiny. You should take this workshop if you work as a training partner and want to qualify as an RStudio Certified Shiny Instructor or if you are an advocate for R in your organization. You should be proficient in Shiny already and be prepared to submit examples of your work. Prior teaching experience is helpful, but not required. Please bring a laptop and a device that has video recording capabilities (such as a laptop or cell phone).

Instructors: Garrett Grolemund, Mine Çetinkaya-Rundel

Joe Cheng, Mine Çetinkaya-Rundel

Tidyverse Train-the-Trainer Certification Workshop - rstudio::conf(2019L)

What is the 2-day Tidyverse Train-the-Trainer Workshop? That’s a great question, I’m glad you asked.

Register at https://rstd.io/conf Learn more at https://rstd.io/conf-agenda

Tidyverse Train-the-Trainer Certification Workshop - 2 Days

Day 1 of the course will be co-taught by Mine Cetinkaya-Rundel and Garrett Grolemund, RStudio Data Scientists and Professional Educators.
On Day 2, Mine will teach the Shiny track and Garrett will teach the Tidyverse track.

This two-day workshop will equip you to teach R effectively. We will draw on RStudio’s experience teaching R to recommend tips for designing, teaching, and supporting short R courses.

On Day 1 of the course, you will learn practical activities that you can use immediately to improve your presentation style, learning outcomes, and student engagement. You will leave the class with a cognitive model of learning that you can use to develop your own effective workshops or courses within your organization. The course will also cover how to use RStudio Cloud and its curriculum of tutorials to jump-start your own lessons.

On Day 2 of the course, participants will have the option to choose one of two tracks: Teaching the Tidyverse or Teaching Shiny.

Teaching the Tidyverse: Classroom examples will focus on how to teach students to do data analysis with the Tidyverse. We will use Master the Tidyverse, which is an award-winning two-day workshop developed by RStudio, as an example. Participants will receive the course materials for teaching Master the Tidyverse. You should take this workshop if you work for a training partner and want to qualify as an RStudio Certified Tidyverse Instructor or if you are an advocate for R in your organization. You should be proficient in the Tidyverse already and be prepared to submit examples of your work. Prior teaching experience is helpful, but not required. Please bring a laptop and a device that has video recording capabilities (such as a laptop or cell phone).

Instructors: Garrett Grolemund, Mine Çetinkaya-Rundel

Mine Çetinkaya-Rundel

Data Manipulation Tools: dplyr – Pt 3 Intro to the Grammar of Data Manipulation with R

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

dplyr docs: dplyr.tidyverse.org/reference/

Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw

/01:44 Intro and what’s covered Ground Rules
/02:40 What’s a tibble
/04:50 Use View
/05:25 The Pipe operator:
/07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM

/00:48 Goal 1 Making your data suitable for R
/01:40 tidyr “Tidy” Data introduced and motivated
/08:10 tidyr::gather
/12:30 tidyr::spread
/15:23 tidyr::unite
/15:23 tidyr::separate

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

00:40 setup
02:00 dplyr::select
03:40 dplyr::filter
05:05 dplyr::mutate
07:05 dplyr::summarise
08:30 dplyr::arrange
09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
11:45 dplyr::group_by
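The verbs listed above can be chained with the pipe into a single pipeline. The sketch below is an illustration on the built-in `mtcars` data rather than the example used in the video; it assumes dplyr is installed, and the derived column `wt_kg` is a made-up name.

```r
library(dplyr)

result <- mtcars %>%
  select(mpg, cyl, wt) %>%                 # keep three columns
  filter(cyl != 6) %>%                     # drop the 6-cylinder cars
  mutate(wt_kg = wt * 1000 * 0.4536) %>%   # wt is recorded in 1000 lbs
  group_by(cyl) %>%                        # one group per cylinder count
  summarise(
    n = n(),                               # cars per group
    mean_mpg = mean(mpg),
    mean_wt_kg = mean(wt_kg)
  ) %>%
  arrange(desc(mean_mpg))                  # most fuel-efficient group first

print(result)
```

Each step takes a data frame and returns a data frame, which is what lets the verbs compose in this way.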

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together

/00.42 dplyr::bind_cols
/01:27 dplyr::bind_rows
/01:42 Set operations dplyr::union, dplyr::intersect, dplyr::setdiff
/02:15 joining data dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join
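A small sketch of how the binds, set operations, and joins above behave on two toy tables. The data frames `bands` and `instruments` are invented for illustration and are not taken from the video; dplyr is assumed to be installed.

```r
library(dplyr)

bands <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
)
instruments <- data.frame(
  name = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
)

# Joins match rows on the shared key column.
left  <- left_join(bands, instruments, by = "name")   # every row of bands
inner <- inner_join(bands, instruments, by = "name")  # only matching names
full  <- full_join(bands, instruments, by = "name")   # all names from both

# Binds stack tables without matching on a key.
stacked <- bind_rows(bands[1:2, ], bands[3, ])

# Set operations treat whole rows as set elements.
all_rows  <- dplyr::union(bands[1:2, ], bands[2:3, ])  # duplicates dropped
remaining <- dplyr::setdiff(bands, bands[1, ])         # rows not in the second table
```

Unmatched rows in a left or full join are kept, with `NA` filled in for the columns that come from the other table.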

Cheatsheets: https://www.rstudio.com/resources/cheatsheets/

Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

Tidy Data and tidyr – Pt 2 Intro to Data Wrangling with R and the Tidyverse

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

http://tidyr.tidyverse.org/reference/

Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw

/01:44 Intro and what’s covered Ground Rules
/02:40 What’s a tibble
/04:50 Use View
/05:25 The Pipe operator:
/07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM

00:48 Goal 1 Making your data suitable for R
01:40 tidyr “Tidy” Data introduced and motivated
08:10 tidyr::gather
12:30 tidyr::spread
15:23 tidyr::unite
15:23 tidyr::separate
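The four functions listed above can be sketched on a toy table. The `wide` data frame and its values are made up for illustration and are not from the video; tidyr is assumed to be installed. Note that in tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`, though they remain available.

```r
library(tidyr)

# An untidy table: the year values are stored in the column names.
wide <- data.frame(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(15, 25),
  check.names = FALSE
)

# gather(): collapse the year columns into key/value pairs (wide to long).
long <- gather(wide, key = "year", value = "cases", `1999`, `2000`)

# spread(): the inverse, one column per key (long to wide).
wide_again <- spread(long, key = year, value = cases)

# unite() pastes columns together; separate() splits them apart again.
united <- unite(long, "country_year", country, year, sep = "_")
separated <- separate(united, country_year,
                      into = c("country", "year"), sep = "_")
```

In the `long` form each row is one observation (a country in a year), which is the shape most tidyverse functions expect.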

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

00.40 setup
/02:00 dplyr::select
/03:40 dplyr::filter
/05:05 dplyr::mutate
/07:05 dplyr::summarise
/08:30 dplyr::arrange
/09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
/11:45 dplyr::group_by
/15:00 dplyr::group_by

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together

/00:42 dplyr::bind_cols
/01:27 dplyr::bind_rows
/01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
/02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join

Cheatsheets: https://www.rstudio.com/resources/cheatsheets/

Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw

01:44 Intro and what’s covered Ground Rules
02:40 What’s a tibble
04:50 Use View
05:25 The Pipe operator:
07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM

/00:48 Goal 1 Making your data suitable for R
/01:40 tidyr “Tidy” Data introduced and motivated
/08:15 tidyr::gather
/12:38 tidyr::spread
/15:30 tidyr::unite
/15:30 tidyr::separate

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

/00:40 setup
/02:00 dplyr::select
/03:40 dplyr::filter
/05:05 dplyr::mutate
/07:05 dplyr::summarise
/08:30 dplyr::arrange
/09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
/11:45 dplyr::group_by
/15:00 dplyr::group_by

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together

/00:42 dplyr::bind_cols
/01:27 dplyr::bind_rows
/01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
/02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join

Cheatsheets: https://www.rstudio.com/resources/cheatsheets/

Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

New York Times: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” by Steve Lohr, Aug. 17, 2014 https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html

Working with Two Datasets: Binds, Set Operations, and Joins – Pt 4 Intro to Data Manipulation

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

dplyr docs: dplyr.tidyverse.org/reference/

Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw

/01:44 Intro and what’s covered Ground Rules:
/02:40 What’s a tibble
/04:50 Use View
/05:25 The Pipe operator:
/07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM

/00:48 Goal 1 Making your data suitable for R
/01:40 tidyr “Tidy” Data introduced and motivated
/08:10 tidyr::gather
/12:30 tidyr::spread
/15:23 tidyr::unite
/15:23 tidyr::separate

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

/00:40 setup
/02:00 dplyr::select
/03:40 dplyr::filter
/05:05 dplyr::mutate
/07:05 dplyr::summarise
/08:30 dplyr::arrange
/09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
/11:45 dplyr::group_by

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg Combining two datasets together

00:42 dplyr::bind_cols
01:27 dplyr::bind_rows
01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join