ML
Resources tagged ML
You Can Lead a Horse to Water…Changing the Data Science Culture for Veterinary Scientists
Presented by Jill MacKay
A retrospective look at supporting data science skills in a research-focussed veterinary school
This is a talk about environment management, but not in the way you’re thinking. In many industries, domain-specific experts need enough understanding of data science to support their work and communicate with data scientists, but often have insufficient training in these skills, and limited time with which to obtain data science skills and practice them. This is particularly challenging for those who are interdisciplinary and have limited control over their workload, such as medics and field scientists. In this talk, an educational scientist describes the previous 10 years of supporting veterinary scientists to adopt open science practices surrounding data science. What worked, what failed miserably, and reflections on why it can be so hard to get a horse to drink.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Teaching data science. Session Code: TALK-1095
Why You Should Stop Networking and Start Making Friends - posit::conf(2023)
Presented by Libby Heeren
When we think about making connections, we think about networking. I’d like you to forget about networking and start thinking about making friends. I’ll share my perspective as a community builder and host of the Data Humans podcast on how I cultivated a community of practice for myself and how I became a force multiplier who increases engagement.
You’ll learn how I made genuine human connections, the practical steps to making data friends, the power of vulnerability, and why we all benefit when we show up as our whole selves.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Lightning talks. Session Code: TALK-1167
Why You Should Add Logging To Your Code (and make it more helpful) - posit::conf(2023)
Presented by Daren Eiri
Learn how the log4r package can help you better understand the errors your code may produce, and how to also get promptly alerted for severe errors by leveraging cloud monitoring solutions like Azure Monitor or AWS CloudWatch
When an error happens in your API, Shiny app, or Quarto document, it is not always clear what line of code you need to look at, and the error messages aren’t always helpful. By walking through a simple API example, I show how you can use logging packages like log4r to provide error messages that make sense to you. I also show how you can use cloud-based data collection platforms like Azure Monitor or AWS CloudWatch to set up alerts, so you can get notified by email or text message for those severe errors that you need to be immediately aware of.
Gain more visibility into the health of your code by incorporating logging and pushing your logs to the cloud.
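The talk itself uses log4r in R. As a rough illustration of the same idea in Python (an assumption on our part, not the speaker's code), the standard library logging module can attach context to errors, and a custom handler can stand in for the cloud alerting step. The names AlertHandler and divide are hypothetical, and a real setup would forward records to a service such as Azure Monitor or AWS CloudWatch rather than to an in-memory list.

```python
import logging


class AlertHandler(logging.Handler):
    """Hypothetical stand-in for a cloud alert: collects severe records in memory."""

    def __init__(self):
        super().__init__(level=logging.ERROR)  # ignore anything below ERROR
        self.alerts = []

    def emit(self, record):
        self.alerts.append(self.format(record))


logger = logging.getLogger("api_example")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep this example isolated from the root logger

alert_handler = AlertHandler()
alert_handler.setFormatter(
    logging.Formatter("%(levelname)s [%(funcName)s] %(message)s")
)
logger.addHandler(alert_handler)


def divide(numerator, denominator):
    """Toy endpoint logic that logs its inputs before it can fail."""
    logger.info("divide called with %r and %r", numerator, denominator)
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Say what went wrong in terms that make sense to the maintainer.
        logger.error("division failed: denominator was %r", denominator)
        return None


divide(4, 2)  # INFO only, so no alert is recorded
divide(1, 0)  # ERROR, picked up by AlertHandler
```

Filtering by level at the handler keeps routine messages out of the alert channel while still letting other handlers record them.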
Materials: https://dareneiri.github.io/positconf2023/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Lightning talks. Session Code: TALK-1164
What’s New in the Torch Ecosystem - posit::conf(2023)
Presented by Daniel Falbel
torch is an R port of PyTorch, a scientific computing library that enables fast and easy creation and training of deep learning models.
In this talk, you will learn about the latest features and developments in torch, such as luz, a higher-level interface that simplifies your model training code, and vetiver, a new integration that allows you to deploy your torch models with just a few lines of code. You will also see how torch works well with other R packages and tools to enhance your data science workflow. Whether you are new to torch or already an experienced user, this talk will show you how torch can help you tackle your data science challenges and inspire you to build your own models.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Lightning talks. Session Code: TALK-1163

What’s New in Quarto?* - posit::conf(2023)
Presented by Charlotte Wickham
It’s been over a year since Quarto 1.0, an open-source scientific and technical publishing system, was announced at rstudio::conf(2022). In this talk, I’ll highlight some of the improvements to Quarto since then. You’ll learn about new formats, options, tools, and ways to supercharge your content. And, if you haven’t used Quarto yet, come to see some reasons to try it out.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Quarto (1). Session Code: TALK-1072

What I Wish I Knew Before Becoming a Data Scientist - posit::conf(2023)
Presented by Kaitlin Bustos
In this talk, I’m sharing my personal journey as a data scientist and the key lessons learned along the way. I’ll emphasize the importance of finding a positive community of like-minded allies, persevering through setbacks as success is not linear, and exploring by embracing the broad nature of the data science field. By sharing my experiences and acknowledging the challenges I’ve faced, I hope attendees will gain a fresh perspective on what it takes to succeed in a data science career and feel inspired to pursue their passions in the field.
Overall, this talk aims to provide a glimpse into the reality of a data science career. Attendees will take away a sense of motivation and empowerment to find their own unique path to success.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Lightning talks. Session Code: TALK-1169
What an Early 2000s Reality Show Taught Me about File Management - posit::conf(2023)
Presented by Reiko Okamoto
Clutter, whether it’s physical or digital, destroys our ability to focus; home organization ideas can be extended to create a workspace where analysts feel inspired to work with data.
Ideas from home organization shows are surprisingly applicable to file management. Using a room divider to establish dedicated zones for different activities in a studio apartment is analogous to creating self-contained projects in RStudio. Likewise, swapping mismatched hangers with matching ones to tidy a closet resembles the adoption of a file naming convention to make a directory easier to navigate.
In this talk, I will share good practices in file management through the lens of home organization. We all know that clutter, whether it is in our physical space or on our machine, destroys our ability to focus. These practices will help R users of all levels create a serene, relaxing environment where they feel inspired to work with data.
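As one way to picture the file naming idea, here is a minimal Python sketch that sorts file names into those that follow a convention and those that do not. The convention itself (ISO date, underscore, lowercase hyphenated description, extension) and the function name split_by_convention are assumptions made for illustration, not something taken from the talk.

```python
import re

# Hypothetical convention: YYYY-MM-DD, underscore, lowercase hyphenated words, extension.
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}_[a-z0-9]+(-[a-z0-9]+)*\.[a-z0-9]+$")


def split_by_convention(filenames):
    """Return (matching, non_matching) lists, each sorted alphabetically."""
    matching = sorted(name for name in filenames if NAME_PATTERN.match(name))
    non_matching = sorted(name for name in filenames if not NAME_PATTERN.match(name))
    return matching, non_matching


files = [
    "2023-09-19_survey-clean.csv",
    "Final Report (2).docx",
    "2023-09-20_model-fit.rds",
    "plot_new.png",
]
tidy, messy = split_by_convention(files)
```

Leading with an ISO date means an alphabetical listing is also a chronological one, which is part of what makes such a directory easier to navigate.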
https://reikookamoto.github.io/; https://github.com/reikookamoto/posit-conf-2023-neat
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Getting %$!@ done: productive workflows for data science. Session Code: TALK-1090
We Converted our Documentation to Quarto - posit::conf(2023)
Presented by Melissa Van Bussel
Elevate your Quarto projects to new heights with these practical tips and tricks!
“Wiki”, “User Guide”, “Handbook” – whatever you call yours, we converted ours to Quarto!
A year ago, my team’s documentation, which had been created using Microsoft Word, was large and lacked version control. Scrolling through the document was slow, and, due to confidentiality reasons, only one person could edit it at a time, which was a significant challenge for our team of multiple developers. After realizing we needed a more flexible solution, we successfully converted our documentation to Quarto.
In this talk, I’ll discuss our journey converting to Quarto, the challenges we faced along the way, and tips and tricks for anyone else who might be looking to adopt Quarto too.
Slides: https://melissavanbussel.quarto.pub/posit-conf-2023; Code for slides: https://github.com/melissavanbussel/posit-conf-2023; My YouTube: https://www.youtube.com/c/ggnot2; My website: https://www.melissavanbussel.com/; My Twitter: https://twitter.com/melvanbussel; My LinkedIn: https://www.linkedin.com/in/melissavanbussel/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Quarto (2). Session Code: TALK-1140
Visualizing Data Analysis Pipelines with Pandas Tutor and Tidy Data Tutor - posit::conf(2023)
Presented by Sean Kross
The data frame is a fundamental data structure for data scientists using Python and R. Pandas and the tidyverse are designed to center building pipelines for the transformation of data frames. However, within these pipelines it is not always clear how each operation is changing the underlying data frame. To explain each step in a pipeline, data science instructors resort to hand-drawing diagrams to illustrate the semantics of operations such as filtering, sorting, and grouping.
In this talk, I will introduce Pandas Tutor and Tidy Data Tutor, step-by-step visual representation engines of data frame transformations. Both tools illustrate the row, column, and cell-wise relationships between an operation’s input and output data frames.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Teaching data science. Session Code: TALK-1096
Validating and Testing R Dataframes with Pandera via reticulate - R-Python Interoperability
Presented by Niels Bantilan
Original Full Title: Validating and Testing R Dataframes with Pandera via reticulate: A Case Study in R-Python Interoperability
Data science and machine learning practitioners work with data every day to analyze and model them for insights and predictions. A major component of any project is data quality, which is a process of cleaning data and protecting against flaws that may invalidate the analysis or model. Pandera is an open source data testing toolkit for dataframes in the Python ecosystem, but can it validate R dataframes?
This talk is composed of three parts: first I’ll describe what data testing is and motivate why you need it. Then, I’ll introduce the iterative process of creating and refining dataframe schemas in Pandera. Finally, I’ll demonstrate how to use it in R with the reticulate package using a simple modeling exercise as an example.
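To make the notion of a dataframe schema concrete, here is a library-free Python sketch of the underlying idea: declare an expected type and a check per column, then report every violation. This is not Pandera's API, which differs and offers far more; the column names, the checks, and the validate_rows helper are hypothetical.

```python
# Hypothetical schema: column name -> (expected type, check function).
SCHEMA = {
    "age": (int, lambda value: 0 <= value <= 120),
    "weight_kg": (float, lambda value: value > 0),
}


def validate_rows(rows, schema):
    """Return a list of (row_index, column, problem) tuples; empty means valid."""
    problems = []
    for index, row in enumerate(rows):
        for column, (expected_type, check) in schema.items():
            if column not in row:
                problems.append((index, column, "missing"))
            elif not isinstance(row[column], expected_type):
                problems.append((index, column, "wrong type"))
            elif not check(row[column]):
                problems.append((index, column, "failed check"))
    return problems


rows = [
    {"age": 34, "weight_kg": 70.5},
    {"age": -2, "weight_kg": 81.0},     # age outside the allowed range
    {"age": 51, "weight_kg": "heavy"},  # wrong type for weight_kg
]
errors = validate_rows(rows, SCHEMA)
```

Collecting all problems rather than stopping at the first one fits the iterative refinement the abstract describes: each run shows where the schema and the data disagree.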
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: R or Python? Why not both!. Session Code: TALK-1123
Using R, Python, and Cloud Infrastructure to Battle Aquatic Invasive Species - posit::conf(2023)
Presented by Uli Muellner and Nicholas Snellgrove
Invasive species are a huge threat to lake ecosystems in Minnesota. With over 10,000 water bodies across the state, having up-to-date data and decision support is critical. Researchers at the University of Minnesota have created four complex R and Python models to support lake managers, all pulled together and presented with the most recent infestation data available.
Come along with us to see how we connected these models in the AIS Explorer, a decision support application built in Shiny to help prioritize risks and place watercraft inspectors, using tools like OCPU and cloud tooling like Lambda, EventBridge and AWS S3.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: R or Python? Why not both!. Session Code: TALK-1118
Using R with Databricks Connect - posit::conf(2023)
Presented by Edgar Ruiz
Spark Connect and Databricks Connect make it possible to interact with Spark stand-alone clusters remotely. This improves our ability to perform data science at scale. We will share the work in sparklyr, and other products, that will make it easier for R users to take advantage of this new framework.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Tidy up your models. Session Code: TALK-1084

Using R to develop production modeling workflows at Mayo Clinic - posit::conf(2023)
Presented by Brendan Broderick
Developing workflows that help train models and also help deploy them can be a difficult task. In this talk I will share some tools and workflow tips that I use to build production model pipelines using R. I will use a project of predicting patients who need specialized respiratory care after leaving the ICU as an example. I will show how to use the targets package to create a reproducible and easy to manage modeling and prediction pipeline, how to use the renv package to ensure a consistent environment for development and deployment, and how to use plumber, vetiver, and shiny applications to make the model accessible to care providers.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Leave it to the robots: automating your work. Session Code: TALK-1149
Using Data to Protect Traditional Lifeways - posit::conf(2023)
Presented by Angie Reed
The spirit of Penobscot Nation’s work to protect the health of their relative, the Penobscot River, is embodied in the Penobscot water song which says “Water, we love you, thank you so much water, we respect you.” Because the Penobscot River is not a natural resource - she is a relative, family - this song describes the foundation of our efforts to protect her health and well-being. The identity of Penobscot people cannot be disconnected from the river, and protecting this traditional lifeway is at the heart of our work.
For over a decade we have used R to manage, transform, analyze, and visualize data, and the free, open-source Posit products help us leave a legacy of good data management and the ability to share results with Penobscot Nation citizens. You will learn more about how our use of R has helped us achieve more stringent protections for the Penobscot River and how we engage young people in every step of this work. We are also part of a larger network of tribal environmental professionals, working together to learn R and share data and insights. We will give you information about how you can volunteer to help expand the network of folks providing technical assistance on any R and RStudio related topics.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: End-to-end data science with real-world impact. Session Code: TALK-1144
USGS R Package Development: 10-year Reflections - posit::conf(2023)
Presented by Laura DeCicco
Ten years ago, the first set of git commits was submitted to a new R software package repository “dataRetrieval” with the goal to provide an easy way for R users to retrieve U.S. Geological Survey (USGS) water data. At that time, the perception within the USGS was that the use of R was exclusive to an elite group of “very serious scientists.” Fast forward, we now find many newer USGS hires having a solid grasp of the language from the start along with the use of R in a wide variety of applications.
In this talk, I’ll discuss my experiences maintaining the dataRetrieval package, how it’s shaped my career, impacted USGS R usage, and why data providers should consider sponsoring their own R packages wrapping their data API services.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Lightning talks. Session Code: TALK-1171
Unlock the Power of DataViz Animation and Interactivity in Quarto - posit::conf(2023)
Presented by Deepsha Menghani
Plot animated and interactive visualizations with Plotly and Crosstalk in Quarto using R. In this intro to Plotly & Crosstalk in R, use code examples to learn how to integrate dashboard elements into Quarto with animated plots, interactive widgets (checkboxes), and linked plots via brushing.
This talk showcases how to use packages, such as Plotly and Crosstalk, to create interactive data visualizations and add dashboard-like elements to Quarto. Using a fun dataset available through the “Richmondway” package, we examine the number of times Roy Kent uses salty language throughout all seasons of “Ted Lasso.” We illustrate this using animated plots, interactive selection widgets such as checkboxes, and by linking two plots with brushing capabilities.
Materials:
- Slides: https://deepshamenghani.github.io/posit_plotly_crosstalk/#/title-slide
- Code repo: https://github.com/deepshamenghani/posit_plotly_crosstalk
- Richmondway data package: https://github.com/deepshamenghani/richmondway
- In-Depth Guide to Creating and Publishing an R Data Package (Richmondway) Using Devtools: https://medium.com/p/245b0fd4c359
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Quarto (2). Session Code: TALK-1143
Towards the Next Generation of Shiny UI
Presented by Carson Sievert
Create awesome looking and feature rich Shiny dashboards using the bslib R package.
Shiny recently celebrated its 10th birthday, and since its birth, has grown tremendously in many areas; however, a hello world Shiny app still looks roughly like it did 10 years ago. The bslib R package helps solve this problem by making it very easy to apply modern and customizable styling to your Shiny apps, R Markdown / Quarto documents, and more. In addition, bslib also provides dashboard-focused UI components like expandable cards, value boxes, sidebar layouts, and more to help you create delightful Shiny dashboards.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Shiny user interfaces. Session Code: TALK-1124

tidymodels: Adventures in Rewriting a Modeling Pipeline - posit::conf(2023)
Presented by Ryan Timpe
An overview of the benefits unlocked on our data science team by adopting tidymodels.
Data science sure has changed over the past few years! Everyone’s talking about production. RStudio is now Posit. Models are now tidy.
This talk is about embracing that change and updating existing models using the tidymodels framework. I recently completed this change, letting go of our in-production code and revisioning it with tidymodels. My team ended up with a faster, more scalable pipeline that enabled us to better automate our workflow and increase our scale while improving our stakeholders’ experiences.
I’ll share tips and tricks for adopting the tidymodels framework in existing products, best practices for learning and upskilling teams, and advice for using tidymodels packages to build more accessible data science tools.
Materials: https://www.ryantimpe.com/files/tidymodels_adventures_positconf2023.pdf
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Tidy up your models. Session Code: TALK-1082
The Road to Easier Shiny App Deployments - posit::conf(2023)
Presented by Liam Kalita
We’re often helping developers to assess, fix and improve their Shiny apps, and often the first thing we do is see if we can deploy the app. If you can’t deploy your Shiny app, it’s a waste of time. If you can deploy it successfully, then at the very least it runs, so we’ve got something to work with.
There are a bunch of reasons why apps fail to deploy. They can be easy to fix, like hardcoded secrets, fonts, or missing libraries. Or they can be intractable and super frustrating to deal with, like manifest mismatches, resource starvation, and missing libraries.
At the end of this talk, I want you to know how to identify, investigate and proactively prevent Shiny app deployment failures from happening.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: The future is Shiny. Session Code: TALK-1089
The Power of Prototyping in R Shiny: Saving Millions by Building the Right Tool - posit::conf(2023)
Presented by Maria Grycuk
The development of software can be costly and time-consuming. If the end users are not involved in the process from the start, the tool we build may not meet their needs. In this presentation, I will discuss how prototyping in Shiny can help you build the right tool and save you from spending millions of dollars on a tool no one will use. I will explore the advantages of using Shiny for prototyping, particularly its ability to rapidly build interactive applications. I will also discuss how to design effective prototypes, including techniques for gathering user feedback and using that feedback to refine your tool. I will emphasize the importance of presenting real-life data, particularly when building data-driven tools.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Shiny user interfaces. Session Code: TALK-1125
The People of Posit: Bringing Personality to R Packages - posit::conf(2023)
Presented by JP Flores and Sarah Parker
The R programming language offers the versatility to perform statistical analyses, create publication-ready plots, and render high-quality reports and presentations. Despite having this environment of indispensable tools, it can be daunting for a beginner-level programmer to get started. Luckily, the Posit community is one of a kind and values inclusivity, collaboration, and empathy. By putting a face to the R packages we use on a daily basis, we hope to make every programmer feel included and capable. We want to inspire attendees to create their own projects or packages, connect with others inside and outside of their field of expertise, and challenge themselves to learn something new, knowing the community is right there to support them.
Materials: http://www.sarmapar.com/people_of_posit/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Lightning talks. Session Code: TALK-1165
The Need for Speed - AccelerateR-ing R Adoption in GSK - posit::conf(2023)
Presented by Ben Arancibia
How does a risk-averse Pharma Biostatistics organization with 900+ people switch from using proprietary software to using R and other open-source tools for delivering clinical trial submissions? First slowly, then all at once. GSK started the transition of using R for its clinical trial data analysis in 2020 and now uses R for our regulatory-reviewed outputs. The AccelerateR Team, an agile pod of R experts and data scientists, rotates through GSK Biostatistics study teams sitting side by side to answer questions and mentor during this transition.
We will share our experience from AccelerateR and how other organizations can use our learnings to scale R from pilots to full enterprise adoption and contribute to open source industry R packages.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Pharma. Session Code: TALK-1068
The ‘I’ in Team: Peer-to-Peer Best Practices for Growing Data Science Teams - posit::conf(2023)
Presented by Liz Roten
R users don’t always come in sets. Often, you may be the only user in the cubicle block. But, one miraculous day, your manager finally fills the void and you welcome more folks on your team. Suddenly, the little R system you created to suit your needs, like a custom package, code styling, and file organization, isn’t just for you.
Want to suddenly overhaul that one package you wrote two years ago? It probably won’t work when your colleagues try to update it.
Your new teammates are data.table fans, but you prefer the tidyverse. Do you need to refactor? Are style choices, like indentation, important when collaborating, or are you just being persnickety?
In this talk, you will learn how to bring new teammates on board and blend your respective styles without pulling your hair out.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Building effective data science teams. Session Code: TALK-1063
Thanks, I Made It with Quartodoc - posit::conf(2023)
Presented by Isabel Zimmerman
When Python package developers create documentation, they typically must choose between mostly auto-generated docs or writing all the docs by hand. This is problematic since effective documentation has a mix of function references, high-level context, examples, and other content.
Quartodoc is a new documentation system that automatically generates Python function references within Quarto websites. This talk will discuss pkgdown’s success in the R ecosystem and how those wins can be replicated in Python with quartodoc examples. Listeners will walk away knowing more about what makes documentation delightful (or painful), when to use quartodoc, and how to use this tool to make docs for a Python package.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Data science with Python. Session Code: TALK-1139

Teaching Data Science in Adverse Circumstances: Posit Cloud and Quarto to the Rescue - posit::conf
Presented by Aleksander Dietrichson
The focus of this presentation is on the challenges faced by teachers of data science whose students are not quantitatively inclined and may face some adversity in terms of technology resources available to them and potential language barriers. I identify three main areas of challenges and show how at Universidad Nacional de San Martín (Argentina) we addressed each of the areas through a combination of original curriculum redesign, production of course materials appropriate for the students in question, and the use of OS and some Posit products, i.e., posit.cloud and Quarto. I show how these technologies can be used as a pedagogical tool to overcome the challenges mentioned, even on a shoestring budget.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Teaching data science. Session Code: TALK-1094
Take it in Bits: Using R to Make Eviction Data Accessible to the Legal Aid Community - posit::conf
Presented by Logan Pratico
One in five low-income renter households in the US experienced falling behind on rent or being threatened with eviction in 2021. Yet most are unrepresented when facing eviction in court. The complex and fast-paced legal system obscures access to timely information, leaving tenants without assistance.
In this talk, I discuss the Civil Court Data Initiative’s use of R alongside AWS Cloud and SQL to analyze disaggregated eviction records. I focus on the integration of RMarkdown with Amazon Athena and EC2 to create weekly eviction reports across 20 states for legal aid groups working to assist tenants. The upshot: accessible eviction data to help legal aid providers better address local legal needs.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: End-to-end data science with real-world impact. Session Code: TALK-1146
Sustainable Growth of Global Communities: R-Ladies’ Next Ten Years - posit::conf(2023)
Presented by Riva Quiroga
In this talk we share how good programming practices inspire the way we manage the R-Ladies community in order to make it sustainable.
R-Ladies’ first ten years were about growing the community: from being just one chapter in 2012 to becoming a global organization in 2016, and now fostering more than 230 chapters worldwide. But how can we face the challenges of growing an organization based solely on volunteer work?
In this talk, we discuss how good programming practices –such as modularity, refactoring, and testing– inspire the way we see the sustainable management of an ever-growing community. To that end, we will present our most recent efforts at creating and documenting workflows, distributing the workload, and automating tasks that allow volunteers to focus their time where it is most needed.
After watching this talk, you will get some ideas on how to support volunteers in your own community or project, and on how to use GitHub Actions to automate workflows and tasks.
Learn more and join at: https://rladies.org/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: It takes a village: building and sustaining communities. Session Code: TALK-1128
Succeed in the Life Sciences with R/Python and the Cloud - posit::conf(2023)
Presented by Colby Ford
This talk covers best practices and lessons learned surrounding the use of R and Python by technical teams in the cloud, focusing on Posit Workbench, Azure ML, and Databricks.
In the life sciences, whether it’s pharma, biotech, research, or another type of organization, we are unique in that we blend scientific knowledge with technical skills to extract insights from large, complex datasets. In the cloud, we can architect solutions to help us scale, automate, and collaborate. Interestingly, the use of R and Python by bioinformatics, genomics, biostatistics, and data science teams can be challenging in a cloud-first world where all the data is somewhere other than your laptop (like a data lake).
In this talk, I will share best practices and lessons learned surrounding the use of R and Python by technical teams in the cloud. We’ll focus on the use of Posit Workbench and RStudio on various cloud services such as Azure ML and Databricks.
Tuple, The Cloud Genomics Company: https://tuple.xyz
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Pharma. Session Code: TALK-1069
Styling and Templating Quarto Documents - posit::conf(2023)
Presented by Emil Hvitfeldt
Quarto is a powerful engine to generate documents, slides, books, websites, and more. The default aesthetics look good, but there are times when you want and need to change how they look. This is that talk.
Whether you want your slides to stand out from the crowd, or you need your documents to fit within your corporate style guide, being able to style Quarto documents is a valuable skill.
Once you have persevered and created the perfect document, you don’t want the effort to go to waste. This is where templating comes in. Quarto makes it super easy to turn a styled document into a template to be used over and over again.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Compelling design for apps and reports. Session Code: TALK-1106

Speeding Up Plots in R/Shiny - posit::conf(2023)
Presented by Ryszard Szymański
A slow plot can ruin the user experience of our dashboard. This talk covers techniques for speeding up the rendering process of our visualisations.
Slow dashboards lead to a poor user experience and cause users to lose interest, or even become frustrated. A common culprit of this situation is a slowly rendering plot.
During the talk, we will dive deeper into how plots are rendered in Shiny, identify common bottlenecks that can occur during the rendering process, and learn various techniques for improving the speed of plots in R/Shiny dashboards.
These techniques will range from more efficient data processing to library-specific optimisations at the browser level.
Materials: LinkedIn profile: https://www.linkedin.com/in/ryszard-szyma%C5%84ski-310a7017a/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.
Talk Track: Lightning talks. Session Code: TALK-1172
Solving a Secure Geocoding Problem (That Hardly Anybody Has) - posit::conf(2023)
Presented by Tesla DuBois
Due to data security concerns, the strictest health researchers won’t send patient addresses to remote servers for geocoding. The only existing methods for offline geocoding are expensive, cumbersome, or require working with code - all limiting factors for many researchers. So, a couple of classmates and I made a standalone desktop application using shell, Docker, PostGIS, and Python to geocode addresses through a simple GUI without ever sending them off the local machine. Come for the technical ins and outs and stay for the anecdotes about how my R background played into the daunting, frustrating, but ultimately successful task of creating a data science tool using unfamiliar technologies.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Developing your skillset; building your career. Session Code: TALK-1111
Small Package, Broad Impact: How I Discovered the Ultimate New Hire Hack - posit::conf(2023)
Presented by Trang Le
Onboarding new hires can be a challenging process, but taking a problem-focused approach can make it more meaningful and rewarding. In this talk, I will share how I discovered the ultimate new hire hack by creating two small packages that gave me the confidence I needed when I started at BMS. Through building these packages, I not only learned R things like using bslib and making font files available for published dashboards, but also gained a deep understanding of my company’s internal systems and workflows, and connected with my team via lots of questions. The resulting packages are still heavily used today.
Join me to discover how small packages can have a broad impact and what hiring managers can do to help.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Developing your skillset; building your career. Session Code: TALK-1112
Siuba and duckdb: Analyzing Everything Everywhere All at Once - posit::conf(2023)
Presented by Michael Chow
Every data analysis in Python starts with a big fork in the road: which DataFrame library should I use?
The DataFrame Decision locks you into different methods, with subtly different behavior:
- different table methods (e.g. polars .with_columns() vs pandas .assign())
- different column methods (e.g. polars .map_dict() vs pandas .map())
In this talk, I’ll discuss how siuba (a dplyr port to Python) combines with duckdb (a crazy powerful SQL engine) to provide a unified, dplyr-like interface for analyzing a wide range of data sources, whether pandas and polars DataFrames, parquet files in a cloud bucket, or pins on Posit Connect.
Finally, I’ll discuss recent experiments to more tightly integrate siuba and duckdb.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1101

Side Effects of a Year of Blogging - posit::conf(2023)
Presented by Millie Symns
A big part of being in the R community is sharing your knowledge in different forums, no matter how big or small. So what better way to do that than a blog? And what better way than using R and Posit products to build and maintain that blog and website? This was the route I took to challenge myself in putting myself out there more in the community to find my voice, share my knowledge and learn new things.
In this talk, I will reflect on the lessons learned and gains made over the past year of blogging and sharing my website for all to see. The side effects include professional and personal benefits - from a clear profile of my skills to the progression of the development of my art. You may leave inspired to try the challenge for yourself.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: It takes a village: building and sustaining communities. Session Code: TALK-1130
ShinyUiEditor: From Alpha to Powerful Shiny App Development Tool - posit::conf(2023)
Presented by Nick Strayer
Since its alpha debut at last year’s conference, the ShinyUiEditor has experienced continuous development, evolving into a powerful tool for crafting Shiny app UIs. Some key enhancements include the integration of new bslib components and the editor’s ability to create or navigate to existing server bindings for inputs and outputs.
In addition to new features, the editor is now available as a VSCode extension, enabling it to integrate smoothly into more developers’ workflows. This talk will showcase how these new capabilities empower users to efficiently create visually appealing and production-ready applications with ease.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Shiny user interfaces. Session Code: TALK-1126

Shiny New Tools for Scaling your Shiny Apps - posit::conf(2023)
Presented by Joe Kirincic
So you have a Shiny app your org loves, but as adoption grows, performance starts getting sluggish. Profiling reveals your cool interactive plots are the culprit. What can you do to make things snappy again? We can increase the number of app instances, sure, but suppose that isn’t an option for us. Another approach is to shift the plotting work from the server onto the client.
In this talk, we’ll learn how to leverage two JavaScript projects, DuckDB-WASM and Observable’s Plot.js, in our Shiny app to create fast, flexible interactive visualizations in the browser without burdening our app’s server function. The end result is an app that can scale to more users without needing to increase compute resources.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: The future is Shiny. Session Code: TALK-1088
Shiny for Python Machine Learning Apps with pandas, scikit-learn and TensorFlow - posit::conf(2023)
Presented by Chelsea Parlett-Pelleriti
With the introduction of Shiny for Python in 2022, users can now use the power of reactivity with their favorite Python packages. Shiny can be used to build interactive reports, dashboards, and web apps that make sharing insights and results both simple and dynamic. This includes apps to display and explore popular Machine Learning models built with staple Python packages like pandas, scikit-learn, and TensorFlow. This talk will demonstrate how to build simple Shiny for Python apps that interface with these packages, and discuss some of the benefits of using Shiny for Python to build your web apps.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: The future is Shiny. Session Code: TALK-1087
Shiny Developer Secrets: Insights From Over 1200 Applicants and What You MUST Know to Shine
Presented by Vedha Viyash
Over 1,200 candidates applied for the R/Shiny developer role at Appsilon in the last year, and I will be sharing some insights that we have gained from going through the qualitative and quantitative feedback collected from every round of the interview process.
I will be sharing some key takeaways that will help you focus on the things that make you a better Shiny developer. From reactivity to software testing, there are multiple skills that make up a good Shiny developer, and you will get to know the major gaps and how to focus on them.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Lightning talks. Session Code: TALK-1173
Serenity Now, Productivity Later: Focus Your Project Stack with The Gonzalez Matrix - posit::conf
Presented by Patrick Tennant
How should you respond when your boss has too many good ideas for data science projects? In this talk, I’ll review our use of an adapted version of the Eisenhower Matrix that lays out our projects according to the effort required and the value they will produce. Given the functionally unlimited number of data science projects a team could do, learn how we keep our team focused on valuable work while reducing the stress of a never-ending list of projects.
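As a rough illustration of laying out projects by effort and value, here is a minimal sketch. The 1-10 scoring scale, the cutoff, the quadrant labels, and the example projects are all assumptions made for this example; the abstract does not describe the details of the matrix used in the talk.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    effort: int  # hypothetical scale: 1 (low) to 10 (high)
    value: int   # hypothetical scale: 1 (low) to 10 (high)


def quadrant(project: Project, cutoff: int = 5) -> str:
    """Place a project in one of four effort/value quadrants."""
    high_value = project.value > cutoff
    high_effort = project.effort > cutoff
    if high_value and not high_effort:
        return "do first"
    if high_value and high_effort:
        return "plan carefully"
    if not high_value and not high_effort:
        return "fit in when convenient"
    return "decline or defer"


# Made-up projects, purely for illustration.
projects = [
    Project("churn dashboard refresh", effort=2, value=8),
    Project("new forecasting platform", effort=9, value=9),
    Project("rename report columns", effort=1, value=2),
    Project("rebuild legacy pipeline", effort=8, value=3),
]

matrix = {p.name: quadrant(p) for p in projects}
for name, label in matrix.items():
    print(f"{name}: {label}")
```

Writing the scores down forces an explicit conversation about which ideas are worth the effort, which is the point of the matrix regardless of the exact scale chosen.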
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Building effective data science teams. Session Code: TALK-1065
Scale Your Data Validation Workflow With {pointblank} and Posit Connect - posit::conf(2023)
Presented by Michael Garcia
For the Data Services team at Medable, our number one priority is to ensure the data we collect and deliver to our clients is of the highest quality. The {pointblank} package, along with Posit Connect, modernizes how we tackle data validation within Data Services.
In this talk, I will briefly summarize how we develop test code with {pointblank}, share with {pins}, execute with {rmarkdown}, and report findings with {blastula}. Finally, I will show how we aggregate data from test results across projects into a holistic view using {shiny}.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Leave it to the robots: automating your work. Session Code: TALK-1058
Running R-Shiny without a Server - posit::conf(2023)
Presented by Joe Cheng
A year ago, Posit announced ShinyLive, a deployment mode of Shiny that lets you run interactive applications written in Python, without actually running a Python server at runtime. Instead, ShinyLive turns Shiny for Python apps into pure client-side apps, running on a pure client-side Python installation.
Now, that same capability has come to Shiny for R, thanks to the webR project.
In this talk, I’ll show you how you can get started with ShinyLive for R, and why this is more interesting than just cheaper app hosting. I’ll talk about some of the different use cases we had in mind for ShinyLive, and help you decide if ShinyLive makes sense for your app.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: I can’t believe it’s not magic: new tools for data science. Session Code: TALK-1151

Reproducible Manuscripts with Quarto - posit::conf(2023)
Presented by Mine Çetinkaya-Rundel
In this talk, we present a new capability in Quarto that provides a straightforward and user-friendly approach to creating truly reproducible manuscripts that are publication-ready for submission to popular journals. This new feature, Quarto manuscripts, can bundle a standardized journal format, source documents, source computations, referenced resources, and execution information into a single output that is ingested into journal review and production processes. We’ll demo how Quarto manuscripts work and how you can incorporate them into your current manuscript development process, and touch on pain points in your current workflow that Quarto manuscripts help alleviate.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Quarto (1). Session Code: TALK-1070

Reliable Maintenance of Machine Learning Models - posit::conf(2023)
Presented by Julia Silge
Maintaining machine learning models in production can be quite different from maintaining general software projects, because of the unique statistical characteristics of ML models.
In this talk, learn about model drift, the different ways the word “performance” is used with models, what you can monitor about a model, how feedback loops impact models, and how you can use vetiver to set yourself up for success with model maintenance. This talk will help practitioners who are already deploying models, but this is also useful knowledge for practitioners earlier in their MLOps journey; decisions made along the way can make the difference between resilient models that are easier to maintain and disappointing or misleading models.
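To make the idea of monitoring a deployed model concrete, here is a minimal sketch that tracks one statistical performance metric (mean absolute error) per batch of new data and flags batches that degrade past a threshold. It uses plain Python rather than vetiver, and the batch labels, numbers, baseline, and tolerance are all made-up assumptions.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def flag_drift(batches, baseline_mae, tolerance=1.5):
    """Return (label, mae, drifted) for each batch of new observations.

    A batch counts as drifted when its MAE exceeds tolerance * baseline_mae.
    Both the metric and the rule are illustrative choices.
    """
    report = []
    for label, actual, predicted in batches:
        mae = mean_absolute_error(actual, predicted)
        report.append((label, mae, mae > tolerance * baseline_mae))
    return report


# Hypothetical monthly batches: (label, observed values, model predictions).
batches = [
    ("2023-07", [10, 12, 11], [10.5, 11.5, 11.0]),
    ("2023-08", [10, 13, 12], [11.0, 12.0, 11.0]),
    ("2023-09", [15, 18, 17], [11.0, 12.0, 11.5]),
]

report = flag_drift(batches, baseline_mae=1.0)
for label, mae, drifted in report:
    print(f"{label}: MAE={mae:.2f} drifted={drifted}")
```

In this made-up data the error grows month over month, and only the last batch crosses the threshold. A real monitoring setup would also need to account for how long it takes for true outcomes to become available.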
Materials: https://github.com/juliasilge/ml-maintenance-2023
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Tidy up your models. Session Code: TALK-1083

R! You Going?! - posit::conf(2023)
Presented by SherAaron Hurt
3 things to remember when starting your journey to become a data scientist
Everyone will have a different journey when becoming a data scientist. However, there are a few tips to consider to make the journey less daunting and more enjoyable. Listen as I tell my story as a data scientist and offer resources and tips to build confidence for those who are new to their journey. The tools are available; however, it is not always easy to find them.
keywords: openscience, The Carpentries, R programming language, GPS, data science journey, data science resources
Materials:
- https://www.linkedin.com/in/sheraaronhurt/
- carpentries.org/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Teaching data science. Session Code: TALK-1097
R Not Only In Production - posit::conf(2023)
Presented by Kara Woo
I will share what our team has learned from successfully integrating R in all areas of our company’s operations. InsightRX is a precision medicine company whose goal is to ensure that each patient receives the right drug at the optimal dose. At InsightRX, R is a first-class language that’s used for purposes ranging from customer-facing products to internal data infrastructure, new product prototypes, and regulatory reporting. Using R in this way has given us the opportunity to forge fruitful collaborations with other teams in which we can both learn and teach.
Join me as I share how the skills of data science and engineering can complement each other to create better products and greater impact.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: R Not Only In Production. Session Code: KEY-1108
Quickly get your Quarto HTML theme in order - posit::conf(2023)
Presented by Greg Swinehart
A 5-minute talk to discuss how I’ve used Quarto and Bootstrap variables to quickly make Shiny’s new website look as it should. The Quarto user I have in mind works at an organization with specific brand guidelines to follow. I will discuss how to set up your theme, show some key Quarto settings, and explain how Bootstrap Sass variables are your best friend.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Lightning talks. Session Code: TALK-1170

Parameterized Quarto Reports Improve Understanding of Soil Health - posit::conf(2023)
Presented by Jadey Ryan
Learn how to use R and Quarto parameterized reporting in this four-step workflow to automate custom HTML and Word reports that are thoughtfully designed for audience interpretation and accessibility.
Soil health data are notoriously challenging to tidy and effectively communicate to farmers. We used functional programming with the tidyverse to reproducibly streamline data cleaning and summarization. To improve project outreach, we developed a Quarto project to dynamically create interactive HTML reports and printable PDFs. Custom to every farmer, reports include project goals, measured parameter descriptions, summary statistics, maps, tables, and graphs.
Our case study presents a workflow for data preparation and parameterized reporting, with best practices for effective data visualization, interpretation, and accessibility.
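The core of parameterized reporting is rendering one template many times with different parameter values. The talk's workflow is in R; the sketch below instead uses Python to build one `quarto render` command per farm and output format, relying on the Quarto CLI's `-P name:value`, `--to`, and `--output` options. The template name `soil-report.qmd`, the parameter name `farm`, and the farm names are hypothetical.

```python
import shlex


def render_commands(template, farms, formats=("html", "docx")):
    """Build one `quarto render` command per farm and output format."""
    commands = []
    for farm in farms:
        # File-name-friendly version of the farm name.
        slug = farm.lower().replace(" ", "-")
        for fmt in formats:
            commands.append([
                "quarto", "render", template,
                "-P", f"farm:{farm}",            # value for the template's parameter
                "--to", fmt,                     # output format
                "--output", f"{slug}.{fmt}",     # one file per farm and format
            ])
    return commands


commands = render_commands("soil-report.qmd", ["Green Acres", "Hill Farm"])
for cmd in commands:
    print(shlex.join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, check=True)` on a machine where Quarto is installed and the template declares a matching parameter.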
Talk materials: https://jadeyryan.com/talks/2023-09-25_posit_parameterized-quarto/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Elevating your reports. Session Code: TALK-1160
Package Management for Data Scientists - posit::conf(2023)
Presented by Tyler Finethy
In my talk, “Package Management for Data Scientists,” we will discuss software dependencies for R and Python and the common issues faced during package installations. I will begin with an overview of package management, highlighting its crucial role in data science. We’ll then focus on practical strategies to prevent dependency errors and address effective troubleshooting when these problems occur. Lastly, we will look towards the future, discussing potential package management improvements, focusing on reproducibility and accessibility for those new to the field.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Managing packages. Session Code: TALK-1081
Open Source Solutions to Next-Generation Submissions, After 30 Years of Industry Experience
Presented by Mike K Smith
The pharmaceutical industry is undergoing rapid change, driven by a desire from both industry and regulatory agencies to move to more interactive visualizations and web applications to review data and make decisions. These changes would have been unthinkable 30 years ago when I started working at Pfizer.
In this talk, I’ll consider the drivers for these changes, how open-source tools can help achieve this, and why collaboration across the industry is vital to achieving this goal. I’ll contrast this with my experience of 30 years working in the pharma industry - when the R language had only just been released, when the internet was new, and when submissions to agencies were printed out, loaded onto trucks, and shipped to their doors.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Pharma. Session Code: TALK-1067
Open Source Property Assessment: Tidymodels to Allocate $16B in Property Taxes - posit::conf(2023)
Presented by Nicole Jardine and Dan Snow
How the Cook County Assessor’s Office uses R and tidymodels for its residential property valuation models.
The Cook County Assessor’s Office (CCAO) determines the current market value of properties for the purpose of property taxation. Since 2020, the CCAO has used R, tidymodels, and LightGBM to build predictive models that value Cook County’s 1.5 million residential properties, which are collectively worth over $400B. These predictive models are open-source, easily replicable, and have significantly improved valuation accuracy and equity over time.
Join CCAO Chief Data Officer Nicole Jardine and Director of Data Science Dan Snow as they walk through the CCAO’s modeling process, share lessons learned, and offer a sneak peek at changes planned for the 2024 reassessment of Chicago.
Materials: https://github.com/ccao-data
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: End-to-end data science with real-world impact. Session Code: TALK-1147
Oops I’m a Manager - On More Effective 1-on-1 Meetings - posit::conf(2023)
Presented by Andrew Holz
As a team leader (accidental or not), it’s easy to get caught up in the daily grind and overlook the importance of 1-on-1s. Bad idea. 1-on-1s are critical for building trust, providing feedback, and ensuring that everyone is on the same page.
Keys to good 1-on-1s start with a small amount of prep and a running shared document of notes and takeaways. Another key is to rotate types of 1-on-1s. Possibilities include “heads down” on recent work, “heads up” looking further out, and career-focused sessions. After some tips on the right sort of questions and uncovering sneaky issues, I will also touch on effective feedback.
I will share resources and hope to include Seussian visuals and a few poetic lines to help the key points stick.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Building effective data science teams. Session Code: TALK-1064

Never again in outer par mode: making next-generation PDFs with Quarto & typst - posit::conf(2023)
Presented by Carlos Scheidegger
Quarto 1.4 will introduce support for Typst. Typst is a brand-new open-source typesetting system built from scratch to support the lessons we have learned over almost half a century of high-quality computer typesetting that TeX and LaTeX have enabled. If you’ve ever had to produce a PDF with Quarto and got stuck handling an inscrutable error message from LaTeX, or wanted to create a new template but were too intimidated by LaTeX’s arcane syntax, this talk is for you. I’ll show you why we need an alternative for TeX and LaTeX, and why it will make Quarto even better.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Quarto (2). Session Code: TALK-1142

Motley Crews: Collaborating with Quarto - posit::conf(2023)
Presented by Susan McMillan, Wyl Schuth, and Michael Zenz
Adoption of Quarto for document creation has transformed the collaborative workflow for our small higher-education analytics team. Historically, content experts wrote in Word documents and data analysts used R for statistics and graphics. Specialization in different software tools created challenges for producing collaborative analytic reports, but Quarto has solved this problem. We will describe how we use Quarto for writing and editing text, embedding statistical analysis and graphics, and producing reports with a standard style in multiple formats, including web pages.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Elevating your reports. Session Code: TALK-1157
Matching Tools to Titans: Tailoring Posit Workbench for Every Cloud - posit::conf(2023)
Presented by James Blair
In an era of diverse cloud platforms, leveraging tools effectively is paramount. This talk highlights the adaptability of Posit Workbench within leading cloud platforms. Delve into strategic integrations, understand key challenges, and uncover practical solutions. By the end, attendees will be equipped with insights to harness Posit Workbench’s capabilities seamlessly across varied cloud environments.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Data science infrastructure for your org. Session Code: TALK-1115
Making a (Python) Web App is easy! - posit::conf(2023)
Presented by Marcos Huerta
Making Python Web apps using Dash, Streamlit, and Shiny for Python
By creating and deploying an interactive web application you can better share your data, code, and ideas easily with a broad audience. I plan to talk about several Python web application frameworks, and how you can use them to turn a class, function, or data set visualization into an interactive web page to share with the world. I plan to discuss building simple web applications with Plotly Dash, Streamlit, and Shiny for Python.
Materials:
- Comprehensive talk notes here: https://marcoshuerta.com/posts/positconf2023/
- https://www.tidymodels.org/learn/models/conformal-regression/
- https://probably.tidymodels.org/reference/index.html#regression-predictions
Corrections: In my live remarks, I said a Dash callback can have only one output. That is not correct: a Dash callback can update multiple outputs. I was trying to say that a Dash output can only be updated by one callback, but even that is no longer true as of Dash 2.9. https://dash.plotly.com/duplicate-callback-outputs
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: The future is Shiny. Session Code: TALK-1086
Magic with WebAssembly and webR - posit::conf(2023)
Presented by George Stagg
Earlier this year the initial version of webR was released and users have begun building new interactive experiences with R on the web. In this talk, I’ll discuss webR’s TypeScript library and what it is able to do. The library allows users to interact with the R environment directly from JavaScript, which enables manipulation tricks that seem like magic. I’ll begin by describing how to move objects from R to JS and back again, and discuss the technology that makes this possible. I’ll continue with more advanced manipulation, such as invoking R functions from JS and talk about why you might want to do so. Finally, I’ll describe how messages are sent over webR’s communication channel and explain how this enables webR to work with Shinylive.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: I can’t believe it’s not magic: new tools for data science. Session Code: TALK-1152

Large Language Models in RStudio - posit::conf(2023)
Presented by James Wade
Large language models (LLMs), such as ChatGPT, have shown the potential to transform how we code. As an R package developer, I have contributed to the creation of two packages – gptstudio and gpttools – specifically designed to incorporate LLMs into R workflows within the RStudio environment.
The integration of ChatGPT allows users to efficiently add code comments, debug scripts, and address complex coding challenges directly from RStudio. With text embedding and semantic search, we can teach ChatGPT new tricks, resulting in more precise and context-aware responses. This talk will delve into hands-on examples to showcase the practical application of these models, as well as offer my perspective as a recent entrant into public package development.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: I can’t believe it’s not magic: new tools for data science. Session Code: TALK-1154
It’s All About Perspective: Making a Case for Generative Art - posit::conf(2023)
Presented by Meghan Santiago Harris
This talk explores how to create art in the R language while highlighting some similarities between the skills required for creating generative art and those needed to perform data science tasks in R.
Because the field of data science is inherently task-oriented, it is no wonder that most people struggle to see the utility of generative art past the bounds of a casual hobby. This talk will invite the participant to learn about generative art while focusing on “why” people should create it and its potential place in data science. This talk is suitable for all disciplines and artistic abilities. Furthermore, this talk will aim to expand the participant’s perspective on generative art with the following concepts:
- What is generative art and how can it be created in R or Python
- Justifications for generative art within Data Science
- Examples of programming skills that are transferrable between generative art and pragmatic data science projects
Materials:
- Link to the talk repo: https://github.com/Meghansaha/a_case_for_genart
- Link to the slides: https://meghansaha.github.io/a_case_for_genart/#/title-slide
- Link to the artpack package site: https://meghansaha.github.io/artpack/
- Personal Site: https://thetidytrekker.com/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Developing your skillset; building your career. Session Code: TALK-1109
It’s Abstractions All the Way Down… - posit::conf(2023)
Presented by JD Long
Abstractions rule everything around us. JD Long talks about abstractions from the board room to the silicon.
Over 20 years ago Joel Spolsky famously wrote, “All non-trivial abstractions, to some degree, are leaky.” Unsurprisingly this has not changed. However, we have introduced more and more layers of abstraction into our workflows: Virtual Machines, AWS services, WASM, Docker, R, Python, data frames, and on and on. But then on top of the computational abstractions we have people abstractions: managers, colleagues, executives, stakeholders, etc.
JD’s presentation will be a wild romp through the mental models of abstractions and discuss how we, as technical analytical types, can gain skill in traversing abstractions and dealing with leaks.
Materials: https://github.com/CerebralMastication/Presentations/tree/master/2023_posit-conf
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: It’s abstractions all the way down …. Session Code: KEY-1161
It’s a Great Time to be an R Package Developer! - posit::conf(2023)
Presented by Jenny Bryan and Hadley Wickham
(Due to unforeseen circumstances, Hadley Wickham presented this talk “slide karaoke” style, from materials prepared by Jenny Bryan.)
In R, the fundamental unit of shareable code is the package. As of March 2023, there were over 19,000 packages available on CRAN. Hadley Wickham and I recently updated the R Packages book for a second edition, which brought home just how much the package development landscape has changed in recent years (for the better!).
In this talk, I highlight recent-ish developments that I think have a great payoff for package maintainers. I’ll talk about the impact of new services like GitHub Actions, new tools like pkgdown, and emerging shared practices, such as principles that are helpful when testing a package.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Package development. Session Code: TALK-1132


Insights in 5-D! (Using magic small-multiples layouts) - posit::conf(2023)
Presented by Matt Dzugan
Using Small-Multiples (faceted graphs) is an effective way to compare patterns across many dimensions. In this talk, I’ll walk you through some ways to lay out your individual facets according to the underlying data. For example, maybe each facet represents a city or point on a 2D plane - we’ll explore ways to organize facets in a grid that mimics the data itself - unlocking your ability to explore patterns in 4+ dimensions. Other solutions to this problem rely on manually-curated lists that map common layouts to a grid, but in this talk, we’ll explore solutions that work on EVERYTHING. I’ll show you how to incorporate this technique into your viz and how I built the libraries since there are some interesting data science concepts at play.
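One naive way to arrange facets so the grid mimics the data is sketched below: sort the items from top to bottom by their y coordinate, cut that ordering into rows, then order each row from left to right by x. This is only an illustration of the idea under simple assumptions, not the algorithm used by the speaker's libraries, and the city coordinates are approximate.

```python
import math


def grid_layout(points, n_cols):
    """Map {name: (x, y)} to {name: (row, col)}, keeping rough spatial order."""
    # The top of the grid should hold the largest y values, so sort by y descending.
    by_y = sorted(points, key=lambda name: -points[name][1])
    layout = {}
    n_rows = math.ceil(len(by_y) / n_cols)
    for row in range(n_rows):
        row_names = by_y[row * n_cols:(row + 1) * n_cols]
        # Within a row, order facets left to right by x.
        ordered = sorted(row_names, key=lambda name: points[name][0])
        for col, name in enumerate(ordered):
            layout[name] = (row, col)
    return layout


# Approximate (longitude, latitude) pairs, for illustration only.
cities = {
    "Seattle": (-122.3, 47.6),
    "Boston": (-71.1, 42.4),
    "Los Angeles": (-118.2, 34.1),
    "Miami": (-80.2, 25.8),
}

layout = grid_layout(cities, n_cols=2)
for name, cell in layout.items():
    print(name, cell)
```

The resulting (row, col) pairs can be fed to any plotting library that lets you place facets in explicit grid cells. This simple row-cutting approach can distort layouts when points cluster unevenly, which is one reason a more careful method is worth a talk.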
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Lightning talks. Session Code: TALK-1174
In-Process Analytical Data Management with DuckDB - posit::conf(2023)
Presented by Hannes Mühleisen
This talk introduces DuckDB, an in-process analytical data management system that is deeply integrated into the R ecosystem.
DuckDB is an in-process analytical data management system. DuckDB supports complex SQL queries, has no external dependencies, and is deeply integrated into the R ecosystem. For example, DuckDB can run SQL queries directly on R data frames without any data transfer. DuckDB uses state-of-the-art query processing techniques like vectorised execution and automatic parallelism. DuckDB is out-of-core capable, meaning that it is possible to process datasets far bigger than main memory. DuckDB is free and open source software under the MIT license.
In this talk, we will describe the value DuckDB offers its users and how it can improve their day-to-day lives through automatic parallelisation, efficient operators, and out-of-core operations.
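To show what "in-process" means in practice, here is a minimal sketch that runs an aggregate SQL query with no separate database server. It deliberately uses Python's built-in sqlite3 module as a stand-in because it ships with Python. SQLite is a different engine from DuckDB and is not analytics-oriented, so this does not show DuckDB's API or performance characteristics. The table and values are made up.

```python
import sqlite3

# An in-memory database lives entirely inside this Python process:
# there is no server to install, start, or connect to over a network.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trips (city TEXT, distance_km REAL)")
con.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Oslo", 2.5), ("Oslo", 4.0), ("Bergen", 1.5)],
)

# A typical analytical query: group, count, and sum.
rows = con.execute(
    "SELECT city, COUNT(*) AS n, SUM(distance_km) AS total_km "
    "FROM trips GROUP BY city ORDER BY city"
).fetchall()
con.close()

for city, n, total_km in rows:
    print(city, n, total_km)
```

DuckDB follows the same embedded model but, per the abstract, adds vectorised execution, automatic parallelism, and the ability to query R data frames directly without data transfer.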
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1099
HTML and CSS for R Users - posit::conf(2023)
Presented by Albert Rapp
You can get the most out of popular R tools by combining them with easy-to-learn HTML & CSS commands.
It’s easy to think that R users do not need HTML and CSS. After all, R is a language designed for data analysis, right? But the reality is that these web standards are everywhere, even in R. In fact, many great tools like {ggtext}, {gt}, {shiny}, and Quarto unlock their full potential when you know a little bit of HTML & CSS. In this talk, I will demonstrate specific examples where R users can benefit from HTML and CSS and show you how to get started with these two languages.
Materials:
- Here’s the link to the video that I mention in the talk: https://youtu.be/QU8wSya-Y9E?si=zw59OSFPl1eJSY7M
- Part 1 of this two part series can be found at https://www.youtube.com/watch?v=jX4_Dnzhl0M
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Compelling design for apps and reports. Session Code: TALK-1105
How You Get Value as a 1-Person Posit Connect Team - posit::conf(2023)
Presented by Sean Nguyen
Sean, a sole Posit Connect developer, shares his experience in delivering business impact. He narrates his transition from crafting one-off reports to developing and deploying robust data science web applications using Python and R with Posit Connect. Despite its common association with large enterprise teams, Sean demonstrates how Posit Connect can be effectively utilized in smaller settings. He presents his work on creating and deploying end-to-end machine learning pipelines in Python, hosting them as APIs, and seamlessly integrating with Shiny apps via Posit Connect. This talk imparts practical strategies and techniques to foster user and executive adoption of Posit Connect within lean (and large) organizations.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Getting %$!@ done: productive workflows for data science. Session Code: TALK-1093
How to Win Friends and Influence People (With Data) - posit::conf(2023)
Presented by Joe Powers
Too many great data science products never go into production. To persuade leaders and colleagues to adopt your data science offering, you must translate your insights into terms that are relevant and accessible to them. Attempts to persuade these audiences with proofs and model performance stats will often fall flat because the audience is left feeling overwhelmed.
This talk will demonstrate the data simulation, visualization, and story-telling techniques that I use to influence leadership and the community-building techniques I use to earn the trust and support of fellow analysts. These efforts were successful in persuading Intuit to adopt advanced analytic methods like sequential analysis that cut the duration of our AB tests by over 60%.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Bridging the gap between data scientists and decision makers. Session Code: TALK-1077
How to Keep Your Data Science Meetup Sustainable - posit::conf(2023)
Presented by Ted Laderas
Many data science meetup organizers struggle with burnout. It can be daunting to plan a meetup schedule, especially with the added burden of work and life.
In this talk, I want to highlight some strategies for keeping your data science meetup sustainable. Specifically, I want to highlight the role of self-care in growing and sustaining your group, as well as low-key activities like a data scavenger hunt, watching videos together, styling plots together, and sharing useful tidyverse functions.
Making it easy for your members to contribute, and empowering them, takes a lot of the burden off you as an organizer. You don’t need to reinvent the wheel for meetups or have famous guests for each one. Let’s start the conversation and make your meetup last.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: It takes a village: building and sustaining communities. Session Code: TALK-1129
How to Help Developers Make Apps Users Love - posit::conf(2023)
Presented by Michał Parkoła
There are many resources that can help you design better apps.
But what if your org creates many apps?
Scaling good design to larger groups dials the challenge up to 11.
In this talk, I will share how we approach the problem at Appsilon.
- I will present our in-house Design Guide.
- I will share the successes and failures we’ve had building it and helping a wide variety of developers use it.
- I will then share some tips about what you might want to consider if you want to help your org build better apps at scale.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Shiny user interfaces. Session Code: TALK-1127
How I Learned to Stop Worrying and Love Public Packages - posit::conf(2023)
Presented by Joe Roberts
The popularity of R and Python for data science is in no small part attributable to the vast collection of extension packages available for everything from common tasks like data cleaning to highly-specialized domain-specific functions. However, with that ease of sharing packages comes a larger target for bad actors trying to exploit them. We’ll explore these security risks along with approaches you can take to mitigate them using Posit Package Manager.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Managing packages. Session Code: TALK-1079
How Data Scientists Broke A/B Testing (And How We Can Fix It) - posit::conf(2023)
Presented by Carl Vogel
As data scientists, we care about making valid statistical inferences from experiments. And we’ve adapted well-established and well-understood statistical methods to help us do so in our A/B tests. Our stakeholders, though, care about making good product decisions efficiently. I’ll describe how the way we design A/B tests can put these goals in tension and why that often causes misalignment between how A/B tests are intended to be used and how they are actually used. I’ll also talk about how I’ve used R to implement alternative experimental approaches that have helped bridge the gap between data scientists and stakeholders.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Bridging the gap between data scientists and decision makers. Session Code: TALK-1076
Hitting the Target(s) of Data Orchestration - posit::conf(2023)
Presented by Alexandros Kouretsis
We are living at a time when the size of datasets can be overwhelming. Add to this that processing them involves linking together different computing systems and software and integrating dynamically changing reference data, and you certainly have a problem. Reproducibility, traceability, and transparency have left the building.
Here is where Posit Connect along with the vast R ecosystem comes to save the day, allowing the creation of reproducible pipelines. I will share with you my first-hand experience in this presentation. In particular, how we used Targets in Posit Connect combined with AWS technologies in a bioinformatics pipeline. The result? An effective and secure workflow orchestration that is scalable and advances knowledge.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Leave it to the robots: automating your work. Session Code: TALK-1148
Grammar of Graphics in Python with Plotnine - posit::conf(2023)
Presented by Hassan Kibirige
{plotnine} brings the elegance of {ggplot2} to the Python programming language. Learn about The Grammar of Graphics and get a feel for why it is an effective way to create Statistical Graphics.
ggplot2 is one of the most loved visualisation libraries. It implements a Grammar of Graphics system, which requires one to think about data in terms of columns of variables and how to transform them into geometric objects. It is elegant and powerful. This is a talk about plotnine, which brings the elegance of ggplot2 to the Python programming language. It is an invitation to learn about the Grammar of Graphics system and to appreciate it. It will include some tips on how to avoid common frustrations as you learn the system.
Materials:
- Website: https://plotnine.org
- Source Code: https://github.com/has2k1/plotnine
- Slides for this talk: https://github.com/has2k1/my-talks
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Data science with Python. Session Code: TALK-1137

Github Copilot integration with RStudio, it’s finally here! - posit::conf(2023)
Presented by Tom Mock
This talk closes issue #10148, “Github Copilot integration with RStudio”, the most upvoted feature request in RStudio’s history. Code-generating AI tools like Github Copilot promise an “AI pair programmer that offers autocomplete-style suggestions as you code”. For the first time, we’ll show a native integration of Copilot into RStudio, helping to build on that promise by providing AI-generated “ghost text” autocompletion with R and other languages. I’ll also provide a comparison of Copilot’s “ghost text” to a chat-style interface in RStudio via the {chattr} package from the Posit MLVerse team.
To make the most of these new features, I’ll walk through some examples of how sharing additional context, comments, code, and other “prompt engineering” can help you go from code-generating AI tools that feel like an annoying backseat driver to an experienced copilot. We’ll close with a robust end-to-end example of how these new RStudio integrations and packages can help you be a more productive developer.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Data science infrastructure for your org. Session Code: TALK-1117
Getting the Most Out of Git - posit::conf(2023)
Presented by Colin Gillespie
Did you believe that Git would solve all of your data science worries? Instead, you’ve been plunged HEAD~1 first into merging (or is that rebasing?) chaos. Issues are ignored, branches are everywhere, main never works, and no one really knows who owns the repository.
Don’t worry! There are ways to escape this pit of despair. Over the last few years, we’ve worked with many data science teams. During this time, we’ve spotted common patterns and also common pitfalls. While one size does not fit all, there are golden rules that should be followed. At the end of this talk, you’ll understand the processes other data science teams implement to make Git work for them.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Getting %$!@ done: productive workflows for data science. Session Code: TALK-1091
Thumbnail from happygitwithr.com, still from Heaven King video
From Novices to Experts: Building a Community of Engaged R Users - posit::conf(2023)
Presented by Natalia Andriychuk
At Pfizer, we have over 1500 users with R installed on their machines, along with an R community on MS Teams comprising over a thousand colleagues globally. How can we effectively engage with Pfizer R users and celebrate the successes of this community, while sharing best practices? Additionally, how do we avoid isolated groups duplicating efforts to solve R-related problems across different parts of the organization?
To address these challenges, we established the Pfizer R Center of Excellence (CoE) in early 2022. We focus our efforts on bringing together a rapidly growing community of colleagues, providing technical expertise, and offering best-practice guidance. A well-established, maintained and engaged R community promotes an inclusive and supportive learning environment that drives innovation within organizations. Our aim is to help colleagues thrive in their R journey, regardless of their expertise level.
During my talk, I will share the techniques we used to build a supportive R community, the tools employed to increase community engagement, and the successes and challenges encountered in building an engaging community of R users.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Pharma. Session Code: TALK-1066
From Journalist to Coder: Creating a Web Publication with Quarto - posit::conf(2023)
Presented by Brian Tarran
This is the story of how a Royal Statistical Society writer discovered Quarto, learned how to code (a bit), and built realworlddatascience.net, an online publication for the data science community.
In March 2022, I was tasked by the Royal Statistical Society with creating a new online publication: a data science website for data science professionals. I’ve been a print journalist for 20 years and have worked on websites in that time, but my coding ability began and ended with wrapping href tags around text and images. That is until I discovered Quarto. In this talk, I describe how I explored, learned, and fell in love with the Quarto publishing system, how I used it to build a website – Real World Data Science (realworlddatascience.net) – and how the open source community mindset helped shape my thinking about what a new publication could and should be!
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Quarto (1). Session Code: TALK-1071
From Data Confusion to Data Intelligence - posit::conf(2023)
Presented by Elaine McVey and David Meza
Data science teams operate in a unique environment, much different than the IT or software development life cycle. Hope from executives for the impact of data science is extremely high! Understanding of how to make data science efforts successful is very low! This creates an interesting set of organizational challenges for data and analytics teams. These are particularly clear when data science is being introduced at new companies, but they play out at organizations of all sizes. So, how do we navigate this dynamic? We will share some strategies for success.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: From Data Confusion to Data Intelligence. Session Code: KEY-1060
From Concept to Impact: Building and Launching Shiny Apps in the Workplace - posit::conf(2023)
Presented by Tiger Tang
Learn to build and launch a Shiny app like you are working on a start-up!
Unlock the potential of Shiny apps for your organization! Join Tiger as he shares insights from implementing Shiny apps at his workplace, handling over 160,000 internal requests. Discover a practical mindmap to find, build, and enhance Shiny app use cases, ensuring robustness and improved user engagement.
Materials: https://tigertang.org/posit_conf_2023/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Bridging the gap between data scientists and decision makers. Session Code: TALK-1074
FOCAL Point: Utilizing Python, R, and Shiny to Capture, Process, and Visualize Motion - posit::conf
Presented by Justin Markel & Alyssa Burritt
One of the fastest movements in modern sports is a golf swing. Capturing this motion using a high-speed camera system creates many unique challenges in processing, analyzing, and visualizing the thousands of data points that are generated. These spatial coordinates can be quickly translated through Python scripts to well-known, industry-specific performance metrics and graphics in Shiny. Down the line, R utilities aid more complicated analyses and optimizations, driving new product innovations.
This talk will cover our company’s process of implementing these tools into our workflow and highlight key program features that have helped successfully combine these applications for users with a variety of technical backgrounds.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: R or Python? Why not both!. Session Code: TALK-1120
Field Guide to Writing Your First R Package - posit::conf(2023)
Presented by Fonti Kar
I recall writing my first package being an intimidating task. In my talk, I will share how a Biologist’s mindset can make R package writing more approachable. This talk is an encouragement and a gentle stroll through the package building process. I will show you ways to be curious when you get stuck and how to prepare for the unexpected. I hope sharing my perspective will help others see package development as wonderful as the natural world and dispel any hesitation to start!
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Package development. Session Code: TALK-1135
Extending Quarto - posit::conf(2023)
Presented by Richard Iannone
What are Quarto shortcode extensions? Think of them as powerful little programs you can run in your Quarto docs. I won’t show you how to build a shortcode extension during this talk but rather I’m going to take you on a trip across this new ecosystem of shortcode extensions that people have already written. For example, I’ll introduce you to the fancy-text extension for outputting nicely-formatted versions of fancy strings such as LaTeX and BibTeX; you’ll learn all about the fontawesome, lordicon, academicons, material-icons, and bsicons shortcode extensions that let you add all sorts of icons. This is only a sampling of the shortcode extensions I will present; there will be many other inspiring examples as well.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Quarto (2). Session Code: TALK-1141
epoxy: Super Glue for Data-driven Reports and Shiny Apps - posit::conf(2023)
Presented by Garrick Aden-Buie
R Markdown, Quarto, and Shiny are powerful frameworks that allow authors to create data-driven reports and apps. But truly excellent reports require a lot of work in the final steps to get numerical and stylistic formatting just right.
{epoxy} is a new package that uses {glue} to give authors templating superpowers. Epoxy works in R Markdown and Quarto, in markdown, LaTeX, and HTML outputs. It also provides easy templating for Shiny apps for dynamic data-driven reporting.
Beyond epoxy’s features, this talk will also touch on tips and approaches for data-driven reporting that will be useful to a wide audience, from R Markdown experts to the Quarto and Shiny curious.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Elevating your reports. Session Code: TALK-1155

Dynamic Interactions: webR to Empower Educators & Researchers with Interactive Quarto Docs
Presented by James Balamuta
Full talk title: Dynamic Interactions: Empowering Educators and Researchers with Interactive Quarto Documents Using webR
Traditional Quarto documents often lack interactivity, limiting the ability of students and researchers to fully explore and engage with the presented topic. In this talk, we propose a novel approach that utilizes webR, a WebAssembly-powered version of R, to seamlessly embed R code directly within the browser without the need for a server. We demonstrate how this approach can transform static Quarto documents into dynamic examples by leveraging webR’s capabilities through standard Quarto code cells, enabling real-time execution of R code and dynamic display of results. Our approach empowers educators and researchers alike to harness the power of interactivity and reproducibility for enhanced learning and research experiences.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Quarto (1). Session Code: TALK-1073
duckplyr: Tight Integration of duckdb with R and the tidyverse - posit::conf(2023)
Presented by Kirill Müller
The duckplyr R package combines the convenience of dplyr with the performance of DuckDB. Better than dbplyr: Data frame in, data frame out, fully compatible with dplyr.
duckdb is the new high-performance analytical database system that works great with R, Python, and other host systems. dplyr is the grammar of data manipulation in the tidyverse, tightly integrated with R, but it works best for small or medium-sized data. The former has been designed with large or big data in mind, but currently, you need to formulate your queries in SQL.
The new duckplyr package offers the best of both worlds. It transforms a dplyr pipe into a query object that duckdb can execute, using an optimized query plan. It is better than dbplyr because the interface is “data frames in, data frames out”, and no intermediate SQL code is generated.
The talk presents our results, a bit of the mechanics, and an outlook for this ambitious project.
Materials: https://github.com/duckdblabs/duckplyr/
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1100
dplyr 1.1.0 Features You Can’t Live Without - posit::conf(2023)
Presented by Davis Vaughan
Did you enjoy my clickbait title? Did it work? Either way, welcome!
The dplyr 1.1.0 release included a number of new features, such as:
- Per-operation grouping with .by
- An overhaul to joins, including new inequality and rolling joins
- New consecutive_id() and case_match() helpers
- Significant performance improvements in arrange()
Join me as we take a tour of this exciting dplyr update, and learn how to use these new features in your own work!
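To give a feel for what a helper like consecutive_id() computes, here is a small Python analogue. It is not dplyr code: the function is a hand-written stand-in and the speaker data is invented for illustration. The idea is that every run of identical adjacent values gets the next integer id, so a value that reappears later starts a new run.

```python
from itertools import groupby


def consecutive_id(values):
    """Give each run of identical adjacent values the next integer id, starting at 1."""
    ids = []
    # groupby() splits the sequence into runs of equal adjacent values.
    for run_number, (_, run) in enumerate(groupby(values), start=1):
        ids.extend(run_number for _ in run)
    return ids


# Hypothetical transcript: who is speaking on each line.
speakers = ["ann", "ann", "bob", "ann", "ann", "ann"]
print(consecutive_id(speakers))  # [1, 1, 2, 3, 3, 3]
```

Grouping by such an id lets you summarise each uninterrupted run separately, even when the same value occurs in several places.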
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Lightning talks. Session Code: TALK-1162

Diversify Your Career with Shiny for Python - posit::conf(2023)
Presented by Gordon Shotwell
A few years ago my company made a sudden shift from R to Python which was quite bad for my career because I didn’t really know Python. The main issue was that I couldn’t find a niche that allowed me to use my existing knowledge while learning the new language.
Shiny for Python is a great niche for R users because none of the Python web frameworks can do what Shiny can do. Additionally, almost all of your knowledge of the R package is applicable to the Python one.
This talk will provide an overview of the Python web application landscape and articulate what Shiny adds to this landscape, and then go through the five things that R users need to know before developing their first Shiny for Python application.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Data science with Python. Session Code: TALK-1138
Developing a Prototyping Competency in a Statistical Science Organization - posit::conf(2023)
Presented by Daniel Woodie
The introduction of new tools, methods, and processes can be a struggle within a statistical science organization. Being deliberate and investing in the creation of a prototyping competency can help in accelerating progress. Prototyping allows organizations to quickly experiment with new ideas, reduce the risk of failure, identify potential issues early, and iterate until the desired outcome is achieved.
I will highlight the key areas we have focused on accelerating, our framework for developing this competency, how we use Shiny, and the lessons we’ve learned along the way. Developing a prototyping competency is crucial for statistical science organizations that wish to stay competitive and innovative in today’s rapidly changing landscape.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Building effective data science teams. Session Code: TALK-1059
Democratizing Access to Education Data - posit::conf(2023)
Presented by Erika Tyagi
Learn how the Urban Institute is making high-quality data more accessible through the Education Data Portal.
Every year, government agencies release large amounts of data on schools and colleges, but this information is scattered across various websites and is often difficult to use. To make these data more accessible, the Urban Institute built the Education Data Portal, a freely available one-stop shop for harmonized data and metadata for nearly all major federal education datasets. In this talk, we’ll demonstrate how the portal works and share lessons we’ve learned about making data accessible to users with varying technical skills and preferred programming languages.
The Urban Institute’s Education Data Portal: https://educationdata.urban.org
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: End-to-end data science with real-world impact. Session Code: TALK-1145
dbtplyr: Bringing Column-Name Contracts from R to dbt - posit::conf(2023)
Presented by Emily Riederer
starts_with(language): Translating select helpers to dbt. Translating syntax between languages transports concepts across communities. We see a case study of adapting a column-naming workflow from dplyr to dbt’s data engineering toolkit.
dplyr’s select helpers exemplify how the tidyverse uses opinionated design to push users into the pit of success. The ability to efficiently operate on names incentivizes good naming patterns and creates efficiency in data wrangling and validation.
However, in a polyglot world, users may find they must leave the pit when comparable syntactic sugar is not accessible in other languages like Python and SQL.
In this talk, I will explain how dplyr’s select helpers inspired my approach to ‘column name contracts,’ how good naming systems can help supercharge data management with packages like {dplyr} and {pointblank}, and my experience building the {dbtplyr} package to port this functionality to dbt for building complex SQL-based data pipelines.
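As a rough sketch of the column-name-contract idea, the Python snippet below selects columns by prefix and checks the values against what each prefix promises. It is not taken from {dbtplyr} or dplyr: the prefixes (n_, ind_), the rules attached to them, and the sample rows are all assumptions made for this example.

```python
def starts_with(prefix, columns):
    """Return the column names that begin with `prefix` (in the spirit of a select helper)."""
    return [name for name in columns if name.startswith(prefix)]


# Hypothetical contract: each prefix promises something about the column's values.
CONTRACT = {
    "n_": lambda v: isinstance(v, int) and v >= 0,  # counts are non-negative integers
    "ind_": lambda v: v in (0, 1),                  # indicators are binary
}


def violations(rows):
    """List (row_index, column, value) triples that break the contract."""
    problems = []
    for i, row in enumerate(rows):
        for prefix, is_valid in CONTRACT.items():
            for col in starts_with(prefix, row):
                if not is_valid(row[col]):
                    problems.append((i, col, row[col]))
    return problems


rows = [
    {"id_user": "a1", "n_logins": 3, "ind_active": 1},
    {"id_user": "b2", "n_logins": -1, "ind_active": 2},
]

print(starts_with("n_", rows[0]))  # ['n_logins']
print(violations(rows))            # [(1, 'n_logins', -1), (1, 'ind_active', 2)]
```

Because the rules hang off the names, a new column that follows the naming pattern is validated without any extra code.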
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1098
Data Visualization with Seaborn - posit::conf(2023)
Presented by Michael Waskom
Seaborn is a Python library for statistical data visualization. After nearly a decade of development, seaborn recently introduced an entirely new API that is more explicitly based on a formal grammar of graphics. My talk will introduce this API and contrast it with the classic seaborn interface, sharing insights about the influence of the grammar of graphics on the ergonomics and maintainability of data visualization software.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Data science with Python. Session Code: TALK-1136
Data Science in Production: The Way to a Centralized Infrastructure - posit::conf(2023)
Presented by Oliver Bracht
This talk presents the success story of Covestro’s Posit infrastructure. The leading German materials manufacturer had no common development environment. With the help of eoda and Posit, a replicable, centralized development environment for R and Python was created. Although R and Python form the core of the infrastructure, it unifies multiple languages and tools. Beyond improving collaboration among Covestro’s data science teams, the setup also makes compliance guidelines easier to meet. The staging architecture gives developers a concept for testing and going live with their products. The project presents a best-practice approach to data science infrastructure, using Covestro as an example.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Data science infrastructure for your org. Session Code: TALK-1113
CRAN-ial Expansion: Taking Your R Package Development to New Frontiers with R-Universe - posit::conf
Presented by Mo Athanasia Mowinckel
Say goodbye to installation headaches and hello to a universe of possibilities with R-Universe! Take your R package development to new frontiers by organizing and sharing packages beyond the bounds of CRAN. R-Universe’s reliable package-building process strengthens installation and usage instructions, resulting in fewer support requests and an easy installation experience for users. With webpages and an API for exploring packages, R-Universe creates a streamlined and tidy ecosystem for R-package constellations. Also, you can build a custom toolchain for your users, relieving your workload and empowering users to help themselves. Join me to learn how to explore the vastness of R-Universe and expand your package development possibilities!
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Managing packages. Session Code: TALK-1080
Connect on Kubernetes: Content-level Containerization - posit::conf(2023)
Presented by E. David Aja, not Kelly O’Briant
Running Connect with off-host content execution in Kubernetes is very cool and allows you to enable some powerful and sophisticated workflows. The question is, do you really need it? How do you evaluate and decide? Let’s have a candid conversation about whether Connect content execution on Kubernetes is right for you and your organization.
Moving to Kubernetes will introduce complexity, so it’s important to have a strong motivating reason for making the switch. This talk will introduce new Connect features that are made possible by content-level containerization.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Data science infrastructure for your org. Session Code: TALK-1116
Conformal Inference with Tidymodels - posit::conf(2023)
Presented by Max Kuhn
Conformal inference theory enables any model to produce probabilistic predictions, such as prediction intervals. We’ll demonstrate how these analytical methods can be used with tidymodels. Simulations will show that the results have good coverage (i.e., a 90% interval should include the real point 90% of the time).
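The coverage claim can be illustrated with a minimal split-conformal sketch in plain Python rather than tidymodels. Everything here is an assumption made for the example: the simulated data, the fixed stand-in model (y ≈ 2x), and the function names. The sketch calibrates an interval half-width from absolute residuals on held-out data, then measures how often new points fall inside the interval.

```python
import math
import random


def conformal_quantile(residuals, alpha):
    """Split-conformal quantile of absolute residuals.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)).
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[rank - 1]


def predict(x):
    # Hypothetical pre-fitted model; any point predictor could stand in here.
    return 2.0 * x


def simulate_coverage(n_cal=2000, n_test=5000, alpha=0.10, seed=1):
    rng = random.Random(seed)

    def draw():
        x = rng.uniform(0, 10)
        y = 2.0 * x + rng.gauss(0, 1)  # simulated truth with unit-variance noise
        return x, y

    # Calibration: absolute residuals on data not used to fit the model.
    cal = [draw() for _ in range(n_cal)]
    q = conformal_quantile([abs(y - predict(x)) for x, y in cal], alpha)

    # Test: share of new points inside [prediction - q, prediction + q].
    test = [draw() for _ in range(n_test)]
    hits = sum(1 for x, y in test if abs(y - predict(x)) <= q)
    return q, hits / n_test


q, coverage = simulate_coverage()
print(f"half-width = {q:.2f}, empirical coverage = {coverage:.3f}")
```

With alpha = 0.10 the empirical coverage should land close to 90%, which is the property the simulations in the talk are said to check.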
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Tidy up your models. Session Code: TALK-1085

Commit to Change: How to Increase Accessibility in Your Favorite Open Source Projects - posit::conf
Presented by Rose Franzen
Explore accessibility for data scientists by uncovering some common barriers in open source tools with simple fixes that anyone can implement.
Dive into the often-overlooked world of accessibility for developers and data scientists! Uncover common accessibility barriers in open source tools, and learn simple steps to address them. Whether you’re a seasoned maintainer or a total novice, you’ll walk away with clear action items to implement right away. Join the movement of individuals championing the next frontier of disability inclusion in technology, shaping a more equitable future for all!
Materials: https://github.com/franzenr/commit-to-change
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Package development. Session Code: TALK-1134
Combining R and Python for the Belgian Justice Department - posit::conf(2023)
Presented by Thomas Michem
We present a strong case for combining R and Python in a production environment.
The justice department’s back office monitors the smooth processing of all traffic fines in Belgium. It gathers the data from all police departments and checks whether any anomalies occur.
The back office monitors this using a Shiny application, where traffic signs show the status of the whole operation. The status is built using Python scripts that perform anomaly detection, checking whether the number of fines is in line with what is expected daily. The results of those checks are delivered to the front-end Shiny application via a Python Flask API.
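The abstract does not say which anomaly detection method is used, so the following is only a sketch of one simple possibility: compare today’s number of fines with recent days using a z-score and map the result to a status colour. The function name, the thresholds, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def fines_status(history, today, warn=2.0, alert=3.0):
    """Map today's count to a status based on its z-score against past days."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: any deviation at all is suspicious.
        return "green" if today == mu else "red"
    z = abs(today - mu) / sigma
    if z >= alert:
        return "red"
    if z >= warn:
        return "orange"
    return "green"


# Hypothetical daily fine counts for the past week.
history = [980, 1010, 995, 1005, 990, 1020, 1000]
print(fines_status(history, 1003))  # green
print(fines_status(history, 400))   # red
```

In a setup like the one described, a function of this kind could sit behind an API endpoint that the dashboard polls for the current status.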
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: R or Python? Why not both!. Session Code: TALK-1122
Coding Tools for Industry R&D – Development Lessons from an Analytical Lab - posit::conf(2023)
Presented by Camila Saez Cabezas
Are you considering or curious about developing code-based tools for scientists? Whether you are an experienced developer or a fellow Posit Academy graduate who might be stepping into this role for the first time, the aim of my story is to inspire you and help you navigate this process. While developing custom R functions, packages, and Shiny apps for diverse analytical capabilities and users in R&D, I learned why it’s important to collect certain information at the start before writing any tidying, analysis, visualization, and web application code.
In this talk, I will share the essential technical questions that help me define and plan for success.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Lightning talks. Session Code: TALK-1168
CI/CD Pipelines - Oh, the Places You’ll Go! - posit::conf(2023)
Presented by Trevor Nederlof
Data scientists are creating incredibly useful data products at an accelerating rate. These products are consumed by others who expect them to be accurate, reliable, and timely, promises that often go unfulfilled. In this talk, we will explore how to use common CI/CD pipeline tools already within reach of attendees to automatically test and deploy their apps, APIs, and reports.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Lightning talks. Session Code: TALK-1166
Can I Have a Word? - posit::conf(2023)
Presented by Ellis Hughes
Since its release, {gt} has won over the hearts of many due to its flexible and powerful table-generating abilities. However, in cases where office products were required by downstream users, {gt}’s potential remained untapped. That all changed in 2022 when Rich Iannone and I collaborated to add Word documents as an official output type. Now, data scientists can engage stakeholders directly, wherever they are.
Join me for an upcoming talk where I’ll share my excitement about the new opportunities this update presents for the R community as well as future developments we can look forward to.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Elevating your reports. Session Code: TALK-1156

Building a Flexible, Scalable Self-Serve Reporting System with Shiny - posit::conf(2023)
Presented by Natalie O’Shea
Working in the high-touch world of consulting, our team needed to develop a reporting system that was flexible enough to be tailored to the specific needs of any given partner while still reducing the highly manual nature of populating client-ready slide decks with various metrics and data visualizations. Utilizing the extensive resources developed by the R user community, I was able to create a flexible, scalable reporting system that allows users to populate templated Google slide decks with metrics and professional-grade visualizations using data pulled directly from our database at the time of query. This streamlined approach enables our consultants to spend less time copy-pasting data from one channel to another and instead focus on what they do best: surfacing business-relevant insights and recommendations for our partners.
By sharing my approach to customizable self-serve reporting in Shiny, I hope attendees will walk away with new ideas about how to combine parameterized reporting and dashboard development to get the best of both worlds. Additionally, I hope to end by sharing how this project was pivotal in making the business case for procuring Posit products for my broader organization.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Bridging the gap between data scientists and decision makers. Session Code: TALK-1075
Black Hair and Data Science Have More in Common Than You Think - posit::conf(2023)
Presented by Kari Jordan
Data science is often difficult to define because of its many intersections, including statistics, programming, analytics, and other domain knowledge. Would you believe it if I told you Black hair and data science have much in common?
This talk is for those considering learning about, studying, or pursuing data science. In it, Dr. Kari L. Jordan draws parallels between approaches to caring for Black hair and approaches to learning data science. We start with the roots and end by picking the right tools and products to maintain our coiffure.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: It takes a village: building and sustaining communities. Session Code: TALK-1131
Becoming an R Package Author (or How I Got Rich Responding to GitHub Issues) - posit::conf(2023)
Presented by Matt Herman
The transition from analyzing data in R to making packages in R can feel like a big step. Writing code to clean data or make visualizations seems categorically different from building robust “software” on which other people rely.
In this talk, I’ll show why that distinction is not necessarily true by discussing my personal experience from learning R in graduate school to reporting bugs on GitHub to becoming a co-author of the tidycensus package and a practicing data scientist. The positive and supportive R community on GitHub, Twitter, and elsewhere contributes to why anyone who writes R code can become a package author.
- I have not actually gotten rich but I did get freelance data work based on my package contributions!
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Package development. Session Code: TALK-1133
Automating the Dutch National Flu Surveillance for Pandemic Preparedness - posit::conf(2023)
Presented by Patrick van den Berg
The next pandemic may be caused by a flu strain, and with thousands of patients with the flu in Dutch hospitals annually, it is important to have accurate and current data. The National Institute for Public Health and the Environment of the Netherlands (RIVM) collects and processes flu data to achieve pandemic preparedness. However, the flu reporting process used to be very laborious, stealing precious time from epidemiologists. In our journey of automating this data pipeline, we learned that collaboration was the most important factor in getting to a working system. This talk sits at the intersection of data science and epidemiology and will provide you with a valuable opportunity to learn from our experiences.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Leave it to the robots: automating your work. Session Code: TALK-1150
AI and Shiny for Python: Unlocking New Possibilities - posit::conf(2023)
Presented by Winston Chang
In the past year, people have come to realize that AI can revolutionize the way we work. This talk focuses on using AI tools with Shiny for Python, demonstrating how AI can accelerate Shiny application development and enhance its capabilities. We’ll also explore Shiny’s unique ability to interface with AI models, offering possibilities beyond Python web frameworks like Streamlit and Dash. Learn how Shiny and AI together can empower you to do more, and do it faster.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: I can’t believe it’s not magic: new tools for data science. Session Code: TALK-1153

Adding a Touch of glitr: Developing a Package of Themes on Top of ggplot - posit::conf(2023)
Presented by Aaron Chafetz and Karishma Srikanth
Please note, a power issue cut off the first five minutes of the talk.
Explore how our team at the US Agency for International Development (USAID) created our own data viz branding R package on top of ggplot2 and how you can too.
How do you create brand cohesion across your large team when it comes to data viz? Inspired by the BBC’s bbplot, our team at the US Agency for International Development (USAID) developed a package on top of ggplot2 to create a common look and feel for our team’s products. This effort improved not just the cohesiveness of our work, but also its trustworthiness. By creating this package, we reduced our reliance on defaults and the time spent on each project customizing numerous graphic elements. More importantly, this package provided an easier on-ramp for new teammates to adopt R. We share our journey of developing a style guide within a federal agency and aim to guide and inspire other organizations that could benefit from developing their own branding package and guidance.
Materials:
- https://speakerdeck.com/achafetz/adding-a-touch-of-glitr
- https://usaid-oha-si.github.io/glitr/
- https://issuu.com/achafetz/docs/oha_styleguide
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Compelling design for apps and reports. Session Code: TALK-1103
A hacker’s guide to open source LLMs - posit::conf(2023)
Presented by Jeremy Howard
In this deeply informative video, Jeremy Howard, co-founder of fast.ai and creator of the ULMFiT approach on which all modern language models (LMs) are based, takes you on a comprehensive journey through the fascinating landscape of LMs. Starting with the foundational concepts, Jeremy introduces the architecture and mechanics that make these AI systems tick. He then delves into critical evaluations of GPT-4, illuminates practical uses of language models in code writing and data analysis, and offers hands-on tips for working with the OpenAI API. The video also provides expert guidance on technical topics such as fine-tuning, decoding tokens, and running private instances of GPT models.
As we move further into the intricacies, Jeremy unpacks advanced strategies for model testing and optimization, utilizing tools like GPTQ and Hugging Face Transformers. He also explores the potential of specialized datasets like Orca and Platypus for fine-tuning and discusses cutting-edge trends in Retrieval Augmented Generation and information retrieval. Whether you’re new to the field or an established professional, this presentation offers a wealth of insights to help you navigate the ever-evolving world of language models.
(The above summary was, of course, created by an LLM!)
For the notebook used in this talk, see https://github.com/fastai/lm-hackers.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Notebooks+LLMs may just be the future of coding. Session Code: KEY-1107
20 questions and AI chat bots - posit::conf(2023)
Presented by Winston Chang
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: I can’t believe it’s not magic: new tools for data science. Session Code: TALK-1153

{slushy}: A Bridge to the Future - posit::conf(2023)
Presented by Becca Krouse
Scaling the use of R can present complications for environment management, especially in regulated industries with a focus on traceability. One solution is controlled (aka “frozen”) environments, which are carefully curated and tested by tech teams. However, the speed of R development means the environments quickly become outdated and users are unable to benefit from the latest advances. Enter {slushy}: a team-friendly tool powered by {renv} and Posit Package Manager. Users can quickly mimic a controlled environment, with the easy ability to time travel between snapshot dates. Attendees will learn how {slushy} bolstered our R adoption efforts, and how this strategy enables tech teams and users to work in parallel towards a common future.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Managing packages. Session Code: TALK-1078
How the R for Data Science (R4DS) Online Learning Community Made Me a Better Student - posit::conf(2023)
Presented by Lydia Gibson
Through my participation in the R4DS Online Learning Community, I have advanced my R and data science skills, making me a better student than I would have been through my studies alone. As a non-traditional MS Statistics student with an undergraduate background in economics, I had absolutely no experience with the R programming language prior to pursuing my Master’s degree. In July 2021, hoping to get a head start on learning R before beginning my degree program, I joined the R4DS Slack Workspace. Along with helping to improve my programming skills, R4DS has connected me with scholarships, mentorship, and other opportunities, and I think that it would be beneficial for other students to know about this great resource.
Presented at Posit Conference, between Sept 19-20 2023, Learn more at posit.co/conference.#
Talk Track: Developing your skillset; building your career. Session Code: TALK-1110