Databricks
Resources tagged Databricks
Next-Gen Data Science: How Posit and Databricks Are Transforming Analytics at Scale
Modern data science teams face the challenge of navigating complex landscapes of languages, tools and infrastructure. Positron, Posit’s next-generation IDE, offers a powerful environment tailored for data science, seamlessly integrating with Databricks to empower teams working in Python and R. Now integrated within Posit Workbench, Positron enables data scientists to efficiently develop, iterate and analyze data with Databricks — all while maintaining their preferred workflows. In this session, we’ll explore how Python and R users can develop, deploy and scale their data science workflows by combining Posit tools with Databricks. We’ll showcase how Positron simplifies development for both Python and R and how Posit Connect enables seamless deployment of applications, reports and APIs powered by Databricks. Join us to see how Posit + Databricks create a frictionless, scalable and collaborative data science experience — so your teams can focus on insights, not infrastructure.
Talk By: James Blair, Senior Product Mgr. Cloud Integrations, Posit, PBC
Databricks Named a Leader in the 2025 Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms: https://www.databricks.com/blog/databricks-named-leader-2025-gartner-magic-quadrant-data-science-and-machine-learning Build and deploy quality AI agent systems: https://www.databricks.com/product/artificial-intelligence See all the product announcements from Data + AI Summit: https://www.databricks.com/events/dataaisummit-2025-announcements
Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc
Data + AI Summit 2024 - Keynote Day 2 - Full
Speakers:
- Alexander Booth, Asst Director of Research & Development, Texas Rangers
- Ali Ghodsi, Co-Founder and CEO, Databricks
- Bilal Aslam, Sr. Director of Product Management, Databricks
- Darshana Sivakumar, Staff Product Manager, Databricks
- Hannes Mühleisen, Creator of DuckDB, DuckDB Labs
- Matei Zaharia, Chief Technology Officer and Co-Founder, Databricks
- Reynold Xin, Chief Architect and Co-Founder, Databricks
- Ryan Blue, CEO, Tabular
- Tareef Kawaf, President, Posit Software, PBC
- Yejin Choi, Sr Research Director Commonsense AI, AI2, University of Washington
- Zeashan Pappa, Staff Product Manager, Databricks
About Databricks Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Block, Comcast, Conde Nast, Rivian, and Shell, and over 60% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow.
What’s New in Quarto? - posit::conf(2023)
Presented by Charlotte Wickham
It’s been over a year since Quarto 1.0, an open-source scientific and technical publishing system, was announced at rstudio::conf(2022). In this talk, I’ll highlight some of the improvements to Quarto since then. You’ll learn about new formats, options, tools, and ways to supercharge your content. And, if you haven’t used Quarto yet, come to see some reasons to try it out.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Quarto (1). Session Code: TALK-1072

Visualizing Data Analysis Pipelines with Pandas Tutor and Tidy Data Tutor - posit::conf(2023)
Presented by Sean Kross
The data frame is a fundamental data structure for data scientists using Python and R. Pandas and the tidyverse are designed around building pipelines that transform data frames. However, within these pipelines it is not always clear how each operation changes the underlying data frame. To explain each step in a pipeline, data science instructors resort to hand-drawing diagrams that illustrate the semantics of operations such as filtering, sorting, and grouping.
In this talk, I will introduce Pandas Tutor and Tidy Data Tutor, step-by-step visual representation engines of data frame transformations. Both tools illustrate the row, column, and cell-wise relationships between an operation’s input and output data frames.
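Pandas Tutor and Tidy Data Tutor render these relationships visually. As a rough illustration of the underlying idea (not either tool's implementation), the Python sketch below tags each row with its input position so that, after a filter and a sort, every output row can be traced back to its source and dropped rows can be identified. The toy rows are made up.

```python
# Track which input row each output row came from through a filter and a sort.
rows = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 6},
    {"name": "c", "score": 9},
]

# Tag each row with its original position so provenance survives every step.
tagged = list(enumerate(rows))

# Filter: keep rows with score > 4 (input row 0 is dropped).
filtered = [(i, row) for i, row in tagged if row["score"] > 4]

# Sort: order the survivors by descending score (input rows 1 and 2 swap places).
ordered = sorted(filtered, key=lambda pair: pair[1]["score"], reverse=True)

# Map each output position to the input position it came from.
provenance = {out_pos: in_pos for out_pos, (in_pos, _) in enumerate(ordered)}
dropped = sorted(set(range(len(rows))) - set(provenance.values()))

for out_pos, in_pos in provenance.items():
    print(f"output row {out_pos} <- input row {in_pos}")
print(f"dropped input rows: {dropped}")
```

A diagramming tool would draw these mappings as arrows between the input and output tables; grouping can be handled the same way by mapping each output row to a list of input positions.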
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Teaching data science. Session Code: TALK-1096
Validating and Testing R Dataframes with Pandera via reticulate - R-Python Interoperability
Presented by Niels Bantilan
Original Full Title: Validating and Testing R Dataframes with Pandera via reticulate: A Case Study in R-Python Interoperability
Data science and machine learning practitioners work with data every day, analyzing and modeling it for insights and predictions. A major component of any project is ensuring data quality: cleaning data and protecting against flaws that may invalidate the analysis or model. Pandera is an open-source data testing toolkit for dataframes in the Python ecosystem, but can it validate R dataframes?
This talk is composed of three parts: first I’ll describe what data testing is and motivate why you need it. Then, I’ll introduce the iterative process of creating and refining dataframe schemas in Pandera. Finally, I’ll demonstrate how to use it in R with the reticulate package using a simple modeling exercise as an example.
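Pandera expresses these checks through its own schema classes. The sketch below deliberately avoids Pandera's API and uses only the Python standard library, to show the general idea of a dataframe schema as a set of per-column type and value checks applied to every row; `ColumnCheck`, `validate`, and the sample data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ColumnCheck:
    """Hypothetical column rule: an expected type plus a value predicate."""
    dtype: type
    predicate: Callable[[object], bool]
    description: str


def validate(rows: list[dict], schema: dict[str, ColumnCheck]) -> list[str]:
    """Collect every failure instead of stopping at the first one."""
    failures = []
    for i, row in enumerate(rows):
        for col, check in schema.items():
            if col not in row:
                failures.append(f"row {i}: missing column {col!r}")
                continue
            value = row[col]
            if not isinstance(value, check.dtype):
                failures.append(f"row {i}: {col!r} is not {check.dtype.__name__}")
            elif not check.predicate(value):
                failures.append(f"row {i}: {col!r} fails {check.description}")
    return failures


schema = {
    "age": ColumnCheck(int, lambda v: 0 <= v <= 120, "0 <= age <= 120"),
    "species": ColumnCheck(str, lambda v: v in {"adelie", "gentoo"}, "allowed species"),
}
rows = [
    {"age": 34, "species": "adelie"},
    {"age": -2, "species": "gentoo"},
    {"age": 51, "species": "emperor"},
]
print(validate(rows, schema))
```

Refining a schema iteratively then amounts to tightening or adding checks and rerunning `validate` until the remaining failures reflect genuine data problems.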
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: R or Python? Why not both!. Session Code: TALK-1123
Using R with Databricks Connect - posit::conf(2023)
Presented by Edgar Ruiz
Spark Connect and Databricks Connect make it possible to interact with Spark stand-alone clusters remotely. This improves our ability to perform data science at scale. We will share the work in sparklyr, and other products, that will make it easier for R users to take advantage of this new framework.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Tidy up your models. Session Code: TALK-1084

Using R to develop production modeling workflows at Mayo Clinic - posit::conf(2023)
Presented by Brendan Broderick
Developing workflows that both train models and deploy them can be a difficult task. In this talk I will share some tools and workflow tips that I use to build production model pipelines using R. As an example, I will use a project that predicts which patients need specialized respiratory care after leaving the ICU. I will show how to use the targets package to create a reproducible, easy-to-manage modeling and prediction pipeline; how to use the renv package to ensure a consistent environment for development and deployment; and how to use plumber, vetiver, and shiny applications to make the model accessible to care providers.
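targets decides which steps of a pipeline are out of date and reruns only those. A heavily simplified Python analogue, assuming a hypothetical `Pipeline` class that fingerprints each step's inputs, looks like this. Unlike targets, it ignores changes to a step's function code and keeps its cache in memory rather than on disk.

```python
import hashlib
import json


class Pipeline:
    """Hypothetical make-style runner: a step reruns only when its inputs change."""

    def __init__(self):
        self.cache = {}  # step name -> (input fingerprint, cached result)
        self.runs = []   # names of steps that actually executed, in order

    def run(self, name, func, *inputs):
        # Fingerprint the step name plus its (JSON-serializable) inputs.
        payload = json.dumps([name, inputs], sort_keys=True).encode()
        key = hashlib.sha256(payload).hexdigest()
        cached = self.cache.get(name)
        if cached is not None and cached[0] == key:
            return cached[1]  # inputs unchanged: skip the work, reuse the result
        result = func(*inputs)
        self.cache[name] = (key, result)
        self.runs.append(name)
        return result


def drop_small(xs):
    return [x for x in xs if x > 1.0]


def fit_mean(xs):
    return sum(xs) / len(xs)


pipe = Pipeline()
raw = [1.0, 2.0, 3.0, 4.0]
for _ in range(2):  # the second pass finds both steps up to date
    clean = pipe.run("clean", drop_small, raw)
    mean_value = pipe.run("fit", fit_mean, clean)
print(pipe.runs)
```

Changing `raw` changes the fingerprint for `clean`, which then reruns; if its output changes, `fit` reruns too, which is the dependency-driven behavior a pipeline tool automates.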
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Leave it to the robots: automating your work. Session Code: TALK-1149
Using Data to Protect Traditional Lifeways - posit::conf(2023)
Presented by Angie Reed
The spirit of Penobscot Nation’s work to protect the health of their relative, the Penobscot River, is embodied in the Penobscot water song, which says “Water, we love you, thank you so much water, we respect you.” Because the Penobscot River is not a natural resource (she is a relative, family), this song describes the foundation of our efforts to protect her health and well-being. The identity of Penobscot people cannot be disconnected from the river, and protecting this traditional lifeway is at the heart of our work.
For over a decade we have used R to manage, transform, analyze, and visualize data, and the free, open-source Posit products help us leave a legacy of good data management and the ability to share results with Penobscot Nation citizens. You will learn more about how our use of R has helped us achieve more stringent protections for the Penobscot River and how we engage young people in every step of this work. We are also part of a larger network of tribal environmental professionals, working together to learn R and share data and insights. We will give you information about how you can volunteer to help expand the network of folks providing technical assistance on any R- and RStudio-related topics.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: End-to-end data science with real-world impact. Session Code: TALK-1144
Unlock the Power of DataViz Animation and Interactivity in Quarto - posit::conf(2023)
Presented by Deepsha Menghani
Plot animated and interactive visualizations with Plotly and Crosstalk in Quarto using R. In this intro to Plotly and Crosstalk in R, you’ll learn through code examples how to integrate dashboard elements into Quarto with animated plots, interactive widgets (checkboxes), and linked plots via brushing.
This talk showcases how to use packages, such as Plotly and Crosstalk, to create interactive data visualizations and add dashboard-like elements to Quarto. Using a fun dataset available through the “Richmondway” package, we examine the number of times Roy Kent uses salty language throughout all seasons of “Ted Lasso.” We illustrate this using animated plots, interactive selection widgets such as checkboxes, and by linking two plots with brushing capabilities.
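Crosstalk links widgets in R through shared data with row keys. The Python sketch below is not Crosstalk's API; it only illustrates the linking idea under that assumption: a rectangular brush in one view selects a set of row keys, and a second view highlights the rows carrying those keys. The rows and column names are toy data, not the Richmondway dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brush:
    """A rectangular selection in data coordinates."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


# Toy rows; "key" plays the role of the shared row identifier.
rows = [
    {"key": "r1", "x": 1, "y": 2, "group": "A"},
    {"key": "r2", "x": 4, "y": 5, "group": "B"},
    {"key": "r3", "x": 5, "y": 1, "group": "A"},
    {"key": "r4", "x": 2, "y": 6, "group": "B"},
]


def brushed_keys(rows, brush, x_col, y_col):
    """View 1: keys of rows whose (x, y) point falls inside the brush."""
    return {r["key"] for r in rows if brush.contains(r[x_col], r[y_col])}


def linked_view(rows, selected):
    """View 2: mark each row as highlighted if its key was brushed in view 1."""
    return [(r["key"], r["group"], r["key"] in selected) for r in rows]


selected = brushed_keys(rows, Brush(0, 4.5, 0, 5.5), "x", "y")
print(linked_view(rows, selected))
```

Because the views communicate only through keys, the second view can show different columns (here `group`) than the one being brushed.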
Materials:
- Slides: https://deepshamenghani.github.io/posit_plotly_crosstalk/#/title-slide
- Code repo: https://github.com/deepshamenghani/posit_plotly_crosstalk
- Richmondway data package: https://github.com/deepshamenghani/richmondway
- In-Depth Guide to Creating and Publishing an R Data Package (Richmondway) Using Devtools: https://medium.com/p/245b0fd4c359
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Quarto (2). Session Code: TALK-1143
Towards the Next Generation of Shiny UI
Presented by Carson Sievert
Create awesome looking and feature rich Shiny dashboards using the bslib R package.
Shiny recently celebrated its 10th birthday and has grown tremendously in many areas since its birth; however, a hello world Shiny app still looks roughly like it did 10 years ago. The bslib R package helps solve this problem by making it very easy to apply modern, customizable styling to your Shiny apps, R Markdown / Quarto documents, and more. In addition, bslib provides dashboard-focused UI components like expandable cards, value boxes, sidebar layouts, and more to help you create delightful Shiny dashboards.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Shiny user interfaces. Session Code: TALK-1124

tidymodels: Adventures in Rewriting a Modeling Pipeline - posit::conf(2023)
Presented by Ryan Timpe
An overview of the benefits unlocked on our data science team by adopting tidymodels.
Data science sure has changed over the past few years! Everyone’s talking about production. RStudio is now Posit. Models are now tidy.
This talk is about embracing that change and updating existing models using the tidymodels framework. I recently completed this change, letting go of our in-production code and reimagining it with tidymodels. My team ended up with a faster, more scalable pipeline that enabled us to better automate our workflow and increase our scale while improving our stakeholders’ experiences.
I’ll share tips and tricks for adopting the tidymodels framework in existing products, best practices for learning and upskilling teams, and advice for using tidymodels packages to build more accessible data science tools.
Materials: https://www.ryantimpe.com/files/tidymodels_adventures_positconf2023.pdf
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Tidy up your models. Session Code: TALK-1082
The Need for Speed - AccelerateR-ing R Adoption in GSK - posit::conf(2023)
Presented by Ben Arancibia
How does a risk-averse Pharma Biostatistics organization with 900+ people switch from proprietary software to R and other open-source tools for delivering clinical trial submissions? First slowly, then all at once. GSK began transitioning to R for its clinical trial data analysis in 2020 and now uses R for its regulatory-reviewed outputs. The AccelerateR Team, an agile pod of R experts and data scientists, rotates through GSK Biostatistics study teams, sitting side by side with them to answer questions and mentor them during this transition.
We will share our experience from AccelerateR and how other organizations can use our learnings to scale R from pilots to full enterprise adoption and contribute to open source industry R packages.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Pharma. Session Code: TALK-1068
The ‘I’ in Team: Peer-to-Peer Best Practices for Growing Data Science Teams - posit::conf(2023)
Presented by Liz Roten
R users don’t always come in sets. Often, you may be the only user in the cubicle block. But, one miraculous day, your manager finally fills the void and you welcome more folks on your team. Suddenly, the little R system you created to suit your needs, like a custom package, code styling, and file organization, isn’t just for you.
Want to suddenly overhaul that one package you wrote two years ago? It probably won’t work when your colleagues try to update it.
Your new teammates are data.table fans, but you prefer the tidyverse. Do you need to refactor? Are style choices, like indentation, important when collaborating, or are you just being persnickety?
In this talk, you will learn how to bring new teammates on board and blend your respective styles without pulling your hair out.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Building effective data science teams. Session Code: TALK-1063
Teaching Data Science in Adverse Circumstances: Posit Cloud and Quarto to the Rescue - posit::conf
Presented by Aleksander Dietrichson
The focus of this presentation is on the challenges faced by teachers of data science whose students are not quantitatively inclined and may face adversity in terms of available technology resources and potential language barriers. I identify three main areas of challenge and show how at Universidad Nacional de San Martín (Argentina) we addressed each of them through a combination of original curriculum redesign, production of course materials appropriate for the students in question, and the use of open-source software and some Posit products, namely Posit Cloud (posit.cloud) and Quarto. I show how these technologies can be used as pedagogical tools to overcome these challenges, even on a shoestring budget.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Teaching data science. Session Code: TALK-1094
Styling and Templating Quarto Documents - posit::conf(2023)
Presented by Emil Hvitfeldt
Quarto is a powerful engine to generate documents, slides, books, websites, and more. The default aesthetics look good, but there are times when you want or need to change how they look. This is that talk.
Whether you want your slides to stand out from the crowd, or you need your documents to fit within your corporate style guide, being able to style Quarto documents is a valuable skill.
Once you have persevered and created the perfect document, you don’t want the effort to go to waste. This is where templating comes in. Quarto makes it super easy to turn a styled document into a template to be used over and over again.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Compelling design for apps and reports. Session Code: TALK-1106

Solving a Secure Geocoding Problem (That Hardly Anybody Has) - posit::conf(2023)
Presented by Tesla DuBois
Due to data security concerns, the strictest health researchers won’t send patient addresses to remote servers for geocoding. The only existing methods for offline geocoding are expensive, cumbersome, or require working with code, all limiting factors for many researchers. So, a couple of classmates and I made a standalone desktop application using shell, Docker, PostGIS, and Python to geocode addresses through a simple GUI without ever sending them off the local machine. Come for the technical ins and outs and stay for the anecdotes about how my R background played into the daunting, frustrating, but ultimately successful task of creating a data science tool using unfamiliar technologies.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Developing your skillset; building your career. Session Code: TALK-1111
Side Effects of a Year of Blogging - posit::conf(2023)
Presented by Millie Symns
A big part of being in the R community is sharing your knowledge in different forums, no matter how big or small. So what better way to do that than a blog? And what better way than using R and Posit products to build and maintain that blog and website? This was the route I took to challenge myself to put myself out there in the community, find my voice, share my knowledge, and learn new things.
In this talk, I will reflect on the lessons learned and gains from spending the past year blogging and sharing my website for all to see. The side effects include professional and personal benefits, from a clear profile of my skills to the progression of my art. You may leave inspired to try the challenge for yourself.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: It takes a village: building and sustaining communities. Session Code: TALK-1130
ShinyUiEditor: From Alpha to Powerful Shiny App Development Tool - posit::conf(2023)
Presented by Nick Strayer
Since its alpha debut at last year’s conference, the ShinyUiEditor has experienced continuous development, evolving into a powerful tool for crafting Shiny app UIs. Some key enhancements include the integration of new bslib components and the editor’s ability to create or navigate to existing server bindings for inputs and outputs.
In addition to new features, the editor is now available as a VS Code extension, allowing it to integrate smoothly into more developers’ workflows. This talk will showcase how these new capabilities help users efficiently create visually appealing, production-ready applications.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Shiny user interfaces. Session Code: TALK-1126

Shiny New Tools for Scaling your Shiny Apps - posit::conf(2023)
Presented by Joe Kirincic
So you have a Shiny app your org loves, but as adoption grows, performance starts getting sluggish. Profiling reveals your cool interactive plots are the culprit. What can you do to make things snappy again? We can increase the number of app instances, sure, but suppose that isn’t an option for us. Another approach is to shift the plotting work from the server onto the client.
In this talk, we’ll learn how to leverage two JavaScript projects, DuckDB-WASM and Observable’s Plot.js, in our Shiny app to create fast, flexible interactive visualizations in the browser without burdening our app’s server function. The end result is an app that can scale to more users without needing to increase compute resources.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: The future is Shiny. Session Code: TALK-1088
Shiny for Python Machine Learning Apps with pandas, scikit-learn and TensorFlow - posit::conf(2023)
Presented by Chelsea Parlett-Pelleriti
With the introduction of Shiny for Python in 2022, users can now use the power of reactivity with their favorite Python packages. Shiny can be used to build interactive reports, dashboards, and web apps that make sharing insights and results both simple and dynamic. This includes apps to display and explore popular machine learning models built with staple Python packages like pandas, scikit-learn, and TensorFlow. This talk will demonstrate how to build simple Shiny for Python apps that interface with these packages, and discuss some of the benefits of using Shiny for Python to build your web apps.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: The future is Shiny. Session Code: TALK-1087
Serenity Now, Productivity Later: Focus Your Project Stack with The Gonzalez Matrix - posit::conf
Presented by Patrick Tennant
How should you respond when your boss has too many good ideas for data science projects? In this talk, I’ll review our use of an adapted version of the Eisenhower Matrix that lays out our projects according to the effort required and the value they will produce. Given the functionally unlimited number of data science projects a team could take on, you’ll learn how we keep our team focused on valuable work while reducing the stress of a never-ending list of projects.
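The abstract does not spell out the matrix's quadrants, so the labels below (and the 1-10 scales and threshold) are hypothetical. The sketch only shows the mechanics of placing projects on an effort/value grid and sorting the backlog by value.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    effort: int  # assumed 1-10 scale, higher means more work
    value: int   # assumed 1-10 scale, higher means more impact


def quadrant(p: Project, threshold: int = 5) -> str:
    """Place a project in a 2x2 effort/value grid (labels are illustrative)."""
    high_value = p.value > threshold
    low_effort = p.effort <= threshold
    if high_value and low_effort:
        return "do now"
    if high_value:
        return "plan"
    if low_effort:
        return "fill-in"
    return "drop"


projects = [
    Project("churn dashboard", effort=3, value=8),
    Project("new forecasting model", effort=8, value=9),
    Project("report restyle", effort=2, value=3),
    Project("rewrite legacy ETL", effort=9, value=4),
]

# Highest value first; ties broken by lower effort.
for p in sorted(projects, key=lambda p: (-p.value, p.effort)):
    print(f"{p.name:<22} {quadrant(p)}")
```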
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Building effective data science teams. Session Code: TALK-1065
Scale Your Data Validation Workflow With {pointblank} and Posit Connect - posit::conf(2023)
Presented by Michael Garcia
For the Data Services team at Medable, our number one priority is to ensure the data we collect and deliver to our clients is of the highest quality. The {pointblank} package, along with Posit Connect, modernizes how we tackle data validation within Data Services.
In this talk, I will briefly summarize how we develop test code with {pointblank}, share with {pins}, execute with {rmarkdown}, and report findings with {blastula}. Finally, I will show how we aggregate data from test results across projects into a holistic view using {shiny}.
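In R, pointblank attaches thresholds to validation steps (for example through `action_levels()` with `warn_at` and `stop_at`). The Python sketch below mimics that idea rather than pointblank's API: each check reports `pass`, `warn`, or `stop` depending on the fraction of failing values, and results from several hypothetical studies are rolled up into one summary, loosely mirroring the cross-project view described above.

```python
from collections import Counter


def evaluate(values, predicate, warn_at=0.05, stop_at=0.20):
    """Return 'pass', 'warn', or 'stop' from the fraction of values failing `predicate`."""
    if not values:
        return "pass"
    failing = sum(1 for v in values if not predicate(v))
    fraction = failing / len(values)
    if fraction >= stop_at:
        return "stop"
    if fraction >= warn_at:
        return "warn"
    return "pass"


def is_positive(v):
    """Hypothetical rule applied to every study: values must be positive."""
    return v > 0


results = {
    "study_a": evaluate([10, 12, 11, 9], is_positive),                    # 0% failing
    "study_b": evaluate([10, -1, 11, 9, 8, 7, 6, 5, 4, 3], is_positive),  # 10% failing
    "study_c": evaluate([-1, -2, 3, 4], is_positive),                     # 50% failing
}

# Roll per-study outcomes up into one cross-study summary.
summary = Counter(results.values())
print(results)
print(dict(summary))
```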
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Leave it to the robots: automating your work. Session Code: TALK-1058
Running R-Shiny without a Server - posit::conf(2023)
Presented by Joe Cheng
A year ago, Posit announced Shinylive, a deployment mode of Shiny that lets you run interactive applications written in Python without actually running a Python server at runtime. Instead, Shinylive turns Shiny for Python apps into pure client-side apps, running on a pure client-side Python installation.
Now, that same capability has come to Shiny for R, thanks to the webR project.
In this talk, I’ll show you how you can get started with Shinylive for R, and why this is more interesting than just cheaper app hosting. I’ll talk about some of the different use cases we had in mind for Shinylive, and help you decide if Shinylive makes sense for your app.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: I can’t believe it’s not magic: new tools for data science. Session Code: TALK-1151

Reproducible Manuscripts with Quarto - posit::conf(2023)
Presented by Mine Çetinkaya-Rundel
In this talk, we present a new capability in Quarto that provides a straightforward and user-friendly approach to creating truly reproducible manuscripts that are publication-ready for submission to popular journals. This new feature, Quarto manuscripts, includes the ability to produce a bundled output containing a standardized journal format, source documents, source computations, referenced resources, and execution information into a single bundle that is ingested into journal review and production processes. We’ll demo how Quarto manuscripts work and how you can incorporate them into your current manuscript development process as well as touch on pain points in your current workflow that Quarto manuscripts help alleviate.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Quarto (1). Session Code: TALK-1070

R! You Going?! - posit::conf(2023)
Presented by SherAaron Hurt
3 things to remember when starting your journey to become a data scientist
Everyone will have a different journey when becoming a data scientist. However, there are a few tips to consider to make the journey less daunting and more enjoyable. Listen as I tell my story as a data scientist and offer resources and tips to build confidence for those who are new to their journey. The tools are available; however, they are not always easy to find.
keywords: openscience, The Carpentries, R programming language, GPS, data science journey, data science resources
Materials:
- https://www.linkedin.com/in/sheraaronhurt/
- carpentries.org/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Teaching data science. Session Code: TALK-1097
Parameterized Quarto Reports Improve Understanding of Soil Health - posit::conf(2023)
Presented by Jadey Ryan
Learn how to use R and Quarto parameterized reporting in this four-step workflow to automate custom HTML and Word reports that are thoughtfully designed for audience interpretation and accessibility.
Soil health data are notoriously challenging to tidy and effectively communicate to farmers. We used functional programming with the tidyverse to reproducibly streamline data cleaning and summarization. To improve project outreach, we developed a Quarto project to dynamically create interactive HTML reports and printable PDFs. Customized for each farmer, the reports include project goals, measured parameter descriptions, summary statistics, maps, tables, and graphs.
Our case study presents a workflow for data preparation and parameterized reporting, with best practices for effective data visualization, interpretation, and accessibility.
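Quarto's `quarto render` command accepts per-run parameters with `-P name:value`. The sketch below only builds one render command per farmer and output format; the template name `soil_report.qmd`, the `farmer` parameter, and the output naming are assumptions, and the flags should be checked against `quarto render --help` for your Quarto version before running the commands with `subprocess.run`.

```python
import shlex


def render_commands(template, farmer_ids, formats=("html", "docx")):
    """Build one `quarto render` invocation per farmer and output format."""
    commands = []
    for farmer in farmer_ids:
        for fmt in formats:
            commands.append([
                "quarto", "render", template,
                "--to", fmt,
                "-P", f"farmer:{farmer}",             # value read by the report's params
                "--output", f"{farmer}-report.{fmt}",
            ])
    return commands


cmds = render_commands("soil_report.qmd", ["farm-001", "farm-002"])
for cmd in cmds:
    print(shlex.join(cmd))
# To actually render, pass each list to subprocess.run(cmd, check=True).
```

Keeping the command lists as data (rather than shell strings) makes it easy to test the batch before rendering dozens of reports.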
Talk materials: https://jadeyryan.com/talks/2023-09-25_posit_parameterized-quarto/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Elevating your reports. Session Code: TALK-1160
Package Management for Data Scientists - posit::conf(2023)
Presented by Tyler Finethy
In my talk, “Package Management for Data Scientists,” we will discuss software dependencies for R and Python and the common issues faced during package installations. I will begin with an overview of package management, highlighting its crucial role in data science. We’ll then focus on practical strategies to prevent dependency errors and to troubleshoot them effectively when they occur. Lastly, we will look towards the future, discussing potential package management improvements, focusing on reproducibility and accessibility for those new to the field.
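One practical guard against dependency errors is checking an environment against the versions a project was built with. The sketch below assumes a simple dict of exact pins and uses `importlib.metadata` from the Python standard library; real tools (pip, renv, and others) also handle version ranges and dependency resolution, which this ignores. The package versions in the usage example are made up.

```python
from importlib import metadata


def check_pins(pins, lookup=metadata.version):
    """Compare exact version pins (name -> version) with what is installed.

    Returns {name: problem}; an empty dict means the environment matches the pins.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"missing (want {wanted})"
            continue
        if found != wanted:
            problems[name] = f"found {found}, want {wanted}"
    return problems


# Fake installed set so the example is deterministic; omit `lookup` to check the real environment.
installed = {"pandas": "2.1.0", "numpy": "1.26.0"}


def fake_lookup(name):
    if name not in installed:
        raise metadata.PackageNotFoundError(name)
    return installed[name]


problems = check_pins({"pandas": "2.1.0", "numpy": "1.25.2", "scipy": "1.11.2"}, lookup=fake_lookup)
print(problems)
```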
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Managing packages. Session Code: TALK-1081
Open Source Property Assessment: Tidymodels to Allocate $16B in Property Taxes - posit::conf(2023)
Presented by Nicole Jardine and Dan Snow
How the Cook County Assessor’s Office uses R and tidymodels for its residential property valuation models.
The Cook County Assessor’s Office (CCAO) determines the current market value of properties for the purpose of property taxation. Since 2020, the CCAO has used R, tidymodels, and LightGBM to build predictive models that value Cook County’s 1.5 million residential properties, which are collectively worth over $400B. These predictive models are open-source, easily replicable, and have significantly improved valuation accuracy and equity over time.
Join CCAO Chief Data Officer Nicole Jardine and Director of Data Science Dan Snow as they walk through the CCAO’s modeling process, share lessons learned, and offer a sneak peek at changes planned for the 2024 reassessment of Chicago.
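The abstract does not name its accuracy and equity metrics. Two standard assessment-industry measures, the IAAO's coefficient of dispersion (COD) and price-related differential (PRD), are sketched below as an illustration; treat their use here as an assumption rather than a description of the CCAO's pipeline, and note that the toy numbers are invented.

```python
from statistics import mean, median


def ratio_stats(assessed, sale_prices):
    """Assessment-ratio statistics in the style of IAAO standards.

    COD: mean absolute deviation of ratios from the median ratio, as a percent
         of the median; lower means more uniform assessments.
    PRD: mean ratio divided by the sale-weighted mean ratio; values above 1
         suggest lower-priced properties are assessed at higher ratios.
    """
    ratios = [a / s for a, s in zip(assessed, sale_prices)]
    med = median(ratios)
    cod = 100 * mean(abs(r - med) for r in ratios) / med
    prd = mean(ratios) / (sum(assessed) / sum(sale_prices))
    return {"median_ratio": med, "cod": cod, "prd": prd}


# Invented example: the cheapest home is over-assessed, the priciest under-assessed.
stats = ratio_stats(assessed=[110, 200, 270], sale_prices=[100, 200, 300])
print({k: round(v, 3) for k, v in stats.items()})
```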
Materials: https://github.com/ccao-data
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: End-to-end data science with real-world impact. Session Code: TALK-1147
Never again in outer par mode: making next-generation PDFs with Quarto & typst - posit::conf(2023)
Presented by Carlos Scheidegger
Quarto 1.4 will introduce support for Typst. Typst is a brand-new open-source typesetting system built from scratch to support the lessons we have learned over almost half a century of high-quality computer typesetting that TeX and LaTeX have enabled. If you’ve ever had to produce a PDF with Quarto and got stuck handling an inscrutable error message from LaTeX, or wanted to create a new template but were too intimidated by LaTeX’s arcane syntax, this talk is for you. I’ll show you why we need an alternative to TeX and LaTeX, and why it will make Quarto even better.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Quarto (2). Session Code: TALK-1142

Motley Crews: Collaborating with Quarto - posit::conf(2023)
Presented by Susan McMillan, Wyl Schuth, and Michael Zenz
Adoption of Quarto for document creation has transformed the collaborative workflow for our small higher-education analytics team. Historically, content experts wrote in Word documents and data analysts used R for statistics and graphics. Specialization in different software tools created challenges for producing collaborative analytic reports, but Quarto has solved this problem. We will describe how we use Quarto for writing and editing text, embedding statistical analysis and graphics, and producing reports with a standard style in multiple formats, including web pages.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Elevating your reports. Session Code: TALK-1157
Matching Tools to Titans: Tailoring Posit Workbench for Every Cloud - posit::conf(2023)
Presented by James Blair
In an era of diverse cloud platforms, leveraging tools effectively is paramount. This talk highlights the adaptability of Posit Workbench within leading cloud platforms. Delve into strategic integrations, understand key challenges, and uncover practical solutions. By the end, attendees will be equipped with insights to harness Posit Workbench’s capabilities seamlessly across varied cloud environments.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Data science infrastructure for your org. Session Code: TALK-1115
Making a (Python) Web App is easy! - posit::conf(2023)
Presented by Marcos Huerta
Making Python Web apps using Dash, Streamlit, and Shiny for Python
By creating and deploying an interactive web application, you can share your data, code, and ideas more easily with a broad audience. I plan to talk about several Python web application frameworks, and how you can use them to turn a class, function, or data set visualization into an interactive web page to share with the world. I plan to discuss building simple web applications with Plotly Dash, Streamlit, and Shiny for Python.
Materials:
- Comprehensive talk notes here: https://marcoshuerta.com/posts/positconf2023/
- https://www.tidymodels.org/learn/models/conformal-regression/
- https://probably.tidymodels.org/reference/index.html#regression-predictions
Corrections: In my live remarks, I said a Dash callback can have only one output; that is not correct, as a Dash callback can update multiple outputs. I was trying to say that a Dash output can only be updated by one callback, but even that is no longer true as of Dash 2.9. https://dash.plotly.com/duplicate-callback-outputs
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: The future is Shiny. Session Code: TALK-1086
Magic with WebAssembly and webR - posit::conf(2023)
Presented by George Stagg
Earlier this year the initial version of webR was released and users have begun building new interactive experiences with R on the web. In this talk, I’ll discuss webR’s TypeScript library and what it is able to do. The library allows users to interact with the R environment directly from JavaScript, which enables manipulation tricks that seem like magic. I’ll begin by describing how to move objects from R to JS and back again, and discuss the technology that makes this possible. I’ll continue with more advanced manipulation, such as invoking R functions from JS and talk about why you might want to do so. Finally, I’ll describe how messages are sent over webR’s communication channel and explain how this enables webR to work with Shinylive.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: I can’t believe it’s not magic: new tools for data science. Session Code: TALK-1152

Large Language Models in RStudio - posit::conf(2023)
Presented by James Wade
Large language models (LLMs), such as ChatGPT, have shown the potential to transform how we code. As an R package developer, I have contributed to the creation of two packages – gptstudio and gpttools – specifically designed to incorporate LLMs into R workflows within the RStudio environment.
The integration of ChatGPT allows users to efficiently add code comments, debug scripts, and address complex coding challenges directly from RStudio. With text embedding and semantic search, we can teach ChatGPT new tricks, resulting in more precise and context-aware responses. This talk will delve into hands-on examples to showcase the practical application of these models, as well as offer my perspective as a recent entrant into public package development.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: I can’t believe it’s not magic: new tools for data science. Session Code: TALK-1154
It’s All About Perspective: Making a Case for Generative Art - posit::conf(2023)
Presented by Meghan Santiago Harris
This talk explores how to create art in the R language while highlighting some similarities between the skills required for creating generative art and those needed to perform data science tasks in R.
Because the field of data science is inherently task-oriented, it is no wonder that most people struggle to see the utility of generative art past the bounds of a casual hobby. This talk will invite the participant to learn about generative art while focusing on “why” people should create it and its potential place in data science. This talk is suitable for all disciplines and artistic abilities. Furthermore, this talk will aim to expand the participant’s perspective on generative art with the following concepts:
- What is generative art and how can it be created in R or Python
- Justifications for generative art within Data Science
- Examples of programming skills that are transferrable between generative art and pragmatic data science projects
Materials:
- Link to the talk repo: https://github.com/Meghansaha/a_case_for_genart
- Link to the slides: https://meghansaha.github.io/a_case_for_genart/#/title-slide
- Link to the artpack package site: https://meghansaha.github.io/artpack/
- Personal Site: https://thetidytrekker.com/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Developing your skillset; building your career. Session Code: TALK-1109
HTML and CSS for R Users - posit::conf(2023)
Presented by Albert Rapp
You can get the most out of popular R tools by combining them with easy-to-learn HTML & CSS commands.
It’s easy to think that R users do not need HTML and CSS. After all, R is a language designed for data analysis, right? But the reality is that these web standards are everywhere, even in R. In fact, many great tools like {ggtext}, {gt}, {shiny}, and Quarto unlock their full potential when you know a little bit of HTML & CSS. In this talk, I will demonstrate specific examples where R users can benefit from HTML and CSS and show you how to get started with these two languages.
Materials:
- Here’s the link to the video that I mention in the talk: https://youtu.be/QU8wSya-Y9E?si=zw59OSFPl1eJSY7M
- Part 1 of this two-part series can be found at https://www.youtube.com/watch?v=jX4_Dnzhl0M
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Compelling design for apps and reports. Session Code: TALK-1105
How You Get Value as a 1-Person Posit Connect Team - posit::conf(2023)
Presented by Sean Nguyen
Sean, a sole Posit Connect developer, shares his experience in delivering business impact. He narrates his transition from crafting one-off reports to developing and deploying robust data science web applications using Python and R with Posit Connect. Although Posit Connect is commonly associated with large enterprise teams, Sean demonstrates how it can be used effectively in smaller settings. He presents his work on creating and deploying end-to-end machine learning pipelines in Python, hosting them as APIs, and integrating them with Shiny apps via Posit Connect. This talk imparts practical strategies and techniques to foster user and executive adoption of Posit Connect within lean (and large) organizations.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Getting %$!@ done: productive workflows for data science. Session Code: TALK-1093
How to Help Developers Make Apps Users Love - posit::conf(2023)
Presented by Michał Parkoła
There are many resources that can help you design better apps.
But what if your org creates many apps?
Scaling good design to larger groups dials the challenge up to 11.
In this talk, I will share how we approach the problem at Appsilon.
- I will present our in-house Design Guide.
- I will share the successes and failures we’ve had building it and helping a wide variety of developers use it.
- I will then share some tips about what you might want to consider if you want to help your org build better apps at scale.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Shiny user interfaces. Session Code: TALK-1127
How I Learned to Stop Worrying and Love Public Packages - posit::conf(2023)
Presented by Joe Roberts
The popularity of R and Python for data science is in no small part attributable to the vast collection of extension packages available for everything from common tasks like data cleaning to highly-specialized domain-specific functions. However, with that ease of sharing packages comes a larger target for bad actors trying to exploit them. We’ll explore these security risks along with approaches you can take to mitigate them using Posit Package Manager.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Managing packages. Session Code: TALK-1079
How Data Scientists Broke A/B Testing (And How We Can Fix It) - posit::conf(2023)
Presented by Carl Vogel
As data scientists, we care about making valid statistical inferences from experiments. And we’ve adapted well-established and well-understood statistical methods to help us do so in our A/B tests. Our stakeholders, though, care about making good product decisions efficiently. I’ll describe how the way we design A/B tests can put these goals in tension and why that often causes misalignment between how A/B tests are intended to be used and how they are actually used. I’ll also talk about how I’ve used R to implement alternative experimental approaches that have helped bridge the gap between data scientists and stakeholders.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Bridging the gap between data scientists and decision makers. Session Code: TALK-1076
Hitting the Target(s) of Data Orchestration - posit::conf(2023)
Presented by Alexandros Kouretsis
We are living at a time when the size of datasets can be overwhelming. Add to this that processing them involves linking together different computing systems and software and integrating dynamically changing reference data, and you certainly have a problem. Reproducibility, traceability, and transparency have left the building.
Here is where Posit Connect, along with the vast R ecosystem, comes to save the day, allowing the creation of reproducible pipelines. I will share my first-hand experience in this presentation: in particular, how we used {targets} in Posit Connect combined with AWS technologies in a bioinformatics pipeline. The result? An effective and secure workflow orchestration that is scalable and advances knowledge.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Leave it to the robots: automating your work. Session Code: TALK-1148
Grammar of Graphics in Python with Plotnine - posit::conf(2023)
Presented by Hassan Kibirige
{plotnine} brings the elegance of {ggplot2} to the Python programming language. Learn about the Grammar of Graphics and get a feel for why it is an effective way to create statistical graphics.
ggplot2 is one of the most loved visualisation libraries. It implements a Grammar of Graphics system, which requires one to think about data in terms of columns of variables and how to transform them into geometric objects. It is elegant and powerful. This is a talk about plotnine, which brings the elegance of ggplot2 to the Python programming language. It is an invitation to learn about the Grammar of Graphics system and to appreciate it. It will include some tips on how to avoid common frustrations as you learn the system.
Materials:
- Website: https://plotnine.org
- Source Code: https://github.com/has2k1/plotnine
- Slides for this talk: https://github.com/has2k1/my-talks
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Data science with Python. Session Code: TALK-1137

Github Copilot integration with RStudio, it’s finally here! - posit::conf(2023)
Presented by Tom Mock
This talk closes issue #10148, “GitHub Copilot integration with RStudio”, the most upvoted feature request in RStudio’s history. Code-generating AI tools like GitHub Copilot promise an “AI pair programmer that offers autocomplete-style suggestions as you code”. For the first time, we’ll show a native integration of Copilot into RStudio, helping to build on that promise by providing AI-generated “ghost text” autocompletion with R and other languages. I’ll also compare Copilot’s “ghost text” to a chat-style interface in RStudio via the {chattr} package from the Posit MLVerse team.
To make the most of these new features, I’ll walk through some examples of how sharing additional context, comments, code, and other “prompt engineering” can help turn code-generating AI tools from an annoying backseat driver into an experienced copilot. We’ll close with a robust end-to-end example of how these new RStudio integrations and packages can help you be a more productive developer.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Data science infrastructure for your org. Session Code: TALK-1117
Getting the Most Out of Git - posit::conf(2023)
Presented by Colin Gillespie
Did you believe that Git would solve all of your data science worries? Instead, you’ve been plunged HEAD~1 first into merging (or is that rebasing?) chaos. Issues are ignored, branches are everywhere, main never works, and no one really knows who owns the repository.
Don’t worry! There are ways to escape this pit of despair. Over the last few years, we’ve worked with many data science teams. During this time, we’ve spotted common patterns and also common pitfalls. While one size does not fit all, there are golden rules that should be followed. At the end of this talk, you’ll understand the processes other data science teams implement to make Git work for them.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Getting %$!@ done: productive workflows for data science. Session Code: TALK-1091
Thumbnail from happygitwithr.com, still from Heaven King video
From Data Confusion to Data Intelligence - posit::conf(2023)
Presented by Elaine McVey and David Meza
Data science teams operate in a unique environment, much different from the IT or software development life cycle. Executives’ hopes for the impact of data science are extremely high! Understanding of how to make data science efforts successful is very low! This creates an interesting set of organizational challenges for data and analytics teams. These challenges are particularly clear when data science is being introduced at new companies, but they play out at organizations of all sizes. So, how do we navigate this dynamic? We will share some strategies for success.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: From Data Confusion to Data Intelligence. Session Code: KEY-1060
From Concept to Impact: Building and Launching Shiny Apps in the Workplace - posit::conf(2023)
Presented by Tiger Tang
Learn to build and launch a Shiny app like you are working on a start-up!
Unlock the potential of Shiny apps for your organization! Join Tiger as he shares insights from implementing Shiny apps at his workplace, handling over 160,000 internal requests. Discover a practical mindmap to find, build, and enhance Shiny app use cases, ensuring robustness and improved user engagement.
Materials: https://tigertang.org/posit_conf_2023/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Bridging the gap between data scientists and decision makers. Session Code: TALK-1074
FOCAL Point: Utilizing Python, R, and Shiny to Capture, Process, and Visualize Motion - posit::conf
Presented by Justin Markel & Alyssa Burritt
One of the fastest movements in modern sports is a golf swing. Capturing this motion using a high-speed camera system creates many unique challenges in processing, analyzing, and visualizing the thousands of data points that are generated. These spatial coordinates can be quickly translated through Python scripts to well-known, industry-specific performance metrics and graphics in Shiny. Down the line, R utilities aid more complicated analyses and optimizations, driving new product innovations.
This talk will cover our company’s process of implementing these tools into our workflow and highlight key program features that have helped successfully combine these applications for users with a variety of technical backgrounds.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: R or Python? Why not both!. Session Code: TALK-1120
Extending Quarto - posit::conf(2023)
Presented by Richard Iannone
What are Quarto shortcode extensions? Think of them as powerful little programs you can run in your Quarto docs. I won’t show you how to build a shortcode extension during this talk; rather, I’m going to take you on a trip across the new ecosystem of shortcode extensions that people have already written. For example, I’ll introduce you to the fancy-text extension for outputting nicely formatted versions of fancy strings such as LaTeX and BibTeX, and you’ll learn all about the fontawesome, lordicon, academicons, material-icons, and bsicons shortcode extensions that let you add all sorts of icons. This is only a sampling of the shortcode extensions I will present; there will be many other inspiring examples as well.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Quarto (2). Session Code: TALK-1141
epoxy: Super Glue for Data-driven Reports and Shiny Apps - posit::conf(2023)
Presented by Garrick Aden-Buie
R Markdown, Quarto, and Shiny are powerful frameworks that allow authors to create data-driven reports and apps. But truly excellent reports require a lot of work in the final steps to get numerical and stylistic formatting just right.
{epoxy} is a new package that uses {glue} to give authors templating superpowers. Epoxy works in R Markdown and Quarto, in markdown, LaTeX, and HTML outputs. It also provides easy templating for Shiny apps for dynamic data-driven reporting.
Beyond epoxy’s features, this talk will also touch on tips and approaches for data-driven reporting that will be useful to a wide audience, from R Markdown experts to the Quarto and Shiny curious.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Elevating your reports. Session Code: TALK-1155

duckplyr: Tight Integration of duckdb with R and the tidyverse - posit::conf(2023)
Presented by Kirill Müller
The duckplyr R package combines the convenience of dplyr with the performance of DuckDB. Better than dbplyr: Data frame in, data frame out, fully compatible with dplyr.
duckdb is a new high-performance analytical database system that works great with R, Python, and other host systems. dplyr is the grammar of data manipulation in the tidyverse, tightly integrated with R, but it works best for small or medium-sized data. duckdb, by contrast, has been designed with large data in mind, but currently you need to formulate your queries in SQL.
The new duckplyr package offers the best of both worlds. It transforms a dplyr pipe into a query object that duckdb can execute, using an optimized query plan. It is better than dbplyr because the interface is “data frames in, data frames out”, and no intermediate SQL code is generated.
The talk first presents our results, a bit of the mechanics, and an outlook for this ambitious project.
Materials: https://github.com/duckdblabs/duckplyr/
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1100
Diversify Your Career with Shiny for Python - posit::conf(2023)
Presented by Gordon Shotwell
A few years ago my company made a sudden shift from R to Python which was quite bad for my career because I didn’t really know Python. The main issue was that I couldn’t find a niche that allowed me to use my existing knowledge while learning the new language.
Shiny for Python is a great niche for R users because none of the Python web frameworks can do what Shiny can do. Additionally, almost all of your knowledge of the R package is applicable to the Python one.
This talk will provide an overview of the Python web application landscape and articulate what Shiny adds to this landscape, and then go through the five things that R users need to know before developing their first Shiny for Python application.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Data science with Python. Session Code: TALK-1138
Data Visualization with Seaborn - posit::conf(2023)
Presented by Michael Waskom
Seaborn is a Python library for statistical data visualization. After nearly a decade of development, seaborn recently introduced an entirely new API that is more explicitly based on a formal grammar of graphics. My talk will introduce this API and contrast it with the classic seaborn interface, sharing insights about the influence of the grammar of graphics on the ergonomics and maintainability of data visualization software.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Data science with Python. Session Code: TALK-1136
Data Science in Production: The Way to a Centralized Infrastructure - posit::conf(2023)
Presented by Oliver Bracht
In this talk, the success story of Covestro’s Posit infrastructure is presented. The problem facing the leading German materials manufacturer was that it had no common development environment. With the help of eoda and Posit, a replicable, centralized development environment for R and Python was created. Although R and Python form the core of the infrastructure, multiple languages and tools are unified. Besides improving collaboration among Covestro’s data science teams, the environment made it easier to meet compliance guidelines. The staging architecture provides developers with a concept for testing and going live with their products. Using Covestro as an example, the project presents a best-practice approach to data science infrastructure.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Data science infrastructure for your org. Session Code: TALK-1113
CRAN-ial Expansion: Taking Your R Package Development to New Frontiers with R-Universe - posit::conf
Presented by Mo Athanasia Mowinckel
Say goodbye to installation headaches and hello to a universe of possibilities with R-Universe! Take your R package development to new frontiers by organizing and sharing packages beyond the bounds of CRAN. R-Universe’s reliable package-building process strengthens installation and usage instructions, resulting in fewer support requests and an easy installation experience for users. With webpages and an API for exploring packages, R-Universe creates a streamlined and tidy ecosystem for R-package constellations. Also, you can build a custom toolchain for your users, relieving your workload and empowering users to help themselves. Join me to learn how to explore the vastness of R-Universe and expand your package development possibilities!
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Managing packages. Session Code: TALK-1080
Connect on Kubernetes: Content-level Containerization - posit::conf(2023)
Presented by E. David Aja, not Kelly O’Briant
Running Connect with off-host content execution in Kubernetes is very cool and allows you to enable some powerful and sophisticated workflows. The question is, do you really need it? How do you evaluate and decide? Let’s have a candid conversation about whether Connect content execution on Kubernetes is right for you and your organization.
Moving to Kubernetes will introduce complexity, so it’s important to have a strong motivating reason for making the switch. This talk will introduce new Connect features that are made possible by content-level containerization.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Data science infrastructure for your org. Session Code: TALK-1116
Conformal Inference with Tidymodels - posit::conf(2023)
Presented by Max Kuhn
Conformal inference theory enables any model to produce probabilistic predictions, such as prediction intervals. We’ll demonstrate how these analytical methods can be used with tidymodels. Simulations will show that the results have good coverage (i.e., a 90% interval should contain the true value 90% of the time).
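For readers curious about the mechanics, here is a minimal split-conformal sketch in plain Python. The talk itself uses R and tidymodels; this is not that implementation. It assumes a toy least-squares line as the “any model”, hypothetical synthetic data, and the absolute residual on a held-out calibration set as the conformity score. The quantile uses the usual ceil((n + 1)(1 - alpha)) finite-sample correction, which is what gives the “90% interval covers about 90% of new points” guarantee under exchangeability.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; stands in for 'any model'."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    This finite-sample correction gives marginal coverage of at least
    1 - alpha when calibration and test points are exchangeable.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: interval is unbounded
    return sorted(scores)[k - 1]


def split_conformal(x_train, y_train, x_cal, y_cal, alpha=0.1):
    """Fit on the training split, calibrate on a separate held-out split."""
    model = fit_line(x_train, y_train)
    # Conformity score: absolute residual on data the model never saw.
    scores = [abs(y - model(x)) for x, y in zip(x_cal, y_cal)]
    q = conformal_quantile(scores, alpha)
    return lambda x: (model(x) - q, model(x) + q)


def coverage(interval, xs, ys):
    """Fraction of points whose true y falls inside its interval."""
    hits = 0
    for x, y in zip(xs, ys):
        lower, upper = interval(x)
        hits += lower <= y <= upper
    return hits / len(xs)


def simulate(n, seed=1):
    """Hypothetical synthetic data: y = 2 + 3x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(1500)
    interval = split_conformal(xs[:500], ys[:500], xs[500:1000], ys[500:1000])
    print(f"90% interval coverage on new data: {coverage(interval, xs[1000:], ys[1000:]):.3f}")
```

Running the script prints the empirical coverage on the held-out third of the data, which should land near 0.90. Replacing `fit_line` with any other regressor leaves the calibration step untouched, which is the sense in which the method works with “any model”.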
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Tidy up your models. Session Code: TALK-1085

Can I Have a Word? - posit::conf(2023)
Presented by Ellis Hughes
Since its release, {gt} has won over the hearts of many due to its flexible and powerful table-generating abilities. However, in cases where office products were required by downstream users, {gt}’s potential remained untapped. That all changed in 2022 when Rich Iannone and I collaborated to add Word documents as an official output type. Now, data scientists can engage stakeholders directly, wherever they are.
Join me for an upcoming talk where I’ll share my excitement about the new opportunities this update presents for the R community as well as future developments we can look forward to.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Elevating your reports. Session Code: TALK-1156

Becoming an R Package Author (or How I Got Rich Responding to GitHub Issues) - posit::conf(2023)
Presented by Matt Herman
The transition from analyzing data in R to making packages in R can feel like a big step. Writing code to clean data or make visualizations seems categorically different from building robust “software” on which other people rely.
In this talk, I’ll show why that distinction is not necessarily true by discussing my personal experience from learning R in graduate school to reporting bugs on GitHub to becoming a co-author of the tidycensus package and a practicing data scientist. The positive and supportive R community on GitHub, Twitter, and elsewhere contributes to why anyone who writes R code can become a package author.
- I have not actually gotten rich but I did get freelance data work based on my package contributions!
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Package development. Session Code: TALK-1133
Adding a Touch of glitr: Developing a Package of Themes on Top of ggplot - posit::conf(2023)
Presented by Aaron Chafetz and Karishma Srikanth. Please note: a power issue cut off the first five minutes of the talk.
Explore how our team at the US Agency for International Development (USAID) created our own data viz branding R package on top of ggplot2 and how you can too.
How do you create brand cohesion across your large team when it comes to data viz? Inspired by the BBC’s bbplot, our team at the US Agency for International Development (USAID) developed a package on top of ggplot2 to create a common look and feel for our team’s products. This effort improved not just the cohesiveness of our work but also its trustworthiness. By creating this package, we reduced our reliance on defaults and the time spent on each project customizing numerous graphic elements. More importantly, this package provided an easier on-ramp for new teammates to adopt R. We share our journey developing a style guide within a federal agency and aim to guide and inspire other organizations that could benefit from developing their own branding package and guidance.
Materials:
- https://speakerdeck.com/achafetz/adding-a-touch-of-glitr
- https://usaid-oha-si.github.io/glitr/
- https://issuu.com/achafetz/docs/oha_styleguide
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Compelling design for apps and reports. Session Code: TALK-1103
A hacker’s guide to open source LLMs - posit::conf(2023)
Presented by Jeremy Howard
In this deeply informative video, Jeremy Howard, co-founder of fast.ai and creator of the ULMFiT approach on which all modern language models (LMs) are based, takes you on a comprehensive journey through the fascinating landscape of LMs. Starting with the foundational concepts, Jeremy introduces the architecture and mechanics that make these AI systems tick. He then delves into critical evaluations of GPT-4, illuminates practical uses of language models in code writing and data analysis, and offers hands-on tips for working with the OpenAI API. The video also provides expert guidance on technical topics such as fine-tuning, decoding tokens, and running private instances of GPT models.
As we move further into the intricacies, Jeremy unpacks advanced strategies for model testing and optimization, utilizing tools like GPTQ and Hugging Face Transformers. He also explores the potential of specialized datasets like Orca and Platypus for fine-tuning and discusses cutting-edge trends in Retrieval Augmented Generation and information retrieval. Whether you’re new to the field or an established professional, this presentation offers a wealth of insights to help you navigate the ever-evolving world of language models.
(The above summary was, of course, created by an LLM!)
For the notebook used in this talk, see https://github.com/fastai/lm-hackers .
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Notebooks+LLMs may just be the future of coding. Session Code: KEY-1107
How the R for Data Science (R4DS) Online Learning Community Made Me a Better Student - posit::conf
Presented by Lydia Gibson
Through my participation in the R4DS Online Learning Community, I have advanced my R and data science skills, making me a better student than I otherwise would have been through my studies alone. As a non-traditional MS Statistics student with an undergraduate background in economics, I had absolutely no experience with the R programming language before pursuing my Master’s degree. In July 2021, hoping to get a head start on learning R before beginning my degree program, I joined the R4DS Slack Workspace. Along with helping to improve my programming skills, R4DS has connected me with scholarships, mentorship, and other opportunities, and I think it would benefit other students to know about this great resource.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Developing your skillset; building your career. Session Code: TALK-1110