yardstick
Tidy methods for measuring model performance
yardstick is an R package for estimating model performance using tidy data principles. It provides a consistent, dplyr-like syntax for computing performance metrics for classification and regression models.
The package supports both binary and multiclass classification metrics, with multiple multiclass estimation methods (macro, macro-weighted, and micro averaging, plus the Hand-Till method for ROC AUC). It works seamlessly with grouped data frames for calculating metrics across resamples, and includes autoplot() methods for visualizing performance curves such as ROC, precision-recall, and gain curves. All metrics return results in a consistent tibble format that integrates naturally with tidymodels workflows.
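As a brief sketch of that interface (using `two_class_example`, the small demo data frame that ships with yardstick), metric functions take a data frame plus tidy-evaluation column names and return a one-row tibble per metric:

```r
library(yardstick)

# two_class_example ships with yardstick: `truth` (factor), `Class1`/`Class2`
# (predicted class probabilities), and `predicted` (factor of hard predictions)
data(two_class_example)

# A class metric: returns a tibble with .metric, .estimator, and .estimate
accuracy(two_class_example, truth = truth, estimate = predicted)

# A class probability metric: pass the probability column for the event level
roc_auc(two_class_example, truth = truth, Class1)

# Bundle several metrics into one callable with metric_set()
cls_metrics <- metric_set(accuracy, kap, roc_auc)
cls_metrics(two_class_example, truth = truth, Class1, estimate = predicted)
```

Because every metric returns the same tibble shape, a `metric_set()` can be applied to a grouped data frame (e.g. grouped by resample) to get one row per metric per group.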
Contributors
Resources featuring yardstick
Precision Medicine for All: Using Tidymodels to Validate Breast Cancer PRS in Brazil (Flávia Rius) | posit::conf
Speaker(s): Flávia E. Rius
Abstract:
Polygenic risk scores (PRS) are a powerful way to measure someone’s risk for common diseases, such as diabetes, cardiovascular disease, and cancer. However, most PRS are developed using data from European populations, making it challenging to generalize results to other ancestries. In this talk, I’ll show how I used tidymodels tools, like yardstick, recipes, and workflows, to calculate metrics and validate a breast cancer PRS in the highly admixed Brazilian population. You will learn how to leverage tidymodels to make precision medicine more inclusive.
posit::conf(2025)
posit::conf(2023) Workshop: Introduction to tidymodels
Register now: http://pos.it/conf
Instructors: Hannah Frick, Simon Couch, Emil Hvitfeldt
Duration: 1-day workshop
This workshop is for you if you:
• have intermediate R knowledge, experience with tidyverse packages, and familiarity with either of R’s pipes (%>% or |>)
• can read data into R, transform and reshape data, and make a wide variety of graphs
• have had some exposure to basic statistical concepts such as linear models, random forests, etc.
Intermediate or expert familiarity with modeling or machine learning is not required.
This workshop will teach you core tidymodels packages and their uses: data splitting/resampling with rsample, model fitting with parsnip, measuring model performance with yardstick, and basic pre-processing with recipes. Time permitting, you’ll be introduced to model optimization using the tune package. You’ll learn tidymodels syntax as well as the process of predictive modeling for tabular data.

MLOps with vetiver in Python and R | Led by Julia Silge & Isabel Zimmerman
Many data scientists understand what goes into training a machine learning model, but creating a strategy to deploy and maintain that model can be daunting. In this meetup, learn what MLOps is, what principles can be used to create a practical MLOps strategy, and what kinds of tasks and components are involved. See how to get started with vetiver, a framework for MLOps tasks in R and Python that provides fluent tooling to version, deploy, and monitor your models.
Blog Post with Q&A: https://www.rstudio.com/blog/vetiver-answering-your-questions/
For folks interested in seeing what data artifacts look like on Connect, we have these for R:
⬢ Versioned model object: https://colorado.rstudio.com/rsc/seattle-housing-pin/
⬢ Deployed API: https://colorado.rstudio.com/rsc/seattle-housing/
⬢ Monitoring dashboard: https://colorado.rstudio.com/rsc/seattle-housing-dashboard/
⬢ Create a custom yardstick metric: https://juliasilge.com/blog/nyc-airbnb/
⬢ Endpoint used in the demo: https://colorado.rstudio.com/rsc/scooby
Our team’s reading list (mentioned in the meetup)
Books:
⬢ Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
Articles:
⬢ “Machine Learning Operations (MLOps): Overview, Definition, and Architecture” by Kreuzberger et al.: https://arxiv.org/abs/2205.02302
⬢ “From Concept Drift to Model Degradation: An Overview on Performance-Aware Drift Detectors” by Bayram et al.: https://arxiv.org/abs/2203.11070
⬢ “Towards Observability for Production Machine Learning Pipelines” by Shankar et al.: https://arxiv.org/pdf/2108.13557.pdf
⬢ “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction” by Breck et al.: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
Web content:
⬢ How ML Breaks: A Decade of Outages for One Large ML Pipeline by Papasian and Underwood: https://www.youtube.com/watch?v=hBMHohkRgAA
⬢ MLOps Principles by INNOQ: https://ml-ops.org/content/mlops-principles
⬢ Google’s Practitioners Guide to MLOps by Salama et al.: https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf
⬢ Gently Down the Stream by Mitch Seymour: https://www.gentlydownthe.stream/
Speaker bios: Julia Silge is a software engineer at RStudio focusing on open source MLOps tools, as well as an author and international keynote speaker. Julia loves making beautiful charts, Jane Austen, and her two cats.
Isabel Zimmerman is also a software engineer on the open source team at RStudio, where she works on building MLOps frameworks. When she’s not geeking out over new data science techniques, she can be found hanging out with her dog or watching Marvel movies.