Max Kuhn
Principal Software Engineer
Max has been improving R’s modeling capabilities and maintaining about 30 packages, including caret. At Posit, Max primarily works on modeling and data analysis APIs.
He was a Senior Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He has applied models in the pharmaceutical and molecular diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics.
He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the Ziegel award from the American Statistical Association, recognizing the best book reviewed in Technometrics in 2015.
Software by Max Kuhn
Events attended by Max Kuhn
Posts and resources by Max Kuhn
Max Kuhn - TabPFN: A Deep-Learning Solution for Tabular Data
TabPFN: A Deep-Learning Solution for Tabular Data (Max Kuhn)
Abstract: There have been numerous proposals for deep neural networks for tabular data, i.e., rectangular data sets (e.g., data frames). To date, none have worked well, and they take far too long to train. TabPFN is a model that emulates a Bayesian approach and trains a deep learning model on a prior of simulated tabular datasets. Version 2 was released this year and offers several significant advantages, but also has one notable disadvantage. I’ll introduce this model and show an example.
Presented at the 2025 R/Pharma Conference Europe/US Track.
Resources mentioned in the presentation:
- Presentation slides: https://topepo.github.io/2025-r-pharma/

Max Kuhn - Measuring LLM Effectiveness
For information on upcoming conferences, visit https://www.dataconf.ai.
Measuring LLM Effectiveness by Max Kuhn
Abstract: How can we quantify how accurately LLMs perform? In late 2024, Anthropic released a preprint about statistically analyzing model evaluations. The concepts are on target, but the statistical tactics have narrow applicability. A simpler statistical framework can quantify LLM performance and applies to many more scenarios and experimental designs. We’ll describe these methods and show an example.
Bio: Max Kuhn is a software engineer at Posit PBC (née RStudio). He is working on improving R’s modeling capabilities and maintaining about 30 packages, including caret. He was a Senior Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He has been applying models in the pharmaceutical and diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics. He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the Ziegel award from the American Statistical Association, recognizing the best book reviewed in Technometrics in 2015. He has co-written several other books: Feature Engineering and Selection, Tidy Modeling with R, and Applied Machine Learning for Tabular Data (in progress).
Presented at The New York Data Science & AI Conference Presented by Lander Analytics (August 27, 2025)
Hosted by Lander Analytics (https://www.landeranalytics.com)

Max Kuhn - Evaluating Time-to-Event Models is Hard
Censoring occurs frequently in time-to-event data. For example, if we order a pizza that has not yet arrived after 5 minutes, it is censored; we don’t know the final delivery time, but we know it is at least 5 minutes. Censored values can appear in clinical trials, customer churn analysis, pet adoption statistics, or anywhere a duration of time is used. I’ll describe different ways to assess models for censored data and focus on metrics requiring an evaluation time (i.e., how well does the model work at 5 minutes?). I’ll also describe how you can use tidymodels’ expanded features for these data to tell if your model fits the data well. This talk is designed to be paired with the other tidymodels talk by Hannah Frick.
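To make the idea of an evaluation time concrete, here is a minimal Python sketch (not tidymodels code and not taken from the talk). The `Observation` class and the `status_at` function are hypothetical names introduced for illustration. The sketch assumes the usual convention that a censored observation tells us nothing once the evaluation time passes its censoring time, and the handling of ties at exactly the evaluation time is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One pizza order (hypothetical structure for illustration)."""
    time: float     # minutes until delivery, or until we stopped watching
    observed: bool  # True if the delivery was seen, False if censored


def status_at(obs: Observation, eval_time: float) -> str:
    """Classify what is known about an observation at a given evaluation time."""
    if obs.observed and obs.time <= eval_time:
        return "event"         # the pizza had arrived by eval_time
    if obs.time > eval_time:
        return "no event yet"  # still waiting at eval_time, censored later or not
    return "unknown"           # censored before eval_time: outcome not known


orders = [
    Observation(time=20.0, observed=True),  # delivered after 20 minutes
    Observation(time=5.0, observed=False),  # still waiting when we stopped at 5 minutes
]

for eval_time in (3.0, 10.0, 25.0):
    print(eval_time, [status_at(order, eval_time) for order in orders])
```

Under these assumptions, the censored order still contributes information at evaluation times before 5 minutes, but a metric computed at later times cannot score it directly.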
Talk by Max Kuhn
Slides: https://topepo.github.io/2024-posit-conf/
GitHub Repo: https://github.com/topepo/2024-posit-conf


Bryan Shalloway - Understanding, Generating, and Evaluating Prediction Intervals
For many problems concerning prediction, providing intervals is more useful than just offering point estimates. This talk will provide an overview of:
- How to think about uncertainty in your predictions (e.g., noise in the data vs uncertainty in estimation)
- Approaches to producing prediction intervals (e.g., parametric vs conformal)
- Measures and considerations when evaluating and training models for prediction intervals
While I will touch on some of the same topics as Max Kuhn’s posit::conf(2023) talk on conformal inference, my talk will cover different points and have a broader focus. I hope attendees gain an understanding of some of the key tools and concepts related to prediction intervals and that they leave inspired to learn more.
Talk by Bryan Shalloway
Slides: https://github.com/brshallo/posit-2024/blob/main/shalloway-posit-conf.pdf
GitHub Repo: https://github.com/brshallo/posit-2024

Max Kuhn - SHINYLIVE IS SO EASY
SHINYLIVE IS SO EASY by Max Kuhn
Visit https://rstats.ai for information on upcoming conferences.
Abstract: shinylive is an extension to the Quarto open-source scientific and technical publishing system. It uses WebAssembly to let Shiny applications run locally, without a Shiny server. I’ll show examples and discuss the limitations of using shinylive.
Twitter: https://twitter.com/topepos
Presented at the 2024 New York R Conference (May 17, 2024)
Hosted by Lander Analytics (https://landeranalytics.com)

Conformal Inference with Tidymodels - posit::conf(2023)
Presented by Max Kuhn
Conformal inference theory enables any model to produce probabilistic predictions, such as prediction intervals. We’ll demonstrate how these analytical methods can be used with tidymodels. Simulations will show that the results have good coverage (i.e., a 90% interval should include the real point 90% of the time).
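The coverage claim can be checked with a small simulation. The sketch below is plain Python, not the tidymodels implementation shown in the talk. It uses split conformal inference, one common variant, with a hand-rolled least-squares line as the model. The data-generating process, the function names, and the sample sizes are all assumptions made for illustration.

```python
import math
import random


def simulate(n, rng):
    """Draw (x, y) pairs from an assumed linear model with Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)) for x in xs]


def fit_line(data):
    """Ordinary least squares for a single predictor; returns a predict function."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def conformal_half_width(predict, calibration, alpha):
    """Split conformal: a finite-sample quantile of absolute calibration residuals."""
    scores = sorted(abs(y - predict(x)) for x, y in calibration)
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


rng = random.Random(2023)
predict = fit_line(simulate(500, rng))  # training set
half_width = conformal_half_width(predict, simulate(1000, rng), alpha=0.10)

# Coverage: the fraction of new points falling inside prediction +/- half_width.
test_set = simulate(5000, rng)
coverage = sum(abs(y - predict(x)) <= half_width for x, y in test_set) / len(test_set)
print(f"half-width: {half_width:.2f}, empirical coverage: {coverage:.3f}")
```

The interval here has a constant width because the score is an absolute residual; other score functions give intervals that adapt to the predictors.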
Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Tidy up your models. Session Code: TALK-1085

Max Kuhn - Serverless Quarto
Serverless Quarto - Max Kuhn
Resources mentioned in the presentation:
- Slides: https://topepo.github.io/2023-r-pharma
- Example: https://topepo.github.io/shinylive-in-book-test
Bio: Max Kuhn is a software engineer at Posit (née RStudio). He is working on improving R’s modeling capabilities and maintaining about 30 packages, including caret. He was a Senior Director of Nonclinical Statistics at Pfizer and has been applying models in the pharmaceutical and diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics. He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the Ziegel award from the American Statistical Association. Their second book, Feature Engineering and Selection, was published in 2019, and his book Tidy Modeling with R was published in 2022.
Presented at the 2023 R/Pharma Conference (October 25, 2023)

Max Kuhn - The Post-Modeling Model to Fix the Model
The Post-Modeling Model to Fix the Model by Max Kuhn
Visit https://rstats.ai/nyr to learn more.
Abstract: It’s possible to get a model that has good numerical performance but has predictions that are not really consistent with the data. Model calibration is a tool that can fix this. We’ll show some examples of poor predictions and how different calibration tools can re-align them to the data.
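As a rough illustration of the idea, the following Python sketch simulates a model whose scores rank the data perfectly but are poor probability estimates, then re-aligns them with histogram binning. This is one simple calibration method chosen for brevity; it is not necessarily one of the tools covered in the talk. The simulated scores (true probability cubed) and all function names are assumptions.

```python
import random


def simulate(n, rng):
    """Return (score, outcome) pairs; the score is the true probability cubed."""
    pairs = []
    for _ in range(n):
        p = rng.random()                  # true event probability
        y = 1 if rng.random() < p else 0  # observed outcome
        pairs.append((p ** 3, y))         # ranks preserved, probabilities too low
    return pairs


def bin_index(score, n_bins):
    """Equal-width bin for a score in [0, 1]."""
    return min(int(score * n_bins), n_bins - 1)


def fit_bins(calibration, n_bins=10):
    """Histogram binning: the observed event rate within each score bin."""
    totals = [0] * n_bins
    events = [0] * n_bins
    for score, y in calibration:
        b = bin_index(score, n_bins)
        totals[b] += 1
        events[b] += y
    # An empty bin falls back to its midpoint so every score maps somewhere.
    return [events[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def calibrate(score, bin_rates):
    """Replace a raw score with the event rate of its bin."""
    return bin_rates[bin_index(score, len(bin_rates))]


def brier(pairs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


rng = random.Random(714)
bin_rates = fit_bins(simulate(5000, rng))  # separate calibration set
test_set = simulate(5000, rng)

raw_brier = brier(test_set)
calibrated_brier = brier([(calibrate(s, bin_rates), y) for s, y in test_set])
print(f"Brier score, raw: {raw_brier:.3f}, calibrated: {calibrated_brier:.3f}")
```

The calibration mapping is estimated on data the model was not evaluated on, so the improvement in the Brier score is not an artifact of reusing the test set.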
Bio: Max Kuhn is a software engineer at RStudio. He is currently working on improving R’s modeling capabilities. He was a Senior Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He applied models in the pharmaceutical and diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics. Max is the author of numerous R packages for techniques in machine learning and reproducible research. He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the Ziegel award from the American Statistical Association, recognizing the best book reviewed in Technometrics in 2015. Their second book, Feature Engineering and Selection, was published in 2019.
Twitter: https://twitter.com/topepos
Presented at the 2023 New York R Conference (July 14, 2023)

posit::conf(2023) Workshop: Advanced tidymodels
Register now: http://pos.it/conf
Instructor: Max Kuhn, Software Engineer, Posit
Workshop Duration: 1-Day Workshop
This workshop is for you if you:
- have used tidymodels packages like recipes, rsample, and parsnip
- are comfortable with tidyverse syntax (e.g., piping, mutates, pivoting)
- have some experience with resampling and modeling (e.g., linear regression, random forests, etc.), but we don’t expect you to be an expert in these
In this workshop, you will learn more about model optimization using the tune and finetune packages, including racing and iterative methods. You’ll be able to do more sophisticated feature engineering with recipes. Time permitting, model ensembles via stacking will be introduced. This course is focused on the analysis of tabular data and does not include deep learning methods.
Participants who have completed the “Introduction to tidymodels” workshop will be well-prepared for this course. Participants who are new to tidymodels will benefit from taking the “Introduction to tidymodels” workshop before joining this one.

Max Kuhn | parsnip A tidy model interface | RStudio (2019)
parsnip is a new tidymodels package that generalizes model interfaces across packages. The idea is to have a single function interface for each specific type of model (e.g., logistic regression) that lets the user choose the computational engine for training. For example, logistic regression could be fit with several R packages, Spark, Stan, and TensorFlow. parsnip also standardizes the return objects and sets up some new features for some upcoming packages.
VIEW MATERIALS https://github.com/rstudio/rstudio-conf/tree/master/2019/Parsnip--Max_Kuhn
About the Author: Dr. Max Kuhn is a Software Engineer at RStudio. He is the author or maintainer of several R packages for predictive modeling, including caret, Cubist, C50, and others. He routinely teaches classes in predictive modeling at rstudio::conf, Predictive Analytics World, and useR!, and his publications include work on neuroscience biomarkers, drug discovery, molecular diagnostics, and response surface methodology. He and Kjell Johnson wrote the award-winning book Applied Predictive Modeling in 2013.





