<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Machine Learning on Posit Open Source</title>
    <link>https://posit-open-source.netlify.app/categories/machine-learning/</link>
    <description>Recent content in Machine Learning on Posit Open Source</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 12 Jan 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://posit-open-source.netlify.app/categories/machine-learning/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>orbital 0.4.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/orbital-0-4-0/</link>
      <pubDate>Mon, 12 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/orbital-0-4-0/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[
<p>We&rsquo;re over the moon to announce the release of <a href="https://orbital.tidymodels.org/" target="_blank" rel="noopener">orbital</a>
 0.4.0. orbital lets you predict in databases using tidymodels workflows.</p>
<p>You can install it from CRAN with:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"orbital"</span><span class='o'>)</span></span></code></pre>
</div>
<p>This blog post covers the highlights: post-processing support and the new <code>show_query()</code> method.</p>
<p>You can see a full list of changes in the <a href="https://orbital.tidymodels.org/news/index.html#orbital-040" target="_blank" rel="noopener">release notes</a>
.</p>
<h2 id="post-processing-support">Post-processing support
</h2>
<p>The biggest improvement in this version is that <a href="https://orbital.tidymodels.org/reference/orbital.html" target="_blank" rel="noopener"><code>orbital()</code></a>
 now works with supported <a href="https://tailor.tidymodels.org/" target="_blank" rel="noopener">tailor</a>
 adjustments. See the <a href="https://orbital.tidymodels.org/articles/supported-models.html#tailor-adjustments" target="_blank" rel="noopener">vignette</a>
 for a list of all supported post-processors.</p>
<p>Let&rsquo;s start by fitting a classification model on the <code>penguins</code> data set, using {xgboost} as the engine. The adjustment we will showcase only works for binary classification, so we first recode <code>species</code> to have the levels <code>&quot;Adelie&quot;</code> and <code>&quot;not_Adelie&quot;</code>.</p>
<div class="highlight">
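<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'>tidymodels</span><span class='o'>)</span> <span class='c'># also attaches modeldata, which provides penguins</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'>tailor</span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'>orbital</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>penguins</span><span class='o'>$</span><span class='nv'>species</span> <span class='o'>&lt;-</span> <span class='nf'>forcats</span><span class='nf'>::</span><span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_recode.html'>fct_recode</a></span><span class='o'>(</span></span>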
<span> <span class='nv'>penguins</span><span class='o'>$</span><span class='nv'>species</span>,</span>
<span> not_Adelie <span class='o'>=</span> <span class='s'>"Chinstrap"</span>, not_Adelie <span class='o'>=</span> <span class='s'>"Gentoo"</span></span>
<span><span class='o'>)</span></span></code></pre>
</div>
<p>After modifying the data, we set up a simple workflow with a recipes preprocessor and a parsnip model specification.</p>
<p>We also set up a post-processor using the tailor package, with a single adjustment added through <code>adjust_equivocal_zone()</code>. This applies an equivocal zone to our binary classification model: predictions that are too close to the threshold are withheld and labeled <code>&quot;[EQ]&quot;</code>. With the default threshold of 0.5, setting <code>value = 0.2</code> means that any observation with a predicted probability between 0.3 and 0.7 is predicted as <code>&quot;[EQ]&quot;</code> instead.</p>
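<p>To make the rule concrete, here is a minimal plain-R sketch of an equivocal zone applied to a single probability column. The helper <code>equivocal_class()</code> is hypothetical and is not part of tailor or orbital. It assumes the 0.5 threshold used in this example and follows the same logic as the generated SQL shown later in this post.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical helper, not part of tailor or orbital: a plain-R sketch of an
# equivocal zone for a binary classifier, assuming a 0.5 threshold.
equivocal_class &lt;- function(prob, threshold = 0.5, value = 0.2,
                            levels = c("Adelie", "not_Adelie")) {
  ifelse(
    prob &gt; threshold + value, levels[1],   # confidently the first level
    ifelse(
      prob &lt; threshold - value, levels[2], # confidently the second level
      "[EQ]"                               # too close to the threshold to call
    )
  )
}

# Probabilities of "Adelie" like the ones in the predictions later in this post
equivocal_class(c(0.845, 0.483, 0.348, 0.291))</code></pre>
</div>
<p>The first and last values fall outside the 0.3 to 0.7 band and keep a class label. The two values in the middle become <code>"[EQ]"</code>.</p>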
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>species</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>penguins</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_unknown</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_impute_mean</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_zv</span><span class='o'>(</span><span class='nf'>all_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>lr_spec</span> <span class='o'>&lt;-</span> <span class='nf'>boost_tree</span><span class='o'>(</span>tree_depth <span class='o'>=</span> <span class='m'>1</span>, trees <span class='o'>=</span> <span class='m'>5</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>set_mode</span><span class='o'>(</span><span class='s'>"classification"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>set_engine</span><span class='o'>(</span><span class='s'>"xgboost"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>tlr_spec</span> <span class='o'>&lt;-</span> <span class='nf'>tailor</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>adjust_equivocal_zone</span><span class='o'>(</span>value <span class='o'>=</span> <span class='m'>0.2</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>wf_spec</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>lr_spec</span>, <span class='nv'>tlr_spec</span><span class='o'>)</span></span>
<span><span class='nv'>wf_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>wf_spec</span>, data <span class='o'>=</span> <span class='nv'>penguins</span><span class='o'>)</span></span></code></pre>
</div>
<p>With this fitted workflow object, we can call <a href="https://orbital.tidymodels.org/reference/orbital.html" target="_blank" rel="noopener"><code>orbital()</code></a>
 on it to create an orbital object. Note that we need to set <code>type = c(&quot;class&quot;, &quot;prob&quot;)</code>, because <code>adjust_equivocal_zone()</code> requires both the class and the probability predictions.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>orbital_obj</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://orbital.tidymodels.org/reference/orbital.html'>orbital</a></span><span class='o'>(</span><span class='nv'>wf_fit</span>, type <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"class"</span>, <span class='s'>"prob"</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='nv'>orbital_obj</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>orbital Object</span> <span style='color: #00BBBB;'>───────────────────────────────────────────────────────</span></span></span>
<span><span class='c'>#&gt; • bill_length_mm = dplyr::if_else(is.na(bill_length_mm), 43.92193, ...</span></span>
<span><span class='c'>#&gt; • flipper_length_mm = dplyr::if_else(is.na(flipper_length_mm), 201 ...</span></span>
<span><span class='c'>#&gt; • .pred_class = dplyr::case_when(1 - 1/(1 + exp(dplyr::case_when(b ...</span></span>
<span><span class='c'>#&gt; • .pred_Adelie = 1 - 1/(1 + exp(dplyr::case_when(bill_length_mm &lt; ...</span></span>
<span><span class='c'>#&gt; • .pred_not_Adelie = 1 - (1 - 1/(1 + exp(dplyr::case_when(bill_len ...</span></span>
<span><span class='c'>#&gt; • .pred_class = dplyr::case_when( .pred_Adelie &gt; 0.5 + 0.2 ~ 'Adel ...</span></span>
<span><span class='c'>#&gt; ─────────────────────────────────────────────────────────────────────────</span></span>
<span><span class='c'>#&gt; 6 equations in total.</span></span>
<span></span></code></pre>
</div>
<p>This object contains all the information needed to produce predictions, which we can do with <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>
.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>preds</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins</span><span class='o'>)</span></span>
<span><span class='nv'>preds</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 3</span></span></span>
<span><span class='c'>#&gt;    .pred_class .pred_Adelie .pred_not_Adelie</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>            <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> not_Adelie         0.291            0.709</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie             0.845            0.155</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>The predictions work, but the first ten rows show no sign that <code>adjust_equivocal_zone()</code> is doing anything. A call to <code>count()</code> reveals that 15 observations land in the equivocal zone.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>count</span><span class='o'>(</span><span class='nv'>preds</span>, <span class='nv'>.pred_class</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span>
<span><span class='c'>#&gt;   .pred_class     n</span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>       <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Adelie        144</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span> [EQ]           15</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span> not_Adelie    185</span></span>
<span></span></code></pre>
</div>
<p>Filtering to those rows confirms that their predicted probabilities all fall between 0.3 and 0.7, as expected.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>preds</span>, <span class='nv'>.pred_class</span> <span class='o'>==</span> <span class='s'>'[EQ]'</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 15 × 3</span></span></span>
<span><span class='c'>#&gt;    .pred_class .pred_Adelie .pred_not_Adelie</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>            <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> [EQ]               0.348            0.652</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> [EQ]               0.348            0.652</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> [EQ]               0.348            0.652</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>11</span> [EQ]               0.348            0.652</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>12</span> [EQ]               0.348            0.652</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>13</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>14</span> [EQ]               0.483            0.517</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>15</span> [EQ]               0.483            0.517</span></span>
<span></span></code></pre>
</div>
<h2 id="new-show_query-method">New show_query method
</h2>
<p>One of the main purposes of orbital is to enable predictions in databases. To demonstrate this, we copy the data into an in-memory SQLite database.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dbi.r-dbi.org'>DBI</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rsqlite.r-dbi.org'>RSQLite</a></span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>con_sqlite</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbi.r-dbi.org/reference/dbConnect.html'>dbConnect</a></span><span class='o'>(</span><span class='nf'><a href='https://rsqlite.r-dbi.org/reference/SQLite.html'>SQLite</a></span><span class='o'>(</span><span class='o'>)</span>, dbname <span class='o'>=</span> <span class='s'>":memory:"</span><span class='o'>)</span></span>
<span><span class='nv'>penguins_sqlite</span> <span class='o'>&lt;-</span> <span class='nf'>copy_to</span><span class='o'>(</span><span class='nv'>con_sqlite</span>, <span class='nv'>penguins</span>, name <span class='o'>=</span> <span class='s'>"penguins_table"</span><span class='o'>)</span></span></code></pre>
</div>
<p>With the connection set up, we could use <a href="https://orbital.tidymodels.org/reference/orbital_sql.html" target="_blank" rel="noopener"><code>orbital_sql()</code></a>
 to see what the SQL query would look like. However, its output contains <code>&lt;SQL&gt;</code> fragments, so it can&rsquo;t be pasted straight into its own file for quick testing.</p>
<p>The new <code>show_query()</code> method lets you see exactly what the generated SQL looks like.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>show_query</span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>con_sqlite</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; CASE WHEN ((`bill_length_mm` IS NULL)) THEN 43.9219298245614 WHEN NOT ((`bill_length_mm` IS NULL)) THEN `bill_length_mm` END AS bill_length_mm</span></span>
<span><span class='c'>#&gt; CASE WHEN ((`flipper_length_mm` IS NULL)) THEN 201.0 WHEN NOT ((`flipper_length_mm` IS NULL)) THEN `flipper_length_mm` END AS flipper_length_mm</span></span>
<span><span class='c'>#&gt; CASE</span></span>
<span><span class='c'>#&gt; WHEN ((1.0 - 1.0 / (1.0 + EXP(((((CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 42.4000015) THEN 0.627138138</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 42.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.449751347)</span></span>
<span><span class='c'>#&gt; END + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 43.2999992) THEN 0.425288886</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 43.2999992 OR (`bill_length_mm` IS NULL))) THEN (-0.398178101)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 42.4000015) THEN 0.380251437</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 42.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.306771189)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 44.4000015) THEN 0.286071777</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 44.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.330096036)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`flipper_length_mm` &lt; 203.0) THEN 0.209298179</span></span>
<span><span class='c'>#&gt; WHEN ((`flipper_length_mm` &gt;= 203.0 OR (`flipper_length_mm` IS NULL))) THEN (-0.348002464)</span></span>
<span><span class='c'>#&gt; END) + LOG(0.44186047 / (1.0 - 0.44186047))))) &gt; 0.5) THEN 'Adelie'</span></span>
<span><span class='c'>#&gt; ELSE 'not_Adelie'</span></span>
<span><span class='c'>#&gt; END AS .pred_class</span></span>
<span><span class='c'>#&gt; 1.0 - 1.0 / (1.0 + EXP(((((CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 42.4000015) THEN 0.627138138</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 42.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.449751347)</span></span>
<span><span class='c'>#&gt; END + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 43.2999992) THEN 0.425288886</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 43.2999992 OR (`bill_length_mm` IS NULL))) THEN (-0.398178101)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 42.4000015) THEN 0.380251437</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 42.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.306771189)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 44.4000015) THEN 0.286071777</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 44.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.330096036)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`flipper_length_mm` &lt; 203.0) THEN 0.209298179</span></span>
<span><span class='c'>#&gt; WHEN ((`flipper_length_mm` &gt;= 203.0 OR (`flipper_length_mm` IS NULL))) THEN (-0.348002464)</span></span>
<span><span class='c'>#&gt; END) + LOG(0.44186047 / (1.0 - 0.44186047)))) AS .pred_Adelie</span></span>
<span><span class='c'>#&gt; 1.0 - (1.0 - 1.0 / (1.0 + EXP(((((CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 42.4000015) THEN 0.627138138</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 42.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.449751347)</span></span>
<span><span class='c'>#&gt; END + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 43.2999992) THEN 0.425288886</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 43.2999992 OR (`bill_length_mm` IS NULL))) THEN (-0.398178101)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 42.4000015) THEN 0.380251437</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 42.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.306771189)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`bill_length_mm` &lt; 44.4000015) THEN 0.286071777</span></span>
<span><span class='c'>#&gt; WHEN ((`bill_length_mm` &gt;= 44.4000015 OR (`bill_length_mm` IS NULL))) THEN (-0.330096036)</span></span>
<span><span class='c'>#&gt; END) + CASE</span></span>
<span><span class='c'>#&gt; WHEN (`flipper_length_mm` &lt; 203.0) THEN 0.209298179</span></span>
<span><span class='c'>#&gt; WHEN ((`flipper_length_mm` &gt;= 203.0 OR (`flipper_length_mm` IS NULL))) THEN (-0.348002464)</span></span>
<span><span class='c'>#&gt; END) + LOG(0.44186047 / (1.0 - 0.44186047))))) AS .pred_not_Adelie</span></span>
<span><span class='c'>#&gt; CASE</span></span>
<span><span class='c'>#&gt; WHEN (`.pred_Adelie` &gt; (0.5 + 0.2)) THEN 'Adelie'</span></span>
<span><span class='c'>#&gt; WHEN (`.pred_Adelie` &lt; (0.5 - 0.2)) THEN 'not_Adelie'</span></span>
<span><span class='c'>#&gt; ELSE '[EQ]'</span></span>
<span><span class='c'>#&gt; END AS .pred_class</span></span>
<span></span></code></pre>
</div>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>A big thank you to all the people who contributed to this release of orbital:</p>
<p><a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/frankiethull" target="_blank" rel="noopener">@frankiethull</a>
, <a href="https://github.com/jeroenjanssens" target="_blank" rel="noopener">@jeroenjanssens</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/orbital-0-4-0/thumbnail-wd.jpg" length="493114" type="image/jpeg" />
    </item>
    <item>
      <title>tidymodels &amp; xgboost</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-xgboost/</link>
      <pubDate>Mon, 15 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-xgboost/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[<p>The <a href="https://xgboost.readthedocs.io/en/stable/r_docs/R-package/docs/index.html" target="_blank" rel="noopener">xgboost</a>
 library recently had a major CRAN release, jumping from version 1.7.11.1 to 3.1.2.1. We on the tidymodels team have been following its development and have done our best to ensure that your experience is unaffected by this release.</p>
<p>Along with the new features and improvements now available to users relying on CRAN versions of packages, there are also a few breaking changes, specifically between versions 1.x and 2.x of the xgboost library. The xgboost team has kindly provided a <a href="https://xgboost.readthedocs.io/en/stable/R-package/migration_guide.html" target="_blank" rel="noopener">migration guide</a>
 describing how to update your code if you are upgrading from a version before 2.x.</p>
<p>If you are using xgboost purely through tidymodels via functions like <a href="https://parsnip.tidymodels.org/reference/boost_tree.html" target="_blank" rel="noopener"><code>parsnip::boost_tree()</code></a>
 and <a href="https://embed.tidymodels.org/reference/step_discretize_xgb.html" target="_blank" rel="noopener"><code>embed::step_discretize_xgb()</code></a>
, you should not need to change anything, as we have updated our packages to work with both the new and old versions of xgboost. If you run into any problems, please let us know by filing an issue for the affected package.</p>
<p>We look forward to integrating these new features more deeply into parsnip, such as support for <a href="https://xgboost.readthedocs.io/en/stable/tutorials/categorical.html" target="_blank" rel="noopener">categorical predictors</a>
 and <a href="https://xgboost.readthedocs.io/en/stable/python/examples/quantile_regression.html#quantile-regression" target="_blank" rel="noopener">quantile regression</a>
.</p>
<p>Here are the packages that we&rsquo;ve updated or helped the maintainers update:</p>
<ul>
<li><a href="https://rstudio.github.io/bundle/dev/news/index.html#bundle-013" target="_blank" rel="noopener">bundle</a>
</li>
<li><a href="https://butcher.tidymodels.org/news/index.html#butcher-040" target="_blank" rel="noopener">butcher</a>
</li>
<li><a href="https://embed.tidymodels.org/news/index.html#embed-121" target="_blank" rel="noopener">embed</a>
</li>
<li><a href="https://github.com/tidymodels/lime/releases/tag/v0.5.4" target="_blank" rel="noopener">lime</a>
</li>
<li><a href="https://business-science.github.io/modeltime/" target="_blank" rel="noopener">modeltime</a>
</li>
<li><a href="https://orbital.tidymodels.org/news/index.html#orbital-041" target="_blank" rel="noopener">orbital</a>
</li>
<li><a href="https://parsnip.tidymodels.org/news/index.html#parsnip-140" target="_blank" rel="noopener">parsnip</a>
</li>
<li><a href="https://tidypredict.tidymodels.org/news/index.html#tidypredict-100" target="_blank" rel="noopener">tidypredict</a>
</li>
<li><a href="https://rstudio.github.io/vetiver-r/dev/news/index.html#vetiver-027" target="_blank" rel="noopener">vetiver</a>
</li>
<li><a href="https://github.com/holub008/xrf/releases/tag/0.3.0" target="_blank" rel="noopener">xrf</a>
</li>
</ul>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-xgboost/thumbnail-wd.jpg" length="283800" type="image/jpeg" />
    </item>
    <item>
      <title>tidypredict 1.0.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidypredict-1-0-0/</link>
      <pubDate>Wed, 10 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidypredict-1-0-0/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[
<p>We&rsquo;re tickled pink to announce the release of version 1.0.0 of <a href="https://tidypredict.tidymodels.org/" target="_blank" rel="noopener">tidypredict</a>
. The main goal of tidypredict is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL.</p>
<p>You can install it from CRAN with:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidypredict"</span><span class='o'>)</span></span></code></pre>
</div>
<p>This blog post highlights the most important changes in this release, including faster computations for tree-based models, more efficient tree representations, glmnet model support, and a change in how random forests are handled. You can see a full list of changes in the <a href="https://tidypredict.tidymodels.org/news/index.html#tidypredict-100" target="_blank" rel="noopener">release notes</a>
.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidypredict.tidymodels.org'>tidypredict</a></span><span class='o'>)</span></span></code></pre>
</div>
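<p>Here is a quick sketch of that workflow. The example fits a linear model and asks tidypredict for both the R expression and its SQL translation. The choice of model and the use of <code>dbplyr::simulate_dbi()</code> as a stand-in connection are illustrative assumptions; in practice you would pass your own database connection.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidypredict)

# A small model to translate (an illustrative choice)
model &lt;- lm(mpg ~ wt + cyl, data = mtcars)

# The prediction as a single R expression
fit_expr &lt;- tidypredict_fit(model)
fit_expr

# The same prediction translated into SQL, using a simulated connection
tidypredict_sql(model, dbplyr::simulate_dbi())

# Evaluating the expression in R should reproduce predict()
all.equal(eval(fit_expr, mtcars), unname(predict(model)))</code></pre>
</div>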
<h2 id="improved-output-for-random-forest-models">Improved output for random forest models
</h2>
<p>In previous versions of tidypredict, <a href="https://tidypredict.tidymodels.org/reference/tidypredict_fit.html" target="_blank" rel="noopener"><code>tidypredict_fit()</code></a>
 returned a list of expressions, one for each tree, when applied to random forest models. This didn&rsquo;t align with what is returned for other types of models. In version 1.0.0, it instead produces a single, combined expression that reflects how predictions should be made.</p>
<p>This is technically a breaking change, but one we believe is worthwhile, as it provides a more consistent output for <a href="https://tidypredict.tidymodels.org/reference/tidypredict_fit.html" target="_blank" rel="noopener"><code>tidypredict_fit()</code></a>
 and hides the technical details about how to combine trees from different packages.</p>
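<p>The sketch below illustrates what a "single, combined expression" means for a regression forest, where the prediction is typically the average of the individual trees. The two tree expressions are made up and are written with <code>ifelse()</code> for brevity. <code>combine_trees()</code> is a hypothetical helper and not tidypredict&rsquo;s implementation. The exact combination rule depends on the package that fit the model, which is the detail tidypredict now hides.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Two made-up per-tree expressions, standing in for fitted trees
tree_exprs &lt;- list(
  quote(ifelse(cyl &lt;= 4, 26.7, 17.0)),
  quote(ifelse(wt &lt;= 3, 24.0, 16.0))
)

# Hypothetical helper: sum the tree expressions and divide by their count,
# producing one expression instead of a list of expressions
combine_trees &lt;- function(exprs) {
  total &lt;- Reduce(function(lhs, rhs) call("+", lhs, rhs), exprs)
  call("/", total, length(exprs))
}

forest_expr &lt;- combine_trees(tree_exprs)
forest_expr

# Evaluate the combined expression on the first three rows of mtcars
eval(forest_expr, mtcars[1:3, ])</code></pre>
</div>
<p>For the first three cars, this gives 20.5, 20.5, and 25.35, which is the mean of the two trees&rsquo; predictions for each row.</p>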
<h2 id="faster-parsing-of-trees">Faster parsing of trees
</h2>
<p>Parsing xgboost, partykit, and ranger models should now be substantially faster than before, with examples running 10 to 200 times faster. Larger models, with more or deeper trees, still take some time to parse.</p>
<h2 id="more-efficient-tree-expressions">More efficient tree expressions
</h2>
<p>tidypredict encodes every tree as a <code>case_when()</code> statement, whether it is a single tree or part of an ensemble such as a boosted tree model or a random forest. For example, consider the following tree:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'>&lt;-</span> <span class='nf'>partykit</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/partykit/man/ctree.html'>ctree</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>am</span> <span class='o'>+</span> <span class='nv'>cyl</span>, data <span class='o'>=</span> <span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='nv'>model</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Model formula:</span></span>
<span><span class='c'>#&gt; mpg ~ am + cyl</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Fitted party:</span></span>
<span><span class='c'>#&gt; [1] root</span></span>
<span><span class='c'>#&gt; |   [2] cyl &lt;= 4: 26.664 (n = 11, err = 203.4)</span></span>
<span><span class='c'>#&gt; |   [3] cyl &gt; 4</span></span>
<span><span class='c'>#&gt; |   |   [4] cyl &lt;= 6: 19.743 (n = 7, err = 12.7)</span></span>
<span><span class='c'>#&gt; |   |   [5] cyl &gt; 6: 15.100 (n = 14, err = 85.2)</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Number of inner nodes:    2</span></span>
<span><span class='c'>#&gt; Number of terminal nodes: 3</span></span>
<span></span></code></pre>
</div>
<p>Previously, this tree would be turned into the following <code>case_when()</code> statement:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">case_when</span><span class="p">(</span>
</span></span><span class="line"><span class="cl"> <span class="n">cyl</span> <span class="o">&lt;=</span> <span class="m">4</span> <span class="o">~</span> <span class="m">26.6636363636364</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="n">cyl</span> <span class="o">&lt;=</span> <span class="m">6</span> <span class="o">&amp;</span> <span class="n">cyl</span> <span class="o">&gt;</span> <span class="m">4</span> <span class="o">~</span> <span class="m">19.7428571428571</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="n">cyl</span> <span class="o">&gt;</span> <span class="m">6</span> <span class="o">&amp;</span> <span class="n">cyl</span> <span class="o">&gt;</span> <span class="m">4</span> <span class="o">~</span> <span class="m">15.1</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>In this release, tidypredict takes advantage of the <code>.default</code> argument of <code>case_when()</code> whenever possible. This should lead to faster predictions, because redundant conditions no longer need to be evaluated.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; case_when(cyl &lt;= 4 ~ 26.6636363636364, cyl &lt;= 6 &amp; cyl &gt; 4 ~ 19.7428571428571, </span></span>
<span><span class='c'>#&gt;     .default = 15.1)</span></span>
<span></span></code></pre>
</div>
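<p>Because the terminal nodes of a tree cover every observation without overlapping, the last condition can be replaced by <code>.default</code> without changing any prediction, at least for rows without missing values. (If <code>cyl</code> were missing, the old form would return <code>NA</code> while the new form would return the default.) A quick sanity check, written by hand with dplyr rather than by calling tidypredict, evaluates both forms on <code>mtcars</code>:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(dplyr)

# Previous form: every terminal node has an explicit condition
old_pred &lt;- with(mtcars, case_when(
  cyl &lt;= 4 ~ 26.6636363636364,
  cyl &lt;= 6 &amp; cyl &gt; 4 ~ 19.7428571428571,
  cyl &gt; 6 &amp; cyl &gt; 4 ~ 15.1
))

# New form: the last terminal node falls through to `.default`
new_pred &lt;- with(mtcars, case_when(
  cyl &lt;= 4 ~ 26.6636363636364,
  cyl &lt;= 6 &amp; cyl &gt; 4 ~ 19.7428571428571,
  .default = 15.1
))

identical(old_pred, new_pred)
#&gt; [1] TRUE
</code></pre>
</div>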
<h2 id="glmnet-support">Glmnet support
</h2>
<p>tidypredict now supports the glmnet package, which fits generalized linear models with lasso or elastic net regularization.</p>
<p>The primary restriction when using a glmnet model with <code>tidypredict_fit()</code> is that the model must have been fitted with the <code>lambda</code> argument set to a single value.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'>&lt;-</span> <span class='nf'>glmnet</span><span class='nf'>::</span><span class='nf'><a href='https://glmnet.stanford.edu/reference/glmnet.html'>glmnet</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>[</span>, <span class='o'>-</span><span class='m'>1</span><span class='o'>]</span>, <span class='nv'>mtcars</span><span class='o'>$</span><span class='nv'>mpg</span>, lambda <span class='o'>=</span> <span class='m'>0.01</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; 13.0081464696679 + (cyl * -0.0773532164346008) + (disp * 0.00969507138358544) + </span></span>
<span><span class='c'>#&gt;     (hp * -0.0192462098902709) + (drat * 0.816753237688302) + </span></span>
<span><span class='c'>#&gt;     (wt * -3.41564341709663) + (qsec * 0.758580151032383) + (vs * </span></span>
<span><span class='c'>#&gt;     0.277874296242861) + (am * 2.47356523820533) + (gear * 0.645144527527598) + </span></span>
<span><span class='c'>#&gt;     (carb * -0.300886812079305)</span></span>
<span></span></code></pre>
</div>
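<p>In practice, the single penalty value often comes from resampling. One way to get it, sketched below, is to let <code>glmnet::cv.glmnet()</code> choose a penalty and then refit with only that value, so that <code>tidypredict_fit()</code> has a single model to translate. Using <code>lambda.min</code> (rather than, say, <code>lambda.1se</code>) and 5 folds are arbitrary choices here. The glmnet documentation generally recommends fitting a whole path of penalties, so treat the single-value refit as a step needed for tidypredict rather than as a glmnet best practice.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidypredict)

x &lt;- as.matrix(mtcars[, -1])
y &lt;- mtcars$mpg

# Cross-validate over glmnet's default penalty path
set.seed(1)
cv_fit &lt;- glmnet::cv.glmnet(x, y, nfolds = 5)

# Refit with the single selected penalty, which tidypredict requires
model &lt;- glmnet::glmnet(x, y, lambda = cv_fit$lambda.min)
tidypredict_fit(model)
</code></pre>
</div>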
<p><code>glmnet()</code> computes a collection of models using many sets of penalty values. This can be very efficient, but for tidypredict, we need to predict with a single penalty. Note how, as we increase the penalty, the extracted expression correctly removes terms with coefficients of <code>0</code> instead of leaving them as <code>(disp * 0)</code>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'>&lt;-</span> <span class='nf'>glmnet</span><span class='nf'>::</span><span class='nf'><a href='https://glmnet.stanford.edu/reference/glmnet.html'>glmnet</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>[</span>, <span class='o'>-</span><span class='m'>1</span><span class='o'>]</span>, <span class='nv'>mtcars</span><span class='o'>$</span><span class='nv'>mpg</span>, lambda <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; 35.3137765116027 + (cyl * -0.871451193824228) + (hp * -0.0101173960249783) + </span></span>
<span><span class='c'>#&gt;     (wt * -2.59443677687505)</span></span>
<span></span></code></pre>
</div>
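<p>To see which terms were dropped, you can look at the nonzero coefficients of the same model with <code>coef()</code>. This sketch passes a matrix rather than a data frame as <code>x</code>, which is the input type glmnet documents; the variables kept should match the terms in the expression above.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(glmnet)

model &lt;- glmnet(as.matrix(mtcars[, -1]), mtcars$mpg, lambda = 1)

# coef() returns a sparse one-column matrix; keep only the nonzero rows
beta &lt;- as.matrix(coef(model))
beta[beta[, 1] != 0, , drop = FALSE]
</code></pre>
</div>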
<p>tidypredict is used as the primary parser for models employed by the <a href="https://orbital.tidymodels.org/" target="_blank" rel="noopener">orbital</a>
 package. This means that all the changes described in this post also apply when using orbital with tidymodels workflows, for example when fitting <a href="https://parsnip.tidymodels.org/reference/linear_reg.html" target="_blank" rel="noopener"><code>parsnip::linear_reg()</code></a>
 with <code>engine = &quot;glmnet&quot;</code>.</p>
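<p>As a sketch of that path, the code below fits a lasso model through a tidymodels workflow and converts it with <code>orbital::orbital()</code>. The penalty value is arbitrary, and the example assumes recent versions of parsnip, workflows, and orbital are installed.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidymodels)
library(orbital)

# A lasso regression (mixture = 1) with a single, fixed penalty
spec &lt;- linear_reg(penalty = 0.01, mixture = 1) |&gt;
  set_engine("glmnet")

wf_fit &lt;- workflow(mpg ~ ., spec) |&gt;
  fit(data = mtcars)

# orbital relies on tidypredict to translate the fitted model
orbital_obj &lt;- orbital(wf_fit)
orbital_obj
</code></pre>
</div>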
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>A big thank you to all the folks who helped make this release happen: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
 and <a href="https://github.com/jeroenjanssens" target="_blank" rel="noopener">@jeroenjanssens</a>
.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/tidypredict-1-0-0/thumbnail-wd.jpg" length="315661" type="image/jpeg" />
    </item>
    <item>
      <title>Two New tidymodels Packages</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/two-new-tidymodels-packages/</link>
      <pubDate>Sat, 22 Nov 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/two-new-tidymodels-packages/</guid>
      <dc:creator>Frances Lin</dc:creator>
      <dc:creator>Max Kuhn</dc:creator><description><![CDATA[
<p>We&rsquo;re very chuffed to announce the release of <em>two</em> new modeling packages: filtro and important.</p>
<p>You can install them from CRAN with:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;filtro&#34;</span><span class="p">,</span> <span class="s">&#34;important&#34;</span><span class="p">))</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>This blog post will introduce both.</p>
<h2 id="filtro">filtro
</h2>
<p>Feature selection is an important step in building machine learning models that are robust and reliable. By keeping only the most relevant predictors, we can reduce overfitting, improve model performance, and speed up computation.</p>
<p><a href="https://filtro.tidymodels.org/" target="_blank" rel="noopener">filtro</a>
 is a set of low-level, tidy tools for filter-based supervised feature selection. filtro makes it easy to score, rank, and select features using a wide range of statistical and model-based metrics, including p-values, correlations, random forest feature importance, information gain, and more.</p>
<p>With filtro, we can quickly rank the variables and select either the top proportion or the top number of features that best contribute to our model. It also supports <a href="https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C7&amp;q=%22multi-parameter&#43;optimization%22&amp;btnG=" target="_blank" rel="noopener">multi-parameter optimization</a>
 via <a href="https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C7&amp;q=%22desirability&#43;functions%22&amp;btnG=" target="_blank" rel="noopener">desirability functions</a>
. filtro is a standalone tool, but it integrates with other packages so that it can be used within tidymodels workflows.</p>
<p>Currently, filtro implements six types of filters, several of which come in more than one variant. Like other parts of the tidymodels framework, filtro is extensible if you want to use a score we haven&rsquo;t implemented yet. You can read more about how to do this on <a href="https://www.tidymodels.org/learn/develop/filtro/" target="_blank" rel="noopener">tidymodels.org</a>
.</p>
<p>The available score class objects are:</p>
<pre tabindex="0"><code>##  [1] &#34;score_aov_fstat&#34;          &#34;score_aov_pval&#34;          
##  [3] &#34;score_cor_pearson&#34;        &#34;score_cor_spearman&#34;      
##  [5] &#34;score_gain_ratio&#34;         &#34;score_imp_rf&#34;            
##  [7] &#34;score_imp_rf_conditional&#34; &#34;score_imp_rf_oblique&#34;    
##  [9] &#34;score_info_gain&#34;          &#34;score_roc_auc&#34;           
## [11] &#34;score_sym_uncert&#34;         &#34;score_xtab_pval_chisq&#34;   
## [13] &#34;score_xtab_pval_fisher&#34;
</code></pre><p>Let&rsquo;s look at an example. <a href="https://www.google.com/search?q=Kuhn&#43;and&#43;Johnson&#43;Applied&#43;Predictive&#43;Modeling&#43;2013" target="_blank" rel="noopener">Kuhn and Johnson (2013)</a>
 described a data set where 176 samples were collected from a chemical manufacturing process. The goal is to predict process yield. Predictors are continuous, count, and categorical; some are correlated, and some contain missing values.</p>
<p>Let’s create an initial split of the data (which are in the modeldata package):</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">filtro</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">set.seed</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">yield_split</span> <span class="o">&lt;-</span> <span class="nf">initial_split</span><span class="p">(</span><span class="n">modeldata</span><span class="o">::</span><span class="n">chem_proc_yield</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">yield_split</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## &lt;Training/Testing/Total&gt;
## &lt;132/44/176&gt;
</code></pre><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">yield_train</span> <span class="o">&lt;-</span> <span class="nf">training</span><span class="p">(</span><span class="n">yield_split</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">yield_test</span> <span class="o">&lt;-</span> <span class="nf">testing</span><span class="p">(</span><span class="n">yield_split</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>We’d like to estimate the strength of the relationship between these 57 predictors and the process yield. We’ll quantify that in two ways. First is the old-fashioned <a href="https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient" target="_blank" rel="noopener">Spearman rank correlation</a>
 statistic. We can estimate these values and rank the predictors by the absolute value of their correlations. Second, we can measure their importance using a random forest. Because many of the predictors are correlated with one another, an <em>oblique</em> random forest may be worthwhile: it builds a collection of trees whose splits are linear combinations of the selected predictors rather than single predictors.</p>
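<p>For reference, Spearman&rsquo;s rank correlation is Pearson&rsquo;s correlation computed on the ranks of the data, which is why it captures monotonic rather than only linear relationships. A small base R check on <code>mtcars</code> (unrelated to the chemical data):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r">x &lt;- mtcars$disp
y &lt;- mtcars$mpg

# Built-in Spearman correlation
rho &lt;- cor(x, y, method = "spearman")

# The same quantity: Pearson correlation of the (average) ranks
rho_by_hand &lt;- cor(rank(x), rank(y))

all.equal(rho, rho_by_hand)
## [1] TRUE
</code></pre></div>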
<p>To estimate the scores, we use the score objects contained in the package along with the <code>fit()</code> method:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">yield_rank_res</span> <span class="o">&lt;-</span>
</span></span><span class="line"><span class="cl">  <span class="n">score_cor_spearman</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">fit</span><span class="p">(</span><span class="n">yield</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">yield_train</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># The object contains the statistics:</span>
</span></span><span class="line"><span class="cl"><span class="n">yield_rank_res</span><span class="o">@</span><span class="n">results</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">arrange</span><span class="p">(</span><span class="nf">desc</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="n">score</span><span class="p">)))</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## # A tibble: 57 × 4
##    name          score outcome predictor      
##    &lt;chr&gt;         &lt;dbl&gt; &lt;chr&gt;   &lt;chr&gt;          
##  1 cor_spearman  0.655 yield   man_proc_32    
##  2 cor_spearman -0.537 yield   man_proc_36    
##  3 cor_spearman  0.519 yield   bio_material_03
##  4 cor_spearman  0.502 yield   bio_material_06
##  5 cor_spearman  0.491 yield   man_proc_09    
##  6 cor_spearman  0.478 yield   bio_material_02
##  7 cor_spearman  0.446 yield   man_proc_33    
##  8 cor_spearman  0.421 yield   bio_material_12
##  9 cor_spearman -0.420 yield   man_proc_13    
## 10 cor_spearman  0.412 yield   bio_material_04
## # ℹ 47 more rows
</code></pre><p>To score via a random forest model, we only need to switch out the score object:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">yield_rf_res</span> <span class="o">&lt;-</span>
</span></span><span class="line"><span class="cl">  <span class="n">score_imp_rf_oblique</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">fit</span><span class="p">(</span><span class="n">yield</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">yield_train</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">yield_rf_res</span><span class="o">@</span><span class="n">results</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">arrange</span><span class="p">(</span><span class="nf">desc</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="n">score</span><span class="p">)))</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## # A tibble: 57 × 4
##    name            score outcome predictor      
##    &lt;chr&gt;           &lt;dbl&gt; &lt;chr&gt;   &lt;chr&gt;          
##  1 imp_rf_oblique 0.128  yield   man_proc_32    
##  2 imp_rf_oblique 0.0697 yield   man_proc_36    
##  3 imp_rf_oblique 0.0670 yield   man_proc_17    
##  4 imp_rf_oblique 0.0644 yield   man_proc_09    
##  5 imp_rf_oblique 0.0612 yield   man_proc_13    
##  6 imp_rf_oblique 0.0446 yield   bio_material_03
##  7 imp_rf_oblique 0.0315 yield   man_proc_33    
##  8 imp_rf_oblique 0.0263 yield   man_proc_11    
##  9 imp_rf_oblique 0.0263 yield   bio_material_04
## 10 imp_rf_oblique 0.0262 yield   bio_material_06
## # ℹ 47 more rows
</code></pre><p>We should probably combine the scores and do a joint ranking. To combine the two sets of statistics:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">class_score_list</span> <span class="o">&lt;-</span> <span class="nf">list</span><span class="p">(</span><span class="n">yield_rank_res</span><span class="p">,</span> <span class="n">yield_rf_res</span><span class="p">)</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">bind_scores</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">class_score_list</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## # A tibble: 57 × 4
##    outcome predictor       cor_spearman imp_rf_oblique
##    &lt;chr&gt;   &lt;chr&gt;                  &lt;dbl&gt;          &lt;dbl&gt;
##  1 yield   bio_material_01        0.404        0.0178 
##  2 yield   bio_material_02        0.478        0.0190 
##  3 yield   bio_material_03        0.519        0.0446 
##  4 yield   bio_material_04        0.412        0.0263 
##  5 yield   bio_material_05        0.116        0.00639
##  6 yield   bio_material_06        0.502        0.0262 
##  7 yield   bio_material_07       -0.101        0.00151
##  8 yield   bio_material_08        0.369        0.00714
##  9 yield   bio_material_09        0.109        0.0122 
## 10 yield   bio_material_10        0.214        0.00998
## # ℹ 47 more rows
</code></pre><p>We can accomplish a joint ranking via desirability functions. Here, we set a goal for each score (e.g., maximize or minimize). Each score is then rescaled to a desirability value between zero and one, and the geometric mean of these values gives the overall ranking. The desirability2 package has some nice tools for this. Here&rsquo;s how we do it:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">desirability2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">class_score_list</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">show_best_desirability_prop</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="nf">maximize</span><span class="p">(</span><span class="n">cor_spearman</span><span class="p">,</span> <span class="n">low</span> <span class="o">=</span> <span class="m">0.25</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span> <span class="m">1</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="nf">maximize</span><span class="p">(</span><span class="n">imp_rf_oblique</span><span class="p">,</span> <span class="n">scale</span> <span class="o">=</span> <span class="m">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">arrange</span><span class="p">(</span><span class="nf">desc</span><span class="p">(</span><span class="n">.d_overall</span><span class="p">))</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">select</span><span class="p">(</span><span class="o">-</span><span class="nf">starts_with</span><span class="p">(</span><span class="s">&#34;.d_max_&#34;</span><span class="p">))</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## # A tibble: 57 × 5
##    outcome predictor       cor_spearman imp_rf_oblique .d_overall
##    &lt;chr&gt;   &lt;chr&gt;                  &lt;dbl&gt;          &lt;dbl&gt;      &lt;dbl&gt;
##  1 yield   man_proc_32            0.655         0.128      0.735 
##  2 yield   man_proc_09            0.491         0.0644     0.291 
##  3 yield   bio_material_03        0.519         0.0446     0.217 
##  4 yield   man_proc_33            0.446         0.0315     0.134 
##  5 yield   bio_material_06        0.502         0.0262     0.129 
##  6 yield   bio_material_04        0.412         0.0263     0.104 
##  7 yield   bio_material_02        0.478         0.0190     0.0926
##  8 yield   bio_material_01        0.404         0.0178     0.0719
##  9 yield   bio_material_11        0.381         0.0194     0.0714
## 10 yield   man_proc_12            0.391         0.0183     0.0705
## # ℹ 47 more rows
</code></pre><p>Using the <code>scale = 2</code> option puts more weight on the random forest results.</p>
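<p>To make the rescaling concrete, here is a hand-rolled version of a &ldquo;maximize&rdquo; desirability and the geometric mean, applied to the top row above (<code>man_proc_32</code>). The helper <code>d_max_sketch()</code> is hypothetical and not part of desirability2. The sketch also assumes that, when no <code>low</code> and <code>high</code> are given, the observed range of the scores is used, so the largest importance value maps to a desirability of 1.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"># Hypothetical helper: linear ramp from `low` (0) to `high` (1), raised to `scale`
d_max_sketch &lt;- function(x, low, high, scale = 1) {
  d &lt;- (x - low) / (high - low)
  d &lt;- pmin(pmax(d, 0), 1)  # clamp to [0, 1] before applying the exponent
  d^scale
}

# man_proc_32: Spearman correlation of 0.655 with low = 0.25 and high = 1
d_cor &lt;- d_max_sketch(0.655, low = 0.25, high = 1)

# It also has the largest importance (0.128), so it sits at the top of the
# observed range; the lower bound of 0 is only a placeholder
d_rf &lt;- d_max_sketch(0.128, low = 0, high = 0.128, scale = 2)

# The overall desirability is the geometric mean of the individual ones
d_overall &lt;- (d_cor * d_rf)^(1 / 2)
round(d_overall, 3)
## [1] 0.735
</code></pre></div>
<p>With <code>scale = 2</code>, an importance that sits halfway through its range contributes a desirability of 0.25 instead of 0.5, so a predictor needs a relatively high importance to rank well.</p>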
<p>It is unlikely that users will work with filtro directly; it is much better to incorporate these feature selection tools inside a model workflow (as we will see below).</p>
<p>Now that we&rsquo;ve looked at filtro, next up is the important package (yes, this is what we named it).</p>
<h2 id="important">important
</h2>
<p>The <a href="https://important.tidymodels.org/" target="_blank" rel="noopener">important</a>
 package does two things. First, it provides yet another tool for calculating random forest-like permutation importance scores. We highly value other packages that perform these same calculations (such as <a href="https://modeloriented.github.io/DALEX/" target="_blank" rel="noopener">DALEX</a>
 and <a href="https://github.com/koalaverse/vip/" target="_blank" rel="noopener">vip</a>
). Our rationale for creating another package for this is that we&rsquo;ve developed interfaces for censored regression, including dynamic metrics such as Brier scores or ROC curves that evaluate models at a specific time point. These dynamic methods aren&rsquo;t available in other packages, and the peculiarities of these metrics make them difficult to incorporate into existing frameworks.</p>
<p>Other niceties of important&rsquo;s importance scores are that any metric from the yardstick package can be used and that the underlying computations are optimized for parallel processing, with support for both the future and mirai packages.</p>
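<p>The idea behind permutation importance is simple: shuffle one predictor, re-predict, and measure how much a performance metric degrades. The base R sketch below illustrates that idea with a linear model and RMSE on <code>mtcars</code>. It is not the important package&rsquo;s implementation, and the model, the metric, and the 20 permutations per predictor are arbitrary choices.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r">set.seed(1)
fit &lt;- lm(mpg ~ wt + hp + qsec, data = mtcars)

rmse &lt;- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Evaluated in-sample for brevity; held-out data is preferable in practice
baseline &lt;- rmse(mtcars$mpg, predict(fit, mtcars))

# For each predictor, shuffle its values and record the increase in RMSE
perm_importance &lt;- sapply(c("wt", "hp", "qsec"), function(var) {
  increases &lt;- replicate(20, {
    shuffled &lt;- mtcars
    shuffled[[var]] &lt;- sample(shuffled[[var]])
    rmse(mtcars$mpg, predict(fit, shuffled)) - baseline
  })
  mean(increases)
})

sort(perm_importance, decreasing = TRUE)
</code></pre></div>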
<p>important also has three recipe steps for supervised feature selection (similar to what Steven Pawley did with his <a href="https://stevenpawley.github.io/colino/" target="_blank" rel="noopener">colino package</a>
). The steps are:</p>
<ul>
<li><a href="https://important.tidymodels.org/reference/step_predictor_best.html" target="_blank" rel="noopener"><code>step_predictor_best()</code></a>
</li>
<li><a href="https://important.tidymodels.org/reference/step_predictor_retain.html" target="_blank" rel="noopener"><code>step_predictor_retain()</code></a>
</li>
<li><a href="https://important.tidymodels.org/reference/step_predictor_desirability.html" target="_blank" rel="noopener"><code>step_predictor_desirability()</code></a>
</li>
</ul>
<p>Let&rsquo;s look at the last one, which mirrors our analysis above.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">important</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">goals</span> <span class="o">&lt;-</span>
</span></span><span class="line"><span class="cl">  <span class="nf">desirability</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="nf">maximize</span><span class="p">(</span><span class="n">cor_spearman</span><span class="p">,</span> <span class="n">low</span> <span class="o">=</span> <span class="m">0.25</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span> <span class="m">1</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="nf">maximize</span><span class="p">(</span><span class="n">imp_rf_oblique</span><span class="p">,</span> <span class="n">scale</span> <span class="o">=</span> <span class="m">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">yield_rec</span> <span class="o">&lt;-</span>
</span></span><span class="line"><span class="cl">  <span class="nf">recipe</span><span class="p">(</span><span class="n">yield</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">yield_train</span><span class="p">)</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">step_impute_knn</span><span class="p">(</span><span class="nf">all_predictors</span><span class="p">(),</span> <span class="n">neighbors</span> <span class="o">=</span> <span class="m">10</span><span class="p">)</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">step_predictor_desirability</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="nf">all_predictors</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">    <span class="n">score</span> <span class="o">=</span> <span class="n">goals</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">prop_terms</span> <span class="o">=</span> <span class="m">1</span> <span class="o">/</span> <span class="m">10</span>
</span></span><span class="line"><span class="cl">  <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">yield_rec</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## 
## ── Recipe ───────────────────────────────────────────────────────
## 
## ── Inputs
## Number of variables by role
## outcome:    1
## predictor: 57
## 
## ── Operations
## • K-nearest neighbor imputation for: all_predictors()
## • Feature selection via desirability functions (`cor_spearman`
##   and `imp_rf_oblique`) on: all_predictors()
</code></pre><p>When combined with a specific model, we can tune the number of neighbors as well as the proportion of predictors retained (10% above).</p>
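<p>To give a rough feel for how desirability-based selection can rank predictors, here is a base-R sketch. It is not the implementation used by important or filtro. The scores are made up, <code>d_max()</code> is a hypothetical helper, and several choices are assumptions of this sketch: using the absolute correlation, scaling importance to its observed range, combining desirabilities with a geometric mean (the conventional choice for desirability functions), and rounding the number of kept predictors down.</p>

```r
# Hypothetical filter scores for five predictors (made-up numbers)
toy_scores <- data.frame(
  terms        = c("x1", "x2", "x3", "x4", "x5"),
  cor_spearman = c(0.65, -0.53, 0.10, 0.45, -0.05),
  imp_rf       = c(0.12, 0.07, 0.01, 0.05, 0.02)
)

# Hypothetical "larger is better" desirability: 0 at/below `low`, 1 at/above `high`
d_max <- function(x, low, high) {
  pmin(pmax((x - low) / (high - low), 0), 1)
}

# Assumption: use |correlation| so strong negative associations also score well
d_cor <- d_max(abs(toy_scores$cor_spearman), low = 0, high = 1)
# Assumption: scale importance to its observed range
d_imp <- d_max(toy_scores$imp_rf,
               low = min(toy_scores$imp_rf), high = max(toy_scores$imp_rf))

# Overall desirability: geometric mean of the two individual desirabilities
toy_scores$d_overall <- sqrt(d_cor * d_imp)

# Keep the top `prop_terms` fraction of predictors (rounded down, at least one)
prop_terms <- 0.5
n_keep <- max(1, floor(prop_terms * nrow(toy_scores)))
ranked <- toy_scores[order(toy_scores$d_overall, decreasing = TRUE), ]
ranked$removed <- seq_len(nrow(ranked)) > n_keep
ranked
```

<p>With <code>prop_terms = 0.5</code>, the two highest-ranked predictors (x1 and x2) are kept. x3 gets an overall desirability of zero because its importance sits at the bottom of the observed range; with a geometric mean, a single zero is enough to veto a predictor.</p>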
<p><code>prep()</code> will do the appropriate estimation steps:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">trained_rec</span> <span class="o">&lt;-</span> <span class="nf">prep</span><span class="p">(</span><span class="n">yield_rec</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Which 10% of the predictors were retained? The <code>tidy()</code> method can list the scores and their rankings:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">scores</span> <span class="o">&lt;-</span> <span class="nf">tidy</span><span class="p">(</span><span class="n">trained_rec</span><span class="p">,</span> <span class="n">number</span> <span class="o">=</span> <span class="m">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">scores</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">arrange</span><span class="p">(</span><span class="nf">desc</span><span class="p">(</span><span class="n">.d_overall</span><span class="p">))</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">select</span><span class="p">(</span><span class="o">-</span><span class="nf">starts_with</span><span class="p">(</span><span class="s">&#34;.d_max_&#34;</span><span class="p">),</span> <span class="o">-</span><span class="n">id</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## # A tibble: 57 × 5
##    terms           removed cor_spearman imp_rf_oblique .d_overall
##    &lt;chr&gt;           &lt;lgl&gt;          &lt;dbl&gt;          &lt;dbl&gt;      &lt;dbl&gt;
##  1 man_proc_32     FALSE          0.655         0.128       0.735
##  2 man_proc_36     FALSE         -0.530         0.0668      0.325
##  3 man_proc_09     FALSE          0.491         0.0673      0.304
##  4 man_proc_13     FALSE         -0.420         0.0725      0.275
##  5 bio_material_03 FALSE          0.519         0.0517      0.249
##  6 bio_material_06 TRUE           0.502         0.0445      0.210
##  7 man_proc_17     TRUE          -0.303         0.0749      0.158
##  8 man_proc_33     TRUE           0.443         0.0374      0.156
##  9 bio_material_02 TRUE           0.478         0.0330      0.151
## 10 bio_material_04 TRUE           0.412         0.0347      0.133
## # ℹ 47 more rows
</code></pre><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># What percentage was removed?</span>
</span></span><span class="line"><span class="cl"><span class="nf">mean</span><span class="p">(</span><span class="n">scores</span><span class="o">$</span><span class="n">removed</span> <span class="o">*</span> <span class="m">100</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## [1] 91.22807
</code></pre><h2 id="summary">Summary
</h2>
<p>Both filtro and important deliver a feature that has ranked highly in our tidymodels user surveys: supervised feature selection. filtro contains the underlying framework, and important provides recipe steps that can be used in a workflow.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/two-new-tidymodels-packages/thumbnail-wd.jpg" length="97105" type="image/jpeg" />
    </item>
    <item>
      <title>Q3 2025 tidymodels digest</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-2025-q3/</link>
      <pubDate>Tue, 18 Nov 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-2025-q3/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[
<p>The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p>
<p>Since the beginning of 2021, we have been publishing quarterly updates here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the tidymodels tag to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused.</p>
<p>Since our last update, we have had some larger releases that you can read about in these posts:</p>
<ul>
<li><a href="https://tidyverse.org/blog/2025/11/tune-2/" target="_blank" rel="noopener">tune 2.0.0</a>
</li>
<li><a href="https://tidyverse.org/blog/2025/04/recipes-1-3-0/" target="_blank" rel="noopener">recipes 1.3.0</a>
</li>
<li><a href="https://tidyverse.org/blog/2025/04/rsample-1-3-0/" target="_blank" rel="noopener">rsample 1.3.0</a>
</li>
<li><a href="https://tidyverse.org/blog/2025/03/tidymodels-sparsity/" target="_blank" rel="noopener">improved sparsity support in tidymodels</a>
</li>
</ul>
<p>This post will update you on which packages have changed and on the improvements you should know about that haven&rsquo;t been covered in the posts above.</p>
<p>Here&rsquo;s a list of the packages and their News sections:</p>
<ul>
<li><a href="https://dials.tidymodels.org/news/index.html" target="_blank" rel="noopener">dials</a>
</li>
<li><a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">parsnip</a>
</li>
<li><a href="https://rsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">rsample</a>
</li>
<li><a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">recipes</a>
</li>
<li><a href="https://probably.tidymodels.org/news/index.html" target="_blank" rel="noopener">probably</a>
</li>
<li><a href="https://brulee.tidymodels.org/news/index.html" target="_blank" rel="noopener">brulee</a>
</li>
</ul>
<p>Let&rsquo;s look at a few specific updates.</p>
<h2 id="quiet-linear-svm-models">Quiet linear SVM models
</h2>
<p>Previously, fitting a linear SVM model with the kernlab engine produced a message that you could not avoid.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">parsnip</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">modeldata</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">res</span> <span class="o">&lt;-</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">svm_linear</span><span class="p">(</span><span class="n">mode</span> <span class="o">=</span> <span class="s">&#34;classification&#34;</span><span class="p">,</span> <span class="n">engine</span> <span class="o">=</span> <span class="s">&#34;kernlab&#34;</span><span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">fit</span><span class="p">(</span><span class="n">Class</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">two_class_dat</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;  Setting default kernel parameters</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>This message was not very informative on its own, and there was no reasonable way to turn it off. We have silenced it to reduce the noise when fitting these models:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/parsnip'>parsnip</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://modeldata.tidymodels.org'>modeldata</a></span><span class='o'>)</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Attaching package: 'modeldata'</span></span>
<span></span><span><span class='c'>#&gt; The following object is masked from 'package:datasets':</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt;     penguins</span></span>
<span></span><span></span>
<span><span class='nv'>res</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/svm_linear.html'>svm_linear</a></span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"classification"</span>, engine <span class='o'>=</span> <span class='s'>"kernlab"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span>
<span>  <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nv'>Class</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>two_class_dat</span><span class='o'>)</span></span>
<span><span class='nv'>res</span></span>
<span><span class='c'>#&gt; parsnip model object</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Support Vector Machine object of class "ksvm" </span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; SV type: C-svc  (classification) </span></span>
<span><span class='c'>#&gt;  parameter : cost C = 1 </span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Linear (vanilla) kernel function. </span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Number of Support Vectors : 361 </span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Objective Function Value : -357.1487 </span></span>
<span><span class='c'>#&gt; Training error : 0.178255 </span></span>
<span><span class='c'>#&gt; Probability model included.</span></span>
<span></span></code></pre>
</div>
<h2 id="fewer-numeric-overflow-issues-in-brulee">Fewer numeric overflow issues in brulee
</h2>
<p>The brulee package has been improved to help avoid numeric overflow in the loss functions. The changes are:</p>
<ul>
<li>
<p>Starting values are now drawn from a Gaussian distribution (instead of a uniform one) with a smaller standard deviation.</p>
</li>
<li>
<p>The fitted object always includes the initial results, which serve as a fallback if overflow occurs during the first epoch.</p>
</li>
<li>
<p><code>brulee_mlp()</code> has two additional parameters, <code>grad_value_clip</code> and <code>grad_norm_clip</code>, that clip the gradients to help prevent overflow.</p>
</li>
<li>
<p>The warning was changed to &ldquo;Early stopping occurred at epoch {X} due to numerical overflow of the loss function.&rdquo;</p>
</li>
</ul>
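<p>Gradient clipping is usually done either elementwise by value or by rescaling the whole gradient according to its norm. The base-R sketch below shows the difference between the two. The helper names are hypothetical, and we assume, rather than know, that brulee&rsquo;s clipping arguments correspond to these two standard techniques (the torch package provides utilities for both).</p>

```r
# Clip each gradient element to the interval [-clip, clip]
clip_by_value <- function(grad, clip) {
  pmin(pmax(grad, -clip), clip)
}

# Rescale the whole gradient vector if its L2 norm exceeds `max_norm`
clip_by_norm <- function(grad, max_norm) {
  grad_norm <- sqrt(sum(grad^2))
  if (grad_norm > max_norm) grad * (max_norm / grad_norm) else grad
}

g <- c(3, -4)          # a gradient with L2 norm 5
clip_by_value(g, 2)    # elementwise limits: 2 and -2
clip_by_norm(g, 1)     # rescaled to length 1: 0.6 and -0.8
```

<p>Clipping by value can change the gradient&rsquo;s direction, while clipping by norm only shrinks its length.</p>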
<h2 id="additional-torch-optimizers-in-brulee">Additional torch optimizers in brulee
</h2>
<p>Several additional optimizers have been added: <code>&quot;ADAMw&quot;</code>, <code>&quot;Adadelta&quot;</code>, <code>&quot;Adagrad&quot;</code>, and <code>&quot;RMSprop&quot;</code>. Previously, the options were <code>&quot;SGD&quot;</code> and <code>&quot;LBFGS&quot;</code>.</p>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>We want to sincerely thank everyone who contributed to these packages since their previous versions:</p>
<ul>
<li>dials: <a href="https://github.com/brendad8" target="_blank" rel="noopener">@brendad8</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
, and <a href="https://github.com/Wander03" target="_blank" rel="noopener">@Wander03</a>
.</li>
<li>parsnip: <a href="https://github.com/chillerb" target="_blank" rel="noopener">@chillerb</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/jmgirard" target="_blank" rel="noopener">@jmgirard</a>
, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
, and <a href="https://github.com/ZWael" target="_blank" rel="noopener">@ZWael</a>
.</li>
<li>rsample: <a href="https://github.com/abichat" target="_blank" rel="noopener">@abichat</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/mkiang" target="_blank" rel="noopener">@mkiang</a>
, and <a href="https://github.com/vincentarelbundock" target="_blank" rel="noopener">@vincentarelbundock</a>
.</li>
<li>recipes: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/SimonDedman" target="_blank" rel="noopener">@SimonDedman</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>probably: <a href="https://github.com/abichat" target="_blank" rel="noopener">@abichat</a>
, <a href="https://github.com/ayueme" target="_blank" rel="noopener">@ayueme</a>
, <a href="https://github.com/dchiu911" target="_blank" rel="noopener">@dchiu911</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/frankiethull" target="_blank" rel="noopener">@frankiethull</a>
, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/Jeffrothschild" target="_blank" rel="noopener">@Jeffrothschild</a>
, <a href="https://github.com/jgaeb" target="_blank" rel="noopener">@jgaeb</a>
, <a href="https://github.com/jrwinget" target="_blank" rel="noopener">@jrwinget</a>
, <a href="https://github.com/mark-burdon" target="_blank" rel="noopener">@mark-burdon</a>
, <a href="https://github.com/martinhulin" target="_blank" rel="noopener">@martinhulin</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, <a href="https://github.com/teunbrand" target="_blank" rel="noopener">@teunbrand</a>
, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
, <a href="https://github.com/wjakethompson" target="_blank" rel="noopener">@wjakethompson</a>
, and <a href="https://github.com/yellowbridge" target="_blank" rel="noopener">@yellowbridge</a>
.</li>
<li>brulee: <a href="https://github.com/genec1" target="_blank" rel="noopener">@genec1</a>
, <a href="https://github.com/talegari" target="_blank" rel="noopener">@talegari</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
</ul>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-2025-q3/thumbnail-wd.jpg" length="838271" type="image/jpeg" />
    </item>
    <item>
      <title>tune version 2.0.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/tune-2/</link>
      <pubDate>Wed, 05 Nov 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/tune-2/</guid>
      <dc:creator>Max Kuhn</dc:creator>
      <dc:creator>Simon Couch</dc:creator>
      <dc:creator>Emil Hvitfeldt</dc:creator>
      <dc:creator>Hannah Frick</dc:creator><description><![CDATA[<p>We&rsquo;re very chuffed to announce the release of <a href="https://tune.tidymodels.org" target="_blank" rel="noopener">tune</a>
 <strong>2.0.0</strong>. tune is a package that can be used to resample models and/or optimize their tuning parameters.</p>
<p>You can install it from CRAN with:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;tune&#34;</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>This blog post will describe the two major updates to the package. You can see a full list of changes in the <a href="https://tune.tidymodels.org/news/index.html#tune-200" target="_blank" rel="noopener">release notes</a>
.</p>
<p>The two big improvements are new parallel processing features and postprocessing.</p>
<h2 id="using-future-or-mirai-for-parallel-processing">Using future or mirai for parallel processing
</h2>
<p><a href="https://www.tidyverse.org/blog/2024/04/tune-1-2-0/#modernized-support-for-parallel-processing" target="_blank" rel="noopener">Historically</a>
, we&rsquo;ve used the foreach package to run calculations in parallel. Sadly, that package is no longer under active development. We&rsquo;ve been <a href="https://tidyverse.org/blog/2024/04/tune-1-2-0/#modernized-support-for-parallel-processing" target="_blank" rel="noopener">progressively moving away</a>
 from it, and as of this version, it is deprecated. In its place, we&rsquo;ve added functionality for the <a href="https://future.futureverse.org/" target="_blank" rel="noopener">future</a>
 and <a href="https://mirai.r-lib.org/" target="_blank" rel="noopener">mirai</a>
 packages.</p>
<p>Previously, you would load a foreach parallel backend package, such as doParallel, doMC, or doFuture, and then register it. For example:</p>
<pre tabindex="0"><code>library(doParallel)
num_cores &lt;- parallel::detectCores() - 1
cl &lt;- makePSOCKcluster(num_cores)
registerDoParallel(cl)
</code></pre><p>Instead, you can use the future package via:</p>
<pre tabindex="0"><code>library(future)
plan(&#34;multisession&#34;)
</code></pre><p>or the mirai package via:</p>
<pre tabindex="0"><code>library(mirai)
num_cores &lt;- parallel::detectCores() - 1
daemons(num_cores)
</code></pre><p>Each of these is configurable to run in various ways, such as on remote servers.</p>
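<p>As a sketch, here is how each backend can be sized explicitly and then reset to sequential processing once you are done. The worker count of two is arbitrary, and in practice you would typically pick one backend rather than both.</p>

```r
library(future)
library(mirai)

# future: use two local background R sessions
plan(multisession, workers = 2)
# ... run tune_grid(), fit_resamples(), etc. here ...
plan(sequential)   # go back to sequential processing

# mirai: start two local daemons
daemons(2)
# ... run tune_grid(), fit_resamples(), etc. here ...
daemons(0)         # shut the daemons down
```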
<p><a href="https://tune.tidymodels.org/articles/extras/optimizations.html#foreach-legacy" target="_blank" rel="noopener">tidymodels.org</a>
 and the tune <a href="https://tune.tidymodels.org/reference/parallelism.html" target="_blank" rel="noopener">pkgdown site</a>
 have more information to help users switch away from foreach.</p>
<h2 id="tuning-your-postprocessor">Tuning your postprocessor
</h2>
<p>A postprocessor is an operation that modifies model predictions. For example, if your classifier can separate classes but its probability estimates are not accurate enough, you can add a <em>calibrator</em> operation that attempts to adjust those probability estimates. Another example, for binary classifiers, is adjusting the probability threshold that is used to classify a prediction as an event.</p>
<p>Currently, we&rsquo;ve enabled postprocessing using the <a href="https://www.tidyverse.org/blog/2024/10/postprocessing-preview/" target="_blank" rel="noopener">tailor package</a>
. The following operations are currently available:</p>
<ul>
<li><code>adjust_numeric_calibration()</code>: Estimate and apply a calibration model for regression problems.</li>
<li><code>adjust_numeric_range()</code>: Truncate the range of predictions.</li>
<li><code>adjust_probability_calibration()</code>: Estimate and apply a calibration model for classification problems.</li>
<li><code>adjust_probability_threshold()</code>: Convert binary class probabilities to hard class predictions using different thresholds.</li>
<li><code>adjust_equivocal_zone()</code>: <em>Decline</em> to predict a sample if its strongest class probability is low.</li>
<li><code>adjust_predictions_custom()</code>: A general <code>mutate()</code>-like adjustment.</li>
</ul>
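<p>Conceptually, a probability threshold is a simple rule. The base-R sketch below (not tailor&rsquo;s code) converts hypothetical probabilities for the event class into hard predictions. Following the tidymodels default, the first factor level is treated as the event; classifying probabilities at or above the threshold as events is an assumption of this sketch.</p>

```r
# Hypothetical predicted probabilities for the event class, "class_1"
prob_class_1 <- c(0.05, 0.18, 0.35, 0.62, 0.91)

# Assign "class_1" when its probability is at or above `threshold`
classify <- function(prob, threshold = 0.5) {
  factor(
    ifelse(prob >= threshold, "class_1", "class_2"),
    levels = c("class_1", "class_2")
  )
}

table(classify(prob_class_1))          # default 50% threshold: 2 predicted events
table(classify(prob_class_1, 0.15))    # lower threshold: 4 predicted events
```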
<p>If the operations have arguments, these can be tuned in the same way as the preprocessors (e.g., a recipe) or the supervised model. For example, let&rsquo;s tune the probability threshold for a random forest classifier.</p>
<p>We&rsquo;ll simulate some data with a class imbalance:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">set.seed</span><span class="p">(</span><span class="m">296</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">sim_data</span> <span class="o">&lt;-</span> <span class="nf">sim_classification</span><span class="p">(</span><span class="m">2000</span><span class="p">,</span> <span class="n">intercept</span> <span class="o">=</span> <span class="m">-12</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">sim_data</span> <span class="o">|&gt;</span> <span class="nf">count</span><span class="p">(</span><span class="n">class</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## # A tibble: 2 × 2
##   class       n
##   &lt;fct&gt;   &lt;int&gt;
## 1 class_1   234
## 2 class_2  1766
</code></pre><p>We&rsquo;ll resample them using 10-fold cross-validation:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">sim_rs</span> <span class="o">&lt;-</span> <span class="nf">vfold_cv</span><span class="p">(</span><span class="n">sim_data</span><span class="p">,</span> <span class="n">strata</span> <span class="o">=</span> <span class="n">class</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>We define a tailor object that tags the class probability threshold for optimization:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">tlr_spec</span> <span class="o">&lt;-</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">tailor</span><span class="p">()</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">adjust_probability_threshold</span><span class="p">(</span><span class="n">threshold</span> <span class="o">=</span> <span class="nf">tune</span><span class="p">())</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>We also specify a random forest that uses its default tuning parameters:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">rf_spec</span> <span class="o">&lt;-</span> <span class="nf">rand_forest</span><span class="p">(</span><span class="n">mode</span> <span class="o">=</span> <span class="s">&#34;classification&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">rf_thrsh_wflow</span> <span class="o">&lt;-</span> <span class="nf">workflow</span><span class="p">(</span><span class="n">class</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">rf_spec</span><span class="p">,</span> <span class="n">tlr_spec</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">rf_thrsh_wflow</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## ══ Workflow ════════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: rand_forest()
## Postprocessor: tailor
## 
## ── Preprocessor ────────────────────────────────────────────────────────
## class ~ .
## 
## ── Model ───────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
## 
## Computational engine: ranger 
## 
## 
## ── Postprocessor ───────────────────────────────────────────────────────
## 
## ── tailor ──────────────────────────────────────────────────────────────
## A binary postprocessor with 1 adjustment:
## 
## • Adjust probability threshold to optimized value.
</code></pre><p>With a class imbalance, the default 50% threshold yields high specificity but low sensitivity. When we alter the threshold, those numbers will change, and we can select the best trade-off for our application. Let&rsquo;s tune the workflow:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">cls_mtr</span> <span class="o">&lt;-</span> <span class="nf">metric_set</span><span class="p">(</span><span class="n">roc_auc</span><span class="p">,</span> <span class="n">sensitivity</span><span class="p">,</span> <span class="n">specificity</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># To run all resamples in parallel:</span>
</span></span><span class="line"><span class="cl"><span class="n">mirai</span><span class="o">::</span><span class="nf">daemons</span><span class="p">(</span><span class="m">10</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">set.seed</span><span class="p">(</span><span class="m">985</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">rf_thrsh_res</span> <span class="o">&lt;-</span> 
</span></span><span class="line"><span class="cl">  <span class="n">rf_thrsh_wflow</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">tune_grid</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">resamples</span> <span class="o">=</span> <span class="n">sim_rs</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">grid</span> <span class="o">=</span> <span class="nf">tibble</span><span class="p">(</span><span class="n">threshold</span> <span class="o">=</span> <span class="nf">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0.6</span><span class="p">,</span> <span class="n">by</span> <span class="o">=</span> <span class="m">0.01</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">    <span class="n">metrics</span> <span class="o">=</span> <span class="n">cls_mtr</span>
</span></span><span class="line"><span class="cl">  <span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Let&rsquo;s visualize the results:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">autoplot</span><span class="p">(</span><span class="n">rf_thrsh_res</span><span class="p">)</span> <span class="o">+</span> <span class="nf">lims</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0</span><span class="o">:</span><span class="m">1</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><div class="not-prose"><figure>
    <img class="h-auto max-w-full rounded-lg"
      src="https://posit-open-source.netlify.app/blog/tidyverse/2025/tune-2/figure/autoplot-1.png"
      alt="plot of chunk autoplot" 
      loading="lazy"
    >
  </figure></div>
</p>
<p>Sensitivity improves as we <em>reduce</em> the threshold. Specificity declines slowly relative to the gain in sensitivity until thresholds below 10% are used. The ROC AUC is constant over the threshold since it only uses the estimated class probabilities, which are unaffected by the threshold.</p>
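<p>To see why lowering the threshold trades specificity for sensitivity, here is a base-R sketch on simulated data. The outcome, the beta-distributed probabilities, and the threshold grid are all hypothetical and unrelated to <code>sim_classification()</code>; sensitivity and specificity are computed directly from their definitions.</p>

```r
set.seed(1)
n <- 2000

# Hypothetical imbalanced outcome: roughly 12% events
truth <- factor(
  ifelse(runif(n) < 0.12, "event", "nonevent"),
  levels = c("event", "nonevent")
)

# Hypothetical event probabilities: events tend to get higher values
prob_event <- ifelse(truth == "event", rbeta(n, 3, 4), rbeta(n, 1, 6))

thresholds <- seq(0.05, 0.6, by = 0.05)

# Sensitivity: share of true events predicted as events
sens <- sapply(thresholds, function(t) mean(prob_event[truth == "event"] >= t))
# Specificity: share of true non-events predicted as non-events
spec <- sapply(thresholds, function(t) mean(prob_event[truth == "nonevent"] < t))

round(data.frame(threshold = thresholds, sensitivity = sens, specificity = spec), 3)
```

<p>Because every threshold only relabels the same probabilities, sensitivity can only fall and specificity can only rise as the threshold increases, while a probability-based metric such as ROC AUC does not change.</p>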
<p>We&rsquo;ve taken great pains to avoid redundant calculations. In this example, for each resample, a single random forest model is trained, and then the postprocessing grid is evaluated. This <em>conditional execution</em> strategy is used to fit the fewest possible preprocessors, models, and postprocessors.</p>
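<p>The conditional execution idea can be sketched in a few lines of Python. This is only an illustration, not tune&rsquo;s implementation: the resamples, <code>fit_and_predict()</code>, and <code>apply_threshold()</code> are hypothetical stand-ins. The expensive model fit happens once per resample, and every threshold in the postprocessing grid reuses the same predictions.</p>
<div class="highlight">
<pre class="chroma"><code class="language-python" data-lang="python">import operator

# Hypothetical resamples: each holds the assessment-set probabilities that a
# fitted model would produce (a real workflow would train on the analysis set).
resamples = [
    {"id": "Fold1", "prob": [0.9, 0.2, 0.4]},
    {"id": "Fold2", "prob": [0.1, 0.7, 0.35]},
]
thresholds = [0.2, 0.3, 0.4, 0.5]
fit_log = []  # records each (expensive) model fit


def fit_and_predict(resample):
    """Stand-in for the expensive step: train the model once, then predict."""
    fit_log.append(resample["id"])
    return resample["prob"]


def apply_threshold(prob, threshold):
    """Cheap postprocessing: call probabilities at or above the threshold events."""
    return [int(operator.ge(p, threshold)) for p in prob]


results = []
for resample in resamples:
    prob = fit_and_predict(resample)  # one model fit per resample...
    for threshold in thresholds:      # ...reused for every threshold in the grid
        results.append((resample["id"], threshold, apply_threshold(prob, threshold)))

print(len(fit_log), "fits,", len(results), "evaluations")
</code></pre>
</div>
<p>Refitting inside the threshold loop would train eight models here instead of two; the savings grow with the size of the postprocessing grid.</p>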
<p>For this classification example, recent updates to the <a href="https://desirability2.tidymodels.org/#using-with-the-tune-package" target="_blank" rel="noopener">desirability2</a>
 package can enable you to jointly find the best sensitivity/specificity trade-off using the threshold parameter <em>and</em> model calibration/separation using other parameters.</p>
<p>We&rsquo;ll add more examples and tutorials to tidymodels.org to showcase what we can do with postprocessing.</p>
<h2 id="whats-next">What&rsquo;s next
</h2>
<p>This release was a race toward posit::conf(2025). We had to focus on its two big features, since we taught workshops that use them. A few relatively minor issues remain to be addressed before the year closes.</p>
<p>One is to switch the package we use for Gaussian processes in Bayesian optimization from GPfit to the <a href="https://github.com/CollinErickson/GauPro" target="_blank" rel="noopener">GauPro</a> package. GPfit is no longer actively supported, and GauPro has a few features that we&rsquo;d love to have, in particular better kernel methods for non-numeric tuning parameters (e.g., the type of activation function used in neural networks). We hope to have another release out before the end of the year.</p>
<p>Another near-future development goal is to have comprehensive integration for quantile regression models. We&rsquo;ve added a few parsnip engines already and will expand the support in yardstick and tune.</p>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>We&rsquo;d like to thank everyone who contributed since the previous version: <a href="https://github.com/3styleJam" target="_blank" rel="noopener">@3styleJam</a>
, <a href="https://github.com/Diyar0D" target="_blank" rel="noopener">@Diyar0D</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/MatthieuStigler" target="_blank" rel="noopener">@MatthieuStigler</a>
, <a href="https://github.com/MattJEM" target="_blank" rel="noopener">@MattJEM</a>
, <a href="https://github.com/mthulin" target="_blank" rel="noopener">@mthulin</a>
, <a href="https://github.com/tjburch" target="_blank" rel="noopener">@tjburch</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/tune-2/thumbnail-wd.jpg" length="359031" type="image/jpeg" />
    </item>
    <item>
      <title>mall 0.2.0</title>
      <link>https://posit-open-source.netlify.app/blog/ai/edgarmall02/</link>
      <pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/ai/edgarmall02/</guid>
      <dc:creator>Edgar Ruiz</dc:creator><description><![CDATA[<p><a href="https://mlverse.github.io/mall/" target="_blank" rel="noopener">mall</a> uses Large Language Models (LLMs) to run
Natural Language Processing (NLP) operations against your data. The package
is available for both R and Python. Version 0.2.0 has been released to
<a href="https://cran.r-project.org/web/packages/mall/index.html" target="_blank" rel="noopener">CRAN</a> and
<a href="https://pypi.org/project/mlverse-mall/" target="_blank" rel="noopener">PyPI</a>, respectively.</p>
<p>In R, you can install the latest version with:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;mall&#34;</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>In Python, with:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">pip install mlverse-mall
</span></span></code></pre></td></tr></table>
</div>
</div><p>This release expands the number of LLM providers you can use with <code>mall</code>. It also
lets Python users run the NLP operations over string vectors, and it adds support
for parallel requests in R.</p>
<p>We are also excited to announce a brand-new cheatsheet for this package. It
is available in both print (PDF) and HTML formats!</p>
<h2 id="more-llm-providers">More LLM providers
</h2>
<p>The biggest highlight of this release is the ability to use external LLM
providers such as <a href="https://openai.com/" target="_blank" rel="noopener">OpenAI</a>, <a href="https://gemini.google.com/" target="_blank" rel="noopener">Gemini</a>,
and <a href="https://www.anthropic.com/" target="_blank" rel="noopener">Anthropic</a>. Instead of writing an integration for
each provider one by one, <code>mall</code> uses specialized integration packages as
intermediaries.</p>
<p>In R, <code>mall</code> uses the <a href="https://ellmer.tidyverse.org/index.html" target="_blank" rel="noopener"><code>ellmer</code></a> package
to integrate with <a href="https://ellmer.tidyverse.org/reference/index.html#chatbots" target="_blank" rel="noopener">a variety of LLM providers</a>.
To use the new feature, first create a chat connection, and then pass that
connection to <code>llm_use()</code>. Here is an example of connecting to and using OpenAI:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;ellmer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">mall</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">ellmer</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">chat</span> <span class="o">&lt;-</span> <span class="nf">chat_openai</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; Using model = &#34;gpt-4.1&#34;.</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">llm_use</span><span class="p">(</span><span class="n">chat</span><span class="p">,</span> <span class="n">.cache</span> <span class="o">=</span> <span class="s">&#34;_my_cache&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; ── mall session object </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; Backend: ellmer</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; LLM session: model:gpt-4.1</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; R session: cache_folder:_my_cache</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>In Python, <code>mall</code> uses <a href="https://posit-dev.github.io/chatlas/" target="_blank" rel="noopener"><code>chatlas</code></a> as
the integration point with the LLM. <code>chatlas</code> also integrates with
<a href="https://posit-dev.github.io/chatlas/reference/#chat-model-providers" target="_blank" rel="noopener">several LLM providers</a>.
To use it, first instantiate a <code>chatlas</code> chat object, and then pass it
to the <a href="https://pola.rs/" target="_blank" rel="noopener">Polars</a> data frame via the <code>&lt;DF&gt;.llm.use()</code> method:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Install first from a shell: pip install chatlas</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">mall</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">chatlas</span> <span class="kn">import</span> <span class="n">ChatOpenAI</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">chat</span> <span class="o">=</span> <span class="n">ChatOpenAI</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">data</span> <span class="o">=</span> <span class="n">mall</span><span class="o">.</span><span class="n">MallData</span>
</span></span><span class="line"><span class="cl"><span class="n">reviews</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">reviews</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">reviews</span><span class="o">.</span><span class="n">llm</span><span class="o">.</span><span class="n">use</span><span class="p">(</span><span class="n">chat</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; {&#39;backend&#39;: &#39;chatlas&#39;, &#39;chat&#39;: &lt;Chat OpenAI/gpt-4.1 turns=0 tokens=0/0 $0.0&gt;</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; , &#39;_cache&#39;: &#39;_mall_cache&#39;}</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Connecting <code>mall</code> to external LLM providers introduces cost as a consideration.
Most providers charge for the use of their API, so processing a large table
with long texts could be expensive.</p>
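<p>Before running a large table through a paid provider, a back-of-the-envelope estimate can help. The sketch below uses a hypothetical helper, <code>estimate_cost()</code>, which is not part of <code>mall</code>. It assumes roughly four characters per token (a common rule of thumb for English text) and uses placeholder per-million-token prices that you would replace with your provider&rsquo;s current rates.</p>
<div class="highlight">
<pre class="chroma"><code class="language-python" data-lang="python">def estimate_cost(texts, instruction_tokens, output_tokens,
                  usd_per_m_input, usd_per_m_output, chars_per_token=4):
    """Rough cost of sending one prompt per text (hypothetical helper).

    The chars-per-token ratio and both prices are assumptions; substitute
    your provider's tokenizer and published rates for a real estimate.
    """
    # Each prompt carries its own text plus the shared NLP instructions.
    input_tokens = sum(len(t) / chars_per_token + instruction_tokens for t in texts)
    total_output = output_tokens * len(texts)
    return (input_tokens * usd_per_m_input
            + total_output * usd_per_m_output) / 1_000_000


# Placeholder scenario: 10,000 reviews of about 400 characters each,
# ~50 instruction tokens and ~5 output tokens per prompt,
# at $2 (input) and $8 (output) per million tokens.
reviews = ["x" * 400] * 10_000
print(f"${estimate_cost(reviews, 50, 5, 2.0, 8.0):.2f}")
</code></pre>
</div>
<p>Because the cost scales linearly with the number of rows and the length of each text, trying a small sample of the table first is a cheap way to check the estimate.</p>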
<h2 id="parallel-requests-r-only">Parallel requests (R only)
</h2>
<p>A new feature introduced in <a href="https://www.tidyverse.org/blog/2025/07/ellmer-0-3-0" target="_blank" rel="noopener"><code>ellmer</code> 0.3.0</a>
makes it possible to submit multiple prompts in parallel, rather than in sequence.
This makes it faster, and potentially cheaper, to process a table. If the provider
supports this feature, <code>ellmer</code> can leverage it via the
<a href="https://ellmer.tidyverse.org/reference/parallel_chat.html" target="_blank" rel="noopener"><code>parallel_chat()</code></a>
function. Gemini and OpenAI support the feature.</p>
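<p>The speed-up from sending prompts in parallel rather than in sequence can be illustrated with a minimal Python sketch. It only shows overlapping requests on the client side with the standard library&rsquo;s <code>ThreadPoolExecutor</code>; it does not model how <code>ellmer</code> or any provider implements parallel chat, and <code>fake_request()</code> is a hypothetical stand-in for an API call.</p>
<div class="highlight">
<pre class="chroma"><code class="language-python" data-lang="python">import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(prompt):
    """Hypothetical stand-in for one LLM API call that takes about 0.2 seconds."""
    time.sleep(0.2)
    return prompt.upper()


prompts = ["review one", "review two", "review three", "review four"]

# Sequential: each call waits for the previous one to finish.
start = time.perf_counter()
sequential = [fake_request(p) for p in prompts]
seq_time = time.perf_counter() - start

# Parallel: up to four calls are in flight at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(fake_request, prompts))
par_time = time.perf_counter() - start

# pool.map() returns results in input order, so answers still line up with rows.
print(sequential == parallel, f"{seq_time:.1f}s vs {par_time:.1f}s")
</code></pre>
</div>
<p>Keeping results in input order matters for table workflows, since each answer has to be written back to the row its prompt came from.</p>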
<p>In the new release of <code>mall</code>, the integration with <code>ellmer</code> has been written
specifically to take advantage of parallel chat. The internals have been rewritten to
submit the NLP-specific instructions as a system message in order to
reduce the size of each prompt. The cache system has also been
retooled to support batched requests.</p>
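<p>A shared system message and a batch-aware cache can be sketched together in plain Python. This is a hypothetical illustration, not <code>mall</code>&rsquo;s internal code: <code>send_batch()</code> stands in for a parallel request, and the cache is an in-memory dictionary keyed by a hash of the instructions plus the text. Only texts without a cached answer are sent, once each.</p>
<div class="highlight">
<pre class="chroma"><code class="language-python" data-lang="python">import hashlib

SYSTEM_PROMPT = "Classify the sentiment of the text as positive or negative."

cache = {}   # maps request hash to answer
calls = []   # records every batch actually sent, for illustration


def cache_key(system_prompt, text):
    """Hash the instructions together with the text so each request has a stable key."""
    return hashlib.sha256(f"{system_prompt}\n{text}".encode("utf-8")).hexdigest()


def send_batch(system_prompt, texts):
    """Hypothetical stand-in for a parallel request to an LLM provider.

    The shared instructions go in the system message, so each prompt in the
    batch carries only its own text.
    """
    calls.append(list(texts))
    return ["positive" if "happy" in t else "negative" for t in texts]


def classify(texts, system_prompt=SYSTEM_PROMPT):
    keys = [cache_key(system_prompt, t) for t in texts]
    # Collect uncached texts once each, preserving their order.
    missing = list(dict.fromkeys(t for t, k in zip(texts, keys) if k not in cache))
    if missing:
        answers = send_batch(system_prompt, missing)
        for text, answer in zip(missing, answers):
            cache[cache_key(system_prompt, text)] = answer
    return [cache[k] for k in keys]


print(classify(["I am happy", "I am sad"]))
print(classify(["I am sad", "So happy today"]))  # only the new text is sent
print(calls)
</code></pre>
</div>
<p>Including the instructions in the cache key means that changing the prompt, for example switching from sentiment to translation, never returns a stale answer for the same text.</p>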
<h2 id="nlp-operations-without-a-table">NLP operations without a table
</h2>
<p>Since its initial version, <code>mall</code> has let R users perform the
NLP operations over a string vector, in other words, without needing a table.
Starting with this release, the Python version of <code>mall</code> provides the same
functionality.</p>
<p><code>mall</code> can process vectors contained in a <code>list</code> object. To use this feature, initialize a
new <code>LLMVec</code> object with either an Ollama model or a <code>chatlas</code> <code>Chat</code>
object, and then call the same NLP functions available in the Polars extension.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Initialize a Chat object</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">chatlas</span> <span class="kn">import</span> <span class="n">ChatOllama</span>
</span></span><span class="line"><span class="cl"><span class="n">chat</span> <span class="o">=</span> <span class="n">ChatOllama</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="s2">&#34;llama3.2&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Pass it to a new LLMVec</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">mall</span> <span class="kn">import</span> <span class="n">LLMVec</span>
</span></span><span class="line"><span class="cl"><span class="n">llm</span> <span class="o">=</span> <span class="n">LLMVec</span><span class="p">(</span><span class="n">chat</span><span class="p">)</span>    
</span></span></code></pre></td></tr></table>
</div>
</div><p>Access the functions via the new LLMVec object, and pass the text to be processed.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">llm</span><span class="o">.</span><span class="n">sentiment</span><span class="p">([</span><span class="s2">&#34;I am happy&#34;</span><span class="p">,</span> <span class="s2">&#34;I am sad&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; [&#39;positive&#39;, &#39;negative&#39;]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">llm</span><span class="o">.</span><span class="n">translate</span><span class="p">([</span><span class="s2">&#34;Este es el mejor dia!&#34;</span><span class="p">],</span> <span class="s2">&#34;english&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; [&#39;This is the best day!&#39;]</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>For more information, visit the <a href="https://mlverse.github.io/mall/reference/LlmVec.html" target="_blank" rel="noopener">LLMVec</a> reference page.</p>
<h2 id="new-cheatsheet">New cheatsheet
</h2>
<p>The brand new official cheatsheet is now available from Posit:
<a href="https://rstudio.github.io/cheatsheets/nlp-with-llms.pdf" target="_blank" rel="noopener">Natural Language processing using LLMs in R/Python</a>
.
Its main feature is that one side of the page is dedicated to the R version,
and the other side to the Python version.</p>
<p><div class="not-prose"><figure>
    <img class="h-auto max-w-full rounded-lg"
      src="https://posit-open-source.netlify.app/blog/ai/edgarmall02/images/cheatsheet.png"
      alt="" 
      loading="lazy"
    >
  </figure></div>
</p>
<p>A web page version is also available on the official
<a href="https://rstudio.github.io/cheatsheets/html/nlp-with-llms.html" target="_blank" rel="noopener">cheatsheet site</a>. It takes
advantage of tabs that let you switch between the R and Python
explanations and examples.</p>
<p><div class="not-prose"><figure>
    <img class="h-auto max-w-full rounded-lg"
      src="https://posit-open-source.netlify.app/blog/ai/edgarmall02/images/html-cheatsheet.png"
      alt="" 
      loading="lazy"
    >
  </figure></div>
</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/ai/edgarmall02/thumbnail.png" length="690897" type="image/png" />
    </item>
    <item>
      <title>recipes 1.3.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/recipes-1-3-0/</link>
      <pubDate>Mon, 28 Apr 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/recipes-1-3-0/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[
<p>We&rsquo;re thrilled to announce the release of <a href="https://recipes.tidymodels.org/" target="_blank" rel="noopener">recipes</a>
 1.3.0. recipes lets you create a pipeable sequence of feature engineering steps.</p>
<p>You can install it from CRAN with:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"recipes"</span><span class='o'>)</span></span></code></pre>
</div>
<p>This blog post will walk through some of the highlights of this release, which include changes to how <code>strings_as_factors</code> is specified, the deprecation of <a href="https://recipes.tidymodels.org/reference/step_select.html" target="_blank" rel="noopener"><code>step_select()</code></a>, a new <code>contrasts</code> argument for <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a>, and improvements to <a href="https://recipes.tidymodels.org/reference/step_impute_bag.html" target="_blank" rel="noopener"><code>step_impute_bag()</code></a>.</p>
<p>You can see a full list of changes in the <a href="https://recipes.tidymodels.org/news/index.html#recipes-130" target="_blank" rel="noopener">release notes</a>
.</p>
<p>Let&rsquo;s first load the package:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/recipes'>recipes</a></span><span class='o'>)</span></span></code></pre>
</div>
<h2 id="strings_as_factors"><code>strings_as_factors</code>
</h2>
<p>By default, recipes convert predictor strings to factors, and the option controlling this has lived in <a href="https://recipes.tidymodels.org/reference/prep.html" target="_blank" rel="noopener"><code>prep()</code></a>. This caused problems when you wanted to set <code>strings_as_factors = FALSE</code> for a recipe used somewhere else, such as in a workflow, which calls <code>prep()</code> for you.</p>
<p>This is no longer an issue, as we have moved the argument to <a href="https://recipes.tidymodels.org/reference/recipe.html" target="_blank" rel="noopener"><code>recipe()</code></a> itself. At the same time, we are deprecating the use of <code>strings_as_factors</code> in <a href="https://recipes.tidymodels.org/reference/prep.html" target="_blank" rel="noopener"><code>prep()</code></a>. Here is an example:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://modeldata.tidymodels.org'>modeldata</a></span><span class='o'>)</span></span>
<span><span class='nv'>tate_text</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4,284 × 5</span></span></span>
<span><span class='c'>#&gt;        id artist             title                                  medium  year</span></span>
<span><span class='c'>#&gt;     <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                                  <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>  <span style='text-decoration: underline;'>21</span>926 Absalon            Proposals for a Habitat                Video…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>  <span style='text-decoration: underline;'>20</span>472 Auerbach, Frank    Michael                                Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>  <span style='text-decoration: underline;'>20</span>474 Auerbach, Frank    Geoffrey                               Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>  <span style='text-decoration: underline;'>20</span>473 Auerbach, Frank    Jake                                   Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>  <span style='text-decoration: underline;'>20</span>513 Auerbach, Frank    To the Studios                         Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>  <span style='text-decoration: underline;'>21</span>389 Ayres, OBE Gillian Phaëthon                               Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>121</span>187 Barlow, Phyllida   Untitled                               Acryl…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>  <span style='text-decoration: underline;'>19</span>455 Baselitz, Georg    Green VIII                             Woodc…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>  <span style='text-decoration: underline;'>20</span>938 Beattie, Basil     Present Bound                          Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>105</span>941 Beuys, Joseph      Joseph Beuys: A Private Collection. A… Print…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 4,274 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>We load the modeldata package to get <code>tate_text</code>, which has a character column, <code>title</code>. If we don&rsquo;t do anything, it is turned into a factor.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>tate_text</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span><span class='nv'>tate_text</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4,284 × 5</span></span></span>
<span><span class='c'>#&gt;        id artist             title                                  medium  year</span></span>
<span><span class='c'>#&gt;     <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>                                  <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>  <span style='text-decoration: underline;'>21</span>926 Absalon            Proposals for a Habitat                Video…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>  <span style='text-decoration: underline;'>20</span>472 Auerbach, Frank    Michael                                Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>  <span style='text-decoration: underline;'>20</span>474 Auerbach, Frank    Geoffrey                               Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>  <span style='text-decoration: underline;'>20</span>473 Auerbach, Frank    Jake                                   Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>  <span style='text-decoration: underline;'>20</span>513 Auerbach, Frank    To the Studios                         Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>  <span style='text-decoration: underline;'>21</span>389 Ayres, OBE Gillian Phaëthon                               Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>121</span>187 Barlow, Phyllida   Untitled                               Acryl…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>  <span style='text-decoration: underline;'>19</span>455 Baselitz, Georg    Green VIII                             Woodc…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>  <span style='text-decoration: underline;'>20</span>938 Beattie, Basil     Present Bound                          Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>105</span>941 Beuys, Joseph      Joseph Beuys: A Private Collection. A… Print…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 4,274 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>But if we set <code>strings_as_factors = FALSE</code> in <a href="https://recipes.tidymodels.org/reference/recipe.html" target="_blank" rel="noopener"><code>recipe()</code></a>, <code>title</code> stays a character column.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>tate_text</span>, strings_as_factors <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span><span class='nv'>tate_text</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4,284 × 5</span></span></span>
<span><span class='c'>#&gt;        id artist             title                                  medium  year</span></span>
<span><span class='c'>#&gt;     <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                                  <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>  <span style='text-decoration: underline;'>21</span>926 Absalon            Proposals for a Habitat                Video…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>  <span style='text-decoration: underline;'>20</span>472 Auerbach, Frank    Michael                                Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>  <span style='text-decoration: underline;'>20</span>474 Auerbach, Frank    Geoffrey                               Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>  <span style='text-decoration: underline;'>20</span>473 Auerbach, Frank    Jake                                   Etchi…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>  <span style='text-decoration: underline;'>20</span>513 Auerbach, Frank    To the Studios                         Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>  <span style='text-decoration: underline;'>21</span>389 Ayres, OBE Gillian Phaëthon                               Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>121</span>187 Barlow, Phyllida   Untitled                               Acryl…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>  <span style='text-decoration: underline;'>19</span>455 Baselitz, Georg    Green VIII                             Woodc…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>  <span style='text-decoration: underline;'>20</span>938 Beattie, Basil     Present Bound                          Oil p…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>105</span>941 Beuys, Joseph      Joseph Beuys: A Private Collection. A… Print…  <span style='text-decoration: underline;'>1</span>990</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 4,274 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>This change should also make pragmatic sense, since whether you want to turn strings into factors is something that should be encoded in the recipe itself.</p>
<h2 id="deprecating-step_select">Deprecating <code>step_select()</code>
</h2>
<p>We have started the process of deprecating <a href="https://recipes.tidymodels.org/reference/step_select.html" target="_blank" rel="noopener"><code>step_select()</code></a>. Given the number of issues people have had with the step, and the fact that it doesn&rsquo;t play well with workflows, we think this is the right call.</p>
<p>There were two main use cases for <a href="https://recipes.tidymodels.org/reference/step_select.html" target="_blank" rel="noopener"><code>step_select()</code></a>: removing variables and selecting variables. Removing variables was done by negating the selection with <code>-</code> in <a href="https://recipes.tidymodels.org/reference/step_select.html" target="_blank" rel="noopener"><code>step_select()</code></a>, like so:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_select.html'>step_select</a></span><span class='o'>(</span><span class='o'>-</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"d"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 9</span></span></span>
<span><span class='c'>#&gt;      cyl    hp    wt  qsec    vs    am  gear  carb   mpg</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>     6   110  2.62  16.5     0     1     4     4  21  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>     6   110  2.88  17.0     0     1     4     4  21  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>     4    93  2.32  18.6     1     1     4     1  22.8</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>     6   110  3.22  19.4     1     0     3     1  21.4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>     8   175  3.44  17.0     0     0     3     2  18.7</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>     6   105  3.46  20.2     1     0     3     1  18.1</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>     8   245  3.57  15.8     0     0     3     4  14.3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>     4    62  3.19  20       1     0     4     2  24.4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>     4    95  3.15  22.9     1     0     4     2  22.8</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>     6   123  3.44  18.3     1     0     4     4  19.2</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>These use cases can seamlessly be converted to use <a href="https://recipes.tidymodels.org/reference/step_rm.html" target="_blank" rel="noopener"><code>step_rm()</code></a>
 without the <code>-</code> for the same result.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_rm.html'>step_rm</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"d"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 9</span></span></span>
<span><span class='c'>#&gt;      cyl    hp    wt  qsec    vs    am  gear  carb   mpg</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>     6   110  2.62  16.5     0     1     4     4  21  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>     6   110  2.88  17.0     0     1     4     4  21  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>     4    93  2.32  18.6     1     1     4     1  22.8</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>     6   110  3.22  19.4     1     0     3     1  21.4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>     8   175  3.44  17.0     0     0     3     2  18.7</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>     6   105  3.46  20.2     1     0     3     1  18.1</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>     8   245  3.57  15.8     0     0     3     4  14.3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>     4    62  3.19  20       1     0     4     2  24.4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>     4    95  3.15  22.9     1     0     4     2  22.8</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>     6   123  3.44  18.3     1     0     4     4  19.2</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>For selecting variables there are two cases. The first is using it as a tool to choose which variables go into the model. For this, we recommend that you use <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a> before passing the data into <a href="https://recipes.tidymodels.org/reference/recipe.html" target="_blank" rel="noopener"><code>recipe()</code></a>. This is especially helpful since <a href="https://www.tidyverse.org/blog/2024/07/recipes-1-1-0/#column-type-checking" target="_blank" rel="noopener">recipes are stricter with respect to their input types</a>, so it pays to pass in only the data you need.</p>
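<p>As a minimal sketch of the first case, the columns are chosen with <code>dplyr::select()</code> before the recipe is created. The specific columns and the <code>step_normalize()</code> step are only there for illustration; any later steps would work the same way.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(recipes)

# keep only the outcome and the predictors the model needs (illustrative choice)
cars_small = dplyr::select(mtcars, mpg, cyl, hp, wt)

# the recipe only ever sees these four columns
rec = recipe(mpg ~ ., data = cars_small) |>
  step_normalize(all_numeric_predictors()) |>
  prep()

baked = bake(rec, new_data = NULL)
baked</code></pre>
</div>
<p>Because the unused columns never enter the recipe, they also do not need to be supplied when the recipe is later applied to new data.</p>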
<p>The second case is when you need to do the selection after another step has been applied. You can still do that with <a href="https://recipes.tidymodels.org/reference/step_rm.html" target="_blank" rel="noopener"><code>step_rm()</code></a> by removing all predictors except the ones you want to keep:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># `rec` is your recipe and `keep_vars` is a character vector of the predictors to keep</span></span>
<span><span class='nf'><a href='https://recipes.tidymodels.org/reference/step_rm.html'>step_rm</a></span><span class='o'>(</span><span class='nv'>rec</span>, <span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_predictors</a></span><span class='o'>(</span><span class='o'>)</span>, <span class='o'>-</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/all_of.html'>all_of</a></span><span class='o'>(</span><span class='nv'>keep_vars</span><span class='o'>)</span><span class='o'>)</span></span></code></pre>
</div>
<h2 id="step_dummy-contrasts-argument"><code>step_dummy()</code> contrasts argument
</h2>
<p>Contrasts such as <a href="https://rdrr.io/r/stats/contrast.html" target="_blank" rel="noopener"><code>contr.treatment()</code></a>
 and <a href="https://rdrr.io/r/stats/contrast.html" target="_blank" rel="noopener"><code>contr.poly()</code></a>
 are used in <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a>
 to determine how the step translates categorical values into one or more numeric columns. Traditionally, the contrasts were set using <a href="https://rdrr.io/r/base/options.html" target="_blank" rel="noopener"><code>options()</code></a>
 like so:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/options.html'>options</a></span><span class='o'>(</span>contrasts <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span>unordered <span class='o'>=</span> <span class='s'>"contr.poly"</span>, ordered <span class='o'>=</span> <span class='s'>"contr.poly"</span><span class='o'>)</span><span class='o'>)</span></span></code></pre>
</div>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>species</span> <span class='o'>+</span> <span class='nv'>island</span>, <span class='nv'>penguins</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_dummy.html'>step_dummy</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_nominal_predictors</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='nv'>penguins</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 4</span></span></span>
<span><span class='c'>#&gt;    species_Chinstrap species_Gentoo island_Dream island_Torgersen</span></span>
<span><span class='c'>#&gt;                <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>          <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>        <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>            <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>The problem with this approach is that the step reads the contrasts from <a href="https://rdrr.io/r/base/options.html" target="_blank" rel="noopener"><code>options()</code></a> when it needs them instead of storing them. This means that if you put this recipe into production, you need to set the option in the production environment to match the one used in the training environment.</p>
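<p>To see why relying on a global option is fragile, here is a small base R sketch using <a href="https://rdrr.io/r/stats/model.matrix.html" target="_blank" rel="noopener"><code>model.matrix()</code></a>, which looks up <code>getOption(&quot;contrasts&quot;)</code> at the time it is called. It illustrates the general mechanism only and is not taken from the recipes source: the identical call encodes the same factor differently depending on which option is in effect.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># a three-level factor to encode
f = factor(c("a", "b", "c"))

# encode it while treatment contrasts are the default for unordered factors
old = options(contrasts = c(unordered = "contr.treatment", ordered = "contr.poly"))
treatment = model.matrix(~ f)

# the identical call after the global option has changed
options(contrasts = c(unordered = "contr.poly", ordered = "contr.poly"))
polynomial = model.matrix(~ f)

# restore the previous setting
options(old)

colnames(treatment)   # "(Intercept)", "fb", "fc"
colnames(polynomial)  # "(Intercept)", "f.L", "f.Q"</code></pre>
</div>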
<p>To fix this issue, we have given <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a> a <code>contrasts</code> argument that works in much the same way. You specify the contrast you want, and it is stored in the recipe object for easy deployment.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>species</span> <span class='o'>+</span> <span class='nv'>island</span>, <span class='nv'>penguins</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_dummy.html'>step_dummy</a></span><span class='o'>(</span></span>
<span>    <span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_nominal_predictors</a></span><span class='o'>(</span><span class='o'>)</span>, contrasts <span class='o'>=</span> <span class='s'>"contr.poly"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='nv'>penguins</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 4</span></span></span>
<span><span class='c'>#&gt;    species_Chinstrap species_Gentoo island_Dream island_Torgersen</span></span>
<span><span class='c'>#&gt;                <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>          <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>        <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>            <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>            -<span style='color: #BB0000;'>0.707</span>          0.408        0.707            0.408</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>If you are using a contrast function from an external package, such as <a href="https://hardhat.tidymodels.org/reference/contr_one_hot.html" target="_blank" rel="noopener"><code>hardhat::contr_one_hot()</code></a>, you need to load the package with <a href="https://github.com/tidymodels/hardhat" target="_blank" rel="noopener"><code>library(hardhat)</code></a> in the environment you are working in and set <code>contrasts = &quot;contr_one_hot&quot;</code>. You will also need to call <a href="https://github.com/tidymodels/hardhat" target="_blank" rel="noopener"><code>library(hardhat)</code></a> in any production environment where you use this recipe.</p>
<h2 id="tidyselect-can-be-used-everywhere">tidyselect can be used everywhere
</h2>
<p>Several steps, such as <a href="https://recipes.tidymodels.org/reference/step_pls.html" target="_blank" rel="noopener"><code>step_pls()</code></a> and <a href="https://recipes.tidymodels.org/reference/step_impute_bag.html" target="_blank" rel="noopener"><code>step_impute_bag()</code></a>, require the selection of more than just the affected columns. <a href="https://recipes.tidymodels.org/reference/step_pls.html" target="_blank" rel="noopener"><code>step_pls()</code></a> needs you to select an <code>outcome</code> variable, and <a href="https://recipes.tidymodels.org/reference/step_impute_bag.html" target="_blank" rel="noopener"><code>step_impute_bag()</code></a> needs you to select which variables to impute with (<code>impute_with</code>) if you don&rsquo;t want to use all predictors. Previously, these needed to be strings or use special selectors like <a href="https://recipes.tidymodels.org/reference/step_impute_bag.html" target="_blank" rel="noopener"><code>imp_vars()</code></a>. That is no longer necessary: you can now use tidyselect in these arguments too.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_pls.html'>step_pls</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_predictors</a></span><span class='o'>(</span><span class='o'>)</span>, outcome <span class='o'>=</span> <span class='nv'>mpg</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 3</span></span></span>
<span><span class='c'>#&gt;      mpg   PLS1   PLS2</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>  21    0.693  0.895</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>  21    0.650  0.654</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>  22.8  2.78   0.378</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>  21.4  0.210 -<span style='color: #BB0000;'>0.368</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>  18.7 -<span style='color: #BB0000;'>1.95</span>   0.845</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>  18.1  0.137 -<span style='color: #BB0000;'>0.624</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>  14.3 -<span style='color: #BB0000;'>2.77</span>   0.364</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>  24.4  1.81  -<span style='color: #BB0000;'>1.30</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>  22.8  2.12  -<span style='color: #BB0000;'>1.95</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>  19.2  0.531 -<span style='color: #BB0000;'>1.51</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>Arguments that allow for multiple selections now also work with recipes selectors like <a href="https://recipes.tidymodels.org/reference/has_role.html" target="_blank" rel="noopener"><code>all_numeric_predictors()</code></a> and <a href="https://recipes.tidymodels.org/reference/has_role.html" target="_blank" rel="noopener"><code>has_role()</code></a>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_impute_bag.html'>step_impute_bag</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_predictors</a></span><span class='o'>(</span><span class='o'>)</span>, impute_with <span class='o'>=</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>has_role</a></span><span class='o'>(</span><span class='s'>"predictor"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 11</span></span></span>
<span><span class='c'>#&gt;      cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb   mpg</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>     6  160    110  3.9   2.62  16.5     0     1     4     4  21  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>     6  160    110  3.9   2.88  17.0     0     1     4     4  21  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>     4  108     93  3.85  2.32  18.6     1     1     4     1  22.8</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>     6  258    110  3.08  3.22  19.4     1     0     3     1  21.4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>     8  360    175  3.15  3.44  17.0     0     0     3     2  18.7</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>     6  225    105  2.76  3.46  20.2     1     0     3     1  18.1</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>     8  360    245  3.21  3.57  15.8     0     0     3     4  14.3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>     4  147.    62  3.69  3.19  20       1     0     4     2  24.4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>     4  141.    95  3.92  3.15  22.9     1     0     4     2  22.8</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>     6  168.   123  3.92  3.44  18.3     1     0     4     4  19.2</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>These changes are backwards compatible, meaning that the old ways still work, with minimal warnings.</p>
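<p>As a rough sketch of that compatibility, the two recipes below impute some artificially removed <code>hp</code> values in <code>mtcars</code>, once with a plain recipes selector in <code>impute_with</code> and once with the older <code>imp_vars()</code> wrapper. It assumes recipes 1.3.0 or later; the missing rows are made up for illustration, and the older spelling may emit a warning.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(recipes)

# mtcars with a few values of `hp` knocked out for illustration
cars_na = mtcars
cars_na$hp[c(2, 5, 9)] = NA

# new style: a plain recipes selector
new_style = recipe(mpg ~ ., data = cars_na) |>
  step_impute_bag(hp, impute_with = all_numeric_predictors()) |>
  prep()

# old style: the same selection wrapped in imp_vars()
old_style = recipe(mpg ~ ., data = cars_na) |>
  step_impute_bag(hp, impute_with = imp_vars(all_numeric_predictors())) |>
  prep()

# both recipes fill in the missing values, so both should return FALSE
anyNA(bake(new_style, new_data = NULL)$hp)
anyNA(bake(old_style, new_data = NULL)$hp)</code></pre>
</div>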
<h2 id="step_impute_bag-now-takes-up-less-memory"><code>step_impute_bag()</code> now takes up less memory
</h2>
<p>There is another benefit for users of <a href="https://recipes.tidymodels.org/reference/step_impute_bag.html" target="_blank" rel="noopener"><code>step_impute_bag()</code></a>. For each variable it imputes, the step fits a bagged tree model, which is then used to predict the missing values. We noticed that these models had a larger memory footprint than needed. This has been remedied, so recipes that use <a href="https://recipes.tidymodels.org/reference/step_impute_bag.html" target="_blank" rel="noopener"><code>step_impute_bag()</code></a> should now be noticeably smaller.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_impute_bag.html'>step_impute_bag</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"Lot_"</span><span class='o'>)</span>, impute_with <span class='o'>=</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_numeric_predictors</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'>lobstr</span><span class='nf'>::</span><span class='nf'><a href='https://lobstr.r-lib.org/reference/obj_size.html'>obj_size</a></span><span class='o'>(</span><span class='nv'>rec</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; 20.23 MB</span></span>
<span></span></code></pre>
</div>
<p>Previously, this recipe took up over <code>75 MB</code>; it now takes up about <code>20 MB</code>.</p>
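<p>To see why a fitted bagged imputation can be large, here is a minimal base-R sketch of the idea for a single column, using regression trees from the rpart package (a recommended package that ships with most R installations). It is not the recipes implementation; the helper <code>impute_bagged()</code>, the choice of 25 bags, and the mtcars columns are illustrative assumptions. What it shows is that every tree has to be kept around so it can predict later.</p>

```r
library(rpart)  # recommended package, ships with most R installations

# Hypothetical helper (not recipes' code): impute one numeric column
# by averaging the predictions of trees fit on bootstrap samples.
impute_bagged <- function(data, target, predictors, n_bags = 25, seed = 1) {
  set.seed(seed)
  missing <- is.na(data[[target]])
  train <- data[!missing, c(target, predictors), drop = FALSE]
  form <- reformulate(predictors, response = target)

  # Fit one regression tree per bootstrap sample of the complete rows.
  # All of these trees must be stored to impute new data later.
  trees <- lapply(seq_len(n_bags), function(b) {
    rows <- sample(nrow(train), replace = TRUE)
    rpart(form, data = train[rows, , drop = FALSE])
  })

  # Average the trees' predictions for the rows with missing values
  if (any(missing)) {
    newdata <- data[missing, predictors, drop = FALSE]
    preds <- vapply(trees, function(tree) predict(tree, newdata = newdata),
                    numeric(sum(missing)))
    data[[target]][missing] <- rowMeans(matrix(preds, nrow = sum(missing)))
  }
  list(data = data, trees = trees)
}

cars_na <- mtcars
cars_na$disp[c(2, 10, 20)] <- NA
res <- impute_bagged(cars_na, "disp", c("cyl", "hp", "wt"))
res$data$disp[c(2, 10, 20)]
format(object.size(res$trees), units = "Kb")
```

<p>Because all <code>n_bags</code> trees are stored for each imputed column, the fitted object grows with both the number of bags and the number of imputed variables, which is why any excess in what each stored model carries can add up.</p>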
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>Many thanks to all the people who contributed to recipes since the last release!</p>
<p><a href="https://github.com/chillerb" target="_blank" rel="noopener">@chillerb</a>
, <a href="https://github.com/dshemetov" target="_blank" rel="noopener">@dshemetov</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/kevbaer" target="_blank" rel="noopener">@kevbaer</a>
, <a href="https://github.com/nhward" target="_blank" rel="noopener">@nhward</a>
, <a href="https://github.com/regisely" target="_blank" rel="noopener">@regisely</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/recipes-1-3-0/thumbnail-wd.jpg" length="698394" type="image/jpeg" />
    </item>
    <item>
      <title>rsample 1.3.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/rsample-1-3-0/</link>
      <pubDate>Thu, 03 Apr 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/rsample-1-3-0/</guid>
      <dc:creator>Hannah Frick</dc:creator><description><![CDATA[
<p>We&rsquo;re thrilled to announce the release of <a href="https://rsample.tidymodels.org/" target="_blank" rel="noopener">rsample</a>
 1.3.0. rsample makes it easy to create resamples for assessing model performance. It is part of the tidymodels framework, a collection of R packages for modeling and machine learning using tidyverse principles.</p>
<p>You can install it from CRAN with:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"rsample"</span><span class='o'>)</span></span></code></pre>
</div>
<p>This blog post will walk you through the more flexible grouping for calculating bootstrap confidence intervals and highlight the contributions made by participants of the tidyverse developer day.</p>
<p>You can see a full list of changes in the <a href="https://rsample.tidymodels.org/news/index.html#rsample-130" target="_blank" rel="noopener">release notes</a>
.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rsample.tidymodels.org'>rsample</a></span><span class='o'>)</span></span></code></pre>
</div>
<h2 id="flexible-grouping-for-bootstrap-intervals">Flexible grouping for bootstrap intervals
</h2>
<p>Resampling allows you to get an understanding of the variability of an estimate, e.g., a summary statistic of your data. If you want to lean on statistical theory and get confidence intervals for your estimate, you can reach for the bootstrap resampling scheme: calculating your summary statistic on each bootstrap sample lets you construct confidence intervals around your point estimate.</p>
<p>rsample contains a family of <code>int_*()</code> functions to calculate bootstrap confidence intervals of different flavors: percentile intervals, &ldquo;BCa&rdquo; intervals, and bootstrap-t intervals. If you want to dive into the technical details, Chapter 11 of <a href="https://hastie.su.domains/CASI/" target="_blank" rel="noopener">CASI</a>
 is a good place to start.</p>
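<p>As a rough illustration of the percentile flavor, here is a base-R sketch (not rsample's code) that bootstraps the mean of <code>mtcars$mpg</code> and takes the <code>alpha / 2</code> and <code>1 - alpha / 2</code> quantiles of the bootstrap means. The data, the statistic, and the 1,000 resamples are arbitrary choices for the example.</p>

```r
# Base-R sketch of a percentile bootstrap interval for a mean
set.seed(1)
x <- mtcars$mpg
n_boot <- 1000

# Recompute the statistic on each resample drawn with replacement
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# A 90% percentile interval: the alpha/2 and 1 - alpha/2 quantiles
alpha <- 0.1
ci <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2))
c(lower = ci[[1]], estimate = mean(x), upper = ci[[2]])
```

<p>BCa and bootstrap-t intervals start from the same resampled statistics but adjust how the interval endpoints are derived.</p>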
<p>You can calculate the confidence intervals based on a grouping in your data. Until now, however, rsample only let you provide a single grouping variable. This release extends that functionality to allow more flexible grouping.</p>
<p>The motivating application for us was to be able to calculate confidence intervals around multiple model performance metrics, including dynamic metrics for time-to-event models, which depend on an evaluation time point. In this case, the metric is one grouping variable and the evaluation time is another. But let&rsquo;s dial back the complexity for an example of how the new rsample functionality works!</p>
<p>We have a dataset with delivery times for orders containing one or more items. We&rsquo;ll do some data wrangling with it, so we are also loading dplyr.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Attaching package: 'dplyr'</span></span>
<span></span><span><span class='c'>#&gt; The following objects are masked from 'package:stats':</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt;     filter, lag</span></span>
<span></span><span><span class='c'>#&gt; The following objects are masked from 'package:base':</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt;     intersect, setdiff, setequal, union</span></span>
<span></span><span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>deliveries</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>deliveries</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 10,012 × 31</span></span></span>
<span><span class='c'>#&gt;    time_to_delivery  hour day   distance item_01 item_02 item_03 item_04 item_05</span></span>
<span><span class='c'>#&gt;               <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>    <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>             16.1  11.9 Thu       3.15       0       0       2       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>             22.9  19.2 Tue       3.69       0       0       0       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>             30.3  18.4 Fri       2.06       0       0       0       0       1</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>             33.4  15.8 Thu       5.97       0       0       0       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>             27.2  19.6 Fri       2.52       0       0       0       1       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>             19.6  13.0 Sat       3.35       1       0       0       1       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>             22.1  15.5 Sun       2.46       0       0       1       1       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>             26.6  17.0 Thu       2.21       0       0       1       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>             30.8  16.7 Fri       2.62       0       0       0       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>             17.4  11.9 Sun       2.75       0       2       1       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 10,002 more rows</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more variables: item_06 &lt;int&gt;, item_07 &lt;int&gt;, item_08 &lt;int&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   item_09 &lt;int&gt;, item_10 &lt;int&gt;, item_11 &lt;int&gt;, item_12 &lt;int&gt;, item_13 &lt;int&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   item_14 &lt;int&gt;, item_15 &lt;int&gt;, item_16 &lt;int&gt;, item_17 &lt;int&gt;, item_18 &lt;int&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   item_19 &lt;int&gt;, item_20 &lt;int&gt;, item_21 &lt;int&gt;, item_22 &lt;int&gt;, item_23 &lt;int&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   item_24 &lt;int&gt;, item_25 &lt;int&gt;, item_26 &lt;int&gt;, item_27 &lt;int&gt;</span></span></span>
<span></span></code></pre>
</div>
<p>Instead of fitting a whole model here, we are calculating a straightforward summary statistic for how much delivery time increases if an item is included in the order. So the item is one grouping factor. As a second one, we are using whether the order was delivered on a weekday or a weekend. Let&rsquo;s start by making that weekend indicator and reshaping the data to make it easier to calculate our summary statistic.</p>
<p>Note that the name for the weekend indicator column, <code>.weekend</code>, starts with a dot. That is important as it is the convention to signal to rsample that this is an additional grouping variable.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>item_data</span> <span class='o'>&lt;-</span> <span class='nv'>deliveries</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>.weekend <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/ifelse.html'>ifelse</a></span><span class='o'>(</span><span class='nv'>day</span> <span class='o'><a href='https://rdrr.io/r/base/match.html'>%in%</a></span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Sat"</span>, <span class='s'>"Sun"</span><span class='o'>)</span>, <span class='s'>"weekend"</span>, <span class='s'>"weekday"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>time_to_delivery</span>, <span class='nv'>.weekend</span>, <span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"item"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'>tidyr</span><span class='nf'>::</span><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_longer.html'>pivot_longer</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"item"</span><span class='o'>)</span>, names_to <span class='o'>=</span> <span class='s'>"item"</span>, values_to <span class='o'>=</span> <span class='s'>"value"</span><span class='o'>)</span> </span></code></pre>
</div>
<p>Next, we make a small function that calculates the ratio of average delivery times with and without the item included in the order, as an estimate of how much a specific item in an order increases the delivery time.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>relative_increase</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>data</span><span class='o'>)</span> <span class='o'>&#123;</span></span>
<span>  <span class='nv'>data</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>    <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>includes_item <span class='o'>=</span> <span class='nv'>value</span> <span class='o'>&gt;</span> <span class='m'>0</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>    <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span class='o'>(</span></span>
<span>      has <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>time_to_delivery</span><span class='o'>[</span><span class='nv'>includes_item</span><span class='o'>]</span><span class='o'>)</span>,</span>
<span>      has_not <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>time_to_delivery</span><span class='o'>[</span><span class='o'>!</span><span class='nv'>includes_item</span><span class='o'>]</span><span class='o'>)</span>,</span>
<span>      .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>item</span>, <span class='nv'>.weekend</span><span class='o'>)</span></span>
<span>    <span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>    <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>estimate <span class='o'>=</span> <span class='nv'>has</span> <span class='o'>/</span> <span class='nv'>has_not</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>    <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span>term <span class='o'>=</span> <span class='nv'>item</span>, <span class='nv'>.weekend</span>, <span class='nv'>estimate</span><span class='o'>)</span></span>
<span><span class='o'>&#125;</span></span></code></pre>
</div>
<p>We can calculate that on our entire dataset.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>relative_increase</span><span class='o'>(</span><span class='nv'>item_data</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 54 × 3</span></span></span>
<span><span class='c'>#&gt;    term    .weekend estimate</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>       <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> item_01 weekday      1.07</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> item_02 weekday      1.02</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> item_03 weekday      1.02</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> item_04 weekday      1.00</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> item_05 weekday      1.00</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> item_06 weekday      1.01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> item_07 weekday      1.03</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> item_08 weekday      1.01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> item_09 weekday      1.01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> item_10 weekday      1.06</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 44 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>This is fine, but what we really want here is to get confidence intervals around these estimates!</p>
<p>So let&rsquo;s make bootstrap samples and calculate our statistic on those.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span>
<span><span class='nv'>item_bootstrap</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/bootstraps.html'>bootstraps</a></span><span class='o'>(</span><span class='nv'>item_data</span>, times <span class='o'>=</span> <span class='m'>1000</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>item_stats</span> <span class='o'>&lt;-</span></span>
<span>  <span class='nv'>item_bootstrap</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>stats <span class='o'>=</span> <span class='nf'>purrr</span><span class='nf'>::</span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span class='o'>(</span><span class='nv'>splits</span>, <span class='o'>~</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/as.data.frame.rsplit.html'>analysis</a></span><span class='o'>(</span><span class='nv'>.x</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>relative_increase</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span></code></pre>
</div>
<p>Now we have everything we need to calculate the confidence intervals, stashed into the tibbles in the <code>stats</code> column: an <code>estimate</code>, a <code>term</code> (the primary grouping variable), and our additional grouping variable <code>.weekend</code>, starting with a dot. What&rsquo;s left to do is call one of the <code>int_*()</code> functions and specify which column contains the statistics. Here, we&rsquo;ll calculate percentile intervals with <a href="https://rsample.tidymodels.org/reference/int_pctl.html" target="_blank" rel="noopener"><code>int_pctl()</code></a>
.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>item_ci</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/int_pctl.html'>int_pctl</a></span><span class='o'>(</span><span class='nv'>item_stats</span>, statistics <span class='o'>=</span> <span class='nv'>stats</span>, alpha <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span></span>
<span><span class='nv'>item_ci</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 54 × 7</span></span></span>
<span><span class='c'>#&gt;    term    .weekend .lower .estimate .upper .alpha .method   </span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>     <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>     <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> item_01 weekday   1.05      1.07    1.09    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> item_01 weekend   1.04      1.07    1.10    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> item_02 weekday   1.00      1.02    1.03    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> item_02 weekend   0.996     1.01    1.03    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> item_03 weekday   1.01      1.02    1.04    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> item_03 weekend   0.970     0.990   1.01    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> item_04 weekday   0.989     1.00    1.02    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> item_04 weekend   0.998     1.02    1.03    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> item_05 weekday   0.987     1.00    1.02    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> item_05 weekend   0.982     1.00    1.03    0.1 percentile</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 44 more rows</span></span></span>
<span></span></code></pre>
</div>
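<p>Conceptually, the grouping means that each distinct combination of <code>term</code> and <code>.weekend</code> gets its own interval, computed from the bootstrap estimates for that combination. The base-R sketch below mimics this on made-up bootstrap estimates; it is not rsample's implementation, and using the mean of the bootstrap estimates as <code>.estimate</code> is an assumption made for the illustration.</p>

```r
# Made-up bootstrap estimates: one row per resample x term x .weekend
set.seed(1)
boot_stats <- expand.grid(
  id = 1:1000,
  term = c("item_01", "item_02"),
  .weekend = c("weekday", "weekend"),
  stringsAsFactors = FALSE
)
boot_stats$estimate <- rnorm(nrow(boot_stats), mean = 1.05, sd = 0.02)

alpha <- 0.1
keys <- unique(boot_stats[c("term", ".weekend")])

# One percentile interval per combination of the grouping columns
ci <- do.call(rbind, lapply(seq_len(nrow(keys)), function(i) {
  in_group <- boot_stats$term == keys$term[i] &
    boot_stats$.weekend == keys$.weekend[i]
  est <- boot_stats$estimate[in_group]
  q <- quantile(est, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(
    term = keys$term[i], .weekend = keys$.weekend[i],
    .lower = q[1], .estimate = mean(est), .upper = q[2], .alpha = alpha
  )
}))
ci
```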
<h2 id="tidyverse-developer-day">Tidyverse developer day
</h2>
<p>At the tidyverse developer day after posit::conf, rsample got a lot of love in the form of contributions from various community members. People improved documentation and examples, moved deprecations along, tightened checks to support good practice, and upgraded errors and warnings, both in style and content. None of these changes are flashy new features, but all of them are essential to rsample working well!</p>
<p>For example, leave-one-out (LOO) cross-validation is not a great choice of resampling scheme in most situations. From <a href="https://www.tmwr.org/resampling#leave-one-out-cross-validation" target="_blank" rel="noopener">Tidy Modeling with R</a>
:</p>
<blockquote>
<p>For anything but pathologically small samples, LOO is computationally excessive, and it may not have good statistical properties.</p>
</blockquote>
<p>It was possible, however, to create implicit LOO samples by using <a href="https://rsample.tidymodels.org/reference/vfold_cv.html" target="_blank" rel="noopener"><code>vfold_cv()</code></a>
 with the number of folds set to the number of rows in the data. Thanks to a dev day contribution, this now errors:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rsample.tidymodels.org/reference/vfold_cv.html'>vfold_cv</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, v <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `vfold_cv()`:</span></span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Leave-one-out cross-validation is not supported by this function.</span></span>
<span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> You set `v` to `nrow(data)`, which would result in a leave-one-out</span></span>
<span><span class='c'>#&gt;   cross-validation.</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use `loo_cv()` in this case.</span></span>
<span></span></code></pre>
</div>
<p>This is meant to make users pause and consider whether LOO is a good choice for their dataset. If you do need LOO, you can still use <a href="https://rsample.tidymodels.org/reference/loo_cv.html" target="_blank" rel="noopener"><code>loo_cv()</code></a>
.</p>
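<p>To see why <code>v = nrow(data)</code> amounts to leave-one-out, here is a small base-R sketch (not rsample's code) that randomly assigns rows to <code>v</code> folds and counts how many rows each fold holds out for assessment. The helper <code>fold_sizes()</code> is hypothetical.</p>

```r
# Hypothetical helper: randomly assign n rows to v folds and return
# how many rows each fold holds out as its assessment set
fold_sizes <- function(n, v) {
  fold_id <- sample(rep(seq_len(v), length.out = n))
  as.vector(table(fold_id))
}

set.seed(1)
n <- nrow(mtcars)       # 32 rows
fold_sizes(n, v = 10)   # 4 4 3 3 3 3 3 3 3 3
fold_sizes(n, v = n)    # 32 folds holding out a single row each: LOO
```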
<p>Error messages in general have been a focus of ours across various tidymodels packages, and rsample is no exception. We opened a bunch of issues to tackle all of rsample, and all of them got closed! Some of these changes are purely internal, replacing manual formatting so that the cli package does the work. While most error messages don&rsquo;t <em>look</em> different, their formatting is now much more consistent.</p>
<p>For some error messages, the additional functionality in cli makes it easy to improve readability. This error message used to be one block of text; now it comes as three bullet points.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rsample.tidymodels.org/reference/permutations.html'>permutations</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='nf'><a href='https://tidyselect.r-lib.org/reference/everything.html'>everything</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `permutations()`:</span></span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> You have selected all columns to permute.</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> This effectively reorders the rows in the original data without changing the</span></span>
<span><span class='c'>#&gt;   data structure.</span></span>
<span><span class='c'>#&gt; → Please select fewer columns to permute.</span></span>
<span></span></code></pre>
</div>
<p>Changes like these are super helpful to users and developers alike. A big thank you to all the contributors!</p>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>Many thanks to all the people who contributed to rsample since the last release!</p>
<p><a href="https://github.com/agmurray" target="_blank" rel="noopener">@agmurray</a>
, <a href="https://github.com/brshallo" target="_blank" rel="noopener">@brshallo</a>
, <a href="https://github.com/ccani007" target="_blank" rel="noopener">@ccani007</a>
, <a href="https://github.com/dicook" target="_blank" rel="noopener">@dicook</a>
, <a href="https://github.com/Dpananos" target="_blank" rel="noopener">@Dpananos</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>
, <a href="https://github.com/gregor-fausto" target="_blank" rel="noopener">@gregor-fausto</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/JamesHWade" target="_blank" rel="noopener">@JamesHWade</a>
, <a href="https://github.com/jttoivon" target="_blank" rel="noopener">@jttoivon</a>
, <a href="https://github.com/krz" target="_blank" rel="noopener">@krz</a>
, <a href="https://github.com/laurabrianna" target="_blank" rel="noopener">@laurabrianna</a>
, <a href="https://github.com/malcolmbarrett" target="_blank" rel="noopener">@malcolmbarrett</a>
, <a href="https://github.com/MatthieuStigler" target="_blank" rel="noopener">@MatthieuStigler</a>
, <a href="https://github.com/msberends" target="_blank" rel="noopener">@msberends</a>
, <a href="https://github.com/nmercadeb" target="_blank" rel="noopener">@nmercadeb</a>
, <a href="https://github.com/PriKalra" target="_blank" rel="noopener">@PriKalra</a>
, <a href="https://github.com/seb09" target="_blank" rel="noopener">@seb09</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
, <a href="https://github.com/ZWael" target="_blank" rel="noopener">@ZWael</a>
, and <a href="https://github.com/zz77zz" target="_blank" rel="noopener">@zz77zz</a>
.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/rsample-1-3-0/thumbnail-wd.jpg" length="509250" type="image/jpeg" />
    </item>
    <item>
      <title>Improved sparsity support in tidymodels</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-sparsity/</link>
      <pubDate>Wed, 19 Mar 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-sparsity/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[<p>Photo by <a href="https://unsplash.com/@oxygenvisuals?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Oliver Olah</a> on <a href="https://unsplash.com/photos/green-tree-in-the-middle-of-grass-field-KD8nzFznQQ0?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a></p>
<p>We&rsquo;re stoked to announce that tidymodels now fully supports sparse data from end to end. We have been working on this for <a href="https://github.com/tidymodels/recipes/pull/515" target="_blank" rel="noopener">over 5 years</a>
. It extends the work we did <a href="https://www.tidyverse.org/blog/2020/11/tidymodels-sparse-support/" target="_blank" rel="noopener">previously</a>
 with blueprints, which could carry the data sparsely part of the way.</p>
<p>You will need <a href="https://recipes.tidymodels.org/news/index.html#recipes-120" target="_blank" rel="noopener">recipes 1.2.0</a>, <a href="https://parsnip.tidymodels.org/news/index.html#parsnip-130" target="_blank" rel="noopener">parsnip 1.3.0</a>, and <a href="https://workflows.tidymodels.org/news/index.html#workflows-120" target="_blank" rel="noopener">workflows 1.2.0</a> or later for this to work.</p>
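<p>If you are unsure whether your installation is recent enough, a quick check like the one below may help. This is a minimal sketch: <code>required</code> and <code>installed_ok</code> are illustrative names, not part of any tidymodels API.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Minimum versions named in this post (illustrative helper vector)
required &lt;- c(recipes = "1.2.0", parsnip = "1.3.0", workflows = "1.2.0")

installed_ok &lt;- vapply(names(required), function(pkg) {
  # A package that is not installed cannot meet the requirement
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  utils::packageVersion(pkg) &gt;= required[[pkg]]
}, logical(1))

installed_ok</code></pre>
</div>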
<h2 id="what-are-sparse-data">What are sparse data?
</h2>
<p>The term <strong>sparse data</strong> refers to a data set containing many zeroes. Sparse data appears in all kinds of fields and can be produced by a number of preprocessing methods. We care about sparse data because of how computers store numbers. A 32-bit integer takes 4 bytes to store, so an array of 10 32-bit integers takes 40 bytes, and so on. This happens because every value, including every zero, is written down.</p>
<p>A sparse representation instead stores the locations and values of the non-zero entries. Suppose we have the following vector with 20 entries:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">7</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>It could be represented sparsely using three components: <code>positions = c(3, 5, 8)</code>, <code>values = c(1, 3, 7)</code>, and <code>length = 20</code>. Now we need only seven values to represent a vector of 20 elements. Since some modeling tasks contain even sparser data, this type of representation starts to show real benefits in terms of execution time and memory consumption.</p>
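<p>To make the bookkeeping concrete, here is a small sketch in base R that splits the 20-element vector into positions, values, and length and then rebuilds it, followed by the equivalent <code>sparseVector()</code> object from the Matrix package. It only illustrates the idea; tidymodels uses its own internal format rather than this code.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(Matrix)

# The 20-element vector from above
dense &lt;- c(0, 0, 1, 0, 3, 0, 0, 7, rep(0, 12))

# Keep only the locations and values of the non-zero entries, plus the length
positions &lt;- which(dense != 0)  # 3 5 8
values &lt;- dense[positions]      # 1 3 7
len &lt;- length(dense)            # 20

# Rebuild the dense vector from the three components
rebuilt &lt;- numeric(len)
rebuilt[positions] &lt;- values
identical(rebuilt, dense)       # TRUE

# The Matrix package implements the same idea as a class
sv &lt;- sparseVector(x = values, i = positions, length = len)
sv</code></pre>
</div>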
<p>The tidymodels packages have undergone several internal changes that allow them to represent data sparsely when doing so is beneficial. These changes let you fit models on sparse data faster and with less memory than before. They also make it possible to fit models that previously could not be fit because the data did not fit in memory.</p>
<h2 id="sparse-matrix-support">Sparse matrix support
</h2>
<p>The first benefit of these changes is that <code>recipe()</code>, <code>prep()</code>, <code>bake()</code>, <code>fit()</code>, and <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>
 now accept sparse matrices created using the Matrix package.</p>
<p>The <code>permeability_qsar</code> data set from the modeldata package contains quite a lot of zeroes in the predictors, so we will use it as a demonstration. We start by coercing it into a sparse matrix.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://Matrix.R-forge.R-project.org'>Matrix</a></span><span class='o'>)</span></span>
<span><span class='nv'>permeability_sparse</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/methods/as.html'>as</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/matrix.html'>as.matrix</a></span><span class='o'>(</span><span class='nv'>permeability_qsar</span><span class='o'>)</span>, <span class='s'>"sparseMatrix"</span><span class='o'>)</span></span></code></pre>
</div>
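<p>Before modeling, it can be useful to check how sparse a matrix actually is. A minimal sketch, assuming the modeldata and Matrix packages are installed, computes the share of zero entries with <code>Matrix::nnzero()</code> and reports the memory footprint of the dense and sparse versions:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(Matrix)
data(permeability_qsar, package = "modeldata")

permeability_sparse &lt;- as(as.matrix(permeability_qsar), "sparseMatrix")

# Share of entries that are zero; nnzero() counts the non-zero entries
n_entries &lt;- prod(dim(permeability_sparse))
prop_zero &lt;- 1 - nnzero(permeability_sparse) / n_entries
prop_zero

# Memory footprint of the dense and sparse versions
format(object.size(as.matrix(permeability_qsar)), units = "Kb")
format(object.size(permeability_sparse), units = "Kb")</code></pre>
</div>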
<p>We can now use this sparse matrix in our code the same way as a dense matrix or data frame:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>permeability</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>permeability_sparse</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_zv</span><span class='o'>(</span><span class='nf'>all_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>mod_spec</span> <span class='o'>&lt;-</span> <span class='nf'>boost_tree</span><span class='o'>(</span><span class='s'>"regression"</span>, <span class='s'>"xgboost"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>wf_spec</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>mod_spec</span><span class='o'>)</span></span></code></pre>
</div>
<p>Model training has the usual syntax:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>wf_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>wf_spec</span>, <span class='nv'>permeability_sparse</span><span class='o'>)</span></span></code></pre>
</div>
<p>as does prediction:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>wf_fit</span>, <span class='nv'>permeability_sparse</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 165 × 1</span></span></span>
<span><span class='c'>#&gt;     .pred</span></span>
<span><span class='c'>#&gt;     <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 10.5  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>  1.50 </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 13.1  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>  1.10 </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>  1.25 </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>  0.738</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 29.3  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>  2.44 </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 36.3  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>  4.31 </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 155 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>Note that only some models/engines work well with sparse data. They are all listed at <a href="https://www.tidymodels.org/find/sparse/" target="_blank" rel="noopener">https://www.tidymodels.org/find/sparse/</a>. If a model doesn&rsquo;t support sparse data, the data will be coerced into the default non-sparse representation and used as usual.</p>
<p>With a few exceptions, a sparse matrix should work like any other data set. However, this approach has two main limitations. The first is that it is limited to regression tasks, since the outcome has to be numeric to be part of the sparse matrix.</p>
<p>The second limitation is that it only works with the non-formula interfaces of parsnip and workflows. When using a workflow, this means adding a recipe with <code>add_recipe()</code> or selecting variables directly with <code>add_variables()</code>. When using a parsnip model specification by itself, you need to use <code>fit_xy()</code> instead of <code>fit()</code>.</p>
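<p>As an illustration of the second limitation, here is a sketch of fitting a parsnip model on the sparse matrix without a workflow. It assumes the xgboost package is installed and splits the matrix into predictors and outcome by column name; the variable names are illustrative.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidymodels)
library(Matrix)
data(permeability_qsar, package = "modeldata")

permeability_sparse &lt;- as(as.matrix(permeability_qsar), "sparseMatrix")

# Split the sparse matrix into predictors (x) and the numeric outcome (y)
is_outcome &lt;- colnames(permeability_sparse) == "permeability"
x &lt;- permeability_sparse[, !is_outcome]
y &lt;- permeability_sparse[, is_outcome]

# Without a workflow, a parsnip specification needs fit_xy() rather than fit()
xgb_fit &lt;- boost_tree("regression", "xgboost") |&gt;
  fit_xy(x = x, y = y)

preds &lt;- predict(xgb_fit, x[1:5, ])
preds</code></pre>
</div>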
<p>If this is of interest, we also have a <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">https://www.tidymodels.org/</a> post about <a href="https://www.tidymodels.org/learn/work/sparse-matrix/" target="_blank" rel="noopener">using sparse matrices in tidymodels</a>.</p>
<h2 id="sparse-data-from-recipes-steps">Sparse data from recipes steps
</h2>
<p>This sparsity support really starts to shine when the recipe itself generates sparse data. The relevant steps come in two flavors: sparsity-creating steps and sparsity-preserving steps. Both are listed at <a href="https://www.tidymodels.org/find/sparse/" target="_blank" rel="noopener">https://www.tidymodels.org/find/sparse/</a>.</p>
<p>Some steps, like <code>step_dummy()</code>, <code>step_indicate_na()</code>, and <a href="https://textrecipes.tidymodels.org/reference/step_tf.html" target="_blank" rel="noopener"><code>textrecipes::step_tf()</code></a>, almost always produce a lot of zeroes. We take advantage of that by generating their output sparsely when it is beneficial. When these steps do produce sparse vectors, we want to make sure the sparsity is preserved, so a couple of handfuls of steps, such as <code>step_impute_mean()</code> and <code>step_scale()</code>, have been updated to work efficiently with sparse vectors. Both types of steps are detailed in the list of compatible methods linked above.</p>
<p>In practice, if you use a model/engine that supports sparse data and your recipe produces enough sparse data, the steps switch to storing their output in a new sparse data format (when appropriate) as the recipe is processed. Then, if the model can accept sparse objects, the data are converted from this new format to a standard sparse matrix object. This increases performance when possible while preserving it otherwise.</p>
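<p>To inspect what a recipe produces outside a workflow, a sketch like the following can help. It assumes the <code>sparse</code> argument that recipes 1.2.0 added to <code>step_dummy()</code> (with <code>&quot;yes&quot;</code>, <code>&quot;no&quot;</code>, and <code>&quot;auto&quot;</code> as options), and it assumes the new sparse format comes from the sparsevctrs package and its <code>is_sparse_vector()</code> helper; check the documentation for your installed versions.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidymodels)
data(ames, package = "modeldata")

# Ask step_dummy() to always produce sparse columns (assumed `sparse` argument)
rec_prep &lt;- recipe(Sale_Price ~ ., data = ames) |&gt;
  step_dummy(all_nominal_predictors(), sparse = "yes") |&gt;
  prep()

# Count how many columns of the baked tibble are stored as sparse vectors
baked &lt;- bake(rec_prep, new_data = NULL)
sum(vapply(baked, sparsevctrs::is_sparse_vector, logical(1)))

# bake() can also return a standard sparse matrix from the Matrix package
ames_mat &lt;- bake(rec_prep, new_data = NULL, composition = "dgCMatrix")
class(ames_mat)</code></pre>
</div>
<p>The <code>composition = &quot;dgCMatrix&quot;</code> option of <code>bake()</code> should only work when every resulting column is numeric, which is the case here after dummy encoding.</p>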
<p>Below is a simple recipe using the <code>ames</code> data set. <code>step_dummy()</code> is applied to all the categorical predictors, producing a large number of zeroes.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_zv</span><span class='o'>(</span><span class='nf'>all_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>mod_spec</span> <span class='o'>&lt;-</span> <span class='nf'>boost_tree</span><span class='o'>(</span><span class='s'>"regression"</span>, <span class='s'>"xgboost"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>wf_spec</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>mod_spec</span><span class='o'>)</span></span></code></pre>
</div>
<p>Fitting it now takes around 125 ms and allocates 37.2 MB. Before these changes, it took around 335 ms and allocated 67.5 MB.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>wf_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>wf_spec</span>, <span class='nv'>ames</span><span class='o'>)</span></span></code></pre>
</div>
<p>We see similar speedups when predicting: around 20 ms and 25.2 MB now, compared to around 60 ms and 55.6 MB before.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>wf_fit</span>, <span class='nv'>ames</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,930 × 1</span></span></span>
<span><span class='c'>#&gt;      .pred</span></span>
<span><span class='c'>#&gt;      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='text-decoration: underline;'>208</span>649.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='text-decoration: underline;'>115</span>339.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='text-decoration: underline;'>148</span>634.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='text-decoration: underline;'>239</span>770.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='text-decoration: underline;'>190</span>082.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='text-decoration: underline;'>184</span>604.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>208</span>572.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='text-decoration: underline;'>177</span>403 </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='text-decoration: underline;'>261</span>000.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>198</span>604.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2,920 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>These improvements are tightly related to memory allocation, which depends on the sparsity of the data set produced by the recipe. This is why it is hard to say how much benefit you will see. We have seen improvements of orders of magnitude, in both execution time and memory allocation. We have also been able to fit models whose data were previously too big to fit in memory.</p>
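<p>Since the benefit depends on your data, it is worth measuring it directly. Below is a sketch using the bench package to compare the workflow from above with <code>step_dummy()</code> forced to produce sparse (<code>sparse = &quot;yes&quot;</code>) or dense (<code>sparse = &quot;no&quot;</code>) output. The <code>make_wf()</code> helper is hypothetical, it assumes the <code>sparse</code> argument of recipes 1.2.0, and your timings and allocations will differ from the numbers quoted above.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidymodels)
library(bench)
data(ames, package = "modeldata")

# Hypothetical helper: the workflow from above, varying only step_dummy()'s `sparse`
make_wf &lt;- function(sparse) {
  rec &lt;- recipe(Sale_Price ~ ., data = ames) |&gt;
    step_zv(all_predictors()) |&gt;
    step_normalize(all_numeric_predictors()) |&gt;
    step_dummy(all_nominal_predictors(), sparse = sparse)
  workflow(rec, boost_tree("regression", "xgboost"))
}

# check = FALSE because the two fitted objects are not expected to be identical
timings &lt;- bench::mark(
  sparse = fit(make_wf("yes"), ames),
  dense = fit(make_wf("no"), ames),
  iterations = 3,
  check = FALSE
)

timings[, c("expression", "median", "mem_alloc")]</code></pre>
</div>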
<p>Please see the post on tidymodels.org, which goes into more detail about when you are likely to benefit from this and how to change your recipes and workflows to take full advantage of this new feature.</p>
<p>There is also a <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">https://www.tidymodels.org/</a> post going into a bit more detail about how to <a href="https://www.tidymodels.org/learn/work/sparse-recipe/" target="_blank" rel="noopener">use recipes to produce sparse data</a>.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-sparsity/thumbnail-wd.jpg" length="368137" type="image/jpeg" />
    </item>
    <item>
      <title>Q1 2025 tidymodels digest</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-2025-q1/</link>
      <pubDate>Thu, 27 Feb 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-2025-q1/</guid>
      <dc:creator>Max Kuhn</dc:creator><description><![CDATA[
<p>The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p>
<p>Since the beginning of 2021, we have been publishing quarterly updates here on the tidyverse blog summarizing what’s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the tidymodels tag to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused.</p>
<p>We&rsquo;ve sent a steady stream of tidymodels packages to CRAN recently. We usually release them in batches since many of our packages are tightly coupled with one another. Internally, this process is referred to as the &ldquo;cascade&rdquo; of CRAN submissions.</p>
<p>This post will update you on which packages have changed and the major improvements you should know about.</p>
<p>Here&rsquo;s a list of the packages and their News sections:</p>
<ul>
<li><a href="https://baguette.tidymodels.org/news/index.html" target="_blank" rel="noopener">baguette</a>
</li>
<li><a href="https://brulee.tidymodels.org/news/index.html" target="_blank" rel="noopener">brulee</a>
</li>
<li><a href="https://censored.tidymodels.org/news/index.html" target="_blank" rel="noopener">censored</a>
</li>
<li><a href="https://dials.tidymodels.org/news/index.html" target="_blank" rel="noopener">dials</a>
</li>
<li><a href="https://hardhat.tidymodels.org/news/index.html" target="_blank" rel="noopener">hardhat</a>
</li>
<li><a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">parsnip</a>
</li>
<li><a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">recipes</a>
</li>
<li><a href="https://tidymodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">tidymodels</a>
</li>
<li><a href="https://tune.tidymodels.org/news/index.html" target="_blank" rel="noopener">tune</a>
</li>
<li><a href="https://workflows.tidymodels.org/news/index.html" target="_blank" rel="noopener">workflows</a>
</li>
</ul>
<p>Let&rsquo;s look at a few specific updates.</p>
<h2 id="improvements-in-errors-and-warnings">Improvements in errors and warnings
</h2>
<p>A group effort was made to improve our error and warning messages across many packages. This started with an internal &ldquo;upkeep week&rdquo; (which ended up being 3-4 weeks) and concluded at the <a href="https://www.tidyverse.org/blog/2024/04/tdd-2024/" target="_blank" rel="noopener">Tidy Dev Day in Seattle</a>
 after posit::conf(2024).</p>
<p>The goal was to use new tools in the cli and rlang packages to make messages more informative than they used to be. For example, using:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">tidy</span><span class="p">(</span><span class="n">pca_extract_trained</span><span class="p">,</span> <span class="n">number</span> <span class="o">=</span> <span class="m">3</span><span class="p">,</span> <span class="n">type</span> <span class="o">=</span> <span class="s">&#34;variances&#34;</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>used to result in the error message:</p>
<pre tabindex="0"><code>Error in `match.arg()`:
! &#39;arg&#39; should be one of &#34;coef&#34;, &#34;variance&#34;
</code></pre><p>The new system references the function that you called and not the underlying base R function that actually errored. It also suggests a solution:</p>
<pre tabindex="0"><code>Error in `tidy()`:
! `type` must be one of &#34;coef&#34; or &#34;variance&#34;, not &#34;variances&#34;.
i Did you mean &#34;variance&#34;?
</code></pre><p>The rlang package created a set of <a href="https://usethis.r-lib.org/reference/use_standalone.html" target="_blank" rel="noopener">standalone files</a> that contain high-quality type checkers and related functions. These also improve the information that users get from an error. For example, when an inappropriate formula value is passed, as in <code>fit(linear_reg(), &quot;boop&quot;, mtcars)</code>, the old message was:</p>
<pre tabindex="0"><code>Error in `fit()`:
! The `formula` argument must be a formula, but it is a &lt;character&gt;.
</code></pre><p>and now you see:</p>
<pre tabindex="0"><code>Error in `fit()`:
! `formula` must be a formula, not the string &#34;boop&#34;.
</code></pre><p>This was <em>a lot</em> of work, and we still aren&rsquo;t finished. Two events helped us get as far as we did.</p>
<p>First, Simon Couch made the <a href="https://simonpcouch.github.io/chores/" target="_blank" rel="noopener">chores</a> package (previously named &ldquo;pal&rdquo;), which enabled us to use AI tools to solve small-scope problems, such as converting old rlang error code to the new <a href="https://rlang.r-lib.org/reference/topic-condition-formatting.html" target="_blank" rel="noopener">cli syntax</a>. I can’t overstate how much of a speed-up this was for us.</p>
<p>Second, at developer day, many external folks pitched in to make pull requests from a list of issues:</p>
<div class="figure" style="text-align: center">
<img src="https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-2025-q1/IMG_4743.jpeg" alt="Organizing Tidy Dev Day issues."  />
<p class="caption">Organizing Tidy Dev Day issues.</p>
</div>
<p>I love these sessions for many reasons, but mostly because we meet users and contributors to our packages in person and work with them on specific tasks.</p>
<p>There is a lot more to do here; we have a lot of secondary packages that would benefit from these improvements too.</p>
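<p>The improved <code>type</code> error shown earlier in this section is the kind of message produced by rlang&rsquo;s argument matching. Here is a minimal sketch using <code>rlang::arg_match()</code> in a hypothetical function, <code>tidy_type()</code>, that offers the same <code>type</code> choices; it is not the code used inside recipes.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(rlang)

# Hypothetical function with the same `type` choices as the tidy() example
tidy_type &lt;- function(type = c("coef", "variance")) {
  # arg_match() validates `type` against the default choices, names the
  # argument in its error, and suggests close matches for typos
  arg_match(type)
}

tidy_type("variance")
try(tidy_type("variances"))</code></pre>
</div>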
<h2 id="quantile-regression-in-parsnip">Quantile regression in parsnip
</h2>
<p>One big update in parsnip was a new modeling mode, <code>&quot;quantile regression&quot;</code>. Daniel McDonald and Ryan Tibshirani provided much of the impetus for this work, based on their <a href="https://delphi.cmu.edu/" target="_blank" rel="noopener">disease modeling framework</a>.</p>
<p>You can generate quantile predictions by first creating a model specification, which includes the quantiles that you want to predict:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">tidymodels_prefer</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">ames</span> <span class="o">&lt;-</span> 
</span></span><span class="line"><span class="cl">  <span class="n">modeldata</span><span class="o">::</span><span class="n">ames</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">mutate</span><span class="p">(</span><span class="n">Sale_Price</span> <span class="o">=</span> <span class="nf">log10</span><span class="p">(</span><span class="n">Sale_Price</span><span class="p">))</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">select</span><span class="p">(</span><span class="n">Sale_Price</span><span class="p">,</span> <span class="n">Latitude</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">quant_spec</span> <span class="o">&lt;-</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">linear_reg</span><span class="p">()</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;quantreg&#34;</span><span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;quantile regression&#34;</span><span class="p">,</span> <span class="n">quantile_levels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0.1</span><span class="p">,</span> <span class="m">0.5</span><span class="p">,</span> <span class="m">0.9</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="n">quant_spec</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## Linear Regression Model Specification (quantile regression)
## 
## Computational engine: quantreg
</code></pre><pre tabindex="0"><code>## Quantile levels: 0.1, 0.5, and 0.9.
</code></pre><p>We&rsquo;ll add some spline terms via a recipe and fit the model:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">spline_rec</span> <span class="o">&lt;-</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">recipe</span><span class="p">(</span><span class="n">Sale_Price</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">ames</span><span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">step_spline_natural</span><span class="p">(</span><span class="n">Latitude</span><span class="p">,</span> <span class="n">deg_free</span> <span class="o">=</span> <span class="m">10</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">quant_fit</span> <span class="o">&lt;-</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">workflow</span><span class="p">(</span><span class="n">spline_rec</span><span class="p">,</span> <span class="n">quant_spec</span><span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">fit</span><span class="p">(</span><span class="n">data</span> <span class="o">=</span> <span class="n">ames</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">quant_fit</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## ══ Workflow [trained] ═════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: linear_reg()
## 
## ── Preprocessor ───────────────────────────────────────────────────────
## 1 Recipe Step
## 
## • step_spline_natural()
## 
## ── Model ──────────────────────────────────────────────────────────────
## Call:
## quantreg::rq(formula = ..y ~ ., tau = quantile_levels, data = data)
## 
## Coefficients:
##               tau= 0.1    tau= 0.5    tau= 0.9
## (Intercept) 4.71981123  5.07728741  5.25221335
## Latitude_01 1.22409173  0.70928577  0.79000849
## Latitude_02 0.19561816  0.04937750  0.02832633
## Latitude_03 0.16616065  0.02045910  0.14730573
## Latitude_04 0.30583648  0.08489487  0.15595080
## Latitude_05 0.21663212  0.02016258 -0.01110625
## Latitude_06 0.33541228  0.12005254  0.03006777
## Latitude_07 0.47732205  0.09146728  0.17394021
## Latitude_08 0.24028784  0.30450058  0.26144584
## Latitude_09 0.05840312 -0.14733781 -0.11911843
## Latitude_10 1.52800673  0.95994216  1.21750501
## 
## Degrees of freedom: 2930 total; 2919 residual
</code></pre><p>For prediction, tidymodels always returns a data frame with as many rows as the input data set (here: <code>ames</code>). The result for quantile predictions is a special vctrs class:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">quant_pred</span> <span class="o">&lt;-</span> <span class="nf">predict</span><span class="p">(</span><span class="n">quant_fit</span><span class="p">,</span> <span class="n">ames</span><span class="p">)</span> 
</span></span><span class="line"><span class="cl"><span class="n">quant_pred</span> <span class="o">|&gt;</span> <span class="nf">slice</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">4</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## # A tibble: 4 × 1
##   .pred_quantile
##        &lt;qtls(3)&gt;
## 1         [5.33]
## 2         [5.33]
## 3         [5.33]
## 4         [5.31]
</code></pre><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">class</span><span class="p">(</span><span class="n">quant_pred</span><span class="o">$</span><span class="n">.pred_quantile</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## [1] &#34;quantile_pred&#34; &#34;vctrs_vctr&#34;    &#34;list&#34;
</code></pre><p>where each bracketed value, such as <code>[5.31]</code>, shows the middle quantile.</p>
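<p>We believe the <code>quantile_pred</code> class is defined in the hardhat package. As a sketch, such an object can be built by hand from a matrix with one column per quantile level; the values below are rounded from the first prediction in this post and used only for illustration.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(hardhat)
library(tibble)

# One row of predicted quantiles at levels 0.1, 0.5, and 0.9
vals &lt;- matrix(c(5.08, 5.33, 5.52), nrow = 1)
qp &lt;- quantile_pred(vals, quantile_levels = c(0.1, 0.5, 0.9))

class(qp)
extract_quantile_levels(qp)

# One row per (observation, quantile level), as in the expanded output
as_tibble(qp)</code></pre>
</div>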
<p>We can expand the set of quantile predictions so that there are three rows for each source row in <code>ames</code>. There’s also an integer column called <code>.row</code> so that we can merge the data with the source data:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">quant_pred</span><span class="o">$</span><span class="n">.pred_quantile[1]</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## &lt;quantiles[1]&gt;
## [1] [5.33]
## # Quantile levels: 0.1 0.5 0.9
</code></pre><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">as_tibble</span><span class="p">(</span><span class="n">quant_pred</span><span class="o">$</span><span class="n">.pred_quantile[1]</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><pre tabindex="0"><code>## # A tibble: 3 × 3
##   .pred_quantile .quantile_levels  .row
##            &lt;dbl&gt;            &lt;dbl&gt; &lt;int&gt;
## 1           5.08              0.1     1
## 2           5.33              0.5     1
## 3           5.52              0.9     1
</code></pre><p>Here are the predicted quantiles plotted against latitude, with the observed sale prices in the background:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">quant_pred</span><span class="o">$</span><span class="n">.pred_quantile</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">as_tibble</span><span class="p">()</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">full_join</span><span class="p">(</span><span class="n">ames</span> <span class="o">|&gt;</span> <span class="nf">add_rowindex</span><span class="p">(),</span> <span class="n">by</span> <span class="o">=</span> <span class="s">&#34;.row&#34;</span><span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">arrange</span><span class="p">(</span><span class="n">Latitude</span><span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">Latitude</span><span class="p">))</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">geom_point</span><span class="p">(</span><span class="n">data</span> <span class="o">=</span> <span class="n">ames</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="n">Sale_Price</span><span class="p">),</span> <span class="n">alpha</span> <span class="o">=</span> <span class="m">1</span> <span class="o">/</span> <span class="m">5</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">  <span class="nf">geom_line</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="n">.pred_quantile</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="nf">format</span><span class="p">(</span><span class="n">.quantile_levels</span><span class="p">)),</span> 
</span></span><span class="line"><span class="cl">            <span class="n">show.legend</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> <span class="n">linewidth</span> <span class="o">=</span> <span class="m">1.5</span><span class="p">)</span> 
</span></span></code></pre></td></tr></table>
</div>
</div><div class="figure" style="text-align: center">
<img src="https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-2025-q1/figure/quant-plot-1.svg" alt="10%, 50%, and 90% quantile predictions." width="80%" />
<p class="caption">10%, 50%, and 90% quantile predictions.</p>
</div>
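<p>The same reshape-and-join pattern can be sketched in base R. This is a minimal illustration, not the tidymodels implementation of <code>as_tibble()</code>. It assumes the predictions are held in a plain matrix with one column per quantile level, and <code>pred_mat</code>, <code>source_data</code>, and their values are illustrative only.</p>

```r
# Illustrative quantile predictions: one row per observation and one
# column per quantile level (example numbers, not real model output).
quantile_levels <- c(0.1, 0.5, 0.9)
pred_mat <- matrix(
  c(5.08, 5.33, 5.52,
    5.10, 5.31, 5.50),
  nrow = 2, byrow = TRUE
)

# Expand to long format: one row per (observation, quantile level) pair.
# t() makes each observation a column, so as.vector() reads row by row.
long <- data.frame(
  .pred_quantile   = as.vector(t(pred_mat)),
  .quantile_levels = rep(quantile_levels, times = nrow(pred_mat)),
  .row             = rep(seq_len(nrow(pred_mat)), each = length(quantile_levels))
)

# Hypothetical source data with an explicit row index (the role played
# by add_rowindex() in the plotting code above).
source_data <- data.frame(Latitude = c(42.05, 42.02))
source_data$.row <- seq_len(nrow(source_data))

# Attach the source columns to every quantile row via the shared .row key.
joined <- merge(long, source_data, by = ".row")
joined
```

<p>Because every expanded row carries its <code>.row</code> index, the join still works after the long table has been sorted or filtered.</p>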
<p>For now, only a few engines support the new quantile regression mode. Before these models can be fully integrated into the tidymodels ecosystem, we need to add performance metrics for quantile predictions to the yardstick package.</p>
<p>In other news, we’ve added more neural network models, thanks to improvements in the brulee package. In particular, you can now tune two-layer feed-forward networks for tabular data (using torch).</p>
<p>One other improvement has been simmering for a long time: better use of sparse data structures. We’ve improved our <code>fit()</code> interfaces for the few model engines that can use sparsely encoded data. Much more is coming on this in the next few months, especially around recipes, so stay tuned.</p>
<p>Finally, we’ve created a set of <a href="https://parsnip.tidymodels.org/articles/checklists.html" target="_blank" rel="noopener">checklists</a>
 that can be used when creating new models or engines. These are very helpful, even for us, since there is a lot of minutiae to remember.</p>
<h2 id="parallelism-in-tune">Parallelism in tune
</h2>
<p>This was a small maintenance release mostly related to parallel processing. Up to now, tune facilitated parallelism using the <a href="https://cran.r-project.org/package=foreach" target="_blank" rel="noopener">foreach</a>
 package. That package is mature but not actively developed, so we have been slowly moving toward using the <a href="https://www.futureverse.org/packages-overview.html" target="_blank" rel="noopener">future</a>
 package(s).</p>
<p>The <a href="https://www.tidyverse.org/blog/2024/04/tune-1-2-0/#modernized-support-for-parallel-processing" target="_blank" rel="noopener">first step in this journey</a>
 was to keep using foreach internally (while leaning toward future) and to encourage users to stop invoking the foreach package directly and, instead, load and use the future package.</p>
<p>We’re now moving folks into the second stage. tune will now raise a warning when:</p>
<ul>
<li>A parallel backend has been registered with foreach, and</li>
<li>No <a href="https://future.futureverse.org/reference/plan.html" target="_blank" rel="noopener"><code>plan()</code></a>
 has been specified with future.</li>
</ul>
<p>This will help users transition their existing code to use future exclusively, and allow us to update our documentation and training materials.</p>
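<p>For code that currently registers a foreach backend, switching over can be as simple as declaring a future plan instead. The sketch below assumes the future package is installed. The commented-out doParallel call is only a hypothetical example of the older style, and the worker count is arbitrary.</p>

```r
library(future)

# Older style (hypothetical example): registering a foreach backend.
# With no future plan set, tune now warns about this setup.
# doParallel::registerDoParallel(cores = 2)

# Newer style: declare how futures should be resolved, here with two
# background R sessions on the local machine.
plan(multisession, workers = 2)
n_workers <- nbrOfWorkers()

# ... call tune_grid(), fit_resamples(), etc. here ...

# Return to sequential processing when finished.
plan(sequential)
```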
<p>We anticipate that the third stage, <strong>removing foreach entirely</strong>, will occur sometime before posit::conf(2025) in September.</p>
<h2 id="things-to-look-forward-to">Things to look forward to
</h2>
<p>We are working hard on a few major initiatives that we plan on showing off at <a href="https://posit.co/conference/" target="_blank" rel="noopener">posit::conf(2025)</a>
.</p>
<p>First is integrated support for sparse <strong>data</strong>. The emphasis is on &ldquo;data&rdquo; because users can use a data frame of sparse vectors <em>or</em> the usual sparse matrix format. This is a big deal because it does not force you to convert non-numeric data into a numeric matrix format. Again, we’ll discuss this more in the future, but you should be able to use sparse data frames in parsnip, recipes, tune, etc.</p>
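<p>To make the cost of converting non-numeric data into a numeric matrix concrete, here is a small sketch using the Matrix package (a recommended package that ships with R). It indicator-encodes a hypothetical 26-level factor. Each row of the result has a single non-zero entry, so a sparse matrix only needs to store those entries. This illustrates the storage idea only; it is not how parsnip or recipes handle sparse data internally.</p>

```r
library(Matrix)

# A hypothetical factor with 26 levels and 1,000 observations.
set.seed(1)
x <- factor(sample(letters, 1000, replace = TRUE), levels = letters)

# Dense indicator encoding: a 1000 x 26 numeric matrix in which
# 25 of every 26 entries are zero.
dense <- model.matrix(~ x - 1)

# The same encoding built directly as a sparse matrix, which stores
# only the non-zero entries.
sparse <- sparse.model.matrix(~ x - 1)

mean(dense == 0)   # share of zero entries in the dense matrix
nnzero(sparse)     # number of stored non-zero entries
object.size(dense)
object.size(sparse)
```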
<p>The second initiative is the longstanding goal of adding <strong>postprocessing</strong> to tidymodels. Just as you can add a preprocessor to a model workflow, you will be able to add a set of postprocessing adjustments to the predictions your model generates. See our <a href="https://www.tidyverse.org/blog/2024/10/postprocessing-preview/" target="_blank" rel="noopener">previous post</a>
 for a sneak peek.</p>
<p>Finally, this year&rsquo;s <a href="https://www.tidyverse.org/blog/2025/01/tidymodels-2025-internship/" target="_blank" rel="noopener">summer internship</a>
 focuses on supervised feature selection methods. We’ll also have releases (and probably another package) for these tools.</p>
<p>These should come to fruition (and CRAN) before or around August 2025.</p>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>We want to sincerely thank everyone who contributed to these packages since their previous versions:</p>
<p><a href="https://github.com/AlbertoImg" target="_blank" rel="noopener">@AlbertoImg</a>
, <a href="https://github.com/asb2111" target="_blank" rel="noopener">@asb2111</a>
, <a href="https://github.com/balraadjsings" target="_blank" rel="noopener">@balraadjsings</a>
, <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>
, <a href="https://github.com/beansrowning" target="_blank" rel="noopener">@beansrowning</a>
, <a href="https://github.com/BrennanAntone" target="_blank" rel="noopener">@BrennanAntone</a>
, <a href="https://github.com/cheryldietrich" target="_blank" rel="noopener">@cheryldietrich</a>
, <a href="https://github.com/chillerb" target="_blank" rel="noopener">@chillerb</a>
, <a href="https://github.com/conarr5" target="_blank" rel="noopener">@conarr5</a>
, <a href="https://github.com/corybrunson" target="_blank" rel="noopener">@corybrunson</a>
, <a href="https://github.com/dajmcdon" target="_blank" rel="noopener">@dajmcdon</a>
, <a href="https://github.com/davidrsch" target="_blank" rel="noopener">@davidrsch</a>
, <a href="https://github.com/Edgar-Zamora" target="_blank" rel="noopener">@Edgar-Zamora</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>
, <a href="https://github.com/gimholte" target="_blank" rel="noopener">@gimholte</a>
, <a href="https://github.com/grantmcdermott" target="_blank" rel="noopener">@grantmcdermott</a>
, <a href="https://github.com/grouptheory" target="_blank" rel="noopener">@grouptheory</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/ilaria-kode" target="_blank" rel="noopener">@ilaria-kode</a>
, <a href="https://github.com/JamesHWade" target="_blank" rel="noopener">@JamesHWade</a>
, <a href="https://github.com/jesusherranz" target="_blank" rel="noopener">@jesusherranz</a>
, <a href="https://github.com/jkylearmstrong" target="_blank" rel="noopener">@jkylearmstrong</a>
, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>
, <a href="https://github.com/joscani" target="_blank" rel="noopener">@joscani</a>
, <a href="https://github.com/Joscelinrocha" target="_blank" rel="noopener">@Joscelinrocha</a>
, <a href="https://github.com/josho88" target="_blank" rel="noopener">@josho88</a>
, <a href="https://github.com/joshuagi" target="_blank" rel="noopener">@joshuagi</a>
, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>
, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>
, <a href="https://github.com/jrwinget" target="_blank" rel="noopener">@jrwinget</a>
, <a href="https://github.com/KarlKoe" target="_blank" rel="noopener">@KarlKoe</a>
, <a href="https://github.com/kscott-1" target="_blank" rel="noopener">@kscott-1</a>
, <a href="https://github.com/lilykoff" target="_blank" rel="noopener">@lilykoff</a>
, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>
, <a href="https://github.com/LouisMPenrod" target="_blank" rel="noopener">@LouisMPenrod</a>
, <a href="https://github.com/luisDVA" target="_blank" rel="noopener">@luisDVA</a>
, <a href="https://github.com/marcelglueck" target="_blank" rel="noopener">@marcelglueck</a>
, <a href="https://github.com/marcozanotti" target="_blank" rel="noopener">@marcozanotti</a>
, <a href="https://github.com/martaalcalde" target="_blank" rel="noopener">@martaalcalde</a>
, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>
, <a href="https://github.com/mihem" target="_blank" rel="noopener">@mihem</a>
, <a href="https://github.com/mitchellmanware" target="_blank" rel="noopener">@mitchellmanware</a>
, <a href="https://github.com/naokiohno" target="_blank" rel="noopener">@naokiohno</a>
, <a href="https://github.com/nhward" target="_blank" rel="noopener">@nhward</a>
, <a href="https://github.com/npelikan" target="_blank" rel="noopener">@npelikan</a>
, <a href="https://github.com/obgeneralao" target="_blank" rel="noopener">@obgeneralao</a>
, <a href="https://github.com/owenjonesuob" target="_blank" rel="noopener">@owenjonesuob</a>
, <a href="https://github.com/pbhogale" target="_blank" rel="noopener">@pbhogale</a>
, <a href="https://github.com/Peter4801" target="_blank" rel="noopener">@Peter4801</a>
, <a href="https://github.com/pgg1309" target="_blank" rel="noopener">@pgg1309</a>
, <a href="https://github.com/reisner" target="_blank" rel="noopener">@reisner</a>
, <a href="https://github.com/rfsaldanha" target="_blank" rel="noopener">@rfsaldanha</a>
, <a href="https://github.com/rkb965" target="_blank" rel="noopener">@rkb965</a>
, <a href="https://github.com/RobLBaker" target="_blank" rel="noopener">@RobLBaker</a>
, <a href="https://github.com/RodDalBen" target="_blank" rel="noopener">@RodDalBen</a>
, <a href="https://github.com/SantiagoD999" target="_blank" rel="noopener">@SantiagoD999</a>
, <a href="https://github.com/shum461" target="_blank" rel="noopener">@shum461</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, <a href="https://github.com/szimmer" target="_blank" rel="noopener">@szimmer</a>
, <a href="https://github.com/talegari" target="_blank" rel="noopener">@talegari</a>
, <a href="https://github.com/therealjpetereit" target="_blank" rel="noopener">@therealjpetereit</a>
, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
, <a href="https://github.com/walkerjameschris" target="_blank" rel="noopener">@walkerjameschris</a>
, and  <a href="https://github.com/ZWael" target="_blank" rel="noopener">@ZWael</a>
</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/tidymodels-2025-q1/thumbnail-wd.jpg" length="92976" type="image/jpeg" />
    </item>
    <item>
      <title>orbital 0.3.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2025/orbital-0-3-0/</link>
      <pubDate>Mon, 13 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2025/orbital-0-3-0/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[
<p>We&rsquo;re thrilled to announce the release of <a href="https://orbital.tidymodels.org/" target="_blank" rel="noopener">orbital</a>
 0.3.0. orbital lets you predict in databases using tidymodels workflows.</p>
<p>You can install it from CRAN with:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"orbital"</span><span class='o'>)</span></span></code></pre>
</div>
<p>This blog post will cover the highlights, which are classification support and the new augment method.</p>
<p>You can see a full list of changes in the <a href="https://orbital.tidymodels.org/news/index.html#orbital-030" target="_blank" rel="noopener">release notes</a>
.</p>
<h2 id="classification-support">Classification support
</h2>
<p>The biggest improvement in this version is that <a href="https://orbital.tidymodels.org/reference/orbital.html" target="_blank" rel="noopener"><code>orbital()</code></a>
 now works for supported classification models. See the <a href="https://orbital.tidymodels.org/articles/supported-models.html#supported-models" target="_blank" rel="noopener">vignette</a>
 for a list of all supported models.</p>
<p>Let&rsquo;s start by fitting a classification model on the <code>penguins</code> data set, using {xgboost} as the engine.</p>
<div class="highlight">
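<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://orbital.tidymodels.org/'>orbital</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://www.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>species</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>penguins</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>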
<span>  <span class='nf'>step_unknown</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_impute_mean</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_zv</span><span class='o'>(</span><span class='nf'>all_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>bt_spec</span> <span class='o'>&lt;-</span> <span class='nf'>boost_tree</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>set_mode</span><span class='o'>(</span><span class='s'>"classification"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>set_engine</span><span class='o'>(</span><span class='s'>"xgboost"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>wf_spec</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>bt_spec</span><span class='o'>)</span></span>
<span><span class='nv'>wf_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>wf_spec</span>, data <span class='o'>=</span> <span class='nv'>penguins</span><span class='o'>)</span></span></code></pre>
</div>
<p>With this fitted workflow object, we can call <a href="https://orbital.tidymodels.org/reference/orbital.html" target="_blank" rel="noopener"><code>orbital()</code></a>
 on it to create an orbital object.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>orbital_obj</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://orbital.tidymodels.org/reference/orbital.html'>orbital</a></span><span class='o'>(</span><span class='nv'>wf_fit</span><span class='o'>)</span></span>
<span><span class='nv'>orbital_obj</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>orbital Object</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────</span></span></span>
<span><span class='c'>#&gt; • island = dplyr::if_else(is.na(island), "unknown", island)</span></span>
<span><span class='c'>#&gt; • sex = dplyr::if_else(is.na(sex), "unknown", sex)</span></span>
<span><span class='c'>#&gt; • island_Dream = as.numeric(island == "Dream")</span></span>
<span><span class='c'>#&gt; • island_Torgersen = as.numeric(island == "Torgersen")</span></span>
<span><span class='c'>#&gt; • sex_male = as.numeric(sex == "male")</span></span>
<span><span class='c'>#&gt; • sex_unknown = as.numeric(sex == "unknown")</span></span>
<span><span class='c'>#&gt; • bill_length_mm = dplyr::if_else(is.na(bill_length_mm), 43.92193, bill_l ...</span></span>
<span><span class='c'>#&gt; • bill_depth_mm = dplyr::if_else(is.na(bill_depth_mm), 17.15117, bill_dep ...</span></span>
<span><span class='c'>#&gt; • flipper_length_mm = dplyr::if_else(is.na(flipper_length_mm), 201, flipp ...</span></span>
<span><span class='c'>#&gt; • body_mass_g = dplyr::if_else(is.na(body_mass_g), 4202, body_mass_g)</span></span>
<span><span class='c'>#&gt; • island_Dream = dplyr::if_else(is.na(island_Dream), 0.3604651, island_Dr ...</span></span>
<span><span class='c'>#&gt; • island_Torgersen = dplyr::if_else(is.na(island_Torgersen), 0.1511628, i ...</span></span>
<span><span class='c'>#&gt; • sex_male = dplyr::if_else(is.na(sex_male), 0.4883721, sex_male)</span></span>
<span><span class='c'>#&gt; • sex_unknown = dplyr::if_else(is.na(sex_unknown), 0.03197674, sex_unknow ...</span></span>
<span><span class='c'>#&gt; • Adelie = 0 + dplyr::case_when((bill_depth_mm &lt; 15.1 | is.na(bill_depth_ ...</span></span>
<span><span class='c'>#&gt; • Chinstrap = 0 + dplyr::case_when((island_Dream &lt; 0.5 | is.na(island_Dre ...</span></span>
<span><span class='c'>#&gt; • Gentoo = 0 + dplyr::case_when((bill_depth_mm &lt; 15.95 | is.na(bill_depth ...</span></span>
<span><span class='c'>#&gt; • .pred_class = dplyr::case_when(Adelie &gt; Chinstrap &amp; Adelie &gt; Gentoo ~ " ...</span></span>
<span><span class='c'>#&gt; ────────────────────────────────────────────────────────────────────────────────</span></span>
<span><span class='c'>#&gt; 18 equations in total.</span></span>
<span></span></code></pre>
</div>
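<p>The printout reads like a sequence of <code>mutate()</code> steps: each equation is named, and later equations can use columns created by earlier ones. For example, <code>island</code> is recoded before <code>island_Dream</code> is computed from it, and <code>island_Dream</code> is later reassigned with an imputed value. The base-R sketch below mimics that evaluation order on a few hypothetical rows. It uses <code>ifelse()</code> in place of <code>dplyr::if_else()</code> and is not orbital&rsquo;s internal code.</p>

```r
# A few hypothetical rows with missing values.
df <- data.frame(
  island = c("Dream", NA, "Biscoe"),
  sex    = c("male", "female", NA)
)

# Equations copied (in order) from the printout, with base ifelse()
# standing in for dplyr::if_else().
eqs <- list(
  island       = quote(ifelse(is.na(island), "unknown", island)),
  sex          = quote(ifelse(is.na(sex), "unknown", sex)),
  island_Dream = quote(as.numeric(island == "Dream")),
  sex_male     = quote(as.numeric(sex == "male"))
)

# Evaluate each equation against the current columns, one after another,
# so later equations see the results of earlier ones.
for (nm in names(eqs)) {
  df[[nm]] <- eval(eqs[[nm]], envir = df)
}
df
```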
<p>This object contains all the information needed to produce predictions, which we can do with <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>
.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 1</span></span></span>
<span><span class='c'>#&gt;    .pred_class</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>The main thing to note here is that the orbital package produces character vectors instead of factors. This keeps the output consistent across backends, since many databases don&rsquo;t have a factor type.</p>
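<p>If downstream code expects factors (for example, to compare predictions against a factor outcome), you can convert the character predictions back yourself. A minimal base-R sketch, assuming the class levels are known from the training data; the prediction vector here is hypothetical:</p>

```r
# Hypothetical hard class predictions, returned as plain character.
pred_class <- c("Adelie", "Gentoo", "Adelie", "Gentoo")

# Class levels known from the training data.
species_levels <- c("Adelie", "Chinstrap", "Gentoo")

# Convert back to a factor with a fixed set of levels. "Chinstrap" stays
# a level even though it does not appear in this batch.
pred_factor <- factor(pred_class, levels = species_levels)
levels(pred_factor)
table(pred_factor)
```

<p>With the data in this post, the levels could come from <code>levels(penguins$species)</code>. Setting them explicitly keeps them consistent even when a batch of predictions happens not to contain every class.</p>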
<p>Speaking of databases, you can <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>
 on an orbital object using tables from databases. Below we create an ephemeral in-memory RSQLite database.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dbi.r-dbi.org'>DBI</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rsqlite.r-dbi.org'>RSQLite</a></span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>con_sqlite</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbi.r-dbi.org/reference/dbConnect.html'>dbConnect</a></span><span class='o'>(</span><span class='nf'><a href='https://rsqlite.r-dbi.org/reference/SQLite.html'>SQLite</a></span><span class='o'>(</span><span class='o'>)</span>, dbname <span class='o'>=</span> <span class='s'>":memory:"</span><span class='o'>)</span></span>
<span><span class='nv'>penguins_sqlite</span> <span class='o'>&lt;-</span> <span class='nf'>copy_to</span><span class='o'>(</span><span class='nv'>con_sqlite</span>, <span class='nv'>penguins</span>, name <span class='o'>=</span> <span class='s'>"penguins_table"</span><span class='o'>)</span></span></code></pre>
</div>
<p>And we can predict with it like normal. All the calculations are sent to the database for execution.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins_sqlite</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># Source:   SQL [?? x 1]</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># Database: sqlite 3.47.1 []</span></span></span>
<span><span class='c'>#&gt;    .pred_class</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie     </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ more rows</span></span></span>
<span></span></code></pre>
</div>
<p>This works the same with <a href="https://orbital.tidymodels.org/articles/databases.html" target="_blank" rel="noopener">many types of databases</a>
.</p>
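<p>To give a sense of what &ldquo;sent to the database&rdquo; means, here is a hand-written SQLite query equivalent to three of the equations in the printout above (two missing-value replacements and one dummy variable), run through DBI and RSQLite on a few hypothetical rows. The SQL that dbplyr actually generates for orbital will look different; this only shows that such equations map onto plain SQL.</p>

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), dbname = ":memory:")

# A few hypothetical rows with missing values.
toy <- data.frame(
  id             = 1:3,
  bill_length_mm = c(39.1, NA, 46.5),
  sex            = c("male", NA, "female")
)
dbWriteTable(con, "toy", toy)

# The inner query replaces missing values (mirroring the if_else()
# equations); the outer query builds the sex_male dummy from the
# already-recoded sex column.
sql <- "
SELECT
  id,
  bill_length_mm,
  CAST(sex = 'male' AS REAL) AS sex_male
FROM (
  SELECT
    id,
    COALESCE(bill_length_mm, 43.92193) AS bill_length_mm,
    COALESCE(sex, 'unknown') AS sex
  FROM toy
)
ORDER BY id
"
res <- dbGetQuery(con, sql)
dbDisconnect(con)
res
```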
<p>Classification differs from regression in part because it comes with multiple prediction types. The example above used the default, hard class predictions. You can set the type of prediction you want with the <code>type</code> argument to <code>orbital()</code>. For classification models, the possible options are <code>&quot;class&quot;</code> and <code>&quot;prob&quot;</code>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>orbital_obj_prob</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://orbital.tidymodels.org/reference/orbital.html'>orbital</a></span><span class='o'>(</span><span class='nv'>wf_fit</span>, type <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"class"</span>, <span class='s'>"prob"</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='nv'>orbital_obj_prob</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>orbital Object</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────</span></span></span>
<span><span class='c'>#&gt; • island = dplyr::if_else(is.na(island), "unknown", island)</span></span>
<span><span class='c'>#&gt; • sex = dplyr::if_else(is.na(sex), "unknown", sex)</span></span>
<span><span class='c'>#&gt; • island_Dream = as.numeric(island == "Dream")</span></span>
<span><span class='c'>#&gt; • island_Torgersen = as.numeric(island == "Torgersen")</span></span>
<span><span class='c'>#&gt; • sex_male = as.numeric(sex == "male")</span></span>
<span><span class='c'>#&gt; • sex_unknown = as.numeric(sex == "unknown")</span></span>
<span><span class='c'>#&gt; • bill_length_mm = dplyr::if_else(is.na(bill_length_mm), 43.92193, bill_l ...</span></span>
<span><span class='c'>#&gt; • bill_depth_mm = dplyr::if_else(is.na(bill_depth_mm), 17.15117, bill_dep ...</span></span>
<span><span class='c'>#&gt; • flipper_length_mm = dplyr::if_else(is.na(flipper_length_mm), 201, flipp ...</span></span>
<span><span class='c'>#&gt; • body_mass_g = dplyr::if_else(is.na(body_mass_g), 4202, body_mass_g)</span></span>
<span><span class='c'>#&gt; • island_Dream = dplyr::if_else(is.na(island_Dream), 0.3604651, island_Dr ...</span></span>
<span><span class='c'>#&gt; • island_Torgersen = dplyr::if_else(is.na(island_Torgersen), 0.1511628, i ...</span></span>
<span><span class='c'>#&gt; • sex_male = dplyr::if_else(is.na(sex_male), 0.4883721, sex_male)</span></span>
<span><span class='c'>#&gt; • sex_unknown = dplyr::if_else(is.na(sex_unknown), 0.03197674, sex_unknow ...</span></span>
<span><span class='c'>#&gt; • Adelie = 0 + dplyr::case_when((bill_depth_mm &lt; 15.1 | is.na(bill_depth_ ...</span></span>
<span><span class='c'>#&gt; • Chinstrap = 0 + dplyr::case_when((island_Dream &lt; 0.5 | is.na(island_Dre ...</span></span>
<span><span class='c'>#&gt; • Gentoo = 0 + dplyr::case_when((bill_depth_mm &lt; 15.95 | is.na(bill_depth ...</span></span>
<span><span class='c'>#&gt; • .pred_class = dplyr::case_when(Adelie &gt; Chinstrap &amp; Adelie &gt; Gentoo ~ " ...</span></span>
<span><span class='c'>#&gt; • norm = exp(Adelie) + exp(Chinstrap) + exp(Gentoo)</span></span>
<span><span class='c'>#&gt; • .pred_Adelie = exp(Adelie) / norm</span></span>
<span><span class='c'>#&gt; • .pred_Chinstrap = exp(Chinstrap) / norm</span></span>
<span><span class='c'>#&gt; • .pred_Gentoo = exp(Gentoo) / norm</span></span>
<span><span class='c'>#&gt; ────────────────────────────────────────────────────────────────────────────────</span></span>
<span><span class='c'>#&gt; 22 equations in total.</span></span>
<span></span></code></pre>
</div>
<p>Notice how we can select both <code>&quot;class&quot;</code> and <code>&quot;prob&quot;</code>. The predictions now include both hard and soft class predictions.</p>
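<p>The last four equations in the printout are a softmax over the per-class scores: each score is exponentiated and divided by the sum of the exponentiated scores, while <code>.pred_class</code> picks the class with the largest score. Below is a base-R sketch of that calculation on two rows of hypothetical scores, together with the common max-subtraction trick, which gives the same probabilities while avoiding overflow for large scores.</p>

```r
# Hypothetical per-class scores for two rows (one column per class),
# standing in for the Adelie, Chinstrap, and Gentoo columns.
scores <- rbind(
  c(Adelie =  2.5, Chinstrap = -2.7, Gentoo = -2.7),
  c(Adelie = -0.6, Chinstrap = -2.9, Gentoo =  0.4)
)

# As in the printed equations: exp(score) divided by the row's sum of exp().
norm  <- rowSums(exp(scores))
probs <- exp(scores) / norm

# Equivalent but numerically safer: subtract each row's maximum first.
shifted      <- scores - apply(scores, 1, max)
probs_stable <- exp(shifted) / rowSums(exp(shifted))

# Hard class: the column with the largest score in each row.
pred_class <- colnames(scores)[max.col(scores, ties.method = "first")]
probs
pred_class
```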
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj_prob</span>, <span class='nv'>penguins</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 4</span></span></span>
<span><span class='c'>#&gt;    .pred_class .pred_Adelie .pred_Chinstrap .pred_Gentoo</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>           <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>        <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie             0.709         0.024<span style='text-decoration: underline;'>5</span>       0.267  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie             0.979         0.005<span style='text-decoration: underline;'>49</span>      0.015<span style='text-decoration: underline;'>8</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie             0.980         0.005<span style='text-decoration: underline;'>59</span>      0.014<span style='text-decoration: underline;'>8</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>That works equally well in databases.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj_prob</span>, <span class='nv'>penguins_sqlite</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># Source:   SQL [?? x 4]</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># Database: sqlite 3.47.1 []</span></span></span>
<span><span class='c'>#&gt;    .pred_class .pred_Adelie .pred_Chinstrap .pred_Gentoo</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>           <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>        <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie             0.709         0.024<span style='text-decoration: underline;'>5</span>       0.267  </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie             0.989         0.005<span style='text-decoration: underline;'>54</span>      0.005<span style='text-decoration: underline;'>60</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie             0.979         0.005<span style='text-decoration: underline;'>49</span>      0.015<span style='text-decoration: underline;'>8</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie             0.980         0.005<span style='text-decoration: underline;'>59</span>      0.014<span style='text-decoration: underline;'>8</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ more rows</span></span></span>
<span></span></code></pre>
</div>
<h2 id="new-augment-method">New augment method
</h2>
<p>Users of tidymodels have found the <a href="https://generics.r-lib.org/reference/augment.html" target="_blank" rel="noopener"><code>augment()</code></a>
 function to be a handy tool: it computes predictions and returns them alongside the original data set.</p>
<p>This release adds <a href="https://generics.r-lib.org/reference/augment.html" target="_blank" rel="noopener"><code>augment()</code></a>
 support for orbital objects.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 8</span></span></span>
<span><span class='c'>#&gt;    .pred_class species island    bill_length_mm bill_depth_mm flipper_length_mm</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>       <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>         <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>             <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie      Adelie  Torgersen           39.1          18.7               181</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie      Adelie  Torgersen           39.5          17.4               186</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie      Adelie  Torgersen           40.3          18                 195</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie      Adelie  Torgersen           <span style='color: #BB0000;'>NA</span>            <span style='color: #BB0000;'>NA</span>                  <span style='color: #BB0000;'>NA</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie      Adelie  Torgersen           36.7          19.3               193</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie      Adelie  Torgersen           39.3          20.6               190</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie      Adelie  Torgersen           38.9          17.8               181</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie      Adelie  Torgersen           39.2          19.6               195</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie      Adelie  Torgersen           34.1          18.1               193</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie      Adelie  Torgersen           42            20.2               190</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2 more variables: body_mass_g &lt;int&gt;, sex &lt;fct&gt;</span></span></span>
<span></span></code></pre>
</div>
<p>The function works with most databases but, for technical reasons, not with all of them. It is known not to work with Spark databases or Arrow tables.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins_sqlite</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># Source:   SQL [?? x 8]</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># Database: sqlite 3.47.1 []</span></span></span>
<span><span class='c'>#&gt;    .pred_class species island    bill_length_mm bill_depth_mm flipper_length_mm</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>       <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>         <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>             <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie      Adelie  Torgersen           39.1          18.7               181</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie      Adelie  Torgersen           39.5          17.4               186</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie      Adelie  Torgersen           40.3          18                 195</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie      Adelie  Torgersen           <span style='color: #BB0000;'>NA</span>            <span style='color: #BB0000;'>NA</span>                  <span style='color: #BB0000;'>NA</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie      Adelie  Torgersen           36.7          19.3               193</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie      Adelie  Torgersen           39.3          20.6               190</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie      Adelie  Torgersen           38.9          17.8               181</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie      Adelie  Torgersen           39.2          19.6               195</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie      Adelie  Torgersen           34.1          18.1               193</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie      Adelie  Torgersen           42            20.2               190</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ more rows</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2 more variables: body_mass_g &lt;int&gt;, sex &lt;chr&gt;</span></span></span>
<span></span></code></pre>
</div>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>A big thank you to all the people who have contributed to orbital since the release of v0.3.0:</p>
<p><a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/joscani" target="_blank" rel="noopener">@joscani</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/npelikan" target="_blank" rel="noopener">@npelikan</a>, and <a href="https://github.com/szimmer" target="_blank" rel="noopener">@szimmer</a>.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2025/orbital-0-3-0/thumbnail-wd.jpg" length="60423" type="image/jpeg" />
    </item>
    <item>
      <title>Introducing mall for R...and Python</title>
      <link>https://posit-open-source.netlify.app/blog/ai/edgarmallintro/</link>
      <pubDate>Wed, 30 Oct 2024 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/ai/edgarmallintro/</guid>
      <dc:creator>Edgar Ruiz</dc:creator><description><![CDATA[<h2 id="the-beginning">The beginning
</h2>
<p>A few months ago, while working on the Databricks with R workshop, I came
across some of their custom SQL functions. These particular functions are
prefixed with &ldquo;ai_&rdquo;, and they run NLP with a simple SQL call:</p>
<div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">&gt;</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">ai_analyze_sentiment</span><span class="p">(</span><span class="s1">&#39;I am happy&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="n">positive</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">&gt;</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">ai_analyze_sentiment</span><span class="p">(</span><span class="s1">&#39;I am sad&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="n">negative</span><span class="w">
</span></span></span></code></pre>
</div>
</div><p>This was a revelation to me. It showcased a new way to use
LLMs in our daily work as analysts. To date, I had primarily used LLMs
for code completion and development tasks. However, this new approach
focuses on using LLMs directly against our data instead.</p>
<p>My first reaction was to try and access the custom functions via R. With
<a href="https://github.com/tidyverse/dbplyr" target="_blank" rel="noopener"><code>dbplyr</code></a>
 we can access SQL functions
in R, and it was great to see them work:</p>
<div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">orders</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">mutate</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">sentiment</span> <span class="o">=</span> <span class="nf">ai_analyze_sentiment</span><span class="p">(</span><span class="n">o_comment</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; # Source:   SQL [6 x 2]</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;   o_comment                   sentiment</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;   &lt;chr&gt;                        &lt;chr&gt;    </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 1 &#34;, pending theodolites …    neutral  </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 2 &#34;uriously special foxes …   neutral  </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 3 &#34;sleep. courts after the …  neutral  </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 4 &#34;ess foxes may sleep …      neutral  </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 5 &#34;ts wake blithely unusual … mixed    </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 6 &#34;hins sleep. fluffily …     neutral</span>
</span></span></code></pre>
</div>
</div><p>One downside of this integration is that, even though it is accessible through R,
it requires a live connection to Databricks to use an LLM in this manner, which
limits the number of people who can benefit from it.</p>
<p>According to their documentation, Databricks is leveraging the Llama 3.1 70B
model. While this is a highly effective Large Language Model, its enormous size
poses a significant challenge for most users&rsquo; machines, making it impractical
to run on standard hardware.</p>
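<p>A back-of-the-envelope calculation shows why size matters: the weights alone take roughly <em>parameters &times; bits per parameter &divide; 8</em> bytes. For a 70-billion-parameter model that is about 140 GB at 16-bit precision and still about 35 GB when quantized to 4 bits, compared with about 6 GB at 16-bit for a 3-billion-parameter model such as the smaller Llama 3.2 text model. The helper name <code>weight_memory_gb()</code> is made up for this sketch, which counts weights only (no activations, KV cache, or runtime overhead) and uses decimal gigabytes. It is an estimate, not a measurement.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory (decimal GB) needed just to hold model weights."""
    # bytes = parameters * bits per parameter / 8
    return n_params * bits_per_param / 8 / 1e9


# Compare a 70B model with a 3B model at common precisions
for label, n_params in [("70B", 70_000_000_000), ("3B", 3_000_000_000)]:
    for bits in (16, 8, 4):
        print(f"{label} @ {bits}-bit: {weight_memory_gb(n_params, bits):.1f} GB")
</code></pre></div>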
<h2 id="reaching-viability">Reaching viability
</h2>
<p>LLM development has been moving quickly. Initially, only online
Large Language Models (LLMs) were viable for daily use. This sparked concerns among
companies hesitant to share their data externally. Moreover, the cost of using
LLMs online can be substantial; per-token charges add up quickly.</p>
<p>The ideal solution would be to integrate an LLM into our own systems, requiring
three essential components:</p>
<ol>
<li>A model that can fit comfortably in memory</li>
<li>A model that achieves sufficient accuracy for NLP tasks</li>
<li>An intuitive interface between the model and the user&rsquo;s laptop</li>
</ol>
<p>Until recently, having all three of these elements was nearly impossible.
Models small enough to fit in memory were either inaccurate or excessively slow.
However, recent advancements, such as <a href="https://www.llama.com/" target="_blank" rel="noopener">Llama from Meta</a>
 and cross-platform interaction engines like <a href="https://ollama.com/" target="_blank" rel="noopener">Ollama</a>, have
made it feasible to deploy these models, offering a promising solution for
companies looking to integrate LLMs into their workflows.</p>
<h2 id="the-project">The project
</h2>
<p>This project started as an exploration, driven by my interest in leveraging a
&ldquo;general-purpose&rdquo; LLM to produce results comparable to those from Databricks AI
functions. The primary challenge was determining how much setup and preparation
would be required for such a model to deliver reliable and consistent results.</p>
<p>Without access to a design document or open-source code, I relied solely on the
LLM&rsquo;s output as a testing ground. This presented several obstacles, including
the numerous options available for fine-tuning the model. Even within prompt
engineering, the possibilities are vast. To ensure the model was not too
specialized or focused on a specific subject or outcome, I needed to strike a
delicate balance between accuracy and generality.</p>
<p>Fortunately, after conducting extensive testing, I discovered that a simple
&ldquo;one-shot&rdquo; prompt yielded the best results. By &ldquo;best,&rdquo; I mean that the answers
were both accurate for a given row and consistent across multiple rows.
Consistency was crucial, as it meant providing answers that were one of the
specified options (positive, negative, or neutral), without any additional
explanations.</p>
<p>The following is an example of a prompt that worked reliably against
Llama 3.2:</p>
<pre><code>&gt;&gt;&gt; You are a helpful sentiment engine. Return only one of the 
... following answers: positive, negative, neutral. No capitalization. 
... No explanations. The answer is based on the following text: 
... I am happy
positive
</code></pre>
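<p>Building that prompt and checking the reply can be sketched in a few lines of Python. This is not how <code>mall</code> is implemented internally. The helper names <code>build_sentiment_prompt()</code> and <code>parse_answer()</code> are hypothetical, and the actual call to the model (for example through Ollama) is left out. The sketch reproduces the prompt wording shown above and treats any reply that is not exactly one of the allowed labels as a failure, which is the kind of consistency check the one-shot prompt is meant to satisfy.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">ALLOWED = ("positive", "negative", "neutral")


def build_sentiment_prompt(text, options=ALLOWED):
    # Same wording as the prompt above, with the row's text appended at the end
    return (
        "You are a helpful sentiment engine. Return only one of the "
        f"following answers: {', '.join(options)}. No capitalization. "
        "No explanations. The answer is based on the following text: "
        f"{text}"
    )


def parse_answer(raw, options=ALLOWED):
    # Trim whitespace and lowercase; anything outside the allowed labels is rejected
    answer = raw.strip().lower()
    return answer if answer in options else None


print(build_sentiment_prompt("I am happy"))
print(parse_answer(" Positive\n"))
</code></pre></div>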
<p>As a side note, my attempts to submit multiple rows at once were unsuccessful.
I spent a significant amount of time exploring different approaches,
such as submitting two or ten rows at a time, formatted as JSON or
CSV. The results were often inconsistent, and batching didn&rsquo;t speed up
the process enough to be worth the effort.</p>
<p>Once I became comfortable with the approach, the next step was wrapping the
functionality within an R package.</p>
<h2 id="the-approach">The approach
</h2>
<p>One of my goals was to make the mall package as &ldquo;ergonomic&rdquo; as possible. In
other words, I wanted to ensure that using the package in R and Python
integrates seamlessly with how data analysts use their preferred language on a
daily basis.</p>
<p>For R, this was relatively straightforward. I simply needed to verify that the
functions worked well with pipes (<code>%&gt;%</code> and <code>|&gt;</code>) and could be easily
incorporated into packages like those in the <code>tidyverse</code>:</p>
<div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">reviews</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">llm_sentiment</span><span class="p">(</span><span class="n">review</span><span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">filter</span><span class="p">(</span><span class="n">.sentiment</span> <span class="o">==</span> <span class="s">&#34;positive&#34;</span><span class="p">)</span> <span class="o">|&gt;</span> 
</span></span><span class="line"><span class="cl">  <span class="nf">select</span><span class="p">(</span><span class="n">review</span><span class="p">)</span> 
</span></span><span class="line"><span class="cl"><span class="c1">#&gt;                                                               review</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; 1 This has been the best TV I&#39;ve ever used. Great screen, and sound.</span>
</span></span></code></pre>
</div>
</div><p>Python, however, is not my native language, so I had to adapt my
thinking about data manipulation. Specifically, I learned that in Python,
objects (like pandas DataFrames) &ldquo;contain&rdquo; transformation functions by design.</p>
<p>This insight led me to investigate whether the pandas API allows for extensions,
and fortunately, it does! After exploring the possibilities, I decided to start
with Polars, which allowed me to extend its API by creating a new namespace.
This simple addition gives users easy access to the necessary functions:</p>
<div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">polars</span> <span class="k">as</span> <span class="nn">pl</span>
</span></span><span class="line"><span class="cl"><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">mall</span>
</span></span><span class="line"><span class="cl"><span class="o">&gt;&gt;&gt;</span> <span class="n">df</span> <span class="o">=</span> <span class="n">pl</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="nb">dict</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;I am happy&#34;</span><span class="p">,</span> <span class="s2">&#34;I am sad&#34;</span><span class="p">]))</span>
</span></span><span class="line"><span class="cl"><span class="o">&gt;&gt;&gt;</span> <span class="n">df</span><span class="o">.</span><span class="n">llm</span><span class="o">.</span><span class="n">sentiment</span><span class="p">(</span><span class="s2">&#34;x&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">shape</span><span class="p">:</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="err">┌────────────┬───────────┐</span>
</span></span><span class="line"><span class="cl"><span class="err">│</span> <span class="n">x</span>          <span class="err">┆</span> <span class="n">sentiment</span> <span class="err">│</span>
</span></span><span class="line"><span class="cl"><span class="err">│</span> <span class="o">---</span>        <span class="err">┆</span> <span class="o">---</span>       <span class="err">│</span>
</span></span><span class="line"><span class="cl"><span class="err">│</span> <span class="nb">str</span>        <span class="err">┆</span> <span class="nb">str</span>       <span class="err">│</span>
</span></span><span class="line"><span class="cl"><span class="err">╞════════════╪═══════════╡</span>
</span></span><span class="line"><span class="cl"><span class="err">│</span> <span class="n">I</span> <span class="n">am</span> <span class="n">happy</span> <span class="err">┆</span> <span class="n">positive</span>  <span class="err">│</span>
</span></span><span class="line"><span class="cl"><span class="err">│</span> <span class="n">I</span> <span class="n">am</span> <span class="n">sad</span>   <span class="err">┆</span> <span class="n">negative</span>  <span class="err">│</span>
</span></span><span class="line"><span class="cl"><span class="err">└────────────┴───────────┘</span>
</span></span></code></pre>
</div>
</div><p>By keeping all the new functions within the llm namespace, it becomes very easy
for users to find and utilize the ones they need:</p>
<p><div class="not-prose"><figure>
    <img class="h-auto max-w-full rounded-lg"
      src="https://posit-open-source.netlify.app/blog/ai/edgarmallintro/images/llm-namespace.png"
      alt="" 
      loading="lazy"
    >
  </figure></div>
</p>
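<p>If you are curious how a <code>df.llm.*</code> namespace can be attached to every Polars DataFrame, here is a minimal sketch built on Polars&rsquo; <code>pl.api.register_dataframe_namespace()</code> decorator. It is not <code>mall</code>&rsquo;s actual implementation. The class name <code>LLMNamespace</code> and the helper <code>_toy_sentiment()</code> are hypothetical, and the helper is a crude keyword lookup that stands in for the real LLM call so the example runs without a model.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import polars as pl


def _toy_sentiment(text):
    # Hypothetical stand-in for an LLM call: a crude keyword lookup,
    # used only so this sketch runs without a model server.
    lowered = text.lower()
    if "happy" in lowered:
        return "positive"
    if "sad" in lowered:
        return "negative"
    return "neutral"


@pl.api.register_dataframe_namespace("llm")
class LLMNamespace:
    """Exposes df.llm.* methods on every Polars DataFrame."""

    def __init__(self, df: pl.DataFrame):
        self._df = df

    def sentiment(self, col, pred_name="sentiment"):
        # Score each value of `col` and append the labels as a new column
        labels = [_toy_sentiment(text) for text in self._df[col].to_list()]
        return self._df.with_columns(pl.Series(pred_name, labels))


df = pl.DataFrame({"x": ["I am happy", "I am sad"]})
out = df.llm.sentiment("x")
print(out)
</code></pre></div>
<p>Because <code>with_columns()</code> returns a new DataFrame, the original <code>df</code> is left unchanged. This matches how other Polars methods behave, and all the LLM-related methods stay grouped under one discoverable name.</p>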
<h2 id="whats-next">What&rsquo;s next
</h2>
<p>I think it will be easier to know what is coming for <code>mall</code> once the community
uses it and provides feedback. I anticipate that adding more LLM back ends will
be the main request. Another likely enhancement is updating prompts as newer
models become available, since a prompt may need adjusting for a given
model. I experienced this going from Llama 3.1 to Llama 3.2, when one of the
prompts needed a tweak. The package is structured so that future
tweaks like that are additions to the package rather than replacements of the
existing prompts, which retains backwards compatibility.</p>
<p>This is the first time I have written an article about the history and structure of a
project. This particular effort was unique because of its R + Python and
LLM aspects, so I figured it was worth sharing.</p>
<p>If you wish to learn more about <code>mall</code>, feel free to visit its official site:
<a href="https://mlverse.github.io/mall/" target="_blank" rel="noopener">https://mlverse.github.io/mall/</a>
</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/ai/edgarmallintro/thumbnail.png" length="225127" type="image/png" />
    </item>
    <item>
      <title>Postprocessing is coming to tidymodels</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2024/postprocessing-preview/</link>
      <pubDate>Tue, 08 Oct 2024 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2024/postprocessing-preview/</guid>
      <dc:creator>Simon Couch</dc:creator>
      <dc:creator>Hannah Frick</dc:creator>
      <dc:creator>Max Kuhn</dc:creator><description><![CDATA[<p>We&rsquo;re bristling with elation to share about a set of upcoming features for postprocessing with tidymodels. Postprocessors refine predictions output by machine learning models to improve predictive performance or better satisfy distributional limitations. The developmental versions of many tidymodels core packages include changes to support postprocessors, and we&rsquo;re ready to share about our work and hear the community&rsquo;s thoughts on our progress so far.</p>
<p>Postprocessing support with tidymodels hasn&rsquo;t yet made it to CRAN, but you can install the needed versions of tidymodels packages with the following code.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='https://pak.r-lib.org/reference/pak.html'>pak</a></span><span class='o'>(</span></span>
<span>  <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste0</a></span><span class='o'>(</span></span>
<span>    <span class='s'>"tidymodels/"</span>,</span>
<span>    <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"tune"</span>, <span class='s'>"workflows"</span>, <span class='s'>"rsample"</span>, <span class='s'>"tailor"</span><span class='o'>)</span></span>
<span>  <span class='o'>)</span></span>
<span><span class='o'>)</span></span></code></pre>
</div>
<p>Now, we load the packages, with those development versions installed.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/probably'>probably</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/tailor'>tailor</a></span><span class='o'>)</span></span></code></pre>
</div>
<p>Existing tidymodels users might have spotted something funky already; who is this tailor character?</p>
<h2 id="meet-tailor">Meet tailor 👋
</h2>
<p>The tailor package introduces tailor objects, which compose iterative adjustments to model predictions. tailor is to postprocessing as recipes is to preprocessing; applying your mental model of recipes to tailor should get you a good bit of the way there.</p>
<div style="width: 140%; max-width: 140%; overflow-x: auto;">
<table>
  <thead>
      <tr>
          <th>Tool</th>
          <th>Applied to...</th>
          <th>Initialize with...</th>
          <th>Composes...</th>
          <th>Train with...</th>
          <th>Predict with...</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>recipes</td>
          <td>Training data</td>
          <td><code>recipe()</code></td>
          <td><code>step_*()</code>s</td>
          <td><code>prep()</code></td>
          <td><code>bake()</code></td>
      </tr>
      <tr>
          <td>tailor</td>
          <td>Model predictions</td>
          <td><a href="https://tailor.tidymodels.org/reference/tailor.html" target="_blank" rel="noopener"><code>tailor()</code></a>
</td>
          <td><code>adjust_*()</code>ments</td>
          <td><a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a>
</td>
          <td><a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>
</td>
      </tr>
  </tbody>
</table>
</div>
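<p>One way to picture &ldquo;composing adjustments&rdquo; is as an ordered list of functions, each taking the previous one&rsquo;s predictions as input. The base R sketch below is a hypothetical illustration of that idea, not how tailor is implemented; it ignores the training (<code>fit()</code>) stage entirely, and the two adjustments are made up for the example.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical: each adjustment is a function from predictions to predictions
adjustments = list(
  function(pred) pred * 1.1,              # e.g., rescale predictions
  function(pred) pmin(pmax(pred, 0), 60)  # e.g., clamp to a plausible range
)

# Apply the adjustments in order, feeding each one's output to the next
postprocess = function(pred, adjustments) {
  Reduce(function(current, adjust) adjust(current), adjustments, pred)
}

postprocess(c(-5, 20, 58), adjustments)
</code></pre>
</div>
<p>Here, the result is 0, 22, and 60 (up to floating-point error). Applying the same two adjustments in the opposite order would return 63.8 for the last value, since the clamp would run before the rescaling; order matters, just as it does for recipe steps.</p>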
<p>First, users can initialize a tailor object with <a href="https://tailor.tidymodels.org/reference/tailor.html" target="_blank" rel="noopener"><code>tailor()</code></a>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tailor.tidymodels.org/reference/tailor.html'>tailor</a></span><span class='o'>(</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>tailor</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span>
<span></span><span><span class='c'>#&gt; A postprocessor with 0 adjustments.</span></span>
<span></span></code></pre>
</div>
<p>Tailors compose &ldquo;adjustments,&rdquo; analogous to steps from the recipes package.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tailor.tidymodels.org/reference/tailor.html'>tailor</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://tailor.tidymodels.org/reference/adjust_probability_threshold.html'>adjust_probability_threshold</a></span><span class='o'>(</span>threshold <span class='o'>=</span> <span class='m'>.7</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>tailor</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span>
<span></span><span><span class='c'>#&gt; A binary postprocessor with 1 adjustment:</span></span>
<span></span><span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Adjust probability threshold to 0.7.</span></span>
<span></span></code></pre>
</div>
<p>As an example, we&rsquo;ll apply this tailor to the <code>two_class_example</code> data made available after loading tidymodels.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/head.html'>head</a></span><span class='o'>(</span><span class='nv'>two_class_example</span><span class='o'>)</span></span>
<span><span class='c'>#&gt;    truth      Class1       Class2 predicted</span></span>
<span><span class='c'>#&gt; 1 Class2 0.003589243 0.9964107574    Class2</span></span>
<span><span class='c'>#&gt; 2 Class1 0.678621054 0.3213789460    Class1</span></span>
<span><span class='c'>#&gt; 3 Class2 0.110893522 0.8891064779    Class2</span></span>
<span><span class='c'>#&gt; 4 Class1 0.735161703 0.2648382969    Class1</span></span>
<span><span class='c'>#&gt; 5 Class2 0.016239960 0.9837600397    Class2</span></span>
<span><span class='c'>#&gt; 6 Class1 0.999275071 0.0007249286    Class1</span></span>
<span></span></code></pre>
</div>
<p>This data gives the true value of an outcome variable <code>truth</code> as well as predicted probabilities (<code>Class1</code> and <code>Class2</code>). The hard class predictions, in <code>predicted</code>, are <code>&quot;Class1&quot;</code> if the probability assigned to <code>&quot;Class1&quot;</code> is above .5, and <code>&quot;Class2&quot;</code> otherwise.</p>
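<p>To make that rule concrete, here is a minimal base R sketch that reproduces the hard class predictions for the six rows printed above. The helper <code>classify()</code> is hypothetical, not part of tailor, and assumes <code>&quot;Class1&quot;</code> is the first factor level; its <code>threshold</code> argument also previews what raising the cutoff does.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Class1 probabilities and hard predictions from the six rows shown above
prob_class1 = c(0.003589243, 0.678621054, 0.110893522, 0.735161703, 0.016239960, 0.999275071)
predicted = c("Class2", "Class1", "Class2", "Class1", "Class2", "Class1")

# Hypothetical helper: hard class from the probability of the first level
classify = function(prob_class1, threshold = 0.5) {
  factor(ifelse(prob_class1 > threshold, "Class1", "Class2"), levels = c("Class1", "Class2"))
}

# The default .5 rule reproduces the `predicted` column (TRUE)
all(as.character(classify(prob_class1)) == predicted)

# Raising the threshold to .7 changes which rows count as "Class1"
classify(prob_class1, threshold = 0.7)
</code></pre>
</div>
<p>At a threshold of .7, the second row, with a <code>&quot;Class1&quot;</code> probability of about .68, is the only one of these six to switch to <code>&quot;Class2&quot;</code>.</p>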
<p>The model predicts <code>&quot;Class1&quot;</code> more often than it does <code>&quot;Class2&quot;</code>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>two_class_example</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>count</span><span class='o'>(</span><span class='nv'>predicted</span><span class='o'>)</span></span>
<span><span class='c'>#&gt;   predicted   n</span></span>
<span><span class='c'>#&gt; 1    Class1 277</span></span>
<span><span class='c'>#&gt; 2    Class2 223</span></span>
<span></span></code></pre>
</div>
<p>If we wanted the model to predict <code>&quot;Class2&quot;</code> more often, we could raise the threshold that the probability assigned to <code>&quot;Class1&quot;</code> must exceed for the hard class prediction to be <code>&quot;Class1&quot;</code>. In the tailor package, this adjustment is implemented in <a href="https://tailor.tidymodels.org/reference/adjust_probability_threshold.html" target="_blank" rel="noopener"><code>adjust_probability_threshold()</code></a>, which can be added to a tailor object.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>tlr</span> <span class='o'>&lt;-</span></span>
<span>  <span class='nf'><a href='https://tailor.tidymodels.org/reference/tailor.html'>tailor</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://tailor.tidymodels.org/reference/adjust_probability_threshold.html'>adjust_probability_threshold</a></span><span class='o'>(</span>threshold <span class='o'>=</span> <span class='m'>.7</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>tlr</span></span>
<span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>tailor</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span>
<span></span><span><span class='c'>#&gt; A binary postprocessor with 1 adjustment:</span></span>
<span></span><span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Adjust probability threshold to 0.7.</span></span>
<span></span></code></pre>
</div>
<p>Tailors must be fitted before they can predict on new data. For adjustments like <a href="https://tailor.tidymodels.org/reference/adjust_probability_threshold.html" target="_blank" rel="noopener"><code>adjust_probability_threshold()</code></a>, no training actually happens at the <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a> step besides recording the names and types of relevant variables. For other adjustments, like numeric calibration with <a href="https://tailor.tidymodels.org/reference/adjust_numeric_calibration.html" target="_blank" rel="noopener"><code>adjust_numeric_calibration()</code></a>, parameters are estimated at the <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a> stage, so separate data should be used to train the postprocessor and to evaluate its performance. More on this in <a href="#tailors-in-context">Tailors in context</a>.</p>
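<p>To illustrate why a calibration adjustment needs its own data, here is a minimal sketch under simplifying assumptions: simulated outcomes and predictions (not the <code>deliveries</code> data), and a plain linear regression of the outcome on the predictions standing in for the calibration model. tailor&rsquo;s <code>adjust_numeric_calibration()</code> may use a different estimation method; this only shows the general fit-then-apply pattern, with separate rows for estimating and for evaluating the calibration.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical simulated data: true delivery times and model predictions that
# are systematically shrunk towards the mean (an example of miscalibration)
set.seed(1)
n = 300
truth = rnorm(n, mean = 30, sd = 8)
pred = 0.6 * truth + 8 + rnorm(n, sd = 2)

# Use separate rows to estimate the calibration and to evaluate it
calib_rows = sample(n, n / 2)
calib = data.frame(truth = truth[calib_rows], pred = pred[calib_rows])
holdout = data.frame(truth = truth[-calib_rows], pred = pred[-calib_rows])

# "fit()" stage: estimate calibration parameters (an intercept and a slope)
calibrator = lm(truth ~ pred, data = calib)

# "predict()" stage: map new predictions through the estimated relationship
holdout$pred_cal = predict(calibrator, newdata = holdout)

rmse = function(truth, estimate) sqrt(mean((truth - estimate)^2))
c(
  before = rmse(holdout$truth, holdout$pred),
  after = rmse(holdout$truth, holdout$pred_cal)
)
</code></pre>
</div>
<p>Because the simulated predictions are shrunk towards the mean, the estimated slope is greater than one, and the calibrated predictions should have a lower holdout RMSE than the raw ones. Estimating and evaluating the calibration on the same rows would make that improvement look better than it really is.</p>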
<p>In this case, though, we can <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a>
 on the whole dataset. The resulting object is still a tailor, but is now flagged as trained.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>tlr_trained</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span></span>
<span>  <span class='nv'>tlr</span>,</span>
<span>  <span class='nv'>two_class_example</span>,</span>
<span>  outcome <span class='o'>=</span> <span class='nv'>truth</span>,</span>
<span>  estimate <span class='o'>=</span> <span class='nv'>predicted</span>,</span>
<span>  probabilities <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>Class1</span>, <span class='nv'>Class2</span><span class='o'>)</span></span>
<span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>tlr_trained</span></span>
<span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>tailor</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span>
<span></span><span><span class='c'>#&gt; A binary postprocessor with 1 adjustment:</span></span>
<span></span><span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Adjust probability threshold to 0.7. [trained]</span></span>
<span></span></code></pre>
</div>
<p>When a tailor is added to a model <a href="https://workflows.tidymodels.org" target="_blank" rel="noopener">workflow</a> via <a href="https://workflows.tidymodels.org/dev/reference/add_tailor.html" target="_blank" rel="noopener"><code>add_tailor()</code></a>, the arguments needed to <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a> it are set automatically. Generally, as with recipes, we recommend adding tailors to model workflows for training and prediction rather than using them standalone, both for ease of use and to prevent data leakage. That said, tailors are fully functional by themselves, too.</p>
<p>Now, when passed new data, the trained tailor will assign the hard class prediction based on whether the probability assigned to the level <code>&quot;Class1&quot;</code> is above <code>.7</code>, resulting in more predictions of <code>&quot;Class2&quot;</code> than before.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>tlr_trained</span>, <span class='nv'>two_class_example</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>count</span><span class='o'>(</span><span class='nv'>predicted</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span>
<span><span class='c'>#&gt;   predicted     n</span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>     <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Class1      236</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Class2      264</span></span>
<span></span></code></pre>
</div>
<p>Changing the probability threshold is one of many possible adjustments available in tailor.</p>
<ul>
<li>For probabilities: <a href="https://tailor.tidymodels.org/reference/adjust_probability_calibration.html" target="_blank" rel="noopener">calibration</a></li>
<li>For transformation of probabilities to hard class predictions: <a href="https://tailor.tidymodels.org/reference/adjust_probability_threshold.html" target="_blank" rel="noopener">thresholds</a>, <a href="https://tailor.tidymodels.org/reference/adjust_equivocal_zone.html" target="_blank" rel="noopener">equivocal zones</a></li>
<li>For numeric outcomes: <a href="https://tailor.tidymodels.org/reference/adjust_numeric_calibration.html" target="_blank" rel="noopener">calibration</a>, <a href="https://tailor.tidymodels.org/reference/adjust_numeric_range.html" target="_blank" rel="noopener">range</a></li>
</ul>
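<p>Two of these adjustments are simple enough to sketch in a few lines of base R. The helpers <code>clip_range()</code> and <code>equivocal_class()</code> below are hypothetical and only illustrate the idea behind a range adjustment and an equivocal zone; they are not tailor&rsquo;s API, and the <code>buffer</code> width and <code>&quot;equivocal&quot;</code> label are assumptions for the example.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Range: clamp numeric predictions to a plausible interval
clip_range = function(pred, lower = -Inf, upper = Inf) {
  pmin(pmax(pred, lower), upper)
}
clip_range(c(-3, 12, 55), lower = 0, upper = 50)  # returns c(0, 12, 50)

# Equivocal zone: decline to predict a class when the probability of the
# first level is within `buffer` of the threshold
equivocal_class = function(prob_class1, threshold = 0.5, buffer = 0.1) {
  cls = ifelse(prob_class1 > threshold, "Class1", "Class2")
  cls[buffer >= abs(prob_class1 - threshold)] = "equivocal"
  cls
}
equivocal_class(c(0.05, 0.45, 0.55, 0.9))
</code></pre>
</div>
<p>Predictions near the threshold (here .45 and .55) are flagged rather than forced into a class, which can be useful when a wrong hard prediction is costly.</p>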
<p>Support for tailors is now plumbed through workflows (via <a href="https://workflows.tidymodels.org/dev/reference/add_tailor.html" target="_blank" rel="noopener"><code>add_tailor()</code></a>) and tune, and rsample includes a set of infrastructural changes to prevent data leakage behind the scenes. We haven&rsquo;t yet implemented support for tuning parameters in tailors, but we plan to do so before this functionality heads to CRAN.</p>
<h2 id="tailors-in-context">Tailors in context
</h2>
<p>As an example, let&rsquo;s model a study of food delivery times in minutes (i.e., the time from the initial order to receiving the food) for a single restaurant. The <code>deliveries</code> data is available upon loading the tidymodels meta-package.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>deliveries</span><span class='o'>)</span></span>
<span></span>
<span><span class='c'># split into training and testing sets</span></span>
<span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span>
<span><span class='nv'>delivery_split</span> <span class='o'>&lt;-</span> <span class='nf'>initial_split</span><span class='o'>(</span><span class='nv'>deliveries</span><span class='o'>)</span></span>
<span><span class='nv'>delivery_train</span> <span class='o'>&lt;-</span> <span class='nf'>training</span><span class='o'>(</span><span class='nv'>delivery_split</span><span class='o'>)</span></span>
<span><span class='nv'>delivery_test</span>  <span class='o'>&lt;-</span> <span class='nf'>testing</span><span class='o'>(</span><span class='nv'>delivery_split</span><span class='o'>)</span></span>
<span></span>
<span><span class='c'># resample the training set using 10-fold cross-validation</span></span>
<span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span>
<span><span class='nv'>delivery_folds</span> <span class='o'>&lt;-</span> <span class='nf'>vfold_cv</span><span class='o'>(</span><span class='nv'>delivery_train</span><span class='o'>)</span></span>
<span></span>
<span><span class='c'># print out the training set</span></span>
<span><span class='nv'>delivery_train</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 7,509 × 31</span></span></span>
<span><span class='c'>#&gt;    time_to_delivery  hour day   distance item_01 item_02 item_03 item_04 item_05</span></span>
<span><span class='c'>#&gt;               <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span>    <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>             21.2  16.1 Tue       3.02       0       0       0       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>             17.9  12.4 Sun       3.37       0       0       0       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>             22.4  14.2 Fri       2.59       0       0       0       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>             30.9  19.1 Sat       2.77       0       0       0       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>             30.1  16.5 Fri       2.05       0       0       0       1       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>             35.3  14.7 Sat       4.57       0       0       2       1       1</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>             13.1  11.5 Sat       2.09       0       0       0       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>             18.3  13.4 Tue       2.35       0       2       1       0       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>             25.2  20.5 Sat       2.43       0       0       0       1       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>             30.7  16.7 Fri       2.24       0       0       0       1       0</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 7,499 more rows</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more variables: item_06 &lt;int&gt;, item_07 &lt;int&gt;, item_08 &lt;int&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   item_09 &lt;int&gt;, item_10 &lt;int&gt;, item_11 &lt;int&gt;, item_12 &lt;int&gt;, item_13 &lt;int&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   item_14 &lt;int&gt;, item_15 &lt;int&gt;, item_16 &lt;int&gt;, item_17 &lt;int&gt;, item_18 &lt;int&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   item_19 &lt;int&gt;, item_20 &lt;int&gt;, item_21 &lt;int&gt;, item_22 &lt;int&gt;, item_23 &lt;int&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   item_24 &lt;int&gt;, item_25 &lt;int&gt;, item_26 &lt;int&gt;, item_27 &lt;int&gt;</span></span></span>
<span></span></code></pre>
</div>
<p>Let&rsquo;s deliberately define a regression model that has poor predicted values: a boosted tree with only three ensemble members.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>delivery_wflow</span> <span class='o'>&lt;-</span></span>
<span>  <span class='nf'>workflow</span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'>add_formula</span><span class='o'>(</span><span class='nv'>time_to_delivery</span> <span class='o'>~</span> <span class='nv'>.</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'>add_model</span><span class='o'>(</span><span class='nf'>boost_tree</span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span>, trees <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span><span class='o'>)</span></span></code></pre>
</div>
<p>Evaluating against resamples:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span>
<span><span class='nv'>delivery_res</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'>fit_resamples</span><span class='o'>(</span></span>
<span>    <span class='nv'>delivery_wflow</span>, </span>
<span>    <span class='nv'>delivery_folds</span>, </span>
<span>    control <span class='o'>=</span> <span class='nf'>control_resamples</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span>
<span>  <span class='o'>)</span></span></code></pre>
</div>
<p>The R<sup>2</sup> looks quite strong!</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tune.tidymodels.org/reference/collect_predictions.html'>collect_metrics</a></span><span class='o'>(</span><span class='nv'>delivery_res</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 6</span></span></span>
<span><span class='c'>#&gt;   .metric .estimator  mean     n std_err .config             </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>               </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> rmse    standard   9.52     10 0.053<span style='text-decoration: underline;'>3</span>  Preprocessor1_Model1</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span> rsq     standard   0.853    10 0.003<span style='text-decoration: underline;'>57</span> Preprocessor1_Model1</span></span>
<span></span></code></pre>
</div>
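<p>How can the R<sup>2</sup> look strong while the predictions are poorly calibrated? yardstick&rsquo;s <code>rsq()</code> is computed as the squared correlation between the truth and the estimate (<code>rsq_trad()</code> gives the traditional definition), and correlation ignores any positive linear rescaling or shift of the predictions. The base R sketch below, on simulated data rather than <code>deliveries</code>, shows that effect; <code>rsq_cor()</code> and <code>rmse()</code> are small hypothetical helpers, not yardstick functions.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>set.seed(1)
truth = rnorm(200, mean = 30, sd = 8)
pred = truth + rnorm(200, sd = 3)
# Same ordering of predictions, but shrunk and shifted: poorly calibrated
pred_shrunk = 0.5 * pred + 10

rsq_cor = function(truth, estimate) cor(truth, estimate)^2
rmse = function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Squared correlation is unchanged by the linear transformation...
c(original = rsq_cor(truth, pred), shrunk = rsq_cor(truth, pred_shrunk))
# ...but RMSE exposes the systematic error
c(original = rmse(truth, pred), shrunk = rmse(truth, pred_shrunk))
</code></pre>
</div>
<p>The two squared correlations are equal (up to floating-point error), while the RMSE of the shrunken predictions is noticeably larger, which is the kind of problem a calibration plot makes visible.</p>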
<p>Let&rsquo;s take a closer look at the predictions, though. How well are they calibrated? We can use the <a href="https://probably.tidymodels.org/reference/cal_plot_regression.html" target="_blank" rel="noopener"><code>cal_plot_regression()</code></a>
 helper from the probably package to put together a quick diagnostic plot.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tune.tidymodels.org/reference/collect_predictions.html'>collect_predictions</a></span><span class='o'>(</span><span class='nv'>delivery_res</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_regression.html'>cal_plot_regression</a></span><span class='o'>(</span>truth <span class='o'>=</span> <span class='nv'>time_to_delivery</span>, estimate <span class='o'>=</span> <span class='nv'>.pred</span><span class='o'>)</span></span>
</code></pre>
<img src="https://posit-open-source.netlify.app/blog/tidyverse/2024/postprocessing-preview/figs/predictions-bad-boost-1.png" width="700px" style="display: block; margin: auto;" />
</div>
<p>Ooof.</p>
<p>In comes tailor! Numeric calibration can help address the correlated errors here. We can add a tailor to our existing workflow to &ldquo;bump up&rdquo; predictions towards their true value.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>delivery_wflow_improved</span> <span class='o'>&lt;-</span></span>
<span>  <span class='nv'>delivery_wflow</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'>add_tailor</span><span class='o'>(</span><span class='nf'><a href='https://tailor.tidymodels.org/reference/tailor.html'>tailor</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tailor.tidymodels.org/reference/adjust_numeric_calibration.html'>adjust_numeric_calibration</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span></code></pre>
</div>
<p>The resampling code looks the same from here.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span>
<span><span class='nv'>delivery_res_improved</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'>fit_resamples</span><span class='o'>(</span></span>
<span>    <span class='nv'>delivery_wflow_improved</span>, </span>
<span>    <span class='nv'>delivery_folds</span>, </span>
<span>    control <span class='o'>=</span> <span class='nf'>control_resamples</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span>
<span>  <span class='o'>)</span></span></code></pre>
</div>
<p>Checking out the same plot reveals a much better fit!</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tune.tidymodels.org/reference/collect_predictions.html'>collect_predictions</a></span><span class='o'>(</span><span class='nv'>delivery_res_improved</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_regression.html'>cal_plot_regression</a></span><span class='o'>(</span>truth <span class='o'>=</span> <span class='nv'>time_to_delivery</span>, estimate <span class='o'>=</span> <span class='nv'>.pred</span><span class='o'>)</span></span>
</code></pre>
<img src="https://posit-open-source.netlify.app/blog/tidyverse/2024/postprocessing-preview/figs/predictios-better-boost-1.png" width="700px" style="display: block; margin: auto;" />
</div>
<p>There&rsquo;s actually some tricky data leakage prevention happening under the hood here. When you add tailors to a workflow and fit them with tune, this is all taken care of for you. If you&rsquo;re interested in using tailors outside of that context, check out <a href="https://workflows.tidymodels.org/dev/reference/add_tailor.html#data-usage" target="_blank" rel="noopener">this documentation section</a> in <code>add_tailor()</code>.</p>
<h2 id="whats-to-come">What&rsquo;s to come
</h2>
<p>We&rsquo;re excited about how this work is shaping up and would love to hear y&rsquo;all&rsquo;s thoughts on what we&rsquo;ve brought together so far. Please do comment on our social media posts about this blog entry or open issues on the <a href="https://github.com/tidymodels/tailor" target="_blank" rel="noopener">tailor GitHub repository</a> and let us know what you think!</p>
<p>Before these changes head out to CRAN, we&rsquo;ll also be implementing tuning functionality for postprocessors. You&rsquo;ll be able to tag arguments like <code>adjust_probability_threshold(threshold)</code> or <code>adjust_probability_calibration(method)</code> with <code>tune()</code> to optimize across several values. Besides that, postprocessing with tidymodels should &ldquo;just work&rdquo; on the development versions of our packages, so let us know if you come across anything wonky.</p>
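<p>Conceptually, tuning a probability threshold means scoring several candidate values and keeping the best one. Here is a minimal base R sketch of that idea, assuming simulated probabilities and balanced accuracy as the metric; the data, the grid of candidate thresholds, and the <code>balanced_accuracy()</code> helper are all hypothetical, and the sketch does not show how tune will handle this internally.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>set.seed(1)
n = 500
truth = factor(
  sample(c("Class1", "Class2"), n, replace = TRUE, prob = c(0.4, 0.6)),
  levels = c("Class1", "Class2")
)
# Hypothetical Class1 probabilities that tend to be higher when truth is "Class1"
prob_class1 = plogis(rnorm(n, mean = ifelse(truth == "Class1", 1, -0.5)))

# Average of sensitivity and specificity, treating "Class1" as the event
balanced_accuracy = function(truth, estimate) {
  sens = mean(estimate[truth == "Class1"] == "Class1")
  spec = mean(estimate[truth == "Class2"] == "Class2")
  (sens + spec) / 2
}

# Candidate thresholds, each scored on the same predictions
thresholds = seq(0.1, 0.9, by = 0.05)
scores = vapply(thresholds, function(thr) {
  estimate = ifelse(prob_class1 > thr, "Class1", "Class2")
  balanced_accuracy(truth, estimate)
}, numeric(1))

best_threshold = thresholds[which.max(scores)]
best_threshold
</code></pre>
</div>
<p>In practice, the thresholds would be scored on out-of-sample predictions, such as those from resampling, rather than on the data used to fit the model.</p>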
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>Postprocessing support has been a longstanding feature request across many of our repositories; we&rsquo;re grateful for the community discussions there for shaping this work. Additionally, we thank Ryan Tibshirani and Daniel McDonald for fruitful discussions on how we might scope these features.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2024/postprocessing-preview/thumbnail-wd.jpg" length="386938" type="image/jpeg" />
    </item>
    <item>
      <title>recipes 1.1.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2024/recipes-1-1-0/</link>
      <pubDate>Mon, 08 Jul 2024 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2024/recipes-1-1-0/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[
<p>We&rsquo;re thrilled to announce the release of <a href="https://recipes.tidymodels.org/" target="_blank" rel="noopener">recipes</a>
 1.1.0. recipes lets you create a pipeable sequence of feature engineering steps.</p>
<p>You can install it from CRAN with:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"recipes"</span><span class='o'>)</span></span></code></pre>
</div>
<p>This blog post will go over some of the bigger changes in this release: improvements in column type checking, allowing more data types to be passed to recipes, use of long formulas, and better errors for misspelled argument names.</p>
<p>You can see a full list of changes in the <a href="https://github.com/tidymodels/recipes/releases/tag/v1.1.0" target="_blank" rel="noopener">release notes</a>.</p>
<h2 id="column-type-checking">Column type checking
</h2>
<p>A <a href="https://github.com/tidymodels/recipes/issues/793" target="_blank" rel="noopener">longtime issue</a> in recipes came from the fact that recipes didn&rsquo;t keep a <a href="https://vctrs.r-lib.org/articles/type-size.html" target="_blank" rel="noopener">prototype</a> (ptype) of the data it was specified with. This could lead to unexpected behavior or uninformative error messages if the data used to <a href="https://recipes.tidymodels.org/reference/prep.html" target="_blank" rel="noopener"><code>prep()</code></a> a recipe differed from the data used to create it with <a href="https://recipes.tidymodels.org/reference/recipe.html" target="_blank" rel="noopener"><code>recipe()</code></a>.</p>
<p>Every recipe you create starts with a call to <a href="https://recipes.tidymodels.org/reference/recipe.html" target="_blank" rel="noopener"><code>recipe()</code></a>
. In the example below, we create a recipe where <code>x2</code> is a character vector, but prep it with data where <code>x2</code> is a numeric vector. Previously, this didn&rsquo;t produce any warnings or errors and silently did something unintended.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">data_template</span> <span class="o">&lt;-</span> <span class="nf">tibble</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="n">outcome</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">10</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">x1</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">10</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">x2</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="kc">letters</span><span class="p">,</span> <span class="m">10</span><span class="p">,</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">rec</span> <span class="o">&lt;-</span> <span class="nf">recipe</span><span class="p">(</span><span class="n">outcome</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data_template</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">  <span class="nf">step_bin2factor</span><span class="p">(</span><span class="nf">all_numeric_predictors</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">data_training</span> <span class="o">&lt;-</span> <span class="nf">tibble</span><span class="p">(</span><span class="n">outcome</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">),</span> <span class="n">x1</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">),</span> <span class="n">x2</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">prep</span><span class="p">(</span><span class="n">rec</span><span class="p">,</span> <span class="n">training</span> <span class="o">=</span> <span class="n">data_training</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; ── Recipe ──────────────────────────────────────────────────────────────────────</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; ── Inputs</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; Number of variables by role</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; outcome:   1</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; predictor: 2</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; ── Training information</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; Training data contained 1000 data points and no incomplete rows.</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; </span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; ── Operations</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; • Dummy variable to factor conversion for: x1 | Trained</span>
</span></span></code></pre></div>
<p>With recipes 1.1.0, we instead get an error detailing how the data differs.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_template</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>outcome <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>10</span><span class='o'>)</span>, x1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>10</span><span class='o'>)</span>, x2 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='nv'>letters</span>, <span class='m'>10</span>, <span class='kc'>TRUE</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>outcome</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>data_template</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_bin2factor.html'>step_bin2factor</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_numeric_predictors</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>data_training</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>outcome <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>1000</span><span class='o'>)</span>, x1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>1000</span><span class='o'>)</span>, x2 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>1000</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='nv'>rec</span>, training <span class='o'>=</span> <span class='nv'>data_training</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `prep()`:</span></span></span>
<span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> The following variable has the wrong class:</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> `x2` must have class <span style='color: #0000BB;'>&lt;numeric&gt;</span>, not <span style='color: #0000BB;'>&lt;character&gt;</span>.</span></span>
<span></span></code></pre>
</div>
<p>Note that recipes created with a version of recipes older than 1.1.0 don&rsquo;t contain any ptype information and will not undergo this check. Rerunning the code that creates the recipe with recipes 1.1.0 or later adds ptype information to it.</p>
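<p>To make the idea of a ptype more concrete, here is a minimal sketch in plain R. It is not the recipes implementation: <code>vctrs::vec_ptype()</code> reduces a data frame to a zero-row prototype that keeps column names and types, and <code>check_against_ptype()</code> is a hypothetical helper written only for this illustration, which reports the columns whose class changed between the template data and the training data used above.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tibble)
library(vctrs)

# Hypothetical helper for illustration (not part of recipes): returns the
# names of columns whose class in `new_data` differs from the stored ptype.
check_against_ptype &lt;- function(ptype, new_data) {
  shared &lt;- intersect(names(ptype), names(new_data))
  same_class &lt;- vapply(
    shared,
    function(col) identical(class(ptype[[col]]), class(new_data[[col]])),
    logical(1)
  )
  shared[!same_class]
}

data_template &lt;- tibble(
  outcome = rnorm(10),
  x1 = rnorm(10),
  x2 = sample(letters, 10, TRUE)
)
data_training &lt;- tibble(outcome = rnorm(1000), x1 = rnorm(1000), x2 = rnorm(1000))

# vec_ptype() keeps the column names and types but drops every row
template_ptype &lt;- vec_ptype(data_template)

check_against_ptype(template_ptype, data_training)
#&gt; [1] "x2"
</code></pre>
</div>
<p>Storing a zero-row prototype is cheap, since it holds no data, yet it carries enough information to catch a column whose type has changed.</p>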
<h2 id="input-checking-in-recipe">Input checking in <code>recipe()</code>
</h2>
<p>We have relaxed the requirements on the data frames passed to <code>recipe()</code>, while making the feedback more helpful when something goes wrong.</p>
<p>Previously, the data was passed through <a href="https://rdrr.io/r/stats/model.frame.html" target="_blank" rel="noopener"><code>model.frame()</code></a>
 inside <code>recipe()</code>, which restricted what could be handled: data frames with list-columns and <a href="https://r-spatial.github.io/sf/" target="_blank" rel="noopener">sf</a>
 data frames were both rejected. Both are now supported, as long as the input is a <code>data.frame</code> object.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_listcolumn</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span>
<span>  y <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>4</span>,</span>
<span>  x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span>, <span class='m'>4</span><span class='o'>:</span><span class='m'>6</span>, <span class='m'>3</span><span class='o'>:</span><span class='m'>1</span>, <span class='m'>1</span><span class='o'>:</span><span class='m'>10</span><span class='o'>)</span></span>
<span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>y</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>data_listcolumn</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Recipe</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span>
<span></span><span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; ── Inputs</span></span>
<span></span><span><span class='c'>#&gt; Number of variables by role</span></span>
<span></span><span><span class='c'>#&gt; outcome:   1</span></span>
<span><span class='c'>#&gt; predictor: 1</span></span>
<span></span></code></pre>
</div>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://r-spatial.github.io/sf/'>sf</a></span><span class='o'>)</span></span>
<span><span class='c'>#&gt; Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE</span></span>
<span></span><span><span class='nv'>pathshp</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/system.file.html'>system.file</a></span><span class='o'>(</span><span class='s'>"shape/nc.shp"</span>, package <span class='o'>=</span> <span class='s'>"sf"</span><span class='o'>)</span></span>
<span><span class='nv'>data_sf</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://r-spatial.github.io/sf/reference/st_read.html'>st_read</a></span><span class='o'>(</span><span class='nv'>pathshp</span>, quiet <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>AREA</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>data_sf</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Recipe</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span>
<span></span><span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; ── Inputs</span></span>
<span></span><span><span class='c'>#&gt; Number of variables by role</span></span>
<span></span><span><span class='c'>#&gt; outcome:    1</span></span>
<span><span class='c'>#&gt; predictor: 14</span></span>
<span></span></code></pre>
</div>
<p>We are excited to see what people can do with these new options.</p>
<p>Another way to tell a recipe which variables to include and what roles they should have is to use <a href="https://recipes.tidymodels.org/reference/roles.html" target="_blank" rel="noopener"><code>add_role()</code></a>
 and <a href="https://recipes.tidymodels.org/reference/roles.html" target="_blank" rel="noopener"><code>update_role()</code></a>
. However, if you weren&rsquo;t careful, you could end up with the same variable labeled as both an outcome and a predictor. This now throws an error.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># didn't used to throw a warning</span></span>
<span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/roles.html'>update_role</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/everything.html'>everything</a></span><span class='o'>(</span><span class='o'>)</span>, new_role <span class='o'>=</span> <span class='s'>"predictor"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/roles.html'>add_role</a></span><span class='o'>(</span><span class='s'>"mpg"</span>, new_role <span class='o'>=</span> <span class='s'>"outcome"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `add_role()`:</span></span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `mpg` cannot get <span style='color: #0000BB;'>"outcome"</span> role as it already has role <span style='color: #0000BB;'>"predictor"</span>.</span></span>
<span></span></code></pre>
</div>
<p>This error can be avoided by using <a href="https://recipes.tidymodels.org/reference/roles.html" target="_blank" rel="noopener"><code>update_role()</code></a>
, which replaces a variable&rsquo;s existing role, instead of <a href="https://recipes.tidymodels.org/reference/roles.html" target="_blank" rel="noopener"><code>add_role()</code></a>
, which adds a role on top of the existing ones.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/roles.html'>update_role</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/everything.html'>everything</a></span><span class='o'>(</span><span class='o'>)</span>, new_role <span class='o'>=</span> <span class='s'>"predictor"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/roles.html'>update_role</a></span><span class='o'>(</span><span class='s'>"mpg"</span>, new_role <span class='o'>=</span> <span class='s'>"outcome"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Recipe</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span>
<span></span><span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; ── Inputs</span></span>
<span></span><span><span class='c'>#&gt; Number of variables by role</span></span>
<span></span><span><span class='c'>#&gt; outcome:    1</span></span>
<span><span class='c'>#&gt; predictor: 10</span></span>
<span></span></code></pre>
</div>
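<p>To double-check which roles a recipe ended up with, you can inspect it with <code>summary()</code>, which returns one row per variable along with its type, role, and source. A minimal sketch, reusing the <code>mtcars</code> recipe above; the object names <code>rec</code> and <code>role_info</code> are only for illustration.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(recipes)

rec &lt;- recipe(mtcars) |&gt;
  update_role(everything(), new_role = "predictor") |&gt;
  update_role("mpg", new_role = "outcome")

# summary() returns one row per variable, with its type, role, and source
role_info &lt;- summary(rec)
role_info[role_info$variable == "mpg", c("variable", "role")]
table(role_info$role)
</code></pre>
</div>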
<h2 id="long-formulas-in-recipe">Long formulas in <code>recipe()</code>
</h2>
<p>Related to the changes above, <code>recipe()</code> now supports very long formulas without hitting a <code>C stack usage</code> error.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_wide</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/matrix.html'>matrix</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>10000</span>, ncol <span class='o'>=</span> <span class='m'>10000</span><span class='o'>)</span></span>
<span><span class='nv'>data_wide</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/as.data.frame.html'>as.data.frame</a></span><span class='o'>(</span><span class='nv'>data_wide</span><span class='o'>)</span></span>
<span><span class='nf'><a href='https://rdrr.io/r/base/names.html'>names</a></span><span class='o'>(</span><span class='nv'>data_wide</span><span class='o'>)</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste0</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='m'>1</span><span class='o'>:</span><span class='m'>10000</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>long_formula</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/formula.html'>as.formula</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='s'>"~ "</span>, <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/names.html'>names</a></span><span class='o'>(</span><span class='nv'>data_wide</span><span class='o'>)</span>, collapse <span class='o'>=</span> <span class='s'>" + "</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>long_formula</span>, <span class='nv'>data_wide</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Recipe</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span>
<span></span><span><span class='c'>#&gt; </span></span>
<span></span><span><span class='c'>#&gt; ── Inputs</span></span>
<span></span><span><span class='c'>#&gt; Number of variables by role</span></span>
<span></span><span><span class='c'>#&gt; predictor: 10000</span></span>
<span></span></code></pre>
</div>
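<p>If every column should be a predictor, you can also skip building a formula altogether. A minimal sketch, using the same data-frame interface with <code>update_role()</code> shown in the previous section: the recipe is created from the data frame alone, and every column is then assigned the predictor role. The name <code>rec_wide</code> is only for illustration.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(recipes)

data_wide &lt;- as.data.frame(matrix(1:10000, ncol = 10000))
names(data_wide) &lt;- paste0("x", 1:10000)

# No formula: create the recipe from the data frame, then assign roles
rec_wide &lt;- recipe(data_wide) |&gt;
  update_role(everything(), new_role = "predictor")

table(summary(rec_wide)$role)
</code></pre>
</div>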
<h2 id="better-error-for-misspelled-argument-names">Better error for misspelled argument names
</h2>
<p>If you have used recipes long enough, you have very likely run into the following error.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">recipe</span><span class="p">(</span><span class="n">mpg</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">mtcars</span><span class="p">)</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">step_pca</span><span class="p">(</span><span class="nf">all_numeric_predictors</span><span class="p">(),</span> <span class="n">number</span> <span class="o">=</span> <span class="m">4</span><span class="p">)</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">prep</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; Error in `step_pca()`:</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; Caused by error in `prep()`:</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; ! Can&#39;t rename variables in this context.</span>
</span></span></code></pre></div>
<p>The first time you saw it, it probably didn&rsquo;t make much sense. Hopefully, you figured out that <a href="https://recipes.tidymodels.org/reference/step_pca.html" target="_blank" rel="noopener"><code>step_pca()</code></a>
 doesn&rsquo;t have a <code>number</code> argument and instead uses <code>num_comp</code> to set the number of principal components to return. This confusion is now a thing of the past, as recipes 1.1.0 gives a more informative error message.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_pca.html'>step_pca</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_numeric_predictors</a></span><span class='o'>(</span><span class='o'>)</span>, number <span class='o'>=</span> <span class='m'>4</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `step_pca()`:</span></span></span>
<span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `prep()` at recipes/R/recipe.R:479:9:</span></span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> The following argument was specified but do not exist: `number`.</span></span>
<span></span></code></pre>
</div>
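<p>For completeness, here is a sketch of the corrected call using <code>num_comp</code>. It also adds <code>step_normalize()</code>, which was not part of the original example, because PCA results depend on the scale of the predictors. The names <code>rec_pca</code> and <code>baked</code> are only for illustration.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(recipes)

rec_pca &lt;- recipe(mpg ~ ., data = mtcars) |&gt;
  # PCA is sensitive to scale, so center and scale the predictors first
  step_normalize(all_numeric_predictors()) |&gt;
  # `num_comp` (not `number`) sets how many components to keep
  step_pca(all_numeric_predictors(), num_comp = 4) |&gt;
  prep()

# bake(new_data = NULL) returns the processed training data
baked &lt;- bake(rec_pca, new_data = NULL)
names(baked)
</code></pre>
</div>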
<h2 id="quality-of-life-increases-in-step_dummy">Quality-of-life improvements in <code>step_dummy()</code>
</h2>
<p>I would imagine that one of the most used steps is <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a>
. We have improved the errors and warnings it spits out when things go sideways.</p>
<p>If you apply <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a>
 to a variable that contains a lot of levels, it will produce a lot of columns, and the resulting object may not fit in memory. This can lead to the following error.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">data_id</span> <span class="o">&lt;-</span> <span class="nf">tibble</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="n">id</span> <span class="o">=</span> <span class="nf">as.character</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">100000</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">x1</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">100000</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">x2</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="kc">letters</span><span class="p">,</span> <span class="m">100000</span><span class="p">,</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">recipe</span><span class="p">(</span><span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">data_id</span><span class="p">)</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">step_dummy</span><span class="p">(</span><span class="nf">all_nominal_predictors</span><span class="p">())</span> <span class="o">|&gt;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">prep</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="c1">#&gt; Error: vector memory exhausted (limit reached?)</span>
</span></span></code></pre></div>
<p>Instead, you now get a more helpful error message.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_id</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span>
<span>  id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/character.html'>as.character</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>100000</span><span class='o'>)</span>, </span>
<span>  x1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>100000</span><span class='o'>)</span>, </span>
<span>  x2 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='nv'>letters</span>, <span class='m'>100000</span>, <span class='kc'>TRUE</span><span class='o'>)</span></span>
<span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>data_id</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_dummy.html'>step_dummy</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_nominal_predictors</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `step_dummy()`:</span></span></span>
<span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error:</span></span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `id` contains too many levels (100000), which would result in a</span></span>
<span><span class='c'>#&gt;   data.frame too large to fit in memory.</span></span>
<span></span></code></pre>
</div>
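<p>In a case like this, expanding <code>id</code> into dummy variables is rarely what you want. One option, sketched here under the assumption that <code>id</code> is an identifier rather than a feature, is to give it a non-predictor role with <code>update_role()</code> so that <code>all_nominal_predictors()</code> no longer selects it. The sketch uses a smaller data set and fixes the factor levels of <code>x2</code> so that all 26 letters are present; <code>rec_id</code> and <code>baked</code> are illustrative names.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(recipes)
library(tibble)

set.seed(1)
n &lt;- 1000
data_id &lt;- tibble(
  id = as.character(seq_len(n)),
  x1 = rnorm(n),
  # fixing the levels guarantees all 26 letters are factor levels
  x2 = factor(sample(letters, n, TRUE), levels = letters)
)

rec_id &lt;- recipe(~ ., data = data_id) |&gt;
  # give `id` a non-predictor role so all_nominal_predictors() skips it
  update_role(id, new_role = "id") |&gt;
  step_dummy(all_nominal_predictors()) |&gt;
  prep()

baked &lt;- bake(rec_id, new_data = NULL)
ncol(baked)
</code></pre>
</div>
<p>Only <code>x2</code> is expanded, into one column per non-reference level (<code>x2_b</code> through <code>x2_z</code>), while <code>id</code> is carried through unchanged.</p>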
<p>Likewise, you will now get helpful warnings if <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a>
 encounters <code>NA</code> or unseen values.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_train</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='nv'>data_unseen</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='s'>"c"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>data_train</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_dummy.html'>step_dummy</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>rec_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span><span class='nv'>data_unseen</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; Warning: <span style='color: #BBBB00;'>!</span> There are new levels in `x`: <span style='color: #0000BB;'>"c"</span>.</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Consider using step_novel() (`?recipes::step_novel()`) before `step_dummy()`</span></span>
<span><span class='c'>#&gt;   to handle unseen values.</span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 1</span></span></span>
<span><span class='c'>#&gt;     x_b</span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>    <span style='color: #BB0000;'>NA</span></span></span>
<span></span></code></pre>
</div>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_na</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='kc'>NA</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>rec_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span><span class='nv'>data_na</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; Warning: <span style='color: #BBBB00;'>!</span> There are new levels in `x`: <span style='color: #0000BB;'>NA</span>.</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Consider using step_unknown() (`?recipes::step_unknown()`) before</span></span>
<span><span class='c'>#&gt;   `step_dummy()` to handle missing values.</span></span>
<span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 1</span></span></span>
<span><span class='c'>#&gt;     x_b</span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>    <span style='color: #BB0000;'>NA</span></span></span>
<span></span></code></pre>
</div>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>A big thank you to all the people who have contributed to recipes since the release of v1.0.10:</p>
<p><a href="https://github.com/brynhum" target="_blank" rel="noopener">@brynhum</a>
, <a href="https://github.com/DemetriPananos" target="_blank" rel="noopener">@DemetriPananos</a>
, <a href="https://github.com/diegoperoni" target="_blank" rel="noopener">@diegoperoni</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/JiahuaQu" target="_blank" rel="noopener">@JiahuaQu</a>
, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>
, <a href="https://github.com/nhward" target="_blank" rel="noopener">@nhward</a>
, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>
, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
.</p>
<h2 id="chocolate-chocolate-chip-cookies">Chocolate Chocolate Chip Cookies
</h2>
<p>preheat oven 350°F</p>
<ul>
<li>1/3c butter</li>
<li>1/2 + 1/3c sugar</li>
</ul>
<p>mix until fluffy</p>
<ul>
<li>1 tsp vanilla</li>
<li>1 egg</li>
</ul>
<p>mix until combined</p>
<ul>
<li>1/2c cocoa</li>
<li>1/2 tsp baking soda</li>
<li>1c flour</li>
</ul>
<p>mix until combined</p>
<ul>
<li>3/4c chocolate chips</li>
</ul>
<p>bake for about 8 mins, depending on size! they will crack on top, but still be soft.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2024/recipes-1-1-0/thumbnail-wd.jpg" length="477764" type="image/jpeg" />
    </item>
    <item>
      <title>bonsai 0.3.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2024/bonsai-0-3-0/</link>
      <pubDate>Tue, 25 Jun 2024 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2024/bonsai-0-3-0/</guid>
      <dc:creator>Simon Couch</dc:creator><description><![CDATA[<p>We&rsquo;re brimming with glee to announce the release of <a href="https://bonsai.tidymodels.org" target="_blank" rel="noopener">bonsai</a>
 0.3.0. bonsai is a parsnip extension package for tree-based models, and includes support for random forest and gradient-boosted tree frameworks like partykit and LightGBM. This most recent release of the package introduces support for the <code>&quot;aorsf&quot;</code> engine, which implements accelerated oblique random forests (Jaeger et al. 2022, Jaeger et al. 2024).</p>
<p>You can install it from CRAN with:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"bonsai"</span><span class='o'>)</span></span></code></pre>
</div>
<p>This blog post will demonstrate a modeling workflow where the benefits of using oblique random forests shine through.</p>
<p>You can see a full list of changes in the <a href="https://bonsai.tidymodels.org/news/index.html#bonsai-030" target="_blank" rel="noopener">release notes</a>
.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://bonsai.tidymodels.org/'>bonsai</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://plsmod.tidymodels.org'>plsmod</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/corrr'>corrr</a></span><span class='o'>)</span></span></code></pre>
</div>
<h2 id="the-meats-data">The <code>meats</code> data
</h2>
<p>The modeldata package, loaded automatically with the tidymodels meta-package, includes several example datasets to demonstrate modeling problems. We&rsquo;ll make use of a dataset called <code>meats</code> in this post. Each row is a measurement of a sample of finely chopped meat.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>meats</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 215 × 103</span></span></span>
<span><span class='c'>#&gt;    x_001 x_002 x_003 x_004 x_005 x_006 x_007 x_008 x_009 x_010 x_011 x_012 x_013</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>  2.62  2.62  2.62  2.62  2.62  2.62  2.62  2.62  2.63  2.63  2.63  2.63  2.64</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>  2.83  2.84  2.84  2.85  2.85  2.86  2.86  2.87  2.87  2.88  2.88  2.89  2.90</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>  2.58  2.58  2.59  2.59  2.59  2.59  2.59  2.60  2.60  2.60  2.60  2.61  2.61</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>  2.82  2.82  2.83  2.83  2.83  2.83  2.83  2.84  2.84  2.84  2.84  2.85  2.85</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>  2.79  2.79  2.79  2.79  2.80  2.80  2.80  2.80  2.81  2.81  2.81  2.82  2.82</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>  3.01  3.02  3.02  3.03  3.03  3.04  3.04  3.05  3.06  3.06  3.07  3.08  3.09</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>  2.99  2.99  3.00  3.01  3.01  3.02  3.02  3.03  3.04  3.04  3.05  3.06  3.07</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>  2.53  2.53  2.53  2.53  2.53  2.53  2.53  2.53  2.54  2.54  2.54  2.54  2.54</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>  3.27  3.28  3.29  3.29  3.30  3.31  3.31  3.32  3.33  3.33  3.34  3.35  3.36</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>  3.40  3.41  3.41  3.42  3.43  3.43  3.44  3.45  3.46  3.47  3.48  3.48  3.49</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 205 more rows</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 90 more variables: x_014 &lt;dbl&gt;, x_015 &lt;dbl&gt;, x_016 &lt;dbl&gt;, x_017 &lt;dbl&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   x_018 &lt;dbl&gt;, x_019 &lt;dbl&gt;, x_020 &lt;dbl&gt;, x_021 &lt;dbl&gt;, x_022 &lt;dbl&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   x_023 &lt;dbl&gt;, x_024 &lt;dbl&gt;, x_025 &lt;dbl&gt;, x_026 &lt;dbl&gt;, x_027 &lt;dbl&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   x_028 &lt;dbl&gt;, x_029 &lt;dbl&gt;, x_030 &lt;dbl&gt;, x_031 &lt;dbl&gt;, x_032 &lt;dbl&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   x_033 &lt;dbl&gt;, x_034 &lt;dbl&gt;, x_035 &lt;dbl&gt;, x_036 &lt;dbl&gt;, x_037 &lt;dbl&gt;,</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>#   x_038 &lt;dbl&gt;, x_039 &lt;dbl&gt;, x_040 &lt;dbl&gt;, x_041 &lt;dbl&gt;, x_042 &lt;dbl&gt;, …</span></span></span>
<span></span></code></pre>
</div>
<p>From that dataset&rsquo;s documentation:</p>
<blockquote>
<p>These data are recorded on a Tecator Infratec Food and Feed Analyzer&hellip; For each meat sample the data consists of a 100 channel spectrum of absorbances and the contents of moisture (water), fat and protein. The absorbance is -log10 of the transmittance measured by the spectrometer. The three contents, measured in percent, are determined by analytic chemistry.</p>
</blockquote>
<p>We&rsquo;ll try to predict the protein content, as a percentage, using the absorbance measurements.</p>
<p>Before we take a further look, let&rsquo;s split up our data. I&rsquo;ll first remove the two other possible outcome variables, <code>water</code> and <code>fat</code>. Then, after splitting into training and testing sets, I&rsquo;ll resample the training data using 5-fold cross-validation with 2 repeats.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>meats</span> <span class='o'>&lt;-</span> <span class='nv'>meats</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>select</span><span class='o'>(</span><span class='o'>-</span><span class='nv'>water</span>, <span class='o'>-</span><span class='nv'>fat</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span>
<span><span class='nv'>meats_split</span> <span class='o'>&lt;-</span> <span class='nf'>initial_split</span><span class='o'>(</span><span class='nv'>meats</span><span class='o'>)</span></span>
<span><span class='nv'>meats_train</span> <span class='o'>&lt;-</span> <span class='nf'>training</span><span class='o'>(</span><span class='nv'>meats_split</span><span class='o'>)</span></span>
<span><span class='nv'>meats_test</span> <span class='o'>&lt;-</span> <span class='nf'>testing</span><span class='o'>(</span><span class='nv'>meats_split</span><span class='o'>)</span></span>
<span><span class='nv'>meats_folds</span> <span class='o'>&lt;-</span> <span class='nf'>vfold_cv</span><span class='o'>(</span><span class='nv'>meats_train</span>, v <span class='o'>=</span> <span class='m'>5</span>, repeats <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span></code></pre>
</div>
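<p>To make &ldquo;5-fold cross-validation with 2 repeats&rdquo; concrete, here&rsquo;s a minimal base R sketch of the idea. It is not how rsample implements <code>vfold_cv()</code> (which also supports stratification, among other things); the helper <code>repeated_vfold()</code> and the 160-row dataset size are hypothetical, chosen only for illustration. Each repeat shuffles the rows into 5 folds, and each fold takes one turn as the assessment set, giving 5 &times; 2 = 10 resamples. That&rsquo;s where the <code>n</code> of 10 in the tuning results that follow comes from.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical helper: repeated V-fold cross-validation, sketched in base R
repeated_vfold &lt;- function(n_rows, v = 5, repeats = 2) {
  splits &lt;- list()
  for (r in seq_len(repeats)) {
    # shuffle roughly balanced fold labels 1..v across the rows
    fold_id &lt;- sample(rep_len(seq_len(v), n_rows))
    for (k in seq_len(v)) {
      # fold k is held out for assessment; the remaining rows are used for fitting
      splits[[length(splits) + 1]] &lt;- list(
        repeat_id  = r,
        fold       = k,
        assessment = which(fold_id == k),
        analysis   = which(fold_id != k)
      )
    }
  }
  splits
}

set.seed(1)
splits &lt;- repeated_vfold(n_rows = 160, v = 5, repeats = 2)
length(splits)
#&gt; [1] 10
</code></pre>
</div>
<p>Within each repeat, every row lands in exactly one assessment set, so each observation is used for evaluation twice overall.</p>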
<p>The tricky parts of this modeling problem are that:</p>
<ol>
<li>There are few observations to work with (215 total).</li>
<li>The 100 absorbance measurements are <em>highly</em> correlated with one another.</li>
</ol>
<p>Visualizing that correlation:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>meats_train</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://corrr.tidymodels.org/reference/correlate.html'>correlate</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span>
<span>  <span class='nf'>theme</span><span class='o'>(</span>axis.text.x <span class='o'>=</span> <span class='nf'>element_blank</span><span class='o'>(</span><span class='o'>)</span>, axis.text.y <span class='o'>=</span> <span class='nf'>element_blank</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; Correlation computed with</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Method: 'pearson'</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Missing treated using: 'pairwise.complete.obs'</span></span>
<span></span></code></pre>
<img src="https://posit-open-source.netlify.app/blog/tidyverse/2024/bonsai-0-3-0/figs/correlate-1.png" width="700px" style="display: block; margin: auto;" />
</div>
<p>Almost all of the pairwise correlations between predictors are near 1. The exception is the last variable, whose correlations with every other variable are noticeably weaker. That last variable? It&rsquo;s the outcome.</p>
<h2 id="baseline-models">Baseline models
</h2>
<p>There are several existing model implementations in tidymodels that are resilient to highly correlated predictors. The first one I&rsquo;d probably reach for is an elastic net, which blends the LASSO and ridge penalties for regularized linear regression. Evaluating that modeling approach against resamples:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># define a regularized linear model</span></span>
<span><span class='nv'>spec_lr</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span>penalty <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>, mixture <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"glmnet"</span><span class='o'>)</span></span>
<span></span>
<span><span class='c'># try out different penalization approaches</span></span>
<span><span class='nv'>res_lr</span> <span class='o'>&lt;-</span> <span class='nf'>tune_grid</span><span class='o'>(</span><span class='nv'>spec_lr</span>, <span class='nv'>protein</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>meats_folds</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_lr</span>, metric <span class='o'>=</span> <span class='s'>"rmse"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span>
<span><span class='c'>#&gt;         penalty mixture .metric .estimator  mean     n std_err .config          </span></span>
<span><span class='c'>#&gt;           <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>            </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 0.000<span style='text-decoration: underline;'>032</span>4       0.668 rmse    standard    1.24    10  0.051<span style='text-decoration: underline;'>6</span> Preprocessor1_Mo…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 0.000<span style='text-decoration: underline;'>000</span>005<span style='text-decoration: underline;'>24</span>   0.440 rmse    standard    1.25    10  0.054<span style='text-decoration: underline;'>8</span> Preprocessor1_Mo…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 0.000<span style='text-decoration: underline;'>000</span>461     0.839 rmse    standard    1.26    10  0.053<span style='text-decoration: underline;'>8</span> Preprocessor1_Mo…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 0.000<span style='text-decoration: underline;'>005</span>50      0.965 rmse    standard    1.26    10  0.054<span style='text-decoration: underline;'>0</span> Preprocessor1_Mo…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 0.000<span style='text-decoration: underline;'>000</span>048<span style='text-decoration: underline;'>9</span>    0.281 rmse    standard    1.26    10  0.053<span style='text-decoration: underline;'>4</span> Preprocessor1_Mo…</span></span>
<span></span><span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_lr</span>, metric <span class='o'>=</span> <span class='s'>"rsq"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span>
<span><span class='c'>#&gt;         penalty mixture .metric .estimator  mean     n std_err .config          </span></span>
<span><span class='c'>#&gt;           <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>            </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 0.000<span style='text-decoration: underline;'>032</span>4       0.668 rsq     standard   0.849    10  0.012<span style='text-decoration: underline;'>6</span> Preprocessor1_Mo…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 0.000<span style='text-decoration: underline;'>000</span>005<span style='text-decoration: underline;'>24</span>   0.440 rsq     standard   0.848    10  0.012<span style='text-decoration: underline;'>8</span> Preprocessor1_Mo…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 0.000<span style='text-decoration: underline;'>000</span>461     0.839 rsq     standard   0.846    10  0.011<span style='text-decoration: underline;'>4</span> Preprocessor1_Mo…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 0.000<span style='text-decoration: underline;'>005</span>50      0.965 rsq     standard   0.846    10  0.011<span style='text-decoration: underline;'>1</span> Preprocessor1_Mo…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 0.000<span style='text-decoration: underline;'>000</span>048<span style='text-decoration: underline;'>9</span>    0.281 rsq     standard   0.846    10  0.012<span style='text-decoration: underline;'>6</span> Preprocessor1_Mo…</span></span>
<span></span></code></pre>
</div>
<p>That best RMSE value of 1.24 gives us a baseline to work with, and the best R-squared of 0.85 seems like a good start.</p>
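<p>For intuition about what the tuned <code>penalty</code> and <code>mixture</code> arguments control, here&rsquo;s a small sketch of the elastic net penalty term as glmnet parameterizes it, where parsnip&rsquo;s <code>penalty</code> plays the role of glmnet&rsquo;s lambda and <code>mixture</code> the role of alpha. The function name <code>elastic_net_penalty()</code> is made up for this example, and it computes only the penalty added to the loss, not a full model fit. A <code>mixture</code> of 1 gives the pure LASSO penalty and 0 gives pure ridge; values in between blend the two.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical helper: glmnet-style elastic net penalty on coefficients `beta`
elastic_net_penalty &lt;- function(beta, penalty, mixture) {
  l1 &lt;- sum(abs(beta))  # LASSO part
  l2 &lt;- sum(beta^2)     # ridge part
  penalty * ((1 - mixture) / 2 * l2 + mixture * l1)
}

beta &lt;- c(0.5, -2, 1)
elastic_net_penalty(beta, penalty = 0.1, mixture = 1)   # pure LASSO
#&gt; [1] 0.35
elastic_net_penalty(beta, penalty = 0.1, mixture = 0)   # pure ridge
#&gt; [1] 0.2625
elastic_net_penalty(beta, penalty = 0.1, mixture = 0.5) # a blend of both
#&gt; [1] 0.30625
</code></pre>
</div>
<p>The LASSO part tends to zero out some coefficients, while the ridge part shrinks groups of correlated coefficients together, which is why the blend copes well with predictors like these absorbance channels.</p>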
<p>Many tree-based model implementations in tidymodels handle correlated predictors well. To make an apples-to-apples comparison with <code>&quot;aorsf&quot;</code>, let&rsquo;s first try a conventional random forest engine to get a better sense of baseline performance:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>spec_rf</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/rand_forest.html'>rand_forest</a></span><span class='o'>(</span>mtry <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>, min_n <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='c'># this is the default engine, but for consistency's sake:</span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"ranger"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"regression"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>res_rf</span> <span class='o'>&lt;-</span> <span class='nf'>tune_grid</span><span class='o'>(</span><span class='nv'>spec_rf</span>, <span class='nv'>protein</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>meats_folds</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #0000BB;'>i</span> <span style='color: #000000;'>Creating pre-processing data to finalize unknown parameter: mtry</span></span></span>
<span></span><span></span>
<span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_rf</span>, metric <span class='o'>=</span> <span class='s'>"rmse"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span>
<span><span class='c'>#&gt;    mtry min_n .metric .estimator  mean     n std_err .config              </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>    96     4 rmse    standard    2.37    10  0.090<span style='text-decoration: underline;'>5</span> Preprocessor1_Model08</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>    41     6 rmse    standard    2.39    10  0.088<span style='text-decoration: underline;'>3</span> Preprocessor1_Model01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>    88    10 rmse    standard    2.43    10  0.081<span style='text-decoration: underline;'>6</span> Preprocessor1_Model06</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>    79    17 rmse    standard    2.51    10  0.074<span style='text-decoration: underline;'>0</span> Preprocessor1_Model07</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>    27    18 rmse    standard    2.52    10  0.077<span style='text-decoration: underline;'>8</span> Preprocessor1_Model04</span></span>
<span></span><span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_rf</span>, metric <span class='o'>=</span> <span class='s'>"rsq"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span>
<span><span class='c'>#&gt;    mtry min_n .metric .estimator  mean     n std_err .config              </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>    96     4 rsq     standard   0.424    10  0.038<span style='text-decoration: underline;'>5</span> Preprocessor1_Model08</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>    41     6 rsq     standard   0.409    10  0.039<span style='text-decoration: underline;'>4</span> Preprocessor1_Model01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>    88    10 rsq     standard   0.387    10  0.036<span style='text-decoration: underline;'>5</span> Preprocessor1_Model06</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>    79    17 rsq     standard   0.353    10  0.040<span style='text-decoration: underline;'>4</span> Preprocessor1_Model07</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>    27    18 rsq     standard   0.346    10  0.039<span style='text-decoration: underline;'>7</span> Preprocessor1_Model04</span></span>
<span></span></code></pre>
</div>
<p>Not so hot. Just to show I&rsquo;m not setting up a straw man here, I&rsquo;ll evaluate a few more alternative modeling approaches behind the curtain and print out their best performance metrics:</p>
<ul>
<li><strong>Gradient boosted tree with LightGBM</strong>. Best RMSE: 2.34. Best R-squared: 0.43.</li>
<li><strong>Partial least squares regression</strong>. Best RMSE: 1.39. Best R-squared: 0.81.</li>
<li><strong>Support vector machine</strong>. Best RMSE: 2.28. Best R-squared: 0.46.</li>
</ul>
<p>This is a tricky one.</p>
<h2 id="introducing-accelerated-oblique-random-forests">Introducing accelerated oblique random forests
</h2>
<p>The 0.3.0 release of bonsai introduces support for accelerated oblique random forests via the <code>&quot;aorsf&quot;</code> engine for classification and regression in tidymodels. (Tidy survival modelers might note that <a href="https://www.tidyverse.org/blog/2023/04/censored-0-2-0/" target="_blank" rel="noopener">we already support <code>&quot;aorsf&quot;</code> for censored regression</a>
 via the <a href="https://censored.tidymodels.org" target="_blank" rel="noopener">censored</a>
 parsnip extension package!)</p>
<p>Unlike trees in conventional random forests, which create splits using thresholds based on individual predictors (e.g. <code>x_001 &gt; 3</code>), oblique random forests use linear combinations of predictors to create splits (e.g. <code>x_001 + 2 * x_002 &gt; 7.5</code>). Oblique random forests have been shown to improve predictive performance relative to conventional random forests for a variety of applications (Menze et al. 2011). &ldquo;Oblique&rdquo; refers to the appearance of decision boundaries when a set of splits is plotted; I&rsquo;ve grabbed a visual from the <a href="https://github.com/ropensci/aorsf?tab=readme-ov-file#what-does-oblique-mean" target="_blank" rel="noopener">aorsf README</a>
 that demonstrates:</p>
<div class="highlight">
<img src="https://posit-open-source.netlify.app/blog/tidyverse/2024/bonsai-0-3-0/figures/oblique.png" alt="Two plots of decision boundaries for a classification problem. One uses single-variable splitting and the other oblique splitting. Both trees partition the predictor space defined by predictors X1 and X2, but the oblique splits do a better job of separating the two classes thanks to an 'oblique' boundary formed by considering both X1 and X2 at the same time." width="700px" style="display: block; margin: auto;" />
</div>
<p>In the above, we&rsquo;d like to separate the purple dots from the orange squares. A tree in a traditional random forest, represented on the left, can only generate splits based on one of X1 or X2 at a time. A tree in an oblique random forest, represented on the right, can consider both X1 and X2 in creating decision boundaries, often resulting in stronger predictive performance.</p>
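<p>To see the same idea numerically, here&rsquo;s a small simulation in base R. The data and the helper <code>best_split_accuracy()</code> are made up for illustration: two predictors, with classes separated by the diagonal line <code>x1 + x2 = 1</code>. A single threshold on <code>x1</code> or <code>x2</code> alone can&rsquo;t recover that boundary, while a threshold on the linear combination <code>x1 + x2</code> can. In a real oblique tree, the coefficients of that combination would be estimated from the data rather than known in advance.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Simulated two-class data whose true boundary is the diagonal x1 + x2 = 1
set.seed(1)
n &lt;- 200
x1 &lt;- runif(n)
x2 &lt;- runif(n)
y &lt;- x1 + x2 &gt; 1

# Hypothetical helper: accuracy of the best single threshold on a numeric score
best_split_accuracy &lt;- function(score, y) {
  cuts &lt;- sort(unique(score))
  accs &lt;- vapply(cuts, function(cut) {
    pred &lt;- score &gt; cut
    # allow either side of the split to be labeled as the positive class
    max(mean(pred == y), mean(pred != y))
  }, numeric(1))
  max(accs)
}

# Axis-aligned: a conventional tree splits on one predictor at a time
acc_axis &lt;- max(best_split_accuracy(x1, y), best_split_accuracy(x2, y))

# Oblique: split on a linear combination of both predictors
acc_oblique &lt;- best_split_accuracy(x1 + x2, y)

c(axis_aligned = acc_axis, oblique = acc_oblique)
</code></pre>
</div>
<p>The oblique split separates the classes perfectly with one cut, while a conventional tree would need a staircase of many axis-aligned splits to approximate the same diagonal boundary.</p>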
<p>Where does the &ldquo;accelerated&rdquo; come from? Generally, finding optimal oblique splits is more computationally intensive than finding single-predictor splits. The aorsf package uses &ldquo;Newton-Raphson scoring&rdquo; (the same algorithm used under the hood in the survival package) to identify splits based on linear combinations of predictor variables. This approach speeds up that process greatly, resulting in fit times comparable to those of traditional random forest implementations in R, and hundreds of times faster than existing oblique random forest implementations (Jaeger et al. 2024).</p>
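<p>As a rough illustration of Newton-Raphson scoring in general (not aorsf&rsquo;s own implementation), the sketch below uses it to estimate logistic regression coefficients for a simulated binary outcome at a single tree node, then splits on the resulting linear combination of predictors. The data, the helper <code>newton_raphson_logistic()</code>, and the median cut point are all assumptions made for this example; a real tree would search over candidate cut points.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Simulated data at a single node: a binary outcome driven by a linear combination
set.seed(2)
n &lt;- 300
X &lt;- cbind(intercept = 1, x1 = rnorm(n), x2 = rnorm(n))
y &lt;- rbinom(n, 1, plogis(-0.5 + 1.5 * X[, "x1"] - X[, "x2"]))

# Hypothetical helper: Newton-Raphson updates for logistic regression coefficients
newton_raphson_logistic &lt;- function(X, y, iter_max = 25, tol = 1e-10) {
  beta &lt;- rep(0, ncol(X))
  for (i in seq_len(iter_max)) {
    p &lt;- plogis(drop(X %*% beta))
    score &lt;- crossprod(X, y - p)              # gradient of the log-likelihood
    info  &lt;- crossprod(X, X * (p * (1 - p)))  # information matrix
    step  &lt;- drop(solve(info, score))
    beta  &lt;- beta + step
    if (max(abs(step)) &lt; tol) break
  }
  setNames(beta, colnames(X))
}

beta &lt;- newton_raphson_logistic(X, y)

# The fitted coefficients define an oblique split on x1 and x2 together
lin_comb &lt;- drop(X[, c("x1", "x2")] %*% beta[c("x1", "x2")])
goes_left &lt;- lin_comb &lt;= median(lin_comb)
table(goes_left)
</code></pre>
</div>
<p>The coefficients should agree closely with <code>glm(y ~ X[, -1], family = binomial())</code>. Capping <code>iter_max</code> at a small number trades some precision in the coefficients for speed, which is one lever a speed-focused implementation could pull.</p>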
<p>The code to tune this model with the <code>&quot;aorsf&quot;</code> engine is the same as for <code>&quot;ranger&quot;</code>, except that we switch out the <code>engine</code> argument to <a href="https://parsnip.tidymodels.org/reference/set_engine.html" target="_blank" rel="noopener"><code>set_engine()</code></a>:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>spec_aorsf</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/rand_forest.html'>rand_forest</a></span><span class='o'>(</span></span>
<span>    mtry <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>,</span>
<span>    min_n <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span></span>
<span>  <span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"aorsf"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"regression"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>res_aorsf</span> <span class='o'>&lt;-</span> <span class='nf'>tune_grid</span><span class='o'>(</span><span class='nv'>spec_aorsf</span>, <span class='nv'>protein</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>meats_folds</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #0000BB;'>i</span> <span style='color: #000000;'>Creating pre-processing data to finalize unknown parameter: mtry</span></span></span>
<span></span><span></span>
<span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_aorsf</span>, metric <span class='o'>=</span> <span class='s'>"rmse"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span>
<span><span class='c'>#&gt;    mtry min_n .metric .estimator  mean     n std_err .config              </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>    87    11 rmse    standard   0.786    10  0.037<span style='text-decoration: underline;'>0</span> Preprocessor1_Model02</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>    98     8 rmse    standard   0.789    10  0.036<span style='text-decoration: underline;'>3</span> Preprocessor1_Model10</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>    48     5 rmse    standard   0.793    10  0.036<span style='text-decoration: underline;'>3</span> Preprocessor1_Model01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>    16    17 rmse    standard   0.803    10  0.032<span style='text-decoration: underline;'>5</span> Preprocessor1_Model09</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>    31    18 rmse    standard   0.813    10  0.035<span style='text-decoration: underline;'>9</span> Preprocessor1_Model05</span></span>
<span></span><span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_aorsf</span>, metric <span class='o'>=</span> <span class='s'>"rsq"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span>
<span><span class='c'>#&gt;    mtry min_n .metric .estimator  mean     n std_err .config              </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>    48     5 rsq     standard   0.946    10 0.004<span style='text-decoration: underline;'>46</span> Preprocessor1_Model01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>    98     8 rsq     standard   0.945    10 0.004<span style='text-decoration: underline;'>82</span> Preprocessor1_Model10</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>    87    11 rsq     standard   0.945    10 0.004<span style='text-decoration: underline;'>84</span> Preprocessor1_Model02</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>    16    17 rsq     standard   0.941    10 0.003<span style='text-decoration: underline;'>70</span> Preprocessor1_Model09</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>    31    18 rsq     standard   0.940    10 0.005<span style='text-decoration: underline;'>47</span> Preprocessor1_Model05</span></span>
<span></span></code></pre>
</div>
<p>Holy smokes. The best RMSE from aorsf is 0.79, a big improvement over the previous best RMSE of 1.24 from the elastic net, and the best R-squared is 0.95, much stronger than the previous best (also from the elastic net) of 0.85.</p>
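<p>To put the gain in perspective, the short calculation below uses only the rounded values quoted in the previous paragraph (the object names are hypothetical):</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Best cross-validated metrics quoted above (rounded values)
rmse = c(elastic_net = 1.24, aorsf = 0.79)
rsq = c(elastic_net = 0.85, aorsf = 0.95)

# Relative reduction in RMSE when moving to aorsf: about 0.36
rmse_reduction = unname(1 - rmse["aorsf"] / rmse["elastic_net"])

# Proportion of variance left unexplained (1 - R-squared): 0.15 vs 0.05
unexplained = 1 - rsq</code></pre>
</div>
<p>Using these rounded figures, aorsf cuts RMSE by roughly 36% and reduces the share of variance left unexplained from about 15% to about 5%, a threefold reduction.</p>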
<p>Especially if your modeling problems involve few samples of many, highly correlated predictors, give the <code>&quot;aorsf&quot;</code> modeling engine a whirl in your workflows and let us know what you think!</p>
<h2 id="references">References
</h2>
<p>Byron C. Jaeger, Sawyer Welden, Kristin Lenoir, Jaime L. Speiser, Matthew W. Segar, Ambarish Pandey, and Nicholas M. Pajewski. 2024. &ldquo;Accelerated and Interpretable Oblique Random Survival Forests.&rdquo; <em>Journal of Computational and Graphical Statistics</em> 33 (1): 192&ndash;207.</p>
<p>Byron C. Jaeger, Sawyer Welden, Kristin Lenoir, and Nicholas M. Pajewski. 2022. &ldquo;aorsf: An R Package for Supervised Learning Using the Oblique Random Survival Forest.&rdquo; <em>Journal of Open Source Software</em>.</p>
<p>Bjoern H. Menze, B. Michael Kelm, Daniel N. Splitthoff, Ullrich Koethe, and Fred A. Hamprecht. 2011. &ldquo;On Oblique Random Forests.&rdquo; In <em>Joint European Conference on Machine Learning and Knowledge Discovery in Databases</em>, 453&ndash;469. Springer.</p>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>Thank you to <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, the aorsf author, for doing most of the work to implement aorsf support in bonsai. Thank you to <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>,
<a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>,
<a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>,
<a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>,
<a href="https://github.com/p-schaefer" target="_blank" rel="noopener">@p-schaefer</a>,
<a href="https://github.com/seb-mueller" target="_blank" rel="noopener">@seb-mueller</a>,
and <a href="https://github.com/tcovert" target="_blank" rel="noopener">@tcovert</a>
for their contributions to the bonsai repository since version 0.2.1.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2024/bonsai-0-3-0/thumbnail-wd.jpg" length="389634" type="image/jpeg" />
    </item>
    <item>
      <title>Introducing Keras 3 for R</title>
      <link>https://posit-open-source.netlify.app/blog/ai/kalinowskikeras3/</link>
      <pubDate>Tue, 21 May 2024 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/ai/kalinowskikeras3/</guid>
      <dc:creator>Tomasz Kalinowski</dc:creator><description><![CDATA[<p>We are thrilled to introduce <code>keras3</code>, the next version of the Keras R
package. <code>keras3</code> is a ground-up rebuild of <code>{keras}</code>, maintaining the
beloved features of the original while refining and simplifying the API
based on valuable insights gathered over the past few years.</p>
<p>Keras provides a complete toolkit for building deep learning models in
R&mdash;it&rsquo;s never been easier to build, train, evaluate, and deploy deep
learning models.</p>
<h2 id="installation">Installation
</h2>
<p>To install Keras 3:</p>
<div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;keras3&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">keras3</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">install_keras</span><span class="p">()</span>
</span></span></code></pre>
</div>
</div><h2 id="whats-new">What&rsquo;s new:
</h2>
<h3 id="documentation">Documentation
</h3>
<p>Great documentation is essential, and we&rsquo;ve worked hard to make sure
that <code>keras3</code> has excellent documentation, both now and in the future.</p>
<p>Keras 3 comes with a full refresh of the website:
<a href="https://keras.posit.co" target="_blank" rel="noopener">https://keras.posit.co</a>. There, you will find guides, tutorials,
reference pages with rendered examples, and a new examples gallery. All
the reference pages and guides are also available via R&rsquo;s built-in help
system.</p>
<p>In a fast-moving ecosystem like deep learning, creating great
documentation and wrappers once is not enough. There also need to be
workflows that ensure the documentation is up-to-date with upstream
dependencies. To accomplish this, {keras3} includes two new maintainer
features that ensure the R documentation and function wrappers will stay
up-to-date:</p>
<ul>
<li>
<p>We now take snapshots of the upstream documentation and API surface.
With each release, all R documentation is rebased on upstream
updates. This workflow ensures that all R documentation (guides,
examples, vignettes, and reference pages) and R function signatures
stay up-to-date with upstream. This snapshot-and-rebase
functionality is implemented in a new standalone R package,
<a href="https://github.com/t-kalinowski/doctether" target="_blank" rel="noopener">{doctether}</a>, which may
be useful for R package maintainers needing to keep documentation in
parity with dependencies.</p>
</li>
<li>
<p>All examples and vignettes can now be evaluated and rendered during
a package build. This ensures that no stale or broken example code
makes it into a release. It also means all user-facing example code
now additionally serves as an extended suite of snapshot unit and
integration tests.</p>
<p>Evaluating code in vignettes and examples is still not permitted
under CRAN restrictions. We work around this restriction
by adding package build steps that pre-render
<a href="https://github.com/rstudio/keras/blob/main/man/roxygen/meta.R" target="_blank" rel="noopener">examples</a> and
<a href="https://github.com/rstudio/keras/blob/main/tools/knit.R" target="_blank" rel="noopener">vignettes</a>.</p>
</li>
</ul>
<p>Combined, these two features will make it substantially easier for Keras
in R to maintain feature parity and up-to-date documentation with the
Python API to Keras.</p>
<h3 id="multi-backend-support">Multi-backend support
</h3>
<p>Soon after its launch in 2015, Keras featured support for most popular
deep learning frameworks: TensorFlow, Theano, MXNet, and CNTK. Over
time, the landscape shifted; Theano, MXNet, and CNTK were retired, and
TensorFlow surged in popularity. In 2021, three years ago, TensorFlow
became the premier and only supported Keras backend. Now, the landscape
has shifted again.</p>
<p>Keras 3 brings the return of multi-backend support. Choose a backend by
calling:</p>
<div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">use_backend</span><span class="p">(</span><span class="s">&#34;jax&#34;</span><span class="p">)</span> <span class="c1"># or &#34;tensorflow&#34;, &#34;torch&#34;, &#34;numpy&#34;</span>
</span></span></code></pre>
</div>
</div><p>The default backend continues to be TensorFlow, which is the best choice
for most users today; for small- and medium-sized models, it is still the
fastest backend. However, each backend has different strengths, and
being able to switch easily will let you adapt to changes as your
project, or the frameworks themselves, evolve.</p>
<p>Today, switching to the Jax backend can, for some model types, bring
substantial speed improvements. Jax is also the only backend that has
support for a new model parallelism distributed training API. Switching
to Torch can be helpful during development, often producing simpler
tracebacks while debugging.</p>
<p>Keras 3 also lets you incorporate any pre-existing Torch, Jax, or Flax
module as a standard Keras layer by using the appropriate wrapper,
letting you build atop existing projects with Keras. For example, train
a Torch model using the Keras high-level training API (<code>compile()</code> +
<code>fit()</code>), or include a Flax module as a component of a larger Keras
model. The new multi-backend support lets you use Keras à la carte.</p>
<h3 id="the-ops-family">The &lsquo;Ops&rsquo; family
</h3>
<p><code>{keras3}</code> introduces a new &ldquo;Operations&rdquo; family of functions. The Ops
family, currently with over <a href="https://keras.posit.co/reference/index.html#operations" target="_blank" rel="noopener">200
functions</a>,
provides a comprehensive suite of operations typically needed when
operating on nd-arrays for deep learning. The Ops family
supersedes and greatly expands on the former family of backend functions
prefixed with <code>k_</code> in the <code>{keras}</code> package.</p>
<p>The Ops functions let you write backend-agnostic code. They provide a
uniform API, regardless of whether you&rsquo;re working with TensorFlow Tensors,
Jax Arrays, Torch Tensors, Keras Symbolic Tensors, NumPy arrays, or R
arrays.</p>
<p>The Ops functions:</p>
<ul>
<li>all start with prefix <code>op_</code> (e.g., <code>op_stack()</code>)</li>
<li>all are pure functions (they produce no side-effects)</li>
<li>all use consistent 1-based indexing, and coerce doubles to integers
as needed</li>
<li>all are safe to use with any backend (tensorflow, jax, torch, numpy)</li>
<li>all are safe to use in both eager and graph/jit/tracing modes</li>
</ul>
<p>The Ops API includes:</p>
<ul>
<li>The entirety of the NumPy API (<code>numpy.*</code>)</li>
<li>The TensorFlow NN API (<code>tf.nn.*</code>)</li>
<li>Common linear algebra functions (a subset of <code>scipy.linalg.*</code>)</li>
<li>A subfamily of image transformers</li>
<li>A comprehensive set of loss functions</li>
<li>And more!</li>
</ul>
<h3 id="ingest-tabular-data-with-layer_feature_space">Ingest tabular data with <code>layer_feature_space()</code>
</h3>
<p><code>keras3</code> provides a new set of functions for building models that ingest
tabular data: <code>layer_feature_space()</code> and a family of feature
transformer functions (prefix <code>feature_</code>) for building Keras models
that can work with tabular data, either as inputs to a Keras model, or
as preprocessing steps in a data loading pipeline (e.g., a
<code>tfdatasets::dataset_map()</code>).</p>
<p>See the <a href="https://keras.posit.co/reference/layer_feature_space.html" target="_blank" rel="noopener">reference
page</a>
 and an
example usage in a full <a href="https://keras.posit.co/articles/examples/structured_data/structured_data_classification_with_feature_space.html" target="_blank" rel="noopener">end-to-end
example</a>

to learn more.</p>
<h3 id="new-subclassing-api">New Subclassing API
</h3>
<p>The subclassing API has been refined and extended to <a href="https://keras.posit.co/reference/index.html#base-keras-classes" target="_blank" rel="noopener">more Keras
types</a>.
Define subclasses simply by calling: <code>Layer()</code>, <code>Loss()</code>, <code>Metric()</code>,
<code>Callback()</code>, <code>Constraint()</code>, <code>Model()</code>, and <code>LearningRateSchedule()</code>.
Defining <code>{R6}</code> proxy classes is no longer necessary.</p>
<p>Additionally, the documentation page for each of the subclassing
functions now contains a comprehensive listing of all the available
attributes and methods for that type. Check out
<a href="https://keras.posit.co/reference/Layer.html" target="_blank" rel="noopener"><code>?Layer</code></a>
 to see what&rsquo;s
possible.</p>
<h3 id="saving-and-export">Saving and Export
</h3>
<p>Keras 3 brings a new model serialization and export API. It is now much
simpler to save and restore models, and to export them for
serving.</p>
<ul>
<li>
<p><code>save_model()</code>/<code>load_model()</code>:<br>
A new high-level file format (extension: <code>.keras</code>) for saving and
restoring a full model.</p>
<p>The file format is backend-agnostic. This means that you can convert
trained models between backends, simply by saving with one backend,
and then loading with another. For example, train a model using Jax,
and then convert to TensorFlow for export.</p>
</li>
<li>
<p><code>export_savedmodel()</code>:<br>
Export just the forward pass of a model as a compiled artifact for
inference with <a href="https://www.tensorflow.org/tfx/guide/serving" target="_blank" rel="noopener">TF
Serving</a> or (soon)
<a href="https://posit.co/products/enterprise/connect/" target="_blank" rel="noopener">Posit Connect</a>. This
is the easiest way to deploy a Keras model for efficient and
concurrent inference serving, all without any R or Python runtime
dependency.</p>
</li>
<li>
<p>Lower level entry points:</p>
<ul>
<li><code>save_model_weights()</code> / <code>load_model_weights()</code>:<br>
save just the weights as <code>.weights.h5</code> files.</li>
<li><code>save_model_config()</code> / <code>load_model_config()</code>:<br>
save just the model architecture as a JSON file.</li>
</ul>
</li>
<li>
<p><code>register_keras_serializable()</code>:<br>
Register custom objects to enable them to be serialized and
deserialized.</p>
</li>
<li>
<p><code>serialize_keras_object()</code> / <code>deserialize_keras_object()</code>:<br>
Convert any Keras object to an R list of simple types that is safe
to convert to JSON or rds.</p>
</li>
<li>
<p>See the new <a href="https://keras.posit.co/articles/serialization_and_saving.html" target="_blank" rel="noopener">Serialization and Saving
vignette</a>

for more details and examples.</p>
</li>
</ul>
<h3 id="new-random-family">New <code>random</code> family
</h3>
<p>A new family of <a href="https://keras.posit.co/reference/index.html#random-tensor-generators" target="_blank" rel="noopener">random tensor
generators</a>.
Like the Ops family, these work with all backends. Additionally, all the
RNG-using methods have support for stateless usage when you pass in a
seed generator. This enables tracing and compilation by frameworks that
have special support for stateless, pure functions, like Jax. See
<a href="https://keras.posit.co/reference/random_seed_generator.html" target="_blank" rel="noopener"><code>?random_seed_generator()</code></a>

for example usage.</p>
<h3 id="other-additions">Other additions:
</h3>
<ul>
<li>
<p>New <a href="https://keras.posit.co/reference/shape.html" target="_blank" rel="noopener"><code>shape()</code></a>

function, a one-stop utility for working with tensor shapes in all
contexts.</p>
</li>
<li>
<p>New and improved <code>print(model)</code> and <code>plot(model)</code> methods. See some
examples of output in the <a href="https://keras.posit.co/articles/functional_api.html" target="_blank" rel="noopener">Functional API
guide</a>.</p>
</li>
<li>
<p>All-new <code>fit()</code> progress bar and live metrics viewer output,
including new dark-mode support in the RStudio IDE.</p>
</li>
<li>
<p>New <a href="https://keras.posit.co/reference/index.html#configuration" target="_blank" rel="noopener"><code>config</code>
family</a>,
a curated set of functions for getting and setting Keras global
configurations.</p>
</li>
<li>
<p>All of the other function families have expanded with new members:</p>
<ul>
<li><a href="https://keras.posit.co/reference/index.html#layers" target="_blank" rel="noopener">Layers</a>
(prefix <code>layer_</code>)</li>
<li><a href="https://keras.posit.co/reference/index.html#activations" target="_blank" rel="noopener">Activation
functions</a>
(prefix <code>activation_</code>)</li>
<li><a href="https://keras.posit.co/reference/index.html#optimizers" target="_blank" rel="noopener">Optimizers</a>
(prefix <code>optimizer_</code>)</li>
<li><a href="https://keras.posit.co/reference/index.html#metrics" target="_blank" rel="noopener">Metrics</a>
(prefix <code>metric_</code>)</li>
<li><a href="https://keras.posit.co/reference/index.html#losses" target="_blank" rel="noopener">Losses</a>
(prefix <code>loss_</code>)</li>
<li><a href="https://keras.posit.co/reference/index.html#image-preprocessing" target="_blank" rel="noopener">Image
preprocessing</a>
(prefixes <code>image_</code> and <code>op_image_</code>)</li>
<li><a href="https://keras.posit.co/reference/index.html#applications" target="_blank" rel="noopener">Applications</a>
(prefix <code>application_</code>)</li>
</ul>
</li>
</ul>
<h3 id="migrating-from-keras-to-keras3">Migrating from <code>{keras}</code> to <code>{keras3}</code>
</h3>
<p><code>{keras3}</code> supersedes the <code>{keras}</code> package.</p>
<p>If you&rsquo;re writing new code today, you can start using <code>{keras3}</code> right
away.</p>
<p>If you have legacy code that uses <code>{keras}</code>, you are encouraged to
update the code for <code>{keras3}</code>. For many high-level API functions, such
as <code>layer_dense()</code>, <code>fit()</code>, and <code>keras_model()</code>, minimal to no changes
are required. However, there is a long tail of small changes that you
might need to make when updating code that made use of the lower-level
Keras API. Some of those are documented here:
<a href="https://keras.io/guides/migrating_to_keras_3/" target="_blank" rel="noopener">https://keras.io/guides/migrating_to_keras_3/</a>.</p>
<p>If you&rsquo;re running into issues or have questions about updating, don&rsquo;t
hesitate to ask on <a href="https://github.com/rstudio/keras/issues" target="_blank" rel="noopener">https://github.com/rstudio/keras/issues</a> or
<a href="https://github.com/rstudio/keras/discussions" target="_blank" rel="noopener">https://github.com/rstudio/keras/discussions</a>.</p>
<p>The <code>{keras}</code> and <code>{keras3}</code> packages will coexist while the community
transitions. During the transition, <code>{keras}</code> will continue to receive
patch updates for compatibility with Keras v2, which continues to be
published to PyPI under the package name <code>tf-keras</code>. After <code>tf-keras</code> is
no longer maintained, the <code>{keras}</code> package will be archived.</p>
<h2 id="summary">Summary
</h2>
<p>In summary, <code>{keras3}</code> is a robust update to the Keras R package,
incorporating new features while preserving the ease of use and
functionality of the original. The new multi-backend support,
comprehensive suite of Ops functions, refined model serialization API,
and updated documentation workflows enable users to easily take
advantage of the latest developments in the deep learning community.</p>
<p>Whether you are a seasoned Keras user or just starting your deep
learning journey, Keras 3 provides the tools and flexibility to build,
train, and deploy models with ease and confidence. As we transition from
Keras 2 to Keras 3, we are committed to supporting the community and
ensuring a smooth migration. We invite you to explore the new features,
check out the updated documentation, and join the conversation on our
GitHub discussions page. Welcome to the next chapter of deep learning in
R with Keras 3!</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/ai/kalinowskikeras3/thumbnail.png" length="25263" type="image/png" />
    </item>
    <item>
      <title>Q1 2024 tidymodels digest</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2024/tidymodels-2024-q1/</link>
      <pubDate>Wed, 24 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2024/tidymodels-2024-q1/</guid>
      <dc:creator>Hannah Frick</dc:creator><description><![CDATA[
<p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a>
 framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p>
<p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a>
 here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a>
 to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like these posts from the past couple of months:</p>
<ul>
<li><a href="https://www.tidyverse.org/blog/2024/04/tidymodels-survival-analysis/" target="_blank" rel="noopener">Survival analysis for time-to-event data with tidymodels</a>
</li>
<li><a href="https://www.tidyverse.org/blog/2024/03/tidymodels-fairness/" target="_blank" rel="noopener">Fair machine learning with tidymodels</a>
</li>
<li><a href="https://www.tidyverse.org/blog/2024/04/tune-1-2-0/" target="_blank" rel="noopener">tune 1.2.0</a>
</li>
</ul>
<p>Additionally, we have published several related articles on <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels.org</a>:</p>
<ul>
<li><a href="https://www.tidymodels.org/learn/statistics/survival-case-study/" target="_blank" rel="noopener">How long until building complaints are dispositioned? A survival analysis case study</a>
</li>
<li><a href="https://www.tidymodels.org/learn/statistics/survival-metrics/" target="_blank" rel="noopener">Dynamic Performance Metrics for Event Time Data</a>
</li>
<li><a href="https://www.tidymodels.org/learn/statistics/survival-metrics-details/" target="_blank" rel="noopener">Accounting for Censoring in Performance Metrics for Event Time Data</a>
</li>
<li><a href="https://www.tidymodels.org/learn/work/fairness-detectors/" target="_blank" rel="noopener">Are GPT detectors fair? A machine learning fairness case study</a>
</li>
<li><a href="https://www.tidymodels.org/learn/work/fairness-readmission/" target="_blank" rel="noopener">Fair prediction of hospital readmission: a machine learning fairness case study</a>
</li>
<li><a href="https://www.tidymodels.org/learn/models/bootstrap-metrics/" target="_blank" rel="noopener">Confidence Intervals for Performance Metrics</a>
</li>
</ul>
<p>Since <a href="https://www.tidyverse.org/blog/2024/01/tidymodels-2023-q4/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 21 tidymodels packages. Here are links to their NEWS files:</p>
<div class="highlight">
<ul>
<li>baguette <a href="https://baguette.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a>
</li>
<li>brulee <a href="https://brulee.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.0)</a>
</li>
<li>butcher <a href="https://butcher.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.4)</a>
</li>
<li>censored <a href="https://censored.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.0)</a>
</li>
<li>dials <a href="https://dials.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.1)</a>
</li>
<li>embed <a href="https://embed.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.4)</a>
</li>
<li>finetune <a href="https://finetune.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a>
</li>
<li>hardhat <a href="https://hardhat.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.3.1)</a>
</li>
<li>modeldata <a href="https://modeldata.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.3.0)</a>
</li>
<li>parsnip <a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.1)</a>
</li>
<li>probably <a href="https://probably.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.3)</a>
</li>
<li>recipes <a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.10)</a>
</li>
<li>rsample <a href="https://rsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.1)</a>
</li>
<li>shinymodels <a href="https://shinymodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.1.1)</a>
</li>
<li>stacks <a href="https://stacks.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.4)</a>
</li>
<li>tidyclust <a href="https://tidyclust.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.1)</a>
</li>
<li>tidymodels <a href="https://tidymodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a>
</li>
<li>tune <a href="https://tune.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a>
</li>
<li>workflows <a href="https://workflows.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.4)</a>
</li>
<li>workflowsets <a href="https://workflowsets.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a>
</li>
<li>yardstick <a href="https://yardstick.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.3.1)</a>
</li>
</ul>
</div>
<p>We&rsquo;ll highlight a few especially notable changes below: new prediction options in censored, consistency in augmenting parsnip models and workflows, and a new autoplot type for workflow sets.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/censored'>censored</a></span><span class='o'>)</span></span></code></pre>
</div>
<h2 id="new-prediction-options-in-censored">New prediction options in censored
</h2>
<p>As part of the framework-wide integration of survival analysis, the parsnip extension package censored has received some love in the form of new prediction options.</p>
<p>Random forests with the <code>&quot;aorsf&quot;</code> engine can now predict survival time, thanks to a new feature in the <a href="https://docs.ropensci.org/aorsf/" target="_blank" rel="noopener">aorsf</a>
 package itself. This means that all engines in censored can now predict survival time.</p>
<p>Let&rsquo;s predict survival time for the first five rows of the lung cancer dataset, survival analysis&rsquo; <code>mtcars</code>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rf_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/rand_forest.html'>rand_forest</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"aorsf"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"censored regression"</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>rf_fit</span> <span class='o'>&lt;-</span> <span class='nv'>rf_spec</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>time</span>, <span class='nv'>status</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>age</span> <span class='o'>+</span> <span class='nv'>sex</span>, data <span class='o'>=</span> <span class='nv'>lung</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>lung_5</span> <span class='o'>&lt;-</span> <span class='nv'>lung</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>5</span>, <span class='o'>]</span></span>
<span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>rf_fit</span>, new_data <span class='o'>=</span> <span class='nv'>lung_5</span>, type <span class='o'>=</span> <span class='s'>"time"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 1</span></span></span>
<span><span class='c'>#&gt;   .pred_time</span></span>
<span><span class='c'>#&gt;        <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>       217.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>       240.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>       236.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>       236.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>       254.</span></span>
<span></span></code></pre>
</div>
<p>Some models can make predictions for different values of a tuning parameter without having to refit the model. In parsnip, we refer to this as <a href="https://parsnip.tidymodels.org/articles/Submodels.html" target="_blank" rel="noopener">&ldquo;the submodel trick.&rdquo;</a>
 Some of those models are regularized models fitted with the <a href="https://glmnet.stanford.edu/" target="_blank" rel="noopener">glmnet</a>
 engine. In censored, the corresponding <a href="https://parsnip.tidymodels.org/reference/multi_predict.html" target="_blank" rel="noopener"><code>multi_predict()</code></a>
 method has now gained the prediction types <code>&quot;time&quot;</code> and <code>&quot;raw&quot;</code> in addition to the existing types <code>&quot;survival&quot;</code> and <code>&quot;linear_pred&quot;</code>.</p>
<p>Let&rsquo;s fit a regularized Cox model to illustrate. Note how we set the <code>penalty</code> to a fixed value of <code>0.1</code>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>cox_fit</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/proportional_hazards.html'>proportional_hazards</a></span><span class='o'>(</span>penalty <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"glmnet"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"censored regression"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>time</span>, <span class='nv'>status</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>lung</span><span class='o'>)</span></span></code></pre>
</div>
<p>Predictions made with <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>
 use that penalty value of 0.1. With <a href="https://parsnip.tidymodels.org/reference/multi_predict.html" target="_blank" rel="noopener"><code>multi_predict()</code></a>
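, we can change that value without having to refit. Conveniently, we can also predict for multiple penalty values at once. Note that the first, third, and fifth rows of <code>lung_5</code> have missing predictor values, which is why <code>predict()</code> returns <code>NA</code> for them.</p>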
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>cox_fit</span>, new_data <span class='o'>=</span> <span class='nv'>lung_5</span>, type <span class='o'>=</span> <span class='s'>"time"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 1</span></span></span>
<span><span class='c'>#&gt;   .pred_time</span></span>
<span><span class='c'>#&gt;        <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>        <span style='color: #BB0000;'>NA</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>       425.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>        <span style='color: #BB0000;'>NA</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>       350.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>        <span style='color: #BB0000;'>NA</span></span></span>
<span></span><span></span>
<span><span class='nv'>mpred</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/multi_predict.html'>multi_predict</a></span><span class='o'>(</span><span class='nv'>cox_fit</span>, new_data <span class='o'>=</span> <span class='nv'>lung_5</span>, type <span class='o'>=</span> <span class='s'>"time"</span>, </span>
<span>                       penalty <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.01</span>, <span class='m'>0.1</span><span class='o'>)</span><span class='o'>)</span> </span>
<span><span class='nv'>mpred</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 1</span></span></span>
<span><span class='c'>#&gt;   .pred           </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span>          </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span>
<span></span></code></pre>
</div>
<p>The resulting tibble is nested by observation to follow the convention of one row per observation. For each observation, the predictions are stored in a tibble containing the penalty value along with the prediction.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mpred</span><span class='o'>$</span><span class='nv'>.pred</span><span class='o'>[[</span><span class='m'>2</span><span class='o'>]</span><span class='o'>]</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span>
<span><span class='c'>#&gt;   penalty .pred_time</span></span>
<span><span class='c'>#&gt;     <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>    0.01       461.</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>    0.1        425.</span></span>
<span></span></code></pre>
</div>
<p>You can see that, for the second observation, the prediction from <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>
 (425) matches the prediction from <a href="https://parsnip.tidymodels.org/reference/multi_predict.html" target="_blank" rel="noopener"><code>multi_predict()</code></a>
 with a penalty of 0.1.</p>
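<p>If you prefer one long tibble over the nested format, one option is to unnest the <code>.pred</code> column with tidyr, which tidymodels attaches. The sketch below refits the regularized Cox model from above so that it runs on its own; the <code>.row</code> index column is our own addition for keeping track of observations, not something <code>multi_predict()</code> returns.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># A minimal sketch: flatten the nested multi_predict() output.
library(tidymodels)
library(censored)

cox_fit &lt;- proportional_hazards(penalty = 0.1) |&gt;
  set_engine("glmnet") |&gt;
  set_mode("censored regression") |&gt;
  fit(Surv(time, status) ~ ., data = lung)

mpred &lt;- multi_predict(cox_fit, new_data = lung[1:5, ], type = "time",
                       penalty = c(0.01, 0.1))

# Add a row index before unnesting so each prediction stays tied to its
# observation, then expand the list column into one row per penalty value.
mpred_long &lt;- mpred |&gt;
  mutate(.row = row_number()) |&gt;
  unnest(cols = .pred)
mpred_long
</code></pre>
</div>
<p>Each of the five observations now contributes one row per penalty value, which can be convenient for plotting or filtering.</p>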
<h2 id="consistent-augment-for-workflows-and-parsnip-models">Consistent <code>augment()</code> for workflows and parsnip models
</h2>
<p>If you are interested in exploring predictions in relation to predictors, <a href="https://generics.r-lib.org/reference/augment.html" target="_blank" rel="noopener"><code>augment()</code></a>
 is your extended <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>
 method: it adds its predictions to the input data. For classification, it adds hard class predictions as well as class probabilities. For regression, it adds the numeric prediction. If the outcome variable is part of the data, it also calculates residuals. This was already the case for fitted parsnip models, and now the <a href="https://generics.r-lib.org/reference/augment.html" target="_blank" rel="noopener"><code>augment()</code></a>
 method for workflows will now also calculate residuals.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>spec_fit</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span><span class='o'>)</span>, <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='nv'>wflow_fit</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>spec_fit</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 13</span></span></span>
<span><span class='c'>#&gt;    .pred  .resid   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear</span></span>
<span><span class='c'>#&gt;    <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>  22.6 -<span style='color: #BB0000;'>1.60</span>    21       6  160    110  3.9   2.62  16.5     0     1     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>  22.1 -<span style='color: #BB0000;'>1.11</span>    21       6  160    110  3.9   2.88  17.0     0     1     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>  26.3 -<span style='color: #BB0000;'>3.45</span>    22.8     4  108     93  3.85  2.32  18.6     1     1     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>  21.2  0.163   21.4     6  258    110  3.08  3.22  19.4     1     0     3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>  17.7  1.01    18.7     8  360    175  3.15  3.44  17.0     0     0     3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>  20.4 -<span style='color: #BB0000;'>2.28</span>    18.1     6  225    105  2.76  3.46  20.2     1     0     3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>  14.4 -<span style='color: #BB0000;'>0.086</span><span style='color: #BB0000; text-decoration: underline;'>3</span>  14.3     8  360    245  3.21  3.57  15.8     0     0     3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>  22.5  1.90    24.4     4  147.    62  3.69  3.19  20       1     0     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>  24.4 -<span style='color: #BB0000;'>1.62</span>    22.8     4  141.    95  3.92  3.15  22.9     1     0     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>  18.7  0.501   19.2     6  168.   123  3.92  3.44  18.3     1     0     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: carb &lt;dbl&gt;</span></span></span>
<span></span><span></span>
<span><span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>wflow_fit</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 13</span></span></span>
<span><span class='c'>#&gt;    .pred  .resid   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear</span></span>
<span><span class='c'>#&gt;  <span style='color: #555555;'>*</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 1</span>  22.6 -<span style='color: #BB0000;'>1.60</span>    21       6  160    110  3.9   2.62  16.5     0     1     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 2</span>  22.1 -<span style='color: #BB0000;'>1.11</span>    21       6  160    110  3.9   2.88  17.0     0     1     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 3</span>  26.3 -<span style='color: #BB0000;'>3.45</span>    22.8     4  108     93  3.85  2.32  18.6     1     1     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 4</span>  21.2  0.163   21.4     6  258    110  3.08  3.22  19.4     1     0     3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 5</span>  17.7  1.01    18.7     8  360    175  3.15  3.44  17.0     0     0     3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 6</span>  20.4 -<span style='color: #BB0000;'>2.28</span>    18.1     6  225    105  2.76  3.46  20.2     1     0     3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 7</span>  14.4 -<span style='color: #BB0000;'>0.086</span><span style='color: #BB0000; text-decoration: underline;'>3</span>  14.3     8  360    245  3.21  3.57  15.8     0     0     3</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 8</span>  22.5  1.90    24.4     4  147.    62  3.69  3.19  20       1     0     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'> 9</span>  24.4 -<span style='color: #BB0000;'>1.62</span>    22.8     4  141.    95  3.92  3.15  22.9     1     0     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>10</span>  18.7  0.501   19.2     6  168.   123  3.92  3.44  18.3     1     0     4</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: carb &lt;dbl&gt;</span></span></span>
<span></span></code></pre>
</div>
<p>Both methods also add the new columns on the left-hand side of the data frame rather than the right-hand side. This means that the prediction columns are always visible when printed, even for data frames with many columns. As you can see in the output, the column order is also the same for both methods.</p>
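<p>For classification, <code>augment()</code> adds the hard class prediction and one probability column per class level instead. A minimal sketch, assuming a logistic regression on <code>mtcars</code> with the transmission indicator <code>am</code> converted to a factor; this model is our own illustration, not part of the release.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidymodels)

# Treat transmission type (0 = automatic, 1 = manual) as a two-class outcome.
mtcars_cls &lt;- mtcars |&gt; mutate(am = factor(am))

cls_spec_fit &lt;- fit(logistic_reg(), am ~ hp + wt, data = mtcars_cls)
cls_wflow_fit &lt;- workflow(am ~ hp + wt, logistic_reg()) |&gt; fit(mtcars_cls)

# Expect .pred_class, .pred_0, and .pred_1 alongside the original columns;
# no residuals are added for a classification model.
spec_aug &lt;- augment(cls_spec_fit, mtcars_cls)
wflow_aug &lt;- augment(cls_wflow_fit, mtcars_cls)
names(spec_aug)[1:3]
names(wflow_aug)[1:3]
</code></pre>
</div>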
<h2 id="new-autoplot-type-for-workflow-sets">New autoplot type for workflow sets
</h2>
<p>Many tidymodels objects have <a href="https://ggplot2.tidyverse.org/reference/autoplot.html" target="_blank" rel="noopener"><code>autoplot()</code></a>
 methods for quickly getting a sense of the most important aspects of an object. For workflow sets, the method shows the values of the calculated performance metrics, as well as the rank of each workflow in the set. Let&rsquo;s put together a workflow set on the actual <code>mtcars</code> data and take a look at the default autoplot.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mt_rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='nv'>mt_rec2</span> <span class='o'>&lt;-</span> <span class='nv'>mt_rec</span> <span class='o'>|&gt;</span> <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='nv'>mt_rec3</span> <span class='o'>&lt;-</span> <span class='nv'>mt_rec</span> <span class='o'>|&gt;</span> <span class='nf'>step_YeoJohnson</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>wflow_set</span> <span class='o'>&lt;-</span> <span class='nf'>workflow_set</span><span class='o'>(</span></span>
<span>  <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>plain <span class='o'>=</span> <span class='nv'>mt_rec</span>, normalize <span class='o'>=</span> <span class='nv'>mt_rec2</span>, yeo_johnson <span class='o'>=</span> <span class='nv'>mt_rec3</span><span class='o'>)</span>, </span>
<span>  <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span>
<span><span class='nv'>wflow_set_fit</span> <span class='o'>&lt;-</span> <span class='nf'>workflow_map</span><span class='o'>(</span></span>
<span>  <span class='nv'>wflow_set</span>, </span>
<span>  <span class='s'>"fit_resamples"</span>, </span>
<span>  resamples <span class='o'>=</span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='nv'>wflow_set_fit</span><span class='o'>)</span></span>
</code></pre>
<img src="https://posit-open-source.netlify.app/blog/tidyverse/2024/tidymodels-2024-q1/figs/workflowsets-autoplot-1.png" width="700px" style="display: block; margin: auto;" />
</div>
<p>This lets you grasp the metric values and rank of each workflow and distinguish the type of preprocessor and model. In our case, we only have one type of model and even just one type of preprocessor, a recipe. What we are much more interested in is which recipe corresponds to which rank. The new option <code>type = &quot;wflow_id&quot;</code> shows which values and ranks correspond to which workflow, and thus also to which recipe.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='nv'>wflow_set_fit</span>, type <span class='o'>=</span> <span class='s'>"wflow_id"</span><span class='o'>)</span></span>
</code></pre>
<img src="https://posit-open-source.netlify.app/blog/tidyverse/2024/tidymodels-2024-q1/figs/workflowsets-autoplot-new-1.png" width="700px" style="display: block; margin: auto;" />
</div>
<p>This makes it easy to spot that it&rsquo;s the Yeo-Johnson transformation that makes the difference here!</p>
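<p>If you would rather look at the numbers behind the plot, <code>rank_results()</code> from workflowsets returns the metric estimates and ranks as a tibble. The sketch below rebuilds the workflow set from above so that it runs on its own and keeps only the RMSE rows; the choice of columns is ours.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidymodels)

mt_rec &lt;- recipe(mpg ~ ., mtcars)
wflow_set &lt;- workflow_set(
  list(
    plain = mt_rec,
    normalize = mt_rec |&gt; step_normalize(all_numeric_predictors()),
    yeo_johnson = mt_rec |&gt; step_YeoJohnson(all_numeric_predictors())
  ),
  list(linear_reg())
)

set.seed(1)
wflow_set_fit &lt;- workflow_map(
  wflow_set,
  "fit_resamples",
  resamples = bootstraps(mtcars)
)

# Rank the workflows by RMSE and keep one row per workflow.
ranks &lt;- rank_results(wflow_set_fit, rank_metric = "rmse") |&gt;
  filter(.metric == "rmse") |&gt;
  select(wflow_id, mean, std_err, rank)
ranks
</code></pre>
</div>
<p>Since the unnamed model is labeled by its class, each <code>wflow_id</code> combines the recipe name with <code>linear_reg</code>, so the table maps ranks back to recipes just like the new plot type.</p>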
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>We&rsquo;d like to thank those in the community who contributed to tidymodels in the last quarter:</p>
<div class="highlight">
<ul>
<li>baguette: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>brulee: <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>butcher: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>
.</li>
<li>censored: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, and <a href="https://github.com/tripartio" target="_blank" rel="noopener">@tripartio</a>
.</li>
<li>dials: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
.</li>
<li>finetune: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>
, <a href="https://github.com/mfansler" target="_blank" rel="noopener">@mfansler</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>hardhat: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>
, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
.</li>
<li>modeldata: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>parsnip: <a href="https://github.com/birbritto" target="_blank" rel="noopener">@birbritto</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/jmunyoon" target="_blank" rel="noopener">@jmunyoon</a>
, <a href="https://github.com/marcelglueck" target="_blank" rel="noopener">@marcelglueck</a>
, <a href="https://github.com/mattheaphy" target="_blank" rel="noopener">@mattheaphy</a>
, <a href="https://github.com/mesdi" target="_blank" rel="noopener">@mesdi</a>
, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>
, <a href="https://github.com/pgg1309" target="_blank" rel="noopener">@pgg1309</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
, and <a href="https://github.com/wzbillings" target="_blank" rel="noopener">@wzbillings</a>
.</li>
<li>probably: <a href="https://github.com/brshallo" target="_blank" rel="noopener">@brshallo</a>
, <a href="https://github.com/Jeffrothschild" target="_blank" rel="noopener">@Jeffrothschild</a>
, <a href="https://github.com/jgaeb" target="_blank" rel="noopener">@jgaeb</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>recipes: <a href="https://github.com/DemetriPananos" target="_blank" rel="noopener">@DemetriPananos</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/jdonland" target="_blank" rel="noopener">@jdonland</a>
, <a href="https://github.com/JiahuaQu" target="_blank" rel="noopener">@JiahuaQu</a>
, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>
, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>
, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>
, <a href="https://github.com/SantiagoD999" target="_blank" rel="noopener">@SantiagoD999</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, <a href="https://github.com/stufield" target="_blank" rel="noopener">@stufield</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>rsample: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>
, <a href="https://github.com/paulcbauer" target="_blank" rel="noopener">@paulcbauer</a>
, <a href="https://github.com/StevenWallaert" target="_blank" rel="noopener">@StevenWallaert</a>
, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
, and <a href="https://github.com/ZWael" target="_blank" rel="noopener">@ZWael</a>
.</li>
<li>shinymodels: <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
.</li>
<li>stacks: <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
.</li>
<li>tidyclust: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, and <a href="https://github.com/katieburak" target="_blank" rel="noopener">@katieburak</a>
.</li>
<li>tidymodels: <a href="https://github.com/jkylearmstrong" target="_blank" rel="noopener">@jkylearmstrong</a>
, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>
, <a href="https://github.com/nikosGeography" target="_blank" rel="noopener">@nikosGeography</a>
, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>tune: <a href="https://github.com/AlbertoImg" target="_blank" rel="noopener">@AlbertoImg</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>
, <a href="https://github.com/joshuagi" target="_blank" rel="noopener">@joshuagi</a>
, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>
, <a href="https://github.com/marcozanotti" target="_blank" rel="noopener">@marcozanotti</a>
, <a href="https://github.com/Peter4801" target="_blank" rel="noopener">@Peter4801</a>
, <a href="https://github.com/rfsaldanha" target="_blank" rel="noopener">@rfsaldanha</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
, and <a href="https://github.com/walkerjameschris" target="_blank" rel="noopener">@walkerjameschris</a>
.</li>
<li>workflows: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/mesdi" target="_blank" rel="noopener">@mesdi</a>
, <a href="https://github.com/Milardkh" target="_blank" rel="noopener">@Milardkh</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>workflowsets: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
.</li>
<li>yardstick: <a href="https://github.com/asb2111" target="_blank" rel="noopener">@asb2111</a>
, <a href="https://github.com/Dpananos" target="_blank" rel="noopener">@Dpananos</a>
, <a href="https://github.com/EduMinsky" target="_blank" rel="noopener">@EduMinsky</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>
, and <a href="https://github.com/tripartio" target="_blank" rel="noopener">@tripartio</a>
.</li>
</ul>
</div>
<p>We&rsquo;re grateful for all of the tidymodels community, from observers to users to contributors. Happy modeling!</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2024/tidymodels-2024-q1/thumbnail-wd.jpg" length="274307" type="image/jpeg" />
    </item>
    <item>
      <title>tune 1.2.0</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2024/tune-1-2-0/</link>
      <pubDate>Thu, 18 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2024/tune-1-2-0/</guid>
      <dc:creator>Simon Couch</dc:creator><description><![CDATA[
<p>We&rsquo;re indubitably amped to announce the release of <a href="https://tune.tidymodels.org/" target="_blank" rel="noopener">tune</a>
 1.2.0, a package for hyperparameter tuning in the <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels framework</a>
.</p>
<p>You can install it from CRAN, along with the rest of the core packages in tidymodels, using the tidymodels meta-package:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidymodels"</span><span class='o'>)</span></span></code></pre>
</div>
<p>The 1.2.0 release of tune has introduced support for two major features that we&rsquo;ve written about on the tidyverse blog already:</p>
<ul>
<li><a href="https://www.tidyverse.org/blog/2024/04/tidymodels-survival-analysis/" target="_blank" rel="noopener">Survival analysis for time-to-event data with tidymodels</a>
</li>
<li><a href="https://www.tidyverse.org/blog/2024/03/tidymodels-fairness/" target="_blank" rel="noopener">Fair machine learning with tidymodels</a>
</li>
</ul>
<p>While those features got their own blog posts, there are several more features in this release that we thought were worth calling out. This post will highlight improvements to our support for parallel processing, the introduction of support for percentile confidence intervals for performance metrics, and a few other bits and bobs. You can see a full list of changes in the <a href="https://github.com/tidymodels/tune/releases/tag/v1.2.0" target="_blank" rel="noopener">release notes</a>
.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span></code></pre>
</div>
<p>Throughout this post, I&rsquo;ll refer to the example of tuning an XGBoost model to predict the fuel efficiency of various car models. I hear this is already a well-explored modeling problem, but alas:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2024</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>xgb_res</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'>tune_grid</span><span class='o'>(</span></span>
<span>    <span class='nf'>boost_tree</span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span>, mtry <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>, learn_rate <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span>,</span>
<span>    <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span>
<span>    <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span>,</span>
<span>    control <span class='o'>=</span> <span class='nf'>control_grid</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span>
<span>  <span class='o'>)</span></span></code></pre>
</div>
<p>Note that we&rsquo;ve used the <a href="https://tune.tidymodels.org/reference/control_grid.html" target="_blank" rel="noopener">control option</a>
 <code>save_pred = TRUE</code> to indicate that we want to save the predictions from our resampled models in the tuning results. Both <code>int_pctl()</code> and <code>compute_metrics()</code> below will need those predictions. The metrics for our resampled model look like so:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>collect_metrics</span><span class='o'>(</span><span class='nv'>xgb_res</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 8</span></span></span>
<span><span class='c'>#&gt;    mtry learn_rate .metric .estimator   mean     n std_err .config              </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>       <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>     2    0.002<span style='text-decoration: underline;'>04</span> rmse    standard   19.7      25  0.262  Preprocessor1_Model01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>     2    0.002<span style='text-decoration: underline;'>04</span> rsq     standard    0.659    25  0.031<span style='text-decoration: underline;'>4</span> Preprocessor1_Model01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>     6    0.008<span style='text-decoration: underline;'>59</span> rmse    standard   18.0      25  0.260  Preprocessor1_Model02</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>     6    0.008<span style='text-decoration: underline;'>59</span> rsq     standard    0.607    25  0.027<span style='text-decoration: underline;'>0</span> Preprocessor1_Model02</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>     3    0.027<span style='text-decoration: underline;'>6</span>  rmse    standard   14.0      25  0.267  Preprocessor1_Model03</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>6</span>     3    0.027<span style='text-decoration: underline;'>6</span>  rsq     standard    0.710    25  0.023<span style='text-decoration: underline;'>7</span> Preprocessor1_Model03</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 14 more rows</span></span></span>
<span></span></code></pre>
</div>
<h2 id="modernized-support-for-parallel-processing">Modernized support for parallel processing
</h2>
<p>The tidymodels framework has long supported evaluating models in parallel using the <a href="https://cran.r-project.org/web/packages/foreach/vignettes/foreach.html" target="_blank" rel="noopener">foreach</a>
 package. This release of tune has introduced support for parallelism using the <a href="https://www.futureverse.org/" target="_blank" rel="noopener">futureverse</a>
 framework, and we will begin deprecating our support for foreach in a coming release.</p>
<p>To tune a model in parallel with foreach, a user would load a <em>parallel backend</em> package (usually with a name like <a href="https://rdrr.io/r/base/library.html" target="_blank" rel="noopener"><code>library(doBackend)</code></a>
) and then <em>register</em> it with foreach (with a function call like <code>registerDoBackend()</code>). The tune package would then detect that registered backend and take it from there. For example, the code to distribute the above tuning process across 10 cores with foreach would look like:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'>doMC</span><span class='o'>)</span></span>
<span><span class='nf'><a href='https://rdrr.io/pkg/doMC/man/registerDoMC.html'>registerDoMC</a></span><span class='o'>(</span>cores <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2024</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>xgb_res</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'>tune_grid</span><span class='o'>(</span></span>
<span>    <span class='nf'>boost_tree</span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span>, mtry <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>, learn_rate <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span>,</span>
<span>    <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span>
<span>    <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span>,</span>
<span>    control <span class='o'>=</span> <span class='nf'>control_grid</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span>
<span>  <span class='o'>)</span></span></code></pre>
</div>
<p>The code to do so with future is similarly simple. Users first load the <a href="https://future.futureverse.org/index.html" target="_blank" rel="noopener">future</a>
 package, and then specify a <a href="https://future.futureverse.org/reference/plan.html" target="_blank" rel="noopener"><code>plan()</code></a>
 which dictates how computations will be distributed. For example, the code to distribute the above tuning process across 10 cores with future looks like:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://future.futureverse.org'>future</a></span><span class='o'>)</span></span>
<span><span class='nf'><a href='https://future.futureverse.org/reference/plan.html'>plan</a></span><span class='o'>(</span><span class='nv'>multisession</span>, workers <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2024</span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>xgb_res</span> <span class='o'>&lt;-</span> </span>
<span>  <span class='nf'>tune_grid</span><span class='o'>(</span></span>
<span>    <span class='nf'>boost_tree</span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span>, mtry <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>, learn_rate <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span>,</span>
<span>    <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span>
<span>    <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span>,</span>
<span>    control <span class='o'>=</span> <span class='nf'>control_grid</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span>
<span>  <span class='o'>)</span></span></code></pre>
</div>
<p>For users, the transition to parallelism with future has several benefits:</p>
<ul>
<li>The futureverse presently supports a greater number of parallelism technologies and has been more likely to receive implementations for new ones.</li>
<li>Once foreach is fully deprecated, users will be able to use the <a href="https://www.tidyverse.org/blog/2023/04/tuning-delights/#interactive-issue-logging" target="_blank" rel="noopener">interactive logger</a>
 when tuning in parallel.</li>
</ul>
<p>From our perspective, transitioning our parallelism support to future makes our packages much more maintainable, reducing complexity in random number generation, error handling, and progress reporting.</p>
<p>In an upcoming release of the package, you&rsquo;ll see a deprecation warning when a foreach parallel backend is registered but no future plan has been specified, so start transitioning your code sooner rather than later!</p>
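<p>As a minimal migration sketch, assuming your script previously registered a foreach backend, you could reset foreach to its sequential backend with <code>foreach::registerDoSEQ()</code> and declare a future plan in its place. The two workers below are an arbitrary example; choose a count that suits your machine.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(future)

# Reset foreach to its sequential backend so that it no longer
# distributes work itself
foreach::registerDoSEQ()

# Declare how computations should be distributed: here, two
# background R sessions
plan(multisession, workers = 2)
nbrOfWorkers()

# ... call tune_grid() and friends as usual ...

# Return to sequential processing when you're done
plan(sequential)</code></pre>
</div>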
<h2 id="percentile-confidence-intervals">Percentile confidence intervals
</h2>
<p>Following up on changes in the <a href="https://github.com/tidymodels/rsample/releases/tag/v1.2.0" target="_blank" rel="noopener">most recent rsample release</a>
, tune has introduced a <a href="https://tune.tidymodels.org/reference/int_pctl.tune_results.html" target="_blank" rel="noopener">method for <code>int_pctl()</code></a>
 that calculates percentile confidence intervals for performance metrics. To calculate a 90% confidence interval for the values of each performance metric returned in <code>collect_metrics()</code>, we&rsquo;d write:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2024</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'>int_pctl</span><span class='o'>(</span><span class='nv'>xgb_res</span>, alpha <span class='o'>=</span> <span class='m'>.1</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 8</span></span></span>
<span><span class='c'>#&gt;   .metric .estimator .lower .estimate .upper .config             mtry learn_rate</span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>       <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>     <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span>  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>              <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> rmse    bootstrap  18.1      19.9   22.0   Preprocessor1_Mod…     2    0.002<span style='text-decoration: underline;'>04</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span> rsq     bootstrap   0.570     0.679  0.778 Preprocessor1_Mod…     2    0.002<span style='text-decoration: underline;'>04</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span> rmse    bootstrap  16.6      18.3   19.9   Preprocessor1_Mod…     6    0.008<span style='text-decoration: underline;'>59</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span> rsq     bootstrap   0.548     0.665  0.765 Preprocessor1_Mod…     6    0.008<span style='text-decoration: underline;'>59</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span> rmse    bootstrap  12.5      14.1   15.9   Preprocessor1_Mod…     3    0.027<span style='text-decoration: underline;'>6</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>6</span> rsq     bootstrap   0.622     0.720  0.818 Preprocessor1_Mod…     3    0.027<span style='text-decoration: underline;'>6</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 14 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>Note that the output has the same number of rows as the <code>collect_metrics()</code> output: one for each unique pair of metric and workflow.</p>
<p>This is especially helpful for validation sets. Other resampling methods produce replicated performance statistics, so simple interval estimates can be computed from their mean and standard error. A validation set produces only a single estimate per metric, so bootstrapping its predictions is probably the best option for obtaining an interval estimate.</p>
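<p>To make the idea concrete, here&rsquo;s a minimal base R sketch of a 90% percentile bootstrap interval for RMSE computed from a single set of held-out predictions, as you&rsquo;d have with a validation set. It is not tune&rsquo;s implementation: the predictions are simulated stand-ins, and the helpers <code>rmse_sketch()</code> and <code>pctl_interval()</code> are hypothetical. In this sketch, <code>.estimate</code> is the mean of the bootstrap replicates.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>set.seed(2024)

# Simulated stand-ins for held-out outcomes and model predictions
truth &lt;- mtcars$mpg
estimate &lt;- truth + rnorm(length(truth), sd = 3)

# Hypothetical helper: root mean squared error
rmse_sketch &lt;- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Hypothetical helper: resample rows with replacement, recompute the
# metric each time, and take the alpha / 2 and 1 - alpha / 2 quantiles
pctl_interval &lt;- function(truth, estimate, metric, times = 1000, alpha = 0.1) {
  n &lt;- length(truth)
  stats &lt;- replicate(times, {
    idx &lt;- sample.int(n, n, replace = TRUE)
    metric(truth[idx], estimate[idx])
  })
  bounds &lt;- unname(quantile(stats, probs = c(alpha / 2, 1 - alpha / 2)))
  c(.lower = bounds[1], .estimate = mean(stats), .upper = bounds[2])
}

pctl_interval(truth, estimate, rmse_sketch, alpha = 0.1)</code></pre>
</div>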
<h2 id="breaking-change-relocation-of-ellipses">Breaking change: relocation of ellipses
</h2>
<p>We&rsquo;ve made a <strong>breaking change</strong> in argument order for several functions in the package (and in downstream packages like finetune and workflowsets). Ellipses (&hellip;) are now used consistently in the package to require optional arguments to be named. In functions that previously had unused ellipses at the end of their signatures, the ellipses have moved to follow the last argument without a default value, and several other functions that previously lacked ellipses have gained them. This applies to methods for <code>augment()</code>, <code>collect_predictions()</code>, <code>collect_metrics()</code>, <code>select_best()</code>, <code>show_best()</code>, and <code>conf_mat_resampled()</code>.</p>
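<p>The pattern itself can be sketched with a toy function. <code>show_best_sketch()</code> below is hypothetical and only mimics the shape of the new signatures: because the optional arguments come after <code>...</code>, they can only be supplied by name, and <code>rlang::check_dots_empty()</code> raises an error if anything lands in the dots. Check each function&rsquo;s reference page for its exact signature.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical function with optional arguments placed after the dots
show_best_sketch &lt;- function(x, ..., metric = NULL, n = 5) {
  # Error if anything was passed positionally into `...`
  rlang::check_dots_empty()
  if (!is.null(metric)) {
    x &lt;- x[x$.metric == metric, , drop = FALSE]
  }
  # Smallest values first, as for an error metric like RMSE
  head(x[order(x$mean), , drop = FALSE], n)
}

metrics &lt;- data.frame(
  .metric = c("rmse", "rmse", "rsq"),
  mean = c(19.7, 14.0, 0.659)
)

# Named optional arguments work as expected
show_best_sketch(metrics, metric = "rmse", n = 1)

# A positional argument is caught by the dots and raises an error
try(show_best_sketch(metrics, "rmse"))</code></pre>
</div>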
<h2 id="compute-new-metrics-without-re-fitting">Compute new metrics without re-fitting
</h2>
<p>We&rsquo;ve also added a new function, <a href="https://tune.tidymodels.org/reference/compute_metrics.html" target="_blank" rel="noopener"><code>compute_metrics()</code></a>
, that allows for calculating metrics that were not used when evaluating against resamples. For example, consider our <code>xgb_res</code> object. Since we didn&rsquo;t supply any metrics to evaluate, and this model is a regression model, tidymodels selected RMSE and R<sup>2</sup> as defaults:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>collect_metrics</span><span class='o'>(</span><span class='nv'>xgb_res</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 8</span></span></span>
<span><span class='c'>#&gt;    mtry learn_rate .metric .estimator   mean     n std_err .config              </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>       <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>     2    0.002<span style='text-decoration: underline;'>04</span> rmse    standard   19.7      25  0.262  Preprocessor1_Model01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>     2    0.002<span style='text-decoration: underline;'>04</span> rsq     standard    0.659    25  0.031<span style='text-decoration: underline;'>4</span> Preprocessor1_Model01</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>     6    0.008<span style='text-decoration: underline;'>59</span> rmse    standard   18.0      25  0.260  Preprocessor1_Model02</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>     6    0.008<span style='text-decoration: underline;'>59</span> rsq     standard    0.607    25  0.027<span style='text-decoration: underline;'>0</span> Preprocessor1_Model02</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>     3    0.027<span style='text-decoration: underline;'>6</span>  rmse    standard   14.0      25  0.267  Preprocessor1_Model03</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>6</span>     3    0.027<span style='text-decoration: underline;'>6</span>  rsq     standard    0.710    25  0.023<span style='text-decoration: underline;'>7</span> Preprocessor1_Model03</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 14 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>In the past, if you wanted to evaluate that workflow against a performance metric that you hadn&rsquo;t included in your <code>tune_grid()</code> run, you&rsquo;d need to re-run <code>tune_grid()</code>, fitting models and predicting new values all over again. Now, using the <code>compute_metrics()</code> function, you can use the <code>tune_grid()</code> output you&rsquo;ve already generated and compute any number of new metrics without fitting any more models, as long as you used the control option <code>save_pred = TRUE</code> when tuning.</p>
<p>So, say I want to additionally calculate Huber loss and mean absolute percent error. I just pass those metrics along with the tuning result to <code>compute_metrics()</code>, and the result looks just like <code>collect_metrics()</code> output for the metrics originally calculated:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>compute_metrics</span><span class='o'>(</span><span class='nv'>xgb_res</span>, <span class='nf'>metric_set</span><span class='o'>(</span><span class='nv'>huber_loss</span>, <span class='nv'>mape</span><span class='o'>)</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 8</span></span></span>
<span><span class='c'>#&gt;    mtry learn_rate .metric    .estimator  mean     n std_err .config            </span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>              </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>     2    0.002<span style='text-decoration: underline;'>04</span> huber_loss standard    18.3    25  0.232  Preprocessor1_Mode…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>     2    0.002<span style='text-decoration: underline;'>04</span> mape       standard    94.4    25  0.068<span style='text-decoration: underline;'>5</span> Preprocessor1_Mode…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>     6    0.008<span style='text-decoration: underline;'>59</span> huber_loss standard    16.7    25  0.229  Preprocessor1_Mode…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>     6    0.008<span style='text-decoration: underline;'>59</span> mape       standard    85.7    25  0.178  Preprocessor1_Mode…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>     3    0.027<span style='text-decoration: underline;'>6</span>  huber_loss standard    12.6    25  0.230  Preprocessor1_Mode…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>6</span>     3    0.027<span style='text-decoration: underline;'>6</span>  mape       standard    64.4    25  0.435  Preprocessor1_Mode…</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 14 more rows</span></span></span>
<span></span></code></pre>
</div>
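<p>Conceptually, this only requires the saved predictions: compute the new metric within each resample and candidate, then summarize across resamples. Here&rsquo;s a minimal base R sketch of that idea for mean absolute error and a single candidate, using simulated predictions and a hypothetical helper, <code>mae_sketch()</code>. It isn&rsquo;t how tune implements <code>compute_metrics()</code>, but it yields a summary shaped like the <code>mean</code>, <code>n</code>, and <code>std_err</code> columns above, with the standard error taken as the standard deviation across resamples divided by the square root of their number.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>set.seed(2024)

# Simulated saved predictions: three resamples of one candidate model
preds &lt;- data.frame(
  id = rep(c("Bootstrap01", "Bootstrap02", "Bootstrap03"), each = 10),
  mpg = rep(mtcars$mpg[1:10], times = 3)
)
preds$.pred &lt;- preds$mpg + rnorm(nrow(preds), sd = 2)

# Hypothetical helper: mean absolute error
mae_sketch &lt;- function(truth, estimate) mean(abs(truth - estimate))

# Compute the metric separately within each resample...
per_resample &lt;- vapply(
  split(preds, preds$id),
  function(d) mae_sketch(d$mpg, d$.pred),
  numeric(1)
)

# ...then summarize across resamples, without refitting anything
data.frame(
  .metric = "mae",
  mean = mean(per_resample),
  n = length(per_resample),
  std_err = sd(per_resample) / sqrt(length(per_resample))
)</code></pre>
</div>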
<h2 id="easily-pivot-resampled-metrics">Easily pivot resampled metrics
</h2>
<p>Finally, the <code>collect_metrics()</code> method for tune results recently <a href="https://tune.tidymodels.org/reference/collect_predictions.html#arguments" target="_blank" rel="noopener">gained a new argument</a>
, <code>type</code>, indicating the shape of the returned metrics. The default, <code>type = &quot;long&quot;</code>, is the same shape as before. The argument value <code>type = &quot;wide&quot;</code> will allot each metric its own column, making it easier to compare metrics across different models.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>collect_metrics</span><span class='o'>(</span><span class='nv'>xgb_res</span>, type <span class='o'>=</span> <span class='s'>"wide"</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 10 × 5</span></span></span>
<span><span class='c'>#&gt;    mtry learn_rate .config                rmse   rsq</span></span>
<span><span class='c'>#&gt;   <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span>      <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span>                 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span>     2    0.002<span style='text-decoration: underline;'>04</span> Preprocessor1_Model01  19.7 0.659</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span>     6    0.008<span style='text-decoration: underline;'>59</span> Preprocessor1_Model02  18.0 0.607</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span>     3    0.027<span style='text-decoration: underline;'>6</span>  Preprocessor1_Model03  14.0 0.710</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>4</span>     2    0.037<span style='text-decoration: underline;'>1</span>  Preprocessor1_Model04  12.3 0.728</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span>     5    0.005<span style='text-decoration: underline;'>39</span> Preprocessor1_Model05  18.8 0.595</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>6</span>     9    0.011<span style='text-decoration: underline;'>0</span>  Preprocessor1_Model06  17.4 0.577</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 4 more rows</span></span></span>
<span></span></code></pre>
</div>
<p>Under the hood, this is just a <code>pivot_wider()</code> call. Still, we&rsquo;ve found that it&rsquo;s time-consuming and error-prone to programmatically determine the identifying columns when pivoting resampled metrics, so with this feature we&rsquo;ve localized that code in one place and tested it thoroughly.</p>
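<p>For illustration, here&rsquo;s roughly what that pivot looks like on a hand-typed excerpt of the long output above. This is a sketch, not tune&rsquo;s internal code. The key decision is naming the identifying columns (the tuning parameters and <code>.config</code>) explicitly via <code>id_cols</code>; otherwise columns like <code>std_err</code>, which differ between metrics, would keep each candidate&rsquo;s rows apart.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(tidyr)

# Excerpt of collect_metrics(xgb_res) output in its default long shape
long &lt;- data.frame(
  mtry = c(2L, 2L, 6L, 6L),
  learn_rate = c(0.00204, 0.00204, 0.00859, 0.00859),
  .metric = c("rmse", "rsq", "rmse", "rsq"),
  .estimator = "standard",
  mean = c(19.7, 0.659, 18.0, 0.607),
  n = 25L,
  std_err = c(0.262, 0.0314, 0.260, 0.0270),
  .config = rep(c("Preprocessor1_Model01", "Preprocessor1_Model02"), each = 2)
)

# One row per candidate, one column per metric
pivot_wider(
  long,
  id_cols = c(mtry, learn_rate, .config),
  names_from = .metric,
  values_from = mean
)</code></pre>
</div>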
<h2 id="more-love-for-the-brier-score">More love for the Brier score
</h2>
<p>Tuning and resampling functions use default metrics when the user does not specify a custom metric set. For regression models, these are RMSE and R<sup>2</sup>. For classification, accuracy and the area under the ROC curve <em>were</em> the defaults. We&rsquo;ve now also added the <a href="https://en.wikipedia.org/wiki/Brier_score" target="_blank" rel="noopener">Brier score</a>
 to the default classification metric list.</p>
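<p>For a binary outcome, the Brier score is the mean squared difference between the predicted probability of the event and a 0/1 indicator of whether the event occurred, so lower is better and 0 is perfect. The base R sketch below computes that textbook binary form with a hypothetical helper, <code>brier_binary_sketch()</code>; within tidymodels, the metric is yardstick&rsquo;s <code>brier_class()</code>, whose documentation describes how it handles more than two classes.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical helper: binary Brier score
brier_binary_sketch &lt;- function(event, prob) {
  # event: logical, TRUE when the event occurred
  # prob:  predicted probability of the event
  mean((prob - as.numeric(event))^2)
}

event &lt;- c(TRUE, FALSE, FALSE)
prob &lt;- c(0.9, 0.2, 0.6)

# (0.1^2 + 0.2^2 + 0.6^2) / 3 = 0.41 / 3, about 0.137
brier_binary_sketch(event, prob)</code></pre>
</div>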
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>As always, we&rsquo;re appreciative of the community contributors who helped make this release happen: <a href="https://github.com/AlbertoImg" target="_blank" rel="noopener">@AlbertoImg</a>
, <a href="https://github.com/dramanica" target="_blank" rel="noopener">@dramanica</a>
, <a href="https://github.com/epiheather" target="_blank" rel="noopener">@epiheather</a>
, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>
, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>
, <a href="https://github.com/jxu" target="_blank" rel="noopener">@jxu</a>
, <a href="https://github.com/kbodwin" target="_blank" rel="noopener">@kbodwin</a>
, <a href="https://github.com/kenraywilliams" target="_blank" rel="noopener">@kenraywilliams</a>
, <a href="https://github.com/KJT-Habitat" target="_blank" rel="noopener">@KJT-Habitat</a>
, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>
, <a href="https://github.com/marcozanotti" target="_blank" rel="noopener">@marcozanotti</a>
, <a href="https://github.com/MasterLuke84" target="_blank" rel="noopener">@MasterLuke84</a>
, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>
, <a href="https://github.com/PathosEthosLogos" target="_blank" rel="noopener">@PathosEthosLogos</a>
, and <a href="https://github.com/Peter4801" target="_blank" rel="noopener">@Peter4801</a>
.</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2024/tune-1-2-0/thumbnail-wd.jpg" length="119547" type="image/jpeg" />
    </item>
    <item>
      <title>Q4 2023 tidymodels digest</title>
      <link>https://posit-open-source.netlify.app/blog/tidyverse/2024/tidymodels-2023-q4/</link>
      <pubDate>Tue, 09 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://posit-open-source.netlify.app/blog/tidyverse/2024/tidymodels-2023-q4/</guid>
      <dc:creator>Emil Hvitfeldt</dc:creator><description><![CDATA[
<p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a>
 framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p>
<p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a>
 here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a>
 to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like this post from the past couple of months:</p>
<ul>
<li><a href="https://www.tidyverse.org/blog/2023/11/tidymodels-errors-q4/" target="_blank" rel="noopener">Three ways errors are about to get better in tidymodels</a>
</li>
</ul>
<p>Since <a href="https://www.tidyverse.org/blog/2022/12/tidymodels-2022-q4/" target="_blank" rel="noopener">our last roundup post</a>
, there have been CRAN releases of 7 tidymodels packages. Here are links to their NEWS files:</p>
<div class="highlight">
<ul>
<li>embed <a href="https://embed.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.3)</a>
</li>
<li>modeldb <a href="https://modeldb.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.0)</a>
</li>
<li>recipes <a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.9)</a>
</li>
<li>spatialsample <a href="https://spatialsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.5.1)</a>
</li>
<li>stacks <a href="https://stacks.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.3)</a>
</li>
<li>textrecipes <a href="https://textrecipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.6)</a>
</li>
<li>tidyposterior <a href="https://tidyposterior.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a>
</li>
</ul>
</div>
<p>We&rsquo;ll highlight two especially notable changes in recipes below: updated warnings when normalizing and better error messages.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='s'>"ames"</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span></span></code></pre>
</div>
<h2 id="updated-warnings-when-normalizing">Updated warnings when normalizing
</h2>
<p>The latest release of recipes overhauls its warnings and error messages to use the <a href="https://cli.r-lib.org/" target="_blank" rel="noopener">cli</a>
 package. This is the start of a larger effort to give you more information when things go wrong.</p>
<p>The first issue we now signal is an attempt to normalize data containing values such as <code>NA</code> or <code>Inf</code>. These can sneak in for several reasons, and before this release they passed through silently. Below, we create a recipe using the <code>ames</code> data set and, before normalizing, take the logarithm of every variable that pertains to square footage.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_log</span><span class='o'>(</span><span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; Warning: Columns `BsmtFin_SF_1`, `BsmtFin_SF_2`, `Bsmt_Unf_SF`, `Total_Bsmt_SF`,</span></span>
<span><span class='c'>#&gt; `Second_Flr_SF`, `Wood_Deck_SF`, and `Open_Porch_SF` returned NaN, because</span></span>
<span><span class='c'>#&gt; variance cannot be calculated and scaling cannot be used. Consider avoiding</span></span>
<span><span class='c'>#&gt; `Inf` or `-Inf` values and/or setting `na_rm = TRUE` before normalizing.</span></span>
<span></span></code></pre>
</div>
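<p>Why does the warning mention <code>NaN</code>? Once a column contains even one <code>-Inf</code>, its mean is <code>-Inf</code>, the deviation <code>(-Inf) - (-Inf)</code> inside the variance calculation is <code>NaN</code>, and so the standard deviation used for scaling is <code>NaN</code> too. Here is a minimal base R sketch of that check, not the code recipes actually runs. The helper <code>nan_sd_columns()</code> is hypothetical, and the values are the first five logged entries of two columns, taken from the <code>bake()</code> output further down.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical helper (not part of recipes): names of columns whose standard
# deviation is NaN, i.e. columns that cannot be scaled
nan_sd_columns = function(data) {
  sds = vapply(data, sd, numeric(1))
  names(sds)[is.nan(sds)]
}

# First five logged values of two columns, from the bake() output below
logged = data.frame(
  BsmtFin_SF_2 = c(-Inf, 4.969813, -Inf, -Inf, -Inf),
  First_Flr_SF = c(7.412160, 6.797940, 7.192182, 7.654443, 6.833032)
)

# The mean of a column containing -Inf is -Inf, which poisons its variance
vapply(logged, mean, numeric(1))

# Mimic the new warning for every column that cannot be scaled
bad = nan_sd_columns(logged)
if (length(bad) > 0) {
  warning("Columns ", paste0("`", bad, "`", collapse = ", "),
          " returned NaN, because variance cannot be calculated.")
}</code></pre>
</div>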
<p>We now get a warning that names the affected columns and points to <code>Inf</code> or <code>-Inf</code> values as a likely cause. With that hint, we can go back and investigate. If we drop <code>step_normalize()</code> and <code>bake()</code> the recipe, we see that a number of <code>-Inf</code> values appear.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_log</span><span class='o'>(</span><span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>bake</span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='kc'>NULL</span>, <span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>glimpse</span><span class='o'>(</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; Rows: 2,930</span></span>
<span><span class='c'>#&gt; Columns: 8</span></span>
<span><span class='c'>#&gt; $ BsmtFin_SF_1  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.6931472, 1.7917595, 0.0000000, 0.0000000, 1.0986123, 1…</span></span>
<span><span class='c'>#&gt; $ BsmtFin_SF_2  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> -Inf, 4.969813, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf…</span></span>
<span><span class='c'>#&gt; $ Bsmt_Unf_SF   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 6.089045, 5.598422, 6.006353, 6.951772, 4.919981, 5.7807…</span></span>
<span><span class='c'>#&gt; $ Total_Bsmt_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 6.984716, 6.782192, 7.192182, 7.654443, 6.833032, 6.8308…</span></span>
<span><span class='c'>#&gt; $ First_Flr_SF  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 7.412160, 6.797940, 7.192182, 7.654443, 6.833032, 6.8308…</span></span>
<span><span class='c'>#&gt; $ Second_Flr_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> -Inf, -Inf, -Inf, -Inf, 6.552508, 6.519147, -Inf, -Inf, …</span></span>
<span><span class='c'>#&gt; $ Wood_Deck_SF  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 5.347108, 4.941642, 5.973810, -Inf, 5.356586, 5.886104, …</span></span>
<span><span class='c'>#&gt; $ Open_Porch_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 4.127134, -Inf, 3.583519, -Inf, 3.526361, 3.583519, -Inf…</span></span>
<span></span></code></pre>
</div>
<p>Looking at the raw data set, we notice that every <code>-Inf</code> appears where the original value is <code>0</code>, which makes sense: the logarithm of zero is undefined, and R&rsquo;s <code>log(0)</code> returns <code>-Inf</code>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>ames</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>select</span><span class='o'>(</span><span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>glimpse</span><span class='o'>(</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; Rows: 2,930</span></span>
<span><span class='c'>#&gt; Columns: 8</span></span>
<span><span class='c'>#&gt; $ BsmtFin_SF_1  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 2, 6, 1, 1, 3, 3, 3, 1, 3, 7, 7, 1, 7, 3, 3, 1, 3, 3, 4,…</span></span>
<span><span class='c'>#&gt; $ BsmtFin_SF_2  <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0, 144, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1120, 0, 0, …</span></span>
<span><span class='c'>#&gt; $ Bsmt_Unf_SF   <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 441, 270, 406, 1045, 137, 324, 722, 1017, 415, 994, 763,…</span></span>
<span><span class='c'>#&gt; $ Total_Bsmt_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1080, 882, 1329, 2110, 928, 926, 1338, 1280, 1595, 994, …</span></span>
<span><span class='c'>#&gt; $ First_Flr_SF  <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 1656, 896, 1329, 2110, 928, 926, 1338, 1280, 1616, 1028,…</span></span>
<span><span class='c'>#&gt; $ Second_Flr_SF <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 0, 0, 0, 0, 701, 678, 0, 0, 0, 776, 892, 0, 676, 0, 0, 1…</span></span>
<span><span class='c'>#&gt; $ Wood_Deck_SF  <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 210, 140, 393, 0, 212, 360, 0, 0, 237, 140, 157, 483, 0,…</span></span>
<span><span class='c'>#&gt; $ Open_Porch_SF <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 62, 0, 36, 0, 34, 36, 0, 82, 152, 60, 84, 21, 75, 0, 54,…</span></span>
<span></span></code></pre>
</div>
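<p>The link between zeros and <code>-Inf</code> can also be checked directly. The sketch below uses plain base R rather than recipes, with the first five values of three columns copied from the <code>glimpse()</code> output above. It counts the zeros in each column and confirms that <code>log()</code> turns exactly those entries into <code>-Inf</code>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># First five raw values of three ames columns (from the glimpse() output)
raw = data.frame(
  BsmtFin_SF_2  = c(0, 144, 0, 0, 0),
  First_Flr_SF  = c(1656L, 896L, 1329L, 2110L, 928L),
  Second_Flr_SF = c(0L, 0L, 0L, 0L, 701L)
)

# Number of zeros per column; any column with zeros will pick up -Inf
zero_counts = vapply(raw, function(x) sum(x == 0), integer(1))
zero_counts

# TRUE when log() produces -Inf at exactly the positions holding a zero
neg_inf_matches_zero = vapply(
  raw, function(x) identical(x == 0, log(x) == -Inf), logical(1)
)
neg_inf_matches_zero</code></pre>
</div>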
<p>Knowing that zeros caused the problem, we can set the <code>offset</code> argument of <code>step_log()</code>, which is added to the data before logging, to avoid taking <code>log(0)</code>.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_log</span><span class='o'>(</span><span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span>, offset <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span></code></pre>
</div>
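<p>As a rough illustration of why the offset helps (the arithmetic only, not recipes&rsquo; code), the sketch below adds 0.5 before logging, as <code>step_log(offset = 0.5)</code> does, and then centers and scales the result by its mean and standard deviation, which is what <code>step_normalize()</code> computes. With no <code>-Inf</code> left, both statistics are finite and the normalized column has mean 0 and standard deviation 1.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># First five raw values of BsmtFin_SF_2 (from the glimpse() output)
x = c(0, 144, 0, 0, 0)

# log(x + offset): zeros map to log(0.5) instead of -Inf
logged = log(x + 0.5)

# Center and scale, mirroring what step_normalize() computes
normalized = (logged - mean(logged)) / sd(logged)
normalized</code></pre>
</div>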
<p>These warnings are emitted by <code>step_scale()</code>, <code>step_normalize()</code>, <code>step_center()</code>, and <code>step_range()</code>.</p>
<h2 id="better-error-messages-in-recipes">Better error messages in recipes
</h2>
<p>Another common problem when using recipes is accidentally selecting variables of the wrong type. Previously, this caused the following error:</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>starts_with</span><span class='o'>(</span><span class='s'>"Lot_"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; Error in `step_dummy()`:</span></span>
<span><span class='c'>#&gt; Caused by error in `prep()`:</span></span>
<span><span class='c'>#&gt; ! All columns selected for the step should be string, factor, or ordered.</span></span></code></pre>
</div>
<p>In the newest release, the error message lists the offending variables and their types.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>starts_with</span><span class='o'>(</span><span class='s'>"Lot_"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span>
<span>  <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `step_dummy()`:</span></span></span>
<span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `prep()`:</span></span></span>
<span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> All columns selected for the step should be factor or ordered.</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> 1 double variable found: `Lot_Frontage`</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> 1 integer variable found: `Lot_Area`</span></span>
<span></span></code></pre>
</div>
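<p>The new message amounts to grouping the offending columns by type. Here is a base R sketch of that idea, not the code recipes uses: the hypothetical helper <code>non_factor_report()</code> finds the selected columns that are not factors (ordered factors count as factors) and builds one line per storage type. The small data frame borrows the column names from the error above, but its values are placeholders.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'># Hypothetical helper (not recipes' implementation): report selected columns
# that are not factors, grouped by their storage type
non_factor_report = function(data, cols) {
  is_fct = vapply(data[cols], is.factor, logical(1))  # TRUE for ordered too
  bad = cols[!is_fct]
  found = split(bad, vapply(data[bad], typeof, character(1)))
  vapply(names(found), function(type) {
    vars = found[[type]]
    sprintf("%d %s variable%s found: %s", length(vars), type,
            if (length(vars) == 1) "" else "s",
            paste0("`", vars, "`", collapse = ", "))
  }, character(1), USE.NAMES = FALSE)
}

# Placeholder data using the column names from the error above
lots = data.frame(
  Lot_Frontage = c(141, 80),
  Lot_Area = c(31770L, 11622L),
  Lot_Shape = factor(c("Slightly_Irregular", "Regular"))
)

report = non_factor_report(lots, c("Lot_Frontage", "Lot_Area", "Lot_Shape"))
cat(report, sep = "\n")</code></pre>
</div>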
<h2 id="coming-attractions">Coming Attractions
</h2>
<p>In the next month or so we are planning a cascade of CRAN releases. There is a lot of new functionality coming your way, especially in the tune package.</p>
<p>A number of our packages will (finally) be able to cohesively fit, evaluate, tune, and predict models for event times (a.k.a., <a href="https://en.wikipedia.org/wiki/Survival_analysis" target="_blank" rel="noopener">survival analysis</a>
). If you don&rsquo;t do this type of work, you might not notice the new capabilities. However, if you do, tidymodels will be able to do a lot more for you.</p>
<p>We&rsquo;ve also implemented a number of features related to model fairness. These tools allow tidymodels users to identify when machine learning models behave unfairly towards certain groups of people, and will also be included in the upcoming releases of tidymodels packages in Q1.</p>
<p>We&rsquo;ll highlight a lot of these new capabilities in blog posts here as well as tutorials on <a href="https://www.tidymodels.org/" target="_blank" rel="noopener"><code>tidymodels.org</code></a>
.</p>
<p>So, there&rsquo;s a lot more coming! We are very excited to have these features officially available and to see what people can do with them.</p>
<h2 id="acknowledgements">Acknowledgements
</h2>
<p>We&rsquo;d like to thank those in the community who contributed to tidymodels in the last quarter:</p>
<div class="highlight">
<ul>
<li>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
.</li>
<li>modeldb: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>recipes: <a href="https://github.com/atusy" target="_blank" rel="noopener">@atusy</a>
, <a href="https://github.com/bcadenato" target="_blank" rel="noopener">@bcadenato</a>
, <a href="https://github.com/collinberke" target="_blank" rel="noopener">@collinberke</a>
, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/gfronk" target="_blank" rel="noopener">@gfronk</a>
, <a href="https://github.com/jkennel" target="_blank" rel="noopener">@jkennel</a>
, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>
, <a href="https://github.com/jxu" target="_blank" rel="noopener">@jxu</a>
, <a href="https://github.com/mastoffel" target="_blank" rel="noopener">@mastoffel</a>
, <a href="https://github.com/matthewgson" target="_blank" rel="noopener">@matthewgson</a>
, <a href="https://github.com/millermc38" target="_blank" rel="noopener">@millermc38</a>
, <a href="https://github.com/ray-p144" target="_blank" rel="noopener">@ray-p144</a>
, <a href="https://github.com/sebsfox" target="_blank" rel="noopener">@sebsfox</a>
, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
<li>spatialsample: <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>
.</li>
<li>stacks: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>
and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>
.</li>
<li>textrecipes: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>
, <a href="https://github.com/jd4ds" target="_blank" rel="noopener">@jd4ds</a>
, and <a href="https://github.com/masurp" target="_blank" rel="noopener">@masurp</a>
.</li>
<li>tidyposterior: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>
.</li>
</ul>
</div>
<p>We&rsquo;re grateful for all of the tidymodels community, from observers to users to contributors. Happy modeling!</p>
]]></description>
      <enclosure url="https://posit-open-source.netlify.app/blog/tidyverse/2024/tidymodels-2023-q4/thumbnail-wd.jpg" length="67239" type="image/jpeg" />
    </item>
  </channel>
</rss>
