I’m Edgar Ruiz, a Software Engineer at Posit. I’m a member of the Tidymodels team and the maintainer of sparklyr. My goal is to help data analysts be successful. I love contributing to that goal by building tools, teaching workshops, writing articles, and creating other assets such as websites, cheatsheets, and demos. My family and I currently reside in Biloxi, MS.
Software by Edgar Ruiz
Posts and resources by Edgar Ruiz
The mall package: using LLMs with data frames in R & Python | Edgar Ruiz | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Edgar Ruiz as they walk through how mall works (with ellmer) in R, and then in Python. The mall package lets you use LLMs to process tabular data or vectors. For example, you can feed it a column of reviews and ask mall to use an Anthropic model via ellmer to add a column of summaries or sentiments. Follow along with the code here: https://github.com/LibbyHeeren/mall-package-r
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Edgar Ruiz
- Edgar’s Bluesky: https://bsky.app/profile/theotheredgar.bsky.social
- Edgar’s LinkedIn: https://www.linkedin.com/in/edgararuiz/
- Edgar’s GitHub: https://github.com/edgararuiz
Resources from the hosts and chat:
- Ollama → https://ollama.com/download
- Posit Data Science Lab → https://posit.co/dslab
- mall package → https://mlverse.github.io/mall/
- ellmer package → https://ellmer.tidyverse.org/
- Libby’s Positron theme (Catppuccin) → https://marketplace.visualstudio.com/items?itemName=Catppuccin.catppuccin-vsc
- GitHub repo with Libby and Edgar’s code → https://github.com/LibbyHeeren/mall-package-r
- LLM providers supported by ellmer → https://ellmer.tidyverse.org/index.html#providers
- vitals package → https://vitals.tidyverse.org/
- chatlas package → https://posit-dev.github.io/chatlas/
- polars package → https://pola.rs/
- narwhals package → https://narwhals-dev.github.io/narwhals/
- pandas package → https://pandas.pydata.org/
- LM Studio → https://lmstudio.ai/
- Simon Couch’s blog → https://www.simonpcouch.com/
- Edgar’s dataset: TidyTuesday Animal Crossing Dataset (May 5, 2020) → https://github.com/rfordatascience/tidytuesday
- Libby’s dataset: Kaggle Tweets Dataset → https://www.kaggle.com/datasets/mmmarchetti/tweets-dataset
- Blog from Sara and Simon on evaluating LLMs → https://posit.co/blog/r-llm-evaluation-03/
- Data Science Lab YouTube playlist → https://www.youtube.com/watch?v=LDHGENv1NP4&list=PL9HYL-VRX0oSeWeMEGQt0id7adYQXebhT&index=2
- AWS Bedrock → https://aws.amazon.com/bedrock/
- Anthropic → https://www.anthropic.com/
- Google Gemini → https://gemini.google.com/
- What is rubber duck debugging anyway?? → https://en.wikipedia.org/wiki/Rubber_duck_debugging
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here:
- Website: https://www.posit.co
- The Lab: https://pos.it/dslab
- Hangout: https://pos.it/dsh
- LinkedIn: https://www.linkedin.com/company/posit-software
- Bluesky: https://bsky.app/profile/posit.co
Thanks for learning with us!
Timestamps:
00:00 Introduction to Libby, Isabella, Edgar, and the mall package + ellmer package
07:14 “What’s the difference between using mall for these NLP tasks versus traditional or classical NLP?”
09:37 “Can mall be used with a local LLM?”
17:32 “What kind of laptop specs should I realistically have to make good use of these models?”
22:12 “Are you limited to three output options?”
22:55 “Can mall return the prediction probabilities?”
24:14 “What are a rule of thumb set of specs for a machine so local LLMs are practically feasible?”
24:47 “Would that be in the additional prompt area where you’re defining things?”
25:04 “You could use the vitals package to compare models, right?”
25:24 “Can we use LM Studio instead of Ollama?”
28:35 “How do you iterate and validate the model?”
36:39 “Why use paste if it is all text?”
37:31 “Are these recent tweets (from X) or older ones from actual Twitter?”
40:23 “Is there a playlist for the Data Science Labs on YouTube?”
46:11 “Does that mean that the python version does not work with pandas?”
50:14 “Where is this data set from?”


Using R package structure for data science projects | Kylie Ainslie | Data Science Lab
On this call, Libby Heeren is joined by Kylie Ainslie who walks through how structuring data science projects as R packages provides a consistent framework that integrates documentation for you and facilitates collaboration with others by organizing things really well. Kylie says, “I stumbled on using an R package structure to organize my projects a number of years ago and it has changed how I work in such a positive way that I want to share it with others! In a world where our attention is constantly being pulled in many directions, efficiency is crucial. Structuring projects as R packages is how I work more efficiently.”
Hosting crew from Posit: Libby Heeren, Isabella Velasquez
- Kylie’s Bluesky: @kylieainslie.bsky.social
- Kylie’s LinkedIn: https://www.linkedin.com/in/kylieainslie/
- Kylie’s Website: https://kylieainslie.github.io/
- Kylie’s GitHub: https://github.com/kylieainslie
Resources from the hosts and chat:
- posit::conf(2026) call for talks: https://posit.co/blog/posit-conf-2026-call-for-talks/
- Kylie’s posit::conf(2025) talk: https://www.youtube.com/watch?v=YzIiWg4rySA
- {usethis} package: https://usethis.r-lib.org/
- R Packages (2e) book: https://r-pkgs.org/
- Paquetes de R (R Packages in Spanish): https://davidrsch.github.io/rpkgs-es/
- {box} package: https://github.com/klmr/box
- extdata docs in Writing R Extensions: https://cran.r-project.org/doc/manuals/R-exts.html#Data-in-packages-1
- Tan Ho’s talk on NFL data: https://tanho.ca/talks/rsconf2022-github/
- {rv} package: https://a2-ai.github.io/rv-docs/
- Whether to Import or Depend: https://r-pkgs.org/dependencies-mindset-background.html#sec-dependencies-imports-vs-depends
- {pkgdown} package: https://pkgdown.r-lib.org/
- Edgar Ruiz’s {pkgsite} package: https://github.com/edgararuiz/pkgsite
Attendees shared examples of data packages in the chat! Here they are:
- https://kjhealy.github.io/nycdogs/
- https://kjhealy.github.io/gssr/
- https://github.com/deepshamenghani/richmondway
- https://github.com/kyleGrealis/nascaR.data
- https://github.com/ivelasq/leaidr
Timestamps:
00:00 Introduction
06:17 Reviewing the disorganized project example
10:01 Creating the package structure using create_package
17:50 Organizing external data and scripts in the inst folder
22:55 Adding a README and License
29:06 “What are the advantages to packaging a project?”
33:35 Writing Roxygen2 documentation
36:06 “Do you type return at the end of your functions?”
41:55 Handling dependencies with use_package
43:53 “Can you just use require(dplyr) at the top?”
47:45 Setting up a pkgdown site
50:11 Creating vignettes
52:22 “What is the role of the usethis package?”
54:18 Loading the package with devtools::load_all

April 30th Workflow Demo Live Q&A
Join us for Live Q&A immediately following the Workflow Demo happening on April 30th at 11am ET / 8am PT with Edgar Ruiz @ Posit.
Demo will be here first: https://youtu.be/ab4CIlafsbo?feature=shared
Q&A will start around 11:35 am ET / 8:35 am PT

Easier data and asset sharing across projects and teams with {pins} and Databricks
Led by Edgar Ruiz, Software Engineer at Posit PBC
April 30th at 11 am ET / 8 am PT
Sharing data assets can be challenging for many teams. Some may rely on emailed files to keep analyses up to date, making it difficult to keep current or know what version of the data is used. {pins} improves sharing data and other assets across projects and teams. It enables us to publish, or ‘pin’, to a variety of places, such as Amazon S3, Posit Connect and Dropbox.
Given recent customer feedback, the ability to publish, or ‘pin’, to Databricks Volumes has been added to the R version of {pins}. The same capability is also currently in the works for the Python version of {pins}.
This session on April 30th will showcase the acceleration of predictions by distributing a ‘pinned’ model using pins and Spark in Databricks. We’ll walk through integrating {pins} with Databricks in your team’s projects and cover novel uses of pins inside the Databricks ecosystem.
GitHub repo: https://github.com/edgararuiz/talks/tree/main/end-to-end
Here are a few additional resources that you might find interesting:
- Pins for R: https://pins.rstudio.com/
- Pins for Python: https://rstudio.github.io/pins-python/
- More information on how Posit and Databricks work together: https://posit.co/use-cases/databricks/
- Customer Spotlight: Standardizing a safety model with tidymodels, Posit Team & Databricks at Suffolk Construction: https://youtu.be/yavHEWpgrCQ
- Q&A Recording: https://youtube.com/live/HDTDmEaK5zQ?feature=share

Edgar Ruiz - Easing the pain of connecting to databases
Overview of the current and planned work to make it easier to connect to databases. We will review packages such as odbc and dbplyr, as well as the documentation found on our Solutions site (https://solutions.posit.co/connections/db/databases), which will soon include the best practices we find on how to connect to these databases via Python.
Talk by Edgar Ruiz

Using R with Databricks Connect - posit::conf(2023)
Presented by Edgar Ruiz
Spark Connect and Databricks Connect make it possible to interact with Spark stand-alone clusters remotely. This improves our ability to perform data science at scale. We will share the work in sparklyr, and other products, that will make it easier for R users to take advantage of this new framework.
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Tidy up your models. Session Code: TALK-1084

Edgar Ruiz - GitHub Copilot in RStudio
Presentation slides available at https://colorado.posit.co/rsc/rstudio-copilot/#/TitleSlide
Speaker Bio: Edgar Ruiz is a solutions engineer at Posit with a background in deploying enterprise reporting and business intelligence solutions. He is the author of multiple articles and blog posts sharing analytics insights and server infrastructure for data science. Edgar is the author and administrator of the https://db.rstudio.com web site, and current administrator of the sparklyr web site: https://spark.rstudio.com. He is a co-author of the dbplyr package and the creator of the dbplot, tidypredict, and modeldb packages.
Presented at the 2023 R/Pharma Conference (October 26, 2023)

Edgar Ruiz | Programación con R | RStudio (2019)
Sometimes, when working on an analysis in R, we need to split our data into groups and then run the same operation on each group. For example, our data may contain several countries, and we need to create a model for each country. Another case is running multiple operations on the same data. These cases require us to know how to iterate in R. This webinar focuses on how to use the purrr package to help us solve this type of problem.
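The split-then-repeat pattern described above can be sketched in a few lines. The webinar itself covers purrr in R; the block below is only a language-neutral illustration written in plain Python, and everything in it is an assumption made for the example: the made-up rows, the `fit_line` and `split_by_country` helper names, and the choice of a simple least-squares line as the "model".

```python
from collections import defaultdict

# Made-up illustrative rows: (country, x, y). Not data from the webinar.
rows = [
    ("MX", 1.0, 2.0), ("MX", 2.0, 4.0), ("MX", 3.0, 6.0),
    ("US", 1.0, 1.0), ("US", 2.0, 4.0), ("US", 3.0, 7.0),
]

def fit_line(points):
    """Least-squares fit of y = intercept + slope * x for a list of (x, y)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}

def split_by_country(rows):
    """Split the rows into one list of (x, y) points per country."""
    groups = defaultdict(list)
    for country, x, y in rows:
        groups[country].append((x, y))
    return dict(groups)

# Apply the same operation to every group: one model per country.
models = {
    country: fit_line(points)
    for country, points in split_by_country(rows).items()
}
```

The point of the pattern is that the per-group work lives in one function, so adding a country or swapping the model does not change the iteration code.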
Download materials: https://rstudio.com/resources/webinars/programacio-n-con-r/
About Edgar: Edgar Ruiz is a Solutions Engineer at RStudio. He administers the official sparklyr and R for databases web sites. He is also the author of the R packages dbplot, tidypredict, and modeldb, and a co-author of the dbplyr package.

Edgar Ruiz | Databases using R The latest | RStudio (2019)
Learn about the latest packages and techniques that can help you access and analyze data found inside databases using R. Many of the techniques we will cover are based on our personal and the community’s experiences of implementing concepts introduced last year, such as offloading most of the data wrangling to the database using dplyr, and using the RStudio IDE to preview the database’s layout and data. Also, learn more about the most recent improvements to the RStudio products that are geared to aid developers in using R with databases effectively.
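As a rough illustration of what "offloading the data wrangling to the database" means, the sketch below lets the database do the grouping and averaging so that only the small summary travels back to the client. The talk does this from R with dplyr; this block instead uses Python's built-in `sqlite3` module purely to show the idea, and the `flights` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a made-up table, used only for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (carrier TEXT, dep_delay REAL)")
con.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("AA", 10.0), ("AA", 20.0), ("UA", 5.0), ("UA", 15.0), ("UA", 25.0)],
)

# Offloaded: the database groups and averages; only the summary rows
# (one per carrier) are returned, not the full table.
summary = con.execute(
    "SELECT carrier, COUNT(*) AS n, AVG(dep_delay) AS mean_delay "
    "FROM flights GROUP BY carrier ORDER BY carrier"
).fetchall()

con.close()
```

With a large remote table, the alternative of fetching every row and aggregating locally moves far more data over the network for the same answer.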
VIEW MATERIALS https://github.com/edgararuiz/databases-w-r
About the Author: Edgar Ruiz
Edgar is the author and administrator of the https://db.rstudio.com web site, and current administrator of the sparklyr web site: https://spark.rstudio.com. He is the author of the Data Science in Spark with sparklyr cheatsheet, a co-author of the dbplyr package, and the creator of the dbplot package.


