Julia Silge

Engineering Manager

juliasilge.com

juliasilge

0000-0002-3671-836X

JuliaSilge

I am a data scientist and engineering manager at Posit PBC where I work on tools for data science like Positron, vetiver, and others. My last name is pronounced SILL-GHEE (two syllables, short i, hard g). I love making beautiful charts, the statistical programming language R, Jane Austen, black coffee, and red wine.

In school, I studied physics and astronomy; I worked in academia (teaching and doing research) and ed tech before moving into data science in 2015 and discovering R. I am an author, an international speaker, and a real-world practitioner focusing on data analysis and machine learning. I have written books with my collaborators about text mining, supervised machine learning for text, and modeling with tidy data principles in R.

I live in Salt Lake City, UT, with my husband, three kids, and two cats.

Software by Julia Silge

Events attended by Julia Silge

Posts and resources by Julia Silge

The changing landscape of data science | Kanchana Padmanabhan | Data Science Hangout

To join future Data Science Hangouts, add the event to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Kanchana Padmanabhan, Director of Data and AI at Homebase, to chat about data science team structures, the role of math in understanding LLMs, building effective hackathons, and communicating model insights to stakeholders.

In this Hangout, we explored the importance of understanding the probabilistic nature of LLMs and how that understanding should influence the way data scientists approach their work. We also discussed how to structure a hackathon to encourage learning, centering on the customer problem, and collaboration between technical teams and business stakeholders.

Resources mentioned in the video and Zoom chat:
List of R Conferences for 2025 → https://rworks.dev/posts/r-conferences-2025/
Posit Conference Call for Talks → https://posit.co/blog/speak-at-posit-conf-2025/
Julia Silge’s workflow demo on model cards → https://www.linkedin.com/posts/posit-software_join-us-for-a-live-workflow-demo-on-creating-activity-7287998741557522432-jseQ?utm_source=share&utm_medium=member_desktop
Shiny Assistant Gallery → https://gallery.shinyapps.io/assistant
Data Science Hangout Playlist → https://www.youtube.com/playlist?list=PL9HYL-VRX0oTu3bUoyYknD-vpR7Uq6bsR
Add Posit Team End-to-End Workflows to calendar → https://evt.to/aoimiohuw
Making of a Manager Book → https://www.amazon.com/Making-Manager-What-Everyone-Looks/dp/0735219567

If you didn’t join live, one great discussion you missed from the Zoom chat was about how to gain domain knowledge for a new industry, where attendees shared their experiences and advice. Let us know below if you’d like to hear more about this topic!



posit::conf(2023) Workshop: Deploy and Maintain Models with vetiver

This workshop is for you if you:
• have intermediate R or Python knowledge (this will be a “choose your own adventure” workshop where you can work through the exercises in either R or Python)
• can read data from CSV and other flat files, transform and reshape data, and make a wide variety of graphs
• can fit a model to data with your modeling framework of choice

We expect participants to have exposure to basic modeling and machine learning practice, but NOT expert familiarity with advanced ML or MLOps topics.

Many data scientists understand what goes into training a machine learning or statistical model, but creating a strategy to deploy and maintain that model can be daunting. In this workshop, learn what MLOps (machine learning operations) is, what principles can be used to create a practical MLOps strategy, and what kinds of tasks and components are involved. We’ll use vetiver, a framework for MLOps tasks in Python and R, to version, deploy, and monitor the models you have trained, so that you can maintain them in production reliably and efficiently.


Data Science in People Analytics | Led by Elizabeth Esarove, AT&T

People are the face, heart, and hands of a company. In people analytics, we analyze data to reveal actionable insights that provide evidence for decisions regarding employees, work, and business objectives. This talk will cover the use of data science for people analytics projects such as workforce planning, improving employee engagement, and retaining talent.

Speaker bio: Elizabeth Esarove is a data scientist in People Analytics at AT&T. In her role, Elizabeth is part of a larger team focused on embedding data and analytics into the root of decision-making and transforming insights into actionable solutions that improve employee outcomes and drive business value.

Timestamps: *Q&A timestamps listed further below
3:42 - Start of session
5:14 - What is People Analytics
6:26 - Opportunities for Data Science in People Analytics
7:10 - Using Predictive Models to Reduce Attrition
11:10 - Segmenting Your Population
18:55 - Communicating with Leaders
20:11 - Time Series Forecasting for Workforce Changes
24:41 - Analyzing Employee Survey Comments

Helpful Resources Below: *more follow-up to come with a Q&A blog post in the works

People Analytics Books Mentioned today:
Handbook of Regression Modeling in People Analytics: with examples in R, Python and Julia by Keith McNulty https://lnkd.in/eBFgniFG
Excellence in People Analytics: How to Use Workforce Data to Create Business Value by Jonathan Ferrar and David Green https://a.co/d/bJrMRuW

People analytics books shared in a previous data science hangout:
Predictive HR Analytics: Mastering the HR Metric: https://a.co/d/5Hx05mw
Inclusalytics - How Diversity, Equity and Inclusion Leaders Use Data to Drive Their Work: https://lnkd.in/g48tdrMu

Other links shared by Liz:
Time Series Models: Forecasting: Principles and Practice by Rob Hyndman and George Athanasopoulos https://otexts.com/fpp3/
Text Analytics: Text Mining with R by Julia Silge & David Robinson https://lnkd.in/emawveZd

Additional resources shared:
R Gov Conference: https://lnkd.in/ePfN7jru (David Meza is presenting on the RStudio (Posit) Ecosystem as a Critical Part of NASA Analytics Capabilities)
People analytics for getting to the moon | Data Science Hangout with David Meza, NASA: https://lnkd.in/eDirbgCF
For LATAM and Spanish-speaking people, Sergio Garcia Mora shared the R4HR community, which has developed lots of free access content: https://data-4hr.com/
John Kelly IV shared the Human Resources Science LinkedIn Group: https://lnkd.in/eEMpYAfk
Adrian M. Pérez shared the People Analytics Handbook: https://lnkd.in/ecsWy-dA
Data Science Hangout: pos.it/dsh
All upcoming #Posit community events: pos.it/community-events

Q&A Timestamps: *the following timestamps are approximate.
16:00 - What are the most important people analytics KPIs @ AT&T? Can you share how your team/HR acts on these predictions (for optimal policy) both experimentally and ethically? Do you implement new policy in smaller groups?
23:00 - How have you validated the predictive models? Looking backwards, how precise were they?
25:00 - Do you work with your HRBPs to segment your population?
25:00 - What languages are you using to build your predictive models?
31:00 - Do you include demographic information (gender, race, age) in your models?
31:00 - Are your surveys anonymous?
32:00 - How would you get the ROI from HR attrition modeling?
34:00 - Are most data scientists from a Psychometrics background?
35:00 - Is there a kind of “critical mass” to apply People Analytics? (just for big companies?)
36:00 - Looking at positive / negative comments, do you quote verbatim comments in your reports? (e.g. “here is one of the very positive / very negative comments we received”)
37:00 - Do you use something like Snowflake to store and model your data? And do you deploy these models automatically or manually update them?
38:00 - R user here. How do you balance between people-ops focused analytics tools from outside vendors (often very expensive, but helpful) with custom in-house analytics (often time-consuming)?
41:00 - How much of your work is driven by HR leadership, by HR business leaders, or by the HR analytics team pushing modeling and insights to those groups?
42:00 - What was your journey into learning data science and getting into people analytics?
44:00 - Do you have a role in educating business units, to improve their questions, etc.?
45:00 - What is the HR tech stack at AT&T? Does your team have a data engineer solely for people data since they’re more sensitive?
47:00 - How do you present your results (an application, report, PowerPoint), and how important is it to learn other languages (JavaScript, CSS, SQL)?
If you were to start a people analytics team in a company (+1000), how do you start?
50:00 - Do you use an internal tool for surveys? Do you use thresholds to maintain anonymity?
53:00 - Does AT&T have remote workers? If so, does people analytics segment on remote vs hybrid vs on-site?


MLOps with vetiver in Python and R | Led by Julia Silge & Isabel Zimmerman

Many data scientists understand what goes into training a machine learning model, but creating a strategy to deploy and maintain that model can be daunting. In this meetup, learn what MLOps is, what principles can be used to create a practical MLOps strategy, and what kinds of tasks and components are involved. See how to get started with vetiver, a framework for MLOps tasks in R and Python that provides fluent tooling to version, deploy, and monitor your models.
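As a rough illustration of the three tasks named above (version, deploy, monitor), here is a minimal standard-library Python sketch. It does not use vetiver or reproduce its API; the `ModelRegistry` class, the `predict` and `mean_absolute_error` helpers, the model name, and the data values are all hypothetical, chosen only to show how the tasks relate to each other.

```python
import hashlib
import json
import statistics
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry (an assumption, not vetiver's API)."""
    versions: dict = field(default_factory=dict)

    def write(self, name: str, params: dict) -> str:
        # Version: hash the serialized parameters so identical models
        # always map to the same version id.
        payload = json.dumps(params, sort_keys=True).encode()
        version = hashlib.sha256(payload).hexdigest()[:8]
        self.versions.setdefault(name, {})[version] = params
        return version

    def read(self, name: str, version: str) -> dict:
        # Retrieve one specific stored version of a named model.
        return self.versions[name][version]


def predict(params: dict, x: float) -> float:
    # Deploy: in a real system this function would sit behind an HTTP endpoint.
    return params["intercept"] + params["slope"] * x


def mean_absolute_error(params: dict, data: list) -> float:
    # Monitor: compute an error metric on newly labeled (x, y) pairs.
    return statistics.fmean(abs(predict(params, x) - y) for x, y in data)


registry = ModelRegistry()
v1 = registry.write("price-model", {"intercept": 1.0, "slope": 2.0})
model = registry.read("price-model", v1)

# Made-up "new" observations arriving after deployment.
new_data = [(0.0, 1.0), (1.0, 3.5), (2.0, 4.0)]
mae = mean_absolute_error(model, new_data)
print(f"version={v1} mae={mae:.2f}")
```

Tracking a metric like this over time, per model version, is one simple way to notice when a deployed model starts to degrade.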

Blog Post with Q&A: https://www.rstudio.com/blog/vetiver-answering-your-questions/

For folks interested in seeing what data artifacts look like on Connect, we have these for R:
⬢ Versioned model object: https://colorado.rstudio.com/rsc/seattle-housing-pin/
⬢ Deployed API: https://colorado.rstudio.com/rsc/seattle-housing/
⬢ Monitoring dashboard: https://colorado.rstudio.com/rsc/seattle-housing-dashboard/
⬢ Create a custom yardstick metric: https://juliasilge.com/blog/nyc-airbnb/
⬢ End point used in the demo: https://colorado.rstudio.com/rsc/scooby

Our team’s reading list (mentioned in the meetup)

Books: ⬢ Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/

Articles:
⬢ “Machine Learning Operations (MLOps): Overview, Definition, and Architecture” by Kreuzberger et al: https://arxiv.org/abs/2205.02302
⬢ “From Concept Drift to Model Degradation: An Overview on Performance-Aware Drift Detectors” by Bayram et al: https://arxiv.org/abs/2203.11070
⬢ “Towards Observability for Production Machine Learning Pipelines” by Shankar et al: https://arxiv.org/pdf/2108.13557.pdf
⬢ “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction” by Breck et al: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf

Web content:
⬢ How ML Breaks: A Decade of Outages for One Large ML Pipeline by Papasian and Underwood: https://www.youtube.com/watch?v=hBMHohkRgAA
⬢ MLOps Principles by INNOQ: https://ml-ops.org/content/mlops-principles
⬢ Google’s Practitioners Guide to MLOps by Salama et al: https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf
⬢ Gently Down the Stream by Mitch Seymour: https://www.gentlydownthe.stream/

Speaker bios: Julia Silge is a software engineer at RStudio focusing on open source MLOps tools, as well as an author and international keynote speaker. Julia loves making beautiful charts, Jane Austen, and her two cats.

Isabel Zimmerman is also a software engineer on the open source team at RStudio, where she works on building MLOps frameworks. When she’s not geeking out over new data science techniques, she can be found hanging out with her dog or watching Marvel movies.