Julia Silge
Engineering Manager
I am a data scientist and engineering manager at Posit PBC, where I work on tools for data science like Positron, vetiver, and others. My last name is pronounced SILL-GHEE (two syllables, short i, hard g). I love making beautiful charts, the statistical programming language R, Jane Austen, black coffee, and red wine.
In school, I studied physics and astronomy; I worked in academia (teaching and doing research) and ed tech before moving into data science in 2015 and discovering R. I am an author, an international speaker, and a real-world practitioner focusing on data analysis and machine learning. I have written books with my collaborators about text mining, supervised machine learning for text, and modeling with tidy data principles in R.
I live in Salt Lake City, UT, with my husband, three kids, and two cats.
Software by Julia Silge
Events attended by Julia Silge
Posts and resources by Julia Silge
Julia Silge - Keynote PyCon Colombia 2025
Julia Silge
Engineering manager at Posit PBC, leading development of open-source software for data science in Python and R. Data scientist with expertise in machine learning and text mining. PhD in astrophysics, author of books on data science.
Social Networks:
GitHub: https://github.com/juliasilge
More About PyCon Colombia at http://www.pycon.co

How I got unstuck with Python (Julia Silge, Posit) | posit::conf(2025)
How I got unstuck with Python
Speaker(s): Julia Silge
Abstract:
Python as a language is known for being explicit, simple, readable, and beautiful. At the same time, the tooling around using and writing this language has not always made people feel productive and delighted. I know this has been true for me! In this talk, learn about recent improvements in tooling for Python that have finally addressed my own persistent challenges. Posit’s new IDE, Positron, provides a next generation environment for Python data practice, and this new IDE plays nicely with modern language tooling from the Python community. Whether you are Python curious or looking for ways to improve your Python workflows, hear about how I finally got myself unstuck with the most popular programming language in the world.
Materials: https://github.com/juliasilge/get-unstuck-with-python
Presented at posit::conf(2025). Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

A first look at Positron - Julia Silge
Positron is a next generation data science IDE built by the creators of RStudio. It has been available for beta testing for a number of months, and R users may have wondered if they should try it or if it will be a good fit for them. This new IDE is an extensible tool built to facilitate exploratory data analysis, reproducible authoring, and publishing data artifacts, and it supports R without being built only for R. How should an R user think about Positron, compared to the other options out there?
In this talk, learn about how and why Positron is designed the way it is, what will feel familiar or new coming from other IDEs such as RStudio, and when (or if) people who use R should consider giving it a try. You’ll hear about different choices when it comes to defaults and ways of working, such as how to think about your projects or folders and how to manage multiple versions of R. You will also learn about new functionality for R users and package developers that we have never had before, like new approaches for managing R package tests and the ability to customize an IDE using extensions. If you are curious about Positron and how it fits into the R ecosystem, you’ll come away from this talk with more details about its capabilities and more clarity about whether it may be a good choice for you.
Additional material or paper:
- Visit https://positron.posit.co for documentation and installers
- Find us on GitHub at https://github.com/posit-dev/positron
- Positron is currently available on Posit Workbench in preview

The changing landscape of data science | Kanchana Padmanabhan | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Kanchana Padmanabhan, Director of Data and AI at Homebase, to chat about data science team structures, the role of math in understanding LLMs, building effective hackathons, and communicating model insights to stakeholders.
In this Hangout, we explored the importance of understanding the probabilistic nature of LLMs and how that understanding should influence how data scientists approach their work. We also discussed how to structure a hackathon that encourages learning, centers on the customer problem, and fosters collaboration between technical teams and business stakeholders.
Resources mentioned in the video and Zoom chat:
- List of R Conferences for 2025 → https://rworks.dev/posts/r-conferences-2025/
- Posit Conference Call for Talks → https://posit.co/blog/speak-at-posit-conf-2025/
- Julia Silge’s workflow demo on model cards → https://www.linkedin.com/posts/posit-software_join-us-for-a-live-workflow-demo-on-creating-activity-7287998741557522432-jseQ?utm_source=share&utm_medium=member_desktop
- Shiny Assistant Gallery → https://gallery.shinyapps.io/assistant
- Data Science Hangout Playlist → https://www.youtube.com/playlist?list=PL9HYL-VRX0oTu3bUoyYknD-vpR7Uq6bsR
- Add Posit Team End-to-End Workflows to calendar → https://evt.to/aoimiohuw
- Making of a Manager Book → https://www.amazon.com/Making-Manager-What-Everyone-Looks/dp/0735219567
If you didn’t join live, one great discussion you missed from the Zoom chat was about how to gain domain knowledge for a new industry, where attendees shared their experiences and advice. Let us know below if you’d like to hear more about this topic!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!

Live Q&A following Workflow Demo - January 29th!
This is the live Q&A session for our January 29th Workflow Demo with Julia Silge on model cards with vetiver for transparent, responsible reporting.
Join us first for the demo with Julia Silge on January 29th at 11am ET to learn:
1️⃣ How to get started with your first model card
2️⃣ How a model card fits in with model monitoring
3️⃣ How to use Posit Team to author and publish your model card
The demo will be here starting at 11am ET on January 29th: https://youtu.be/iNtgunGg86o
GitHub Repo: https://github.com/juliasilge/model-card-workflow-demo

Model cards with vetiver for transparent, responsible reporting
Good documentation helps us make sense of software, know when and how to use it, and understand its purpose. The same can be true of documentation or reporting for a deployed model, but it can be hard to know where to start.
The paper “Model Cards for Model Reporting” (Mitchell et al. 2019) provides a suggested framework for organizing and presenting the essential facts about a deployed machine learning model, and the vetiver packages for both R and Python provide templates for getting started with your own model card.
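As a rough illustration of the framework in Mitchell et al. (2019), the sketch below stores the paper's nine suggested sections in a Python dataclass, reports which sections are still empty, and renders the card as Markdown. The `ModelCard` class, its field names, and its methods are hypothetical; this is not the vetiver model card template, only a sketch of how the sections might be organized.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCard:
    """Hypothetical container for the nine sections proposed by Mitchell et al. (2019).

    Illustrative sketch only; not the vetiver model card template.
    """
    model_details: str
    intended_use: str
    factors: str = ""
    metrics: str = ""
    evaluation_data: str = ""
    training_data: str = ""
    quantitative_analyses: str = ""
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty (or whitespace only)."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_markdown(self) -> str:
        """Render each section as a level-2 Markdown heading followed by its text.

        Empty sections get a "_TODO_" placeholder so gaps stay visible.
        """
        parts = []
        for f in fields(self):
            title = f.name.replace("_", " ").capitalize()
            body = getattr(self, f.name).strip() or "_TODO_"
            parts.append(f"## {title}\n\n{body}\n")
        return "\n".join(parts)


# Usage with placeholder text: only two sections are filled in so far.
card = ModelCard(
    model_details="Random forest regressor, version 1 (placeholder text).",
    intended_use="Illustrative example only.",
)
print(card.missing_sections())
print(card.to_markdown())
```

Listing the empty sections makes a partially written card easy to spot; a real card would fill these in with the model's actual evaluation results and limitations.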
Julia Silge joined us on Jan 29th to share:
1️⃣ How to get started with your first model card
2️⃣ How a model card fits in with model monitoring
3️⃣ How to use Posit Team to author and publish your model card
Link to paper: https://lnkd.in/eRbYpfEW
GitHub Repo: https://github.com/juliasilge/model-card-workflow-demo
Q&A Recording: https://youtube.com/live/tQsyImn18q4?feature=share
Want to add future workflow demos to your calendar? We host them on the last Wednesday of every month: https://evt.to/aoimiohuw

Introducing Positron, a new data science IDE - posit conf 2024
Positron is a next-generation data science IDE that is newly available to the community for early beta testing. This new IDE is an extensible tool built to facilitate exploratory data analysis, reproducible authoring, and publishing data artifacts. Positron currently supports these data workflows in either or both Python and R and is designed with a forward-looking architecture that can support other data science languages in the future. In this session, learn from the team building Positron about how and why it is designed the way it is, what will feel familiar or new coming from other IDEs, and whether it might be a good fit for your own work.
Talk by Julia Silge, Isabel Zimmerman, Tom Mock, Jonathan McPherson, Lionel Henry, Davis Vaughan, and Jenny Bryan
Slide deck 1: https://speakerdeck.com/juliasilge/introducing-positron
Slide deck 6: https://speakerdeck.com/jennybc/positron-for-r-and-rstudio-users

So You Think You Can ANALYZE? (Data Content Creator Hackathon)
Watch your favorite data science content creators compete for a piece of history. They will either prove their skills in front of the world or fail in the pursuit of eternal glory. This is so you think you can analyze.
Special thanks to Posit for sponsoring this competition and making this all possible!
Competitors: Team Jack @averysmith Jack Blandin @KeithGalli
Team MMA @AlexTheAnalyst @nerdnourishment @Miki_ML
Team Null Nick Singh Mark Freeman @SeattleDataGuy
Team Shashank @ShashankData
Team Posit-ively Skewed Greg Coquillo @datascienceharp Richard Nieves-Becker Ian Greengross
Interviewer: Elijah Butler - https://www.tiktok.com/@imelijahbutler
Posit Team
- Michael Chow
- Joe Cheng
- Carlos Scheidegger
- Julia Silge
Sponsors, Affiliates, and Partners:
- Pathrise - http://pathrise.com/KenJee | Career mentorship for job applicants (Free till you land a job)
- Taro - http://jointaro.com/r/kenj308 (20% discount) | Career mentorship if you already have a job
- 365 Data Science (57% discount) - https://365datascience.pxf.io/P0jbBY | Learn data science today
- Interview Query (10% discount) - https://www.interviewquery.com/?ref=kenjee | Interview prep questions
MORE DATA SCIENCE CONTENT HERE:
My Twitter - https://twitter.com/KenJee_DS
LinkedIn - https://www.linkedin.com/in/kenjee/
Kaggle - https://www.kaggle.com/kenjee
Medium Articles - https://medium.com/@kenneth.b.jee
GitHub - https://github.com/PlayingNumbers
My Sports Blog - https://www.playingnumbers.com
Check These Videos Out Next!
My Leaderboard Project: https://www.youtube.com/watch?v=myhoWUrSP7o&ab_channel=KenJee
66 Days of Data: https://www.youtube.com/watch?v=qV_AlRwhI3I&ab_channel=KenJee
How I Would Learn Data Science in 2021: https://www.youtube.com/watch?v=41Clrh6nv1s&ab_channel=KenJee
My Playlists
Data Science Beginners: https://www.youtube.com/playlist?list=PL2zq7klxX5ATMsmyRazei7ZXkP1GHt-vs
Project From Scratch: https://www.youtube.com/watch?v=MpF9HENQjDo&list=PL2zq7klxX5ASFejJj80ob9ZAnBHdz5O1t&ab_channel=KenJee
Kaggle Projects: https://www.youtube.com/playlist?list=PL2zq7klxX5AQXzNSLtc_LEKFPh2mAvHIO

Positron: An IDE Specialized For Data Science
Dr. Julia Silge, Engineering Manager at Posit, joins @JonKrohnLearns to introduce Positron, a fresh open-source IDE that’s perfect for exploratory data analysis and visualization. She also lays out her top picks for LLMs that boost coding efficiency and discusses when traditional NLP methods might be the smarter choice over LLMs. Plus, Julia highlights some must-know open-source libraries that make managing MLOps easier than ever. Tune in for insights that every data scientist, ML engineer, and developer will find useful.
Watch the full interview “817: The Positron IDE, Tidy NLP and MLOps — with Dr. Julia Silge” here: https://www.superdatascience.com/817

817: The Positron IDE, Tidy NLP and MLOps — with Dr. @JuliaSilge
#PositronIDE #Tidyverse #MLOps
Dr. Julia Silge, Engineering Manager at Posit, joins @JonKrohnLearns to introduce the brand-new Positron IDE, perfect for exploratory data analysis and visualization. She also lays out her top picks for LLMs that boost coding efficiency and discusses when traditional NLP methods might be the smarter choice over LLMs. Plus, Julia highlights some must-know open-source libraries that make managing MLOps easier than ever. Tune in for insights that every data scientist, ML engineer, and developer will find useful.
This episode is brought to you by Gurobi (https://www.gurobi.com/personas/optimization-for-data-scientists/) , the Decision Intelligence Leader, and by ODSC (https://odsc.com/california) , the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
In this episode you will learn:
• [00:00:00] Introduction
• [00:03:23] Overview of Posit and Positron IDE
• [00:08:33] How the needs of a data scientist differ from those of a software developer
• [00:17:56] How to contribute to the open-source Positron
• [00:34:52] MLOps and Vetiver: Tools for deploying and maintaining ML models
• [00:48:34] Natural Language Processing (NLP) and the Tidyverse approach
• [01:22:18] The role of AI and LLMs in data science education
Additional materials: https://www.superdatascience.com/817

From Physics PhD to MLOps builder - Julia Silge - The Data Scientist Show #087
Julia Silge is an engineering manager at Posit PBC, formerly known as RStudio, where she leads a team of developers building open source software for MLOps. Before Posit, she earned a PhD in astrophysics, worked for several years in the nonprofit space, and was a data scientist at Stack Overflow, where some of her most public work involved the annual Developer Survey. We talked about MLOps tools, challenges in survey data, text analysis, and balancing her interests in data science and engineering. Subscribe to Daliana’s newsletter on www.dalianaliu.com for more on data science and career.
Daliana’s Twitter: https://twitter.com/DalianaLiu Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/ Julia’s LinkedIn: https://www.linkedin.com/in/juliasilge/ Julia’s Website: https://juliasilge.com/
00:00:00 Introduction
00:00:51 Getting into data science
00:04:45 Transition from data science to engineering manager
00:13:59 Common challenges in tool development
00:17:33 Challenges with survey data
00:26:42 Engineering skills for data scientists
00:28:54 Balancing roles
00:34:44 Developing skills in Exploratory Data Analysis (EDA)
00:39:14 Python vs. R for data analysis
00:44:35 Exciting aspects in career and personal life

How to develop and deploy a machine learning model with Posit
While data scientists are often taught about training a machine learning model, building a reliable MLOps strategy to deploy and maintain that model can be daunting.
It doesn’t have to be this way! Join us with Julia Silge at Posit on Wednesday, April 24th at 11 am ET to learn how Posit Team provides fluent tooling for the whole ML lifecycle.
- Develop an ML model using Posit Workbench and a recent Tidy Tuesday dataset!
- Version, deploy, and monitor that model with Posit Connect
- Maintain reproducible software dependencies throughout the ML lifecycle with Posit Package Manager
No registration is required to attend - simply add it to your calendar using this link: pos.it/team-demo
Helpful resources:
- Vetiver: https://vetiver.posit.co/
- Julia’s YouTube Channel: https://www.youtube.com/c/JuliaSilge
- To join us at posit::conf(2024): https://posit.co/conference/
- Follow-along blog post: https://juliasilge.com/blog/educational-attainment/
- Live Q&A Room: https://youtube.com/live/-6lqzW1iV7E?feature=share
- If you’re interested in learning more about Posit Workbench, Posit Connect, or Posit Package Manager: pos.it/chat-with-us
- Subscribe to learn more about Posit events: https://posit.co/about/subscription-management/
We host these end-to-end workflow demos on the last Wednesday of every month. If you have ideas for topics or questions about them, please comment below!

Reliable Maintenance of Machine Learning Models - posit::conf(2023)
Presented by Julia Silge
Maintaining machine learning models in production can be quite different from maintaining general software projects, because of the unique statistical characteristics of ML models.
In this talk, learn about model drift, the different ways the word “performance” is used with models, what you can monitor about a model, how feedback loops impact models, and how you can use vetiver to set yourself up for success with model maintenance. This talk will help practitioners who are already deploying models, but this is also useful knowledge for practitioners earlier in their MLOps journey; decisions made along the way can make the difference between resilient models that are easier to maintain and disappointing or misleading models.
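One thing you can monitor is a model's statistical performance on new labeled data over time, as opposed to, say, how fast the API responds. A minimal, hand-rolled Python sketch of that idea, assuming records of `(date, truth, estimate)`, RMSE as the metric, calendar months as the period, and a 20% tolerance over a baseline; all of these are illustrative choices, and the functions below are hypothetical, not vetiver's own monitoring functions:

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def rmse_by_period(records):
    """Group (date, truth, estimate) records by calendar month and compute RMSE per month.

    Returns a dict mapping "YYYY-MM" strings to RMSE, in chronological order.
    """
    sq_errors = defaultdict(list)
    for day, truth, estimate in records:
        sq_errors[day.strftime("%Y-%m")].append((truth - estimate) ** 2)
    return {
        period: sqrt(sum(errs) / len(errs))
        for period, errs in sorted(sq_errors.items())
    }


def flag_degradation(metrics, baseline, tolerance=0.2):
    """Return periods whose RMSE exceeds the baseline by more than `tolerance` (a fraction)."""
    return [p for p, value in metrics.items() if value > baseline * (1 + tolerance)]


# Made-up example data: predictions get worse in February.
records = [
    (date(2023, 1, 5), 10.0, 11.0),
    (date(2023, 1, 20), 12.0, 11.0),
    (date(2023, 2, 3), 10.0, 13.0),
    (date(2023, 2, 14), 12.0, 9.0),
]
metrics = rmse_by_period(records)
print(metrics)
print(flag_degradation(metrics, baseline=1.0))
```

Computing the metric per period, rather than once over all new data, is what makes a gradual drop in performance visible; the baseline would typically come from the model's evaluation at training time.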
Materials: https://github.com/juliasilge/ml-maintenance-2023
Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Tidy up your models. Session Code: TALK-1083

Julia Silge @ Posit | Data Science Hangout
We were recently joined by Julia Silge, Data Scientist & Software Engineer at Posit, PBC to talk about the skills and tools that data scientists need to deploy their models and scale their impact.
Bio: Julia Silge is a data scientist and software engineer at Posit PBC where she works on open source modeling and MLOps tools. She is an author, an international keynote speaker, and a real-world practitioner focusing on data analysis and machine learning. Julia loves text analysis, making beautiful charts, and communicating about technical topics with diverse audiences.
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co LinkedIn: https://www.linkedin.com/company/posit-software Twitter: https://twitter.com/posit_pbc
To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We’d love to see you!)
Thanks for hanging out with us!

Emil Hvitfeldt - Slidecraft: The Art of Creating Pretty Presentations
Slidecraft: The Art of Creating Pretty Presentations by Emil Hvitfeldt
Visit https://rstats.ai/nyr to learn more.
Abstract: Do you want to make slides that catch the eye of the room? Are you tired of using defaults when making slides? Are you ready to spend every last hour of your life fiddling with CSS and JS? Then this talk is for you! Making slides with Quarto and reveal.js is a breeze and comes with many tools and features. This talk gives an overview of how you can improve the visuals of your slides with the highest effect-to-effort ratio.
Bio: Emil Hvitfeldt is a software engineer at Posit and part of the tidymodels team’s effort to improve R’s modeling capabilities. He maintains several packages within the realms of modeling, text analysis, and color palettes. He co-authored the book Supervised Machine Learning for Text Analysis in R with Julia Silge.
Twitter: https://twitter.com/Emil_Hvitfeldt
Presented at the 2023 New York R Conference (July 13, 2023)

posit::conf(2023) Workshop: Deploy and Maintain Models with vetiver
Register now: http://pos.it/conf
Instructor: Julia Silge
Workshop Duration: 1-Day Workshop
This workshop is for you if you:
• have intermediate R or Python knowledge (this will be a “choose your own adventure” workshop where you can work through the exercises in either R or Python)
• can read data from CSV and other flat files, transform and reshape data, and make a wide variety of graphs
• can fit a model to data with your modeling framework of choice
We expect participants to have exposure to basic modeling and machine learning practice, but NOT expert familiarity with advanced ML or MLOps topics.
Many data scientists understand what goes into training a machine learning or statistical model, but creating a strategy to deploy and maintain that model can be daunting. In this workshop, learn what MLOps (machine learning operations) is, what principles can be used to create a practical MLOps strategy, and what kinds of tasks and components are involved. We’ll use vetiver, a framework for MLOps tasks in Python and R, to version, deploy, and monitor the models you have trained and want to deploy and maintain in production reliably and efficiently.

Data Science in People Analytics | Led by Elizabeth Esarove, AT&T
People are the face, heart, and hands of a company. In people analytics, we analyze data to reveal actionable insights that provide evidence for decisions regarding employees, work, and business objectives. This talk will cover the use of data science for people analytics projects such as workforce planning, improving employee engagement, and retaining talent.
Speaker bio: Elizabeth Esarove is a data scientist in People Analytics at AT&T. In her role, Elizabeth is part of a larger team focused on embedding data and analytics into the root of decision-making and transforming insights into actionable solutions that improve employee outcomes and drive business value.
Timestamps (*Q&A timestamps listed further below):
3:42 - Start of session
5:14 - What is People Analytics
6:26 - Opportunities for Data Science in People Analytics
7:10 - Using Predictive Models to Reduce Attrition
11:10 - Segmenting Your Population
18:55 - Communicating with Leaders
20:11 - Time Series Forecasting for Workforce Changes
24:41 - Analyzing Employee Survey Comments
Helpful Resources Below: *more follow-up to come with a Q&A blog post in the works
People analytics books mentioned today:
- Handbook of Regression Modeling in People Analytics: with examples in R, Python and Julia by Keith McNulty: https://lnkd.in/eBFgniFG
- Excellence in People Analytics: How to Use Workforce Data to Create Business Value by Jonathan Ferrar and David Green: https://a.co/d/bJrMRuW
People analytics books shared in a previous Data Science Hangout:
- Predictive HR Analytics: Mastering the HR Metric: https://a.co/d/5Hx05mw
- Inclusalytics - How Diversity, Equity and Inclusion Leaders Use Data to Drive Their Work: https://lnkd.in/g48tdrMu
Other links shared by Liz:
- Forecasting: Principles and Practice by Rob Hyndman and George Athanasopoulos (time series models): https://otexts.com/fpp3/
- Text Mining with R by Julia Silge & David Robinson (text analytics): https://lnkd.in/emawveZd
Additional resources shared:
- R Gov Conference: https://lnkd.in/ePfN7jru (David Meza is presenting on the RStudio (Posit) Ecosystem as a Critical Part of NASA Analytics Capabilities)
- People analytics for getting to the moon | Data Science Hangout with David Meza, NASA: https://lnkd.in/eDirbgCF
- For LATAM and Spanish-speaking people, Sergio Garcia Mora shared the R4HR community, which has developed lots of free content: https://data-4hr.com/
- John Kelly IV shared the Human Resources Science LinkedIn Group: https://lnkd.in/eEMpYAfk
- Adrian M. Pérez shared the People Analytics Handbook: https://lnkd.in/ecsWy-dA
- Data Science Hangout: pos.it/dsh
- All upcoming #Posit community events: pos.it/community-events
Q&A timestamps (*the following timestamps are approximate):
16:00 - What are the most important people analytics KPIs @ AT&T? Can you share how your team/HR acts on these predictions (for optimal policy) both experimentally and ethically? Do you implement new policy in smaller groups?
23:00 - How have you validated the predictive models? Looking backwards, how precise were they?
25:00 - Do you work with your HRBPs to segment your population?
25:00 - What languages are you using to build your predictive models?
31:00 - Do you include demographic information (gender, race, age) in your models?
31:00 - Are your surveys anonymous?
32:00 - How would you get the ROI from HR attrition modeling?
34:00 - Are most data scientists from a psychometrics background?
35:00 - Is there a kind of “critical mass” to apply people analytics? (Just for big companies?)
36:00 - Looking at positive / negative comments, do you quote verbatim comments in your reports? (e.g. “here is one of the very positive / very negative comments we received”)
37:00 - Do you use something like Snowflake to store and model your data? And do you deploy these models automatically or manually update them?
38:00 - R user here. How do you balance people-ops focused analytics tools from outside vendors (often very expensive, but helpful) with custom in-house analytics (often time-consuming)?
41:00 - How much of your work is driven by HR leadership, by HR business leaders, or by the HR analytics team pushing modeling and insights to those groups?
42:00 - What was your journey into learning data science and getting into people analytics?
44:00 - Do you have a role in educating business units, to improve their questions, etc.?
45:00 - What is the HR tech stack at AT&T? Does your team have a data engineer solely for people data since they’re more sensitive?
47:00 - How do you present your results (an application, report, PowerPoint), and how important is it to learn other languages (JavaScript, CSS, SQL)?
If you were to start a people analytics team in a company (+1000), how do you start?
50:00 - Do you use an internal tool for surveys? Do you use thresholds to maintain anonymity?
53:00 - Does AT&T have remote workers? If so, does people analytics segment on remote vs hybrid vs on-site?

MLOps with vetiver in Python and R | Led by Julia Silge & Isabel Zimmerman
Many data scientists understand what goes into training a machine learning model, but creating a strategy to deploy and maintain that model can be daunting. In this meetup, learn what MLOps is, what principles can be used to create a practical MLOps strategy, and what kinds of tasks and components are involved. See how to get started with vetiver, a framework for MLOps tasks in R and Python that provides fluent tooling to version, deploy, and monitor your models.
Blog Post with Q&A: https://www.rstudio.com/blog/vetiver-answering-your-questions/
For folks interested in seeing what data artifacts look like on Connect, we have these for R:
⬢ Versioned model object: https://colorado.rstudio.com/rsc/seattle-housing-pin/
⬢ Deployed API: https://colorado.rstudio.com/rsc/seattle-housing/
⬢ Monitoring dashboard: https://colorado.rstudio.com/rsc/seattle-housing-dashboard/
⬢ Create a custom yardstick metric: https://juliasilge.com/blog/nyc-airbnb/
⬢ Endpoint used in the demo: https://colorado.rstudio.com/rsc/scooby
Our team’s reading list (mentioned in the meetup)
Books: ⬢ Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
Articles:
⬢ “Machine Learning Operations (MLOps): Overview, Definition, and Architecture” by Kreuzberger et al: https://arxiv.org/abs/2205.02302
⬢ “From Concept Drift to Model Degradation: An Overview on Performance-Aware Drift Detectors” by Bayram et al: https://arxiv.org/abs/2203.11070
⬢ “Towards Observability for Production Machine Learning Pipelines” by Shankar et al: https://arxiv.org/pdf/2108.13557.pdf
⬢ “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction” by Breck et al: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
Web content:
⬢ How ML Breaks: A Decade of Outages for One Large ML Pipeline by Papasian and Underwood: https://www.youtube.com/watch?v=hBMHohkRgAA
⬢ MLOps Principles by INNOQ: https://ml-ops.org/content/mlops-principles
⬢ Google’s Practitioners Guide to MLOps by Salama et al: https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf
⬢ Gently Down the Stream by Mitch Seymour: https://www.gentlydownthe.stream/
Speaker bios: Julia Silge is a software engineer at RStudio focusing on open source MLOps tools, as well as an author and international keynote speaker. Julia loves making beautiful charts, Jane Austen, and her two cats.
Isabel Zimmerman is also a software engineer on the open source team at RStudio, where she works on building MLOps frameworks. When she’s not geeking out over new data science techniques, she can be found hanging out with her dog or watching Marvel movies.

Creating Features for Machine Learning from Text – Julia Silge, March 2022
Julia Silge is a software engineer at RStudio PBC where she works on open source modeling tools. She holds a PhD in astrophysics and has worked as a data scientist in tech and the nonprofit sector, as well as a technical advisory committee member for the US Bureau of Labor Statistics. She is an author, an international keynote speaker, and a real-world practitioner focusing on data analysis and machine learning. Julia loves text analysis, making beautiful charts, and communicating about technical topics with diverse audiences.
Natural language that we as speakers and writers use must be dramatically transformed into new representations for analysis, whether we are just starting off with exploratory data analysis or are ready to train machine learning algorithms such as predictive models. We can explore typical text preprocessing steps from the ground up, from tokenization to building word embeddings, and consider the effects of these steps. When are these preprocessing steps helpful, and when are they not? In this talk, learn about the process of text preprocessing for ML models in the real world, how and when practitioners use different preprocessing choices, and considerations for text ML tooling.
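To make the preprocessing choices concrete, here is a minimal Python sketch of two early steps, tokenization and stop word removal, feeding a bag-of-words count. The regular expression and the six-word `STOP_WORDS` set are assumptions for illustration only; real tokenizers and stop word lists differ, and those differences are exactly the kind of choice the talk examines.

```python
import re
from collections import Counter

# A tiny, hypothetical stop word list for illustration; real lists are much longer.
STOP_WORDS = {"the", "a", "and", "of", "to", "is"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text: str, remove_stop_words: bool = True) -> Counter:
    """Count tokens, optionally dropping anything in STOP_WORDS first."""
    tokens = tokenize(text)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(tokens)


doc = "The model is not the data, and the data is not the model."
print(bag_of_words(doc))
print(bag_of_words(doc, remove_stop_words=False))
```

With stop words removed, 13 tokens shrink to three features. This list keeps "not"; some published stop word lists remove it, which would silently drop negation from the features, one example of a preprocessing step that can hurt rather than help.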
#rstats #nlp #juliasilge #coding #machinelearning https://rug-at-hdsi.org/ https://twitter.com/RUGatHDSI

Dr. Julia Silge | RStudio Voices | RStudio
Julia Silge recently sat down with Michael Demsko Jr for an interview, the first in a new Voices of RStudio PBC series.
In this excerpt, Julia discusses where she sees the most value created in the data science lifecycle–and it’s not advanced machine learning models.
Read the full interview at https://blog.rstudio.com/tags/rstudio-voices/

Julia Silge | Monitoring Model Performance | RStudio
0:00 Project introduction
1:50 Overview of the setup code chunk
3:05 Getting new data
4:05 Getting model from RStudio Connect using httr and jsonlite
6:20 Bringing in metrics
9:45 Using the pins package
10:50 Using boards on RStudio Connect
13:30 Benefits of using pins
14:00 Visualizations using ggplot and plotly
17:00 Knitting the flexdashboard
18:10 Project takeaways
You can read Julia’s blogpost, Model Monitoring with R Markdown, pins, and RStudio Connect, here: https://blog.rstudio.com/2021/04/08/model-monitoring-with-r-markdown/
Modelops playground GitHub repo: https://github.com/juliasilge/modelops-playground
pins package documentation: https://pins.rstudio.com/
flexdashboard documentation: https://rmarkdown.rstudio.com/flexdashboard/
tidymodels documentation: https://www.tidymodels.org/

David Robinson | The unreasonable effectiveness of public work | RStudio (2019)
In this talk, I’ll lay out the reasons that blogging, open source contribution, and other forms of public work are a critical part of a data science career. For beginners, a blog is a great accompaniment to data science coursework and tutorials, since it gives you experience applying practical data science skills to real problems. For data scientists at any stage of their careers, open source development offers practice in collaboration, documentation, and interface design that complement other kinds of software development. And for data scientists more advanced in their careers, writing a book is a great way to crystallize your expertise and ensure others can build on it. All of these practices build skills in communication and collaboration that form an essential component of data science work. Each also lets you build a public portfolio of your skills, get feedback from your peers, and network with the larger data science community.
VIEW MATERIALS https://bit.ly/drob-rstudio-2019
About the Author: David Robinson is the Chief Data Scientist at DataCamp, an education company for teaching data science through interactive online courses. His interests include statistics, data analysis, education, and programming in R. David is co-author with Julia Silge of the tidytext package and the O’Reilly book Text Mining with R. He is also the author of the broom, gganimate, and fuzzyjoin packages, and of the e-book Introduction to Empirical Bayes. David previously worked as a data scientist at Stack Overflow, and received a PhD in Quantitative and Computational Biology from Princeton University.
