plotnine

Lessons from a Broad & Varied Data Science Career | Arcenis Rojas | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Arcenis Rojas, Data Scientist at Indeed, to chat about econometrics, public vs private sector data science, navigating a varied career trajectory, AI integration in the hiring sphere, and making friends at conferences.

In this Hangout, Arcenis talked about how his career path has been broad rather than narrowly specialized. He shared that this breadth of experience has given him confidence that he can quickly make sense of any dataset, and that it taught him how to communicate effectively about data to people at different levels and across various domains. He also shared his tech stack at Indeed, including RStudio, Positron, AWS, Snowflake, Quarto for reporting, Shiny for apps, and Posit Connect for deploying them.

An attendee asked about the impacts of AI on the job search space, and Arcenis shared the AI at Work Report (linked below) from the Indeed Hiring Lab. He said that, based on the research, generative AI is expected to assist many people but replace only small segments of the workforce in the coming 5-10 years, and that entry-level knowledge work is predicted to be the most heavily affected area.

Resources mentioned in the video and zoom chat:
Indeed Hiring Lab: AI at Work Report 2025 → https://www.hiringlab.org/2025/09/23/ai-at-work-report-2025-how-genai-is-rewiring-the-dna-of-jobs/
To Explain or to Predict? (Galit Shmueli, 2010) → https://arxiv.org/abs/1101.0891
Announcing the 2025 table and plotnine contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/

If you didn’t join live, one great discussion you missed from the zoom chat was about the wide variety of data that data scientists work with. Attendees shared that their data included genomics, finance/trading, environmental/natural resources, e-commerce products, and medical/clinical data. What kinds of data do you work with?

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Timestamps
00:00 Introduction
06:16 “What do you like to do for fun?”
08:51 “What are the unique aspects of financial and economic data science?”
15:07 “What are econometrics?”
16:02 “Is the difference that hard sciences stats is trying to explain what happened where econometrics might be what might happen in the future?”
19:39 “Suggestions for making data friends and going to a conference alone.”
23:26 “Do you see any misconceptions about the job market online, specifically the ATS thing?”
29:52 “How has your varied career trajectory been an advantage or a challenge in data science?”
34:08 “How is the recent hype wave of AI integration manifesting in the hiring sphere?”
40:08 “What are the tools that you use in your job for reporting?”
41:42 “How do you know when it is time to pivot and leave your role because your skills are stagnating?”
45:56 “How would you persuade leadership to use R or Python?”
49:32 “Did you find yourself always trying to use more complex models when simpler ones would serve the audience better?”

Migrating to Open Source & the Future of Biostatistics | Beth Atkinson | Data Science Hangout

We were recently joined by Beth Atkinson, principal biostatistician at Mayo Clinic, to chat about the challenges of migrating from SAS to R, working with diverse and noisy data types (including wearable data and omics projects), foundational tooling like RMarkdown and Quarto, and maintaining statistical fundamentals amidst the hype cycle of new tools like AI.

In this Hangout, we explore the challenges of working with complex, high-volume data, such as data derived from wearable devices and medical charts. Wearable device data can be very noisy, with issues like computers not syncing, people forgetting to wear the device, or someone else wearing it. Medical chart data can be inconsistent: some things are recorded and some are not. Beth also talks about the R/Medicine conference, the future of modern biostatistics, and the journey of compassionately helping an organization move from proprietary tools like SAS to open source tools like R.

Beth also works on omics projects, including genomics (looking at DNA), metabolomics, exposomics (chemical exposures), and multiomics, which involves looking at all of this information together in a holistic way. We hope you’ll come along with us if you’re interested in learning about the biomedical world of data!

Resources mentioned in the video and zoom chat:
R/Medicine Conference website → https://rconsortium.github.io/RMedicine_website/
arsenal R package (MayoVerse) → https://mayoverse.github.io/arsenal/ (The arsenal package was created to help encourage transition from SAS to R by providing equivalent functionality for summary reporting macros that people relied on.)
2025 Posit Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/

If you didn’t join live, one great discussion you missed from the zoom chat was about the general dislike of regular expressions (regex) and the tendency to rely on tools like ChatGPT to write complex regex syntax. Many in the chat agreed that regex is difficult to commit to memory but acknowledged the power of the tool. So… do you use an LLM to help with regex?

Timestamps
00:00 Introduction
03:25 “You do a lot of things that end in -omics. What are those things?”
05:02 “What are the types of data that you work with and some of the challenges that you face with those data?”
09:37 “What was your favorite new feature that made your work easier?”
11:42 “What is your favorite data science tool or R package that you find helpful in health research as a biostatistician?”
14:04 “I wanted to see if that’s consistent with your experience [that 80% of workflow is data prep]”
17:07 “Does it scare you to hand off data to be cleaned by someone else?”
18:11 “What have you noticed that we still need to adhere to [regarding statistics fundamentals]?”
22:48 “Do you also produce reporting products as part of your role, and is your audience primarily internal and narrow, or do you communicate with a broader external audience as well?”
26:35 “Can you talk about a little bit of your personal SAS experience as well as the bigger organizational change maybe that Mayo is doing?”
30:05 “What are some of the roadblocks that are faced in a SAS-to-R journey and how can we find compassion for the people that we are helping to transition?”
33:55 “What is the community aspect internally at Mayo Clinic around R?”
35:43 “How do you store and manage all of that [data]?”
40:41 “What tools and skill sets should we focus on if we want to get into biostats today? Do you think it’s important for people to still learn SAS if they’re coming in fresh? And how about the future of biostatistics as a role separate from data science?”
45:48 “Is it possible for someone with a nontraditional background to make these transitions [into computational epidemiology]?”
48:10 “What’s the source of most of these innovations?”
50:05 “Could you talk a little bit about R/Medicine conference?”

R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout

We were recently joined by Dave Gruenewald, Senior Director of Data Science at Centene, to chat about polyglot teams, data science best practices, right-sizing development efforts, and process automation.

In this Hangout, we explore working in a polyglot team and fostering interoperability (a word that Libby loves, but struggles to pronounce out loud). Dave Gruenewald emphasizes that teams should use the tools they are comfortable with, whether that’s R or Python. For collaboration across languages, Dave suggests tools like Quarto, which can run R and Python code in the same report. Teams can also use data science checkpoints, saving outputs as platform-agnostic file types like Parquet so that any language can read them. REST APIs allow R processes to be accessed programmatically from Python (and vice versa), which can be a real game-changer. The newly released nanonext package was also highlighted as a promising development for improved interoperability.
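
To make the REST idea concrete, here is a minimal sketch that uses only Python's standard library. The tiny server stands in for a service written in another language (for example, an R process exposed over HTTP). The `/score` path, the `values` and `mean` JSON fields, and the names `ScoreHandler` and `score` are hypothetical and chosen for illustration; none of them come from the Hangout.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class ScoreHandler(BaseHTTPRequestHandler):
    """Stand-in for a service implemented in another language."""

    def do_POST(self):
        # This sketch answers every POST the same way, whatever the path.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Hypothetical "model": the mean of the submitted values.
        values = payload["values"]
        body = json.dumps({"mean": sum(values) / len(values)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging to keep the example quiet


def score(url, values):
    """Client side: any language with an HTTP library could do the same."""
    request = Request(
        url,
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler bypasses any system proxy for this local call.
    opener = build_opener(ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["mean"]


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), ScoreHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/score"
result = score(url, [1.0, 2.0, 6.0])
print(result)

server.shutdown()
server.server_close()
```

The client only needs to know the URL and the JSON contract, so the implementation language behind the endpoint can change without touching the caller.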

Resources mentioned in the video and zoom chat:
Posit Conf 2025 Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
nanonext 1.7.0 Tidyverse Blog Post → https://www.tidyverse.org/blog/2025/09/nanonext-1-7-0/

If you didn’t join live, one great discussion you missed from the zoom chat was about pivoting away from academia, including leaving PhD programs. Many attendees shared their personal experiences of making the difficult decision to drop out of a PhD program. The community suggested alternative terms like “pivot,” “reallocating your resources,” or being a “refugee fleeing academia” instead of “drop out.” Dave Gruenewald shared that he himself left a PhD program but has “no regrets about that.” Did you leave a PhD program? You’re not alone!

Timestamps:
00:00 Introduction
02:21 “What types of data do your teams use?”
06:53 “Which of the three pillars you mentioned is your personal favorite to work on?”
09:26 “How do you avoid or divert scope creep?”
11:41 “How much of the project should be “planning” before any code happens?”
13:53 “Do you feel like people are just hopping in and going, hey, LLM, make me a POC?”
14:28 “Do you give them what they say they want, or do you give them what they need?”
16:40 “I’m wondering what public data do you wish existed?”
18:48 “Why not Positron yet?”
20:43 “How do you unify as a team and make it so that I can always read everybody else’s code?”
23:10 “Could you talk a little bit about how R and Python work together?”
27:28 “How to start package development with a team who are very new to package development.”
33:01 “What’s your greatest regret career-wise?”
35:53 “What about your biggest wins, specifically in your early career?”
39:40 “How would you recommend building a data science culture and community from scratch?”
41:49 “Would you set a specific timeline for EDA, exploratory analysis, to scope the project better?”
45:15 “How do you define fun projects, and how much time do you allocate for exploration in those?”
48:21 “Does your team use DVC or something similar for data version control?”
50:00 “Can you talk a bit more about your pivot from academia into data science?”
51:31 “Any advice on where to look for opportunities in data science after getting a master’s degree?”

Data Science at the Command Line and Polars | Jeroen Janssens | Data Science Hangout

To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!

We were recently joined by Jeroen Janssens, Senior Developer Relations Engineer at Posit, to chat about his career journey from machine learning to developer relations, the advantages of using the command line for data science, his books “Data Science at the Command Line” and “Python Polars”, and advice for aspiring DevRel professionals.

In this Hangout, we explore the benefits of working on the command line. Jeroen explained that while the command line interface might seem stark at first, it offers a very different and powerful way to interact with your computer. The Unix command line is ubiquitous across various systems, from Raspberry Pis to supercomputers. Its strength lies in the ability to connect tools through standard output and standard input, so that small, specialized tools can be combined into quick, iterative solutions. This makes for an interactive workflow with a short feedback loop and closer interaction with the file system, which makes ad hoc data exploration efficient.
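
As a rough illustration of that composition idea, the sketch below imitates a shell pipeline in plain Python: each stage reads lines from the previous stage and yields lines to the next, loosely like `grep ok | cut -d, -f2 | sort | uniq -c | sort -rn`. It is an analogy, not how the shell works internally, and the stage names (`grep`, `cut`, `count_desc`, `pipeline`) and the sample records are hypothetical.

```python
from collections import Counter
from functools import reduce


def grep(pattern):
    """Keep only lines containing `pattern` (loosely like `grep -F`)."""
    def stage(lines):
        return (line for line in lines if pattern in line)
    return stage


def cut(field, sep=","):
    """Keep one 1-based field per line (loosely like `cut -d, -fN`)."""
    def stage(lines):
        return (line.split(sep)[field - 1] for line in lines)
    return stage


def count_desc(lines):
    """Count duplicate lines, most frequent first."""
    for value, n in Counter(lines).most_common():
        yield f"{n} {value}"


def pipeline(source, *stages):
    """Feed each stage's output into the next stage's input."""
    return reduce(lambda stream, stage: stage(stream), stages, iter(source))


# Hypothetical sample data in the form status,species.
records = [
    "ok,adelie",
    "ok,gentoo",
    "error,adelie",
    "ok,adelie",
    "ok,chinstrap",
    "ok,gentoo",
]

result = list(pipeline(records, grep("ok"), cut(2), count_desc))
print("\n".join(result))
```

Because every stage shares the same "lines in, lines out" contract, stages can be added, removed, or reordered one at a time, which is what gives the command line its short feedback loop.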

Resources mentioned in the video and zoom chat:
Jeroen’s LinkedIn → https://www.linkedin.com/in/jeroenjanssens/
Data Science at the Command Line → https://jeroenjanssens.com/dsatcl/
Python Polars: The Definitive Guide → https://polarsguide.com/
Plotnine → https://plotnine.org/
Winner of the 2024 plotnine Plotting Contest → https://posit.co/blog/winner-of-the-2024-plotnine-plotting-contest/
Talk about plotnine → https://www.youtube.com/watch?v=xdD8r84sqYY
R for Data Science → https://r4ds.had.co.nz/
Jeroen’s plotnine translation of R for Data Science → https://jeroenjanssens.com/plotnine/
froggeR package → https://azimuth-project.tech/froggeR/
Reticulate → https://rstudio.github.io/reticulate/
Install Windows Subsystem for Linux (WSL) → https://learn.microsoft.com/en-us/windows/wsl/install
UTM for macOS (Virtualization) → https://mac.getutm.app
fish shell → https://fishshell.com/
Quartodoc → https://github.com/machow/quartodoc
Focusmate (Accountability Partner Tool) → https://www.focusmate.com/
Surface Area of Luck → https://modelthinkers.com/mental-model/surface-area-of-luck
CRAN R Extensions Manual → https://cran.r-project.org/doc/manuals/r-release/R-exts.html

If you didn’t join live, one great thing you missed from the zoom chat was people sharing their varied experiences with the command line, with many admitting they primarily use it for basic navigation or only when necessary, and some sharing helpful tools and tips for those less familiar. Let us know below if you’d like to hear more about this topic!

Jeroen Janssens

Janssens, Chow & Nieuwdorp - Turning DataFrames into Pretty Pictures with Plotnine | PyData NYC 2024

www.pydata.org

Learn how Plotnine, a Python package inspired by R’s ggplot2, enables the creation of sophisticated and effective data visualizations with minimal effort. This tutorial will explain how Plotnine’s grammar of graphics approach provides a flexible, intuitive way to visualize data, either as ad-hoc plots or fine-tuned graphs suited for communication.

Quick links
Presentation: https://bit.ly/plotnine-tutorial
GitHub repository: https://bit.ly/plotnine-repo
Slideshow about what to expect: https://bit.ly/expect-plotnine

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

Jeroen Janssens - How I hacked UMAP and won at a plotting contest | PyData Amsterdam 2024

In this talk, I’ll share my journey of animating UMAP, a cutting-edge dimensionality reduction algorithm, by visualizing not just its final output but each intermediate step as well. I’ll explain why and how I modified UMAP’s source code, while also demonstrating the use of Polars for data wrangling, Plotnine for visualization, and ffmpeg for animation. The result ultimately earned me a runner-up position in the 2024 Plotnine plotting contest.

00:00 Welcome!
01:06 Plotnine contest
02:18 Hoi! I’m Jeroen
03:13 Visualizing the behavior of an algorithm
07:58 Keeping track of intermediate predictions
09:23 Tool: UMAP
12:31 Tool: Polars
13:15 Python Polars: The Definitive Guide
13:40 MNIST dataset
14:46 Tool: Plotnine
17:04 Tool: FFmpeg
18:58 Final result
19:38 To conclude
20:40 Q & A

Jeroen Janssens

posit::conf(2023) Workshop: Introduction to Data Science with Python

Register now: http://pos.it/conf
Instructors: Posit Academy Instructors
Workshop Duration: 1 Day Workshop

This course is ideal for:
• those new to Python
• anyone who has dabbled in Python, but is not sure how to use Python to do data science
• R users who want to work more closely with Python users on their team

This is not a standard workshop, but a six-week online apprenticeship that culminates in one in-person day at posit::conf(2023). Begins August 7th, 2023. No knowledge of Python required. Visit posit.co/academy to learn more about this uniquely effective learning format.

Here, you will learn the foundations of Python and data analysis under the guidance of a Posit Academy mentor and in the company of a close group of fellow learners. You will be expected to complete a weekly curriculum of interactive tutorials, and to attend a weekly presentation meeting with your mentor and fellow students. Topics will include importing packages and datasets, visualizing data with plotnine, wrangling data with pandas, writing and applying functions, and reporting reproducibly with Quarto.

Michael Chow | Bringing the Tidyverse to Python with Siuba | RStudio

Last January I left my job to spend a year developing siuba, a Python port of dplyr. At its core, this decision was driven by a decade of watching Python and R users produce similar analyses, but in very different ways.

In this talk, I’ll discuss 3 ways siuba enables R users to transfer their hard-earned programming knowledge to Python: (1) leveraging the power of dplyr syntax, (2) options to generate SQL code, and (3) working with the plotnine plotting library.

Looking back, I’ll consider two critical pieces that have helped me develop siuba: using it to livecode TidyTuesday analyses, and building an interactive tutorial for absolute beginners.

About Michael: Michael Chow is a data scientist and learning researcher. He serves as a co-director at Code for Philly. In past lives, he worked on adaptive assessment tools in ed tech, and received a PhD in cognitive psychology from Princeton University.

Michael Chow

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

plotnine

Contributors

Hassan Kibirige

Michael Chow

Jeroen Janssens

Hadley Wickham

Isabel Zimmerman

Resources featuring plotnine

Data visualization with Plotnine

Polars: The Blazing Fast Python Framework for Modern Clinical Trial Data Exploration

Lessons from a Broad & Varied Data Science Career | Arcenis Rojas | Data Science Hangout

Migrating to Open Source & the Future of Biostatistics | Beth Atkinson | Data Science Hangout

R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout

Building Multilingual Data Science Teams (Michael Thomas, Ketchbrook Analytics) | posit::conf(2025)

Visualizing Gas Prices | PydyTuesday Uncut #2

Exploring Web APIs | PydyTuesday Uncut #1

Data Science at the Command Line and Polars | Jeroen Janssens | Data Science Hangout

Janssens, Chow & Nieuwdorp - Turning DataFrames into Pretty Pictures with Plotnine | PyData NYC 2024

Jeroen Janssens - How I hacked UMAP and won at a plotting contest | PyData Amsterdam 2024

Grammar of Graphics in Python with Plotnine - posit::conf(2023)

Fun fact, the logo for plotnine was made…in plotnine😎 #plotnine #pythontips #pythondatascience

posit::conf(2023) Workshop: Introduction to Data Science with Python

Michael Chow | Bringing the Tidyverse to Python with Siuba | RStudio

Posts about plotnine

Version 0.14.0

2024 Plotnine Contest - Last Call

Seeing Beyond Statistics: Anscombe’s Quartet and the Power of Graphs

About Plotnine