plotnine
A Grammar of Graphics for Python
Plotnine brings the power of the grammar of graphics to Python, offering a systematic approach to building visualizations by explicitly mapping data to visual properties. Inspired by R’s ggplot2, plotnine enables data scientists and developers to construct complex, publication-quality graphics incrementally through a compositional, layer-based approach. Whether you’re conducting exploratory data analysis or creating professional reports, plotnine makes it easy to build sophisticated visualizations by combining simple, intuitive building blocks.
The library excels at making complex plots easy to reason about while keeping simple plots simple to create. With support for faceting, statistical transformations, extensive theming options, and a syntax familiar to ggplot2 users, plotnine provides a consistent and powerful framework for creating data narratives. This project is funded by Posit, bringing the elegance and flexibility of the grammar of graphics to the Python ecosystem.
Contributors
Resources featuring plotnine
Polars: The Blazing Fast Python Framework for Modern Clinical Trial Data Exploration
Polars: The Blazing Fast Python Framework for Modern Clinical Trial Data Exploration - Michael Chow, Jeroen Janssens
Abstract: Clinical trials generate complex, standards-driven datasets that can slow down traditional data processing tools. This workshop introduces Polars, a cutting-edge Python DataFrame library engineered with a high-performance backend and the Apache Arrow columnar format for blazingly fast data manipulation. Attendees will learn how Polars lays the foundation for the pharmaverse-py, streamlining the clinical data workflow from database querying and complex data wrangling to the potential task of prepping data for regulatory Tables, Figures, and Listings (TFLs). Discover the ‘delightful’ Polars API and how its speed dramatically accelerates both exploratory and rigid data tasks in pharmaceutical drug development. The workshop is led by Michael Chow, a Python developer at Posit who is a key contributor to open-source data tools, notably helping to launch the data presentation library Great Tables, and focusing on bringing efficient data analysis patterns to Python.
Resources mentioned in the workshop:
- Polars documentation: https://docs.pola.rs/
- Plotnine documentation: https://plotnine.org/
- pyreadstat: https://github.com/Roche/pyreadstat
- Examples of Great Tables and Pharma TFLs: https://github.com/machow/examples-great-tables-pharma
- UV Python package manager: https://docs.astral.sh/uv


Lessons from a Broad & Varied Data Science Career | Arcenis Rojas | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Arcenis Rojas, Data Scientist at Indeed, to chat about econometrics, public vs private sector data science, navigating a varied career trajectory, AI integration in the hiring sphere, and making friends at conferences.
In this Hangout, Arcenis talked about how his career journey has been wide as opposed to vertically narrow. He shared that this breadth of experience has given him confidence that he can quickly figure out any dataset. He feels it also taught him how to communicate effectively about data to people at different levels and across various domains. He also shared his tech stack at Indeed, including RStudio, Positron, AWS, Snowflake, Quarto for reporting, Shiny for apps, and Posit Connect for deploying them.
An attendee asked about the impacts of AI on the job search space, and Arcenis shared the AI at Work Report (linked below) from the Indeed Hiring Lab. He says, based on research, generative AI is expected to assist many people but only replace small segments of the workforce in the coming 5-10 years, and that entry-level knowledge work is predicted to be the most highly impacted area.
Resources mentioned in the video and zoom chat:
- Indeed Hiring Lab: AI at Work Report 2025 → https://www.hiringlab.org/2025/09/23/ai-at-work-report-2025-how-genai-is-rewiring-the-dna-of-jobs/
- To Explain or to Predict? (Galit Shmueli, 2010) → https://arxiv.org/abs/1101.0891
- Announcing the 2025 table and plotnine contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
If you didn’t join live, one great discussion you missed from the zoom chat was about the wide variety of data types data scientists work with. Attendees shared that their data included genomics, finance/trading, environmental/natural resources, e-commerce products, and medical/clinical data. What kind of data types do you work with?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps
00:00 Introduction
06:16 “What do you like to do for fun?”
08:51 “What are the unique aspects of financial and economic data science?”
15:07 “What are econometrics?”
16:02 “Is the difference that hard sciences stats is trying to explain what happened where econometrics might be what might happen in the future?”
19:39 “Suggestions for making data friends and going to a conference alone.”
23:26 “Do you see any misconceptions about the job market online, specifically the ATS thing?”
29:52 “How has your varied career trajectory been an advantage or a challenge in data science?”
34:08 “How is the recent hype wave of AI integration manifesting in the hiring sphere?”
40:08 “What are the tools that you use in your job for reporting?”
41:42 “How do you know when it is time to pivot and leave your role because your skills are stagnating?”
45:56 “How would you persuade leadership to use R or Python?”
49:32 “Did you find yourself always trying to use more complex models when simpler ones would serve the audience better?”
Migrating to Open Source & the Future of Biostatistics | Beth Atkinson | Data Science Hangout
We were recently joined by Beth Atkinson, principal biostatistician at Mayo Clinic, to chat about the challenges of migrating from SAS to R, working with diverse and noisy data types (including wearable data and omics projects), foundational tooling like RMarkdown and Quarto, and maintaining statistical fundamentals amidst the hype cycle of new tools like AI.
In this Hangout, we explore the challenges of working with complex, high-volume data, like the data derived from wearable devices and medical charts. A challenge with wearable device data is that it can be super noisy, with issues like computers not syncing up, people forgetting to wear the device, or someone else wearing it. Medical chart data can be inconsistent; some things are recorded, and some are not. Beth also talks about the R/Medicine conference, the future of modern biostatistics, and the journey of compassionately helping an organization move from proprietary tools like SAS to open source tools like R.
Beth also works on omics projects, including genomics (looking at DNA), metabolomics, exposomics (chemical exposures), and multiomics, which involves looking at all of this information together in a holistic way. We hope you’ll come along with us if you’re interested in learning about the biomedical world of data!
Resources mentioned in the video and zoom chat:
- R/Medicine Conference website → https://rconsortium.github.io/RMedicine_website/
- arsenal R package (MayoVerse) → https://mayoverse.github.io/arsenal/ (The arsenal package was created to help encourage transition from SAS to R by providing equivalent functionality for summary reporting macros that people relied on.)
- 2025 Posit Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
If you didn’t join live, one great discussion you missed from the zoom chat was about the general dislike of regular expressions (regex) and the tendency to rely on tools like ChatGPT to write complex regex syntax. Many in the chat agreed that regex is difficult to commit to memory but acknowledged the power of the tool. So… do you use an LLM to help with regex?
Timestamps
00:00 Introduction
03:25 “You do a lot of things that end in -omics. What are those things?”
05:02 “What are the types of data that you work with and some of the challenges that you face with those data?”
09:37 “What was your favorite new feature that made your work easier?”
11:42 “What is your favorite data science tool or R package that you find helpful in health research as a biostatistician?”
14:04 “I wanted to see if that’s consistent with your experience [that 80% of workflow is data prep]”
17:07 “Does it scare you to hand off data to be cleaned by someone else?”
18:11 “What have you noticed that we still need to adhere to [regarding statistics fundamentals]?”
22:48 “Do you also produce reporting products as part of your role, and is your audience primarily internal and narrow, or do you communicate with a broader external audience as well?”
26:35 “Can you talk about a little bit of your personal SAS experience as well as the bigger organizational change maybe that Mayo is doing?”
30:05 “What are some of the roadblocks that are faced in a SAS-to-R journey and how can we find compassion for the people that we are helping to transition?”
33:55 “What is the community aspect internally at Mayo Clinic around R?”
35:43 “How do you store and manage all of that [data]?”
40:41 “What tools and skill sets should we focus on if we want to get into biostats today? Do you think it’s important for people to still learn SAS if they’re coming in fresh? And how about the future of biostatistics as a role separate from data science?”
45:48 “Is it possible for someone with a nontraditional background to make these transitions [into computational epidemiology]?”
48:10 “What’s the source of most of these innovations?”
50:05 “Could you talk a little bit about R/Medicine conference?”
R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout
We were recently joined by Dave Gruenewald, Senior Director of Data Science at Centene, to chat about polyglot teams, data science best practices, right-sizing development efforts, and process automation.
In this Hangout, we explore working in a polyglot team and fostering interoperability (a word that Libby loves, but struggles to pronounce out loud). Dave Gruenewald emphasizes that teams should use the tools they are comfortable with, whether that’s R or Python. Some strategies for collaboration across languages that Dave suggests include tools like Quarto to seamlessly run R and Python code in the same report. Teams utilize data science checkpoints, saving outputs as platform-agnostic file types like Parquet so that they can be accessed by any language. The use of REST APIs allows R processes to be accessed programmatically by Python (and vice versa), which can be a real game-changer. The newly released nanonext package was also highlighted as a promising development for improved interoperability.
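The REST API strategy mentioned above can be sketched with the Python standard library alone. In this hypothetical setup, a tiny local HTTP server stands in for a service that could just as well be written in R, and a Python client calls it with JSON. The `/mean` path, the payload shape, and the "model" (a plain average) are assumptions made for illustration, not anything described in the Hangout.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class MeanHandler(BaseHTTPRequestHandler):
    """Stand-in for a service that another language (e.g. R) could host."""

    def do_POST(self):
        # Read the JSON request body sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        values = payload["values"]
        # Hypothetical "model": the arithmetic mean of the submitted values.
        body = json.dumps({"mean": sum(values) / len(values)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet: no per-request logging


def call_service(host, port, values):
    """Client side: only JSON over HTTP crosses the language boundary."""
    connection = HTTPConnection(host, port, timeout=5)
    try:
        connection.request(
            "POST",
            "/mean",
            body=json.dumps({"values": values}),
            headers={"Content-Type": "application/json"},
        )
        response = connection.getresponse()
        return json.loads(response.read())
    finally:
        connection.close()


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), MeanHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result = call_service("127.0.0.1", server.server_port, [1.0, 2.0, 3.0])

server.shutdown()
server.server_close()
```

Because the client depends only on the URL and the JSON contract, the server behind it could be swapped for one written in another language without changing the calling code.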
Resources mentioned in the video and zoom chat:
- Posit Conf 2025 Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
- nanonext 1.7.0 Tidyverse Blog Post → https://www.tidyverse.org/blog/2025/09/nanonext-1-7-0/
If you didn’t join live, one great discussion you missed from the zoom chat was about pivoting away from academia, including leaving PhD programs. Many attendees shared their personal experiences of making the difficult decision to drop out of a PhD program. The community suggested alternative terms like “pivot,” “reallocating your resources,” or being a “refugee fleeing academia” instead of “drop out.” Dave Gruenewald shared that he himself left a PhD program but has “no regrets about that.” Did you leave a PhD program? You’re not alone!
Timestamps:
00:00 Introduction
02:21 “What types of data do your teams use?”
06:53 “Which of the three pillars you mentioned is your personal favorite to work on?”
09:26 “How do you avoid or divert scope creep?”
11:41 “How much of the project should be “planning” before any code happens?”
13:53 “Do you feel like people are just hopping in and going, hey, LLM, make me a POC?”
14:28 “Do you give them what they say they want, or do you give them what they need?”
16:40 “I’m wondering what public data do you wish existed?”
18:48 “Why not Positron yet?”
20:43 “How do you unify as a team and make it so that I can always read everybody else’s code?”
23:10 “Could you talk a little bit about how R and Python work together?”
27:28 “How to start package development with a team who are very new to package development.”
33:01 “What’s your greatest regret career wise?”
35:53 “What about your biggest wins, specifically in your early career?”
39:40 “How would you recommend building a data science culture and community from scratch?”
41:49 “Would you set a specific timeline for EDA, exploratory analysis, to scope the project better?”
45:15 “How do you define fun projects, and how much time do you allocate for exploration in those?”
48:21 “Does your team use DVC or something similar for data version control?”
50:00 “Can you talk a bit more about your pivot from academia into data science?”
51:31 “Any advice on where to look for opportunities in data science after getting a masters degree?”
Building Multilingual Data Science Teams (Michael Thomas, Ketchbrook Analytics) | posit::conf(2025)
Building Multilingual Data Science Teams
Speaker(s): Michael Thomas
Abstract:
For much of my career, I have seen data science teams make the critical decision of whether they are going to be an “R shop” or a “Python shop”. Doing both seemed impossible. I argue that this has changed drastically, as we have built out an effective multilingual data science team at Ketchbrook, thanks to polars/dplyr, gt/great-tables, ggplot2/plotnine, arrow, duckdb, Quarto, etc. I would like to provide a walkthrough of our journey to developing a multilingual data science team, lessons learned, and best practices.
posit::conf(2025)
Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
Visualizing Gas Prices | PydyTuesday Uncut #2
Join Michael Chow (open source developer at Posit) and Jeroen Janssens (developer relations engineer at Posit) as they dive into this week’s #PydyTuesday dataset. This time, they visualize gas prices using the four P’s: Positron, Python, Polars, and Plotnine.
True to the “PydyTuesday Uncut” title, this video is completely unedited. Every typo, mistake, web search, and “aha!” moment is left in so you can see exactly how we approach a new dataset from scratch.
Things mentioned during the session and related resources:
- Weekly US Gas Prices https://github.com/rfordatascience/tidytuesday/blob/main/data/2025/2025-07-01/readme.md
- Code produced during the session https://github.com/jeroenjanssens/pydytuesday-uncut/
- Plotnine https://plotnine.org
- Positron https://positron.posit.co
#python #polars #tidytuesday #datascience


Exploring Web APIs | PydyTuesday Uncut #1
Join Michael Chow (open source developer at Posit) and Jeroen Janssens (developer relations engineer at Posit) as they dive into this week’s #PydyTuesday dataset about Web APIs. Tools include uv, Positron, Polars, Plotnine, Great Tables, and the Unix command line.
True to the “PydyTuesday Uncut” title, this video is completely unedited. Every typo, mistake, web search, and “aha!” moment is left in so you can see exactly how others approach a new dataset from scratch.
Things mentioned during the session and related resources:
- Code produced during the session: https://github.com/jeroenjanssens/pydytuesday-uncut/blob/main/2025-06-17/01-start.py
- PydyTuesday https://github.com/posit-dev/pydytuesday
- TidyTuesday https://github.com/rfordatascience/tidytuesday
- Getting Data from the TidyTuesday Repo with Python https://www.youtube.com/watch?v=ol2FrSL5gVU
- Positron IDE https://positron.posit.co
- Data Science at the Command Line https://jeroenjanssens.com/dsatcl/
- Python Polars: The Definitive Guide https://polarsguide.com
- Polars https://pola.rs
- Plotnine https://plotnine.org
- Great Tables https://posit-dev.github.io/great-tables/
- The Big Year https://www.imdb.com/title/tt1053810/
00:00 Introduction
02:46 Getting the data with uv
13:18 Positron IDE
17:42 Importing Polars
23:17 Plotting a bar chart with Plotnine
33:55 Inspecting duplicates
46:30 Handling missing values
58:56 Crafting a great table
1:38:48 Reflection


Data Science at the Command Line and Polars | Jeroen Janssens | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Jeroen Janssens, Senior Developer Relations Engineer at Posit, to chat about his career journey from machine learning to developer relations, the advantages of using the command line for data science, his books “Data Science at the Command Line” and “Python Polars”, and advice for aspiring DevRel professionals.
In this Hangout, we explore the benefits of working on the command line versus not. Jeroen explained that while the initial command line interface might seem stark, it offers a very different and powerful way to interact with your computer. The Unix command line is ubiquitous across various systems, from Raspberry Pis to supercomputers. Its strength lies in the ability to connect tools together through standard output and input, allowing for quick and iterative solutions by combining specialized tools. This fosters an interactive nature with a short feedback loop and provides closer interaction with the file system, making ad hoc data exploration efficient.
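To illustrate the idea of small tools connected through standard input and output, here is a hedged sketch of a filter written in Python. The function name `count_column` and the sample data are hypothetical; the example uses in-memory streams so it runs on its own, but the same function accepts any text stream.

```python
import io
from collections import Counter


def count_column(source, sink, column, delimiter=","):
    """Read delimited text from `source` and write value counts for `column` to `sink`."""
    header = source.readline().rstrip("\n").split(delimiter)
    if column not in header:
        raise ValueError(f"column {column!r} not found in header {header}")
    index = header.index(column)

    counts = Counter()
    for line in source:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) > index:  # skip short or blank lines
            counts[fields[index]] += 1

    # Most frequent first; ties are broken alphabetically for stable output.
    for value, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        sink.write(f"{n}\t{value}\n")


# Made-up sample input standing in for data arriving on standard input.
sample = io.StringIO("city,fuel\nUtrecht,diesel\nOmaha,regular\nLeiden,diesel\n")
output = io.StringIO()
count_column(sample, output, column="fuel")
result = output.getvalue()
```

Passing `sys.stdin` and `sys.stdout` instead of the in-memory streams would turn the function into a pipeline stage, so it could sit between other command-line tools in the way described above.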
Resources mentioned in the video and zoom chat:
- Jeroen’s LinkedIn → https://www.linkedin.com/in/jeroenjanssens/
- Data Science at the Command Line → https://jeroenjanssens.com/dsatcl/
- Python Polars: The Definitive Guide → https://polarsguide.com/
- Plotnine → https://plotnine.org/
- Winner of the 2024 plotnine Plotting Contest → https://posit.co/blog/winner-of-the-2024-plotnine-plotting-contest/
- Talk about plotnine → https://www.youtube.com/watch?v=xdD8r84sqYY
- R for Data Science → https://r4ds.had.co.nz/
- Jeroen’s plotnine translation of R for Data Science → https://jeroenjanssens.com/plotnine/
- froggeR package → https://azimuth-project.tech/froggeR/
- Reticulate → https://rstudio.github.io/reticulate/
- Install Windows Subsystem for Linux (WSL) → https://learn.microsoft.com/en-us/windows/wsl/install
- UTM for macOS (Virtualization) → https://mac.getutm.app
- fish shell → https://fishshell.com/
- Quartodoc → https://github.com/machow/quartodoc
- Focusmate (Accountability Partner Tool) → https://www.focusmate.com/
- Surface Area of Luck → https://modelthinkers.com/mental-model/surface-area-of-luck
- CRAN R Extensions Manual → https://cran.r-project.org/doc/manuals/r-release/R-exts.html
If you didn’t join live, one great thing you missed from the zoom chat was people sharing their varied experiences with the command line, with many admitting they primarily use it for basic navigation or only when necessary, and some sharing helpful tools and tips for those less familiar. Let us know below if you’d like to hear more about this topic!

Janssens, Chow & Nieuwdorp - Turning DataFrames into Pretty Pictures with Plotnine | PyData NYC 2024
Learn how Plotnine, a Python package inspired by R’s ggplot2, enables the creation of sophisticated and effective data visualizations with minimal effort. This tutorial will explain how Plotnine’s grammar of graphics approach provides a flexible, intuitive way to visualize data, either as ad-hoc plots or fine-tuned graphs suited for communication.
Quick links
- Presentation: https://bit.ly/plotnine-tutorial
- GitHub repository: https://bit.ly/plotnine-repo
- Slideshow about what to expect: https://bit.ly/expect-plotnine
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
Grammar of Graphics in Python with Plotnine - posit::conf(2023)
Presented by Hassan Kibirige
{plotnine} brings the elegance of {ggplot2} to the Python programming language. Learn about The Grammar of Graphics and get a feel of why it is an effective way to create Statistical Graphics.
ggplot2 is one of the most loved visualisation libraries. It implements a Grammar of Graphics system, which requires one to think about data in terms of columns of variables and how to transform them into geometric objects. It is elegant and powerful. This is a talk about plotnine, which brings the elegance of ggplot2 to the Python programming language. It is an invitation to learn about the Grammar of Graphics system and to appreciate it. It will include some tips on how to avoid common frustrations as you learn the system.
Materials:
- Website: https://plotnine.org
- Source Code: https://github.com/has2k1/plotnine
- Slides for this talk: https://github.com/has2k1/my-talks
Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Data science with Python. Session Code: TALK-1137

posit::conf(2023) Workshop: Introduction to Data Science with Python
Register now: http://pos.it/conf
Instructors: Posit Academy Instructors
Workshop Duration: 1 Day Workshop
This course is ideal for:
- those new to Python
- anyone who has dabbled in Python, but is not sure how to use Python to do data science
- R users who want to work more closely with Python users on their team
This is not a standard workshop, but a six-week online apprenticeship that culminates in one in-person day at posit::conf(2023). Begins August 7th, 2023. No knowledge of Python required. Visit posit.co/academy to learn more about this uniquely effective learning format.
Here, you will learn the foundations of Python and data analysis under the guidance of a Posit Academy mentor and in the company of a close group of fellow learners. You will be expected to complete a weekly curriculum of interactive tutorials, and to attend a weekly presentation meeting with your mentor and fellow students. Topics will include importing packages and datasets, visualizing data with plotnine, wrangling data with pandas, writing and applying functions, and reporting reproducibly with Quarto.