Software by Michael Chow
Events attended by Michael Chow
Posts and resources by Michael Chow
Polars: The Blazing Fast Python Framework for Modern Clinical Trial Data Exploration
Polars: The Blazing Fast Python Framework for Modern Clinical Trial Data Exploration - Michael Chow, Jeroen Janssens
Abstract: Clinical trials generate complex, standards-driven datasets that can slow down traditional data processing tools. This workshop introduces Polars, a cutting-edge Python DataFrame library engineered with a high-performance backend and the Apache Arrow columnar format for blazingly fast data manipulation. Attendees will learn how Polars lays the foundation for the pharmaverse-py, streamlining the clinical data workflow from database querying and complex data wrangling to the potential task of prepping data for regulatory Tables, Figures, and Listings (TFLs). Discover the ‘delightful’ Polars API and how its speed dramatically accelerates both exploratory and rigid data tasks in pharmaceutical drug development. The workshop is led by Michael Chow, a Python developer at Posit who is a key contributor to open-source data tools, notably helping to launch the data presentation library Great Tables, and focusing on bringing efficient data analysis patterns to Python.
Resources mentioned in the workshop:
- Polars documentation: https://docs.pola.rs/
- Plotnine documentation: https://plotnine.org/
- pyreadstat: https://github.com/Roche/pyreadstat
- Examples of Great Tables and Pharma TFLs: https://github.com/machow/examples-great-tables-pharma
- UV Python package manager: https://docs.astral.sh/uv


What even is dbt? An Analytics engineer explains | Laurie Merrell & Michael Chow | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Jarvis Innovations Lead Analytics Engineer Laurie Merrell and Posit Principal Software Engineer Michael Chow as they walk us through a beginner dbt project and let us ask as many questions as we like (and we do, we ask all the questions, including, WHAT EVEN IS dbt??). This is a super friendly, MESSY, collaborative, and curious peek at dbt. It’s a tool that’s often mysterious to data scientists, and it’s a big enough framework that it can feel tough to get started with. Walking through the basics makes it way easier to get into!
Hosting crew from Posit: Libby Heeren, Isabella Velasquez
Laurie’s LinkedIn: https://www.linkedin.com/in/laurie-merrell/
Michael’s socials and urls: LinkedIn: https://www.linkedin.com/in/michael-a-chow/ Bluesky: https://bsky.app/profile/mchow.com GitHub: https://github.com/machow
Resources from the hosts and chat:
- Michael Chow’s talk about dbt at the Coalesce Conference in 2022: https://www.youtube.com/watch?v=EYdb1x1cO9U
- Beginner dbt project Michael is using: https://github.com/dbt-labs/jaffle_shop_duckdb
- Laurie’s Coalesce talk with Ian and Jenna: https://www.youtube.com/watch?v=6aX7tAfMmIM
- Link to installation page for the DuckDB CLI: https://duckdb.org/install/?platform=macos&environment=cli
- “Why is dbt so important” shared by Jenna in the chat: https://highgrowthengineering.substack.com/p/why-is-dbt-so-important-
- dbtplyr: https://hub.getdbt.com/emilyriederer/dbtplyr/latest/
- Parquet: https://parquet.apache.org/
- From stored procedures to dbt: A modern migration playbook: https://www.getdbt.com/blog/stored-procedures-dbt-migration-playbook
- How to structure our dbt projects: https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview
- Jenna Jordan’s blog on dbt mesh: https://jennajordan.me/blog/data-mesh-dbt
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co The Lab: https://pos.it/dslab Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for learning with us!
Timestamps
00:00 Introduction
01:09 Guest introductions: Michael Chow and Laurie Merrell
04:15 Overview of today’s session
05:51 Setting up the GitHub Codespace
07:00 The data science workflow vs. organizational needs
10:06 Why dbt is hard to learn in the abstract
13:34 “Could we back up and explain what dbt is again?”
19:12 Running ‘dbt build’
20:00 Inspecting the database with DuckDB CLI
26:21 “Does dbt have concurrency or dependency capabilities?”
27:37 Understanding the ‘ref’ macro
29:52 “Is dbt an orchestrator?”
31:14 “Starting a project from scratch with just SQL?”
32:04 “How is this better than writing Python scripts?”
35:46 “Is data source detection dynamic with dbt?”
38:36 Generating and serving dbt docs
46:51 “Is dbt an IDE like RStudio, but for SQL?”
52:32 Branching and development environments
53:57 “Where would you begin on a brand new project?”
56:38 “How would you validate dependencies and downstream impacts?”
57:48 Defining a view versus a table

The Curse of Documentation (Michael Chow, Posit) | posit::conf(2025)
The Curse of Documentation
Speaker(s): Michael Chow
Abstract:
In Greek mythology, Tantalus was doomed to stand with a lake of water below him and branches of fruit close above. When he went to drink the water it receded, and when he reached to eat the fruit it was blown beyond his grasp. What he needed was forever at arm’s length.
Software documentation often puts users in a similar bind. The information is there, but something doesn’t quite connect. Maybe you try and fail to adapt an example to your use case. Maybe it’s unclear how a bunch of functions fit together.
In this talk, I’ll discuss how effective user guides–like R for Data Science and the React.js guide–break the curse. I’ll focus on three factors behind effective guides: strategic information, inductive learning, and task sequencing.
posit::conf(2025)
Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

Visualizing Gas Prices | PydyTuesday Uncut #2
Join Michael Chow (open source developer at Posit) and Jeroen Janssens (developer relations engineer at Posit) as they dive into this week’s #PydyTuesday dataset. This time, they visualize gas prices using the four P’s: Positron, Python, Polars, and Plotnine.
True to the “PydyTuesday Uncut” title, this video is completely unedited. Every typo, mistake, web search, and “aha!” moment is left in so you can see exactly how we approach a new dataset from scratch.
Things mentioned during the session and related resources:
- Weekly US Gas Prices https://github.com/rfordatascience/tidytuesday/blob/main/data/2025/2025-07-01/readme.md
- Code produced during the session https://github.com/jeroenjanssens/pydytuesday-uncut/
- Plotnine https://plotnine.org
- Positron https://positron.posit.co
#python #polars #tidytuesday #datascience


Exploring Web APIs | PydyTuesday Uncut #1
Join Michael Chow (open source developer at Posit) and Jeroen Janssens (developer relations engineer at Posit) as they dive into this week’s #PydyTuesday dataset about Web APIs. Tools include uv, Positron, Polars, Plotnine, Great Tables, and the Unix command line.
True to the “PydyTuesday Uncut” title, this video is completely unedited. Every typo, mistake, web search, and “aha!” moment is left in so you can see exactly how others approach a new dataset from scratch.
Things mentioned during the session and related resources:
- Code produced during the session: https://github.com/jeroenjanssens/pydytuesday-uncut/blob/main/2025-06-17/01-start.py
- PydyTuesday https://github.com/posit-dev/pydytuesday
- TidyTuesday https://github.com/rfordatascience/tidytuesday
- Getting Data from the TidyTuesday Repo with Python https://www.youtube.com/watch?v=ol2FrSL5gVU
- Positron IDE https://positron.posit.co
- Data Science at the Command Line https://jeroenjanssens.com/dsatcl/
- Python Polars: The Definitive Guide https://polarsguide.com
- Polars https://pola.rs
- Plotnine https://plotnine.org
- Great Tables https://posit-dev.github.io/great-tables/
- The Big Year https://www.imdb.com/title/tt1053810/
00:00 Introduction
02:46 Getting the data with uv
13:18 Positron IDE
17:42 Importing Polars
23:17 Plotting a bar chart with Plotnine
33:55 Inspecting duplicates
46:30 Handling missing values
58:56 Crafting a great table
1:38:48 Reflection


The Test Set: A Posit Podcast Trailer
Introducing The Test Set–a Posit podcast for data science, coming July 1, 2025.
For data science junkies, anomaly hunters, and those who play outside the confidence interval. Hosted by Michael Chow, Wes McKinney & Hadley Wickham
Subscribe to receive updates: https://pos.it/thetestset


Great Tables 3: Data Color and Polishing
This workshop is all about using Great Tables to make beautiful tables for publication and display purposes. We believe that effective tables have these things in common:
• structuring that aids in the reading of the table
• well-formatted values, fitting expectations for the field of study
• styling that reduces time to insight and improves aesthetics
These materials are for you if:
• you have some experience with data analysis in Python
• you often create reporting that involves summarizations of data
• you were often frustrated with making tables for display purposes outside of Python
• you found beautiful-looking tables in the wild and wondered: ‘How could I do that?’
Other videos in this series:
Great Tables 1: Structure, Format, and Style: https://youtu.be/QM7DbsY-nc4 Great Tables 2: Introducing Units Notation: https://youtu.be/SN0_vIL1Rhk
About us:
Michael Chow, Senior Software Engineer, Posit
Michael is a data scientist and software engineer. He has programmed in Python for well over a decade, and he obtained a PhD in cognitive psychology from Princeton University. His interests include statistical methods, skill acquisition, and human memory.
Richard Iannone, Senior Software Engineer, Posit
Richard is a software engineer and table enthusiast. He’s been vigorously working on making display tables easier to create/display in Python. And generally Rich enjoys creating open source packages so that people can do great things in their own work.
Workshop repo: https://github.com/rich-iannone/great-tables-mini-workshop?tab=readme-ov-file Learn more at https://posit-dev.github.io/great-tables/articles/intro.html

Great Tables 2: Introducing Units Notation
This workshop is all about using Great Tables to make beautiful tables for publication and display purposes. We believe that effective tables have these things in common:
• structuring that aids in the reading of the table
• well-formatted values, fitting expectations for the field of study
• styling that reduces time to insight and improves aesthetics
These materials are for you if:
• you have some experience with data analysis in Python
• you often create reporting that involves summarizations of data
• you were often frustrated with making tables for display purposes outside of Python
• you found beautiful-looking tables in the wild and wondered: ‘How could I do that?’
Other videos in this series:
Great Tables 1: Structure, Format, and Style: https://youtu.be/QM7DbsY-nc4 Great Tables 3: Data Color and Polishing https://youtu.be/Huteb5OmcrA
About us:
Michael Chow, Senior Software Engineer, Posit
Michael is a data scientist and software engineer. He has programmed in Python for well over a decade, and he obtained a PhD in cognitive psychology from Princeton University. His interests include statistical methods, skill acquisition, and human memory.
Richard Iannone, Senior Software Engineer, Posit
Richard is a software engineer and table enthusiast. He’s been vigorously working on making display tables easier to create/display in Python. And generally Rich enjoys creating open source packages so that people can do great things in their own work.
Workshop repo: https://github.com/rich-iannone/great-tables-mini-workshop?tab=readme-ov-file Learn more at https://posit-dev.github.io/great-tables/articles/intro.html

Great Tables 1: Structure, Format, and Style
This workshop is all about using Great Tables to make beautiful tables for publication and display purposes. We believe that effective tables have these things in common:
• structuring that aids in the reading of the table
• well-formatted values, fitting expectations for the field of study
• styling that reduces time to insight and improves aesthetics
These materials are for you if:
• you have some experience with data analysis in Python
• you often create reporting that involves summarizations of data
• you were often frustrated with making tables for display purposes outside of Python
• you found beautiful-looking tables in the wild and wondered: ‘How could I do that?’
Other videos in this series:
Great Tables 2: Introducing Units Notation: https://youtu.be/SN0_vIL1Rhk Great Tables 3: Data Color and Polishing https://youtu.be/Huteb5OmcrA
About us:
Michael Chow, Senior Software Engineer, Posit
Michael is a data scientist and software engineer. He has programmed in Python for well over a decade, and he obtained a PhD in cognitive psychology from Princeton University. His interests include statistical methods, skill acquisition, and human memory.
Richard Iannone, Senior Software Engineer, Posit
Richard is a software engineer and table enthusiast. He’s been vigorously working on making display tables easier to create/display in Python. And generally Rich enjoys creating open source packages so that people can do great things in their own work.
Workshop repo: https://github.com/rich-iannone/great-tables-mini-workshop?tab=readme-ov-file Learn more at https://posit-dev.github.io/great-tables/articles/intro.html

Tables in Python with Great Tables
Tables in Python with Great Tables - Rich Iannone, Michael Chow
Resources mentioned in the workshop:
- Workshop GitHub Repository: https://github.com/rich-iannone/great-tables-mini-workshop
- Great Tables https://posit-dev.github.io/great-tables/articles/intro.html
- {reactable-py} https://github.com/machow/reactable-py
- Save a gt table as a file https://gt.rstudio.com/reference/gtsave.html
- {gto} Insert gt tables into Word documents https://gsk-biostatistics.github.io/gto/
- GT.save https://posit-dev.github.io/great-tables/reference/GT.save.html
- define_units https://posit-dev.github.io/great-tables/reference/define_units.html#great_tables.define_units
- Posit Tables Contest 2024 winners: https://posit.co/blog/2024-table-contest-winners/
Editor’s note: During this workshop, several interruptions from an unwanted and disruptive intruder (commonly referred to as a “Zoom bomber”) occurred. We removed those instances from the recording, however that causes a few of the workshop sections to appear disjointed. We apologize for the inconvenience.
Workshop recorded as part of the 2024 R/Pharma Workshop Series


We want GREAT tables! | Richard Iannone & Michael Chow | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Rich Iannone and Michael Chow, software engineers at Posit, to chat about their experiences building GT and Great Tables, how they drive community engagement around their packages, and their career advice for package developers.
GT and Great Tables are packages for creating static tables in R and Python, respectively. They were created to fill a need for a good, maintained solution for generating tables with different output types from data frames. They’ve gained loads of popularity, and have an active community of users!
Resources mentioned in the video and zoom chat: Great Tables Blog → https://posit-dev.github.io/great-tables/blog/ 2024 Table Contest Winners → https://posit.co/blog/2024-table-contest-winners/ Contributing to Public Transit Data Analysis and Tooling → https://posit-dev.github.io/great-tables/blog/open-transit-tools/ Tables as Powerful Representational Tools → https://www.researchgate.net/publication/363345970_Tables_as_Powerful_Representational_Tools The MockUp - 10+ Guidelines for Better Tables in R → https://themockup.blog/posts/2021-01-13-10-guidelines-for-better-tables-in-r/ Show Me the Numbers by Stephen Few → https://analyticspress.com/smtn.php Excel spreadsheets to gt package called {forgts} → https://github.com/luisDVA/forgts What They Forgot to Teach You About R → https://rstats.wtf/ David Robinson Talk called “The unreasonable effectiveness of public work” → https://www.youtube.com/watch?v=th79W4rv67g Publication quality tables in 2024 → https://posit.co/blog/what-we-did-with-publication-quality-tables-in-2024/ Great Tables Design Philosophy → https://posit-dev.github.io/great-tables/blog/design-philosophy/ gtsummary-to-excel → https://www.pipinghotdata.com/posts/2024-07-26-gtsummary-to-excel/ happy git with R → https://happygitwithr.com/ freeCodeCamp - how to contribute to open source → https://github.com/freeCodeCamp/how-to-contribute-to-open-source/
If you didn’t join live, one great discussion you missed from the zoom chat was about how people manage their personal and work GitHub accounts, and whether to have one account for all their work or separate accounts for each employer. Let us know YOUR thoughts on this topic below!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
#pythoncontent


So You Think You Can ANALYZE? (Data Content Creator Hackathon)
Watch your favorite data science content creators compete for a piece of history. They will either prove their skills in front of the world or fail in the pursuit of eternal glory. This is so you think you can analyze.
Special thanks to Posit for sponsoring this competition and making this all possible!
Competitors: Team Jack @averysmith Jack Blandin @KeithGalli
Team MMA @AlexTheAnalyst @nerdnourishment @Miki_ML
Team Null Nick Singh Mark Freeman @SeattleDataGuy
Team Shashank @ShashankData
Team Posit-ively Skewed Greg Coquillo @datascienceharp Richad Nieves-Becker Ian Greengross
Interviewer: Elijah Butler - https://www.tiktok.com/@imelijahbutler
Posit Team
- Michael Chow
- Joe Cheng
- Carlos Scheidegger
- Julia Silge
Sponsors, Affiliates, and Partners:
- Pathrise - http://pathrise.com/KenJee | Career mentorship for job applicants (Free till you land a job)
- Taro - http://jointaro.com/r/kenj308 (20% discount) | Career mentorship if you already have a job
- 365 Data Science (57% discount) - https://365datascience.pxf.io/P0jbBY | Learn data science today
- Interview Query (10% discount) - https://www.interviewquery.com/?ref=kenjee | Interview prep questions
MORE DATA SCIENCE CONTENT HERE: My Twitter - https://twitter.com/KenJee_DS LinkedIn - https://www.linkedin.com/in/kenjee/ Kaggle - https://www.kaggle.com/kenjee Medium Articles - https://medium.com/@kenneth.b.jee Github - https://github.com/PlayingNumbers My Sports Blog -https://www.playingnumbers.com
Check These Videos Out Next! My Leaderboard Project: https://www.youtube.com/watch?v=myhoWUrSP7o&ab_channel=KenJee 66 Days of Data: https://www.youtube.com/watch?v=qV_AlRwhI3I&ab_channel=KenJee How I Would Learn Data Science in 2021: https://www.youtube.com/watch?v=41Clrh6nv1s&ab_channel=KenJee
My Playlists Data Science Beginners: https://www.youtube.com/playlist?list=PL2zq7klxX5ATMsmyRazei7ZXkP1GHt-vs Project From Scratch: https://www.youtube.com/watch?v=MpF9HENQjDo&list=PL2zq7klxX5ASFejJj80ob9ZAnBHdz5O1t&ab_channel=KenJee Kaggle Projects: https://www.youtube.com/playlist?list=PL2zq7klxX5AQXzNSLtc_LEKFPh2mAvHIO



Great Tables: Make beautiful, publication quality tables in Python | Rich Iannone & Michael Chow
Tables are undeniably useful for data work. We have many great DataFrame libraries available in Python, and they give us flexibility in terms of manipulating data at will, but what happens when presenting tables to others?
It’s nice to display tables. Tables can efficiently carry information, just like plots do, and at times it is the better way of presenting data. Indeed, it is time to bridge the divide between raw DataFrame output and wondrously structured tables suitable for publication.
Now, let us turn our attention to the state of ‘display tables’ in 2024. Let us go over what comprises key components for building effective information displays in tables. It may surprise one how new a well-crafted table can be hewn. We’ll take a look at the combinations of Python packages that fit together to make this important task possible, and marvel together at the tabular results they can provide!
Learn more at https://posit-dev.github.io/great-tables/
Timestamps:
0:00 Intro: Meet Rich and Michael
0:41 What we mean by “publication ready tables”
1:29 Overview of what we’ll talk about in this video
1:44 Table Goals: Ways to make a table beautiful
4:41 Tables made from reproducible code!
5:11 The history of table generation to influence our API
6:15 Our modern take on a table display framework
6:35 The problem with Excel
7:38 Introducing Great Tables!
8:00 Key Ingredients of making a Great Table
8:24 Structure: Title, column spanners and nice column labels
8:52 Format: Compact dollar values and percentages
9:23 Styling: Fill color and bold text
10:09 Imports and Polars Selectors
11:08 Coding the structure
12:27 Coding the format
13:07 Coding the styling
15:03 Putting images and plots in your table cells
15:49 Advanced Design
16:07 .fmt_nanoplot(): Small plots within table cells
19:08 .data_color(): Heat maps in tables
20:51 Powerful and plentiful methods to format cell values
22:48 To sum up: TABLES RULE


Build Captivating Display Tables in Python With Great Tables | Real Python Podcast #214
Do you need help making data tables in Python look interesting and attractive? How can you create beautiful display-ready tables as easily as charts and graphs in Python? This week on the show, we speak with Richard Iannone and Michael Chow from Posit about the Great Tables Python library.
Links from the show: https://realpython.com/podcasts/rpp/214/
Michael and Richard discuss the design philosophy and history behind creating display tables. We dig into the grammar of tables, the background of the project, and an ingenious way to build a collection of examples for a library.
We briefly cover how Richard and Michael started contributing to open source. We also discuss practicing data skills with challenges and resources like Tidy Tuesday.
This episode is sponsored by Mailtrap.
Topics:
- 00:00:00 – Introduction
- 00:02:00 – Michael’s background in open source
- 00:04:07 – Rich’s background in open source
- 00:05:27 – Advice for someone starting out
- 00:08:55 – What do you mean by the term “display” table
- 00:11:32 – What components were missing from other tables?
- 00:13:31 – Using examples to explain features
- 00:16:09 – Why was there an absence of this functionality in Python?
- 00:19:35 – A progressive approach and the grammar of tables
- 00:21:26 – Sponsor: Mailtrap
- 00:22:01 – The design philosophy of great tables
- 00:25:31 – Nanoplots, spark lines, and column spanners
- 00:27:06 – Building a gallery of examples
- 00:28:56 – Heat mapping cells and automatically adjusting text color
- 00:32:54 – Output formats for the tables
- 00:34:46 – Building in accessibility
- 00:36:55 – Dependencies
- 00:37:42 – What is the common workflow?
- 00:41:39 – Video Course Spotlight
- 00:43:15 – Adding graphics
- 00:46:41 – Using a table contest to get examples
- 00:49:47 – quartodoc and documenting the project
- 00:55:00 – Tidy Tuesday and data science community
- 01:00:29 – What are you excited about in the world of Python?
- 01:03:46 – What do you want to learn next?
- 01:08:05 – How can people follow the work you do online?
- 01:09:57 – Thanks and goodbye
Links from the show: https://realpython.com/podcasts/rpp/214/

Siuba and duckdb: Analyzing Everything Everywhere All at Once - posit::conf(2023)
Presented by Michael Chow
Every data analysis in Python starts with a big fork in the road: which DataFrame library should I use?
The DataFrame Decision locks you into different methods, with subtly different behavior:
- different table methods (e.g. polars .with_columns() vs pandas .assign())
- different column methods (e.g. polars .map_dict() vs pandas .map())
In this talk, I’ll discuss how siuba (a dplyr port to python) combines with duckdb (a crazy powerful sql engine) to provide a unified, dplyr-like interface for analyzing a wide range of data sources, whether pandas and polars DataFrames, parquet files in a cloud bucket, or pins on Posit Connect.
Finally, I’ll discuss recent experiments to more tightly integrate siuba and duckdb.
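To make the "one interface, many data sources" idea in the abstract above a little more concrete, here is a minimal sketch using only the Python standard library. It is not siuba's or duckdb's API, and not how the talk implements anything: the `mutate` function, the `RowTable` and `ColumnTable` classes, and the `double_x` helper are all hypothetical names invented for illustration. The sketch only shows how a single verb can dispatch on the type of table it receives, so the caller writes the same code for two different storage layouts.

```python
from functools import singledispatch


# Hypothetical toy "backends" (assumptions for illustration only):
# a row-oriented table and a column-oriented table. Neither is a real
# pandas, polars, or duckdb object.
class RowTable:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one dict per row


class ColumnTable:
    def __init__(self, columns):
        self.columns = columns  # dict mapping column name -> list of values


@singledispatch
def mutate(table, name, func):
    """Add a column `name`, computed by calling `func` on each row dict."""
    raise NotImplementedError(f"no mutate() for {type(table).__name__}")


@mutate.register
def _(table: RowTable, name, func):
    # Row layout: extend each row dict with the new value.
    return RowTable([{**row, name: func(row)} for row in table.rows])


@mutate.register
def _(table: ColumnTable, name, func):
    # Column layout: rebuild row dicts on the fly, then store a new column.
    keys = list(table.columns)
    rows = (dict(zip(keys, values)) for values in zip(*table.columns.values()))
    return ColumnTable({**table.columns, name: [func(row) for row in rows]})


def double_x(row):
    return row["x"] * 2


# The calling code is identical for both layouts.
by_row = mutate(RowTable([{"x": 1}, {"x": 2}]), "y", double_x)
by_col = mutate(ColumnTable({"x": [1, 2]}), "y", double_x)
```

Each backend keeps its own storage and its own implementation, while the user-facing verb stays the same. That is the general shape of the problem the talk describes, shown here on toy data structures rather than real DataFrame libraries.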
Presented at Posit Conference, between Sept 19-20 2023. Learn more at posit.co/conference.
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1101

The accidental analytics engineer
There’s a good chance you’re an analytics engineer who just sort of landed in an analytics engineering career. Or made a murky transition from data science/data engineering/software engineering to full-time analytics person. When did you realize you fell into the wild world of analytics engineering?
In this session, Michael Chow (RStudio) draws upon his experience building open source data science tools and working with the data science community to discuss the early signs of a budding analytics engineer, and the small steps these folks can take to keep the best parts of Python and R, all while moving towards engineering best practices.
Check the slides here: https://docs.google.com/presentation/d/1H2fVa-I4D8ibanlqLutIrwPOVypIlXVzEITDUNzzPpU/edit?usp=sharing
Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/

Wrangling data for a Shiny app in Python || Michael Chow || Posit
Shiny makes it easy to build interactive web applications with the power of Python’s data and scientific stack.
Learn more about Shiny for Python: https://shiny.rstudio.com/py/ Check out our interactive Shiny for Python examples: https://shinylive.io/py/examples/
Content: Michael Chow (@chowthedog) Producer: Jesse Mostipak (@kierisi) Editing and Motion Design: Tony Pelleriti (@TonyPelleriti)

Hey Shiny Team, what are some of your biggest learnings from 2022? || Shiny Developers || RStudio
BIG THINGS happened on the Shiny team in 2022! Our team built out a new Shiny UI Editor, Shiny for Python, and Shiny for Python in the browser using WebAssembly. So we asked some of our Developers what their biggest learnings have been from building these products!
Learn more about Shiny for Python: https://shiny.rstudio.com/py/
Content: Winston Chang (@winston_chang), Carson Sievert (@cpsievert), Nick Strayer, Michael Chow (@chowthedog) Producer: Jesse Mostipak (@kierisi) Video editing + motion design: Tony Pelleriti (@TonyPelleriti)





Data Science Hangout | Michael Chow, Posit | Exploring Team Structure w/ Data Scientists & Engineers
We were joined by Michael Chow, Data Scientist and Software Engineer at RStudio. Michael also previously led a team at the California Integrated Travel Project.
On this week’s hangout there were a lot of thoughts shared on structuring a data science team from both Michael and the broader group:
⬢ Jacqueline Nolis also shared thoughts on this on a Data Science Hangout: there are virtues to the different structures, but she ended up sold on the decentralized model, where data scientists are embedded in teams (https://youtu.be/CcPE29bYGVo?t=325)
⬢ Michael agreed that data scientists and analysts should be sitting with the teams that they’re pushing out reports for. Otherwise, I would be trying to send people into those teams to figure out their priorities.
⬢ A data scientist should work with a Project Manager or whoever’s leading the team to push up metrics but also help change the roadmap.
⬢ It leaves a tricky question of where data engineers should be and how they should interact with the team. Today data engineers are often doing more tooling empowerment, so it can be okay to have them a bit more centralized and connect to the data scientists to enforce best practices or enable new pieces for them.
⬢ I think a nice model is for data scientists/analysts to live in the teams and data engineers to be like spokes of a wheel, where the data scientists connect with them and work closely to enforce best practices and enable new, important things.
⬢ Tatsu shared that in thinking of the structure, it’s also important to find your translators and to use the power of feedback. Reach out to those people to start to put that feedback into action.
⬢ George shared that insurance companies have come from a really traditional landscape where they have lots of actuaries working on lots of excel spreadsheets and there can be a lack of knowledge sharing and tool sharing. This is where the data science element comes in. To me, within the organization, you need to have this team which is a mini-spoke if you will, because they are central to the actuarial team. If they are too far removed and they’re back with the IT team, you end up with the old problems because they may not get the business concept communicated back. It’s all about getting enough skills, so they can get stuff done, especially proof of concepts. Maybe after that you can take a step back and then start to look at the centralized model again.
⬢ A central team can help converge to what they see as best practice, but if you’re pushing out something new, exploring a new line of work or area it can be important to set the data engineer there to actually do whatever they need to. Make sure that the converging doesn’t stifle creativity or prevent a team from doing the right thing.
⬢ Manny jumped in to share the perspective from data science being with IT as well, data science is a new field for their company (in real estate) and there’s an identity of where does data science fall. The IT team is fantastic and they’re very structured. Data science is so fluid and creative and non structured at the moment, so you kind of have to look at where it actually should fall.
- please note that some of the points above are summarized and not 100% actual quotes.
Resources shared:
⬢ Tatsu shared in the chat a few projects that Michael is working on: vetiver: https://vetiver.tidymodels.org/articles/vetiver.html , siuba: https://github.com/machow/siuba
⬢ Libby shared a helpful tip on creating a 2-minute YouTube video with a cover letter, to get the attention of a hiring manager
⬢ Javier shared an example Shiny app used in an interview: https://javierorraca.shinyapps.io/Bloomreach_Shiny_App/
⬢ Michael mentioned David Robinson’s screencasts: https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
⬢ Michael mentioned an article on “What data scientists really do according to 35 data scientists”: https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists
⬢ Rachael shared a blog post link where Jacqueline Nolis talked about team structure as well: https://www.rstudio.com/blog/building-effective-data-science-team-answering-your-questions/#Structure
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu ► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout ► View the Data Science Hangout site here: rstudio.com/data-science-hangout
Follow Us Here: Website: https://www.rstudio.com LinkedIn: https://www.linkedin.com/company/rstudio-pbc Twitter: https://twitter.com/rstudio

