pointblank
Data validation toolkit for assessing and monitoring data quality
Pointblank is a Python data validation toolkit that assesses and monitors data quality across multiple backends including Polars, Pandas, DuckDB, PostgreSQL, MySQL, SQLite, Parquet, PySpark, and Snowflake. It provides a chainable API for building validation pipelines that check data against business rules and generate interactive reports showing validation results.
The package emphasizes clear communication of data quality issues through customizable reports that make validation results immediately understandable to both technical and non-technical stakeholders. It includes AI-powered validation drafting that automatically suggests intelligent validation rules based on your data, threshold-based alerts with custom actions, YAML configuration support for version-controlled workflows, a CLI for running validations in CI/CD pipelines, and synthetic data generation for testing. Unlike validation libraries that only catch errors, Pointblank focuses on both finding issues and effectively sharing insights across teams.
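The chainable, threshold-based style described above can be illustrated with a minimal, standard-library-only sketch. This is not Pointblank's API: the class `MiniValidation`, its methods (`expect_not_null`, `expect_greater_than`, `run`, `report`), the `warning_at` parameter, and the sample `orders` data are all hypothetical names chosen for illustration. It only shows the general idea of queuing checks in a chain, running them, and flagging steps whose failure rate crosses a threshold.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    description: str
    n_rows: int
    n_failed: int

    @property
    def fraction_failed(self) -> float:
        return self.n_failed / self.n_rows if self.n_rows else 0.0


@dataclass
class MiniValidation:
    """Hypothetical chainable validator (illustration only, not Pointblank)."""

    rows: list
    warning_at: float = 0.1  # assumed threshold: fraction of failing rows
    steps: list = field(default_factory=list)
    results: list = field(default_factory=list)

    def _add(self, description, predicate):
        self.steps.append((description, predicate))
        return self  # returning self is what makes the calls chainable

    def expect_not_null(self, column):
        return self._add(f"{column} is not null", lambda row: row[column] is not None)

    def expect_greater_than(self, column, value):
        # A missing value counts as a failure rather than raising an error.
        return self._add(
            f"{column} > {value}",
            lambda row: row[column] is not None and row[column] > value,
        )

    def run(self):
        # Evaluate every queued step against every row and count failures.
        self.results = [
            StepResult(desc, len(self.rows), sum(not pred(r) for r in self.rows))
            for desc, pred in self.steps
        ]
        return self

    def report(self) -> str:
        lines = []
        for i, r in enumerate(self.results, start=1):
            status = "WARNING" if r.fraction_failed >= self.warning_at else "ok"
            lines.append(f"{i}. {r.description}: {r.n_failed}/{r.n_rows} failed [{status}]")
        return "\n".join(lines)


# Made-up sample data for the sketch.
orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -3.0},
    {"order_id": 3, "amount": None},
    {"order_id": 4, "amount": 12.5},
]

validation = (
    MiniValidation(orders, warning_at=0.5)
    .expect_not_null("amount")
    .expect_greater_than("amount", 0)
    .run()
)
print(validation.report())
```

In this sketch a step is flagged only when the share of failing rows reaches `warning_at`, so a single null value passes quietly while the positivity check, failing on half the rows, is reported as a warning.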
Resources featuring pointblank
Regression models still rule in P&C Insurance | Jim Weiss | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
This week’s guest was Jim Weiss, Chief Risk Officer, commercial and executive at Crum and Forster!
Some topics covered in this week’s Hangout were the use of regression modeling (have you ever heard of the Tweedie distribution?) in property and casualty insurance, handling changing model results and communicating them to business stakeholders, using GenAI and “co-opetition” to identify and prevent fraudulent claims, and identifying and managing bias or confounding effects in pricing models.
Resources mentioned in the video and chat:
- The Once and Future C&F → https://www.cfins.com/the-once-and-future-cf-landing/
- Tweedie distribution → https://en.wikipedia.org/wiki/Tweedie_distribution
- Statistical Rethinking Lectures Playlist → https://www.youtube.com/playlist?list=PLDcUM9US4XdNOlqSyhe38US8mFgmqzI14
- Considerations for Managing Potential Bias in Pricing Models → https://eforum.casact.org/article/91188-considerations-for-managing-potential-bias-in-pricing-models
- Pointblank for data validation (Python) → https://posit-dev.github.io/pointblank/
- Pointblank (R) → https://rstudio.github.io/pointblank/
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh The Lab: https://pos.it/dslab LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps
00:00 Introduction
02:16 “Who is Crum and Forster? What do you do? What types of problems does your team help them solve?”
04:14 “What kind of data types would you see in your day to day or your team would see in your day to day? And what is an example of a problem that you feel like you’ve solved lately or that you’ve been working on lately?”
10:14 “What types of regression do you use?”
13:35 “In your insurance modeling career, what is the most unusual or unexpected variable that has contributed to one of your models?”
18:48 “How do you handle it when you see that the results are significantly different across models?”
21:45 “Are you Team Bayesian or Team Frequentist when it comes to your statistics?”
27:13 “My health care organization is interested in identifying fraudulent claims. Currently, they’re looking at Excel spreadsheets, same time, different person. Do you have any advice on a better way to guide them to automation?”
30:51 “What software do you use to do your job?”
35:40 “Do you ever use instrumental variables in your models?”
47:47 “Do you have any career advice for us? Is there something you wish that you had known when you were first entering the industry?”
50:38 “How much do you and your team use AI to help you along?”
52:56 “How does your team span expertise in such varied fields?”
pointblank, was I expecting this? | Hannah Frick
Talk from rainbowR conference 2026: https://conference.rainbowr.org

How to use {pointblank} to understand, validate, and document your data
How to use {pointblank} to understand, validate, and document your data - Rich Iannone
Abstract: This workshop will focus on the data quality and data documentation workflows that the pointblank package makes possible. We will use functions that allow us to: (1) quickly understand a new dataset, (2) validate tabular data using rules that are based on our understanding of the data, and (3) fully document a table by describing its variables and other important details. The pointblank package was created to scale from small validation problems (“Let’s make certain this table fits my expectations before moving on”) to very large ones (“Let’s validate these 35 database tables every day and ensure data quality is maintained”), and we’ll delve into all sorts of data quality scenarios so you’ll be comfortable using this package in your organization. Data documentation is, unfortunately, seemingly less common in organizations (perhaps even less common than data validation). We’ll learn how it doesn’t have to be a tedious chore. The pointblank package allows you to create informative and beautiful data documentation that will help others understand what’s in all those tables that are so vital to an organization.
Resources mentioned in the workshop:
- Workshop GitHub repository: https://github.com/rich-iannone/pointblank-workshop
- pointblank documentation: https://rstudio.github.io/pointblank/

Integrating Shiny with Epic EHR | Matt Maloney | Data Science Hangout
We were recently joined by Matt Maloney, Director of Applied AI and Data Science at City of Hope, to chat about applying data science to cancer care operations, integrating open source data science tools like Shiny with Electronic Health Records (like Epic), and the evolving governance of generative AI in healthcare.
In this Hangout, we explore the technical and operational strategies behind integrating custom data science applications directly into clinical workflows. Matt discusses how his team moves beyond standalone tools by embedding Shiny apps and other solutions into Epic, allowing medical coders and providers to access predictions and summaries without leaving their primary software environment-of-choice. He also mentions the “build vs. buy” decision-making process as vendors release their own AI solutions, emphasizing the importance of validating external models against their specific patient population.
Resources mentioned in the video and zoom chat:
- City of Hope → https://www.cityofhope.org
- Unity Health Toronto Customer Story → https://posit.co/about/customer-stories/unity-health-toronto/
- pointblank (Data Validation package) → https://rstudio.github.io/pointblank/
If you didn’t join live, one great discussion you missed from the zoom chat was about where data science teams sit within community members’ organizations and whether they like it or not, specifically the pros and cons of being housed within IT versus embedded inside business units. Participants debated access to infrastructure versus proximity to business stakeholders, with several sharing their own experiences of shifting between these departments (or between companies with different structures). Let us know below if you’d like to hear more about this topic!
Timestamps:
00:00 Introduction
03:37 “What does the data science function at City of Hope help with?”
08:52 “Tell us a little bit more about how you’re integrating Posit with Epic”
16:08 “How do you handle the needs of privacy with the push to adopt AI?”
18:40 “How do you manage to stay abreast of technical advancements?”
22:45 “At what point do you hand off your data work to the software engineering team?”
27:23 “How much has development that involves LLMs and generative AI taken hold?”
30:38 “Does your team evaluate a lot of the things that Epic might be throwing your way?”
34:41 “How does Epic pass an encounter number or a patient ID to Posit Connect?”
35:57 “How does your team handle these nuanced pieces of clinical information?”
40:29 “Do the administrators appreciate the time that it takes to do things?”
44:22 “What happens in the academic division?”
46:10 “Do you have a piece of career advice for us?”
Zero Tolerance for Dirty Data: A Pointblank Prescription for Data Hygiene (Arnav Patel, Synovus)
Speaker(s): Arnav Patel
Abstract:
Data validation is an exercise that is commonly forgotten. It can be difficult to start the exercise, let alone figure out a holistic approach to cleaning and understanding data. This is a problem that affects us all regardless of experience, expertise, or complexity of work. With Pointblank’s suite of validation functions, you can scan your data, present that scan in a visually appealing and unique report, and then define validation rules at scale in a methodical manner. I want to show why pointblank, above other validation packages, is the best validation approach through its simple yet appealing reporting features and comprehensive validation schema.
posit::conf(2025)
Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
Making Things Nice in Python (Rich Iannone, Posit) | posit::conf(2025)
Speaker(s): Rich Iannone
Abstract:
When working on the Great Tables and Pointblank Python packages, we’ve tried to make them ‘nice’. These packages give you a lot of convenient options, and a large volume of docs and examples. In the Python world, this might be received differently than it would be in R. Whether it was integrating Polars selectors in Great Tables or accepting a multitude of DataFrames and DB tables in Pointblank, these design choices can be seen as surprising things to established Python developers.
However, I argue it’s good to be doing this! People are benefitting from these approaches. I’ll share a few of these developer stories with the takeaway being that Python packages could and should pay attention to good user experience.
posit::conf(2025)
Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

Building Tailored Dashboards: Drill Down Visualizations and LLM-Powered Summaries with Shiny
Led by Isabella Velásquez, Sr Product Marketing Manager at Posit PBC
Scenario: Imagine the weekly scramble at DemoCo. Isabella, a lonely Marketing analyst, spends hours every Friday manually compiling marketing lead performance data from Salesforce, website logs, and event records into a sprawling Excel report. The tedious, redundant process is both a time sink and a bottleneck.
Marketing Leadership, who need to identify top-performing platforms, find the sheer volume of information in the report confusing. They constantly return to Isabella for clarification or, even worse, don’t follow up and make misinformed decisions on critical budget allocation. Determined, Isabella decides it’s finally time to automate and tailor this critical dashboard, saving headaches for both Leadership and herself.
In this demo, you will learn how to:
- Transform redundant, manual, error-prone data compilation into an automated lead performance dashboard with Shiny
- Integrate diverse marketing data sources for a holistic view
- Build tailored marketing KPI dashboards that address Leadership’s specific questions
- Use drill-downs, data visualizations, and LLM-powered summaries to reduce data overload and clarify complex takeaways instantly
- Provide flexibility in dashboard functionality to meet diverse preferences and needs, from pre-selected insights derived directly from the data to implementing an LLM-powered analytics option
Helpful resources:
- Demo Slides: https://github.com/ivelasq/2025-06-25_marketing-demo/blob/main/README.md
- Demo Marketing Dashboard: https://pub.current.posit.team/public/marketing-demo/
- GitHub Repo (including links to packages used: pointblank, querychat, etc.): https://github.com/ivelasq/2025-06-25_marketing-demo?tab=readme-ov-file
If you have specific follow-up questions about using Posit in your organization, we’d love to chat with you: https://posit.co/schedule-a-call/
dbtplyr: Bringing Column-Name Contracts from R to dbt - posit::conf(2023)
Presented by Emily Riederer
starts_with(language): Translating select helpers to dbt. Translating syntax between languages transports concepts across communities. We see a case study of adapting a column-naming workflow from dplyr to dbt’s data engineering toolkit.
dplyr’s select helpers exemplify how the tidyverse uses opinionated design to push users into the pit of success. The ability to efficiently operate on names incentivizes good naming patterns and creates efficiency in data wrangling and validation.
However, in a polyglot world, users may find they must leave the pit when comparable syntactic sugar is not accessible in other languages like Python and SQL.
In this talk, I will explain how dplyr’s select helpers inspired my approach to ‘column name contracts,’ how good naming systems can help supercharge data management with packages like {dplyr} and {pointblank}, and my experience building the {dbtplyr} package to port this functionality to dbt for building complex SQL-based data pipelines.
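As a rough illustration of the ‘column name contract’ idea, here is a minimal Python sketch using only the standard library. It is not code from the talk, {dbtplyr}, {dplyr}, or {pointblank}: the prefixes (`n_` for counts, `is_` for flags), the `starts_with` helper, the `CONTRACT` mapping, and the sample rows are assumptions made up for this example. The point is only that when names encode meaning, a prefix-based selector can pick the columns a rule applies to.

```python
def starts_with(prefix, columns):
    """Return the column names beginning with `prefix` (mimics a select helper)."""
    return [c for c in columns if c.startswith(prefix)]


# Hypothetical contract: each prefix promises something about the column's values.
CONTRACT = {
    # n_* columns hold non-negative integer counts (bool is excluded on purpose,
    # since Python treats bool as a subclass of int).
    "n_": lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0,
    # is_* columns hold true booleans.
    "is_": lambda v: isinstance(v, bool),
}


def violations(rows, contract):
    """List (column, row_index, value) for every value that breaks its contract."""
    columns = list(rows[0]) if rows else []
    found = []
    for prefix, check in contract.items():
        for col in starts_with(prefix, columns):
            for i, row in enumerate(rows):
                if not check(row[col]):
                    found.append((col, i, row[col]))
    return found


# Made-up sample data for the sketch.
rows = [
    {"n_visits": 3, "is_active": True, "n_clicks": 10},
    {"n_visits": -1, "is_active": 1, "n_clicks": 4},
]

problems = violations(rows, CONTRACT)
print(problems)
```

Because the rules are keyed by prefix rather than by individual column, a new column such as `n_purchases` would be covered by the existing contract without any change to the checks.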
Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Databases for data science with duckdb and dbt. Session Code: TALK-1098
Scale Your Data Validation Workflow With {pointblank} and Posit Connect - posit::conf(2023)
Presented by Michael Garcia
For the Data Services team at Medable, our number one priority is to ensure the data we collect and deliver to our clients is of the highest quality. The {pointblank} package, along with Posit Connect, modernizes how we tackle data validation within Data Services.
In this talk, I will briefly summarize how we develop test code with {pointblank}, share with {pins}, execute with {rmarkdown}, and report findings with {blastula}. Finally, I will show how we aggregate data from test results across projects into a holistic view using {shiny}.
Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Leave it to the robots: automating your work. Session Code: TALK-1058
Open Source Chat - {gt} with Rich Iannone
Join Rich Iannone, maintainer of the {gt} package, as he takes questions from the community about the latest in {gt} v0.7.0, and building great looking data display tables with R.
Key Resources: ⬡ Get started with {gt} - https://gt.rstudio.com
Reach out:
38:48 - How do I ask Rich about {gt}, feature requests, bug reports, how to solve a problem via {gt}? Rich and the {gt} team would love to hear from you.
⬡ Feature requests & bug reports with GitHub Issues, https://github.com/rstudio/gt/issues
⬡ GitHub Discussions, https://github.com/rstudio/gt/discussions
⬡ Ask the community a question, https://community.rstudio.com/tag/gt
⬡ Follow {gt} on Twitter, feel free to reach out and ask questions, https://twitter.com/gt_package
Timestamps
Rich Iannone Introduction.
03:52 - Why {gt}? - What does {gt} bring to the table? Why so much effort into static, data display tables?
05:50 - Why open source? Why is {gt} open source and why have you dedicated your career to develop open source software?
08:30 - {gt} v0.7.0, Tell us about those new vector formatting functions in {gt}. Why did you include them? Could you show us some examples?
{gt}’s vector formatting functions help you customize the styling, look and feel of your values. Converting the output values R gives you, and making them look exactly the way you want them to can be tricky. A lot of work was put into {gt} to give nice value formatting options. You can now access all these outside of a gt table; e.g. in text, in a plot, etc.
22:35 - Could you provide an example or two with the new styling function called opt_stylize()? What kinds of tables can you make with that? Can you extend that with your own tweaks?
28:15 - Can you make your own themes and share them? “How do I create my own custom theme for my table? A theme I can share with the rest of my organization?”
31:58 - What is the distinction between tab_options and the opt_* functions? Why would a function be in opt_* and not tab_options?
34:00 - sub_values() function, to find and replace certain values in your table.
36:50 - What is the current support for LaTeX in {gt}? “Personally, I much prefer HTML, but for scientific publications, we are asked to provide a LaTeX file.”
42:50 - “In my work, I often produce A4 output in PDF, mainly with ggplot2 content. It would be nice to be able to combine ggplot + gt tables in a similar way {patchwork} works. Having the plot and the table next to it is very useful sometimes.”
44:30 - Interactive Tables with {gt}?
47:45 - “Any plans to make applying of same style to several columns easier? Unless I’m mistaken, the locations argument of tab_style requires one to specify an individual column. See here: https://gt.rstudio.com/reference/tab_style.html#examples.”
Yes, supply a vector of columns or use tidyselect functions.
49:15 - “Excel output with {gt}? Would be a huge improvement. I often have to produce tabular output that can be easily reused. Usually it means Excel tables. So far I have mainly done this with Python and openpyxl or PyWin32 (through COM). A simple solution in R would be great.”
50:20 - Support for additional output formats with {gt}? Excel, PowerPoint, etc.?
50:25 - {pointblank}, a package to methodically validate your data, whether in the form of data frames or database tables: https://rich-iannone.github.io/pointblank/
Check out the workshop materials at https://github.com/rich-iannone/pointblank-workshop
55:50 - “Are there ways to have grouped rows? I mean when repeated rows have same characters can we merge them to one?”
58:00 - “Is there an ability to add ‘battleship coordinates’ (e.g. column letters & row numbers) to a gt object? This is a standard for table across my org and I’ve been trying to figure out how to implement it.”
59:59 - “Do you have suggestions or examples of building out & applying corporate formatting to gt tables (e.g. adding a company logo, company colors, etc.)?”
01:04:30 - “With PDF/LaTeX output for wide tables, it does not shrink the table.”

