pointblank

Data validation toolkit for assessing and monitoring data quality

posit-dev/pointblank

posit-dev.github.io/pointblank

392 stars

25 forks

MIT

Python

Pointblank is a Python data validation toolkit that assesses and monitors data quality across multiple backends including Polars, Pandas, DuckDB, PostgreSQL, MySQL, SQLite, Parquet, PySpark, and Snowflake. It provides a chainable API for building validation pipelines that check data against business rules and generate interactive reports showing validation results.
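
The chainable style described above can be pictured with a small, dependency-free sketch in plain Python. It is not Pointblank's API: `ToyValidate`, its methods, and the list-of-dicts row format are hypothetical stand-ins chosen so the example runs without any backend. It only shows the pattern: each method queues a check and returns the validator so calls can be chained, and a final `interrogate()` step runs every check and records how many rows passed. Pointblank's real entry point is its `Validate` class; see its API reference for the actual step methods and their parameters.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]  # hypothetical row representation: column name -> value


@dataclass
class StepResult:
    """Outcome of one validation step."""
    description: str
    n_units: int
    n_passed: int

    @property
    def all_passed(self) -> bool:
        return self.n_passed == self.n_units


@dataclass
class ToyValidate:
    """Hypothetical chainable validator: steps are queued, then run by interrogate()."""
    rows: list[Row]
    steps: list[tuple[str, Callable[[Row], bool]]] = field(default_factory=list)
    results: list[StepResult] = field(default_factory=list)

    def col_vals_not_null(self, column: str) -> "ToyValidate":
        self.steps.append((f"{column} is not null", lambda r: r.get(column) is not None))
        return self  # returning self is what makes the calls chainable

    def col_vals_gt(self, column: str, value: float) -> "ToyValidate":
        # Missing values count as failures in this sketch.
        self.steps.append(
            (f"{column} > {value}", lambda r: r.get(column) is not None and r[column] > value)
        )
        return self

    def interrogate(self) -> "ToyValidate":
        # Evaluate every queued step against every row and record pass counts.
        self.results = [
            StepResult(desc, len(self.rows), sum(1 for r in self.rows if check(r)))
            for desc, check in self.steps
        ]
        return self


rows = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 80.0},
    {"id": 3, "amount": None},
]
validation = (
    ToyValidate(rows)
    .col_vals_not_null("amount")
    .col_vals_gt("amount", 100)
    .interrogate()
)
for res in validation.results:
    print(f"{res.description}: {res.n_passed}/{res.n_units} passed")
```

Deferring execution until `interrogate()` keeps the pipeline declarative: the list of steps can be inspected or reused before any data is touched.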

The package emphasizes clear communication of data quality issues through customizable reports that make validation results immediately understandable to both technical and non-technical stakeholders. It also includes AI-powered validation drafting, which suggests validation rules based on your data; threshold-based alerts with custom actions; YAML configuration support for version-controlled workflows; a CLI for running validations in CI/CD pipelines; and synthetic data generation for testing. Unlike validation libraries that only catch errors, Pointblank focuses both on finding issues and on sharing those findings effectively across teams.
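
Threshold-based alerting can be pictured as mapping each step's failure rate to a severity level and firing the action registered for that level. The plain-Python sketch below only illustrates that idea: the names `THRESHOLDS`, `severity()`, and `run_actions()` are hypothetical, and it assumes a level is reached when the fraction of failing test units is at or above its threshold. In Pointblank itself this is configured through its `Thresholds` and `Actions` classes, whose exact semantics are documented in its API reference.

```python
from typing import Callable, Optional

# Hypothetical thresholds, expressed as fractions of failing test units.
THRESHOLDS = {"warning": 0.05, "error": 0.10, "critical": 0.25}
LEVELS = ["warning", "error", "critical"]  # ordered from least to most severe


def severity(n_failed: int, n_units: int, thresholds: dict[str, float] = THRESHOLDS) -> Optional[str]:
    """Return the most severe level whose threshold the failure fraction reaches, or None."""
    if n_units == 0:
        return None
    fraction = n_failed / n_units
    reached = [level for level in LEVELS if fraction >= thresholds[level]]
    return reached[-1] if reached else None


def run_actions(level: Optional[str], actions: dict[str, Callable[[str], None]], step: str) -> None:
    """Call the custom action registered for `level`, if there is one."""
    if level is not None and level in actions:
        actions[level](step)


alerts: list[str] = []
# Custom actions: here they just record a message; in practice they might log or notify.
actions = {
    "warning": lambda step: alerts.append(f"WARN: {step}"),
    "critical": lambda step: alerts.append(f"CRITICAL: {step}"),
}

# (step description, failing units, total units) for three hypothetical steps
for step, failed, total in [("amount > 0", 3, 100), ("id is unique", 8, 100), ("email matches pattern", 40, 100)]:
    run_actions(severity(failed, total), actions, step)

print(alerts)
```

Keeping levels ordered and picking the last one reached means a single step triggers only its most severe action, rather than every level below it.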

Resources featuring pointblank

Regression models still rule in P&C Insurance | Jim Weiss | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!

This week’s guest was Jim Weiss, Chief Risk Officer, Commercial and Executive, at Crum and Forster!

Some topics covered in this week’s Hangout were the use of regression modeling (have you ever heard of the Tweedie distribution?) in property and casualty insurance, handling changing model results and communicating them to business stakeholders, using GenAI and “co-opetition” to identify and prevent fraudulent claims, and identifying and managing bias or confounding effects in pricing models.

Resources mentioned in the video and chat:
The Once and Future C&F → https://www.cfins.com/the-once-and-future-cf-landing/
Tweedie distribution → https://en.wikipedia.org/wiki/Tweedie_distribution
Statistical Rethinking Lectures Playlist → https://www.youtube.com/playlist?list=PLDcUM9US4XdNOlqSyhe38US8mFgmqzI14
Considerations for Managing Potential Bias in Pricing Models → https://eforum.casact.org/article/91188-considerations-for-managing-potential-bias-in-pricing-models
Pointblank for data validation (Python) → https://posit-dev.github.io/pointblank/
Pointblank (R) → https://rstudio.github.io/pointblank/

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
The Lab: https://pos.it/dslab
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Timestamps:
00:00 Introduction
02:16 “Who is Crum and Forster? What do you do? What types of problems does your team help them solve?”
04:14 “What kind of data types would you see in your day to day or your team would see in your day to day? And what is an example of a problem that you feel like you’ve solved lately or that you’ve been working on lately?”
10:14 “What types of regression do you use?”
13:35 “In your insurance modeling career, what is the most unusual or unexpected variable that has contributed to one of your models?”
18:48 “How do you handle it when you see that the results are significantly different across models?”
21:45 “Are you Team Bayesian or Team Frequentist when it comes to your statistics?”
27:13 “My health care organization is interested in identifying fraudulent claims. Currently, they’re looking at Excel spreadsheets, same time, different person. Do you have any advice on a better way to guide them to automation?”
30:51 “What software do you use to do your job?”
35:40 “Do you ever use instrumental variables in your models?”
47:47 “Do you have any career advice for us? Is there something you wish that you had known when you were first entering the industry?”
50:38 “How much do you and your team use AI to help you along?”
52:56 “How does your team span expertise in such varied fields?”

Integrating Shiny with Epic EHR | Matt Maloney | Data Science Hangout


We were recently joined by Matt Maloney, Director of Applied AI and Data Science at City of Hope, to chat about applying data science to cancer care operations, integrating open source data science tools like Shiny with Electronic Health Records (like Epic), and the evolving governance of generative AI in healthcare.

In this Hangout, we explore the technical and operational strategies behind integrating custom data science applications directly into clinical workflows. Matt discusses how his team moves beyond standalone tools by embedding Shiny apps and other solutions into Epic, so that medical coders and providers can access predictions and summaries without leaving the software they already work in. He also discusses the “build vs. buy” decision as vendors release their own AI solutions, emphasizing the importance of validating external models against the organization’s own patient population.

Resources mentioned in the video and Zoom chat:
City of Hope → https://www.cityofhope.org
Unity Health Toronto Customer Story → https://posit.co/about/customer-stories/unity-health-toronto/
pointblank (Data Validation package) → https://rstudio.github.io/pointblank/

If you didn’t join live, one great discussion you missed from the zoom chat was about where data science teams sit within community members’ organizations and whether they like it or not, specifically the pros and cons of being housed within IT versus embedded inside business units. Participants debated access to infrastructure versus proximity to business stakeholders, with several sharing their own experiences of shifting between these departments (or between companies with different structures). Let us know below if you’d like to hear more about this topic!


Timestamps:
00:00 Introduction
03:37 “What does the data science function at City of Hope help with?”
08:52 “Tell us a little bit more about how you’re integrating Posit with Epic”
16:08 “How do you handle the needs of privacy with the push to adopt AI?”
18:40 “How do you manage to stay abreast of technical advancements?”
22:45 “At what point do you hand off your data work to the software engineering team?”
27:23 “How much has development that involves LLMs and generative AI taken hold?”
30:38 “Does your team evaluate a lot of the things that Epic might be throwing your way?”
34:41 “How does Epic pass an encounter number or a patient ID to Posit Connect?”
35:57 “How does your team handle these nuanced pieces of clinical information?”
40:29 “Do the administrators appreciate the time that it takes to do things?”
44:22 “What happens in the academic division?”
46:10 “Do you have a piece of career advice for us?”

How to use pointblank to understand, validate, and document your data

R/Medicine Webinar

This workshop focused on the data quality and data documentation workflows that the pointblank package makes possible. The speaker, Richard Iannone, used functions that allowed us to:

quickly understand a new dataset
validate tabular data using rules that are based on our understanding of the data
fully document a table by describing its variables and other important details
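
For the first item, getting a quick overview of an unfamiliar table, here is a minimal plain-Python sketch of a per-column profile. It is not pointblank code (the R package provides `scan_data()` for this purpose); the `profile_columns()` helper and the sample rows are hypothetical, and the sketch assumes rows arrive as a list of dicts whose values are hashable.

```python
from collections import Counter
from typing import Any


def profile_columns(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each column: observed value types, missing count, distinct values, most common value."""
    columns = sorted({key for row in rows for key in row})
    summary: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys are treated as missing
        present = [v for v in values if v is not None]
        summary[col] = {
            "types": sorted({type(v).__name__ for v in present}),
            "n_missing": len(values) - len(present),
            "n_distinct": len(set(present)),  # requires hashable values
            "most_common": Counter(present).most_common(1)[0][0] if present else None,
        }
    return summary


patients = [
    {"id": 1, "site": "A", "age": 54},
    {"id": 2, "site": "B", "age": None},
    {"id": 3, "site": "A", "age": 61},
    {"id": 4, "site": "A"},
]
report = profile_columns(patients)
for col, stats in report.items():
    print(col, stats)
```

Treating an absent key the same as an explicit `None` is a deliberate simplification; a real profiler would usually report the two separately.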

The pointblank package was created to scale from small validation problems (“Let’s make certain this table fits my expectations before moving on”) to very large ones (“Let’s validate these 35 database tables every day and ensure data quality is maintained”). Iannone delved into all sorts of data quality scenarios so that viewers will be comfortable using this package in their own organizations.
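
The "many tables" end of that range can be sketched with Python's built-in `sqlite3` module, assuming the tables live in a SQL database: one hypothetical rule (no NULLs in key columns) is pushed down to the database as a query for every listed table, and the failures are gathered into a single summary. The `RULES` mapping, the table and column names, and the helper functions are invented for this example and are not part of pointblank.

```python
import sqlite3

# Hypothetical rule set: for each table, the columns that must never be NULL.
RULES = {
    "patients": ["patient_id", "birth_date"],
    "encounters": ["encounter_id", "patient_id"],
}


def count_nulls(conn: sqlite3.Connection, table: str, column: str) -> int:
    # Identifiers cannot be bound as SQL parameters, so they must come only from the trusted RULES dict.
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    return conn.execute(query).fetchone()[0]


def validate_all(conn: sqlite3.Connection, rules: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Return {table: {column: n_null}} for every checked column that has at least one NULL."""
    failures: dict[str, dict[str, int]] = {}
    for table, columns in rules.items():
        for column in columns:
            n_null = count_nulls(conn, table, column)
            if n_null:
                failures.setdefault(table, {})[column] = n_null
    return failures


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patients (patient_id INTEGER, birth_date TEXT);
    INSERT INTO patients VALUES (1, '1970-01-01'), (2, NULL), (3, NULL);
    CREATE TABLE encounters (encounter_id INTEGER, patient_id INTEGER);
    INSERT INTO encounters VALUES (10, 1), (11, 2);
    """
)
failures = validate_all(conn, RULES)
print(failures)
```

Running the check as SQL keeps the data in the database and only moves counts back to Python, which is what makes the same loop practical for dozens of tables on a daily schedule.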

Data documentation is, unfortunately, uncommon in organizations (perhaps even less common than the practice of data validation). The workshop showed that it doesn’t have to be a tedious chore: the pointblank package allows you to create informative and attractive data documentation that helps others understand what’s in the tables that are so vital to an organization.
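
In the R package, this documentation workflow is built around an informant object (created with `create_informant()` and enriched with functions such as `info_columns()`). As a dependency-free illustration of the underlying idea only, the hypothetical Python helper below merges hand-written column descriptions with types observed in the data and renders a small plain-text data dictionary. The function name, table, and descriptions are invented for the example.

```python
from typing import Any


def data_dictionary(table_name: str, rows: list[dict[str, Any]], descriptions: dict[str, str]) -> str:
    """Render a plain-text data dictionary combining written descriptions with observed types."""
    columns = list(rows[0]) if rows else list(descriptions)
    lines = [f"Table: {table_name} ({len(rows)} rows)", ""]
    for col in columns:
        observed = sorted({type(r[col]).__name__ for r in rows if r.get(col) is not None})
        doc = descriptions.get(col, "(undocumented)")  # make gaps in the documentation visible
        lines.append(f"- {col} [{', '.join(observed) or 'unknown'}]: {doc}")
    return "\n".join(lines)


claims = [
    {"claim_id": "C-001", "amount": 1200.0, "status": "open"},
    {"claim_id": "C-002", "amount": 310.5, "status": "closed"},
]
text = data_dictionary(
    "claims",
    claims,
    {"claim_id": "Unique claim identifier", "amount": "Claimed amount in USD"},
)
print(text)
```

Flagging undocumented columns explicitly, rather than omitting them, turns the generated document into a checklist of what still needs describing.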

Speaker

Richard Iannone, Software Engineer, Posit, PBC

Rich is a software engineer at Posit who enjoys creating useful R and Python packages. He trained and worked as an atmospheric scientist and found working with R to be a breath of fresh air compared to the Excel-based analysis workflows common in that field. Since joining Posit, he has focused on developing packages that help organizations with data management and data visualization/publishing. When not working on R and Python packages, Rich enjoys playing and listening to music, watching movies, and getting outside.

Open Source Chat - {gt} with Rich Iannone

Join Rich Iannone, maintainer of the {gt} package, as he takes questions from the community about what’s new in {gt} v0.7.0 and about building great-looking data display tables with R.

Key Resources: ⬡ Get started with {gt} - https://gt.rstudio.com

Reach out: 38:48 - How do I ask Rich about {gt}, send feature requests or bug reports, or get help solving a problem with {gt}? Rich and the {gt} team would love to hear from you.
⬡ Feature requests & bug reports with GitHub Issues, https://github.com/rstudio/gt/issues
⬡ GitHub Discussions, https://github.com/rstudio/gt/discussions
⬡ Ask the community a question, https://community.rstudio.com/tag/gt
⬡ Follow {gt} on Twitter, feel free to reach out and ask questions, https://twitter.com/gt_package

Timestamps:
Rich Iannone introduction.
03:52 - Why {gt}? What does {gt} bring to the table? Why put so much effort into static data display tables?
05:50 - Why open source? Why is {gt} open source, and why have you dedicated your career to developing open source software?
08:30 - {gt} v0.7.0: Tell us about the new vector formatting functions in {gt}. Why did you include them? Could you show us some examples? {gt}’s vector formatting functions help you customize the styling, look, and feel of your values. Converting the output values R gives you into exactly the form you want can be tricky, and a lot of work went into giving {gt} good value-formatting options. You can now use all of these outside of a gt table, e.g., in text or in a plot.
22:35 - Could you provide an example or two with the new styling function called opt_stylize()? What kinds of tables can you make with it? Can you extend it with your own tweaks?
28:15 - Can you make your own themes and share them? “How do I create my own custom theme for my table? A theme I can share with the rest of my organization?”
31:58 - What is the distinction between tab_options and the opt_* functions? Why would a function be in opt_* and not tab_options?
34:00 - The sub_values() function, to find and replace certain values in your table.
36:50 - What is the current support for LaTeX in {gt}? “Personally, I much prefer HTML, but for scientific publications, we are asked to provide a LaTeX file.”
42:50 - “In my work, I often produce A4 output in PDF, mainly with ggplot2 content. It would be nice to be able to combine ggplot + gt tables in a similar way {patchwork} works. Having the plot and the table next to it is very useful sometimes.”
44:30 - Interactive tables with {gt}?
47:45 - “Any plans to make applying of same style to several columns easier? Unless I’m mistaken, the locations argument of tab_style requires one to specify an individual column. See here: https://gt.rstudio.com/reference/tab_style.html#examples.” Yes: supply a vector of columns or use tidyselect functions.
49:15 - “Excel output with {gt}? Would be a huge improvement. I often have to produce tabular output that can be easily reused. Usually it means Excel tables. So far I have mainly done this with Python and openpyxl or PyWin32 (through COM). A simple solution in R would be great.”
50:20 - Support for additional output formats with {gt}? Excel, PowerPoint, etc.?
50:25 - {pointblank}, a package to methodically validate your data, whether in the form of data frames or as database tables: https://rich-iannone.github.io/pointblank/. Check out the workshop materials at https://github.com/rich-iannone/pointblank-workshop
55:50 - “Are there ways to have grouped rows? I mean when repeated rows have same characters can we merge them to one?”
58:00 - “Is there an ability to add ‘battleship coordinates’ (e.g. column letters & row numbers) to a gt object? This is a standard for table across my org and I’ve been trying to figure out how to implement it.”
59:59 - “Do you have suggestions or examples of building out & applying corporate formatting to gt tables (e.g. adding a company logo, company colors, etc.)?”
01:04:30 - “With PDF/LaTeX output for wide tables, it does not shrink the table.”
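
The vector formatting functions discussed at 08:30 (the `vec_fmt_*()` family in {gt}) return formatted strings that can be used outside a table. The Python sketch below is only an analogy for that idea and is not the API of {gt} or any Python table library: a hypothetical `fmt_number()` helper formats a list of values with a fixed number of decimals and a thousands separator, and renders missing values as a placeholder string.

```python
from typing import Iterable, Optional


def fmt_number(
    values: Iterable[Optional[float]],
    decimals: int = 2,
    sep_mark: str = ",",
    missing: str = "NA",
) -> list[str]:
    """Format numbers with fixed decimals and a thousands separator; None becomes `missing`."""
    out = []
    for v in values:
        if v is None:
            out.append(missing)
            continue
        s = f"{v:,.{decimals}f}"  # Python's format spec always groups thousands with ","
        # Only the grouping mark is swapped; this sketch does not change the decimal mark.
        out.append(s.replace(",", sep_mark))
    return out


print(fmt_number([1234567.891, 0.5, None, -42]))
print(fmt_number([1234567.891], decimals=0, sep_mark=" "))
```

Returning plain strings is what makes such helpers reusable in running text, plot labels, or anywhere else a formatted value is needed.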

Rich Iannone


Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference.



Contributors

Rich Iannone

Michael Chow

Events featuring pointblank

R/Medicine 2025

R/Pharma 2025

rainbowR 2026

Resources featuring pointblank

Regression models still rule in P&C Insurance | Jim Weiss | Data Science Hangout

pointblank, was I expecting this? | Hannah Frick

How to use {pointblank} to understand, validate, and document your data

Integrating Shiny with Epic EHR | Matt Maloney | Data Science Hangout

How to use pointblank to understand, validate, and document your data

Zero Tolerance for Dirty Data: A Pointblank Prescription for Data Hygiene (Arnav Patel, Synovus)

Making Things Nice in Python (Rich Iannone, Posit) | posit::conf(2025)

Building Tailored Dashboards: Drill Down Visualizations and LLM-Powered Summaries with Shiny

dbtplyr: Bringing Column-Name Contracts from R to dbt - posit::conf(2023)

Scale Your Data Validation Workflow With {pointblank} and Posit Connect - posit::conf(2023)

How to Use {pointblank} to Understand, Validate, and Document your Data

Open Source Chat - {gt} with Rich Iannone

Posts about pointblank

Data Validation Libraries for Polars (2025 Edition)

C’mon C’mon: Let’s Do a Pointblank Workshop!

Overhauling Pointblank’s User Guide

Level Up Your Data Validation with `Actions` and `FinalActions`

Introducing Pointblank
